* [PATCH 0/3] Small fixes for issues detected during internal CI runs @ 2024-08-02 4:10 Kyle Lippincott via GitGitGadget 2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget ` (3 more replies) 0 siblings, 4 replies; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw) To: git; +Cc: Kyle Lippincott I'm attempting to get the git test suite running automatically during our weekly import. I have this mostly working, including with Address Sanitizer and Memory Sanitizer, but ran into a few issues: * several tests were failing due to strbuf_getcwd not clearing errno on success after it internally looped due to the path being >128 bytes. This is resolved in depth; though either one of the commits alone would resolve our issues: * modify locations that call strtoX and check for ERANGE to set errno = 0; prior to calling the conversion function. This is the typical way that these functions are invoked, and may indicate that we want compatibility helpers in git-compat-util.h to ensure that this happens correctly (and add these functions to the banned list). * have strbuf_getcwd set errno = 0; prior to a successful exit. This isn't very common for most functions in the codebase, but some other examples of this were found. * t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build system using paths that contain a 64-hex-char hash, which had a 12.5% chance of containing the substring d0. Kyle Lippincott (3): set errno=0 before strtoX calls strbuf: set errno to 0 after strbuf_getcwd t6421: fix test to work when repo dir contains d0 builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + strbuf.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + t/t6421-merge-partial-clone.sh | 6 +++--- 6 files changed, 9 insertions(+), 3 deletions(-) base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v1 Pull-Request: https://github.com/git/git/pull/1756 -- gitgitgadget ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] set errno=0 before strtoX calls 2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget 2024-08-02 5:12 ` Patrick Steinhardt 2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget ` (2 subsequent siblings) 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw) To: git; +Cc: Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> To detect conversion failure after calls to functions like `strtod`, one can check `errno == ERANGE`. These functions are not guaranteed to set `errno` to `0` on successful conversion, however. Manual manipulation of `errno` can likely be avoided by checking that the output pointer differs from the input pointer, but that's not how other locations, such as parse.c:139, handle this issue; they set errno to 0 prior to executing the function. For every place I could find a strtoX function with an ERANGE check following it, set `errno = 0;` prior to executing the conversion function. Signed-off-by: Kyle Lippincott <spectral@google.com> --- builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + 4 files changed, 5 insertions(+) diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c index 66a7389f9f4..7195a072edc 100644 --- a/builtin/get-tar-commit-id.c +++ b/builtin/get-tar-commit-id.c @@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER) return 1; + errno = 0; len = strtol(content, &end, 10); if (errno == ERANGE || end == content || len < 0) return 1; diff --git a/ref-filter.c b/ref-filter.c index 8c5e673fc0a..54880a2497a 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam timestamp = parse_timestamp(eoemail + 2, &zone, 10); if (timestamp == TIME_MAX) goto bad; + errno = 0; tz = strtol(zone, NULL, 10); if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE) goto bad; diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c index ed52eb76bfc..a288069b04c 100644 --- a/t/helper/test-json-writer.c +++ b/t/helper/test-json-writer.c @@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in) get_s(line, &s); + errno = 0; *s_in = strtol(s, &endptr, 10); if (*endptr || errno == ERANGE) die("line[%d]: invalid integer value", line->nr); @@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in) get_s(line, &s); + errno = 0; *s_in = strtod(s, &endptr); if (*endptr || errno == ERANGE) die("line[%d]: invalid float value", line->nr); diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c index cd955ec63e9..c588c273ce7 100644 --- a/t/helper/test-trace2.c +++ b/t/helper/test-trace2.c @@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data) if (!data || !*data) return MyError; + errno = 0; *p_value = strtol(data, &endptr, 10); if (*endptr || errno == ERANGE) return MyError; -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls 2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget @ 2024-08-02 5:12 ` Patrick Steinhardt 2024-08-02 6:15 ` Kyle Lippincott 2024-08-02 15:01 ` Junio C Hamano 0 siblings, 2 replies; 29+ messages in thread From: Patrick Steinhardt @ 2024-08-02 5:12 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott [-- Attachment #1: Type: text/plain, Size: 1634 bytes --] On Fri, Aug 02, 2024 at 04:10:51AM +0000, Kyle Lippincott via GitGitGadget wrote: > From: Kyle Lippincott <spectral@google.com> > > To detect conversion failure after calls to functions like `strtod`, one > can check `errno == ERANGE`. These functions are not guaranteed to set > `errno` to `0` on successful conversion, however. Manual manipulation of > `errno` can likely be avoided by checking that the output pointer > differs from the input pointer, but that's not how other locations, such > as parse.c:139, handle this issue; they set errno to 0 prior to > executing the function. > > For every place I could find a strtoX function with an ERANGE check > following it, set `errno = 0;` prior to executing the conversion > function. Makes sense. I've also gone through callsites and couldn't spot any additional ones that are broken. Generally speaking, the interfaces provided by the `strtod()` family of functions is just plain awful, and ideally we wouldn't be using them in the Git codebase at all without a wrapper. We already do have wrappers for a subset of those functions, e.g. `strtol_i()`, which use an out pointer to store the result and indicate success via the return value instead of via `errno`. It would be great if we could extend those wrappers to cover all of the integer types, convert our code base to use them, and then extend our "banned.h" banner. I'm of course not asking you to do that in this patch series. Out of curiosity, why do you hit those errors in your test setup? Do you use a special libc that behaves differently than the most common ones? Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls 2024-08-02 5:12 ` Patrick Steinhardt @ 2024-08-02 6:15 ` Kyle Lippincott 2024-08-02 15:01 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Kyle Lippincott @ 2024-08-02 6:15 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Kyle Lippincott via GitGitGadget, git On Thu, Aug 1, 2024 at 10:12 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Fri, Aug 02, 2024 at 04:10:51AM +0000, Kyle Lippincott via GitGitGadget wrote: > > From: Kyle Lippincott <spectral@google.com> > > > > To detect conversion failure after calls to functions like `strtod`, one > > can check `errno == ERANGE`. These functions are not guaranteed to set > > `errno` to `0` on successful conversion, however. Manual manipulation of > > `errno` can likely be avoided by checking that the output pointer > > differs from the input pointer, but that's not how other locations, such > > as parse.c:139, handle this issue; they set errno to 0 prior to > > executing the function. > > > > For every place I could find a strtoX function with an ERANGE check > > following it, set `errno = 0;` prior to executing the conversion > > function. > > Makes sense. I've also gone through callsites and couldn't spot any > additional ones that are broken. > > Generally speaking, the interfaces provided by the `strtod()` family of > functions is just plain awful, and ideally we wouldn't be using them in > the Git codebase at all without a wrapper. We already do have wrappers > for a subset of those functions, e.g. `strtol_i()`, which use an out > pointer to store the result and indicate success via the return value > instead of via `errno`. > > It would be great if we could extend those wrappers to cover all of the > integer types, convert our code base to use them, and then extend our > "banned.h" banner. I'm of course not asking you to do that in this patch > series. > > Out of curiosity, why do you hit those errors in your test setup? Do you > use a special libc that behaves differently than the most common ones? The second patch in this series fixes the original reason I noticed the issues in three of the files: our remote test execution service uses paths that are >128 bytes long, so the getcwd call in strbuf_getcwd was returning ERANGE once, and then it remained set since getcwd didn't clear it on success. ref-filter.c was found via searching, I think that was during the search for `ERANGE`. > > Patrick ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls 2024-08-02 5:12 ` Patrick Steinhardt 2024-08-02 6:15 ` Kyle Lippincott @ 2024-08-02 15:01 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 15:01 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Kyle Lippincott via GitGitGadget, git, Kyle Lippincott Patrick Steinhardt <ps@pks.im> writes: > It would be great if we could extend those wrappers to cover all of the > integer types, convert our code base to use them, and then extend our > "banned.h" banner. I'm of course not asking you to do that in this patch > series. A good #leftoverbits material. > Out of curiosity, why do you hit those errors in your test setup? Do you > use a special libc that behaves differently than the most common ones? ;-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget 2024-08-02 15:10 ` Junio C Hamano 2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw) To: git; +Cc: Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> If the loop executes more than once due to cwd being longer than 128 bytes, then `errno = ERANGE` might persist outside of this function. This technically shouldn't be a problem, as all locations where the value in `errno` is tested should either (a) call a function that's guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior to calling the function that only conditionally sets errno, such as the `strtod` function. In the case of functions in category (b), it's easy to forget to do that. Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. This matches the behavior in functions like `run_transaction_hook` (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). Signed-off-by: Kyle Lippincott <spectral@google.com> --- strbuf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/strbuf.c b/strbuf.c index 3d2189a7f64..b94ef040ab0 100644 --- a/strbuf.c +++ b/strbuf.c @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) strbuf_grow(sb, guessed_len); if (getcwd(sb->buf, sb->alloc)) { strbuf_setlen(sb, strlen(sb->buf)); + errno = 0; return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget @ 2024-08-02 15:10 ` Junio C Hamano 2024-08-02 17:56 ` Kyle Lippincott 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 15:10 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > This matches the behavior in functions like `run_transaction_hook` > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). This deep in the call chain, there is nothing that assures us that the caller of this function does not care about the error before entering this function, so I feel a bit uneasy about the approach, and my initial reaction was "wouldn't it be safer to do the usual int saved_errno = errno; for (guessed_len = 128;; guessed_len *= 2) { ... do things ... if (...) { ... happy ... errno = saved_errno; return 0; } } pattern. Who calls this function, and inspects errno when this function returns 0? I do not mind adding the "save and restore" fix to this function, but if there is a caller that looks at errno from a call that returns success, that caller may also have to be looked at and fixed if necessary. Thanks. > Signed-off-by: Kyle Lippincott <spectral@google.com> > --- > strbuf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/strbuf.c b/strbuf.c > index 3d2189a7f64..b94ef040ab0 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > strbuf_grow(sb, guessed_len); > if (getcwd(sb->buf, sb->alloc)) { > strbuf_setlen(sb, strlen(sb->buf)); > + errno = 0; > return 0; > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 15:10 ` Junio C Hamano @ 2024-08-02 17:56 ` Kyle Lippincott 0 siblings, 0 replies; 29+ messages in thread From: Kyle Lippincott @ 2024-08-02 17:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git On Fri, Aug 2, 2024 at 8:10 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > This matches the behavior in functions like `run_transaction_hook` > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > This deep in the call chain, there is nothing that assures us that > the caller of this function does not care about the error before > entering this function, so I feel a bit uneasy about the approach, > and my initial reaction was "wouldn't it be safer to do the usual > > int saved_errno = errno; > > for (guessed_len = 128;; guessed_len *= 2) { > ... do things ... > if (...) { > ... happy ... > errno = saved_errno; > return 0; > } > } > > pattern. > > Who calls this function, and inspects errno when this function > returns 0? That's a difficult question to answer if you want to be wholistic for the whole program :) For immediate callers: - unix_sockaddr_init: doesn't inspect or adjust errno itself if strbuf_getcwd returns 0. Continues on to call other functions that may set errno. - strbuf_realpath_1: same - chdir_notify: same - discover_git_directory_reason: same - setup_git_directory_gently: same - setup_enlistment_directory (in scalar.c): dies immediately if strbuf_getcwd returns < 0, otherwise same - xgetcwd: also doesn't inspect/adjust errno if strbuf_getcwd returns 0. Doesn't call any other functions afterward (besides strbuf functions). - main: stores the value if strbuf_getcwd returns 0, but doesn't inspect errno. > I do not mind adding the "save and restore" fix to this > function, but if there is a caller that looks at errno from a call > that returns success, that caller may also have to be looked at and > fixed if necessary. There aren't any that I could find, this patch is mostly a defense-in-depth solution to the strtoX functions that were fixed in patch 1. This function _may_ set errno even on success. That errno value ends up retained indefinitely as long as things continue succeeding, and then we call a function like `strtod` which has a suboptimal interface. If this patch doesn't land, the codebase is still correct; the main reason to want to land this is that without this patch, any user that has paths longer than 128 bytes becomes de facto responsible for finding and reporting/fixing issues that arise from this errno value being persisted, and I was hoping I wouldn't be signing the people maintaining CI at $JOB up for that :) It's not an obvious failure, either. For example, t0211's failure, prior to setting errno to 0 just before calling strtoX is just: `fatal: expect <exit_code>`. That's not easy to trace back to "strbuf_getcwd sets ERANGE in errno in our environment, so this is a misuse of a strtoX or parse_timestamp function". > > Thanks. > > > Signed-off-by: Kyle Lippincott <spectral@google.com> > > --- > > strbuf.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/strbuf.c b/strbuf.c > > index 3d2189a7f64..b94ef040ab0 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > > strbuf_grow(sb, guessed_len); > > if (getcwd(sb->buf, sb->alloc)) { > > strbuf_setlen(sb, strlen(sb->buf)); > > + errno = 0; > > return 0; > > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] t6421: fix test to work when repo dir contains d0 2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget 2024-08-02 15:13 ` Junio C Hamano 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw) To: git; +Cc: Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> The `grep` statement in this test looks for `d0.*<string>`, attempting to filter to only show lines that had tabular output where the 2nd column had `d0` and the final column had a substring of [`git -c `]`fetch.negotiationAlgorithm`. These lines also have `child_start` in the 4th column, but this isn't part of the condition. A subsequent line will have `d1` in the 2nd column, `start` in the 4th column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final column. If `/path/to/git/git` contains the substring `d0`, then this line is included by `grep` as well as the desired line, leading to an effective doubling of the number of lines, and test failures. Tighten the grep expression to require `d0` to be surrounded by spaces, and to have the `child_start` label. Signed-off-by: Kyle Lippincott <spectral@google.com> --- t/t6421-merge-partial-clone.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index 711b709e755..0f312ac93dc 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -231,7 +231,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev test_cmp expect actual && # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 2 fetches && git rev-list --objects --all --missing=print | @@ -319,7 +319,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory test_cmp expect actual && # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 1 fetches && git rev-list --objects --all --missing=print | @@ -423,7 +423,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren test_cmp expect actual && # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 4 fetches && git rev-list --objects --all --missing=print | -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] t6421: fix test to work when repo dir contains d0 2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget @ 2024-08-02 15:13 ` Junio C Hamano 0 siblings, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 15:13 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > The `grep` statement in this test looks for `d0.*<string>`, attempting > to filter to only show lines that had tabular output where the 2nd > column had `d0` and the final column had a substring of > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have > `child_start` in the 4th column, but this isn't part of the condition. > > A subsequent line will have `d1` in the 2nd column, `start` in the 4th > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final > column. If `/path/to/git/git` contains the substring `d0`, then this > line is included by `grep` as well as the desired line, leading to an > effective doubling of the number of lines, and test failures. > > Tighten the grep expression to require `d0` to be surrounded by spaces, > and to have the `child_start` label. Makes sense. Updating the comment with expected shape of the output might make it even less likely that we'd break these fixes again by mistake. Thanks. > Signed-off-by: Kyle Lippincott <spectral@google.com> > --- > t/t6421-merge-partial-clone.sh | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh > index 711b709e755..0f312ac93dc 100755 > --- a/t/t6421-merge-partial-clone.sh > +++ b/t/t6421-merge-partial-clone.sh > @@ -231,7 +231,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev > test_cmp expect actual && > > # Check the number of fetch commands exec-ed > - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && > + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && > test_line_count = 2 fetches && > > git rev-list --objects --all --missing=print | > @@ -319,7 +319,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory > test_cmp expect actual && > > # Check the number of fetch commands exec-ed > - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && > + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && > test_line_count = 1 fetches && > > git rev-list --objects --all --missing=print | > @@ -423,7 +423,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren > test_cmp expect actual && > > # Check the number of fetch commands exec-ed > - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && > + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && > test_line_count = 4 fetches && > > git rev-list --objects --all --missing=print | ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 0/3] Small fixes for issues detected during internal CI runs 2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget ` (2 preceding siblings ...) 2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget ` (3 more replies) 3 siblings, 4 replies; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Kyle Lippincott I'm attempting to get the git test suite running automatically during our weekly import. I have this mostly working, including with Address Sanitizer and Memory Sanitizer, but ran into a few issues: * several tests were failing due to strbuf_getcwd not clearing errno on success after it internally looped due to the path being >128 bytes. This is resolved in depth; though either one of the commits alone would resolve our issues: * modify locations that call strtoX and check for ERANGE to set errno = 0; prior to calling the conversion function. This is the typical way that these functions are invoked, and may indicate that we want compatibility helpers in git-compat-util.h to ensure that this happens correctly (and add these functions to the banned list). * have strbuf_getcwd set errno = 0; prior to a successful exit. This isn't very common for most functions in the codebase, but some other examples of this were found. * t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build system using paths that contain a 64-hex-char hash, which had a 12.5% chance of containing the substring d0. Kyle Lippincott (3): set errno=0 before strtoX calls strbuf: set errno to 0 after strbuf_getcwd t6421: fix test to work when repo dir contains d0 builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + strbuf.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + t/t6421-merge-partial-clone.sh | 15 +++++++++------ 6 files changed, 15 insertions(+), 6 deletions(-) base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v2 Pull-Request: https://github.com/git/git/pull/1756 Range-diff vs v1: 1: 4dbd0bec40a = 1: 4dbd0bec40a set errno=0 before strtoX calls 2: 0ed09e9abb8 = 2: 0ed09e9abb8 strbuf: set errno to 0 after strbuf_getcwd 3: 6c08b8ceb2b ! 3: 818dc9e6b3e t6421: fix test to work when repo dir contains d0 @@ Commit message ## t/t6421-merge-partial-clone.sh ## @@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded for single relev + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed +- # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && ++ # Check the number of fetch commands exec-ed by filtering trace to ++ # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 2 fetches && git rev-list --objects --all --missing=print | @@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded when a directory + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed +- # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && ++ # Check the number of fetch commands exec-ed by filtering trace to ++ # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 1 fetches && git rev-list --objects --all --missing=print | @@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren + grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed +- # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && ++ # Check the number of fetch commands exec-ed by filtering trace to ++ # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 4 fetches && -- gitgitgadget ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 1/3] set errno=0 before strtoX calls 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget 2024-08-02 21:18 ` Junio C Hamano 2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget ` (2 subsequent siblings) 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> To detect conversion failure after calls to functions like `strtod`, one can check `errno == ERANGE`. These functions are not guaranteed to set `errno` to `0` on successful conversion, however. Manual manipulation of `errno` can likely be avoided by checking that the output pointer differs from the input pointer, but that's not how other locations, such as parse.c:139, handle this issue; they set errno to 0 prior to executing the function. For every place I could find a strtoX function with an ERANGE check following it, set `errno = 0;` prior to executing the conversion function. Signed-off-by: Kyle Lippincott <spectral@google.com> --- builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + 4 files changed, 5 insertions(+) diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c index 66a7389f9f4..7195a072edc 100644 --- a/builtin/get-tar-commit-id.c +++ b/builtin/get-tar-commit-id.c @@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER) return 1; + errno = 0; len = strtol(content, &end, 10); if (errno == ERANGE || end == content || len < 0) return 1; diff --git a/ref-filter.c b/ref-filter.c index 8c5e673fc0a..54880a2497a 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam timestamp = parse_timestamp(eoemail + 2, &zone, 10); if (timestamp == TIME_MAX) goto bad; + errno = 0; tz = strtol(zone, NULL, 10); if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE) goto bad; diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c index ed52eb76bfc..a288069b04c 100644 --- a/t/helper/test-json-writer.c +++ b/t/helper/test-json-writer.c @@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in) get_s(line, &s); + errno = 0; *s_in = strtol(s, &endptr, 10); if (*endptr || errno == ERANGE) die("line[%d]: invalid integer value", line->nr); @@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in) get_s(line, &s); + errno = 0; *s_in = strtod(s, &endptr); if (*endptr || errno == ERANGE) die("line[%d]: invalid float value", line->nr); diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c index cd955ec63e9..c588c273ce7 100644 --- a/t/helper/test-trace2.c +++ b/t/helper/test-trace2.c @@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data) if (!data || !*data) return MyError; + errno = 0; *p_value = strtol(data, &endptr, 10); if (*endptr || errno == ERANGE) return MyError; -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 1/3] set errno=0 before strtoX calls 2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget @ 2024-08-02 21:18 ` Junio C Hamano 0 siblings, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 21:18 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > To detect conversion failure after calls to functions like `strtod`, one > can check `errno == ERANGE`. These functions are not guaranteed to set > `errno` to `0` on successful conversion, however. Manual manipulation of > `errno` can likely be avoided by checking that the output pointer > differs from the input pointer, but that's not how other locations, such > as parse.c:139, handle this issue; they set errno to 0 prior to > executing the function. > > For every place I could find a strtoX function with an ERANGE check > following it, set `errno = 0;` prior to executing the conversion > function. > > Signed-off-by: Kyle Lippincott <spectral@google.com> > --- > builtin/get-tar-commit-id.c | 1 + > ref-filter.c | 1 + > t/helper/test-json-writer.c | 2 ++ > t/helper/test-trace2.c | 1 + > 4 files changed, 5 insertions(+) Clearilng before strtoX() call like these changes make perfect sense (within the constraint of strtoX() API, which is horrible as pointed out by others many times in the past ;-) Thanks, will queue. > diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c > index 66a7389f9f4..7195a072edc 100644 > --- a/builtin/get-tar-commit-id.c > +++ b/builtin/get-tar-commit-id.c > @@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix > if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER) > return 1; > > + errno = 0; > len = strtol(content, &end, 10); > if (errno == ERANGE || end == content || len < 0) > return 1; > diff --git a/ref-filter.c b/ref-filter.c > index 8c5e673fc0a..54880a2497a 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam > timestamp = parse_timestamp(eoemail + 2, &zone, 10); > if (timestamp == TIME_MAX) > goto bad; > + errno = 0; > tz = strtol(zone, NULL, 10); > if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE) > goto bad; > diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c > index ed52eb76bfc..a288069b04c 100644 > --- a/t/helper/test-json-writer.c > +++ b/t/helper/test-json-writer.c > @@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in) > > get_s(line, &s); > > + errno = 0; > *s_in = strtol(s, &endptr, 10); > if (*endptr || errno == ERANGE) > die("line[%d]: invalid integer value", line->nr); > @@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in) > > get_s(line, &s); > > + errno = 0; > *s_in = strtod(s, &endptr); > if (*endptr || errno == ERANGE) > die("line[%d]: invalid float value", line->nr); > diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c > index cd955ec63e9..c588c273ce7 100644 > --- a/t/helper/test-trace2.c > +++ b/t/helper/test-trace2.c > @@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data) > if (!data || !*data) > return MyError; > > + errno = 0; > *p_value = strtol(data, &endptr, 10); > if (*endptr || errno == ERANGE) > return MyError; ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget 2024-08-02 21:32 ` Junio C Hamano 2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> If the loop executes more than once due to cwd being longer than 128 bytes, then `errno = ERANGE` might persist outside of this function. This technically shouldn't be a problem, as all locations where the value in `errno` is tested should either (a) call a function that's guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior to calling the function that only conditionally sets errno, such as the `strtod` function. In the case of functions in category (b), it's easy to forget to do that. Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. This matches the behavior in functions like `run_transaction_hook` (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). Signed-off-by: Kyle Lippincott <spectral@google.com> --- strbuf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/strbuf.c b/strbuf.c index 3d2189a7f64..b94ef040ab0 100644 --- a/strbuf.c +++ b/strbuf.c @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) strbuf_grow(sb, guessed_len); if (getcwd(sb->buf, sb->alloc)) { strbuf_setlen(sb, strlen(sb->buf)); + errno = 0; return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget @ 2024-08-02 21:32 ` Junio C Hamano 2024-08-02 21:54 ` Eric Sunshine 2024-08-02 23:51 ` Kyle Lippincott 0 siblings, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 21:32 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > If the loop executes more than once due to cwd being longer than 128 > bytes, then `errno = ERANGE` might persist outside of this function. > This technically shouldn't be a problem, as all locations where the > value in `errno` is tested should either (a) call a function that's > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > to calling the function that only conditionally sets errno, such as the > `strtod` function. In the case of functions in category (b), it's easy > to forget to do that. > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > This matches the behavior in functions like `run_transaction_hook` > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). I am still uneasy to see this unconditional clearing, which looks more like spreading the bad practice from two places you identified than following good behaviour modelled after these two places. But I'll let it pass. As long as our programmers understand that across strbuf_getcwd(), errno will *not* be preserved, even if the function returns success, it would be OK. As the usual convention around errno is that a successful call would leave errno intact, not clear it to 0, it would make it a bit harder to learn our API for newcomers, though. Thanks. > Signed-off-by: Kyle Lippincott <spectral@google.com> > --- > strbuf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/strbuf.c b/strbuf.c > index 3d2189a7f64..b94ef040ab0 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > strbuf_grow(sb, guessed_len); > if (getcwd(sb->buf, sb->alloc)) { > strbuf_setlen(sb, strlen(sb->buf)); > + errno = 0; > return 0; > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 21:32 ` Junio C Hamano @ 2024-08-02 21:54 ` Eric Sunshine 2024-08-05 15:51 ` Junio C Hamano 2024-08-02 23:51 ` Kyle Lippincott 1 sibling, 1 reply; 29+ messages in thread From: Eric Sunshine @ 2024-08-02 21:54 UTC (permalink / raw) To: Junio C Hamano Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt, Kyle Lippincott On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > [...] > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > This matches the behavior in functions like `run_transaction_hook` > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > I am still uneasy to see this unconditional clearing, which looks > more like spreading the bad practice from two places you identified > than following good behaviour modelled after these two places. > > But I'll let it pass. > > As long as our programmers understand that across strbuf_getcwd(), > errno will *not* be preserved, even if the function returns success, > it would be OK. As the usual convention around errno is that a > successful call would leave errno intact, not clear it to 0, it > would make it a bit harder to learn our API for newcomers, though. For what it's worth, I share your misgivings about this change and consider the suggestion[*] to make it save/restore `errno` upon success more sensible. It would also be a welcome change to see the function documentation in strbuf.h updated to mention that it follows the usual convention of leaving `errno` untouched upon success and clobbered upon error. [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 21:54 ` Eric Sunshine @ 2024-08-05 15:51 ` Junio C Hamano 2024-08-06 6:26 ` Patrick Steinhardt 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2024-08-05 15:51 UTC (permalink / raw) To: Eric Sunshine Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt, Kyle Lippincott Eric Sunshine <sunshine@sunshineco.com> writes: > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: >> > [...] >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. >> > This matches the behavior in functions like `run_transaction_hook` >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). >> >> I am still uneasy to see this unconditional clearing, which looks >> more like spreading the bad practice from two places you identified >> than following good behaviour modelled after these two places. >> >> But I'll let it pass. >> >> As long as our programmers understand that across strbuf_getcwd(), >> errno will *not* be preserved, even if the function returns success, >> it would be OK. As the usual convention around errno is that a >> successful call would leave errno intact, not clear it to 0, it >> would make it a bit harder to learn our API for newcomers, though. > > For what it's worth, I share your misgivings about this change and > consider the suggestion[*] to make it save/restore `errno` upon > success more sensible. It would also be a welcome change to see the > function documentation in strbuf.h updated to mention that it follows > the usual convention of leaving `errno` untouched upon success and > clobbered upon error. > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ Yup, of course save/restore would be safer, and probably easier to reason about for many people. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-05 15:51 ` Junio C Hamano @ 2024-08-06 6:26 ` Patrick Steinhardt 2024-08-06 7:04 ` Kyle Lippincott 0 siblings, 1 reply; 29+ messages in thread From: Patrick Steinhardt @ 2024-08-06 6:26 UTC (permalink / raw) To: Junio C Hamano Cc: Eric Sunshine, Kyle Lippincott via GitGitGadget, git, Kyle Lippincott [-- Attachment #1: Type: text/plain, Size: 2697 bytes --] On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > >> > [...] > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > >> > This matches the behavior in functions like `run_transaction_hook` > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > >> > >> I am still uneasy to see this unconditional clearing, which looks > >> more like spreading the bad practice from two places you identified > >> than following good behaviour modelled after these two places. > >> > >> But I'll let it pass. > >> > >> As long as our programmers understand that across strbuf_getcwd(), > >> errno will *not* be preserved, even if the function returns success, > >> it would be OK. As the usual convention around errno is that a > >> successful call would leave errno intact, not clear it to 0, it > >> would make it a bit harder to learn our API for newcomers, though. > > > > For what it's worth, I share your misgivings about this change and > > consider the suggestion[*] to make it save/restore `errno` upon > > success more sensible. It would also be a welcome change to see the > > function documentation in strbuf.h updated to mention that it follows > > the usual convention of leaving `errno` untouched upon success and > > clobbered upon error. > > > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ > > Yup, of course save/restore would be safer, and probably easier to > reason about for many people. Is it really all that reasonable? We're essentially partitioning our set of APIs into two sets, where one set knows to keep `errno` intact whereas another set doesn't. In such a world, you have to be very careful about which APIs you are calling in a function that wants to keep `errno` intact, which to me sounds like a maintenance headache. I'd claim that most callers never care about `errno` at all. For the callers that do, I feel it is way more fragile to rely on whether or not a called function leaves `errno` intact or not. For one, it's fragile because that may easily change due to a bug. Second, it is fragile because the dependency on `errno` is not explicitly documented via code, but rather an implicit dependency. So isn't it more reasonable to rather make the few callers that do require `errno` to be left intact to save it? It makes the dependency explicit, avoids splitting our functions into two sets and allows us to just ignore this issue for the majority of functions that couldn't care less about `errno`. Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-06 6:26 ` Patrick Steinhardt @ 2024-08-06 7:04 ` Kyle Lippincott 0 siblings, 0 replies; 29+ messages in thread From: Kyle Lippincott @ 2024-08-06 7:04 UTC (permalink / raw) To: Patrick Steinhardt Cc: Junio C Hamano, Eric Sunshine, Kyle Lippincott via GitGitGadget, git On Mon, Aug 5, 2024 at 11:26 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote: > > Eric Sunshine <sunshine@sunshineco.com> writes: > > > > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > >> > [...] > > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > >> > This matches the behavior in functions like `run_transaction_hook` > > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > >> > > >> I am still uneasy to see this unconditional clearing, which looks > > >> more like spreading the bad practice from two places you identified > > >> than following good behaviour modelled after these two places. > > >> > > >> But I'll let it pass. > > >> > > >> As long as our programmers understand that across strbuf_getcwd(), > > >> errno will *not* be preserved, even if the function returns success, > > >> it would be OK. As the usual convention around errno is that a > > >> successful call would leave errno intact, not clear it to 0, it > > >> would make it a bit harder to learn our API for newcomers, though. > > > > > > For what it's worth, I share your misgivings about this change and > > > consider the suggestion[*] to make it save/restore `errno` upon > > > success more sensible. It would also be a welcome change to see the > > > function documentation in strbuf.h updated to mention that it follows > > > the usual convention of leaving `errno` untouched upon success and > > > clobbered upon error. > > > > > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ > > > > Yup, of course save/restore would be safer, and probably easier to > > reason about for many people. > > Is it really all that reasonable? We're essentially partitioning our set > of APIs into two sets, where one set knows to keep `errno` intact > whereas another set doesn't. In such a world, you have to be very > careful about which APIs you are calling in a function that wants to > keep `errno` intact, which to me sounds like a maintenance headache. > > I'd claim that most callers never care about `errno` at all. For the > callers that do, I feel it is way more fragile to rely on whether or not > a called function leaves `errno` intact or not. For one, it's fragile > because that may easily change due to a bug. Second, it is fragile > because the dependency on `errno` is not explicitly documented via code, > but rather an implicit dependency. > > So isn't it more reasonable to rather make the few callers that do > require `errno` to be left intact to save it? It makes the dependency > explicit, avoids splitting our functions into two sets and allows us to > just ignore this issue for the majority of functions that couldn't care > less about `errno`. 100% agreed. The C language specification says you can't rely on errno persisting across function calls, and that the caller must preserve it if it needs that behavior for some reason. The POSIX specification says you can't either except in very rare circumstances where it guarantees errno will not change. The Linux man page for errno says you can't rely on errno not changing, even for printf: https://man7.org/linux/man-pages/man3/errno.3.html A common mistake is to do if (somecall() == -1) { printf("somecall() failed\n"); if (errno == ...) { ... } } where errno no longer needs to have the value it had upon return from somecall() (i.e., it may have been changed by the printf(3)). If the value of errno should be preserved across a library call, it must be saved: if (somecall() == -1) { int errsv = errno; printf("somecall() failed\n"); if (errsv == ...) { ... } } Basically: errno is _extremely_ volatile. One should assume that _every_ function call is going to change it, even if they return successfully. The only thing that can't happen is that the functions defined in the C and POSIX standards set errno to 0, which is why I withdrew the patch (since it's a wrapper around a function defined in POSIX). But in general, I don't see any reason for any of the functions we write to be errno preserving, especially since any call to malloc, printf, trace functionality, etc. may modify errno. > > Patrick ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 21:32 ` Junio C Hamano 2024-08-02 21:54 ` Eric Sunshine @ 2024-08-02 23:51 ` Kyle Lippincott 2024-08-05 17:12 ` Kyle Lippincott 1 sibling, 1 reply; 29+ messages in thread From: Kyle Lippincott @ 2024-08-02 23:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Kyle Lippincott <spectral@google.com> > > > > If the loop executes more than once due to cwd being longer than 128 > > bytes, then `errno = ERANGE` might persist outside of this function. > > This technically shouldn't be a problem, as all locations where the > > value in `errno` is tested should either (a) call a function that's > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > > to calling the function that only conditionally sets errno, such as the > > `strtod` function. In the case of functions in category (b), it's easy > > to forget to do that. > > > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > This matches the behavior in functions like `run_transaction_hook` > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > I am still uneasy to see this unconditional clearing, which looks > more like spreading the bad practice from two places you identified > than following good behaviour modelled after these two places. > > But I'll let it pass. > > As long as our programmers understand that across strbuf_getcwd(), > errno will *not* be preserved, even if the function returns success, > it would be OK. As the usual convention around errno is that a > successful call would leave errno intact, not clear it to 0, it > would make it a bit harder to learn our API for newcomers, though. I'm sympathetic to that argument. If you'd prefer to not have this patch, I'm fine with it not landing, and instead at some future date I may try to work on those #leftoverbits from the previous patch (to make a safer wrapper around strtoX, and ban the use of the unwrapped versions), or someone else can if they beat me to it. Since this is wrapping a posix function, and posix has things to say about this (see below), I agree that it shouldn't set it to 0, and withdraw this patch. I'm including my references below mostly because with the information I just acquired, I think that any attempt to _preserve_ errno is also folly. No function we write, unless we explicitly state that it _will_ preserve errno, should feel obligated to do so. The number of cases where errno _could_ be modified according to the various specifications (C99 and posix) are just too numerous. --- Perhaps because I'm not all that experienced with C, but when I did C a couple decades ago, I operated in a mode where basically every function was actively hostile. If I wanted errno preserved across a function call, then it's up to me (the caller) to do so, regardless of what the current implementation of that function says will happen, because that can change at any point. Unless the function is documented as errno-preserving, I'm going to treat it as errno-hostile. In practice, this didn't really matter much, as I've never found `if (some_func()) { if (!some_other_func()) { /* use errno from `some_func` */ } }` logic to happen often, but maybe it does in "real" programs, I was just a hobbyist self-teaching at the time. The C standard has a very precise definition of how the library functions defined in the C specification will act. It guarantees: - the library functions defined in the specification will never set errno to 0. - the library functions defined in the specification may set the value to non-zero whether an error occurs or not, "provided the use of errno is not documented in the description of the function in this International Standard". What this means is that (a) if the function as defined in the C standard mentions errno, it can only set the values as specified there, and (b) if the function as defined in the C standard does _not_ mention errno, such as `fopen` or `strstr`, it can do _whatever it wants_ to errno, even on success, _except_ set it to 0. POSIX has similar language (https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html), with some key differences: - The value of errno should only be examined when it is indicated to be valid by a function's return value. - The setting of errno after a successful call to a function is unspecified unless the description of that function specifies that errno shall not be modified. This means that unlike the C specification, which says that if a function doesn't describe its use of errno it can do anything it wants to errno [except set it to 0], in POSIX, a function can do anything it wants to errno [except set it to 0] at any time. What this means in practice is that errno should never be assumed to be preserved across calls to posix functions (like getcwd). Also, strbuf_getcwd calls free, malloc, and realloc, none of which mention errno in the C specification, so they can do whatever they want to it [except set it to 0]. That I was able to find one single function that was causing problems is luck, and not guaranteed by any specification. Kind of makes me want to try writing an actively hostile C99 and POSIX environment, and see how many things break with it. :) C99 spec doesn't say anything about malloc setting errno? Ok! malloc now sets errno to ENOENT on tuesdays [in GMT because I'm not a monster], but only on success. On any other day, it'll set it to ERANGE, regardless of success or failure. > > Thanks. > > > Signed-off-by: Kyle Lippincott <spectral@google.com> > > --- > > strbuf.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/strbuf.c b/strbuf.c > > index 3d2189a7f64..b94ef040ab0 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > > strbuf_grow(sb, guessed_len); > > if (getcwd(sb->buf, sb->alloc)) { > > strbuf_setlen(sb, strlen(sb->buf)); > > + errno = 0; > > return 0; > > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd 2024-08-02 23:51 ` Kyle Lippincott @ 2024-08-05 17:12 ` Kyle Lippincott 0 siblings, 0 replies; 29+ messages in thread From: Kyle Lippincott @ 2024-08-05 17:12 UTC (permalink / raw) To: Junio C Hamano Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt, Eric Sunshine On Fri, Aug 2, 2024 at 4:51 PM Kyle Lippincott <spectral@google.com> wrote: > > On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > > > From: Kyle Lippincott <spectral@google.com> > > > > > > If the loop executes more than once due to cwd being longer than 128 > > > bytes, then `errno = ERANGE` might persist outside of this function. > > > This technically shouldn't be a problem, as all locations where the > > > value in `errno` is tested should either (a) call a function that's > > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > > > to calling the function that only conditionally sets errno, such as the > > > `strtod` function. In the case of functions in category (b), it's easy > > > to forget to do that. > > > > > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > > This matches the behavior in functions like `run_transaction_hook` > > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > > > I am still uneasy to see this unconditional clearing, which looks > > more like spreading the bad practice from two places you identified > > than following good behaviour modelled after these two places. > > > > But I'll let it pass. > > > > As long as our programmers understand that across strbuf_getcwd(), > > errno will *not* be preserved, even if the function returns success, > > it would be OK. As the usual convention around errno is that a > > successful call would leave errno intact, not clear it to 0, it > > would make it a bit harder to learn our API for newcomers, though. > > I'm sympathetic to that argument. If you'd prefer to not have this > patch, I'm fine with it not landing, and instead at some future date I > may try to work on those #leftoverbits from the previous patch (to > make a safer wrapper around strtoX, and ban the use of the unwrapped > versions), or someone else can if they beat me to it. > > Since this is wrapping a posix function, and posix has things to say > about this (see below), I agree that it shouldn't set it to 0, and > withdraw this patch. Dropped this patch in the reroll that (I think) I just sent. > > I'm including my references below mostly because with the information > I just acquired, I think that any attempt to _preserve_ errno is also > folly. No function we write, unless we explicitly state that it _will_ > preserve errno, should feel obligated to do so. The number of cases > where errno _could_ be modified according to the various > specifications (C99 and posix) are just too numerous. > > --- > > Perhaps because I'm not all that experienced with C, but when I did C > a couple decades ago, I operated in a mode where basically every > function was actively hostile. If I wanted errno preserved across a > function call, then it's up to me (the caller) to do so, regardless of > what the current implementation of that function says will happen, > because that can change at any point. Unless the function is > documented as errno-preserving, I'm going to treat it as > errno-hostile. In practice, this didn't really matter much, as I've > never found `if (some_func()) { if (!some_other_func()) { /* use errno > from `some_func` */ } }` logic to happen often, but maybe it does in > "real" programs, I was just a hobbyist self-teaching at the time. > > The C standard has a very precise definition of how the library > functions defined in the C specification will act. It guarantees: > - the library functions defined in the specification will never set errno to 0. > - the library functions defined in the specification may set the value > to non-zero whether an error occurs or not, "provided the use of errno > is not documented in the description of the function in this > International Standard". What this means is that (a) if the function > as defined in the C standard mentions errno, it can only set the > values as specified there, and (b) if the function as defined in the C > standard does _not_ mention errno, such as `fopen` or `strstr`, it can > do _whatever it wants_ to errno, even on success, _except_ set it to > 0. > > POSIX has similar language > (https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html), > with some key differences: > - The value of errno should only be examined when it is indicated to > be valid by a function's return value. > - The setting of errno after a successful call to a function is > unspecified unless the description of that function specifies that > errno shall not be modified. > > This means that unlike the C specification, which says that if a > function doesn't describe its use of errno it can do anything it wants > to errno [except set it to 0], in POSIX, a function can do anything it > wants to errno [except set it to 0] at any time. > > What this means in practice is that errno should never be assumed to > be preserved across calls to posix functions (like getcwd). Also, > strbuf_getcwd calls free, malloc, and realloc, none of which mention > errno in the C specification, so they can do whatever they want to it > [except set it to 0]. That I was able to find one single function that > was causing problems is luck, and not guaranteed by any specification. > > Kind of makes me want to try writing an actively hostile C99 and POSIX > environment, and see how many things break with it. :) C99 spec > doesn't say anything about malloc setting errno? Ok! malloc now sets > errno to ENOENT on tuesdays [in GMT because I'm not a monster], but > only on success. On any other day, it'll set it to ERANGE, regardless > of success or failure. > > > > > Thanks. > > > > > Signed-off-by: Kyle Lippincott <spectral@google.com> > > > --- > > > strbuf.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/strbuf.c b/strbuf.c > > > index 3d2189a7f64..b94ef040ab0 100644 > > > --- a/strbuf.c > > > +++ b/strbuf.c > > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > > > strbuf_grow(sb, guessed_len); > > > if (getcwd(sb->buf, sb->alloc)) { > > > strbuf_setlen(sb, strlen(sb->buf)); > > > + errno = 0; > > > return 0; > > > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget 2024-08-02 21:41 ` Junio C Hamano 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 3 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> The `grep` statement in this test looks for `d0.*<string>`, attempting to filter to only show lines that had tabular output where the 2nd column had `d0` and the final column had a substring of [`git -c `]`fetch.negotiationAlgorithm`. These lines also have `child_start` in the 4th column, but this isn't part of the condition. A subsequent line will have `d1` in the 2nd column, `start` in the 4th column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final column. If `/path/to/git/git` contains the substring `d0`, then this line is included by `grep` as well as the desired line, leading to an effective doubling of the number of lines, and test failures. Tighten the grep expression to require `d0` to be surrounded by spaces, and to have the `child_start` label. Signed-off-by: Kyle Lippincott <spectral@google.com> --- t/t6421-merge-partial-clone.sh | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index 711b709e755..b99f29ef9ba 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -230,8 +230,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 2 fetches && git rev-list --objects --all --missing=print | @@ -318,8 +319,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 1 fetches && git rev-list --objects --all --missing=print | @@ -422,8 +424,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 4 fetches && git rev-list --objects --all --missing=print | -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget @ 2024-08-02 21:41 ` Junio C Hamano 2024-08-03 0:03 ` Kyle Lippincott 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2024-08-02 21:41 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > The `grep` statement in this test looks for `d0.*<string>`, attempting > to filter to only show lines that had tabular output where the 2nd > column had `d0` and the final column had a substring of > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have > `child_start` in the 4th column, but this isn't part of the condition. > > A subsequent line will have `d1` in the 2nd column, `start` in the 4th > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final > column. If `/path/to/git/git` contains the substring `d0`, then this > line is included by `grep` as well as the desired line, leading to an > effective doubling of the number of lines, and test failures. > > Tighten the grep expression to require `d0` to be surrounded by spaces, > and to have the `child_start` label. OK. I think I actually misinterpreted what you meant with these changes. It is not what the patterns are picking. It is some _other_ trace entry we do not necessarily care about, like label:do_write_index that has the path to the .git/index.lock file, that can accidentally contain d0, that can be picked up with a pattern that is too loose. So it really didn't have to clarify what it is looking for, as it would not help seeing what false positives the patterns are designed to avoid matching. Sorry about that. Will queue. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 2024-08-02 21:41 ` Junio C Hamano @ 2024-08-03 0:03 ` Kyle Lippincott 2024-08-03 0:27 ` Junio C Hamano 0 siblings, 1 reply; 29+ messages in thread From: Kyle Lippincott @ 2024-08-03 0:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt On Fri, Aug 2, 2024 at 2:41 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Kyle Lippincott <spectral@google.com> > > > > The `grep` statement in this test looks for `d0.*<string>`, attempting > > to filter to only show lines that had tabular output where the 2nd > > column had `d0` and the final column had a substring of > > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have > > `child_start` in the 4th column, but this isn't part of the condition. > > > > A subsequent line will have `d1` in the 2nd column, `start` in the 4th > > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final > > column. If `/path/to/git/git` contains the substring `d0`, then this > > line is included by `grep` as well as the desired line, leading to an > > effective doubling of the number of lines, and test failures. > > > > Tighten the grep expression to require `d0` to be surrounded by spaces, > > and to have the `child_start` label. > > OK. > > I think I actually misinterpreted what you meant with these changes. > It is not what the patterns are picking. It is some _other_ trace > entry we do not necessarily care about, like label:do_write_index > that has the path to the .git/index.lock file, that can accidentally > contain d0, that can be picked up with a pattern that is too loose. > So it really didn't have to clarify what it is looking for, as it > would not help seeing what false positives the patterns are designed > to avoid matching. Sorry about that. I would have included examples, but they're quite long (>>>80 chars), so seemed very out of place in both commit description and in the codebase. With line wrapping, it wasn't very readable either. At the risk of this also getting line-wrapped into unreadability: test_line_count: line count for fetches != 1 23:59:48.794453 run-command.c:733 | d0 | main | child_start | | 0.027328 | | | ..........[ch1] class:? argv:[git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin] 23:59:48.798901 common-main.c:58 | d1 | main | start | | 0.000852 | | | /usr/local/google/home/spectral/src/oss/d0/git/git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin where each line in the `fetches` file starts with `23:59:48` here. It's 9 columns, separated by `|` characters, and the line we don't want is the second one; the regex `d0.*fetch.negotiationAlgorithm` includes it because of the `d0` in the path. I considered using `awk -F"|" "\$2~/d0/ && \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`, but it was longer, possibly less clear, and less specific (since it didn't include the $4~/child_start/ condition) > > Will queue. > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 2024-08-03 0:03 ` Kyle Lippincott @ 2024-08-03 0:27 ` Junio C Hamano 0 siblings, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-03 0:27 UTC (permalink / raw) To: Kyle Lippincott; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt Kyle Lippincott <spectral@google.com> writes: >> So it really didn't have to clarify what it is looking for, as it >> would not help seeing what false positives the patterns are designed >> to avoid matching. Sorry about that. > > I would have included examples, but they're quite long (>>>80 chars), > so seemed very out of place in both commit description and in the > codebase. Absolutely. It turned out not to be so useful to show the shape of potential matches, like this one: ... run-command.c:733 | d0 | main | child_start |... To explain why spaces around " d0 " matters, the readers need to understand that other trace entries that are irrelevant for our purpose, like this one ... | label:do_write_index /path/to/t/trash directory.../.git/index.lock we want reject, and for that we want the pattern to be specific enough by looking for " do " that is followed by " child_start ". Otherwise the leading paths that can contain anything won't easily match, and the original of looking for just "d0" was way too error prone. But it is hard to leave a concise hint for that there. So, again, sorry about the bad suggestion. > I considered using `awk -F"|" "\$2~/d0/ && > \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`, > but it was longer, possibly less clear, and less specific (since it > didn't include the $4~/child_start/ condition) Yeah, using the syntactic clue -F"|" would also be a way to convey the intention (i.e. "we are dealing with tabular output and we expect nth column to be X"), but what you have is probably good enough---it certainly is simpler to read and understand. I briefly considered that looking for "| d0 |" (i.e. explicitly mentioning the column separator in the pattern) would make it even more obvious what we are looking for, but having to worry about quoting "|" in regexp would negate the benefit of obviousness out of the approach to use "grep". Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 0/2] Small fixes for issues detected during internal CI runs 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget ` (2 preceding siblings ...) 2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget ` (2 more replies) 3 siblings, 3 replies; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott I'm attempting to get the git test suite running automatically during our weekly import. I have this mostly working, including with Address Sanitizer and Memory Sanitizer, but ran into a few issues: * several tests were failing due to strbuf_getcwd not clearing errno on success after it internally looped due to the path being >128 bytes. This is resolved in depth; though either one of the commits alone would resolve our issues: * modify locations that call strtoX and check for ERANGE to set errno = 0; prior to calling the conversion function. This is the typical way that these functions are invoked, and may indicate that we want compatibility helpers in git-compat-util.h to ensure that this happens correctly (and add these functions to the banned list). * have strbuf_getcwd set errno = 0; prior to a successful exit. This isn't very common for most functions in the codebase, but some other examples of this were found. * t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build system using paths that contain a 64-hex-char hash, which had a 12.5% chance of containing the substring d0. Kyle Lippincott (2): set errno=0 before strtoX calls t6421: fix test to work when repo dir contains d0 builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + t/t6421-merge-partial-clone.sh | 15 +++++++++------ 5 files changed, 14 insertions(+), 6 deletions(-) base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v3 Pull-Request: https://github.com/git/git/pull/1756 Range-diff vs v2: 1: 4dbd0bec40a = 1: 4dbd0bec40a set errno=0 before strtoX calls 2: 0ed09e9abb8 < -: ----------- strbuf: set errno to 0 after strbuf_getcwd 3: 818dc9e6b3e = 2: 96984c4a15e t6421: fix test to work when repo dir contains d0 -- gitgitgadget ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 1/2] set errno=0 before strtoX calls 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano 2 siblings, 0 replies; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> To detect conversion failure after calls to functions like `strtod`, one can check `errno == ERANGE`. These functions are not guaranteed to set `errno` to `0` on successful conversion, however. Manual manipulation of `errno` can likely be avoided by checking that the output pointer differs from the input pointer, but that's not how other locations, such as parse.c:139, handle this issue; they set errno to 0 prior to executing the function. For every place I could find a strtoX function with an ERANGE check following it, set `errno = 0;` prior to executing the conversion function. Signed-off-by: Kyle Lippincott <spectral@google.com> --- builtin/get-tar-commit-id.c | 1 + ref-filter.c | 1 + t/helper/test-json-writer.c | 2 ++ t/helper/test-trace2.c | 1 + 4 files changed, 5 insertions(+) diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c index 66a7389f9f4..7195a072edc 100644 --- a/builtin/get-tar-commit-id.c +++ b/builtin/get-tar-commit-id.c @@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER) return 1; + errno = 0; len = strtol(content, &end, 10); if (errno == ERANGE || end == content || len < 0) return 1; diff --git a/ref-filter.c b/ref-filter.c index 8c5e673fc0a..54880a2497a 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam timestamp = parse_timestamp(eoemail + 2, &zone, 10); if (timestamp == TIME_MAX) goto bad; + errno = 0; tz = strtol(zone, NULL, 10); if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE) goto bad; diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c index ed52eb76bfc..a288069b04c 100644 --- a/t/helper/test-json-writer.c +++ b/t/helper/test-json-writer.c @@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in) get_s(line, &s); + errno = 0; *s_in = strtol(s, &endptr, 10); if (*endptr || errno == ERANGE) die("line[%d]: invalid integer value", line->nr); @@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in) get_s(line, &s); + errno = 0; *s_in = strtod(s, &endptr); if (*endptr || errno == ERANGE) die("line[%d]: invalid float value", line->nr); diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c index cd955ec63e9..c588c273ce7 100644 --- a/t/helper/test-trace2.c +++ b/t/helper/test-trace2.c @@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data) if (!data || !*data) return MyError; + errno = 0; *p_value = strtol(data, &endptr, 10); if (*endptr || errno == ERANGE) return MyError; -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget 2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano 2 siblings, 0 replies; 29+ messages in thread From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott, Kyle Lippincott From: Kyle Lippincott <spectral@google.com> The `grep` statement in this test looks for `d0.*<string>`, attempting to filter to only show lines that had tabular output where the 2nd column had `d0` and the final column had a substring of [`git -c `]`fetch.negotiationAlgorithm`. These lines also have `child_start` in the 4th column, but this isn't part of the condition. A subsequent line will have `d1` in the 2nd column, `start` in the 4th column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final column. If `/path/to/git/git` contains the substring `d0`, then this line is included by `grep` as well as the desired line, leading to an effective doubling of the number of lines, and test failures. Tighten the grep expression to require `d0` to be surrounded by spaces, and to have the `child_start` label. Signed-off-by: Kyle Lippincott <spectral@google.com> --- t/t6421-merge-partial-clone.sh | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index 711b709e755..b99f29ef9ba 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -230,8 +230,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 2 fetches && git rev-list --objects --all --missing=print | @@ -318,8 +319,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 1 fetches && git rev-list --objects --all --missing=print | @@ -422,8 +424,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 4 fetches && git rev-list --objects --all --missing=print | -- gitgitgadget ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v3 0/2] Small fixes for issues detected during internal CI runs 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget @ 2024-08-05 18:37 ` Junio C Hamano 2 siblings, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2024-08-05 18:37 UTC (permalink / raw) To: Kyle Lippincott via GitGitGadget Cc: git, Patrick Steinhardt, Eric Sunshine, Kyle Lippincott "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > I'm attempting to get the git test suite running automatically during our > weekly import. I have this mostly working, including with Address Sanitizer > and Memory Sanitizer, but ran into a few issues: > > * several tests were failing due to strbuf_getcwd not clearing errno on > success after it internally looped due to the path being >128 bytes. This > is resolved in depth; though either one of the commits alone would > resolve our issues: > * modify locations that call strtoX and check for ERANGE to set errno = > 0; prior to calling the conversion function. This is the typical way > that these functions are invoked, and may indicate that we want > compatibility helpers in git-compat-util.h to ensure that this happens > correctly (and add these functions to the banned list). > * have strbuf_getcwd set errno = 0; prior to a successful exit. This > isn't very common for most functions in the codebase, but some other > examples of this were found. > * t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build > system using paths that contain a 64-hex-char hash, which had a 12.5% > chance of containing the substring d0. > > Kyle Lippincott (2): > set errno=0 before strtoX calls > t6421: fix test to work when repo dir contains d0 Both patches make perfect sense to me. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-08-06 7:04 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-02 5:12 ` Patrick Steinhardt 2024-08-02 6:15 ` Kyle Lippincott 2024-08-02 15:01 ` Junio C Hamano 2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget 2024-08-02 15:10 ` Junio C Hamano 2024-08-02 17:56 ` Kyle Lippincott 2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-02 15:13 ` Junio C Hamano 2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-02 21:18 ` Junio C Hamano 2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget 2024-08-02 21:32 ` Junio C Hamano 2024-08-02 21:54 ` Eric Sunshine 2024-08-05 15:51 ` Junio C Hamano 2024-08-06 6:26 ` Patrick Steinhardt 2024-08-06 7:04 ` Kyle Lippincott 2024-08-02 23:51 ` Kyle Lippincott 2024-08-05 17:12 ` Kyle Lippincott 2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-02 21:41 ` Junio C Hamano 2024-08-03 0:03 ` Kyle Lippincott 2024-08-03 0:27 ` Junio C Hamano 2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget 2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget 2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).