* [PATCH 0/3] Small fixes for issues detected during internal CI runs
@ 2024-08-02 4:10 Kyle Lippincott via GitGitGadget
2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
` (3 more replies)
0 siblings, 4 replies; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw)
To: git; +Cc: Kyle Lippincott
I'm attempting to get the git test suite running automatically during our
weekly import. I have this mostly working, including with Address Sanitizer
and Memory Sanitizer, but ran into a few issues:
* several tests were failing due to strbuf_getcwd not clearing errno on
success after it internally looped due to the path being >128 bytes. This
is resolved in depth; though either one of the commits alone would
resolve our issues:
* modify locations that call strtoX and check for ERANGE to set errno =
0; prior to calling the conversion function. This is the typical way
that these functions are invoked, and may indicate that we want
compatibility helpers in git-compat-util.h to ensure that this happens
correctly (and add these functions to the banned list).
* have strbuf_getcwd set errno = 0; prior to a successful exit. This
isn't very common for most functions in the codebase, but some other
examples of this were found.
* t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build
system using paths that contain a 64-hex-char hash, which had a 12.5%
chance of containing the substring d0.
Kyle Lippincott (3):
set errno=0 before strtoX calls
strbuf: set errno to 0 after strbuf_getcwd
t6421: fix test to work when repo dir contains d0
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
strbuf.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
t/t6421-merge-partial-clone.sh | 6 +++---
6 files changed, 9 insertions(+), 3 deletions(-)
base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v1
Pull-Request: https://github.com/git/git/pull/1756
--
gitgitgadget
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] set errno=0 before strtoX calls
2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
@ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget
2024-08-02 5:12 ` Patrick Steinhardt
2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
` (2 subsequent siblings)
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw)
To: git; +Cc: Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.
For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
4 files changed, 5 insertions(+)
diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 66a7389f9f4..7195a072edc 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix
if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER)
return 1;
+ errno = 0;
len = strtol(content, &end, 10);
if (errno == ERANGE || end == content || len < 0)
return 1;
diff --git a/ref-filter.c b/ref-filter.c
index 8c5e673fc0a..54880a2497a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam
timestamp = parse_timestamp(eoemail + 2, &zone, 10);
if (timestamp == TIME_MAX)
goto bad;
+ errno = 0;
tz = strtol(zone, NULL, 10);
if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE)
goto bad;
diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c
index ed52eb76bfc..a288069b04c 100644
--- a/t/helper/test-json-writer.c
+++ b/t/helper/test-json-writer.c
@@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtol(s, &endptr, 10);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid integer value", line->nr);
@@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtod(s, &endptr);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid float value", line->nr);
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index cd955ec63e9..c588c273ce7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data)
if (!data || !*data)
return MyError;
+ errno = 0;
*p_value = strtol(data, &endptr, 10);
if (*endptr || errno == ERANGE)
return MyError;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
@ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget
2024-08-02 15:10 ` Junio C Hamano
2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw)
To: git; +Cc: Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
If the loop executes more than once due to cwd being longer than 128
bytes, then `errno = ERANGE` might persist outside of this function.
This technically shouldn't be a problem, as all locations where the
value in `errno` is tested should either (a) call a function that's
guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
to calling the function that only conditionally sets errno, such as the
`strtod` function. In the case of functions in category (b), it's easy
to forget to do that.
Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
This matches the behavior in functions like `run_transaction_hook`
(refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
strbuf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/strbuf.c b/strbuf.c
index 3d2189a7f64..b94ef040ab0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
strbuf_grow(sb, guessed_len);
if (getcwd(sb->buf, sb->alloc)) {
strbuf_setlen(sb, strlen(sb->buf));
+ errno = 0;
return 0;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] t6421: fix test to work when repo dir contains d0
2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
@ 2024-08-02 4:10 ` Kyle Lippincott via GitGitGadget
2024-08-02 15:13 ` Junio C Hamano
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 4:10 UTC (permalink / raw)
To: git; +Cc: Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.
A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.
Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
t/t6421-merge-partial-clone.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
index 711b709e755..0f312ac93dc 100755
--- a/t/t6421-merge-partial-clone.sh
+++ b/t/t6421-merge-partial-clone.sh
@@ -231,7 +231,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev
test_cmp expect actual &&
# Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 2 fetches &&
git rev-list --objects --all --missing=print |
@@ -319,7 +319,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory
test_cmp expect actual &&
# Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 1 fetches &&
git rev-list --objects --all --missing=print |
@@ -423,7 +423,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
test_cmp expect actual &&
# Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 4 fetches &&
git rev-list --objects --all --missing=print |
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls
2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
@ 2024-08-02 5:12 ` Patrick Steinhardt
2024-08-02 6:15 ` Kyle Lippincott
2024-08-02 15:01 ` Junio C Hamano
0 siblings, 2 replies; 29+ messages in thread
From: Patrick Steinhardt @ 2024-08-02 5:12 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott
[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]
On Fri, Aug 02, 2024 at 04:10:51AM +0000, Kyle Lippincott via GitGitGadget wrote:
> From: Kyle Lippincott <spectral@google.com>
>
> To detect conversion failure after calls to functions like `strtod`, one
> can check `errno == ERANGE`. These functions are not guaranteed to set
> `errno` to `0` on successful conversion, however. Manual manipulation of
> `errno` can likely be avoided by checking that the output pointer
> differs from the input pointer, but that's not how other locations, such
> as parse.c:139, handle this issue; they set errno to 0 prior to
> executing the function.
>
> For every place I could find a strtoX function with an ERANGE check
> following it, set `errno = 0;` prior to executing the conversion
> function.
Makes sense. I've also gone through callsites and couldn't spot any
additional ones that are broken.
Generally speaking, the interfaces provided by the `strtod()` family of
functions is just plain awful, and ideally we wouldn't be using them in
the Git codebase at all without a wrapper. We already do have wrappers
for a subset of those functions, e.g. `strtol_i()`, which use an out
pointer to store the result and indicate success via the return value
instead of via `errno`.
It would be great if we could extend those wrappers to cover all of the
integer types, convert our code base to use them, and then extend our
"banned.h" banner. I'm of course not asking you to do that in this patch
series.
Out of curiosity, why do you hit those errors in your test setup? Do you
use a special libc that behaves differently than the most common ones?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls
2024-08-02 5:12 ` Patrick Steinhardt
@ 2024-08-02 6:15 ` Kyle Lippincott
2024-08-02 15:01 ` Junio C Hamano
1 sibling, 0 replies; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-02 6:15 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Kyle Lippincott via GitGitGadget, git
On Thu, Aug 1, 2024 at 10:12 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, Aug 02, 2024 at 04:10:51AM +0000, Kyle Lippincott via GitGitGadget wrote:
> > From: Kyle Lippincott <spectral@google.com>
> >
> > To detect conversion failure after calls to functions like `strtod`, one
> > can check `errno == ERANGE`. These functions are not guaranteed to set
> > `errno` to `0` on successful conversion, however. Manual manipulation of
> > `errno` can likely be avoided by checking that the output pointer
> > differs from the input pointer, but that's not how other locations, such
> > as parse.c:139, handle this issue; they set errno to 0 prior to
> > executing the function.
> >
> > For every place I could find a strtoX function with an ERANGE check
> > following it, set `errno = 0;` prior to executing the conversion
> > function.
>
> Makes sense. I've also gone through callsites and couldn't spot any
> additional ones that are broken.
>
> Generally speaking, the interfaces provided by the `strtod()` family of
> functions is just plain awful, and ideally we wouldn't be using them in
> the Git codebase at all without a wrapper. We already do have wrappers
> for a subset of those functions, e.g. `strtol_i()`, which use an out
> pointer to store the result and indicate success via the return value
> instead of via `errno`.
>
> It would be great if we could extend those wrappers to cover all of the
> integer types, convert our code base to use them, and then extend our
> "banned.h" banner. I'm of course not asking you to do that in this patch
> series.
>
> Out of curiosity, why do you hit those errors in your test setup? Do you
> use a special libc that behaves differently than the most common ones?
The second patch in this series fixes the original reason I noticed
the issues in three of the files: our remote test execution service
uses paths that are >128 bytes long, so the getcwd call in
strbuf_getcwd was returning ERANGE once, and then it remained set
since getcwd didn't clear it on success. ref-filter.c was found via
searching, I think that was during the search for `ERANGE`.
>
> Patrick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] set errno=0 before strtoX calls
2024-08-02 5:12 ` Patrick Steinhardt
2024-08-02 6:15 ` Kyle Lippincott
@ 2024-08-02 15:01 ` Junio C Hamano
1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 15:01 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Kyle Lippincott via GitGitGadget, git, Kyle Lippincott
Patrick Steinhardt <ps@pks.im> writes:
> It would be great if we could extend those wrappers to cover all of the
> integer types, convert our code base to use them, and then extend our
> "banned.h" banner. I'm of course not asking you to do that in this patch
> series.
A good #leftoverbits material.
> Out of curiosity, why do you hit those errors in your test setup? Do you
> use a special libc that behaves differently than the most common ones?
;-)
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
@ 2024-08-02 15:10 ` Junio C Hamano
2024-08-02 17:56 ` Kyle Lippincott
0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 15:10 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> This matches the behavior in functions like `run_transaction_hook`
> (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
This deep in the call chain, there is nothing that assures us that
the caller of this function does not care about the error before
entering this function, so I feel a bit uneasy about the approach,
and my initial reaction was "wouldn't it be safer to do the usual
int saved_errno = errno;
for (guessed_len = 128;; guessed_len *= 2) {
... do things ...
if (...) {
... happy ...
errno = saved_errno;
return 0;
}
}
pattern.
Who calls this function, and inspects errno when this function
returns 0? I do not mind adding the "save and restore" fix to this
function, but if there is a caller that looks at errno from a call
that returns success, that caller may also have to be looked at and
fixed if necessary.
Thanks.
> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
> strbuf.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3d2189a7f64..b94ef040ab0 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> strbuf_grow(sb, guessed_len);
> if (getcwd(sb->buf, sb->alloc)) {
> strbuf_setlen(sb, strlen(sb->buf));
> + errno = 0;
> return 0;
> }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] t6421: fix test to work when repo dir contains d0
2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
@ 2024-08-02 15:13 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 15:13 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kyle Lippincott <spectral@google.com>
>
> The `grep` statement in this test looks for `d0.*<string>`, attempting
> to filter to only show lines that had tabular output where the 2nd
> column had `d0` and the final column had a substring of
> [`git -c `]`fetch.negotiationAlgorithm`. These lines also have
> `child_start` in the 4th column, but this isn't part of the condition.
>
> A subsequent line will have `d1` in the 2nd column, `start` in the 4th
> column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
> column. If `/path/to/git/git` contains the substring `d0`, then this
> line is included by `grep` as well as the desired line, leading to an
> effective doubling of the number of lines, and test failures.
>
> Tighten the grep expression to require `d0` to be surrounded by spaces,
> and to have the `child_start` label.
Makes sense.
Updating the comment with expected shape of the output might make it
even less likely that we'd break these fixes again by mistake.
Thanks.
> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
> t/t6421-merge-partial-clone.sh | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
> index 711b709e755..0f312ac93dc 100755
> --- a/t/t6421-merge-partial-clone.sh
> +++ b/t/t6421-merge-partial-clone.sh
> @@ -231,7 +231,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev
> test_cmp expect actual &&
>
> # Check the number of fetch commands exec-ed
> - grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
> + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
> test_line_count = 2 fetches &&
>
> git rev-list --objects --all --missing=print |
> @@ -319,7 +319,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory
> test_cmp expect actual &&
>
> # Check the number of fetch commands exec-ed
> - grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
> + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
> test_line_count = 1 fetches &&
>
> git rev-list --objects --all --missing=print |
> @@ -423,7 +423,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
> test_cmp expect actual &&
>
> # Check the number of fetch commands exec-ed
> - grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
> + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
> test_line_count = 4 fetches &&
>
> git rev-list --objects --all --missing=print |
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 15:10 ` Junio C Hamano
@ 2024-08-02 17:56 ` Kyle Lippincott
0 siblings, 0 replies; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-02 17:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git
On Fri, Aug 2, 2024 at 8:10 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> This deep in the call chain, there is nothing that assures us that
> the caller of this function does not care about the error before
> entering this function, so I feel a bit uneasy about the approach,
> and my initial reaction was "wouldn't it be safer to do the usual
>
> int saved_errno = errno;
>
> for (guessed_len = 128;; guessed_len *= 2) {
> ... do things ...
> if (...) {
> ... happy ...
> errno = saved_errno;
> return 0;
> }
> }
>
> pattern.
>
> Who calls this function, and inspects errno when this function
> returns 0?
That's a difficult question to answer if you want to be wholistic for
the whole program :) For immediate callers:
- unix_sockaddr_init: doesn't inspect or adjust errno itself if
strbuf_getcwd returns 0. Continues on to call other functions that may
set errno.
- strbuf_realpath_1: same
- chdir_notify: same
- discover_git_directory_reason: same
- setup_git_directory_gently: same
- setup_enlistment_directory (in scalar.c): dies immediately if
strbuf_getcwd returns < 0, otherwise same
- xgetcwd: also doesn't inspect/adjust errno if strbuf_getcwd returns
0. Doesn't call any other functions afterward (besides strbuf
functions).
- main: stores the value if strbuf_getcwd returns 0, but doesn't inspect errno.
> I do not mind adding the "save and restore" fix to this
> function, but if there is a caller that looks at errno from a call
> that returns success, that caller may also have to be looked at and
> fixed if necessary.
There aren't any that I could find, this patch is mostly a
defense-in-depth solution to the strtoX functions that were fixed in
patch 1. This function _may_ set errno even on success. That errno
value ends up retained indefinitely as long as things continue
succeeding, and then we call a function like `strtod` which has a
suboptimal interface. If this patch doesn't land, the codebase is
still correct; the main reason to want to land this is that without
this patch, any user that has paths longer than 128 bytes becomes de
facto responsible for finding and reporting/fixing issues that arise
from this errno value being persisted, and I was hoping I wouldn't be
signing the people maintaining CI at $JOB up for that :) It's not an
obvious failure, either. For example, t0211's failure, prior to
setting errno to 0 just before calling strtoX is just: `fatal: expect
<exit_code>`. That's not easy to trace back to "strbuf_getcwd sets
ERANGE in errno in our environment, so this is a misuse of a strtoX or
parse_timestamp function".
>
> Thanks.
>
> > Signed-off-by: Kyle Lippincott <spectral@google.com>
> > ---
> > strbuf.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/strbuf.c b/strbuf.c
> > index 3d2189a7f64..b94ef040ab0 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> > strbuf_grow(sb, guessed_len);
> > if (getcwd(sb->buf, sb->alloc)) {
> > strbuf_setlen(sb, strlen(sb->buf));
> > + errno = 0;
> > return 0;
> > }
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 0/3] Small fixes for issues detected during internal CI runs
2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
` (2 preceding siblings ...)
2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
@ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
` (3 more replies)
3 siblings, 4 replies; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Kyle Lippincott
I'm attempting to get the git test suite running automatically during our
weekly import. I have this mostly working, including with Address Sanitizer
and Memory Sanitizer, but ran into a few issues:
* several tests were failing due to strbuf_getcwd not clearing errno on
success after it internally looped due to the path being >128 bytes. This
is resolved in depth; though either one of the commits alone would
resolve our issues:
* modify locations that call strtoX and check for ERANGE to set errno =
0; prior to calling the conversion function. This is the typical way
that these functions are invoked, and may indicate that we want
compatibility helpers in git-compat-util.h to ensure that this happens
correctly (and add these functions to the banned list).
* have strbuf_getcwd set errno = 0; prior to a successful exit. This
isn't very common for most functions in the codebase, but some other
examples of this were found.
* t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build
system using paths that contain a 64-hex-char hash, which had a 12.5%
chance of containing the substring d0.
Kyle Lippincott (3):
set errno=0 before strtoX calls
strbuf: set errno to 0 after strbuf_getcwd
t6421: fix test to work when repo dir contains d0
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
strbuf.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
t/t6421-merge-partial-clone.sh | 15 +++++++++------
6 files changed, 15 insertions(+), 6 deletions(-)
base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v2
Pull-Request: https://github.com/git/git/pull/1756
Range-diff vs v1:
1: 4dbd0bec40a = 1: 4dbd0bec40a set errno=0 before strtoX calls
2: 0ed09e9abb8 = 2: 0ed09e9abb8 strbuf: set errno to 0 after strbuf_getcwd
3: 6c08b8ceb2b ! 3: 818dc9e6b3e t6421: fix test to work when repo dir contains d0
@@ Commit message
## t/t6421-merge-partial-clone.sh ##
@@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded for single relev
+ grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
+- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
++ # Check the number of fetch commands exec-ed by filtering trace to
++ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 2 fetches &&
git rev-list --objects --all --missing=print |
@@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded when a directory
+ grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
+- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
++ # Check the number of fetch commands exec-ed by filtering trace to
++ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 1 fetches &&
git rev-list --objects --all --missing=print |
@@ t/t6421-merge-partial-clone.sh: test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
+ grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
+- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
++ # Check the number of fetch commands exec-ed by filtering trace to
++ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 4 fetches &&
--
gitgitgadget
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 1/3] set errno=0 before strtoX calls
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
@ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget
2024-08-02 21:18 ` Junio C Hamano
2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
` (2 subsequent siblings)
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.
For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
4 files changed, 5 insertions(+)
diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 66a7389f9f4..7195a072edc 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix
if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER)
return 1;
+ errno = 0;
len = strtol(content, &end, 10);
if (errno == ERANGE || end == content || len < 0)
return 1;
diff --git a/ref-filter.c b/ref-filter.c
index 8c5e673fc0a..54880a2497a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam
timestamp = parse_timestamp(eoemail + 2, &zone, 10);
if (timestamp == TIME_MAX)
goto bad;
+ errno = 0;
tz = strtol(zone, NULL, 10);
if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE)
goto bad;
diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c
index ed52eb76bfc..a288069b04c 100644
--- a/t/helper/test-json-writer.c
+++ b/t/helper/test-json-writer.c
@@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtol(s, &endptr, 10);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid integer value", line->nr);
@@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtod(s, &endptr);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid float value", line->nr);
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index cd955ec63e9..c588c273ce7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data)
if (!data || !*data)
return MyError;
+ errno = 0;
*p_value = strtol(data, &endptr, 10);
if (*endptr || errno == ERANGE)
return MyError;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
@ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget
2024-08-02 21:32 ` Junio C Hamano
2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
If the loop executes more than once due to cwd being longer than 128
bytes, then `errno = ERANGE` might persist outside of this function.
This technically shouldn't be a problem, as all locations where the
value in `errno` is tested should either (a) call a function that's
guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
to calling the function that only conditionally sets errno, such as the
`strtod` function. In the case of functions in category (b), it's easy
to forget to do that.
Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
This matches the behavior in functions like `run_transaction_hook`
(refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
strbuf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/strbuf.c b/strbuf.c
index 3d2189a7f64..b94ef040ab0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
strbuf_grow(sb, guessed_len);
if (getcwd(sb->buf, sb->alloc)) {
strbuf_setlen(sb, strlen(sb->buf));
+ errno = 0;
return 0;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
@ 2024-08-02 20:58 ` Kyle Lippincott via GitGitGadget
2024-08-02 21:41 ` Junio C Hamano
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-02 20:58 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Kyle Lippincott, Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.
A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.
Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
t/t6421-merge-partial-clone.sh | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
index 711b709e755..b99f29ef9ba 100755
--- a/t/t6421-merge-partial-clone.sh
+++ b/t/t6421-merge-partial-clone.sh
@@ -230,8 +230,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 2 fetches &&
git rev-list --objects --all --missing=print |
@@ -318,8 +319,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 1 fetches &&
git rev-list --objects --all --missing=print |
@@ -422,8 +424,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 4 fetches &&
git rev-list --objects --all --missing=print |
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 1/3] set errno=0 before strtoX calls
2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
@ 2024-08-02 21:18 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 21:18 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kyle Lippincott <spectral@google.com>
>
> To detect conversion failure after calls to functions like `strtod`, one
> can check `errno == ERANGE`. These functions are not guaranteed to set
> `errno` to `0` on successful conversion, however. Manual manipulation of
> `errno` can likely be avoided by checking that the output pointer
> differs from the input pointer, but that's not how other locations, such
> as parse.c:139, handle this issue; they set errno to 0 prior to
> executing the function.
>
> For every place I could find a strtoX function with an ERANGE check
> following it, set `errno = 0;` prior to executing the conversion
> function.
>
> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
> builtin/get-tar-commit-id.c | 1 +
> ref-filter.c | 1 +
> t/helper/test-json-writer.c | 2 ++
> t/helper/test-trace2.c | 1 +
> 4 files changed, 5 insertions(+)
Clearilng before strtoX() call like these changes make perfect sense
(within the constraint of strtoX() API, which is horrible as pointed
out by others many times in the past ;-)
Thanks, will queue.
> diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
> index 66a7389f9f4..7195a072edc 100644
> --- a/builtin/get-tar-commit-id.c
> +++ b/builtin/get-tar-commit-id.c
> @@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix
> if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER)
> return 1;
>
> + errno = 0;
> len = strtol(content, &end, 10);
> if (errno == ERANGE || end == content || len < 0)
> return 1;
> diff --git a/ref-filter.c b/ref-filter.c
> index 8c5e673fc0a..54880a2497a 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam
> timestamp = parse_timestamp(eoemail + 2, &zone, 10);
> if (timestamp == TIME_MAX)
> goto bad;
> + errno = 0;
> tz = strtol(zone, NULL, 10);
> if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE)
> goto bad;
> diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c
> index ed52eb76bfc..a288069b04c 100644
> --- a/t/helper/test-json-writer.c
> +++ b/t/helper/test-json-writer.c
> @@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in)
>
> get_s(line, &s);
>
> + errno = 0;
> *s_in = strtol(s, &endptr, 10);
> if (*endptr || errno == ERANGE)
> die("line[%d]: invalid integer value", line->nr);
> @@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in)
>
> get_s(line, &s);
>
> + errno = 0;
> *s_in = strtod(s, &endptr);
> if (*endptr || errno == ERANGE)
> die("line[%d]: invalid float value", line->nr);
> diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
> index cd955ec63e9..c588c273ce7 100644
> --- a/t/helper/test-trace2.c
> +++ b/t/helper/test-trace2.c
> @@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data)
> if (!data || !*data)
> return MyError;
>
> + errno = 0;
> *p_value = strtol(data, &endptr, 10);
> if (*endptr || errno == ERANGE)
> return MyError;
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
@ 2024-08-02 21:32 ` Junio C Hamano
2024-08-02 21:54 ` Eric Sunshine
2024-08-02 23:51 ` Kyle Lippincott
0 siblings, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 21:32 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kyle Lippincott <spectral@google.com>
>
> If the loop executes more than once due to cwd being longer than 128
> bytes, then `errno = ERANGE` might persist outside of this function.
> This technically shouldn't be a problem, as all locations where the
> value in `errno` is tested should either (a) call a function that's
> guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> to calling the function that only conditionally sets errno, such as the
> `strtod` function. In the case of functions in category (b), it's easy
> to forget to do that.
>
> Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> This matches the behavior in functions like `run_transaction_hook`
> (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
I am still uneasy to see this unconditional clearing, which looks
more like spreading the bad practice from two places you identified
than following good behaviour modelled after these two places.
But I'll let it pass.
As long as our programmers understand that across strbuf_getcwd(),
errno will *not* be preserved, even if the function returns success,
it would be OK. As the usual convention around errno is that a
successful call would leave errno intact, not clear it to 0, it
would make it a bit harder to learn our API for newcomers, though.
Thanks.
> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
> strbuf.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3d2189a7f64..b94ef040ab0 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> strbuf_grow(sb, guessed_len);
> if (getcwd(sb->buf, sb->alloc)) {
> strbuf_setlen(sb, strlen(sb->buf));
> + errno = 0;
> return 0;
> }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0
2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
@ 2024-08-02 21:41 ` Junio C Hamano
2024-08-03 0:03 ` Kyle Lippincott
0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2024-08-02 21:41 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget; +Cc: git, Patrick Steinhardt, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kyle Lippincott <spectral@google.com>
>
> The `grep` statement in this test looks for `d0.*<string>`, attempting
> to filter to only show lines that had tabular output where the 2nd
> column had `d0` and the final column had a substring of
> [`git -c `]`fetch.negotiationAlgorithm`. These lines also have
> `child_start` in the 4th column, but this isn't part of the condition.
>
> A subsequent line will have `d1` in the 2nd column, `start` in the 4th
> column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
> column. If `/path/to/git/git` contains the substring `d0`, then this
> line is included by `grep` as well as the desired line, leading to an
> effective doubling of the number of lines, and test failures.
>
> Tighten the grep expression to require `d0` to be surrounded by spaces,
> and to have the `child_start` label.
OK.
I think I actually misinterpreted what you meant with these changes.
It is not what the patterns are picking. It is some _other_ trace
entry we do not necessarily care about, like label:do_write_index
that has the path to the .git/index.lock file, that can accidentally
contain d0, that can be picked up with a pattern that is too loose.
So it really didn't have to clarify what it is looking for, as it
would not help seeing what false positives the patterns are designed
to avoid matching. Sorry about that.
Will queue.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 21:32 ` Junio C Hamano
@ 2024-08-02 21:54 ` Eric Sunshine
2024-08-05 15:51 ` Junio C Hamano
2024-08-02 23:51 ` Kyle Lippincott
1 sibling, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2024-08-02 21:54 UTC (permalink / raw)
To: Junio C Hamano
Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt,
Kyle Lippincott
On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> > [...]
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> I am still uneasy to see this unconditional clearing, which looks
> more like spreading the bad practice from two places you identified
> than following good behaviour modelled after these two places.
>
> But I'll let it pass.
>
> As long as our programmers understand that across strbuf_getcwd(),
> errno will *not* be preserved, even if the function returns success,
> it would be OK. As the usual convention around errno is that a
> successful call would leave errno intact, not clear it to 0, it
> would make it a bit harder to learn our API for newcomers, though.
For what it's worth, I share your misgivings about this change and
consider the suggestion[*] to make it save/restore `errno` upon
success more sensible. It would also be a welcome change to see the
function documentation in strbuf.h updated to mention that it follows
the usual convention of leaving `errno` untouched upon success and
clobbered upon error.
[*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 21:32 ` Junio C Hamano
2024-08-02 21:54 ` Eric Sunshine
@ 2024-08-02 23:51 ` Kyle Lippincott
2024-08-05 17:12 ` Kyle Lippincott
1 sibling, 1 reply; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-02 23:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt
On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Kyle Lippincott <spectral@google.com>
> >
> > If the loop executes more than once due to cwd being longer than 128
> > bytes, then `errno = ERANGE` might persist outside of this function.
> > This technically shouldn't be a problem, as all locations where the
> > value in `errno` is tested should either (a) call a function that's
> > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> > to calling the function that only conditionally sets errno, such as the
> > `strtod` function. In the case of functions in category (b), it's easy
> > to forget to do that.
> >
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> I am still uneasy to see this unconditional clearing, which looks
> more like spreading the bad practice from two places you identified
> than following good behaviour modelled after these two places.
>
> But I'll let it pass.
>
> As long as our programmers understand that across strbuf_getcwd(),
> errno will *not* be preserved, even if the function returns success,
> it would be OK. As the usual convention around errno is that a
> successful call would leave errno intact, not clear it to 0, it
> would make it a bit harder to learn our API for newcomers, though.
I'm sympathetic to that argument. If you'd prefer to not have this
patch, I'm fine with it not landing, and instead at some future date I
may try to work on those #leftoverbits from the previous patch (to
make a safer wrapper around strtoX, and ban the use of the unwrapped
versions), or someone else can if they beat me to it.
Since this is wrapping a posix function, and posix has things to say
about this (see below), I agree that it shouldn't set it to 0, and
withdraw this patch.
I'm including my references below mostly because with the information
I just acquired, I think that any attempt to _preserve_ errno is also
folly. No function we write, unless we explicitly state that it _will_
preserve errno, should feel obligated to do so. The number of cases
where errno _could_ be modified according to the various
specifications (C99 and posix) are just too numerous.
---
Perhaps because I'm not all that experienced with C, but when I did C
a couple decades ago, I operated in a mode where basically every
function was actively hostile. If I wanted errno preserved across a
function call, then it's up to me (the caller) to do so, regardless of
what the current implementation of that function says will happen,
because that can change at any point. Unless the function is
documented as errno-preserving, I'm going to treat it as
errno-hostile. In practice, this didn't really matter much, as I've
never found `if (some_func()) { if (!some_other_func()) { /* use errno
from `some_func` */ } }` logic to happen often, but maybe it does in
"real" programs, I was just a hobbyist self-teaching at the time.
The C standard has a very precise definition of how the library
functions defined in the C specification will act. It guarantees:
- the library functions defined in the specification will never set errno to 0.
- the library functions defined in the specification may set the value
to non-zero whether an error occurs or not, "provided the use of errno
is not documented in the description of the function in this
International Standard". What this means is that (a) if the function
as defined in the C standard mentions errno, it can only set the
values as specified there, and (b) if the function as defined in the C
standard does _not_ mention errno, such as `fopen` or `strstr`, it can
do _whatever it wants_ to errno, even on success, _except_ set it to
0.
POSIX has similar language
(https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html),
with some key differences:
- The value of errno should only be examined when it is indicated to
be valid by a function's return value.
- The setting of errno after a successful call to a function is
unspecified unless the description of that function specifies that
errno shall not be modified.
This means that unlike the C specification, which says that if a
function doesn't describe its use of errno it can do anything it wants
to errno [except set it to 0], in POSIX, a function can do anything it
wants to errno [except set it to 0] at any time.
What this means in practice is that errno should never be assumed to
be preserved across calls to posix functions (like getcwd). Also,
strbuf_getcwd calls free, malloc, and realloc, none of which mention
errno in the C specification, so they can do whatever they want to it
[except set it to 0]. That I was able to find one single function that
was causing problems is luck, and not guaranteed by any specification.
Kind of makes me want to try writing an actively hostile C99 and POSIX
environment, and see how many things break with it. :) C99 spec
doesn't say anything about malloc setting errno? Ok! malloc now sets
errno to ENOENT on tuesdays [in GMT because I'm not a monster], but
only on success. On any other day, it'll set it to ERANGE, regardless
of success or failure.
>
> Thanks.
>
> > Signed-off-by: Kyle Lippincott <spectral@google.com>
> > ---
> > strbuf.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/strbuf.c b/strbuf.c
> > index 3d2189a7f64..b94ef040ab0 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> > strbuf_grow(sb, guessed_len);
> > if (getcwd(sb->buf, sb->alloc)) {
> > strbuf_setlen(sb, strlen(sb->buf));
> > + errno = 0;
> > return 0;
> > }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0
2024-08-02 21:41 ` Junio C Hamano
@ 2024-08-03 0:03 ` Kyle Lippincott
2024-08-03 0:27 ` Junio C Hamano
0 siblings, 1 reply; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-03 0:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt
On Fri, Aug 2, 2024 at 2:41 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Kyle Lippincott <spectral@google.com>
> >
> > The `grep` statement in this test looks for `d0.*<string>`, attempting
> > to filter to only show lines that had tabular output where the 2nd
> > column had `d0` and the final column had a substring of
> > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have
> > `child_start` in the 4th column, but this isn't part of the condition.
> >
> > A subsequent line will have `d1` in the 2nd column, `start` in the 4th
> > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
> > column. If `/path/to/git/git` contains the substring `d0`, then this
> > line is included by `grep` as well as the desired line, leading to an
> > effective doubling of the number of lines, and test failures.
> >
> > Tighten the grep expression to require `d0` to be surrounded by spaces,
> > and to have the `child_start` label.
>
> OK.
>
> I think I actually misinterpreted what you meant with these changes.
> It is not what the patterns are picking. It is some _other_ trace
> entry we do not necessarily care about, like label:do_write_index
> that has the path to the .git/index.lock file, that can accidentally
> contain d0, that can be picked up with a pattern that is too loose.
> So it really didn't have to clarify what it is looking for, as it
> would not help seeing what false positives the patterns are designed
> to avoid matching. Sorry about that.
I would have included examples, but they're quite long (>>>80 chars),
so seemed very out of place in both commit description and in the
codebase. With line wrapping, it wasn't very readable either. At the
risk of this also getting line-wrapped into unreadability:
test_line_count: line count for fetches != 1
23:59:48.794453 run-command.c:733 | d0 | main
| child_start | | 0.027328 | | |
..........[ch1] class:? argv:[git -c fetch.negotiationAlgorithm=noop
fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin]
23:59:48.798901 common-main.c:58 | d1 | main
| start | | 0.000852 | | |
/usr/local/google/home/spectral/src/oss/d0/git/git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin
where each line in the `fetches` file starts with `23:59:48` here.
It's 9 columns, separated by `|` characters, and the line we don't
want is the second one; the regex `d0.*fetch.negotiationAlgorithm`
includes it because of the `d0` in the path.
I considered using `awk -F"|" "\$2~/d0/ &&
\$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`,
but it was longer, possibly less clear, and less specific (since it
didn't include the $4~/child_start/ condition)
>
> Will queue.
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0
2024-08-03 0:03 ` Kyle Lippincott
@ 2024-08-03 0:27 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-03 0:27 UTC (permalink / raw)
To: Kyle Lippincott; +Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt
Kyle Lippincott <spectral@google.com> writes:
>> So it really didn't have to clarify what it is looking for, as it
>> would not help seeing what false positives the patterns are designed
>> to avoid matching. Sorry about that.
>
> I would have included examples, but they're quite long (>>>80 chars),
> so seemed very out of place in both commit description and in the
> codebase.
Absolutely. It turned out not to be so useful to show the shape of
potential matches, like this one:
... run-command.c:733 | d0 | main | child_start |...
To explain why spaces around " d0 " matters, the readers need to
understand that other trace entries that are irrelevant for our
purpose, like this one
... | label:do_write_index /path/to/t/trash directory.../.git/index.lock
we want reject, and for that we want the pattern to be specific
enough by looking for " do " that is followed by " child_start ".
Otherwise the leading paths that can contain anything won't easily
match, and the original of looking for just "d0" was way too error
prone. But it is hard to leave a concise hint for that there.
So, again, sorry about the bad suggestion.
> I considered using `awk -F"|" "\$2~/d0/ &&
> \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`,
> but it was longer, possibly less clear, and less specific (since it
> didn't include the $4~/child_start/ condition)
Yeah, using the syntactic clue -F"|" would also be a way to convey
the intention (i.e. "we are dealing with tabular output and we
expect nth column to be X"), but what you have is probably good
enough---it certainly is simpler to read and understand. I briefly
considered that looking for "| d0 |" (i.e. explicitly mentioning the
column separator in the pattern) would make it even more obvious
what we are looking for, but having to worry about quoting "|" in
regexp would negate the benefit of obviousness out of the approach
to use "grep".
Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 21:54 ` Eric Sunshine
@ 2024-08-05 15:51 ` Junio C Hamano
2024-08-06 6:26 ` Patrick Steinhardt
0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2024-08-05 15:51 UTC (permalink / raw)
To: Eric Sunshine
Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt,
Kyle Lippincott
Eric Sunshine <sunshine@sunshineco.com> writes:
> On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > [...]
>> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
>> > This matches the behavior in functions like `run_transaction_hook`
>> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>>
>> I am still uneasy to see this unconditional clearing, which looks
>> more like spreading the bad practice from two places you identified
>> than following good behaviour modelled after these two places.
>>
>> But I'll let it pass.
>>
>> As long as our programmers understand that across strbuf_getcwd(),
>> errno will *not* be preserved, even if the function returns success,
>> it would be OK. As the usual convention around errno is that a
>> successful call would leave errno intact, not clear it to 0, it
>> would make it a bit harder to learn our API for newcomers, though.
>
> For what it's worth, I share your misgivings about this change and
> consider the suggestion[*] to make it save/restore `errno` upon
> success more sensible. It would also be a welcome change to see the
> function documentation in strbuf.h updated to mention that it follows
> the usual convention of leaving `errno` untouched upon success and
> clobbered upon error.
>
> [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
Yup, of course save/restore would be safer, and probably easier to
reason about for many people.
Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 0/2] Small fixes for issues detected during internal CI runs
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
` (2 preceding siblings ...)
2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
@ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
` (2 more replies)
3 siblings, 3 replies; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott
I'm attempting to get the git test suite running automatically during our
weekly import. I have this mostly working, including with Address Sanitizer
and Memory Sanitizer, but ran into a few issues:
* several tests were failing due to strbuf_getcwd not clearing errno on
success after it internally looped due to the path being >128 bytes. This
is resolved in depth; though either one of the commits alone would
resolve our issues:
* modify locations that call strtoX and check for ERANGE to set errno =
0; prior to calling the conversion function. This is the typical way
that these functions are invoked, and may indicate that we want
compatibility helpers in git-compat-util.h to ensure that this happens
correctly (and add these functions to the banned list).
* have strbuf_getcwd set errno = 0; prior to a successful exit. This
isn't very common for most functions in the codebase, but some other
examples of this were found.
* t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build
system using paths that contain a 64-hex-char hash, which had a 12.5%
chance of containing the substring d0.
Kyle Lippincott (2):
set errno=0 before strtoX calls
t6421: fix test to work when repo dir contains d0
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
t/t6421-merge-partial-clone.sh | 15 +++++++++------
5 files changed, 14 insertions(+), 6 deletions(-)
base-commit: e559c4bf1a306cf5814418d318cc0fea070da3c7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1756%2Fspectral54%2Fstrbuf_getcwd-clear-errno-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1756/spectral54/strbuf_getcwd-clear-errno-v3
Pull-Request: https://github.com/git/git/pull/1756
Range-diff vs v2:
1: 4dbd0bec40a = 1: 4dbd0bec40a set errno=0 before strtoX calls
2: 0ed09e9abb8 < -: ----------- strbuf: set errno to 0 after strbuf_getcwd
3: 818dc9e6b3e = 2: 96984c4a15e t6421: fix test to work when repo dir contains d0
--
gitgitgadget
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 1/2] set errno=0 before strtoX calls
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
@ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano
2 siblings, 0 replies; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott,
Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.
For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
builtin/get-tar-commit-id.c | 1 +
ref-filter.c | 1 +
t/helper/test-json-writer.c | 2 ++
t/helper/test-trace2.c | 1 +
4 files changed, 5 insertions(+)
diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 66a7389f9f4..7195a072edc 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -35,6 +35,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv UNUSED, const char *prefix
if (header->typeflag[0] != TYPEFLAG_GLOBAL_HEADER)
return 1;
+ errno = 0;
len = strtol(content, &end, 10);
if (errno == ERANGE || end == content || len < 0)
return 1;
diff --git a/ref-filter.c b/ref-filter.c
index 8c5e673fc0a..54880a2497a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1628,6 +1628,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam
timestamp = parse_timestamp(eoemail + 2, &zone, 10);
if (timestamp == TIME_MAX)
goto bad;
+ errno = 0;
tz = strtol(zone, NULL, 10);
if ((tz == LONG_MIN || tz == LONG_MAX) && errno == ERANGE)
goto bad;
diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c
index ed52eb76bfc..a288069b04c 100644
--- a/t/helper/test-json-writer.c
+++ b/t/helper/test-json-writer.c
@@ -415,6 +415,7 @@ static void get_i(struct line *line, intmax_t *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtol(s, &endptr, 10);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid integer value", line->nr);
@@ -427,6 +428,7 @@ static void get_d(struct line *line, double *s_in)
get_s(line, &s);
+ errno = 0;
*s_in = strtod(s, &endptr);
if (*endptr || errno == ERANGE)
die("line[%d]: invalid float value", line->nr);
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index cd955ec63e9..c588c273ce7 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -26,6 +26,7 @@ static int get_i(int *p_value, const char *data)
if (!data || !*data)
return MyError;
+ errno = 0;
*p_value = strtol(data, &endptr, 10);
if (*endptr || errno == ERANGE)
return MyError;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
@ 2024-08-05 17:10 ` Kyle Lippincott via GitGitGadget
2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano
2 siblings, 0 replies; 29+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-08-05 17:10 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Kyle Lippincott,
Kyle Lippincott
From: Kyle Lippincott <spectral@google.com>
The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.
A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.
Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.
Signed-off-by: Kyle Lippincott <spectral@google.com>
---
t/t6421-merge-partial-clone.sh | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
index 711b709e755..b99f29ef9ba 100755
--- a/t/t6421-merge-partial-clone.sh
+++ b/t/t6421-merge-partial-clone.sh
@@ -230,8 +230,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 2 fetches &&
git rev-list --objects --all --missing=print |
@@ -318,8 +319,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 1 fetches &&
git rev-list --objects --all --missing=print |
@@ -422,8 +424,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
test_cmp expect actual &&
- # Check the number of fetch commands exec-ed
- grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+ # Check the number of fetch commands exec-ed by filtering trace to
+ # child_start events by the top-level program (2nd field == d0)
+ grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
test_line_count = 4 fetches &&
git rev-list --objects --all --missing=print |
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-02 23:51 ` Kyle Lippincott
@ 2024-08-05 17:12 ` Kyle Lippincott
0 siblings, 0 replies; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-05 17:12 UTC (permalink / raw)
To: Junio C Hamano
Cc: Kyle Lippincott via GitGitGadget, git, Patrick Steinhardt,
Eric Sunshine
On Fri, Aug 2, 2024 at 4:51 PM Kyle Lippincott <spectral@google.com> wrote:
>
> On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Kyle Lippincott <spectral@google.com>
> > >
> > > If the loop executes more than once due to cwd being longer than 128
> > > bytes, then `errno = ERANGE` might persist outside of this function.
> > > This technically shouldn't be a problem, as all locations where the
> > > value in `errno` is tested should either (a) call a function that's
> > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> > > to calling the function that only conditionally sets errno, such as the
> > > `strtod` function. In the case of functions in category (b), it's easy
> > > to forget to do that.
> > >
> > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > > This matches the behavior in functions like `run_transaction_hook`
> > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> >
> > I am still uneasy to see this unconditional clearing, which looks
> > more like spreading the bad practice from two places you identified
> > than following good behaviour modelled after these two places.
> >
> > But I'll let it pass.
> >
> > As long as our programmers understand that across strbuf_getcwd(),
> > errno will *not* be preserved, even if the function returns success,
> > it would be OK. As the usual convention around errno is that a
> > successful call would leave errno intact, not clear it to 0, it
> > would make it a bit harder to learn our API for newcomers, though.
>
> I'm sympathetic to that argument. If you'd prefer to not have this
> patch, I'm fine with it not landing, and instead at some future date I
> may try to work on those #leftoverbits from the previous patch (to
> make a safer wrapper around strtoX, and ban the use of the unwrapped
> versions), or someone else can if they beat me to it.
>
> Since this is wrapping a posix function, and posix has things to say
> about this (see below), I agree that it shouldn't set it to 0, and
> withdraw this patch.
Dropped this patch in the reroll that (I think) I just sent.
>
> I'm including my references below mostly because with the information
> I just acquired, I think that any attempt to _preserve_ errno is also
> folly. No function we write, unless we explicitly state that it _will_
> preserve errno, should feel obligated to do so. The number of cases
> where errno _could_ be modified according to the various
> specifications (C99 and posix) are just too numerous.
>
> ---
>
> Perhaps because I'm not all that experienced with C, but when I did C
> a couple decades ago, I operated in a mode where basically every
> function was actively hostile. If I wanted errno preserved across a
> function call, then it's up to me (the caller) to do so, regardless of
> what the current implementation of that function says will happen,
> because that can change at any point. Unless the function is
> documented as errno-preserving, I'm going to treat it as
> errno-hostile. In practice, this didn't really matter much, as I've
> never found `if (some_func()) { if (!some_other_func()) { /* use errno
> from `some_func` */ } }` logic to happen often, but maybe it does in
> "real" programs, I was just a hobbyist self-teaching at the time.
>
> The C standard has a very precise definition of how the library
> functions defined in the C specification will act. It guarantees:
> - the library functions defined in the specification will never set errno to 0.
> - the library functions defined in the specification may set the value
> to non-zero whether an error occurs or not, "provided the use of errno
> is not documented in the description of the function in this
> International Standard". What this means is that (a) if the function
> as defined in the C standard mentions errno, it can only set the
> values as specified there, and (b) if the function as defined in the C
> standard does _not_ mention errno, such as `fopen` or `strstr`, it can
> do _whatever it wants_ to errno, even on success, _except_ set it to
> 0.
>
> POSIX has similar language
> (https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html),
> with some key differences:
> - The value of errno should only be examined when it is indicated to
> be valid by a function's return value.
> - The setting of errno after a successful call to a function is
> unspecified unless the description of that function specifies that
> errno shall not be modified.
>
> This means that unlike the C specification, which says that if a
> function doesn't describe its use of errno it can do anything it wants
> to errno [except set it to 0], in POSIX, a function can do anything it
> wants to errno [except set it to 0] at any time.
>
> What this means in practice is that errno should never be assumed to
> be preserved across calls to posix functions (like getcwd). Also,
> strbuf_getcwd calls free, malloc, and realloc, none of which mention
> errno in the C specification, so they can do whatever they want to it
> [except set it to 0]. That I was able to find one single function that
> was causing problems is luck, and not guaranteed by any specification.
>
> Kind of makes me want to try writing an actively hostile C99 and POSIX
> environment, and see how many things break with it. :) C99 spec
> doesn't say anything about malloc setting errno? Ok! malloc now sets
> errno to ENOENT on tuesdays [in GMT because I'm not a monster], but
> only on success. On any other day, it'll set it to ERANGE, regardless
> of success or failure.
>
> >
> > Thanks.
> >
> > > Signed-off-by: Kyle Lippincott <spectral@google.com>
> > > ---
> > > strbuf.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/strbuf.c b/strbuf.c
> > > index 3d2189a7f64..b94ef040ab0 100644
> > > --- a/strbuf.c
> > > +++ b/strbuf.c
> > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> > > strbuf_grow(sb, guessed_len);
> > > if (getcwd(sb->buf, sb->alloc)) {
> > > strbuf_setlen(sb, strlen(sb->buf));
> > > + errno = 0;
> > > return 0;
> > > }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 0/2] Small fixes for issues detected during internal CI runs
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
@ 2024-08-05 18:37 ` Junio C Hamano
2 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2024-08-05 18:37 UTC (permalink / raw)
To: Kyle Lippincott via GitGitGadget
Cc: git, Patrick Steinhardt, Eric Sunshine, Kyle Lippincott
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> I'm attempting to get the git test suite running automatically during our
> weekly import. I have this mostly working, including with Address Sanitizer
> and Memory Sanitizer, but ran into a few issues:
>
> * several tests were failing due to strbuf_getcwd not clearing errno on
> success after it internally looped due to the path being >128 bytes. This
> is resolved in depth; though either one of the commits alone would
> resolve our issues:
> * modify locations that call strtoX and check for ERANGE to set errno =
> 0; prior to calling the conversion function. This is the typical way
> that these functions are invoked, and may indicate that we want
> compatibility helpers in git-compat-util.h to ensure that this happens
> correctly (and add these functions to the banned list).
> * have strbuf_getcwd set errno = 0; prior to a successful exit. This
> isn't very common for most functions in the codebase, but some other
> examples of this were found.
> * t6421-merge-partial-clone.sh had >10% flakiness. This is due to our build
> system using paths that contain a 64-hex-char hash, which had a 12.5%
> chance of containing the substring d0.
>
> Kyle Lippincott (2):
> set errno=0 before strtoX calls
> t6421: fix test to work when repo dir contains d0
Both patches make perfect sense to me. Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-05 15:51 ` Junio C Hamano
@ 2024-08-06 6:26 ` Patrick Steinhardt
2024-08-06 7:04 ` Kyle Lippincott
0 siblings, 1 reply; 29+ messages in thread
From: Patrick Steinhardt @ 2024-08-06 6:26 UTC (permalink / raw)
To: Junio C Hamano
Cc: Eric Sunshine, Kyle Lippincott via GitGitGadget, git,
Kyle Lippincott
[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]
On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
> > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> > [...]
> >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> >> > This matches the behavior in functions like `run_transaction_hook`
> >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> >>
> >> I am still uneasy to see this unconditional clearing, which looks
> >> more like spreading the bad practice from two places you identified
> >> than following good behaviour modelled after these two places.
> >>
> >> But I'll let it pass.
> >>
> >> As long as our programmers understand that across strbuf_getcwd(),
> >> errno will *not* be preserved, even if the function returns success,
> >> it would be OK. As the usual convention around errno is that a
> >> successful call would leave errno intact, not clear it to 0, it
> >> would make it a bit harder to learn our API for newcomers, though.
> >
> > For what it's worth, I share your misgivings about this change and
> > consider the suggestion[*] to make it save/restore `errno` upon
> > success more sensible. It would also be a welcome change to see the
> > function documentation in strbuf.h updated to mention that it follows
> > the usual convention of leaving `errno` untouched upon success and
> > clobbered upon error.
> >
> > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
>
> Yup, of course save/restore would be safer, and probably easier to
> reason about for many people.
Is it really all that reasonable? We're essentially partitioning our set
of APIs into two sets, where one set knows to keep `errno` intact
whereas another set doesn't. In such a world, you have to be very
careful about which APIs you are calling in a function that wants to
keep `errno` intact, which to me sounds like a maintenance headache.
I'd claim that most callers never care about `errno` at all. For the
callers that do, I feel it is way more fragile to rely on whether or not
a called function leaves `errno` intact or not. For one, it's fragile
because that may easily change due to a bug. Second, it is fragile
because the dependency on `errno` is not explicitly documented via code,
but rather an implicit dependency.
So isn't it more reasonable to rather make the few callers that do
require `errno` to be left intact to save it? It makes the dependency
explicit, avoids splitting our functions into two sets and allows us to
just ignore this issue for the majority of functions that couldn't care
less about `errno`.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd
2024-08-06 6:26 ` Patrick Steinhardt
@ 2024-08-06 7:04 ` Kyle Lippincott
0 siblings, 0 replies; 29+ messages in thread
From: Kyle Lippincott @ 2024-08-06 7:04 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Junio C Hamano, Eric Sunshine, Kyle Lippincott via GitGitGadget,
git
On Mon, Aug 5, 2024 at 11:26 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> >
> > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >> > [...]
> > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > >> > This matches the behavior in functions like `run_transaction_hook`
> > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> > >>
> > >> I am still uneasy to see this unconditional clearing, which looks
> > >> more like spreading the bad practice from two places you identified
> > >> than following good behaviour modelled after these two places.
> > >>
> > >> But I'll let it pass.
> > >>
> > >> As long as our programmers understand that across strbuf_getcwd(),
> > >> errno will *not* be preserved, even if the function returns success,
> > >> it would be OK. As the usual convention around errno is that a
> > >> successful call would leave errno intact, not clear it to 0, it
> > >> would make it a bit harder to learn our API for newcomers, though.
> > >
> > > For what it's worth, I share your misgivings about this change and
> > > consider the suggestion[*] to make it save/restore `errno` upon
> > > success more sensible. It would also be a welcome change to see the
> > > function documentation in strbuf.h updated to mention that it follows
> > > the usual convention of leaving `errno` untouched upon success and
> > > clobbered upon error.
> > >
> > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
> >
> > Yup, of course save/restore would be safer, and probably easier to
> > reason about for many people.
>
> Is it really all that reasonable? We're essentially partitioning our set
> of APIs into two sets, where one set knows to keep `errno` intact
> whereas another set doesn't. In such a world, you have to be very
> careful about which APIs you are calling in a function that wants to
> keep `errno` intact, which to me sounds like a maintenance headache.
>
> I'd claim that most callers never care about `errno` at all. For the
> callers that do, I feel it is way more fragile to rely on whether or not
> a called function leaves `errno` intact or not. For one, it's fragile
> because that may easily change due to a bug. Second, it is fragile
> because the dependency on `errno` is not explicitly documented via code,
> but rather an implicit dependency.
>
> So isn't it more reasonable to rather make the few callers that do
> require `errno` to be left intact to save it? It makes the dependency
> explicit, avoids splitting our functions into two sets and allows us to
> just ignore this issue for the majority of functions that couldn't care
> less about `errno`.
100% agreed. The C language specification says you can't rely on errno
persisting across function calls, and that the caller must preserve it
if it needs that behavior for some reason. The POSIX specification
says you can't either except in very rare circumstances where it
guarantees errno will not change. The Linux man page for errno says
you can't rely on errno not changing, even for printf:
https://man7.org/linux/man-pages/man3/errno.3.html
A common mistake is to do
if (somecall() == -1) {
printf("somecall() failed\n");
if (errno == ...) { ... }
}
where errno no longer needs to have the value it had upon return
from somecall() (i.e., it may have been changed by the
printf(3)). If the value of errno should be preserved across a
library call, it must be saved:
if (somecall() == -1) {
int errsv = errno;
printf("somecall() failed\n");
if (errsv == ...) { ... }
}
Basically: errno is _extremely_ volatile. One should assume that
_every_ function call is going to change it, even if they return
successfully. The only thing that can't happen is that the functions
defined in the C and POSIX standards set errno to 0, which is why I
withdrew the patch (since it's a wrapper around a function defined in
POSIX). But in general, I don't see any reason for any of the
functions we write to be errno preserving, especially since any call
to malloc, printf, trace functionality, etc. may modify errno.
>
> Patrick
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-08-06 7:04 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-02 4:10 [PATCH 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 4:10 ` [PATCH 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-02 5:12 ` Patrick Steinhardt
2024-08-02 6:15 ` Kyle Lippincott
2024-08-02 15:01 ` Junio C Hamano
2024-08-02 4:10 ` [PATCH 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
2024-08-02 15:10 ` Junio C Hamano
2024-08-02 17:56 ` Kyle Lippincott
2024-08-02 4:10 ` [PATCH 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-02 15:13 ` Junio C Hamano
2024-08-02 20:58 ` [PATCH v2 0/3] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-02 20:58 ` [PATCH v2 1/3] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-02 21:18 ` Junio C Hamano
2024-08-02 20:58 ` [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd Kyle Lippincott via GitGitGadget
2024-08-02 21:32 ` Junio C Hamano
2024-08-02 21:54 ` Eric Sunshine
2024-08-05 15:51 ` Junio C Hamano
2024-08-06 6:26 ` Patrick Steinhardt
2024-08-06 7:04 ` Kyle Lippincott
2024-08-02 23:51 ` Kyle Lippincott
2024-08-05 17:12 ` Kyle Lippincott
2024-08-02 20:58 ` [PATCH v2 3/3] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-02 21:41 ` Junio C Hamano
2024-08-03 0:03 ` Kyle Lippincott
2024-08-03 0:27 ` Junio C Hamano
2024-08-05 17:10 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 1/2] set errno=0 before strtoX calls Kyle Lippincott via GitGitGadget
2024-08-05 17:10 ` [PATCH v3 2/2] t6421: fix test to work when repo dir contains d0 Kyle Lippincott via GitGitGadget
2024-08-05 18:37 ` [PATCH v3 0/2] Small fixes for issues detected during internal CI runs Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).