* [PATCH] Always check the return value of `repo_read_object_file()`
@ 2024-02-05 14:35 Johannes Schindelin via GitGitGadget
2024-02-05 16:10 ` Karthik Nayak
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-05 14:35 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Johannes Schindelin
From: Johannes Schindelin <johannes.schindelin@gmx.de>
There are a couple of places in Git's source code where the return value
is not checked. As a consequence, they are susceptible to segmentation
faults.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Always check the return value of repo_read_object_file()
I ran into this today, when I had tried git am -3 to import changes from
a repository into a different repository that has the first repository's
code vendored in. To make this work, I set
GIT_ALTERNATE_OBJECT_DIRECTORIES accordingly for the git am -3 call, but
forgot to set it for a subsequent git diff call, which then segfaulted.
There are still a couple of places left where there are checks but they
look dubious to me, as they simply continue as if an empty blob had been
read, for example in builtin/tag.c. However, there are checks that avoid
segfaults, so I left them alone.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1650%2Fdscho%2Fsafer-object-reads-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1650/dscho/safer-object-reads-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1650
bisect.c | 3 +++
builtin/cat-file.c | 10 ++++++++--
builtin/grep.c | 2 ++
builtin/notes.c | 6 ++++--
combine-diff.c | 2 ++
rerere.c | 3 +++
6 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/bisect.c b/bisect.c
index f1273c787d9..f75e50c3397 100644
--- a/bisect.c
+++ b/bisect.c
@@ -158,6 +158,9 @@ static void show_list(const char *debug, int counted, int nr,
const char *subject_start;
int subject_len;
+ if (!buf)
+ die(_("unable to read %s"), oid_to_hex(&commit->object.oid));
+
fprintf(stderr, "%c%c%c ",
(commit_flags & TREESAME) ? ' ' : 'T',
(commit_flags & UNINTERESTING) ? 'U' : ' ',
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7d4899348a3..bbf851138ec 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -221,6 +221,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
&type,
&size);
const char *target;
+
+ if (!buffer)
+ die(_("unable to read %s"), oid_to_hex(&oid));
+
if (!skip_prefix(buffer, "object ", &target) ||
get_oid_hex(target, &blob_oid))
die("%s not a valid tag", oid_to_hex(&oid));
@@ -416,6 +420,8 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
contents = repo_read_object_file(the_repository, oid, &type,
&size);
+ if (!contents)
+ die("object %s disappeared", oid_to_hex(oid));
if (use_mailmap) {
size_t s = size;
@@ -423,8 +429,6 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
size = cast_size_t_to_ulong(s);
}
- if (!contents)
- die("object %s disappeared", oid_to_hex(oid));
if (type != data->type)
die("object %s changed type!?", oid_to_hex(oid));
if (data->info.sizep && size != data->size && !use_mailmap)
@@ -481,6 +485,8 @@ static void batch_object_write(const char *obj_name,
buf = repo_read_object_file(the_repository, &data->oid, &data->type,
&data->size);
+ if (!buf)
+ die(_("unable to read %s"), oid_to_hex(&data->oid));
buf = replace_idents_using_mailmap(buf, &s);
data->size = cast_size_t_to_ulong(s);
diff --git a/builtin/grep.c b/builtin/grep.c
index c8e33f97755..982bcfc4b1d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -571,6 +571,8 @@ static int grep_cache(struct grep_opt *opt,
data = repo_read_object_file(the_repository, &ce->oid,
&type, &size);
+ if (!data)
+ die(_("unable to read tree %s"), oid_to_hex(&ce->oid));
init_tree_desc(&tree, data, size);
hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
diff --git a/builtin/notes.c b/builtin/notes.c
index e65cae0bcf7..caf20fd5bdd 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -716,9 +716,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
struct strbuf buf = STRBUF_INIT;
char *prev_buf = repo_read_object_file(the_repository, note, &type, &size);
- if (prev_buf && size)
+ if (!prev_buf)
+ die(_("unable to read %s"), oid_to_hex(note));
+ if (size)
strbuf_add(&buf, prev_buf, size);
- if (d.buf.len && prev_buf && size)
+ if (d.buf.len && size)
append_separator(&buf);
strbuf_insert(&d.buf, 0, buf.buf, buf.len);
diff --git a/combine-diff.c b/combine-diff.c
index db94581f724..d6d6fa16894 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -337,6 +337,8 @@ static char *grab_blob(struct repository *r,
free_filespec(df);
} else {
blob = repo_read_object_file(r, oid, &type, size);
+ if (!blob)
+ die(_("unable to read %s"), oid_to_hex(oid));
if (type != OBJ_BLOB)
die("object '%s' is not a blob!", oid_to_hex(oid));
}
diff --git a/rerere.c b/rerere.c
index ca7e77ba68c..13c94ded037 100644
--- a/rerere.c
+++ b/rerere.c
@@ -973,6 +973,9 @@ static int handle_cache(struct index_state *istate,
mmfile[i].ptr = repo_read_object_file(the_repository,
&ce->oid, &type,
&size);
+ if (!mmfile[i].ptr)
+ die(_("unable to read %s"),
+ oid_to_hex(&ce->oid));
mmfile[i].size = size;
}
}
base-commit: 2a540e432fe5dff3cfa9d3bf7ca56db2ad12ebb9
--
gitgitgadget
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
@ 2024-02-05 16:10 ` Karthik Nayak
2024-02-06 18:36 ` Junio C Hamano
2024-02-12 23:16 ` Johannes Schindelin
2024-02-06 1:13 ` Kyle Lippincott
` (3 subsequent siblings)
4 siblings, 2 replies; 16+ messages in thread
From: Karthik Nayak @ 2024-02-05 16:10 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin
[-- Attachment #1: Type: text/plain, Size: 740 bytes --]
Hello,
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/bisect.c b/bisect.c
> index f1273c787d9..f75e50c3397 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -158,6 +158,9 @@ static void show_list(const char *debug, int counted, int nr,
> const char *subject_start;
> int subject_len;
>
> + if (!buf)
> + die(_("unable to read %s"), oid_to_hex(&commit->object.oid));
> +
Nit: We know that `repo_read_object_file()` fails on corrupt objects, so
this means that this is only happening when the object doesn't exist. I
wonder if it makes more sense to replace "unable to read %s" which is a
little ambiguous with something like "object %q doesn't exist".
Otherwise, the patch looks good, thanks!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 16:10 ` Karthik Nayak
@ 2024-02-06 18:36 ` Junio C Hamano
2024-02-12 23:16 ` Johannes Schindelin
1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2024-02-06 18:36 UTC (permalink / raw)
To: Karthik Nayak
Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin
Karthik Nayak <karthik.188@gmail.com> writes:
> Hello,
>
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> diff --git a/bisect.c b/bisect.c
>> index f1273c787d9..f75e50c3397 100644
>> --- a/bisect.c
>> +++ b/bisect.c
>> @@ -158,6 +158,9 @@ static void show_list(const char *debug, int counted, int nr,
>> const char *subject_start;
>> int subject_len;
>>
>> + if (!buf)
>> + die(_("unable to read %s"), oid_to_hex(&commit->object.oid));
>> +
>
> Nit: We know that `repo_read_object_file()` fails on corrupt objects, so
> this means that this is only happening when the object doesn't exist. I
> wonder if it makes more sense to replace "unable to read %s" which is a
> little ambiguous with something like "object %q doesn't exist".
I am not sure if that is a good move in the longer run. We may
"fix" the called function to return NULL to allow callers to deal
with errors from object corruption better, at which time between
"doesn't exist" and "unable to read", the latter becomes far closer
to what actually happened (it is debatable if a corrupt thing really
exists in the first place, too).
> Otherwise, the patch looks good, thanks!
I haven't read the remainder of the patch, but to me this hunk looks
OK.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 16:10 ` Karthik Nayak
2024-02-06 18:36 ` Junio C Hamano
@ 2024-02-12 23:16 ` Johannes Schindelin
1 sibling, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2024-02-12 23:16 UTC (permalink / raw)
To: Karthik Nayak; +Cc: Johannes Schindelin via GitGitGadget, git
Hi Karthik,
On Mon, 5 Feb 2024, Karthik Nayak wrote:
> Hello,
>
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > diff --git a/bisect.c b/bisect.c
> > index f1273c787d9..f75e50c3397 100644
> > --- a/bisect.c
> > +++ b/bisect.c
> > @@ -158,6 +158,9 @@ static void show_list(const char *debug, int counted, int nr,
> > const char *subject_start;
> > int subject_len;
> >
> > + if (!buf)
> > + die(_("unable to read %s"), oid_to_hex(&commit->object.oid));
> > +
>
> Nit: We know that `repo_read_object_file()` fails on corrupt objects, so
> this means that this is only happening when the object doesn't exist. I
> wonder if it makes more sense to replace "unable to read %s" which is a
> little ambiguous with something like "object %q doesn't exist".
I specifically copied this error message from existing code that already
deals with these errors, so as not to cause unnecessary translator
friction.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
2024-02-05 16:10 ` Karthik Nayak
@ 2024-02-06 1:13 ` Kyle Lippincott
2024-02-09 8:06 ` Johannes Schindelin
2024-02-06 6:51 ` Patrick Steinhardt
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Kyle Lippincott @ 2024-02-06 1:13 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
On Mon, Feb 5, 2024 at 6:36 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> There are a couple of places in Git's source code where the return value
> is not checked. As a consequence, they are susceptible to segmentation
> faults.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> Always check the return value of repo_read_object_file()
>
> I ran into this today, when I had tried git am -3 to import changes from
> a repository into a different repository that has the first repository's
> code vendored in. To make this work, I set
> GIT_ALTERNATE_OBJECT_DIRECTORIES accordingly for the git am -3 call, but
> forgot to set it for a subsequent git diff call, which then segfaulted.
>
> There are still a couple of places left where there are checks but they
> look dubious to me, as they simply continue as if an empty blob had been
> read, for example in builtin/tag.c. However, there are checks that avoid
> segfaults, so I left them alone.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1650%2Fdscho%2Fsafer-object-reads-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1650/dscho/safer-object-reads-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1650
>
> bisect.c | 3 +++
> builtin/cat-file.c | 10 ++++++++--
> builtin/grep.c | 2 ++
> builtin/notes.c | 6 ++++--
> combine-diff.c | 2 ++
> rerere.c | 3 +++
> 6 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/notes.c b/builtin/notes.c
> index e65cae0bcf7..caf20fd5bdd 100644
> --- a/builtin/notes.c
> +++ b/builtin/notes.c
> @@ -716,9 +716,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
> struct strbuf buf = STRBUF_INIT;
> char *prev_buf = repo_read_object_file(the_repository, note, &type, &size);
>
> - if (prev_buf && size)
> + if (!prev_buf)
> + die(_("unable to read %s"), oid_to_hex(note));
This changes the behavior of this function. Previously, it would not
add prev_buf output, but still succeed. This now dies.
> + if (size)
> strbuf_add(&buf, prev_buf, size);
> - if (d.buf.len && prev_buf && size)
> + if (d.buf.len && size)
> append_separator(&buf);
> strbuf_insert(&d.buf, 0, buf.buf, buf.len);
>
> diff --git a/combine-diff.c b/combine-diff.c
> index db94581f724..d6d6fa16894 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -337,6 +337,8 @@ static char *grab_blob(struct repository *r,
> free_filespec(df);
> } else {
> blob = repo_read_object_file(r, oid, &type, size);
> + if (!blob)
> + die(_("unable to read %s"), oid_to_hex(oid));
Technically this is changing the output, but I think that's good - I
believe that previously the behavior was undefined, because `type`
wouldn't be modified if the blob didn't exist, and `type` wasn't
assigned a value earlier in the function.
> if (type != OBJ_BLOB)
> die("object '%s' is not a blob!", oid_to_hex(oid));
> }
>
> base-commit: 2a540e432fe5dff3cfa9d3bf7ca56db2ad12ebb9
> --
> gitgitgadget
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-06 1:13 ` Kyle Lippincott
@ 2024-02-09 8:06 ` Johannes Schindelin
2024-02-09 16:37 ` Junio C Hamano
2024-02-09 19:56 ` Kyle Lippincott
0 siblings, 2 replies; 16+ messages in thread
From: Johannes Schindelin @ 2024-02-09 8:06 UTC (permalink / raw)
To: Kyle Lippincott; +Cc: Johannes Schindelin via GitGitGadget, git
[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]
Hi Kyle,
On Mon, 5 Feb 2024, Kyle Lippincott wrote:
> On Mon, Feb 5, 2024 at 6:36 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > diff --git a/builtin/notes.c b/builtin/notes.c
> > index e65cae0bcf7..caf20fd5bdd 100644
> > --- a/builtin/notes.c
> > +++ b/builtin/notes.c
> > @@ -716,9 +716,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
> > struct strbuf buf = STRBUF_INIT;
> > char *prev_buf = repo_read_object_file(the_repository, note, &type, &size);
> >
> > - if (prev_buf && size)
> > + if (!prev_buf)
> > + die(_("unable to read %s"), oid_to_hex(note));
>
> This changes the behavior of this function. Previously, it would not
> add prev_buf output, but still succeed. This now dies.
It does change behavior. The previous behavior looked up the note OID,
then tried to read it, and if it was missing just pretended that there had
not been a note.
I'm not quite sure whether we should keep that behavior, as it is unclear
in which scenarios it would be desirable to paper over missing objects.
In GitGitGadget, I am a heavy user of notes and it wouldn't do any good to
have this behavior: It would lose information.
And even in scenarios where the `notes` ref is fetch shallowly, I would
expect all of the actual notes blobs to be present, and I would _want_ the
`git note edit ...` command to error out when that blob is not found.
Does that reasoning make sense to you?
Ciao,
Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-09 8:06 ` Johannes Schindelin
@ 2024-02-09 16:37 ` Junio C Hamano
2024-02-09 19:56 ` Kyle Lippincott
1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2024-02-09 16:37 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Kyle Lippincott, Johannes Schindelin via GitGitGadget, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> It does change behavior. The previous behavior looked up the note OID,
> then tried to read it, and if it was missing just pretended that there had
> not been a note.
>
> I'm not quite sure whether we should keep that behavior, as it is unclear
> in which scenarios it would be desirable to paper over missing objects.
Yeah, an object that does not have any notes attached to is a norm
so the calling application must be prepared for it, but this
codepath is different. The notes tree says it has notes, we try to
read it and it is not there---at least we noticed an inconsistent
notes tree (and object store), and if we were to run "git fsck" at
that point, we would certainly complain about a missing blob object
(can a tree object at an intermediate level be missing and would we
notice, by the way, I wonder). It is only prudent to report it,
instead of pretending that the notes are not there.
So I think this tightening falls into the "bugfix" category.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-09 8:06 ` Johannes Schindelin
2024-02-09 16:37 ` Junio C Hamano
@ 2024-02-09 19:56 ` Kyle Lippincott
1 sibling, 0 replies; 16+ messages in thread
From: Kyle Lippincott @ 2024-02-09 19:56 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Johannes Schindelin via GitGitGadget, git
On Fri, Feb 9, 2024 at 12:06 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Kyle,
>
> On Mon, 5 Feb 2024, Kyle Lippincott wrote:
>
> > On Mon, Feb 5, 2024 at 6:36 AM Johannes Schindelin via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > diff --git a/builtin/notes.c b/builtin/notes.c
> > > index e65cae0bcf7..caf20fd5bdd 100644
> > > --- a/builtin/notes.c
> > > +++ b/builtin/notes.c
> > > @@ -716,9 +716,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
> > > struct strbuf buf = STRBUF_INIT;
> > > char *prev_buf = repo_read_object_file(the_repository, note, &type, &size);
> > >
> > > - if (prev_buf && size)
> > > + if (!prev_buf)
> > > + die(_("unable to read %s"), oid_to_hex(note));
> >
> > This changes the behavior of this function. Previously, it would not
> > add prev_buf output, but still succeed. This now dies.
>
> It does change behavior. The previous behavior looked up the note OID,
> then tried to read it, and if it was missing just pretended that there had
> not been a note.
>
> I'm not quite sure whether we should keep that behavior, as it is unclear
> in which scenarios it would be desirable to paper over missing objects.
>
> In GitGitGadget, I am a heavy user of notes and it wouldn't do any good to
> have this behavior: It would lose information.
>
> And even in scenarios where the `notes` ref is fetch shallowly, I would
> expect all of the actual notes blobs to be present, and I would _want_ the
> `git note edit ...` command to error out when that blob is not found.
>
> Does that reasoning make sense to you?
Sounds good, thanks for talking through it with me.
>
> Ciao,
> Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
2024-02-05 16:10 ` Karthik Nayak
2024-02-06 1:13 ` Kyle Lippincott
@ 2024-02-06 6:51 ` Patrick Steinhardt
2024-02-06 18:42 ` Junio C Hamano
2024-02-09 8:15 ` Johannes Schindelin
2024-02-06 22:02 ` Junio C Hamano
2024-02-16 6:43 ` Teng Long
4 siblings, 2 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2024-02-06 6:51 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]
On Mon, Feb 05, 2024 at 02:35:53PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
[snip]
> diff --git a/rerere.c b/rerere.c
> index ca7e77ba68c..13c94ded037 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -973,6 +973,9 @@ static int handle_cache(struct index_state *istate,
> mmfile[i].ptr = repo_read_object_file(the_repository,
> &ce->oid, &type,
> &size);
> + if (!mmfile[i].ptr)
> + die(_("unable to read %s"),
> + oid_to_hex(&ce->oid));
> mmfile[i].size = size;
> }
> }
A few lines below this we check whether `mmfile[i].ptr` is `NULL` and
replace it with the empty string if so. So this patch here is basically
a change in behaviour where we now die instead of falling back to the
empty value.
I'm not familiar enough with the code to say whether the old behaviour
is intended or not -- it certainly feels somewhat weird to me. But it
did leave me wondering and could maybe use some explanation.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-06 6:51 ` Patrick Steinhardt
@ 2024-02-06 18:42 ` Junio C Hamano
2024-02-09 8:15 ` Johannes Schindelin
1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2024-02-06 18:42 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin
Patrick Steinhardt <ps@pks.im> writes:
>> mmfile[i].ptr = repo_read_object_file(the_repository,
>> &ce->oid, &type,
>> &size);
>> + if (!mmfile[i].ptr)
>> + die(_("unable to read %s"),
>> + oid_to_hex(&ce->oid));
>> mmfile[i].size = size;
>> }
>> }
>
> A few lines below this we check whether `mmfile[i].ptr` is `NULL` and
> replace it with the empty string if so. So this patch here is basically
> a change in behaviour where we now die instead of falling back to the
> empty value.
I think that one is trying to cope with cases where we genuinely do
not have all three variants, not "we thought we had this variant so
we tried to read it into mmfile[i].{ptr,size}, but it turns out that
the object name we had was bad". So the fallback code for an entirely
different case was masking the breakage the above hunk fixes, and
this being "rerere", it is better to be cautious than sorry.
Thanks for reading the original code carefully.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-06 6:51 ` Patrick Steinhardt
2024-02-06 18:42 ` Junio C Hamano
@ 2024-02-09 8:15 ` Johannes Schindelin
2024-02-09 8:17 ` Patrick Steinhardt
1 sibling, 1 reply; 16+ messages in thread
From: Johannes Schindelin @ 2024-02-09 8:15 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Johannes Schindelin via GitGitGadget, git
Hi Patrick,
On Tue, 6 Feb 2024, Patrick Steinhardt wrote:
> On Mon, Feb 05, 2024 at 02:35:53PM +0000, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [snip]
> > diff --git a/rerere.c b/rerere.c
> > index ca7e77ba68c..13c94ded037 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -973,6 +973,9 @@ static int handle_cache(struct index_state *istate,
> > mmfile[i].ptr = repo_read_object_file(the_repository,
> > &ce->oid, &type,
> > &size);
> > + if (!mmfile[i].ptr)
> > + die(_("unable to read %s"),
> > + oid_to_hex(&ce->oid));
> > mmfile[i].size = size;
> > }
> > }
>
> A few lines below this we check whether `mmfile[i].ptr` is `NULL` and
> replace it with the empty string if so. So this patch here is basically
> a change in behaviour where we now die instead of falling back to the
> empty value.
>
> I'm not familiar enough with the code to say whether the old behaviour
> is intended or not -- it certainly feels somewhat weird to me. But it
> did leave me wondering and could maybe use some explanation.
Hmm. That's a good point. The `mmfile[i].ptr == NULL` situation is indeed
handled specifically.
However, after reading the code I come to the conclusion that the `i`
refers to the stage of an index entry, i.e. that loop
(https://github.com/git/git/blob/v2.43.0/rerere.c#L981-L983) handles the
case where conflicts are due to deletions (where one side of the merge
deleted the file) or double-adds (where both sides of the merge added the
file, with different contents).
Therefore I would suggest that ignoring missing blobs (as is the pre-patch
behavior) would mishandle the available data and paper over a corruption
of the database (the blob is reachable via the Git index, but is missing).
Ciao,
Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-09 8:15 ` Johannes Schindelin
@ 2024-02-09 8:17 ` Patrick Steinhardt
0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2024-02-09 8:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Johannes Schindelin via GitGitGadget, git
[-- Attachment #1: Type: text/plain, Size: 2094 bytes --]
On Fri, Feb 09, 2024 at 09:15:15AM +0100, Johannes Schindelin wrote:
> Hi Patrick,
>
> On Tue, 6 Feb 2024, Patrick Steinhardt wrote:
>
> > On Mon, Feb 05, 2024 at 02:35:53PM +0000, Johannes Schindelin via GitGitGadget wrote:
> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > [snip]
> > > diff --git a/rerere.c b/rerere.c
> > > index ca7e77ba68c..13c94ded037 100644
> > > --- a/rerere.c
> > > +++ b/rerere.c
> > > @@ -973,6 +973,9 @@ static int handle_cache(struct index_state *istate,
> > > mmfile[i].ptr = repo_read_object_file(the_repository,
> > > &ce->oid, &type,
> > > &size);
> > > + if (!mmfile[i].ptr)
> > > + die(_("unable to read %s"),
> > > + oid_to_hex(&ce->oid));
> > > mmfile[i].size = size;
> > > }
> > > }
> >
> > A few lines below this we check whether `mmfile[i].ptr` is `NULL` and
> > replace it with the empty string if so. So this patch here is basically
> > a change in behaviour where we now die instead of falling back to the
> > empty value.
> >
> > I'm not familiar enough with the code to say whether the old behaviour
> > is intended or not -- it certainly feels somewhat weird to me. But it
> > did leave me wondering and could maybe use some explanation.
>
> Hmm. That's a good point. The `mmfile[i].ptr == NULL` situation is indeed
> handled specifically.
>
> However, after reading the code I come to the conclusion that the `i`
> refers to the stage of an index entry, i.e. that loop
> (https://github.com/git/git/blob/v2.43.0/rerere.c#L981-L983) handles the
> case where conflicts are due to deletions (where one side of the merge
> deleted the file) or double-adds (where both sides of the merge added the
> file, with different contents).
>
> Therefore I would suggest that ignoring missing blobs (as is the pre-patch
> behavior) would mishandle the available data and paper over a corruption
> of the database (the blob is reachable via the Git index, but is missing).
Yeah, that explanation sounds reasonable to me, thanks!
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
` (2 preceding siblings ...)
2024-02-06 6:51 ` Patrick Steinhardt
@ 2024-02-06 22:02 ` Junio C Hamano
2024-02-12 23:19 ` Johannes Schindelin
2024-02-16 6:43 ` Teng Long
4 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2024-02-06 22:02 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
> diff --git a/builtin/grep.c b/builtin/grep.c
> index c8e33f97755..982bcfc4b1d 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -571,6 +571,8 @@ static int grep_cache(struct grep_opt *opt,
>
> data = repo_read_object_file(the_repository, &ce->oid,
> &type, &size);
> + if (!data)
> + die(_("unable to read tree %s"), oid_to_hex(&ce->oid));
> init_tree_desc(&tree, data, size);
>
> hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
This caught my attention during today's integration cycle. Checking
nullness for data certainly is an improvement, but shouldn't we be
checking type as well to make sure it is a tree and not a random
tree-ish or even blob?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-06 22:02 ` Junio C Hamano
@ 2024-02-12 23:19 ` Johannes Schindelin
0 siblings, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2024-02-12 23:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git
Hi Junio,
On Tue, 6 Feb 2024, Junio C Hamano wrote:
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index c8e33f97755..982bcfc4b1d 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -571,6 +571,8 @@ static int grep_cache(struct grep_opt *opt,
> >
> > data = repo_read_object_file(the_repository, &ce->oid,
> > &type, &size);
> > + if (!data)
> > + die(_("unable to read tree %s"), oid_to_hex(&ce->oid));
> > init_tree_desc(&tree, data, size);
> >
> > hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
>
> This caught my attention during today's integration cycle. Checking
> nullness for data certainly is an improvement, but shouldn't we be
> checking type as well to make sure it is a tree and not a random
> tree-ish or even blob?
That sounds right, but I am already stretching this patch beyond what I am
funded to work on, and therefore I need to leave it at the current state.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
` (3 preceding siblings ...)
2024-02-06 22:02 ` Junio C Hamano
@ 2024-02-16 6:43 ` Teng Long
2024-02-18 22:36 ` Johannes Schindelin
4 siblings, 1 reply; 16+ messages in thread
From: Teng Long @ 2024-02-16 6:43 UTC (permalink / raw)
To: gitgitgadget; +Cc: git, johannes.schindelin
Johannes Schindelin <johannes.schindelin@gmx.de> wrote on Mon, 05 Feb 2024:
Hi, when I do zh_CN l10n work for 2.44, I found some check changes like:
die(_("unable to read tree %s")
in patchset, some old code for this like work is similar but with parentheses
surrounded with the OID parameter:
die(_("unable to read tree (%s)")
I think it's really a small nit, I don't think it's a requirement to immediately
optimize, they're just some small printing consistency formatting issues, so make
some small tips here.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Always check the return value of `repo_read_object_file()`
2024-02-16 6:43 ` Teng Long
@ 2024-02-18 22:36 ` Johannes Schindelin
0 siblings, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2024-02-18 22:36 UTC (permalink / raw)
To: Teng Long; +Cc: gitgitgadget, git
Hi,
On Fri, 16 Feb 2024, Teng Long wrote:
> Johannes Schindelin <johannes.schindelin@gmx.de> wrote on Mon, 05 Feb 2024:
>
> Hi, when I do zh_CN l10n work for 2.44, I found some check changes like:
>
> die(_("unable to read tree %s")
>
> in patchset, some old code for this like work is similar but with parentheses
> surrounded with the OID parameter:
>
> die(_("unable to read tree (%s)")
FWIW I copied the error message from
https://github.com/git/git/blob/v2.43.0/tree-walk.c#L103, but only now
realized that it is untranslated.
> I think it's really a small nit, I don't think it's a requirement to immediately
> optimize, they're just some small printing consistency formatting issues, so make
> some small tips here.
Thank you for paying attention. I agree that it would be good to make
Git's error messages consistent, even if I sadly won't be able to focus on
that due to changes at my dayjob.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-02-18 22:36 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-05 14:35 [PATCH] Always check the return value of `repo_read_object_file()` Johannes Schindelin via GitGitGadget
2024-02-05 16:10 ` Karthik Nayak
2024-02-06 18:36 ` Junio C Hamano
2024-02-12 23:16 ` Johannes Schindelin
2024-02-06 1:13 ` Kyle Lippincott
2024-02-09 8:06 ` Johannes Schindelin
2024-02-09 16:37 ` Junio C Hamano
2024-02-09 19:56 ` Kyle Lippincott
2024-02-06 6:51 ` Patrick Steinhardt
2024-02-06 18:42 ` Junio C Hamano
2024-02-09 8:15 ` Johannes Schindelin
2024-02-09 8:17 ` Patrick Steinhardt
2024-02-06 22:02 ` Junio C Hamano
2024-02-12 23:19 ` Johannes Schindelin
2024-02-16 6:43 ` Teng Long
2024-02-18 22:36 ` Johannes Schindelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).