git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] attr: fix msan issue in read_attr_from_index
@ 2024-06-17 20:00 Kyle Lippincott via GitGitGadget
  2024-06-17 20:08 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kyle Lippincott via GitGitGadget @ 2024-06-17 20:00 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Kyle Lippincott

From: Kyle Lippincott <spectral@google.com>

Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    #6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    #8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    #9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    #14 0x5651f321d327 in run_builtin git/git.c:474:11
    #15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    #16 0x5651f321a792 in run_argv git/git.c:793:4
    #17 0x5651f321a792 in cmd_main git/git.c:928:19
    #18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.

Signed-off-by: Kyle Lippincott <spectral@google.com>
---
    attr: fix msan issue in read_attr_from_index

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1747%2Fspectral54%2Fmsan-attr-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1747/spectral54/msan-attr-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1747

 attr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 300f994ba6e..a2e0775f7e5 100644
--- a/attr.c
+++ b/attr.c
@@ -865,7 +865,8 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
 		stack = read_attr_from_blob(istate, &istate->cache[sparse_dir_pos]->oid, relative_path, flags);
 	} else {
 		buf = read_blob_data_from_index(istate, path, &size);
-		stack = read_attr_from_buf(buf, size, path, flags);
+		if (buf)
+			stack = read_attr_from_buf(buf, size, path, flags);
 	}
 	return stack;
 }

base-commit: 8d94cfb54504f2ec9edc7ca3eb5c29a3dd3675ae
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] attr: fix msan issue in read_attr_from_index
  2024-06-17 20:00 [PATCH] attr: fix msan issue in read_attr_from_index Kyle Lippincott via GitGitGadget
@ 2024-06-17 20:08 ` Junio C Hamano
  2024-06-17 20:30 ` Junio C Hamano
  2024-06-18 23:39 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-06-17 20:08 UTC (permalink / raw)
  To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott

"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Make the call to `read_attr_from_buf` conditional on `buf` being
> non-NULL, ensuring that `size` is not read if it's never set.

Makes good sense.


> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
>     attr: fix msan issue in read_attr_from_index
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1747%2Fspectral54%2Fmsan-attr-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1747/spectral54/msan-attr-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1747
>
>  attr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/attr.c b/attr.c
> index 300f994ba6e..a2e0775f7e5 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -865,7 +865,8 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  		stack = read_attr_from_blob(istate, &istate->cache[sparse_dir_pos]->oid, relative_path, flags);
>  	} else {
>  		buf = read_blob_data_from_index(istate, path, &size);
> -		stack = read_attr_from_buf(buf, size, path, flags);
> +		if (buf)
> +			stack = read_attr_from_buf(buf, size, path, flags);
>  	}
>  	return stack;
>  }

Not directly related to the issue this patch addresses, but I notice
that both buf and size variables have unnecesarily wide scope.  As a
clean-up we may want to move their declaration into this "} else {"
block.  But that is totally outside the scope (no pun intended) of
this patch.

Will queue.
Thanks.  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] attr: fix msan issue in read_attr_from_index
  2024-06-17 20:00 [PATCH] attr: fix msan issue in read_attr_from_index Kyle Lippincott via GitGitGadget
  2024-06-17 20:08 ` Junio C Hamano
@ 2024-06-17 20:30 ` Junio C Hamano
  2024-06-17 21:14   ` Junio C Hamano
  2024-06-17 21:17   ` Kyle Lippincott
  2024-06-18 23:39 ` Jeff King
  2 siblings, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-06-17 20:30 UTC (permalink / raw)
  To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott

"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The issue exists because `size` is an output parameter from
> `read_blob_data_from_index`, but it's only modified if
> `read_blob_data_from_index` returns non-NULL.

Correct.

> The read of `size` when
> calling `read_attr_from_buf` unconditionally may read from an
> uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
> before reading from `size`, but by then it's already too late: the
> uninitialized read will have happened already.

Yes, but it is dubious that reading an uninitialized value that we
know will not be used is a problem, so I am inclined to say that
MSAN is giving a false positive here.

> Furthermore, there's no
> guarantee that the compiler won't reorder things so that it checks
> `size` before checking `!buf`.

This I do not understand.  Are you talking about buf vs length here
in the callee?

        static struct attr_stack *read_attr_from_buf(char *buf, size_t length,
                                                     const char *path, unsigned flags)
        {
                struct attr_stack *res;
                char *sp;
                int lineno = 0;

                if (!buf)
                        return NULL;
                if (length >= ATTR_MAX_FILE_SIZE) {
                        warning(_("ignoring overly large gitattributes blob '%s'"), path);
                        free(buf);
                        return NULL;
                }

At the machine level, a prefetch may happen from both buf and
length, but the program ought to behave the same way as the code is
executed serially as written.  If the compiler allows the outside
world to observe that resulting code checks length even when buf is
NULL, such a compiler is broken.  So I do not think that is what you
are referring to, but then I do not know what problem you are
describing.

Having said all that ...

> Make the call to `read_attr_from_buf` conditional on `buf` being
> non-NULL, ensuring that `size` is not read if it's never set.

... this makes the logic at the caller crystal clear, so even if
there are suboptimal checker that bothers us with false positives,
the change itself justifies itself, I would say.

>  	} else {
>  		buf = read_blob_data_from_index(istate, path, &size);
> -		stack = read_attr_from_buf(buf, size, path, flags);
> +		if (buf)
> +			stack = read_attr_from_buf(buf, size, path, flags);
>  	}
>  	return stack;

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] attr: fix msan issue in read_attr_from_index
  2024-06-17 20:30 ` Junio C Hamano
@ 2024-06-17 21:14   ` Junio C Hamano
  2024-06-17 21:17   ` Kyle Lippincott
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-06-17 21:14 UTC (permalink / raw)
  To: Kyle Lippincott via GitGitGadget; +Cc: git, Kyle Lippincott

Junio C Hamano <gitster@pobox.com> writes:

> Having said all that ...
>
>> Make the call to `read_attr_from_buf` conditional on `buf` being
>> non-NULL, ensuring that `size` is not read if it's never set.
>
> ... this makes the logic at the caller crystal clear, so even if
> there are suboptimal checker that bothers us with false positives,
> the change itself justifies itself, I would say.

Well, "even if there were *no* MSAN or other issues wrt usage of size"
was what I wanted to say.  Sorry for a noise.

>>  	} else {
>>  		buf = read_blob_data_from_index(istate, path, &size);
>> -		stack = read_attr_from_buf(buf, size, path, flags);
>> +		if (buf)
>> +			stack = read_attr_from_buf(buf, size, path, flags);
>>  	}
>>  	return stack;
>
> Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] attr: fix msan issue in read_attr_from_index
  2024-06-17 20:30 ` Junio C Hamano
  2024-06-17 21:14   ` Junio C Hamano
@ 2024-06-17 21:17   ` Kyle Lippincott
  1 sibling, 0 replies; 6+ messages in thread
From: Kyle Lippincott @ 2024-06-17 21:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle Lippincott via GitGitGadget, git

On Mon, Jun 17, 2024 at 1:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > The issue exists because `size` is an output parameter from
> > `read_blob_data_from_index`, but it's only modified if
> > `read_blob_data_from_index` returns non-NULL.
>
> Correct.
>
> > The read of `size` when
> > calling `read_attr_from_buf` unconditionally may read from an
> > uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
> > before reading from `size`, but by then it's already too late: the
> > uninitialized read will have happened already.
>
> Yes, but it is dubious that reading an uninitialized value that we
> know will not be used is a problem, so I am inclined to say that
> MSAN is giving a false positive here.
>
> > Furthermore, there's no
> > guarantee that the compiler won't reorder things so that it checks
> > `size` before checking `!buf`.
>
> This I do not understand.  Are you talking about buf vs length here
> in the callee?
>
>         static struct attr_stack *read_attr_from_buf(char *buf, size_t length,
>                                                      const char *path, unsigned flags)
>         {
>                 struct attr_stack *res;
>                 char *sp;
>                 int lineno = 0;
>
>                 if (!buf)
>                         return NULL;
>                 if (length >= ATTR_MAX_FILE_SIZE) {
>                         warning(_("ignoring overly large gitattributes blob '%s'"), path);
>                         free(buf);
>                         return NULL;
>                 }
>
> At the machine level, a prefetch may happen from both buf and
> length, but the program ought to behave the same way as the code is
> executed serially as written.  If the compiler allows the outside
> world to observe that resulting code checks length even when buf is
> NULL, such a compiler is broken.  So I do not think that is what you
> are referring to, but then I do not know what problem you are
> describing.

Once there's an uninitialized read, we're in undefined behavior
territory. There's no requirement that the compiler keep the code
operating the way we'd logically expect once there's undefined
behavior, especially with the optimizer involved.

I think that you're right though: when I wrote this I'd convinced
myself that this wasn't guaranteed to work, but taking a look now I
can't think of a way for this to go wrong, because afaik size_t is
such a simple type, conceptually. When compiling `read_attr_from_buf`,
it can't actually assume any relationship between `buf` and `length`.
Maybe I was thinking that with link-time optimizations (whole-program
optimizations) it can go wrong? I'm not remembering what I was
thinking about when I wrote that, sorry.

>
> Having said all that ...
>
> > Make the call to `read_attr_from_buf` conditional on `buf` being
> > non-NULL, ensuring that `size` is not read if it's never set.
>
> ... this makes the logic at the caller crystal clear, so even if
> there are suboptimal checker that bothers us with false positives,
> the change itself justifies itself, I would say.
>
> >       } else {
> >               buf = read_blob_data_from_index(istate, path, &size);
> > -             stack = read_attr_from_buf(buf, size, path, flags);
> > +             if (buf)
> > +                     stack = read_attr_from_buf(buf, size, path, flags);
> >       }
> >       return stack;
>
> Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] attr: fix msan issue in read_attr_from_index
  2024-06-17 20:00 [PATCH] attr: fix msan issue in read_attr_from_index Kyle Lippincott via GitGitGadget
  2024-06-17 20:08 ` Junio C Hamano
  2024-06-17 20:30 ` Junio C Hamano
@ 2024-06-18 23:39 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2024-06-18 23:39 UTC (permalink / raw)
  To: Kyle Lippincott via GitGitGadget; +Cc: Taylor Blau, git, Kyle Lippincott

On Mon, Jun 17, 2024 at 08:00:24PM +0000, Kyle Lippincott via GitGitGadget wrote:

> The issue exists because `size` is an output parameter from
> `read_blob_data_from_index`, but it's only modified if
> `read_blob_data_from_index` returns non-NULL. The read of `size` when
> calling `read_attr_from_buf` unconditionally may read from an
> uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
> before reading from `size`, but by then it's already too late: the
> uninitialized read will have happened already. Furthermore, there's no
> guarantee that the compiler won't reorder things so that it checks
> `size` before checking `!buf`.
> 
> Make the call to `read_attr_from_buf` conditional on `buf` being
> non-NULL, ensuring that `size` is not read if it's never set.

Yeah, this is the same one I mentioned when bisecting in the other
thread[1]. But I got confused by applying my fixup patch at various
points in the bisection, and thought it _used_ to be a problem, and
isn't anymore. It's the other way around. It was introduced by
c793f9cb08, which moved the NULL check into the helper.

That patch is from Taylor, but I'm listed as a co-author, and I'm almost
certain moving that NULL check was my suggestion. So it's doubly bad
that I didn't figure out what was going on earlier. ;)

Possible UB aside, I doubt this can trigger bad behavior in practice.
But I also wouldn't call it a false positive in MSan. We really are
reading the uninitialized value and passing it. Your fix here is the
obviously correct thing to do.

-Peff

[1] https://lore.kernel.org/git/20240608081855.GA2390433@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-18 23:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17 20:00 [PATCH] attr: fix msan issue in read_attr_from_index Kyle Lippincott via GitGitGadget
2024-06-17 20:08 ` Junio C Hamano
2024-06-17 20:30 ` Junio C Hamano
2024-06-17 21:14   ` Junio C Hamano
2024-06-17 21:17   ` Kyle Lippincott
2024-06-18 23:39 ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).