From: Kyle Lippincott <spectral@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Kyle Lippincott via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH] attr: fix msan issue in read_attr_from_index
Date: Mon, 17 Jun 2024 14:17:28 -0700 [thread overview]
Message-ID: <CAO_smVg1GuyreBw7Dw0RjLPjD1KhH-NmFMkkC-D4hB7kfPqjCQ@mail.gmail.com> (raw)
In-Reply-To: <xmqqcyof5n2t.fsf@gitster.g>
On Mon, Jun 17, 2024 at 1:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > The issue exists because `size` is an output parameter from
> > `read_blob_data_from_index`, but it's only modified if
> > `read_blob_data_from_index` returns non-NULL.
>
> Correct.
>
> > The read of `size` when
> > calling `read_attr_from_buf` unconditionally may read from an
> > uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
> > before reading from `size`, but by then it's already too late: the
> > uninitialized read will have happened already.
>
> Yes, but it is dubious that reading an uninitialized value that we
> know will not be used is a problem, so I am inclined to say that
> MSAN is giving a false positive here.
>
> > Furthermore, there's no
> > guarantee that the compiler won't reorder things so that it checks
> > `size` before checking `!buf`.
>
> This I do not understand. Are you talking about buf vs length here
> in the callee?
>
> static struct attr_stack *read_attr_from_buf(char *buf, size_t length,
> const char *path, unsigned flags)
> {
> struct attr_stack *res;
> char *sp;
> int lineno = 0;
>
> if (!buf)
> return NULL;
> if (length >= ATTR_MAX_FILE_SIZE) {
> warning(_("ignoring overly large gitattributes blob '%s'"), path);
> free(buf);
> return NULL;
> }
>
> At the machine level, a prefetch may happen from both buf and
> length, but the program ought to behave the same way as the code is
> executed serially as written. If the compiler allows the outside
> world to observe that resulting code checks length even when buf is
> NULL, such a compiler is broken. So I do not think that is what you
> are referring to, but then I do not know what problem you are
> describing.
Once there's an uninitialized read, we're in undefined behavior
territory. There's no requirement that the compiler keep the code
operating the way we'd logically expect once there's undefined
behavior, especially with the optimizer involved.
I think that you're right though: when I wrote this I'd convinced
myself that this wasn't guaranteed to work, but taking a look now I
can't think of a way for this to go wrong, because afaik size_t is
such a simple type, conceptually. When compiling `read_attr_from_buf`,
it can't actually assume any relationship between `buf` and `length`.
Maybe I was thinking that with link-time optimizations (whole-program
optimizations) it can go wrong? I'm not remembering what I was
thinking about when I wrote that, sorry.
>
> Having said all that ...
>
> > Make the call to `read_attr_from_buf` conditional on `buf` being
> > non-NULL, ensuring that `size` is not read if it's never set.
>
> ... this makes the logic at the caller crystal clear, so even if
> there are suboptimal checker that bothers us with false positives,
> the change itself justifies itself, I would say.
>
> > } else {
> > buf = read_blob_data_from_index(istate, path, &size);
> > - stack = read_attr_from_buf(buf, size, path, flags);
> > + if (buf)
> > + stack = read_attr_from_buf(buf, size, path, flags);
> > }
> > return stack;
>
> Thanks.
next prev parent reply other threads:[~2024-06-17 21:17 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-17 20:00 [PATCH] attr: fix msan issue in read_attr_from_index Kyle Lippincott via GitGitGadget
2024-06-17 20:08 ` Junio C Hamano
2024-06-17 20:30 ` Junio C Hamano
2024-06-17 21:14 ` Junio C Hamano
2024-06-17 21:17 ` Kyle Lippincott [this message]
2024-06-18 23:39 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAO_smVg1GuyreBw7Dw0RjLPjD1KhH-NmFMkkC-D4hB7kfPqjCQ@mail.gmail.com \
--to=spectral@google.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).