public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <kees@kernel.org>
To: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Thorsten Blum <thorsten.blum@toblux.com>,
	kent.overstreet@linux.dev, regressions@lists.linux.dev,
	linux-bcachefs@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org, ardb@kernel.org, morbo@google.com
Subject: Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in bch2_xattr_validate
Date: Wed, 16 Oct 2024 14:13:39 -0700	[thread overview]
Message-ID: <202410161343.D871C005@keescook> (raw)
In-Reply-To: <ZwP5_bY-AmN91_YB@archlinux>

On Mon, Oct 07, 2024 at 05:10:53PM +0200, Jan Hendrik Farr wrote:
> On 07 05:56:46, Jan Hendrik Farr wrote:
> > > I want to separate several easily confused issues. Instead of just
> > > saying __bdos, let's clearly refer to what calculation within bdos is
> > > being used. There are 3 choices currently:
> > > - alloc_size attribute
> > > - counted_by attribute
> > > - fallback to __bos (which is similar to sizeof(), except that FAMs are 0 sized)
> > > 
> > > Additionally there are (for all intents and purposes) 2 size
> > > determinations to be made by __bos and __bdos, via argument 2:
> > > - containing object size (type 0) ("maximum size")
> > > - specific object size (type 1) ("minimum size")
> > 
> > "maximum" vs "minimum" size would by type 0 vs type 2, but I think you
> > do mean type 0 and type 1 as those are the types currently used by
> > __struct_size and __member_size. Those are both "maximum" sizes.
> > 
> > > 
> > > For example, consider:
> > > 
> > > struct posix_acl *acl = malloc(1024);
> > > acl->a_count = 1;
> > > 
> > > what should these return:
> > > 
> > > 	__bos(acl, 0)
> > > 	__bos(acl, 1)
> > > 	__bdos(acl, 0)
> > > 	__bdos(acl, 1)
> > > 	__bos(acl->a_entries, 0)
> > > 	__bos(acl->a_entries, 1)
> > > 	__bdos(acl->a_entries, 0)
> > > 	__bdos(acl->a_entries, 1)
> > > 
> > 
> > I gathered some data from clang and gcc on all for all these cases and
> > additionally varied whether the allocation size is a compile time known
> > constant, runtime known, or not known. I also varied whether
> > __counted_by was used.
> > 
> > Source code: [1]
> > 
> > 
> > Abbreviations:
> > 
> > FAM      = flexible array member
> > -1       = SIZE_MAX
> > p->a_ent = p->a_entries
> > comp.    = allocation size is compile time known
> > run.     = allocation size is compile time known
> > none     = allocation size is unknown
> > count    = __counted_by attribute in use
> > correct  = What I think the correct answers should be. In some places I
> > have two answers. In that case the second number is what the kernel
> > currently expects.
> > 
> > 
> > And here's the data:
> > 
> > function        |comp.|run.|none|count| gcc  |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p, 0)       |  x  |    |    |     | 1024 | 1024 | 1024
> > bos(p, 0)       |     | x  |    |     |  -1  |  -1  | -1
> > bos(p, 0)       |     |    | x  |     |  -1  |  -1  | -1
> > bos(p, 0)       |  x  |    |    |  x  | 1024 | 1024 | 1024
> > bos(p, 0)       |     | x  |    |  x  |  -1  |  -1  | -1
> > bos(p, 0)       |     |    | x  |  x  |  -1  |  -1  | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p, 1)       |  x  |    |    |     | 1024 | 1024 | 1024
> > bos(p, 1)       |     | x  |    |     |  -1  |  -1  | -1
> > bos(p, 1)       |     |    | x  |     |  -1  |  -1  | -1
> > bos(p, 1)       |  x  |    |    |  x  | 1024 | 1024 | 1024
> > bos(p, 1)       |     | x  |    |  x  |  -1  |  -1  | -1
> > bos(p, 1)       |     |    | x  |  x  |  -1  |  -1  | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 0)      |  x  |    |    |     | 1024 | 1024 | 1024
> > bdos(p, 0)      |     | x  |    |     | 1024 | 1024 | 1024
> > bdos(p, 0)      |     |    | x  |     |  -1  |  -1  | -1
> > bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 1)      |  x  |    |    |     | 1024 | 1024 | 1024
> > bdos(p, 1)      |     | x  |    |     | 1024 | 1024 | 1024
> > bdos(p, 1)      |     |    | x  |     |  -1  |  -1  | -1
> > bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 0)|  x  |    |    |     |  996 | 996  | 996
> > bos(p->a_ent, 0)|     | x  |    |     |  -1  |  -1  | -1
> > bos(p->a_ent, 0)|     |    | x  |     |  -1  |  -1  | -1
> > bos(p->a_ent, 0)|  x  |    |    |  x  |  996 | 996  | 996
> > bos(p->a_ent, 0)|     | x  |    |  x  |  -1  |  -1  | -1
> > bos(p->a_ent, 0)|     |    | x  |  x  |  -1  |  -1  | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
> > bos(p->a_ent, 1)|     | x  |    |     |  -1  |  -1  | -1
> > bos(p->a_ent, 1)|     |    | x  |     |  -1  |  -1  | -1
> > bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
> > bos(p->a_ent, 1)|     | x  |    |  x  |  -1  |  -1  | -1
> > bos(p->a_ent, 1)|     |    | x  |  x  |  -1  |  -1  | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p->a_ent,0)|  x  |    |    |     |  996 | 996  | 996
> > bdos(p->a_ent,0)|     | x  |    |     |  996 | 996  | 996
> > bdos(p->a_ent,0)|     |    | x  |     |  -1  |  -1  | -1
> > bdos(p->a_ent,0)|  x  |    |    |  x  |   8  |  8   |  8
> > bdos(p->a_ent,0)|     | x  |    |  x  |   8  |  8   |  8
> > bdos(p->a_ent,0)|     |    | x  |  x  |   8  |  8   |  8
> 
> 
> These previous three should probably actually be like this:
> bdos(p->a_ent,0)|  x  |    |    |  x  |   8  |  8   |  15
> bdos(p->a_ent,0)|     | x  |    |  x  |   8  |  8   |  15
> bdos(p->a_ent,0)|     |    | x  |  x  |   8  |  8   |  15
> 
> They should include the allowed padding after the FAM, as this is a type
> 0 bdos. Not really an issue here, as the kernel expects 8 here.
> 
> 
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
> > bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992
> > bdos(p->a_ent,1)|     |    | x  |     |  -1  |  -1  | -1
> > bdos(p->a_ent,1)|  x  |    |    |  x  |   8  |  8   |  8
> > bdos(p->a_ent,1)|     | x  |    |  x  |   8  |  8   |  8
> > bdos(p->a_ent,1)|     |    | x  |  x  |   8  |  8   |  8
> > ----------------|-----|----|----|-----|------|------|-----

Thanks for building all these tables -- I want to look at it all again
myself, but I'm pretty well convinced you've found a bunch of things we
need to sort out.

> > bos only uses the allocation size to give it's answers. It only works if
> > it is a compile time known constant. bos also does not utilize the
> > __counted_by attribute.
> > 
> > bdos on the other hand allows the allocation size to be runtime known.
> > It also makes use of the __counted_by attribute if present, which always
> > takes precedence over the allocation size when the compiler supports it
> > for the particular case. So in those cases you can "lie" to the compiler
> > about the size of an object.

I am okay with counted_by winning out over alloc_size when it is present.
However, I would expect this:

void *v = malloc(1024);
struct flex *p = v;
p->counter = 4;

__bdos(p, 1) == struct_size(f, array, 4)	// "p" has both size hints
__bdos(v, 1) == 1024				// "v" only has alloc_size

> > 
> > clang supports the __counted_by attribute for both cases (p and
> > p->a_entries). gcc only supports it for p->a_entries cases.

I think gcc refuses to believe "p" is anything until it has been
dereferenced to a specific type. I would like it if they could fix this,
as if we have a pointer to a type and we're using __bdos to query its
size, we are explicitly treating it as the type it is pointing at (i.e.
examining the counter and evaluating the FAM size).

> > 
> > 
> > 
> > Issue A (clang)
> > =======
> > 
> > function        |comp.|run.|none|count| gcc  |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> > bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> > bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> > 
> > These cases also represent the "bdos off by 4" issue in clang. clang
> > will compute these results using:
> > 
> > max(sizeof(struct posix_acl), offsetof(struct posix_acl, a_entries) +
> > count * sizeof(struct posix_acl_entries)) = 36
> > 
> > The kernel on the other hand expects this behavior:
> > 
> > sizeof(struct posix_acl) + count * sizeof(struct posix_acl_entries) = 40
> > 
> > 
> > I think the correct calculation would actually be this:
> > 
> > offsetof(struct posix_acl, a_entries)
> > + (acl->a_count + 1) * sizeof(struct posix_acl_entry) - 1 = 43
> > 
> > The C11 standard says that when the . or -> operator is used on a struct
> > with an FAM it behaves like the FAM was replaced with the largest array
> > (with the same element type) that would not make the object any larger
> > (see page 113 and 114 of [2]).
> > So there are actually multiple sizes of the object that are consistent
> > with a count of 1.
> > 
> > malloc-max = maximum size of the object
> > malloc-min = minimum size of the object
> > FAME = flexible array member element
> > (FAME) = hypothetical 2nd FAME
> > 
> > <-----------------malloc-max-------------->
> > <-----------------malloc-min------->
> > <------sizeof(posix_acl)------->
> >                             <-FAME-><(FAME)>
> > 
> > The clang documentation of type 0 (vs type 2) bdos says this:
> > 
> > If ``type & 2 == 0``, the least ``n`` is returned such that accesses to 
> >    ``(const char*)ptr + n`` and beyond are known to be out of bounds.
> > 
> > We only _know_ that that access to the last byte of a 2nd hypothetical FAME
> > would be out of bounds. All the bytes before that are padding that is
> > allowed by the standard.
> > 
> > 
> > However, also this calculation doesn't get the kernel out
> > of trouble here. While this would fix the issue for this particular
> > struct it does not solve it for all structs:
> > 
> > What if the elements of the FAM were chars instead of
> > struct posix_acl_entries here? In that case the kernel is back to
> > overestimating the size of the struct / underreporting the count to the
> > compiler. So while I think this answer is more correct it doesn't
> > actually solve the issue.
> > 
> > Example:
> > Let's say the kernel allocates one of these posix_acl_char structs for a
> > single char in the array:
> > 
> > malloc(sizeof(posix_acl_char) + 1 * sizeof(char)) = 33
> > 
> > The C standard actually says that this object will behave like this when
> > the FAM is accessed:
> > 
> > struct posix_acl {
> >     refcount_t a_refcount;
> >     struct rcu_head a_rcu;
> >     unsigned int a_count;
> >     char a_entries[5];
> > };
> > 
> > a_count should be set to 5, not 1!

This is making my head spin. In a practical sense, a struct instance
has a fixed size (which I would say is the calculation using max(),
above). Whether the . or -> operators are used shouldn't matter (to
the program's view of the object size).

This "1 becomes 5" should just not be allowed. I can accept object padding
contributing to object size, but we should not redefine the array size
out from under what it actually is. I don't want array accesses into
padding either. :P

I think 36 is correct here, not 40 nor 43.

> > So we would really need an option to tell the compiler to use the same
> > size calculation as the kernel expects here, or maybe be able to specify
> > an offset in the __counted_by attribute. Alternatively clang could use
> > an option to disable the use of __counted_by for cases where the whole
> > struct is passed. This would make it behave like gcc.
> > 
> > 
> > 
> > Issue B (clang + gcc)
> > =======
> > 
> > A less serious issue happens with these cases:
> > 
> > function        |comp.|run.|none|count| gcc  |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
> > bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
> > bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
> > bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992
> > 
> > In this case the size returned by bos/bdos is too large, so this won't
> > lead to false positives. Both clang and gcc simply compute the difference
> > between the pointer from the start of the FAM to the end of the whole
> > struct. I believe this is wrong. According to the C standard the object
> > should behave like the FAM was replaced with the largest array that does
> > not make the object any larger. The size of that array is 124 elements.
> > So the posix_acl becomes:
> > 
> > struct posix_acl {
> >     refcount_t a_refcount;
> >     struct rcu_head a_rcu;
> >     unsigned int a_count;
> >     struct posix_acl_entry a_entries[124];
> > };
> > 
> > Since this is a type 1 bos/bdos it should return the size of just the
> > array, which is 124 * 8 = 992, and not 124.5 * 8 = 996.

For type 1, I agree: this needs to be the precise size of the array.

As of now, are GCC and Clang both using:

max(sizeof(*p),
    offsetof(typeof(*p), FAM) + count * sizeof(*p->FAM))

?

Linux can adjust struct_size() to match, and I'll check that nothing
shrunk inappropriately...

-- 
Kees Cook

  reply	other threads:[~2024-10-16 21:13 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-26 15:14 [REGRESSION][BISECTED] erroneous buffer overflow detected in bch2_xattr_validate Jan Hendrik Farr
2024-09-26 15:28 ` Thorsten Blum
2024-09-26 16:09   ` Thorsten Blum
2024-09-26 16:37     ` Jan Hendrik Farr
2024-09-26 17:01     ` Jan Hendrik Farr
2024-09-26 17:45       ` Jan Hendrik Farr
2024-09-26 19:58         ` Ard Biesheuvel
2024-09-26 22:18           ` Bill Wendling
2024-09-27  1:30             ` Bill Wendling
2024-09-27  3:41               ` Jan Hendrik Farr
2024-09-28 20:50               ` Kees Cook
2024-09-28 23:33                 ` Jan Hendrik Farr
2024-09-29 19:59                   ` Jan Hendrik Farr
2024-09-28 17:36     ` Jan Hendrik Farr
2024-09-28 17:49       ` Jan Hendrik Farr
2024-09-28 20:34       ` Kees Cook
2024-10-02  9:18         ` Thorsten Blum
2024-10-03 11:33           ` Jan Hendrik Farr
2024-10-03 13:07             ` Thorsten Blum
2024-10-03 13:12               ` Jan Hendrik Farr
2024-10-03 15:02                 ` Thorsten Blum
2024-10-03 15:22                   ` Jan Hendrik Farr
2024-10-03 15:30                     ` Thorsten Blum
2024-10-03 15:35                       ` Jan Hendrik Farr
2024-10-03 15:43                         ` Thorsten Blum
2024-10-03 16:32                           ` Jan Hendrik Farr
2024-10-03 15:17               ` Jan Hendrik Farr
2024-10-03 21:28                 ` Kees Cook
2024-10-03 21:48                   ` Jan Hendrik Farr
2024-10-04 17:13                     ` Kees Cook
2024-10-07  3:56                       ` Jan Hendrik Farr
2024-10-07 15:10                         ` Jan Hendrik Farr
2024-10-16 21:13                           ` Kees Cook [this message]
2024-10-16 23:41                         ` Bill Wendling
2024-10-17  0:09                           ` Bill Wendling
2024-10-17  3:04                             ` Jan Hendrik Farr
2024-10-17 16:55                               ` Nathan Chancellor
2024-10-17 17:39                                 ` Miguel Ojeda
2024-10-17 18:55                                   ` Nathan Chancellor
2024-10-18 11:52                                     ` Miguel Ojeda
2024-10-21  1:33                                 ` Jan Hendrik Farr
2024-10-21  6:04                                   ` Miguel Ojeda
2024-10-21 17:01                                     ` Jan Hendrik Farr
2024-10-21 19:25                                   ` Nathan Chancellor
2024-10-24 13:16                                     ` Jan Hendrik Farr
2024-10-25  1:15                                       ` Nathan Chancellor
2024-10-25  8:10                                         ` Miguel Ojeda
2024-10-25 15:27                                           ` Jan Hendrik Farr
2025-05-01 14:30                                             ` Alan Huang
2025-05-01 16:45                                               ` Jan Hendrik Farr
2025-05-01 17:22                                               ` Jan Hendrik Farr
2025-05-01 17:28                                                 ` Alan Huang
2025-05-01 17:58                                                   ` Jan Hendrik Farr
2025-05-01 18:10                                                     ` Kees Cook
2025-05-01 18:18                                                     ` Alan Huang
2024-10-17  0:41                           ` Jan Hendrik Farr
2024-10-14 21:39                       ` Bill Wendling
2024-10-16  1:22             ` Bill Wendling
2024-10-16  2:18               ` Jan Hendrik Farr
2024-10-16 20:43                 ` Kees Cook
2024-10-03 21:23           ` Kees Cook
2024-10-03 22:05             ` Jan Hendrik Farr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202410161343.D871C005@keescook \
    --to=kees@kernel.org \
    --cc=ardb@kernel.org \
    --cc=kent.overstreet@linux.dev \
    --cc=kernel@jfarr.cc \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=morbo@google.com \
    --cc=regressions@lists.linux.dev \
    --cc=thorsten.blum@toblux.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox