public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	KVM list <kvm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Sean Christopherson <seanjc@google.com>, Willy Tarreau <w@1wt.eu>,
	syzbot+e0de2333cbf95ea473e8@syzkaller.appspotmail.com
Subject: Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to memcg
Date: Mon, 18 Oct 2021 08:19:23 -0700	[thread overview]
Message-ID: <202110180817.A2C3AE34@keescook> (raw)
In-Reply-To: <10e3d402-017e-1a0d-b6c7-112117067b03@redhat.com>

On Sun, Oct 17, 2021 at 01:17:56AM +0200, Paolo Bonzini wrote:
> On 16/10/21 20:10, Linus Torvalds wrote:
> > That said, I also do wonder if we could possibly change "kvcalloc()"
> > to avoid the warning. The reason I didn't like your patch is that
> > kvmalloc_node() only takes a "size_t", and the overflow condition
> > there is that "MAX_INT".
> > 
> > But the "kvcalloc()" case that takes a "number of elements and size"
> > should _conceptually_ warn not when the total size overflows, but when
> > either number or the element size overflows.
> 
> That makes sense, but the number could still overflow in KVM's case; the
> size is small, just 8, it's the count that's humongous.  In general,
> users of kvcalloc of kvmalloc_array *should* not be doing
> multiplications (that's the whole point of the functions), and that
> lowers a lot the risk of overflows, but the safest way is to provide
> a variant that does not warn.  See the (compile-tested only) patch
> below.
> 
> Pulling the WARN in the inline function is a bit ugly.  For kvcalloc()
> and kvmalloc_array(), one of the two is almost always constant, but
> it is unlikely that the compiler eliminates both.  The impact on a
> localyesconfig build seems to be minimal though (about 150 bytes
> larger out of 20 megabytes of code).
> 
> Paolo
> 
> ---------------- 8< -----------------
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH] mm: add kvmalloc variants that do not to warn
> 
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
> However, the WARN triggers with KVM when it allocates ancillary page
> data, whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl.  The warnings are quickly found by
> syzkaller, but they can also happen with huge but real-world VMs.
> The largest allocation that KVM can do is 8 bytes per page of guest
> memory, meaning a 1 TiB memslot will cause a warning even outside fuzzing.
> In fact, Google already has VMs that create 1.5 TiB memslots (12 TiB of
> total guest memory spread across 8 virtual NUMA nodes).
> 
> For kvcalloc() and kvmalloc_array(), Linus suggested warning if either
> the number or the size are big.  However, this would only move the
> goalpost for KVM's warning without fully avoiding it.  Therefore,
> provide a "double underscore" version of kvcalloc(), kvmalloc_array()
> and kvmalloc_node() that omits the check.
> 
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Ah, so memcg wasn't doing sanity checks?

Is there a cheap way to resolve the question "does this much memory
exist"? The "__" versions end up lacking context for why they're "__"
versions. I.e. do we want something more descriptive, like
__huge_kvmalloc_node() or __unbounded_kvmalloc_node()?

-Kees

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..92aba7327bd8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -799,7 +799,15 @@ static inline int is_vmalloc_or_module_addr(const void *x)
>  }
>  #endif
> -extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
> +extern void *__kvmalloc_node(size_t size, gfp_t flags, int node);
> +static inline void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	/* Don't even allow crazy sizes */
> +	if (WARN_ON(size > INT_MAX))
> +		return NULL;
> +	return __kvmalloc_node(size, flags, node);
> +}
> +
>  static inline void *kvmalloc(size_t size, gfp_t flags)
>  {
>  	return kvmalloc_node(size, flags, NUMA_NO_NODE);
> @@ -813,14 +821,31 @@ static inline void *kvzalloc(size_t size, gfp_t flags)
>  	return kvmalloc(size, flags | __GFP_ZERO);
>  }
> -static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +static inline void *__kvmalloc_array(size_t n, size_t size, gfp_t flags)
>  {
>  	size_t bytes;
>  	if (unlikely(check_mul_overflow(n, size, &bytes)))
>  		return NULL;
> -	return kvmalloc(bytes, flags);
> +	return __kvmalloc_node(bytes, flags, NUMA_NO_NODE);
> +}
> +
> +static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +{
> +	/*
> +	 * Don't allow crazy sizes here, either.  For 64-bit,
> +	 * this also lets the compiler avoid the overflow check.
> +	 */
> +	if (WARN_ON(size > INT_MAX || n > INT_MAX))
> +		return NULL;
> +
> +	return __kvmalloc_array(n, size, flags);
> +}
> +
> +static inline void *__kvcalloc(size_t n, size_t size, gfp_t flags)
> +{
> +	return __kvmalloc_array(n, size, flags | __GFP_ZERO);
>  }
>  static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
> diff --git a/mm/util.c b/mm/util.c
> index 499b6b5767ed..0406709d8097 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -558,7 +558,7 @@ EXPORT_SYMBOL(vm_mmap);
>   *
>   * Return: pointer to the allocated memory of %NULL in case of failure
>   */
> -void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +void *__kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
> @@ -593,14 +593,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> -	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> -		return NULL;
> -
>  	return __vmalloc_node(size, 1, flags, node,
>  			__builtin_return_address(0));
>  }
> -EXPORT_SYMBOL(kvmalloc_node);
> +EXPORT_SYMBOL(__kvmalloc_node);
>  /**
>   * kvfree() - Free memory.
> 
> > So I would also accept a patch that just changes how "kvcalloc()"
> > works (or how "kvmalloc_array()" works).
> > 
> > It's a bit annoying how we've ended up losing that "n/size"
> > information by the time we hit kvmalloc().
> > 
> >                 Linus
> > 
> 

-- 
Kees Cook

  reply	other threads:[~2021-10-18 15:19 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-16  6:43 [PATCH] mm: allow huge kvmalloc() calls if they're accounted to memcg Paolo Bonzini
2021-10-16 15:39 ` Linus Torvalds
2021-10-16 17:53   ` Paolo Bonzini
2021-10-16 18:10     ` Linus Torvalds
2021-10-16 23:17       ` Paolo Bonzini
2021-10-18 15:19         ` Kees Cook [this message]
2021-10-18 15:24           ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2021-10-16  6:51 Paolo Bonzini
2021-10-18 15:09 ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202110180817.A2C3AE34@keescook \
    --to=keescook@chromium.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=syzbot+e0de2333cbf95ea473e8@syzkaller.appspotmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox