Re: [PATCH v2 1/1] pgo: Fix sleep in atomic section in prf_open()

From: Jarmo Tiitto <jarmo.tiitto@gmail.com>
To: Jarmo Tiitto <jarmo.tiitto@gmail.com>, Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>,
	Bill Wendling <wcw@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	clang-built-linux@googlegroups.com, linux-kernel@vger.kernel.org,
	morbo@google.com
Subject: Re: [PATCH v2 1/1] pgo: Fix sleep in atomic section in prf_open()
Date: Fri, 04 Jun 2021 13:15:43 +0300
Message-ID: <3874710.oRHo3lsn7p@hyperiorarchmachine>
In-Reply-To: <202106031441.FA95440A@keescook>

Kees Cook wrote on Friday, 4 June 2021 at 0:47:23 EEST:
> On Thu, Jun 03, 2021 at 06:53:17PM +0300, Jarmo Tiitto wrote:
> > In prf_open() the required buffer size can be so large that
> > vzalloc() may sleep thus triggering bug:
> > 
> > ======
> >  BUG: sleeping function called from invalid context at include/linux/sched/mm.h:201
> >  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 337, name: cat
> >  CPU: 1 PID: 337 Comm: cat Not tainted 5.13.0-rc2-24-hack+ #154
> >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> >  Call Trace:
> >   dump_stack+0xc7/0x134
> >   ___might_sleep+0x177/0x190
> >   __might_sleep+0x5a/0x90
> >   kmem_cache_alloc_node_trace+0x6b/0x3a0
> >   ? __get_vm_area_node+0xcd/0x1b0
> >   ? dput+0x283/0x300
> >   __get_vm_area_node+0xcd/0x1b0
> >   __vmalloc_node_range+0x7b/0x420
> >   ? prf_open+0x1da/0x580
> >   ? prf_open+0x32/0x580
> >   ? __llvm_profile_instrument_memop+0x36/0x50
> >   vzalloc+0x54/0x60
> >   ? prf_open+0x1da/0x580
> >   prf_open+0x1da/0x580
> >   full_proxy_open+0x211/0x370
> >   ....
> > ======
> > 
> > Since we can't vzalloc while holding pgo_lock,
> > split the code into steps:
> > * First get buffer size via prf_buffer_size()
> >   and release the lock.
> > * Round up to the page size and allocate the buffer.
> > * Finally re-acquire the pgo_lock and call prf_serialize().
> >   prf_serialize() will now check if the buffer is large enough
> >   and returns -EAGAIN if it is not.
> > 
> > New in this v2 patch:
> > The -EAGAIN case was determined to be such rare event that
> > running following in a loop:
> > 
> > $cat /sys/kernel/debug/pgo/vmlinux.profraw > vmlinux.profdata;
> > 
> > Didn't trigger it, and I don't know if it ever may occur at all.
> 
> Hm, I remain nervous that it'll pop up when we least expect it. But, I
> went to go look at this, and I don't understand why we need a lock at
> all for prf_buffer_size(). These appear to be entirely static in size.
> 

I think the reason for taking pgo_lock around prf_buffer_size() is that
__prf_get_value_size() walks linked lists that are modified by
__llvm_profile_instrument_target() in instrument.c.

However, __llvm_profile_instrument_target() looks like it only ever appends to the
llvm_prf_value_node linked lists in the llvm_prf_data::values array, so you might be right.

I'll try to analyze prf_buffer_size() more closely to determine an upper bound on the
memory required, so that pgo_lock is not needed for prf_buffer_size() at all.
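
To illustrate the append-only argument, here is a minimal userspace sketch.
It is not the kernel code: struct value_node, append_node() and list_size()
are hypothetical stand-ins for llvm_prf_value_node, the append done in
__llvm_profile_instrument_target() and the walk in __prf_get_value_size().
It assumes nodes are only ever linked at the tail and never removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct llvm_prf_value_node. */
struct value_node {
	unsigned long long value;
	unsigned long long count;
	struct value_node *next;
};

/* Append-only: link a new node at the tail; nothing is ever removed. */
static struct value_node *append_node(struct value_node **head,
				      unsigned long long value)
{
	struct value_node *node = calloc(1, sizeof(*node));
	struct value_node **link = head;

	if (!node)
		return NULL;
	node->value = value;
	node->count = 1;
	while (*link)
		link = &(*link)->next;
	*link = node;
	return node;
}

/* Walk the list and return the bytes needed to serialize its entries. */
static size_t list_size(const struct value_node *head)
{
	size_t size = 0;

	for (; head; head = head->next)
		size += 2 * sizeof(unsigned long long);
	return size;
}
```

In this model a size taken before an append can only be smaller than or
equal to a size taken afterwards. So a walk done without the lock would give
a lower bound, not an upper bound, which is why the size check before
writing into the buffer still matters. Memory ordering between a concurrent
writer and reader is not modelled here.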

> > 
> > Signed-off-by: Jarmo Tiitto <jarmo.tiitto@gmail.com>
> > ---
> >  kernel/pgo/fs.c | 52 ++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 38 insertions(+), 14 deletions(-)
> > 
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > index ef985159dad3..9afd6f001a1b 100644
> > --- a/kernel/pgo/fs.c
> > +++ b/kernel/pgo/fs.c
> > @@ -24,13 +24,14 @@
> >  #include <linux/module.h>
> >  #include <linux/slab.h>
> >  #include <linux/vmalloc.h>
> > +#include <linux/mm.h>
> >  #include "pgo.h"
> >  
> >  static struct dentry *directory;
> >  
> >  struct prf_private_data {
> >  	void *buffer;
> > -	unsigned long size;
> > +	size_t size;
> >  };
> >  
> >  /*
> > @@ -213,6 +214,7 @@ static inline unsigned long prf_get_padding(unsigned long size)
> >  	return 7 & (sizeof(u64) - size % sizeof(u64));
> >  }
> >  
> > +/* Note: caller *must* hold pgo_lock */
> >  static unsigned long prf_buffer_size(void)
> >  {
> >  	return sizeof(struct llvm_prf_header) +
> > @@ -225,18 +227,21 @@ static unsigned long prf_buffer_size(void)
> >  
> >  /*
> >   * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: p->buffer must point into vzalloc()'d
> > + * area of at least prf_buffer_size() in size.
> >   * Note: caller *must* hold pgo_lock.
> >   */
> > -static int prf_serialize(struct prf_private_data *p)
> > +static int prf_serialize(struct prf_private_data *p, size_t buf_size)
> >  {
> >  	int err = 0;
> >  	void *buffer;
> >  
> > +	/* get buffer size, again. */
> >  	p->size = prf_buffer_size();
> > -	p->buffer = vzalloc(p->size);
> >  
> > -	if (!p->buffer) {
> > -		err = -ENOMEM;
> > +	/* check for unlikely overflow. */
> > +	if (p->size > buf_size) {
> > +		err = -EAGAIN;
> 
> This can just be ENOMEM instead -- it'll never change in size. (But we
> should absolutely keep the check.)
> 

Ok, I was wondering what return value would be appropriate here.
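
For reference, the flow with the check kept and -ENOMEM returned can be
sketched in userspace as below. This is only a model under assumptions:
model_lock(), model_unlock(), model_alloc(), model_serialize(), model_open()
and profile_bytes are hypothetical names standing in for prf_lock(),
prf_unlock(), vzalloc(), prf_serialize(), prf_open() and the value returned
by prf_buffer_size(). The grow_after_sizing parameter exists only to
simulate the data growing between sizing and serializing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static bool lock_held;        /* models pgo_lock being held */
static size_t profile_bytes;  /* models the prf_buffer_size() result */

static void model_lock(void)   { assert(!lock_held); lock_held = true; }
static void model_unlock(void) { assert(lock_held); lock_held = false; }

/* Models vzalloc(): may sleep, so it must never run under the lock. */
static void *model_alloc(size_t size)
{
	assert(!lock_held);
	return calloc(1, size ? size : 1);
}

/* Models prf_serialize(): caller holds the lock and passes the capacity. */
static int model_serialize(void *buf, size_t buf_size, size_t *out_size)
{
	assert(lock_held);
	if (profile_bytes > buf_size)
		return -ENOMEM;	/* keep the check: never write past the buffer */
	memset(buf, 0xAB, profile_bytes);
	*out_size = profile_bytes;
	return 0;
}

/* Models prf_open(): size, allocate outside the lock, then serialize. */
static int model_open(size_t grow_after_sizing, void **out_buf,
		      size_t *out_size)
{
	size_t buf_size = profile_bytes;	/* sized without the lock */
	void *buf = model_alloc(buf_size);
	int err;

	if (!buf)
		return -ENOMEM;

	profile_bytes += grow_after_sizing;	/* simulated growth */

	model_lock();
	err = model_serialize(buf, buf_size, out_size);
	model_unlock();

	if (err) {
		free(buf);
		return err;
	}
	*out_buf = buf;
	return 0;
}
```

With profile_bytes set to 64, model_open(0, ...) succeeds and reports 64
bytes, while model_open(8, ...) fails with -ENOMEM and frees the buffer
instead of overrunning it.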
 
> >  		goto out;
> >  	}
> >  
> > @@ -259,27 +264,46 @@ static int prf_open(struct inode *inode, struct file *file)
> >  {
> >  	struct prf_private_data *data;
> >  	unsigned long flags;
> > -	int err;
> > +	size_t buf_size;
> > +	int err = 0;
> >  
> >  	data = kzalloc(sizeof(*data), GFP_KERNEL);
> >  	if (!data) {
> >  		err = -ENOMEM;
> > -		goto out;
> > +		goto out_free;
> >  	}
> >  
> > +	/* get buffer size */
> >  	flags = prf_lock();
> > +	buf_size = prf_buffer_size();
> > +	prf_unlock(flags);
> 
> And there's no locking needed here.
> 
> >  
> > -	err = prf_serialize(data);
> > -	if (unlikely(err)) {
> > -		kfree(data);
> > -		goto out_unlock;
> > +	/* allocate, round up to page size. */
> > +	buf_size = PAGE_ALIGN(buf_size);
> > +	data->buffer = vzalloc(buf_size);
> > +
> > +	if (!data->buffer) {
> > +		err = -ENOMEM;
> > +		goto out_free;
> >  	}
> >  
> > +	/* try serialize and get actual
> > +	 * data length in data->size
> > +	 */
> > +	flags = prf_lock();
> > +	err = prf_serialize(data, buf_size);
> > +	prf_unlock(flags);
> > +
> > +	if (err)
> > +		goto out_free;
> > +
> >  	file->private_data = data;
> > +	return 0;
> >  
> > -out_unlock:
> > -	prf_unlock(flags);
> > -out:
> > +out_free:
> > +	if (data)
> > +		vfree(data->buffer);
> > +	kfree(data);
> >  	return err;
> >  }
> >  
> > 
> > base-commit: 5d0cda65918279ada060417c5fecb7e86ccb3def
> > -- 
> > 2.31.1
> > 
> 
> -- 
> Kees Cook
> 


--
-Jarmo

Thread overview: 4+ messages
2021-06-03 15:53 [PATCH v2 1/1] pgo: Fix sleep in atomic section in prf_open() Jarmo Tiitto
2021-06-03 21:47 ` Kees Cook
2021-06-04 10:15   ` Jarmo Tiitto [this message]
2021-06-04 18:19     ` Kees Cook
