All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	David Vernet <void@manifault.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, memxor@gmail.com,
	yhs@fb.com, song@kernel.org, sdf@google.com, kpsingh@kernel.org,
	jolsa@kernel.org, haoluo@google.com, tj@kernel.org,
	kernel-team@fb.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
Date: Fri, 18 Nov 2022 22:09:59 -0800	[thread overview]
Message-ID: <637873372bf8d_656da2081@john.notmuch> (raw)
In-Reply-To: <20221118183113.ftsafchmurs7copl@MacBook-Pro-5.local>

Alexei Starovoitov wrote:
> On Fri, Nov 18, 2022 at 09:08:12AM -0600, David Vernet wrote:
> > On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:
> > 
> > [...]
> > 
> > > > > And last thing I was checking is because KF_SLEEPABLE is not set
> > > > > this should be blocked from running on sleepable progs which would
> > > > > break the call_rcu in the destructor. Maybe small nit, not sure
> > > > > its worth it but might be nice to annotate the helper description
> > > > > with a note, "will not work on sleepable progs" or something to
> > > > > that effect.
> > > > 
> > > > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > > > not whether the calling program can be sleepable. call_rcu() doesn't
> > > > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > > > a kfunc is sleepable, non-sleepable programs are not able to call it
> > > > (and this is enforced in the verifier).
> > > 
> > > OK but should these helpers be allowed in sleepable progs? I think
> > > not. What stops this, (using your helpers):
> > > 
> > >   cpu0                                       cpu1
> > >   ----
> > >   v = insert_lookup_task(task)
> > >   kptr = bpf_kptr_xchg(&v->task, NULL);
> > >   if (!kptr)
> > >     return 0;
> > >                                             map_delete_elem()
> > >                                                put_task()
> > >                                                  rcu_call
> > >   do_something_might_sleep()
> > >                                                     put_task_struct
> > >                                                       ... free  
> 
> the free won't happen here, because the kptr on cpu0 holds the refcnt.
> bpf side never does direct free of kptr. It only inc/dec refcnt via kfuncs.
> 
> > >   kptr->[free'd memory]
> > >  
> > > the insert_lookup_task will bump the refcnt on the acquire on map
> > > insert. But the lookup doesn't do anything to the refcnt and the
> 
> lookup from map doesn't touch kptrs in the value.
> just reading v->kptr becomes PTR_UNTRUSTED with probe_mem protection.
> 
> > > map_delete_elem will delete it. We have a check for spin_lock
> > > types to stop them from being in sleepable progs. Did I miss a
> > > similar check for these?
> > 
> > So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
> > xchg the kptr from the map, and so the map_delete_elem() call would fail
> > with (something like) -ENOENT. In general, the semantics are similar to
> > std::unique_ptr::swap() in C++.
> > 
> > FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
> > reason about. The reason is that we're passing a pointer to the map
> > value containing a kptr directly to the kfunc (with the attempt of
> > acquiring an additional reference if a kptr was already present in the
> > map) rather than doing an xchg which atomically gets us the unique
> > pointer if nobody else xchgs it in first. So with KF_KPTR_GET, someone
> > else could come along and delete the kptr from the map while the kfunc
> > is trying to acquire that additional reference. The race looks something
> > like this:
> > 
> >    cpu0                                       cpu1
> >    ----
> >    v = insert_lookup_task(task)
> >    kptr = bpf_task_kptr_get(&v->task);
> >                                              map_delete_elem()
> >                                                 put_task()
> >                                                   rcu_call
> >                                                      put_task_struct
> >                                                        ... free  
> >    if (!kptr)
> >      /* In this race example, this path will be taken. */
> >      return 0;
> > 
> > The difference is that here, we're not doing an atomic xchg of the kptr
> > out of the map. Instead, we're passing a pointer to the map value
> > containing the kptr directly to bpf_task_kptr_get(), which itself tries
> > to acquire an additional reference on the task to return to the program
> > as a kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU
> > and refcount_inc_not_zero() in the bpf_task_kptr_get() kfunc to ensure
> > that it can't hit a UAF, and that it won't return a dying task to the
> > caller:
> > 
> > /**
> >  * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
> >  * kptr acquired by this kfunc which is not subsequently stored in a map, must
> >  * be released by calling bpf_task_release().
> >  * @pp: A pointer to a task kptr on which a reference is being acquired.
> >  */
> > __used noinline
> > struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
> > {
> >         struct task_struct *p;
> > 
> >         rcu_read_lock();
> >         p = READ_ONCE(*pp);
> > 
> > 	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 	 * cpu1 could remove the element from the map here, and invoke
> > 	 * put_task_struct_rcu_user(). We're in an RCU read region
> > 	 * though, so the task won't be freed until at the very
> > 	 * earliest, the rcu_read_unlock() below.
> > 	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > 	 */
> > 
> >         if (p && !refcount_inc_not_zero(&p->rcu_users))
> > 		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		 * refcount_inc_not_zero() will return false, as cpu1
> > 		 * deleted the element from the map and dropped its last
> > 		 * refcount. So we just return NULL as the task will be
> > 		 * deleted once an RCU gp has elapsed.
> > 		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > 		 */
> >                 p = NULL;
> >         rcu_read_unlock();
> > 
> >         return p;
> > }
> > 
> > Let me know if that makes sense. This stuff is tricky, and I plan to
> > clearly / thoroughly add it to that kptr docs page once this patch set
> > lands.
> 
> All correct. Probably worth adding this comment directly in bpf_task_kptr_get.

Yes also agree thanks for the details. Spent sometime trying to break
it this event, but didn't find anything.

Thanks.

      reply	other threads:[~2022-11-19  6:10 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-17  3:23 [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs David Vernet
2022-11-17  3:24 ` [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
2022-11-18  2:26   ` Alexei Starovoitov
2022-11-18 14:45     ` David Vernet
2022-11-18 16:45       ` David Vernet
2022-11-18 18:45         ` Alexei Starovoitov
2022-11-18 21:44           ` David Vernet
2022-11-19  4:13             ` Alexei Starovoitov
2022-11-19  5:14               ` David Vernet
2022-11-19 16:48                 ` Alexei Starovoitov
2022-11-17  3:24 ` [PATCH bpf-next v7 2/3] bpf: Add kfuncs for storing struct task_struct * as a kptr David Vernet
2022-11-17  3:24 ` [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs David Vernet
2022-11-18  2:21   ` Alexei Starovoitov
2022-11-18 14:49     ` David Vernet
2022-11-17 21:03 ` [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs John Fastabend
2022-11-17 21:54   ` David Vernet
2022-11-17 22:36     ` John Fastabend
2022-11-18  1:41       ` David Vernet
2022-11-18  6:04         ` John Fastabend
2022-11-18 15:08           ` David Vernet
2022-11-18 18:31             ` Alexei Starovoitov
2022-11-19  6:09               ` John Fastabend [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=637873372bf8d_656da2081@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=jolsa@kernel.org \
    --cc=kernel-team@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.