Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yonghong Song <yhs@fb.com>
To: Alexei Starovoitov <ast@fb.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	<davem@davemloft.net>
Cc: <daniel@iogearbox.net>, <andrii@kernel.org>,
	<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.
Date: Fri, 25 Jun 2021 08:54:55 -0700	[thread overview]
Message-ID: <cfa10fa1-9ee4-95de-109d-a24cd5d43a98@fb.com> (raw)
In-Reply-To: <de1204cc-8c20-0e09-8880-e39c9ee6d889@fb.com>



On 6/25/21 7:57 AM, Alexei Starovoitov wrote:
> On 6/24/21 11:25 PM, Yonghong Song wrote:
>>
>>> +
>>> +    ____bpf_spin_lock(&timer->lock);
>>
>> I think we may still have some issues.
>> Case 1:
>>    1. one bpf program is running in process context,
>>       bpf_timer_start() is called and timer->lock is taken
>>    2. timer softirq is triggered and this callback is called
> 
> ___bpf_spin_lock is actually irqsave version of spin_lock.
> So this race is not possible.

Sorry I missed that ____bpf_spin_lock() has local_irq_save(),
so yes. the above situation cannot happen.

> 
>> Case 2:
>>    1. this callback is called, timer->lock is taken
>>    2. a nmi happens and some bpf program is called (kprobe, tracepoint,
>>       fentry/fexit or perf_event, etc.) and that program calls
>>       bpf_timer_start()
>>
>> So we could have deadlock in both above cases?
> 
> Shouldn't be possible either because bpf timers are not allowed
> in nmi-bpf-progs. I'll double check that it's the case.
> Pretty much the same restrictions are with bpf_spin_lock.

The patch added bpf_base_func_proto() to bpf_tracing_func_proto:

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7a52bc172841..80f6e6dafd5e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1057,7 +1057,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, 
const struct bpf_prog *prog)
  	case BPF_FUNC_snprintf:
  		return &bpf_snprintf_proto;
  	default:
-		return NULL;
+		return bpf_base_func_proto(func_id);
  	}
  }

and timer helpers are added to bpf_base_func_proto:
@@ -1055,6 +1330,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
  		return &bpf_per_cpu_ptr_proto;
  	case BPF_FUNC_this_cpu_ptr:
  		return &bpf_this_cpu_ptr_proto;
+	case BPF_FUNC_timer_init:
+		return &bpf_timer_init_proto;
+	case BPF_FUNC_timer_start:
+		return &bpf_timer_start_proto;
+	case BPF_FUNC_timer_cancel:
+		return &bpf_timer_cancel_proto;
  	default:
  		break;
  	}

static const struct bpf_func_proto *
pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
         switch (func_id) {
...
         default:
                 return bpf_tracing_func_proto(func_id, prog);
         }
}

static const struct bpf_func_proto *
kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog 
*prog)
{
...
         default:
                 return bpf_tracing_func_proto(func_id, prog);
         }
}

Also, we have some functions inside ____bpf_spin_lock() e.g., 
bpf_prog_inc(), hrtimer_start(), etc. If we want to be absolutely safe,
we need to mark them not tracable for kprobe/kretprobe/fentry/fexit/...
But I am not sure whether this is really needed or not.

> 
>>
>>> +    /* callback_fn and prog need to match. They're updated together
>>> +     * and have to be read under lock.
>>> +     */
>>> +    prog = t->prog;
>>> +    callback_fn = t->callback_fn;
>>> +
>>> +    /* wrap bpf subprog invocation with prog->refcnt++ and -- to make
>>> +     * sure that refcnt doesn't become zero when subprog is executing.
>>> +     * Do it under lock to make sure that bpf_timer_start doesn't drop
>>> +     * prev prog refcnt to zero before timer_cb has a chance to bump 
>>> it.
>>> +     */
>>> +    bpf_prog_inc(prog);
>>> +    ____bpf_spin_unlock(&timer->lock);
>>> +
>>> +    /* bpf_timer_cb() runs in hrtimer_run_softirq. It doesn't 
>>> migrate and
>>> +     * cannot be preempted by another bpf_timer_cb() on the same cpu.
>>> +     * Remember the timer this callback is servicing to prevent
>>> +     * deadlock if callback_fn() calls bpf_timer_cancel() on the 
>>> same timer.
>>> +     */
>>> +    this_cpu_write(hrtimer_running, t);
>>
>> This is not protected by spinlock, in bpf_timer_cancel() and
>> bpf_timer_cancel_and_free(), we have spinlock protected read, so
>> there is potential race conditions if callback function and 
>> helper/bpf_timer_cancel_and_free run in different context?
> 
> what kind of race do you see?
> This is per-cpu var and bpf_timer_cb is in softirq
> while timer_cancel/cancel_and_free are calling it under
> spin_lock_irqsave... so they cannot race because softirq
> and bpf_timer_cb will run after start/canel/cancel_free
> will do unlock_irqrestore.

Again, I missed local_irq_save(). With irqsave, this indeed
won't happen. The same for a few comments below.

> 
>>> +    prev = t->prog;
>>> +    if (prev != prog) {
>>> +        if (prev)
>>> +            /* Drop pref prog refcnt when swapping with new prog */
>>
>> pref -> prev
>>
>>> +            bpf_prog_put(prev);
>>
>> Maybe we want to put the above two lines with {}?
> 
> you mean add {} because there is a comment ?
> I don't think the kernel coding style considers comment as a statement.
> 
>>> +    if (this_cpu_read(hrtimer_running) != t)
>>> +        hrtimer_cancel(&t->timer);
>>
>> We could still have race conditions here when 
>> bpf_timer_cancel_and_free() runs in process context and callback in
>> softirq context. I guess we might be okay.
> 
> No, since this check is under spin_lock_irsave.
> 
>> But if bpf_timer_cancel_and_free() in nmi context, not 100% sure
>> whether we have issues or not.
> 
> timers shouldn't be available to nmi-bpf progs.
> There will be all sorts of issues.
> The underlying hrtimer implementation cannot deal with nmi either.

next prev parent reply	other threads:[~2021-06-25 15:55 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24  2:25 [PATCH v3 bpf-next 0/8] bpf: Introduce BPF timers Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers Alexei Starovoitov
2021-06-25  6:25   ` Yonghong Song
2021-06-25 14:57     ` Alexei Starovoitov
2021-06-25 15:54       ` Yonghong Song [this message]
2021-06-29  1:39         ` Alexei Starovoitov
2021-06-25 16:54   ` Yonghong Song
2021-06-29  1:46     ` Alexei Starovoitov
2021-06-29  2:24       ` Yonghong Song
2021-06-29  3:32         ` Alexei Starovoitov
2021-06-29  6:34       ` Andrii Nakryiko
2021-06-29 13:28         ` Alexei Starovoitov
2021-06-30 10:08           ` Andrii Nakryiko
2021-06-30 17:38             ` Alexei Starovoitov
2021-07-01  5:40   ` Alexei Starovoitov
2021-07-01 11:51     ` Toke Høiland-Jørgensen
2021-07-01 15:34       ` Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 2/8] bpf: Add map side support for " Alexei Starovoitov
2021-06-25 19:46   ` Yonghong Song
2021-06-29  1:49     ` Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 3/8] bpf: Remember BTF of inner maps Alexei Starovoitov
2021-06-29  1:45   ` Yonghong Song
2021-06-24  2:25 ` [PATCH v3 bpf-next 4/8] bpf: Relax verifier recursion check Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 5/8] bpf: Implement verifier support for validation of async callbacks Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 6/8] bpf: Teach stack depth check about " Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 7/8] selftests/bpf: Add bpf_timer test Alexei Starovoitov
2021-06-24  2:25 ` [PATCH v3 bpf-next 8/8] selftests/bpf: Add a test with bpf_timer in inner map Alexei Starovoitov
2021-06-24 11:27 ` [PATCH v3 bpf-next 0/8] bpf: Introduce BPF timers Toke Høiland-Jørgensen

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:7a52bc17284 dfblob:80f6e6dafd5 )
 OR (
bs:"Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers." )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cfa10fa1-9ee4-95de-109d-a24cd5d43a98@fb.com \
    --to=yhs@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.