From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Colascione <dancol@google.com>
Subject: Re: [RFC] Add BPF_SYNCHRONIZE bpf(2) command
Date: Fri, 6 Jul 2018 20:22:49 -0700
Message-ID: <CAKOZuevB9vq5n8s_oc923pwaAhU-_MCQwdCWLmO12h2D==PDCA@mail.gmail.com>
References: <20180707015616.25988-1-dancol@google.com> <20180707025426.ssxipi7hsehoiuyo@ast-mbp.dhcp.thefacebook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Joel Fernandes <joelaf@google.com>, ast@fb.com,
        linux-kernel@vger.kernel.org, Tim Murray <timmurray@google.com>,
        daniel@iogearbox.net, netdev@vger.kernel.org
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-io0-f196.google.com ([209.85.223.196]:41534 "EHLO
        mail-io0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932865AbeGGDWv (ORCPT
        <rfc822;netdev@vger.kernel.org>); Fri, 6 Jul 2018 23:22:51 -0400
Received: by mail-io0-f196.google.com with SMTP id q9-v6so12515892ioj.8
        for <netdev@vger.kernel.org>; Fri, 06 Jul 2018 20:22:51 -0700 (PDT)
In-Reply-To: <20180707025426.ssxipi7hsehoiuyo@ast-mbp.dhcp.thefacebook.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Fri, Jul 6, 2018 at 7:54 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Jul 06, 2018 at 06:56:16PM -0700, Daniel Colascione wrote:
>> BPF_SYNCHRONIZE waits for any BPF programs active at the time of
>> BPF_SYNCHRONIZE to complete, allowing userspace to ensure atomicity of
>> RCU data structure operations with respect to active programs. For
>> example, userspace can update a map->map entry to point to a new map,
>> use BPF_SYNCHRONIZE to wait for any BPF programs using the old map to
>> complete, and then drain the old map without fear that BPF programs
>> may still be updating it.
>>
>> Signed-off-by: Daniel Colascione <dancol@google.com>
>> ---
>>  include/uapi/linux/bpf.h |  1 +
>>  kernel/bpf/syscall.c     | 14 ++++++++++++++
>>  2 files changed, 15 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index b7db3261c62d..4365c50e8055 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -98,6 +98,7 @@ enum bpf_cmd {
>>       BPF_BTF_LOAD,
>>       BPF_BTF_GET_FD_BY_ID,
>>       BPF_TASK_FD_QUERY,
>> +     BPF_SYNCHRONIZE,
>>  };
>>
>>  enum bpf_map_type {
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d10ecd78105f..60ec7811846e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2272,6 +2272,20 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>>       if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>>               return -EPERM;
>>
>> +     if (cmd == BPF_SYNCHRONIZE) {
>> +             if (uattr != NULL || size != 0)
>> +                     return -EINVAL;
>> +             err = security_bpf(cmd, NULL, 0);
>> +             if (err < 0)
>> +                     return err;
>> +             /* BPF programs are run with preempt disabled, so
>> +              * synchronize_sched is sufficient even with
>> +              * RCU_PREEMPT.
>> +              */
>> +             synchronize_sched();
>> +             return 0;
>
> I don't think it's necessary. sys_membarrier() can do this already
> and some folks use it exactly for this use case.

True --- implementation-wise, today. The point of the patch is to sort
out contractual guarantees. membarrier isn't documented to have the
BPF_SYNCHRONIZE effect right now, although it does. If we want to
extend membarrier's contract, great, but that might make it more
expensive than necessary for callers who, well, just want a memory
barrier. It's worthwhile to have a dedicated BPF command for this
task, especially considering the simplicity of the implementation.