* Re: WARNING in strp_data_ready
From: Tom Herbert @ 2017-12-28 1:19 UTC (permalink / raw)
To: Ozgur
Cc: Dmitry Vyukov, John Fastabend, syzbot, David S. Miller,
Eric Biggers, LKML, Linux Kernel Network Developers,
syzkaller-bugs@googlegroups.com, Tom Herbert, Cong Wang
In-Reply-To: <926421514406019@web43j.yandex.ru>
On Wed, Dec 27, 2017 at 12:20 PM, Ozgur <ozgur@goosey.org> wrote:
>
>
> 27.12.2017, 23:14, "Dmitry Vyukov" <dvyukov@google.com>:
>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur <ozgur@goosey.org> wrote:
>>> 27.12.2017, 22:21, "Dmitry Vyukov" <dvyukov@google.com>:
>>>> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> Did you try the patch I posted?
>>>>
>>>> Hi Tom,
>>>
>>> Hello Dmitry,
>>>
>>>> No. And I didn't know I need to. Why?
>>>> If you think the patch needs additional testing, you can ask syzbot to
>>>> test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>>> Otherwise proceed with committing it. Or what are we waiting for?
>>>>
>>>> Thanks
>>>
>>> I think we need to fixed patch for crash, in fact check to patch code and test solve the bug.
>>> How do test it because there is no patch in the following bug?
>>
>> Hi Ozgur,
>>
>> I am not sure I completely understand what you mean. But the
>> reproducer for this bug (which one can use for testing) is here:
>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>> Tom also mentions there is some patch for this, but I don't know where
>> it is, it doesn't seem to be referenced from this thread.
>
> Hello Dmitry,
>
> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
> I think Tom send patch to only you and are you tested?
>
> kcmsock.c will change and strp_data_ready I think locked.
>
> Tom, please send a patch for me? I can test and inform you.
>
Hi Ozgur,
I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you can!
Thanks,
Tom
> Regards
>
> Ozgur
>
>>> The fix patch should be for this net/kcm/kcmsock.c file and lock functions must be added calling sk_data_ready ().
>>> Regards
>>>
>>> Ozgur
>>>
>>>>> On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>> <john.fastabend@gmail.com> wrote:
>>>>>>>>> On 10/24/2017 08:20 AM, syzbot wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>>>>>>>>> compiler: gcc (GCC) 7.1.1 20170620
>>>>>>>>>> .config is attached
>>>>>>>>>> Raw console output is attached.
>>>>>>>>>> C reproducer is attached
>>>>>>>>>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>>>>>>>>> for information about syzkaller reproducers
>>>>>>>>>>
>>>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>>>> Kernel panic - not syncing: panic_on_warn set ...
>>>>>>>>>>
>>>>>>>>>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>>>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>>>>>>>> Call Trace:
>>>>>>>>>> <IRQ>
>>>>>>>>>> __dump_stack lib/dump_stack.c:16 [inline]
>>>>>>>>>> dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>>>>>>>> panic+0x1e4/0x417 kernel/panic.c:181
>>>>>>>>>> __warn+0x1c4/0x1d9 kernel/panic.c:542
>>>>>>>>>> report_bug+0x211/0x2d0 lib/bug.c:183
>>>>>>>>>> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>>>>>>>>> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>>>>>>>>> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>>>>>>>>> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>>>>>>>>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>>>>>>>>> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>>>>>>>>>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>>>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>>>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>>>> RSP: 0018:ffff8801db206b18 EFLAGS: 00010206
>>>>>>>>>> RAX: ffff8801d1e02080 RBX: ffff8801dad74c48 RCX: 0000000000000000
>>>>>>>>>> RDX: 0000000000000100 RSI: ffff8801d29fa0a0 RDI: ffffffff85cbede0
>>>>>>>>>> RBP: ffff8801db206b38 R08: 0000000000000005 R09: 1ffffffff0ce0bcd
>>>>>>>>>> R10: ffff8801db206a00 R11: dffffc0000000000 R12: ffff8801d29fa000
>>>>>>>>>> R13: ffff8801dad74c50 R14: ffff8801d4350a92 R15: 0000000000000001
>>>>>>>>>> psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>>>>>>>>
>>>>>>>>> Looks like KCM is calling sk_data_ready() without first taking the
>>>>>>>>> sock lock.
>>>>>>>>>
>>>>>>>>> /* Called with lower sock held */
>>>>>>>>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
>>>>>>>>> {
>>>>>>>>> [...]
>>>>>>>>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>>>>>>>>
>>>>>>>>> In this case kcm->sk is not the same lock the comment is referring to.
>>>>>>>>> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>>>>>>>>
>>>>>>>>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>>>>>>>>> I don't have anything better in mind immediately.
>>>>>>>> The sock locks are taken in reverse order in the send path so so
>>>>>>>> grabbing kcm sock lock with lower lock held to call sk_data_ready may
>>>>>>>> lead to deadlock like I think.
>>>>>>>>
>>>>>>>> It might be possible to change the order in the send path to do this.
>>>>>>>> Something like:
>>>>>>>>
>>>>>>>> trylock on lower socket lock
>>>>>>>> -if trylock fails
>>>>>>>> - release kcm sock lock
>>>>>>>> - lock lower sock
>>>>>>>> - lock kcm sock
>>>>>>>> - call sendpage locked function
>>>>>>>>
>>>>>>>> I admit that dealing with two levels of socket locks in the data path
>>>>>>>> is quite a pain :-)
>>>>>>>
>>>>>>> up
>>>>>>>
>>>>>>> still happening and we've lost 50K+ test VMs on this
>>>>>>
>>>>>> up
>>>>>>
>>>>>> Still happens and number of crashes crossed 60K, can we do something
>>>>>> with this please?
^ permalink raw reply
* Re: pull-request: bpf 2017-12-28
From: David Miller @ 2017-12-28 1:35 UTC (permalink / raw)
To: daniel; +Cc: ast, netdev
In-Reply-To: <20171227232723.4544-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 28 Dec 2017 00:27:23 +0100
> The following pull-request contains BPF updates for your *net* tree.
>
> The main changes are:
>
> 1) Two small fixes for bpftool. Fix otherwise broken output if any of
> the system calls failed when listing maps in json format and instead
> of bailing out, skip maps or progs that disappeared between fetching
> next id and getting an fd for that id, both from Jakub.
>
> 2) Small fix in BPF selftests to respect LLC passed from command line
> when testing for -mcpu=probe presence, from Quentin.
>
> Please consider pulling these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Pulled, thanks Daniel.
^ permalink raw reply
* Re: [PATCH net-next] nfp: bpf: allocate vNIC priv for keeping track of the offloaded program
From: David Miller @ 2017-12-28 1:38 UTC (permalink / raw)
To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20171227233649.15018-1-jakub.kicinski@netronome.com>
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 27 Dec 2017 15:36:49 -0800
> After TC offloads were converted to callbacks we have no choice
> but keep track of the offloaded filter in the driver.
>
> Since this change came a little late in the release cycle
> there were a number of conflicts and allocation of vNIC priv
> structure seems to have slipped away in linux-next.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Oh well, I thought I resolved the merge conflict properly in this
case. Apparently not :-)
Applied, thanks Jakub.
^ permalink raw reply
* Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
From: Masami Hiramatsu @ 2017-12-28 1:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Josef Bacik, rostedt, mingo, davem, netdev,
linux-kernel, ast, kernel-team, daniel, linux-btrfs, darrick.wong,
Josef Bacik, Akinobu Mita
In-Reply-To: <87e5e909-593d-7fd6-c7bb-714c1e3022a0@fb.com>
On Wed, 27 Dec 2017 14:49:46 -0800
Alexei Starovoitov <ast@fb.com> wrote:
> On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
> > On Tue, 26 Dec 2017 18:12:56 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> >> On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
> >>> Support in-kernel fault-injection framework via debugfs.
> >>> This allows you to inject a conditional error to specified
> >>> function using debugfs interfaces.
> >>>
> >>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> >>> ---
> >>> Documentation/fault-injection/fault-injection.txt | 5 +
> >>> kernel/Makefile | 1
> >>> kernel/fail_function.c | 169 +++++++++++++++++++++
> >>> lib/Kconfig.debug | 10 +
> >>> 4 files changed, 185 insertions(+)
> >>> create mode 100644 kernel/fail_function.c
> >>>
> >>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
> >>> index 918972babcd8..6243a588dd71 100644
> >>> --- a/Documentation/fault-injection/fault-injection.txt
> >>> +++ b/Documentation/fault-injection/fault-injection.txt
> >>> @@ -30,6 +30,11 @@ o fail_mmc_request
> >>> injects MMC data errors on devices permitted by setting
> >>> debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
> >>>
> >>> +o fail_function
> >>> +
> >>> + injects error return on specific functions by setting debugfs entries
> >>> + under /sys/kernel/debug/fail_function. No boot option supported.
> >>
> >> I like it.
> >> Could you document it a bit better?
> >
> > Yes, I will do in next series.
> >
> >> In particular retval is configurable, but without an example no one
> >> will be able to figure out how to use it.
> >
> > Ah, right. BTW, as I pointed in the covermail, should we store the
> > expected error value range into the injectable list? e.g.
> >
> > ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
> >
> > And provide APIs to check/get it.
>
> I'm afraid such check would be too costly.
> Right now we have only two functions marked but I expect hundreds more
> will be added in the near future as soon as developers realize the
> potential of such error injection.
> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
> Multiple by 1k and we have 8k of data spent on marks.
> If we add max/min range marks that doubles it for very little use.
> I think marking function only is enough.
Sorry, I don't think so.
Even if it takes 16 bytes more for each points, I don't think it is
any overhead for machines in these days. Even if so, we can provide
a kconfig to reduce it.
I mean, we are living in GB-order memory are, and it will be bigger
in the future. Why we have to worry about hundreds of 16bytes memory
pieces? It will take a few KB, and even if we mark thousands of
functions, it never reaches 1MB, in GB memory pool. :)
Of course, for many small-footprint embedded devices (like having
less than 128MB memory), this feature can be a overhead. But they
can cut off the table by kconfig.
Thank you,
--
Masami Hiramatsu <mhiramat@kernel.org>
^ permalink raw reply
* Re: pull-request: bpf-next 2017-12-28
From: David Miller @ 2017-12-28 1:41 UTC (permalink / raw)
To: daniel; +Cc: ast, netdev
In-Reply-To: <20171228001821.5305-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 28 Dec 2017 01:18:21 +0100
> The following pull-request contains BPF updates for your *net-next*
> tree.
Pulled.
Any progress on those tests failing on strict alignment architectures?
^ permalink raw reply
* Re: [RFC PATCH net-next] net: hns3: hns3_get_channels() can be static
From: David Miller @ 2017-12-28 1:42 UTC (permalink / raw)
To: fengguang.wu
Cc: lipeng321, kbuild-all, netdev, qumingguang, yisen.zhuang,
salil.mehta, linux-kernel
In-Reply-To: <20171228010959.GA86177@lkp-sb05>
From: kbuild test robot <fengguang.wu@intel.com>
Date: Thu, 28 Dec 2017 09:09:59 +0800
>
> Fixes: 482d2e9c1cc7 ("net: hns3: add support to query tqps number")
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Applied, thanks.
^ permalink raw reply
* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Masami Hiramatsu @ 2017-12-28 2:34 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Josef Bacik, rostedt, mingo, davem, netdev,
linux-kernel, ast, kernel-team, daniel, linux-btrfs, darrick.wong,
Josef Bacik, Akinobu Mita
In-Reply-To: <a4097830-ac90-4db0-b860-6f6a85e91cba@fb.com>
On Wed, 27 Dec 2017 14:46:24 -0800
Alexei Starovoitov <ast@fb.com> wrote:
> On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
> > On Tue, 26 Dec 2017 17:57:32 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> >> On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
> >>> Check whether error injectable event is on function entry or not.
> >>> Currently it checks the event is ftrace-based kprobes or not,
> >>> but that is wrong. It should check if the event is on the entry
> >>> of target function. Since error injection will override a function
> >>> to just return with modified return value, that operation must
> >>> be done before the target function starts making stackframe.
> >>>
> >>> As a side effect, bpf error injection is no need to depend on
> >>> function-tracer. It can work with sw-breakpoint based kprobe
> >>> events too.
> >>>
> >>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> >>> ---
> >>> kernel/trace/Kconfig | 2 --
> >>> kernel/trace/bpf_trace.c | 6 +++---
> >>> kernel/trace/trace_kprobe.c | 8 +++++---
> >>> kernel/trace/trace_probe.h | 12 ++++++------
> >>> 4 files changed, 14 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> >>> index ae3a2d519e50..6400e1bf97c5 100644
> >>> --- a/kernel/trace/Kconfig
> >>> +++ b/kernel/trace/Kconfig
> >>> @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
> >>> config BPF_KPROBE_OVERRIDE
> >>> bool "Enable BPF programs to override a kprobed function"
> >>> depends on BPF_EVENTS
> >>> - depends on KPROBES_ON_FTRACE
> >>> depends on HAVE_KPROBE_OVERRIDE
> >>> - depends on DYNAMIC_FTRACE_WITH_REGS
> >>> default n
> >>> help
> >>> Allows BPF to override the execution of a probed function and
> >>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>> index f6d2327ecb59..d663660f8392 100644
> >>> --- a/kernel/trace/bpf_trace.c
> >>> +++ b/kernel/trace/bpf_trace.c
> >>> @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
> >>> int ret = -EEXIST;
> >>>
> >>> /*
> >>> - * Kprobe override only works for ftrace based kprobes, and only if they
> >>> - * are on the opt-in list.
> >>> + * Kprobe override only works if they are on the function entry,
> >>> + * and only if they are on the opt-in list.
> >>> */
> >>> if (prog->kprobe_override &&
> >>> - (!trace_kprobe_ftrace(event->tp_event) ||
> >>> + (!trace_kprobe_on_func_entry(event->tp_event) ||
> >>> !trace_kprobe_error_injectable(event->tp_event)))
> >>> return -EINVAL;
> >>>
> >>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> >>> index 91f4b57dab82..265e3e27e8dc 100644
> >>> --- a/kernel/trace/trace_kprobe.c
> >>> +++ b/kernel/trace/trace_kprobe.c
> >>> @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
> >>> return nhit;
> >>> }
> >>>
> >>> -int trace_kprobe_ftrace(struct trace_event_call *call)
> >>> +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
> >>> {
> >>> struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> >>> - return kprobe_ftrace(&tk->rp.kp);
> >>> +
> >>> + return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
> >>> + tk->rp.kp.offset);
> >>
> >> That would be nice, but did you test this?
> >
> > Yes, because the jprobe, which was only official user of modifying execution
> > path using kprobe, did same way to check. (and kretprobe also does it)
> >
> >> My understanding that kprobe will restore all regs and
> >> here we need to override return ip _and_ value.
> >
> > yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
> >
> >> Could you add a patch with the test the way Josef did
> >> or describe the steps to test this new mode?
> >
> > Would you mean below patch? If so, it should work without any change.
> >
> > [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return
>
> yeah. I expect bpf_override_return test to work as-is.
> I'm asking for the test for new functionality added by this patch.
> In particular kprobe on func entry without ftrace.
> How did you test it?
This function is used in kretprobe and jprobe. Jprobe was the user of
"modifying instruction pointer to another function" in kprobes.
If it doesn't work, jprobe also doesn't work, this means you can not
modify IP by kprobes anymore.
Anyway, until linux-4.13, that was well tested by kprobe smoke test.
> and how I can repeat the test?
> I'm still not sure that it works correctly.
That works correctly because it checks given address is on the entry
point (the 1st instruction) of a function, using kallsyms.
The reason why I made another flag for ftrace was, there are 2 modes
for ftrace dynamic instrumentation, fentry and mcount.
With new fentry mode, ftrace will be put on the first instruction
of the function, so it will work as you expected.
With traditional gcc mcount, ftrace will be called after making call
frame for _mcount(). This means if you modify ip, it will not work
or cause a trouble because _mcount call frame is still on the stack.
So, current ftrace-based checker doesn't work, it depends on the case.
Of course, in most case, kernel will be build in new gcc which
supports fentry, but there is no guarantee.
Please follow what jprobe did if you want to change invoked function
using kprobes. That has been well reviewed and discussed in more than
10 years.
Thank you,
--
Masami Hiramatsu <mhiramat@kernel.org>
^ permalink raw reply
* [PATCH bpf-next v3 0/9] bpf: offload: report device back to user space (take 2)
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
Hi!
This series is a redo of reporting offload device information to
user space after the first attempt did not take into account name
spaces. As requested by Kirill offloads are now protected by an
r/w sem. This allows us to remove the workqueue and free the
offload state fully when device is removed (suggested by Alexei).
Net namespace is reported with a device/inode pair.
The accompanying bpftool support is placed in common code because
maps will have very similar info. Note that the UAPI information
can't be nicely encapsulated into a struct, because in case we
need to grow the device information the new fields will have to
be added at the end of struct bpf_prog_info, we can't grow
structures in the middle of bpf_prog_info.
v3:
- use dev_get_by_index();
- redo ns code (new patch 6).
v2:
- rework the locking in patch 1 (use RCU instead of locking
dependencies);
- grab RTNL for a short time in patch 6;
- minor update to the test in patch 8.
Jakub Kicinski (9):
bpf: offload: don't require rtnl for dev list manipulation
bpf: offload: don't use prog->aux->offload as boolean
bpf: offload: allow netdev to disappear while verifier is running
bpf: offload: free prog->aux->offload when device disappears
bpf: offload: free program id when device disappears
nsfs: generalize ns_get_path() for path resolution with a task
bpf: offload: report device information for offloaded programs
tools: bpftool: report device information for offloaded programs
selftests/bpf: test device info reporting for bound progs
drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 2 +-
drivers/net/netdevsim/bpf.c | 2 +-
fs/nsfs.c | 29 ++++-
include/linux/bpf.h | 16 ++-
include/linux/bpf_verifier.h | 16 +--
include/linux/netdevice.h | 4 +-
include/linux/proc_ns.h | 3 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/offload.c | 147 ++++++++++++++++------
kernel/bpf/syscall.c | 19 ++-
kernel/bpf/verifier.c | 20 ++-
tools/bpf/bpftool/common.c | 52 ++++++++
tools/bpf/bpftool/main.h | 2 +
tools/bpf/bpftool/prog.c | 3 +
tools/include/uapi/linux/bpf.h | 3 +
tools/testing/selftests/bpf/test_offload.py | 112 +++++++++++++++--
17 files changed, 346 insertions(+), 89 deletions(-)
--
2.15.1
^ permalink raw reply
* [PATCH bpf-next v3 1/9] bpf: offload: don't require rtnl for dev list manipulation
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
We don't need the RTNL lock for all operations on offload state.
We only need to hold it around ndo calls. The device offload
initialization doesn't require it. The soon-to-come querying
of the offload info will only need it partially. We will also
be able to remove the waitqueue in following patches.
Use struct rw_semaphore because map offload will require sleeping
with the semaphore held for read.
Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
v3:
- use dev_get_by_index() (Kirill).
v2:
- use dev_get_by_index_rcu() instead of implicit lock dependencies;
- use DECLARE_RWSEM() instead of init_rwsem() (Kirill).
---
kernel/bpf/offload.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 8455b89d1bbf..032079754d88 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -20,13 +20,16 @@
#include <linux/netdevice.h>
#include <linux/printk.h>
#include <linux/rtnetlink.h>
+#include <linux/rwsem.h>
-/* protected by RTNL */
+/* Protects bpf_prog_offload_devs and offload members of all progs.
+ * RTNL lock cannot be taken when holding this lock.
+ */
+static DECLARE_RWSEM(bpf_devs_lock);
static LIST_HEAD(bpf_prog_offload_devs);
int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
{
- struct net *net = current->nsproxy->net_ns;
struct bpf_dev_offload *offload;
if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
@@ -43,19 +46,26 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
offload->prog = prog;
init_waitqueue_head(&offload->verifier_done);
- rtnl_lock();
- offload->netdev = __dev_get_by_index(net, attr->prog_ifindex);
- if (!offload->netdev) {
- rtnl_unlock();
- kfree(offload);
- return -EINVAL;
- }
+ offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
+ attr->prog_ifindex);
+ if (!offload->netdev)
+ goto err_free;
+ down_write(&bpf_devs_lock);
+ if (offload->netdev->reg_state != NETREG_REGISTERED)
+ goto err_unlock;
prog->aux->offload = offload;
list_add_tail(&offload->offloads, &bpf_prog_offload_devs);
- rtnl_unlock();
+ dev_put(offload->netdev);
+ up_write(&bpf_devs_lock);
return 0;
+err_unlock:
+ up_write(&bpf_devs_lock);
+ dev_put(offload->netdev);
+err_free:
+ kfree(offload);
+ return -EINVAL;
}
static int __bpf_offload_ndo(struct bpf_prog *prog, enum bpf_netdev_command cmd,
@@ -126,7 +136,9 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
wake_up(&offload->verifier_done);
rtnl_lock();
+ down_write(&bpf_devs_lock);
__bpf_prog_offload_destroy(prog);
+ up_write(&bpf_devs_lock);
rtnl_unlock();
kfree(offload);
@@ -181,11 +193,13 @@ static int bpf_offload_notification(struct notifier_block *notifier,
if (netdev->reg_state != NETREG_UNREGISTERING)
break;
+ down_write(&bpf_devs_lock);
list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
offloads) {
if (offload->netdev == netdev)
__bpf_prog_offload_destroy(offload->prog);
}
+ up_write(&bpf_devs_lock);
break;
default:
break;
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 2/9] bpf: offload: don't use prog->aux->offload as boolean
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
We currently use aux->offload to indicate that program is bound
to a specific device. This forces us to keep the offload structure
around even after the device is gone. Add a bool member to
struct bpf_prog_aux to indicate if offload was requested.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 3 ++-
kernel/bpf/syscall.c | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index da54ef644fcd..838eee10e979 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -201,6 +201,7 @@ struct bpf_prog_aux {
u32 stack_depth;
u32 id;
u32 func_cnt;
+ bool offload_requested;
struct bpf_prog **func;
void *jit_data; /* JIT specific data. arch dependent */
struct latch_tree_node ksym_tnode;
@@ -529,7 +530,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
{
- return aux->offload;
+ return aux->offload_requested;
}
#else
static inline int bpf_prog_offload_init(struct bpf_prog *prog,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e2e1c78ce1dc..1143db61584c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1151,6 +1151,8 @@ static int bpf_prog_load(union bpf_attr *attr)
if (!prog)
return -ENOMEM;
+ prog->aux->offload_requested = !!attr->prog_ifindex;
+
err = security_bpf_prog_alloc(prog->aux);
if (err)
goto free_prog_nouncharge;
@@ -1172,7 +1174,7 @@ static int bpf_prog_load(union bpf_attr *attr)
atomic_set(&prog->aux->refcnt, 1);
prog->gpl_compatible = is_gpl ? 1 : 0;
- if (attr->prog_ifindex) {
+ if (bpf_prog_is_dev_bound(prog->aux)) {
err = bpf_prog_offload_init(prog, attr);
if (err)
goto free_prog;
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 4/9] bpf: offload: free prog->aux->offload when device disappears
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
All bpf offload operations should now be under bpf_devs_lock,
it's safe to free and clear the entire offload structure,
not only the netdev pointer.
__bpf_prog_offload_destroy() will no longer be called multiple
times.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/offload.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 69ddc3899bab..3126e1a842e6 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -70,12 +70,14 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
static int __bpf_offload_ndo(struct bpf_prog *prog, enum bpf_netdev_command cmd,
struct netdev_bpf *data)
{
- struct net_device *netdev = prog->aux->offload->netdev;
+ struct bpf_dev_offload *offload = prog->aux->offload;
+ struct net_device *netdev;
ASSERT_RTNL();
- if (!netdev)
+ if (!offload)
return -ENODEV;
+ netdev = offload->netdev;
if (!netdev->netdev_ops->ndo_bpf)
return -EOPNOTSUPP;
@@ -111,7 +113,7 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
- if (offload->netdev)
+ if (offload)
ret = offload->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
up_read(&bpf_devs_lock);
@@ -123,31 +125,24 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
struct bpf_dev_offload *offload = prog->aux->offload;
struct netdev_bpf data = {};
- /* Caution - if netdev is destroyed before the program, this function
- * will be called twice.
- */
-
data.offload.prog = prog;
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
- offload->dev_state = false;
list_del_init(&offload->offloads);
- offload->netdev = NULL;
+ kfree(offload);
+ prog->aux->offload = NULL;
}
void bpf_prog_offload_destroy(struct bpf_prog *prog)
{
- struct bpf_dev_offload *offload = prog->aux->offload;
-
rtnl_lock();
down_write(&bpf_devs_lock);
- __bpf_prog_offload_destroy(prog);
+ if (prog->aux->offload)
+ __bpf_prog_offload_destroy(prog);
up_write(&bpf_devs_lock);
rtnl_unlock();
-
- kfree(offload);
}
static int bpf_prog_offload_translate(struct bpf_prog *prog)
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 3/9] bpf: offload: allow netdev to disappear while verifier is running
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
To allow verifier instruction callbacks without any extra locking
NETDEV_UNREGISTER notification would wait on a waitqueue for verifier
to finish. This design decision was made when rtnl lock was providing
all the locking. Use the read/write lock instead and remove the
workqueue.
Verifier will now call into the offload code, so dev_ops are moved
to offload structure. Since verifier calls are all under
bpf_prog_is_dev_bound() we no longer need static inline implementations
to please builds with CONFIG_NET=n.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 2 +-
drivers/net/netdevsim/bpf.c | 2 +-
include/linux/bpf.h | 9 +++++--
include/linux/bpf_verifier.h | 16 ++----------
include/linux/netdevice.h | 4 +--
kernel/bpf/offload.c | 30 ++++++++++++-----------
kernel/bpf/verifier.c | 20 ++++++---------
8 files changed, 37 insertions(+), 48 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index aae1be9ed056..89a9b6393882 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -238,7 +238,7 @@ struct nfp_bpf_vnic {
int nfp_bpf_jit(struct nfp_prog *prog);
-extern const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops;
+extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops;
struct netdev_bpf;
struct nfp_app;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 9c2608445bd8..d8870c2f11f3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -260,6 +260,6 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return 0;
}
-const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops = {
+const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
.insn_hook = nfp_verify_insn,
};
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index a243fa7ae02f..5134d5c1306c 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -66,7 +66,7 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
return 0;
}
-static const struct bpf_ext_analyzer_ops nsim_bpf_analyzer_ops = {
+static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
.insn_hook = nsim_bpf_verify_insn,
};
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 838eee10e979..669549f7e3e8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -17,6 +17,7 @@
#include <linux/numa.h>
#include <linux/wait.h>
+struct bpf_verifier_env;
struct perf_event;
struct bpf_prog;
struct bpf_map;
@@ -184,14 +185,18 @@ struct bpf_verifier_ops {
struct bpf_prog *prog, u32 *target_size);
};
+struct bpf_prog_offload_ops {
+ int (*insn_hook)(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx);
+};
+
struct bpf_dev_offload {
struct bpf_prog *prog;
struct net_device *netdev;
void *dev_priv;
struct list_head offloads;
bool dev_state;
- bool verifier_running;
- wait_queue_head_t verifier_done;
+ const struct bpf_prog_offload_ops *dev_ops;
};
struct bpf_prog_aux {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c009e472f647..70237bb29eb7 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -166,12 +166,6 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifer_log *log)
return log->len_used >= log->len_total - 1;
}
-struct bpf_verifier_env;
-struct bpf_ext_analyzer_ops {
- int (*insn_hook)(struct bpf_verifier_env *env,
- int insn_idx, int prev_insn_idx);
-};
-
#define BPF_MAX_SUBPROGS 256
/* single container for all structs
@@ -185,7 +179,6 @@ struct bpf_verifier_env {
bool strict_alignment; /* perform strict pointer alignment checks */
struct bpf_verifier_state *cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
- const struct bpf_ext_analyzer_ops *dev_ops; /* device analyzer ops */
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
u32 used_map_cnt; /* number of used maps */
u32 id_gen; /* used to generate unique reg IDs */
@@ -205,13 +198,8 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
return cur->frame[cur->curframe]->regs;
}
-#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
-#else
-static inline int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env)
-{
- return -EOPNOTSUPP;
-}
-#endif
+int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx);
#endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 352066e4eeef..49bfc6eec74c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -804,7 +804,7 @@ enum bpf_netdev_command {
BPF_OFFLOAD_DESTROY,
};
-struct bpf_ext_analyzer_ops;
+struct bpf_prog_offload_ops;
struct netlink_ext_ack;
struct netdev_bpf {
@@ -826,7 +826,7 @@ struct netdev_bpf {
/* BPF_OFFLOAD_VERIFIER_PREP */
struct {
struct bpf_prog *prog;
- const struct bpf_ext_analyzer_ops *ops; /* callee set */
+ const struct bpf_prog_offload_ops *ops; /* callee set */
} verifier;
/* BPF_OFFLOAD_TRANSLATE, BPF_OFFLOAD_DESTROY */
struct {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 032079754d88..69ddc3899bab 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -44,7 +44,6 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
return -ENOMEM;
offload->prog = prog;
- init_waitqueue_head(&offload->verifier_done);
offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
attr->prog_ifindex);
@@ -97,15 +96,28 @@ int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env)
if (err)
goto exit_unlock;
- env->dev_ops = data.verifier.ops;
-
+ env->prog->aux->offload->dev_ops = data.verifier.ops;
env->prog->aux->offload->dev_state = true;
- env->prog->aux->offload->verifier_running = true;
exit_unlock:
rtnl_unlock();
return err;
}
+int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx)
+{
+ struct bpf_dev_offload *offload;
+ int ret = -ENODEV;
+
+ down_read(&bpf_devs_lock);
+ offload = env->prog->aux->offload;
+ if (offload->netdev)
+ ret = offload->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
+ up_read(&bpf_devs_lock);
+
+ return ret;
+}
+
static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
{
struct bpf_dev_offload *offload = prog->aux->offload;
@@ -117,9 +129,6 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
data.offload.prog = prog;
- if (offload->verifier_running)
- wait_event(offload->verifier_done, !offload->verifier_running);
-
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
@@ -132,9 +141,6 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
{
struct bpf_dev_offload *offload = prog->aux->offload;
- offload->verifier_running = false;
- wake_up(&offload->verifier_done);
-
rtnl_lock();
down_write(&bpf_devs_lock);
__bpf_prog_offload_destroy(prog);
@@ -146,15 +152,11 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
static int bpf_prog_offload_translate(struct bpf_prog *prog)
{
- struct bpf_dev_offload *offload = prog->aux->offload;
struct netdev_bpf data = {};
int ret;
data.offload.prog = prog;
- offload->verifier_running = false;
- wake_up(&offload->verifier_done);
-
rtnl_lock();
ret = __bpf_offload_ndo(prog, BPF_OFFLOAD_TRANSLATE, &data);
rtnl_unlock();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1cd2c2d28fc3..88450c8c0515 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4390,15 +4390,6 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
return 0;
}
-static int ext_analyzer_insn_hook(struct bpf_verifier_env *env,
- int insn_idx, int prev_insn_idx)
-{
- if (env->dev_ops && env->dev_ops->insn_hook)
- return env->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
-
- return 0;
-}
-
static int do_check(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *state;
@@ -4480,9 +4471,12 @@ static int do_check(struct bpf_verifier_env *env)
env->allow_ptr_leaks);
}
- err = ext_analyzer_insn_hook(env, insn_idx, prev_insn_idx);
- if (err)
- return err;
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
+ err = bpf_prog_offload_verify_insn(env, insn_idx,
+ prev_insn_idx);
+ if (err)
+ return err;
+ }
regs = cur_regs(env);
env->insn_aux_data[insn_idx].seen = true;
@@ -5390,7 +5384,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
- if (env->prog->aux->offload) {
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
ret = bpf_prog_offload_verifier_prep(env);
if (ret)
goto err_unlock;
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 5/9] bpf: offload: free program id when device disappears
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
Bound programs are quite useless after their device disappears.
They are simply waiting for reference count to go to zero,
don't list them in BPF_PROG_GET_NEXT_ID by freeing their ID
early.
Note that orphaned offload programs will return -ENODEV on
BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 2 ++
kernel/bpf/offload.c | 3 +++
kernel/bpf/syscall.c | 9 +++++++--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 669549f7e3e8..9a916ab34299 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -357,6 +357,8 @@ void bpf_prog_put(struct bpf_prog *prog);
int __bpf_prog_charge(struct user_struct *user, u32 pages);
void __bpf_prog_uncharge(struct user_struct *user, u32 pages);
+void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
+
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 3126e1a842e6..e4f1668a021c 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -130,6 +130,9 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
+ /* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
+ bpf_prog_free_id(prog, true);
+
list_del_init(&offload->offloads);
kfree(offload);
prog->aux->offload = NULL;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1143db61584c..7d9f5b0f0e49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -905,9 +905,13 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
return id > 0 ? 0 : id;
}
-static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
+void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
{
- /* cBPF to eBPF migrations are currently not in the idr store. */
+ /* cBPF to eBPF migrations are currently not in the idr store.
+ * Offloaded programs are removed from the store when their device
+ * disappears - even if someone grabs an fd to them they are unusable,
+ * simply waiting for refcnt to drop to be freed.
+ */
if (!prog->aux->id)
return;
@@ -917,6 +921,7 @@ static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
__acquire(&prog_idr_lock);
idr_remove(&prog_idr, prog->aux->id);
+ prog->aux->id = 0;
if (do_idr_lock)
spin_unlock_bh(&prog_idr_lock);
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 6/9] nsfs: generalize ns_get_path() for path resolution with a task
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov
Cc: oss-drivers, Jakub Kicinski, Eric W . Biederman
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
ns_get_path() takes struct task_struct and proc_ns_ops as its
parameters. For path resolution directly from a namespace,
e.g. based on a networking device's net name space, we need
more flexibility. Add a ns_get_path_cb() helper which will
allow callers to use any method of obtaining the name space
reference. Convert ns_get_path() to use ns_get_path_cb().
Following patches will bring a networking user.
CC: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
fs/nsfs.c | 29 ++++++++++++++++++++++++++---
include/linux/proc_ns.h | 3 +++
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 7c6f76d29f56..36b0772701a0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -103,14 +103,14 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
goto got_it;
}
-void *ns_get_path(struct path *path, struct task_struct *task,
- const struct proc_ns_operations *ns_ops)
+void *ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
+ void *private_data)
{
struct ns_common *ns;
void *ret;
again:
- ns = ns_ops->get(task);
+ ns = ns_get_cb(private_data);
if (!ns)
return ERR_PTR(-ENOENT);
@@ -120,6 +120,29 @@ void *ns_get_path(struct path *path, struct task_struct *task,
return ret;
}
+struct ns_get_path_task_args {
+ const struct proc_ns_operations *ns_ops;
+ struct task_struct *task;
+};
+
+static struct ns_common *ns_get_path_task(void *private_data)
+{
+ struct ns_get_path_task_args *args = private_data;
+
+ return args->ns_ops->get(args->task);
+}
+
+void *ns_get_path(struct path *path, struct task_struct *task,
+ const struct proc_ns_operations *ns_ops)
+{
+ struct ns_get_path_task_args args = {
+ .ns_ops = ns_ops,
+ .task = task,
+ };
+
+ return ns_get_path_cb(path, ns_get_path_task, &args);
+}
+
int open_related_ns(struct ns_common *ns,
struct ns_common *(*get_ns)(struct ns_common *ns))
{
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 2ff18c9840a7..d31cb6215905 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -78,6 +78,9 @@ extern struct file *proc_ns_fget(int fd);
#define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
extern void *ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
+typedef struct ns_common *ns_get_path_helper_t(void *);
+extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
+ void *private_data);
extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 7/9] bpf: offload: report device information for offloaded programs
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov
Cc: oss-drivers, Jakub Kicinski, Eric W . Biederman
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
Report to the user ifindex and namespace information of offloaded
programs. If device has disappeared return -ENODEV. Specify the
namespace using dev/inode combination.
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
v3:
- path_put() (Daniel);
- move more of the logic to nsfs.c (Daniel).
v2:
- take RTNL lock to grab a coherent snapshot of device state
(ifindex vs name space) and avoid races with name space
moves (based on Eric's comment on Kirill's patch to
peernet2id_alloc()).
---
include/linux/bpf.h | 2 ++
include/uapi/linux/bpf.h | 3 +++
kernel/bpf/offload.c | 59 ++++++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 6 +++++
tools/include/uapi/linux/bpf.h | 3 +++
5 files changed, 73 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a916ab34299..7810ae57b357 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -531,6 +531,8 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
int bpf_prog_offload_compile(struct bpf_prog *prog);
void bpf_prog_offload_destroy(struct bpf_prog *prog);
+int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
+ struct bpf_prog *prog);
#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d01f1cb3cfc0..72b37fc3bc0c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -921,6 +921,9 @@ struct bpf_prog_info {
__u32 nr_map_ids;
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
+ __u32 ifindex;
+ __u64 netns_dev;
+ __u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index e4f1668a021c..040d4e0edf3f 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -16,9 +16,11 @@
#include <linux/bpf.h>
#include <linux/bpf_verifier.h>
#include <linux/bug.h>
+#include <linux/kdev_t.h>
#include <linux/list.h>
#include <linux/netdevice.h>
#include <linux/printk.h>
+#include <linux/proc_ns.h>
#include <linux/rtnetlink.h>
#include <linux/rwsem.h>
@@ -176,6 +178,63 @@ int bpf_prog_offload_compile(struct bpf_prog *prog)
return bpf_prog_offload_translate(prog);
}
+struct ns_get_path_bpf_prog_args {
+ struct bpf_prog *prog;
+ struct bpf_prog_info *info;
+};
+
+static struct ns_common *bpf_prog_offload_info_fill_ns(void *private_data)
+{
+ struct ns_get_path_bpf_prog_args *args = private_data;
+ struct bpf_prog_aux *aux = args->prog->aux;
+ struct ns_common *ns;
+ struct net *net;
+
+ rtnl_lock();
+ down_read(&bpf_devs_lock);
+
+ if (aux->offload) {
+ args->info->ifindex = aux->offload->netdev->ifindex;
+ net = dev_net(aux->offload->netdev);
+ get_net(net);
+ ns = &net->ns;
+ } else {
+ args->info->ifindex = 0;
+ ns = NULL;
+ }
+
+ up_read(&bpf_devs_lock);
+ rtnl_unlock();
+
+ return ns;
+}
+
+int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
+ struct bpf_prog *prog)
+{
+ struct ns_get_path_bpf_prog_args args = {
+ .prog = prog,
+ .info = info,
+ };
+ struct inode *ns_inode;
+ struct path ns_path;
+ void *res;
+
+ res = ns_get_path_cb(&ns_path, bpf_prog_offload_info_fill_ns, &args);
+ if (IS_ERR(res)) {
+ if (!info->ifindex)
+ return -ENODEV;
+ return PTR_ERR(res);
+ }
+
+ ns_inode = ns_path.dentry->d_inode;
+ info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev);
+ info->netns_ino = ns_inode->i_ino;
+ path_put(&ns_path);
+
+ return 0;
+}
+
const struct bpf_prog_ops bpf_offload_prog_ops = {
};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7d9f5b0f0e49..20444fd678d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1624,6 +1624,12 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
return -EFAULT;
}
+ if (bpf_prog_is_dev_bound(prog->aux)) {
+ err = bpf_prog_offload_info_fill(&info, prog);
+ if (err)
+ return err;
+ }
+
done:
if (copy_to_user(uinfo, &info, info_len) ||
put_user(info_len, &uattr->info.info_len))
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index db1b0923a308..4e8c60acfa32 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -921,6 +921,9 @@ struct bpf_prog_info {
__u32 nr_map_ids;
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
+ __u32 ifindex;
+ __u64 netns_dev;
+ __u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 8/9] tools: bpftool: report device information for offloaded programs
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
Print the just-exposed device information about device to which
program is bound.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/bpf/bpftool/common.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
tools/bpf/bpftool/main.h | 2 ++
tools/bpf/bpftool/prog.c | 3 +++
3 files changed, 57 insertions(+)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index b62c94e3997a..6601c95a9258 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -44,7 +44,9 @@
#include <unistd.h>
#include <linux/limits.h>
#include <linux/magic.h>
+#include <net/if.h>
#include <sys/mount.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
@@ -412,3 +414,53 @@ void delete_pinned_obj_table(struct pinned_obj_table *tab)
free(obj);
}
}
+
+static char *
+ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
+{
+ struct stat st;
+ int err;
+
+ err = stat("/proc/self/ns/net", &st);
+ if (err) {
+ p_err("Can't stat /proc/self: %s", strerror(errno));
+ return NULL;
+ }
+
+ if (st.st_dev != ns_dev || st.st_ino != ns_ino)
+ return NULL;
+
+ return if_indextoname(ifindex, buf);
+}
+
+void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
+{
+ char name[IF_NAMESIZE];
+
+ if (!ifindex)
+ return;
+
+ printf(" dev ");
+ if (ifindex_to_name_ns(ifindex, ns_dev, ns_inode, name))
+ printf("%s", name);
+ else
+ printf("ifindex %u ns_dev %llu ns_ino %llu",
+ ifindex, ns_dev, ns_inode);
+}
+
+void print_dev_json(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
+{
+ char name[IF_NAMESIZE];
+
+ if (!ifindex)
+ return;
+
+ jsonw_name(json_wtr, "dev");
+ jsonw_start_object(json_wtr);
+ jsonw_uint_field(json_wtr, "ifindex", ifindex);
+ jsonw_uint_field(json_wtr, "ns_dev", ns_dev);
+ jsonw_uint_field(json_wtr, "ns_inode", ns_inode);
+ if (ifindex_to_name_ns(ifindex, ns_dev, ns_inode, name))
+ jsonw_string_field(json_wtr, "ifname", name);
+ jsonw_end_object(json_wtr);
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 8f6d3cac0347..65b526fe6e7e 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -96,6 +96,8 @@ struct pinned_obj {
int build_pinned_obj_table(struct pinned_obj_table *table,
enum bpf_obj_type type);
void delete_pinned_obj_table(struct pinned_obj_table *tab);
+void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode);
+void print_dev_json(__u32 ifindex, __u64 ns_dev, __u64 ns_inode);
struct cmd {
const char *cmd;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 037484ceaeaf..4ccf6301f0fe 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -230,6 +230,8 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
info->tag[0], info->tag[1], info->tag[2], info->tag[3],
info->tag[4], info->tag[5], info->tag[6], info->tag[7]);
+ print_dev_json(info->ifindex, info->netns_dev, info->netns_ino);
+
if (info->load_time) {
char buf[32];
@@ -287,6 +289,7 @@ static void print_prog_plain(struct bpf_prog_info *info, int fd)
printf("tag ");
fprint_hex(stdout, info->tag, BPF_TAG_SIZE, "");
+ print_dev_plain(info->ifindex, info->netns_dev, info->netns_ino);
printf("\n");
if (info->load_time) {
--
2.15.1
^ permalink raw reply related
* [PATCH bpf-next v3 9/9] selftests/bpf: test device info reporting for bound progs
From: Jakub Kicinski @ 2017-12-28 2:39 UTC (permalink / raw)
To: netdev, daniel, alexei.starovoitov; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171228023911.4251-1-jakub.kicinski@netronome.com>
Check if bound programs report correct device info. Test
in local namespace, in remote one, back to the local ns,
remove the device and check that information is cleared.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
--
v2:
- check the error code from "prog show pin XX" with device
removed is -ENODEV.
---
tools/testing/selftests/bpf/test_offload.py | 112 +++++++++++++++++++++++++---
1 file changed, 101 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index c940505c2978..e3c750f17cb8 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -18,6 +18,8 @@ import argparse
import json
import os
import pprint
+import random
+import string
import subprocess
import time
@@ -27,6 +29,7 @@ bpf_test_dir = os.path.dirname(os.path.realpath(__file__))
pp = pprint.PrettyPrinter()
devs = [] # devices we created for clean up
files = [] # files to be removed
+netns = [] # net namespaces to be removed
def log_get_sec(level=0):
return "*" * (log_level + level)
@@ -128,22 +131,25 @@ files = [] # files to be removed
if f in files:
files.remove(f)
-def tool(name, args, flags, JSON=True, fail=True):
+def tool(name, args, flags, JSON=True, ns="", fail=True):
params = ""
if JSON:
params += "%s " % (flags["json"])
- ret, out = cmd(name + " " + params + args, fail=fail)
+ if ns != "":
+ ns = "ip netns exec %s " % (ns)
+
+ ret, out = cmd(ns + name + " " + params + args, fail=fail)
if JSON and len(out.strip()) != 0:
return ret, json.loads(out)
else:
return ret, out
-def bpftool(args, JSON=True, fail=True):
- return tool("bpftool", args, {"json":"-p"}, JSON=JSON, fail=fail)
+def bpftool(args, JSON=True, ns="", fail=True):
+ return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
-def bpftool_prog_list(expected=None):
- _, progs = bpftool("prog show", JSON=True, fail=True)
+def bpftool_prog_list(expected=None, ns=""):
+ _, progs = bpftool("prog show", JSON=True, ns=ns, fail=True)
if expected is not None:
if len(progs) != expected:
fail(True, "%d BPF programs loaded, expected %d" %
@@ -158,13 +164,13 @@ files = [] # files to be removed
time.sleep(0.05)
raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs))
-def ip(args, force=False, JSON=True, fail=True):
+def ip(args, force=False, JSON=True, ns="", fail=True):
if force:
args = "-force " + args
- return tool("ip", args, {"json":"-j"}, JSON=JSON, fail=fail)
+ return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail)
-def tc(args, JSON=True, fail=True):
- return tool("tc", args, {"json":"-p"}, JSON=JSON, fail=fail)
+def tc(args, JSON=True, ns="", fail=True):
+ return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
def ethtool(dev, opt, args, fail=True):
return cmd("ethtool %s %s %s" % (opt, dev["ifname"], args), fail=fail)
@@ -178,6 +184,15 @@ files = [] # files to be removed
def bpf_bytecode(bytecode):
return "bytecode \"%s\"" % (bytecode)
+def mknetns(n_retry=10):
+ for i in range(n_retry):
+ name = ''.join([random.choice(string.ascii_letters) for i in range(8)])
+ ret, _ = ip("netns add %s" % (name), fail=False)
+ if ret == 0:
+ netns.append(name)
+ return name
+ return None
+
class DebugfsDir:
"""
Class for accessing DebugFS directories as a dictionary.
@@ -237,6 +252,8 @@ files = [] # files to be removed
self.dev = self._netdevsim_create()
devs.append(self)
+ self.ns = ""
+
self.dfs_dir = '/sys/kernel/debug/netdevsim/%s' % (self.dev['ifname'])
self.dfs_refresh()
@@ -257,7 +274,7 @@ files = [] # files to be removed
def remove(self):
devs.remove(self)
- ip("link del dev %s" % (self.dev["ifname"]))
+ ip("link del dev %s" % (self.dev["ifname"]), ns=self.ns)
def dfs_refresh(self):
self.dfs = DebugfsDir(self.dfs_dir)
@@ -285,6 +302,11 @@ files = [] # files to be removed
time.sleep(0.05)
raise Exception("Time out waiting for program counts to stabilize want %d/%d, have %d bound, %d loaded" % (bound, total, nbound, nprogs))
+ def set_ns(self, ns):
+ name = "1" if ns == "" else ns
+ ip("link set dev %s netns %s" % (self.dev["ifname"], name), ns=self.ns)
+ self.ns = ns
+
def set_mtu(self, mtu, fail=True):
return ip("link set dev %s mtu %d" % (self.dev["ifname"], mtu),
fail=fail)
@@ -372,6 +394,8 @@ files = [] # files to be removed
dev.remove()
for f in files:
cmd("rm -f %s" % (f))
+ for ns in netns:
+ cmd("ip netns delete %s" % (ns))
def pin_prog(file_name, idx=0):
progs = bpftool_prog_list(expected=(idx + 1))
@@ -381,6 +405,35 @@ files = [] # files to be removed
return file_name, bpf_pinned(file_name)
+def check_dev_info(other_ns, ns, pin_file=None, removed=False):
+ if removed:
+ bpftool_prog_list(expected=0)
+ ret, err = bpftool("prog show pin %s" % (pin_file), fail=False)
+ fail(ret == 0, "Showing prog with removed device did not fail")
+ fail(err["error"].find("No such device") == -1,
+ "Showing prog with removed device expected ENODEV, error is %s" %
+ (err["error"]))
+ return
+ progs = bpftool_prog_list(expected=int(not removed), ns=ns)
+ prog = progs[0]
+
+ fail("dev" not in prog.keys(), "Device parameters not reported")
+ dev = prog["dev"]
+ fail("ifindex" not in dev.keys(), "Device parameters not reported")
+ fail("ns_dev" not in dev.keys(), "Device parameters not reported")
+ fail("ns_inode" not in dev.keys(), "Device parameters not reported")
+
+ if not removed and not other_ns:
+ fail("ifname" not in dev.keys(), "Ifname not reported")
+ fail(dev["ifname"] != sim["ifname"],
+ "Ifname incorrect %s vs %s" % (dev["ifname"], sim["ifname"]))
+ else:
+ fail("ifname" in dev.keys(), "Ifname is reported for other ns")
+ if removed:
+ fail(dev["ifindex"] != 0, "Device perameters not zero on removed")
+ fail(dev["ns_dev"] != 0, "Device perameters not zero on removed")
+ fail(dev["ns_inode"] != 0, "Device perameters not zero on removed")
+
# Parse command line
parser = argparse.ArgumentParser()
parser.add_argument("--log", help="output verbose log to given file")
@@ -417,6 +470,12 @@ samples = ["sample_ret0.o"]
skip(ret != 0, "sample %s/%s not found, please compile it" %
(bpf_test_dir, s))
+# Check if net namespaces seem to work
+ns = mknetns()
+skip(ns is None, "Could not create a net namespace")
+cmd("ip netns delete %s" % (ns))
+netns = []
+
try:
obj = bpf_obj("sample_ret0.o")
bytecode = bpf_bytecode("1,6 0 0 4294967295,")
@@ -549,6 +608,8 @@ samples = ["sample_ret0.o"]
progs = bpftool_prog_list(expected=1)
fail(ipl["xdp"]["prog"]["id"] != progs[0]["id"],
"Loaded program has wrong ID")
+ fail("dev" in progs[0].keys(),
+ "Device parameters reported for non-offloaded program")
start_test("Test XDP prog replace with bad flags...")
ret, _ = sim.set_xdp(obj, "offload", force=True, fail=False)
@@ -673,6 +734,35 @@ samples = ["sample_ret0.o"]
fail(time_diff < delay_sec, "Removal process took %s, expected %s" %
(time_diff, delay_sec))
+ # Remove all pinned files and reinstantiate the netdev
+ clean_up()
+ bpftool_prog_list_wait(expected=0)
+
+ sim = NetdevSim()
+ sim.set_ethtool_tc_offloads(True)
+ sim.set_xdp(obj, "offload")
+
+ start_test("Test bpftool bound info reporting (own ns)...")
+ check_dev_info(False, "")
+
+ start_test("Test bpftool bound info reporting (other ns)...")
+ ns = mknetns()
+ sim.set_ns(ns)
+ check_dev_info(True, "")
+
+ start_test("Test bpftool bound info reporting (remote ns)...")
+ check_dev_info(False, ns)
+
+ start_test("Test bpftool bound info reporting (back to own ns)...")
+ sim.set_ns("")
+ check_dev_info(False, "")
+
+ pin_file, _ = pin_prog("/sys/fs/bpf/tmp")
+ sim.remove()
+
+ start_test("Test bpftool bound info reporting (removed dev)...")
+ check_dev_info(True, "", pin_file=pin_file, removed=True)
+
print("%s: OK" % (os.path.basename(__file__)))
finally:
--
2.15.1
^ permalink raw reply related
* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Alexei Starovoitov @ 2017-12-28 3:45 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Alexei Starovoitov, Josef Bacik, rostedt, mingo, davem, netdev,
linux-kernel, ast, kernel-team, daniel, linux-btrfs, darrick.wong,
Josef Bacik, Akinobu Mita
In-Reply-To: <20171228113434.eb182c348fc69853fec934ee@kernel.org>
On 12/27/17 6:34 PM, Masami Hiramatsu wrote:
> On Wed, 27 Dec 2017 14:46:24 -0800
> Alexei Starovoitov <ast@fb.com> wrote:
>
>> On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
>>> On Tue, 26 Dec 2017 17:57:32 -0800
>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>
>>>> On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
>>>>> Check whether error injectable event is on function entry or not.
>>>>> Currently it checks the event is ftrace-based kprobes or not,
>>>>> but that is wrong. It should check if the event is on the entry
>>>>> of target function. Since error injection will override a function
>>>>> to just return with modified return value, that operation must
>>>>> be done before the target function starts making stackframe.
>>>>>
>>>>> As a side effect, bpf error injection is no need to depend on
>>>>> function-tracer. It can work with sw-breakpoint based kprobe
>>>>> events too.
>>>>>
>>>>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> ---
>>>>> kernel/trace/Kconfig | 2 --
>>>>> kernel/trace/bpf_trace.c | 6 +++---
>>>>> kernel/trace/trace_kprobe.c | 8 +++++---
>>>>> kernel/trace/trace_probe.h | 12 ++++++------
>>>>> 4 files changed, 14 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
>>>>> index ae3a2d519e50..6400e1bf97c5 100644
>>>>> --- a/kernel/trace/Kconfig
>>>>> +++ b/kernel/trace/Kconfig
>>>>> @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
>>>>> config BPF_KPROBE_OVERRIDE
>>>>> bool "Enable BPF programs to override a kprobed function"
>>>>> depends on BPF_EVENTS
>>>>> - depends on KPROBES_ON_FTRACE
>>>>> depends on HAVE_KPROBE_OVERRIDE
>>>>> - depends on DYNAMIC_FTRACE_WITH_REGS
>>>>> default n
>>>>> help
>>>>> Allows BPF to override the execution of a probed function and
>>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>>> index f6d2327ecb59..d663660f8392 100644
>>>>> --- a/kernel/trace/bpf_trace.c
>>>>> +++ b/kernel/trace/bpf_trace.c
>>>>> @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
>>>>> int ret = -EEXIST;
>>>>>
>>>>> /*
>>>>> - * Kprobe override only works for ftrace based kprobes, and only if they
>>>>> - * are on the opt-in list.
>>>>> + * Kprobe override only works if they are on the function entry,
>>>>> + * and only if they are on the opt-in list.
>>>>> */
>>>>> if (prog->kprobe_override &&
>>>>> - (!trace_kprobe_ftrace(event->tp_event) ||
>>>>> + (!trace_kprobe_on_func_entry(event->tp_event) ||
>>>>> !trace_kprobe_error_injectable(event->tp_event)))
>>>>> return -EINVAL;
>>>>>
>>>>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>>>>> index 91f4b57dab82..265e3e27e8dc 100644
>>>>> --- a/kernel/trace/trace_kprobe.c
>>>>> +++ b/kernel/trace/trace_kprobe.c
>>>>> @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
>>>>> return nhit;
>>>>> }
>>>>>
>>>>> -int trace_kprobe_ftrace(struct trace_event_call *call)
>>>>> +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
>>>>> {
>>>>> struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
>>>>> - return kprobe_ftrace(&tk->rp.kp);
>>>>> +
>>>>> + return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
>>>>> + tk->rp.kp.offset);
>>>>
>>>> That would be nice, but did you test this?
>>>
>>> Yes, because the jprobe, which was only official user of modifying execution
>>> path using kprobe, did same way to check. (and kretprobe also does it)
>>>
>>>> My understanding that kprobe will restore all regs and
>>>> here we need to override return ip _and_ value.
>>>
>>> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
>>>
>>>> Could you add a patch with the test the way Josef did
>>>> or describe the steps to test this new mode?
>>>
>>> Would you mean below patch? If so, it should work without any change.
>>>
>>> [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return
>>
>> yeah. I expect bpf_override_return test to work as-is.
>> I'm asking for the test for new functionality added by this patch.
>> In particular kprobe on func entry without ftrace.
>> How did you test it?
>
> This function is used in kretprobe and jprobe. Jprobe was the user of
> "modifying instruction pointer to another function" in kprobes.
> If it doesn't work, jprobe also doesn't work, this means you can not
> modify IP by kprobes anymore.
> Anyway, until linux-4.13, that was well tested by kprobe smoke test.
>
>> and how I can repeat the test?
>> I'm still not sure that it works correctly.
>
> That works correctly because it checks given address is on the entry
> point (the 1st instruction) of a function, using kallsyms.
>
> The reason why I made another flag for ftrace was, there are 2 modes
> for ftrace dynamic instrumentation, fentry and mcount.
> With new fentry mode, ftrace will be put on the first instruction
> of the function, so it will work as you expected.
> With traditional gcc mcount, ftrace will be called after making call
> frame for _mcount(). This means if you modify ip, it will not work
> or cause a trouble because _mcount call frame is still on the stack.
>
> So, current ftrace-based checker doesn't work, it depends on the case.
> Of course, in most case, kernel will be build in new gcc which
> supports fentry, but there is no guarantee.
I don't think that's the case. My reading of current
trace_kprobe_ftrace() -> arch_check_ftrace_location()
is that it will not be true for old mcount case.
As far as the rest of your arguments it very much puzzles me that
you claim that this patch suppose to work based on historical
reasoning whereas you did NOT test it.
^ permalink raw reply
* Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
From: Alexei Starovoitov @ 2017-12-28 3:49 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Alexei Starovoitov, Josef Bacik, rostedt, mingo, davem, netdev,
linux-kernel, ast, kernel-team, daniel, linux-btrfs, darrick.wong,
Josef Bacik, Akinobu Mita
In-Reply-To: <20171228103849.9be1a4507298b280b9df9c20@kernel.org>
On 12/27/17 5:38 PM, Masami Hiramatsu wrote:
> On Wed, 27 Dec 2017 14:49:46 -0800
> Alexei Starovoitov <ast@fb.com> wrote:
>
>> On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
>>> On Tue, 26 Dec 2017 18:12:56 -0800
>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>
>>>> On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
>>>>> Support in-kernel fault-injection framework via debugfs.
>>>>> This allows you to inject a conditional error to specified
>>>>> function using debugfs interfaces.
>>>>>
>>>>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> ---
>>>>> Documentation/fault-injection/fault-injection.txt | 5 +
>>>>> kernel/Makefile | 1
>>>>> kernel/fail_function.c | 169 +++++++++++++++++++++
>>>>> lib/Kconfig.debug | 10 +
>>>>> 4 files changed, 185 insertions(+)
>>>>> create mode 100644 kernel/fail_function.c
>>>>>
>>>>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
>>>>> index 918972babcd8..6243a588dd71 100644
>>>>> --- a/Documentation/fault-injection/fault-injection.txt
>>>>> +++ b/Documentation/fault-injection/fault-injection.txt
>>>>> @@ -30,6 +30,11 @@ o fail_mmc_request
>>>>> injects MMC data errors on devices permitted by setting
>>>>> debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
>>>>>
>>>>> +o fail_function
>>>>> +
>>>>> + injects error return on specific functions by setting debugfs entries
>>>>> + under /sys/kernel/debug/fail_function. No boot option supported.
>>>>
>>>> I like it.
>>>> Could you document it a bit better?
>>>
>>> Yes, I will do in next series.
>>>
>>>> In particular retval is configurable, but without an example no one
>>>> will be able to figure out how to use it.
>>>
>>> Ah, right. BTW, as I pointed in the covermail, should we store the
>>> expected error value range into the injectable list? e.g.
>>>
>>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
>>>
>>> And provide APIs to check/get it.
>>
>> I'm afraid such check would be too costly.
>> Right now we have only two functions marked but I expect hundreds more
>> will be added in the near future as soon as developers realize the
>> potential of such error injection.
>> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
>> Multiple by 1k and we have 8k of data spent on marks.
>> If we add max/min range marks that doubles it for very little use.
>> I think marking function only is enough.
>
> Sorry, I don't think so.
> Even if it takes 16 bytes more for each points, I don't think it is
> any overhead for machines in these days. Even if so, we can provide
> a kconfig to reduce it.
> I mean, we are living in GB-order memory are, and it will be bigger
> in the future. Why we have to worry about hundreds of 16bytes memory
> pieces? It will take a few KB, and even if we mark thousands of
> functions, it never reaches 1MB, in GB memory pool. :)
>
> Of course, for many small-footprint embedded devices (like having
> less than 128MB memory), this feature can be a overhead. But they
> can cut off the table by kconfig.
I still disagree on wasting 16-byte * num_of_funcs of .data here.
The trade-off of usability vs memory just not worth it. Sorry.
Please focus on testing your changes instead.
^ permalink raw reply
* [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
From: John Fastabend @ 2017-12-28 3:50 UTC (permalink / raw)
To: jakub.kicinski, davem, mst; +Cc: xiyou.wangcong, jiri, netdev
When running consumer and/or producer operations and empty checks in
parallel its possible to have the empty check run past the end of the
array. The scenario occurs when an empty check is run while
__ptr_ring_discard_one() is in progress. Specifically after the
consumer_head is incremented but before (consumer_head >= ring_size)
check is made and the consumer head is zeroe'd.
To resolve this, without having to rework how consumer/producer ops
work on the array, simply add an extra dummy slot to the end of the
array. Even if we did a rework to avoid the extra slot it looks
like the normal case checks would suffer some so best to just
allocate an extra pointer.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
include/linux/ptr_ring.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 6866df4..13fb06a 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -447,7 +447,12 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
{
- return kcalloc(size, sizeof(void *), gfp);
+ /* Allocate an extra dummy element at end of ring to avoid consumer head
+ * or produce head access past the end of the array. Possible when
+ * producer/consumer operations and __ptr_ring_peek operations run in
+ * parallel.
+ */
+ return kcalloc(size + 1, sizeof(void *), gfp);
}
static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
^ permalink raw reply related
* Re: [PATCH net-next] nfp: bpf: allocate vNIC priv for keeping track of the offloaded program
From: Jakub Kicinski @ 2017-12-28 4:11 UTC (permalink / raw)
To: David Miller; +Cc: netdev, oss-drivers
In-Reply-To: <20171227.203811.663878227338123152.davem@davemloft.net>
On Wed, 27 Dec 2017 20:38:11 -0500 (EST), David Miller wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Wed, 27 Dec 2017 15:36:49 -0800
>
> > After TC offloads were converted to callbacks we have no choice
> > but keep track of the offloaded filter in the driver.
> >
> > Since this change came a little late in the release cycle
> > there were a number of conflicts and allocation of vNIC priv
> > structure seems to have slipped away in linux-next.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>
> Oh well, I thought I resolved the merge conflict properly in this
> case. Apparently not :-)
The resolution looks suspiciously similar to what Stephen did in
linux-next, but I wasn't sure if you reuse his resolutions hence
no Fixes: 903628bbc3a7 ;)
^ permalink raw reply
* Re: [patch net-next v5 05/10] net: sched: keep track of offloaded filters and check tc offload feature
From: Jakub Kicinski @ 2017-12-28 4:12 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, ogerlitz, john.fastabend, daniel, dsahern
In-Reply-To: <20171226141604.1605-6-jiri@resnulli.us>
On Tue, 26 Dec 2017 15:15:59 +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> During block bind, we need to check tc offload feature. If it is
> disabled yet still the block contains offloaded filters, forbid the
> bind. Also forbid to register callback for a block that already
> containes offloaded filters, as the play back is not supported now.
> For keeping track of offloaded filters there is a new counter
> introduced, alongside with couple of helpers called from cls_* code.
> These helpers set and clear TCA_CLS_FLAGS_IN_HW flag.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
LGTM, thanks!
^ permalink raw reply
* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Steven Rostedt @ 2017-12-28 4:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masami Hiramatsu, Alexei Starovoitov, Josef Bacik, mingo, davem,
netdev, linux-kernel, ast, kernel-team, daniel, linux-btrfs,
darrick.wong, Josef Bacik, Akinobu Mita
In-Reply-To: <03e0ebb7-0b2a-4235-3408-c0d59a1ba4c2@fb.com>
On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov <ast@fb.com> wrote:
> I don't think that's the case. My reading of current
> trace_kprobe_ftrace() -> arch_check_ftrace_location()
> is that it will not be true for old mcount case.
In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.
>
> As far as the rest of your arguments it very much puzzles me that
> you claim that this patch suppose to work based on historical
> reasoning whereas you did NOT test it.
I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?
-- Steve
^ permalink raw reply
* Re: [PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.9
From: Jakub Kicinski @ 2017-12-28 4:19 UTC (permalink / raw)
To: Roman Gushchin
Cc: netdev, linux-kernel, kernel-team, Alexei Starovoitov,
Daniel Borkmann
In-Reply-To: <20171227191629.4920-2-guro@fb.com>
On Wed, 27 Dec 2017 19:16:29 +0000, Roman Gushchin wrote:
> Bpftool build is broken with binutils version 2.29 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
>
> Fix this by adding a new "feature" to the tools/build/features
> infrastructure and make it responsible for decision which
> disassembler() function signature to use.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
^ permalink raw reply
* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Alexei Starovoitov @ 2017-12-28 4:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Alexei Starovoitov, Josef Bacik, mingo, davem,
netdev, linux-kernel, ast, kernel-team, daniel, linux-btrfs,
darrick.wong, Josef Bacik, Akinobu Mita
In-Reply-To: <20171227231644.168abc0f@vmware.local.home>
On 12/27/17 8:16 PM, Steven Rostedt wrote:
> On Wed, 27 Dec 2017 19:45:42 -0800
> Alexei Starovoitov <ast@fb.com> wrote:
>
>> I don't think that's the case. My reading of current
>> trace_kprobe_ftrace() -> arch_check_ftrace_location()
>> is that it will not be true for old mcount case.
>
> In the old mcount case, you can't use ftrace to return without calling
> the function. That is, no modification of the return ip, unless you
> created a trampoline that could handle arbitrary stack frames, and
> remove them from the stack before returning back to the function.
correct. I was saying that trace_kprobe_ftrace() won't let us do
bpf_override_return with old mcount.
>>
>> As far as the rest of your arguments it very much puzzles me that
>> you claim that this patch suppose to work based on historical
>> reasoning whereas you did NOT test it.
>
> I believe that Masami is saying that the modification of the IP from
> kprobes has been very well tested. But I'm guessing that you still want
> a test case for using kprobes in this particular instance. It's not the
> implementation of modifying the IP that you are worried about, but the
> implementation of BPF using it in this case. Right?
exactly. No doubt that old code works.
But it doesn't mean that bpf_override_return() will continue to
work in kprobes that are not ftrace based.
I suspect Josef's existing test case will cover this situation.
Probably only special .config is needed to disable ftrace, so
"kprobe on entry but not ftrace" check will kick in.
But I didn't get an impression that this situation was tested.
Instead I see only logical reasoning that it's _supposed_ to work.
That's not enough.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox