* Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
From: David Ahern @ 2018-11-09 0:42 UTC (permalink / raw)
To: Paweł Staszewski, Jesper Dangaard Brouer; +Cc: netdev, Yoel Caspersen
In-Reply-To: <8cb2630e-e7fe-cd44-7798-070f2e6d348a@itcare.pl>
On 11/8/18 5:40 PM, Paweł Staszewski wrote:
>
>
> On 08.11.2018 at 17:32, David Ahern wrote:
>> On 11/8/18 9:27 AM, Paweł Staszewski wrote:
>>>>> What hardware is this?
>>>>>
>>> mellanox connectx 4
>>> ethtool -i enp175s0f0
>>> driver: mlx5_core
>>> version: 5.0-0
>>> firmware-version: 12.21.1000 (SM_2001000001033)
>>> expansion-rom-version:
>>> bus-info: 0000:af:00.0
>>> supports-statistics: yes
>>> supports-test: yes
>>> supports-eeprom-access: no
>>> supports-register-dump: no
>>> supports-priv-flags: yes
>>>
>>> ethtool -i enp175s0f1
>>> driver: mlx5_core
>>> version: 5.0-0
>>> firmware-version: 12.21.1000 (SM_2001000001033)
>>> expansion-rom-version:
>>> bus-info: 0000:af:00.1
>>> supports-statistics: yes
>>> supports-test: yes
>>> supports-eeprom-access: no
>>> supports-register-dump: no
>>> supports-priv-flags: yes
>>>
>>>>> Start with:
>>>>>
>>>>> echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
>>>>> cat /sys/kernel/debug/tracing/trace_pipe
>>>> cat /sys/kernel/debug/tracing/trace_pipe
>>>> <idle>-0 [045] ..s. 68469.467752: xdp_devmap_xmit:
>>>> ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0 drops=1
>>>> from_ifindex=4 to_ifindex=5 err=-6
>> FIB lookup is good, the redirect is happening, but the mlx5 driver does
>> not like it.
>>
>> I think the -6 is coming from the mlx5 driver and the packet is getting
>> dropped. Perhaps this check in mlx5e_xdp_xmit:
>>
>> if (unlikely(sq_num >= priv->channels.num))
>> return -ENXIO;
> I removed that check and recompiled, but now running xdp_fwd gives me
> a kernel panic :)
Jesper or one of the Mellanox folks needs to respond about the config
needed to run XDP with this NIC. I don't have a 40G or 100G card to play
with.
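For anyone following along, the guess above can be modelled in plain C. This is only a userspace sketch, not driver code: pick_xdp_sq() and the channel count of 28 used below are hypothetical, and the idea that the send queue index is derived from the CPU number running the redirect is an assumption that nobody in this thread has confirmed. The trace shows CPU 045 and err=-6, and ENXIO is 6 on Linux.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace model of the bounds check quoted above (not driver code).
 * Assumption: the send queue index comes from the CPU that runs the
 * redirect, so a CPU number >= the number of channels is rejected.
 */
int pick_xdp_sq(int cpu, int num_channels)
{
	assert(num_channels > 0);

	if (cpu < 0 || cpu >= num_channels)
		return -ENXIO;	/* -6 on Linux, same value as err=-6 in the trace */

	return cpu;		/* queue index to transmit on */
}
```

With the hypothetical 28 channels, pick_xdp_sq(45, 28) returns -ENXIO while pick_xdp_sq(5, 28) returns 5. If the real driver behaves like this model, any redirect running on a CPU beyond the configured channel count would be dropped the same way.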
^ permalink raw reply
* RE:(2) (2) (2) (2) [Kernel][NET] Bug report on packet defragmenting
From: 배석진 @ 2018-11-09 0:42 UTC (permalink / raw)
To: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <dce9a158-8e7b-1381-c7ff-b590771f95fb@gmail.com>
>Thanks for testing.
>
>This is not a pristine net-next tree, this dump seems unrelated to the patch ?
Yes, it looks that way.
But the panic only occurs with your patch applied, even right after receiving a packet.
Without the patch there is no problem apart from the defrag issue. It's odd.. :p
I couldn't debug further because I have other problems to deal with.
^ permalink raw reply
* Re: [PATCH bpf-next v4 02/13] bpf: btf: Add BTF_KIND_FUNC
From: Yonghong Song @ 2018-11-09 0:40 UTC (permalink / raw)
To: Edward Cree, Alexei Starovoitov, daniel@iogearbox.net,
netdev@vger.kernel.org
Cc: Kernel Team, Martin Lau
In-Reply-To: <8a025d6e-64af-1d37-6cc2-692e9ce3f760@solarflare.com>
On 11/8/18 12:52 PM, Edward Cree wrote:
> On 08/11/18 20:36, Yonghong Song wrote:
>> This patch adds BTF_KIND_FUNC support to the type section.
>> BTF_KIND_FUNC is used to specify the signature of a
>> defined subprogram or the pointee of a function pointer.
>>
>> In BTF, the function type related data structures are
>> struct bpf_param {
>> __u32 name_off; /* parameter name */
>> __u32 type; /* parameter type */
>> };
>> struct bpf_type {
>> __u32 name_off; /* function name */
>> __u32 info; /* BTF_KIND_FUNC and num of parameters (#vlen) */
>> __u32 type; /* return type */
>> }
>> The data layout of the function type:
>> struct bpf_type
>> #vlen number of bpf_param's
>>
>> For a defined subprogram with valid function body,
>> . function name and all parameter names except the vararg
>> must be valid C identifier.
> Given that there's an intention to support other frontends besides
> C, what's the reason for this restriction?
This (C) is the typical usage today. If other frontends later
generate bpf programs with more relaxed symbol name requirements,
we can certainly relax the rule.
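The rule as stated (names must be valid C identifiers) can be sketched as below. This is a standalone illustration, not the check from the patch: is_valid_c_identifier() is a hypothetical name, and the sketch assumes the default "C" locale for the ctype calls.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if name is a non-empty C identifier. */
bool is_valid_c_identifier(const char *name)
{
	const unsigned char *p = (const unsigned char *)name;

	assert(name != NULL);

	/* First character: a letter or an underscore (rejects ""). */
	if (!(isalpha(*p) || *p == '_'))
		return false;

	/* Remaining characters: letters, digits or underscores. */
	for (p++; *p; p++) {
		if (!(isalnum(*p) || *p == '_'))
			return false;
	}
	return true;
}
```

A name such as "ndo_start_xmit" passes, while "9lives", "my-func" and the empty string are rejected. A frontend with looser naming rules would need this kind of check relaxed, which is the point raised above.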
>
>> For the pointee of a function pointer,
>> . function name and all parameter names will
>> have name_off = 0 to indicate a non-existing name.
> Why can't function pointer parameters have names?
Currently, neither bcc nor llvm retains function pointer
argument names in dwarf. For LLVM, the IR generation for function pointer
types discards the argument names. So I added the check because
llvm does not generate them.
We can relax the restriction later if the compiler starts
to keep argument names in the IR.
> E.g. imagine something like struct net_device_ops. All those
> function pointers have named parameters and that's relevant info
> when debugging.
>
> -Ed
>
^ permalink raw reply
* Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
From: Paweł Staszewski @ 2018-11-09 0:40 UTC (permalink / raw)
To: David Ahern, Jesper Dangaard Brouer; +Cc: netdev, Yoel Caspersen
In-Reply-To: <68cc8279-5e3f-85c2-673c-aa3d4a47b353@gmail.com>
On 08.11.2018 at 17:32, David Ahern wrote:
> On 11/8/18 9:27 AM, Paweł Staszewski wrote:
>>>> What hardware is this?
>>>>
>> mellanox connectx 4
>> ethtool -i enp175s0f0
>> driver: mlx5_core
>> version: 5.0-0
>> firmware-version: 12.21.1000 (SM_2001000001033)
>> expansion-rom-version:
>> bus-info: 0000:af:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: no
>> supports-register-dump: no
>> supports-priv-flags: yes
>>
>> ethtool -i enp175s0f1
>> driver: mlx5_core
>> version: 5.0-0
>> firmware-version: 12.21.1000 (SM_2001000001033)
>> expansion-rom-version:
>> bus-info: 0000:af:00.1
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: no
>> supports-register-dump: no
>> supports-priv-flags: yes
>>
>>>> Start with:
>>>>
>>>> echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
>>>> cat /sys/kernel/debug/tracing/trace_pipe
>>> cat /sys/kernel/debug/tracing/trace_pipe
>>> <idle>-0 [045] ..s. 68469.467752: xdp_devmap_xmit:
>>> ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0 drops=1
>>> from_ifindex=4 to_ifindex=5 err=-6
> FIB lookup is good, the redirect is happening, but the mlx5 driver does
> not like it.
>
> I think the -6 is coming from the mlx5 driver and the packet is getting
> dropped. Perhaps this check in mlx5e_xdp_xmit:
>
> if (unlikely(sq_num >= priv->channels.num))
> return -ENXIO;
I removed that check and recompiled, but now running xdp_fwd gives me
a kernel panic :)
>
>
>>> swapper 0 [045] 68493.746274: fib:fib_table_lookup: table 254 oif
>>> 0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
>>> ==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>>> 7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
>>>
>>> swapper 0 [045] 68494.770287: fib:fib_table_lookup: table 254 oif
>>> 0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
>>> ==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>>> 7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
>>>
>>> swapper 0 [045] 68495.794304: fib:fib_table_lookup: table 254 oif
>>> 0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
>>> ==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>>> 7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
>>>
>>> swapper 0 [045] 68496.818308: fib:fib_table_lookup: table 254 oif
>>> 0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
>>> ==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>>> 7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
>>>
>>> swapper 0 [045] 68497.842313: fib:fib_table_lookup: table 254 oif
>>> 0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
>>> ==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>>> 7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
^ permalink raw reply
* Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code
From: Timur Tabi @ 2018-11-09 0:37 UTC (permalink / raw)
To: Andrew Lunn, Wang Dongsheng; +Cc: yu.zheng, f.fainelli, rjw, linux-acpi, netdev
In-Reply-To: <20181108232353.GL5259@lunn.ch>
On 11/8/18 5:23 PM, Andrew Lunn wrote:
> I don't know much about ACPI. I do know DT. MDIO busses can have
> multiple PHYs on them. Is the following valid to list two PHYs?
>
> Device (MDIO) {
> Name (_DSD, Package () {
> ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> Package () { Package () { "ethernet-phy@0", PHY0 }, }
> })
> Name (PHY0, Package() {
> ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "reg", 0x0 }, }
> })
> Name (_DSD, Package () {
> ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> Package () { Package () { "ethernet-phy@10", PHY1 }, }
> })
> Name (PHY1, Package() {
> ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "reg", 0x10 }, }
> })
> }
You can't have the same DSD twice. It would need to look like this:
Name (PHY1, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0, 0x10 }, }
})
^ permalink raw reply
* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-11-09 10:05 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Tiwei Bie, virtualization, linux-kernel, netdev, virtio-dev, wexu,
jfreimann
In-Reply-To: <20181108225858-mutt-send-email-mst@kernel.org>
On 2018/11/9 12:00 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 09, 2018 at 10:30:50AM +0800, Jason Wang wrote:
>> On 2018/11/8 11:56 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 08, 2018 at 07:51:48PM +0800, Tiwei Bie wrote:
>>>> On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
>>>>> On 2018/11/8 9:38 AM, Tiwei Bie wrote:
>>>>>>>> +
>>>>>>>> + if (vq->vq.num_free < descs_used) {
>>>>>>>> + pr_debug("Can't add buf len %i - avail = %i\n",
>>>>>>>> + descs_used, vq->vq.num_free);
>>>>>>>> + /* FIXME: for historical reasons, we force a notify here if
>>>>>>>> + * there are outgoing parts to the buffer. Presumably the
>>>>>>>> + * host should service the ring ASAP. */
>>>>>>> I don't think we have a reason to do this for packed ring.
>>>>>>> No historical baggage there, right?
>>>>>> Based on the original commit log, it seems that the notify here
>>>>>> is just an "optimization". But I don't quite understand what does
>>>>>> the "the heuristics which KVM uses" refer to. If it's safe to drop
>>>>>> this in packed ring, I'd like to do it.
>>>>> According to the commit log, it seems like a workaround of lguest networking
>>>>> backend.
>>>> Do you know why removing this notify in Tx will break "the
>>>> heuristics which KVM uses"? Or what does "the heuristics
>>>> which KVM uses" refer to?
>>> Yes. QEMU has a mode where it disables notifications and processes TX
>>> ring periodically from a timer. It's off by default but used to be on
>>> by default a long time ago. If ring becomes full this causes traffic
>>> stalls.
>>
>> Do you mean tx-timer? If yes, we can still enable it for packed ring
> Yes we can but I doubt anyone does.
>
>> and the
>> timer will finally fired and we can go.
> on tx ring full we probably don't want to wait for timer.
> But I think we can just prevent qemu from using tx timer
> with virtio 1.
Yes, we can.
Thanks
>
>>> As a work-around Rusty put in this hack to kick on ring full
>>> even with notifications disabled.
>>
>> From the commit log it looks more like a performance workaround instead of a
>> bug fix.
> it's a quality of implementation issue, yes.
>
>>> It's easy enough to make sure QEMU
>>> does not combine devices with packed ring support with the timer hack.
>>> And I am guessing it's safe enough to also block that option completely
>>> e.g. when virtio 1.0 is enabled.
>>
>> I agree.
>>
>> Thanks
>>
>>
>>>>> I agree to drop it, we should not have such burden.
>>>>>
>>>>> But we should notice that, with this removed, the compare between packed vs
>>>>> split is kind of unfair. Consider the removal of lguest support recently,
>>>>> maybe we can drop this for split ring as well?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
>>>>>> Author: Rusty Russell<rusty@rustcorp.com.au>
>>>>>> Date: Fri Jul 25 12:06:04 2008 -0500
>>>>>>
>>>>>> virtio: don't always force a notification when ring is full
>>>>>> We force notification when the ring is full, even if the host has
>>>>>> indicated it doesn't want to know. This seemed like a good idea at
>>>>>> the time: if we fill the transmit ring, we should tell the host
>>>>>> immediately.
>>>>>> Unfortunately this logic also applies to the receiving ring, which is
>>>>>> refilled constantly. We should introduce real notification thesholds
>>>>>> to replace this logic. Meanwhile, removing the logic altogether breaks
>>>>>> the heuristics which KVM uses, so we use a hack: only notify if there are
>>>>>> outgoing parts of the new buffer.
>>>>>> Here are the number of exits with lguest's crappy network implementation:
>>>>>> Before:
>>>>>> network xmit 7859051 recv 236420
>>>>>> After:
>>>>>> network xmit 7858610 recv 118136
>>>>>> Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>
>>>>>>
>>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>>> index 72bf8bc09014..21d9a62767af 100644
>>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>>> @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
>>>>>> if (vq->num_free < out + in) {
>>>>>> pr_debug("Can't add buf len %i - avail = %i\n",
>>>>>> out + in, vq->num_free);
>>>>>> - /* We notify*even if* VRING_USED_F_NO_NOTIFY is set here. */
>>>>>> - vq->notify(&vq->vq);
>>>>>> + /* FIXME: for historical reasons, we force a notify here if
>>>>>> + * there are outgoing parts to the buffer. Presumably the
>>>>>> + * host should service the ring ASAP. */
>>>>>> + if (out)
>>>>>> + vq->notify(&vq->vq);
>>>>>> END_USE(vq);
>>>>>> return -ENOSPC;
>>>>>> }
>>>>>>
>>>>>>
^ permalink raw reply
* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-11-09 10:04 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20181108225555-mutt-send-email-mst@kernel.org>
On 2018/11/9 11:58 AM, Michael S. Tsirkin wrote:
> On Fri, Nov 09, 2018 at 10:25:28AM +0800, Jason Wang wrote:
>> On 2018/11/8 10:14 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
>>>> On 2018/11/8 9:38 AM, Tiwei Bie wrote:
>>>>>>> +
>>>>>>> + if (vq->vq.num_free < descs_used) {
>>>>>>> + pr_debug("Can't add buf len %i - avail = %i\n",
>>>>>>> + descs_used, vq->vq.num_free);
>>>>>>> + /* FIXME: for historical reasons, we force a notify here if
>>>>>>> + * there are outgoing parts to the buffer. Presumably the
>>>>>>> + * host should service the ring ASAP. */
>>>>>> I don't think we have a reason to do this for packed ring.
>>>>>> No historical baggage there, right?
>>>>> Based on the original commit log, it seems that the notify here
>>>>> is just an "optimization". But I don't quite understand what does
>>>>> the "the heuristics which KVM uses" refer to. If it's safe to drop
>>>>> this in packed ring, I'd like to do it.
>>>> According to the commit log, it seems like a workaround of lguest networking
>>>> backend. I agree to drop it, we should not have such burden.
>>>>
>>>> But we should notice that, with this removed, the compare between packed vs
>>>> split is kind of unfair.
>>> I don't think this ever triggers to be frank. When would it?
>>
>> I think it can happen e.g in the path of XDP transmission in
>> __virtnet_xdp_xmit_one():
>>
>>
>> err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC);
>> if (unlikely(err))
>> return -ENOSPC; /* Caller handle free/refcnt */
>>
> I see. We used to do it for regular xmit but stopped
> doing it. Is it fine for xdp then?
There's no traffic control in XDP, so it is the only thing we can do.
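The ring-full path under discussion can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the virtio_ring code: struct toy_vq and toy_add_buf() are hypothetical, and the logic only mirrors the hunk from the quoted 2008 commit (fail with -ENOSPC, and notify only when the buffer has outgoing parts).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a virtqueue; both fields are hypothetical. */
struct toy_vq {
	unsigned int num_free;	/* free descriptors left in the ring */
	unsigned int kicks;	/* how many times the host was notified */
};

/*
 * Model of the add-buffer path: when the ring cannot hold out + in
 * descriptors, fail with -ENOSPC and notify the host only if the
 * buffer has outgoing (driver-to-device) parts.
 */
int toy_add_buf(struct toy_vq *vq, unsigned int out, unsigned int in)
{
	assert(vq != NULL);

	if (vq->num_free < out + in) {
		if (out)
			vq->kicks++;	/* forced notify on a full TX ring */
		return -ENOSPC;
	}

	vq->num_free -= out + in;
	return 0;
}
```

In this model a transmit attempt on a full ring costs one extra notification and the caller still sees -ENOSPC, which is the situation an XDP transmit path with no queueing layer above it would run into. Dropping the forced notify would remove the kick but leave the -ENOSPC return unchanged.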
>
>>>> Consider the removal of lguest support recently,
>>>> maybe we can drop this for split ring as well?
>>>>
>>>> Thanks
>>> If it's helpful, then for sure we can drop it for virtio 1.
>>> Can you see any perf differences at all? With which device?
>>
>> I don't test but consider the case of XDP_TX in guest plus vhost_net in
>> host. Since vhost_net is half duplex, it's pretty easier to trigger this
>> condition.
>>
>> Thanks
> Sounds reasonable. Worth testing before we change things though.
Let me test and submit a patch.
Thanks
>
>>>>> commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
>>>>> Author: Rusty Russell<rusty@rustcorp.com.au>
>>>>> Date: Fri Jul 25 12:06:04 2008 -0500
>>>>>
>>>>> virtio: don't always force a notification when ring is full
>>>>> We force notification when the ring is full, even if the host has
>>>>> indicated it doesn't want to know. This seemed like a good idea at
>>>>> the time: if we fill the transmit ring, we should tell the host
>>>>> immediately.
>>>>> Unfortunately this logic also applies to the receiving ring, which is
>>>>> refilled constantly. We should introduce real notification thesholds
>>>>> to replace this logic. Meanwhile, removing the logic altogether breaks
>>>>> the heuristics which KVM uses, so we use a hack: only notify if there are
>>>>> outgoing parts of the new buffer.
>>>>> Here are the number of exits with lguest's crappy network implementation:
>>>>> Before:
>>>>> network xmit 7859051 recv 236420
>>>>> After:
>>>>> network xmit 7858610 recv 118136
>>>>> Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>> index 72bf8bc09014..21d9a62767af 100644
>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>> @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
>>>>> if (vq->num_free < out + in) {
>>>>> pr_debug("Can't add buf len %i - avail = %i\n",
>>>>> out + in, vq->num_free);
>>>>> - /* We notify*even if* VRING_USED_F_NO_NOTIFY is set here. */
>>>>> - vq->notify(&vq->vq);
>>>>> + /* FIXME: for historical reasons, we force a notify here if
>>>>> + * there are outgoing parts to the buffer. Presumably the
>>>>> + * host should service the ring ASAP. */
>>>>> + if (out)
>>>>> + vq->notify(&vq->vq);
>>>>> END_USE(vq);
>>>>> return -ENOSPC;
>>>>> }
>>>>>
>>>>>
^ permalink raw reply
* [PATCH v4 bpf-next 7/7] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC (permalink / raw)
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
This commit adds support for loading, attaching and detaching flow
dissector programs. The structure of a flow dissector program is
assumed to be the same as in the selftests:
* flow_dissector section with the main entry point
* a bunch of tail call progs
* a jmp_table map that is populated with the tail call progs
When `bpftool loadall` is called with a flow_dissector prog (i.e. when the
'type flow_dissector' argument is passed), we load and pin all programs.
The user is responsible for constructing the jump table for the tail calls.
The last argument of `bpftool attach` is made optional for this use
case.
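The optional-MAP rule can be sketched as a small standalone check. This is a toy model, not the bpftool parser: check_attach_args() is a hypothetical name, and it only looks at the words that follow PROG on the command line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of "attach PROG ATTACH_TYPE [MAP]": argv holds the words
 * that follow PROG. Returns 0 if the list is acceptable and -EINVAL
 * when the attach type or a required MAP is missing.
 */
int check_attach_args(int argc, const char *const *argv)
{
	assert(argv != NULL);

	if (argc < 1)
		return -EINVAL;		/* no ATTACH_TYPE given */

	/* flow_dissector attaches to the current netns, so no MAP follows. */
	if (strcmp(argv[0], "flow_dissector") == 0)
		return 0;

	/* Other attach types need a MAP: "id" MAP_ID or "pinned" FILE. */
	return argc >= 3 ? 0 : -EINVAL;
}
```

Here { "flow_dissector" } is accepted on its own, { "msg_verdict", "pinned", FILE } is accepted, and { "msg_verdict" } alone is rejected.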
Example:
bpftool prog loadall tools/testing/selftests/bpf/bpf_flow.o \
/sys/fs/bpf/flow type flow_dissector \
pinmaps /sys/fs/bpf/flow
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 0 0 0 0 \
value pinned /sys/fs/bpf/flow/IP
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 1 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 2 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6OP
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 3 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6FR
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 4 0 0 0 \
value pinned /sys/fs/bpf/flow/MPLS
bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 5 0 0 0 \
value pinned /sys/fs/bpf/flow/VLAN
bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector
Tested by using the above lines to load the prog in
the test_flow_dissector.sh selftest.
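A note on the "key N 0 0 0" arguments used in the commands above: bpftool takes the key as raw bytes, and the jump table is indexed by a 32-bit value. The sketch below shows how an index maps to those four bytes. format_key() is a hypothetical helper, and the byte order shown assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit jump table index as four space-separated bytes,
 * least significant byte first. Assumption: the host is little-endian,
 * so this matches the in-memory layout of the key.
 * Returns the number of characters written, as snprintf() does.
 */
int format_key(uint32_t index, char *buf, size_t len)
{
	assert(buf != NULL);

	return snprintf(buf, len, "%u %u %u %u",
			(unsigned int)(index & 0xff),
			(unsigned int)((index >> 8) & 0xff),
			(unsigned int)((index >> 16) & 0xff),
			(unsigned int)((index >> 24) & 0xff));
}
```

For index 5 this produces "5 0 0 0", the key used for the VLAN program above. An index of 258 would come out as "2 1 0 0".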
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../bpftool/Documentation/bpftool-prog.rst | 26 +++---
tools/bpf/bpftool/bash-completion/bpftool | 14 ++-
tools/bpf/bpftool/prog.c | 87 +++++++++++--------
3 files changed, 77 insertions(+), 50 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index b04c4a365739..d77885176464 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -26,8 +26,8 @@ MAP COMMANDS
| **bpftool** **prog dump jited** *PROG* [{**file** *FILE* | **opcodes**}]
| **bpftool** **prog pin** *PROG* *FILE*
| **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
-| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
-| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
+| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -40,7 +40,9 @@ MAP COMMANDS
| **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
| **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
| }
-| *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
+| *ATTACH_TYPE* := {
+| **msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**
+| }
DESCRIPTION
@@ -105,13 +107,17 @@ DESCRIPTION
contain a dot character ('.'), which is reserved for future
extensions of *bpffs*.
- **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
- Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
- to the map *MAP*.
-
- **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
- Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
- from the map *MAP*.
+ **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+ Attach bpf program *PROG* (with type specified by
+ *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
+ parameter, with the exception of *flow_dissector* which is
+ attached to current networking name space.
+
+ **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
+ Detach bpf program *PROG* (with type specified by
+ *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
+ parameter, with the exception of *flow_dissector* which is
+ detached from the current networking name space.
**bpftool prog help**
Print short help message.
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index a05d0071f39f..45c2db257d2b 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -299,7 +299,8 @@ _bpftool()
fi
if [[ ${#words[@]} == 6 ]]; then
- COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse" -- "$cur" ) )
+ COMPREPLY=( $( compgen -W "msg_verdict skb_verdict \
+ skb_parse flow_dissector" -- "$cur" ) )
return 0
fi
@@ -338,7 +339,16 @@ _bpftool()
case $prev in
type)
- COMPREPLY=( $( compgen -W "socket kprobe kretprobe classifier action tracepoint raw_tracepoint xdp perf_event cgroup/skb cgroup/sock cgroup/dev lwt_in lwt_out lwt_xmit lwt_seg6local sockops sk_skb sk_msg lirc_mode2 cgroup/bind4 cgroup/bind6 cgroup/connect4 cgroup/connect6 cgroup/sendmsg4 cgroup/sendmsg6 cgroup/post_bind4 cgroup/post_bind6" -- \
+ COMPREPLY=( $( compgen -W "socket kprobe \
+ kretprobe classifier flow_dissector \
+ action tracepoint raw_tracepoint \
+ xdp perf_event cgroup/skb cgroup/sock \
+ cgroup/dev lwt_in lwt_out lwt_xmit \
+ lwt_seg6local sockops sk_skb sk_msg \
+ lirc_mode2 cgroup/bind4 cgroup/bind6 \
+ cgroup/connect4 cgroup/connect6 \
+ cgroup/sendmsg4 cgroup/sendmsg6 \
+ cgroup/post_bind4 cgroup/post_bind6" -- \
"$cur" ) )
return 0
;;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 4654d9450cd9..b808a67d1d3e 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
[BPF_SK_MSG_VERDICT] = "msg_verdict",
+ [BPF_FLOW_DISSECTOR] = "flow_dissector",
[__MAX_BPF_ATTACH_TYPE] = NULL,
};
@@ -721,30 +722,53 @@ int map_replace_compar(const void *p1, const void *p2)
return a->idx - b->idx;
}
-static int do_attach(int argc, char **argv)
+static int parse_atach_detach_args(int argc, char **argv, int *progfd,
+ enum bpf_attach_type *attach_type,
+ int *mapfd)
{
- enum bpf_attach_type attach_type;
- int err, mapfd, progfd;
-
- if (!REQ_ARGS(5)) {
- p_err("too few parameters for map attach");
+ if (!REQ_ARGS(3)) {
+ p_err("too few parameters for attach/detach");
return -EINVAL;
}
- progfd = prog_parse_fd(&argc, &argv);
- if (progfd < 0)
- return progfd;
+ *progfd = prog_parse_fd(&argc, &argv);
+ if (*progfd < 0)
+ return *progfd;
- attach_type = parse_attach_type(*argv);
- if (attach_type == __MAX_BPF_ATTACH_TYPE) {
- p_err("invalid attach type");
+ *attach_type = parse_attach_type(*argv);
+ if (*attach_type == __MAX_BPF_ATTACH_TYPE) {
+ p_err("invalid attach/detach type");
return -EINVAL;
}
+
+ if (*attach_type == BPF_FLOW_DISSECTOR) {
+ *mapfd = -1;
+ return 0;
+ }
+
NEXT_ARG();
+ if (!REQ_ARGS(2)) {
+ p_err("too few parameters for map attach/detach");
+ return -EINVAL;
+ }
- mapfd = map_parse_fd(&argc, &argv);
- if (mapfd < 0)
- return mapfd;
+ *mapfd = map_parse_fd(&argc, &argv);
+ if (*mapfd < 0)
+ return *mapfd;
+
+ return 0;
+}
+
+static int do_attach(int argc, char **argv)
+{
+ enum bpf_attach_type attach_type;
+ int err, progfd;
+ int mapfd;
+
+ err = parse_atach_detach_args(argc, argv,
+ &progfd, &attach_type, &mapfd);
+ if (err)
+ return err;
err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
if (err) {
@@ -760,27 +784,13 @@ static int do_attach(int argc, char **argv)
static int do_detach(int argc, char **argv)
{
enum bpf_attach_type attach_type;
- int err, mapfd, progfd;
-
- if (!REQ_ARGS(5)) {
- p_err("too few parameters for map detach");
- return -EINVAL;
- }
+ int err, progfd;
+ int mapfd;
- progfd = prog_parse_fd(&argc, &argv);
- if (progfd < 0)
- return progfd;
-
- attach_type = parse_attach_type(*argv);
- if (attach_type == __MAX_BPF_ATTACH_TYPE) {
- p_err("invalid attach type");
- return -EINVAL;
- }
- NEXT_ARG();
-
- mapfd = map_parse_fd(&argc, &argv);
- if (mapfd < 0)
- return mapfd;
+ err = parse_atach_detach_args(argc, argv,
+ &progfd, &attach_type, &mapfd);
+ if (err)
+ return err;
err = bpf_prog_detach2(progfd, mapfd, attach_type);
if (err) {
@@ -1092,8 +1102,8 @@ static int do_help(int argc, char **argv)
" [type TYPE] [dev NAME] \\\n"
" [map { idx IDX | name NAME } MAP]\\\n"
" [pinmaps MAP_DIR]\n"
- " %s %s attach PROG ATTACH_TYPE MAP\n"
- " %s %s detach PROG ATTACH_TYPE MAP\n"
+ " %s %s attach PROG ATTACH_TYPE [MAP]\n"
+ " %s %s detach PROG ATTACH_TYPE [MAP]\n"
" %s %s help\n"
"\n"
" " HELP_SPEC_MAP "\n"
@@ -1105,7 +1115,8 @@ static int do_help(int argc, char **argv)
" cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
" cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
" cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
- " ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse }\n"
+ " ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse |\n"
+ " flow_dissector }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
--
2.19.1.930.g4563a0d9d0-goog
^ permalink raw reply related
* [PATCH v4 bpf-next 6/7] bpftool: add pinmaps argument to the load/loadall
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC (permalink / raw)
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
This new argument lets users pin all maps from the object under the
specified path.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../bpftool/Documentation/bpftool-prog.rst | 4 +++-
tools/bpf/bpftool/bash-completion/bpftool | 3 ++-
tools/bpf/bpftool/prog.c | 24 ++++++++++++++++++-
3 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index d943d9b67a1d..b04c4a365739 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -80,7 +80,7 @@ DESCRIPTION
contain a dot character ('.'), which is reserved for future
extensions of *bpffs*.
- **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+ **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** *MAP_DIR*]
Load bpf program(s) from binary *OBJ* and pin as *FILE*.
Both **bpftool prog load** and **bpftool prog loadall** load
all maps and programs from the *OBJ* and differ only in
@@ -98,6 +98,8 @@ DESCRIPTION
use, referring to it by **id** or through a **pinned** file.
If **dev** *NAME* is specified program will be loaded onto
given networking device (offload).
+ Optional **pinmaps** argument can be provided to pin all
+ maps under *MAP_DIR* directory.
Note: *FILE* must be located in *bpffs* mount. It must not
contain a dot character ('.'), which is reserved for future
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 780ebafb756a..a05d0071f39f 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -346,7 +346,7 @@ _bpftool()
_bpftool_get_map_ids
return 0
;;
- pinned)
+ pinned|pinmaps)
_filedir
return 0
;;
@@ -358,6 +358,7 @@ _bpftool()
COMPREPLY=( $( compgen -W "map" -- "$cur" ) )
_bpftool_once_attr 'type'
_bpftool_once_attr 'dev'
+ _bpftool_once_attr 'pinmaps'
return 0
;;
esac
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 751a90ccfdab..4654d9450cd9 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -802,6 +802,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
struct map_replace *map_replace = NULL;
struct bpf_program *prog = NULL, *pos;
unsigned int old_map_fds = 0;
+ const char *pinmaps = NULL;
struct bpf_object *obj;
struct bpf_map *map;
const char *pinfile;
@@ -906,6 +907,13 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
goto err_free_reuse_maps;
}
NEXT_ARG();
+ } else if (is_prefix(*argv, "pinmaps")) {
+ NEXT_ARG();
+
+ if (!REQ_ARGS(1))
+ goto err_free_reuse_maps;
+
+ pinmaps = GET_ARG();
} else {
p_err("expected no more arguments, 'type', 'map' or 'dev', got: '%s'?",
*argv);
@@ -1026,6 +1034,14 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
}
}
+ if (pinmaps) {
+ err = bpf_object__pin_maps(obj, pinmaps);
+ if (err) {
+ p_err("failed to pin all maps");
+ goto err_unpin;
+ }
+ }
+
if (json_output)
jsonw_null(json_wtr);
@@ -1036,6 +1052,11 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
return 0;
+err_unpin:
+ if (first_prog_only)
+ unlink(pinfile);
+ else
+ bpf_object__unpin_programs(obj, pinfile);
err_close_obj:
bpf_object__close(obj);
err_free_reuse_maps:
@@ -1069,7 +1090,8 @@ static int do_help(int argc, char **argv)
" %s %s pin PROG FILE\n"
" %s %s { load | loadall } OBJ FILE \\\n"
" [type TYPE] [dev NAME] \\\n"
- " [map { idx IDX | name NAME } MAP]\n"
+ " [map { idx IDX | name NAME } MAP]\\\n"
+ " [pinmaps MAP_DIR]\n"
" %s %s attach PROG ATTACH_TYPE MAP\n"
" %s %s detach PROG ATTACH_TYPE MAP\n"
" %s %s help\n"
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 5/7] bpftool: add loadall command
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
This patch adds a new *loadall* command, which differs slightly from the
existing *load*. The *load* command loads all programs from the obj file
but pins only the first program. *loadall* pins all programs from the
obj file under the specified directory.
The intended use case is flow_dissector, where we want to load a bunch
of progs, pin them all, and after that construct a jump table.
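To make the difference in pinning concrete, here is a minimal stand-alone sketch of which pin path each program ends up with. It does not call libbpf; pin_path_for() and its return convention are hypothetical names made up for illustration, and it assumes section names without '/' (those are handled by pin_name in the libbpf patch of this series).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of bpftool or libbpf.
 *
 * Writes into buf the pin path that program number idx (with section
 * name sec) would get when the user passes 'file' as FILE.
 *
 * Returns 1 if the program is pinned, 0 if it is loaded but left
 * unpinned, and -1 if buf is too small.
 */
static int pin_path_for(char *buf, size_t size, const char *file,
			const char *sec, int idx, bool first_prog_only)
{
	int len;

	if (first_prog_only) {
		/* load: only the first program is pinned, as FILE itself */
		if (idx != 0)
			return 0;
		len = snprintf(buf, size, "%s", file);
	} else {
		/* loadall: FILE is a directory, one pin per program */
		len = snprintf(buf, size, "%s/%s", file, sec);
	}

	if (len < 0 || (size_t)len >= size)
		return -1;
	return 1;
}
```

With first_prog_only set, a FILE of /sys/fs/bpf/flow yields a single pin at that path; without it, a program in section flow_dissector would be pinned at /sys/fs/bpf/flow/flow_dissector.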
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../bpftool/Documentation/bpftool-prog.rst | 14 +++-
tools/bpf/bpftool/bash-completion/bpftool | 4 +-
tools/bpf/bpftool/common.c | 31 ++++----
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 74 ++++++++++++++-----
5 files changed, 82 insertions(+), 42 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ac4e904b10fb..d943d9b67a1d 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,8 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-f** | **--bpffs** } }
*COMMANDS* :=
- { **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load** | **help** }
+ { **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load**
+ | **loadall** | **help** }
MAP COMMANDS
=============
@@ -24,7 +25,7 @@ MAP COMMANDS
| **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual**}]
| **bpftool** **prog dump jited** *PROG* [{**file** *FILE* | **opcodes**}]
| **bpftool** **prog pin** *PROG* *FILE*
-| **bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+| **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
| **bpftool** **prog help**
@@ -79,8 +80,13 @@ DESCRIPTION
contain a dot character ('.'), which is reserved for future
extensions of *bpffs*.
- **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
- Load bpf program from binary *OBJ* and pin as *FILE*.
+ **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+ Load bpf program(s) from binary *OBJ* and pin as *FILE*.
+ Both **bpftool prog load** and **bpftool prog loadall** load
+ all maps and programs from the *OBJ* and differ only in
+ pinning. **load** pins only the first program from the *OBJ*
+ as *FILE*. **loadall** pins all programs from the *OBJ*
+ under *FILE* directory.
**type** is optional, if not specified program type will be
inferred from section names.
By default bpftool will create new maps as declared in the ELF
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 3f78e6404589..780ebafb756a 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -243,7 +243,7 @@ _bpftool()
# Completion depends on object and command in use
case $object in
prog)
- if [[ $command != "load" ]]; then
+ if [[ $command != "load" && $command != "loadall" ]]; then
case $prev in
id)
_bpftool_get_prog_ids
@@ -309,7 +309,7 @@ _bpftool()
fi
return 0
;;
- load)
+ load|loadall)
local obj
if [[ ${#words[@]} -lt 6 ]]; then
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 25af85304ebe..21ce556c15e1 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -169,34 +169,23 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
return fd;
}
-int do_pin_fd(int fd, const char *name)
+int mount_bpffs_for_pin(const char *name)
{
char err_str[ERR_MAX_LEN];
char *file;
char *dir;
int err = 0;
- err = bpf_obj_pin(fd, name);
- if (!err)
- goto out;
-
file = malloc(strlen(name) + 1);
strcpy(file, name);
dir = dirname(file);
- if (errno != EPERM || is_bpffs(dir)) {
- p_err("can't pin the object (%s): %s", name, strerror(errno));
+ if (is_bpffs(dir))
+ /* nothing to do if already mounted */
goto out_free;
- }
- /* Attempt to mount bpffs, then retry pinning. */
err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
- if (!err) {
- err = bpf_obj_pin(fd, name);
- if (err)
- p_err("can't pin the object (%s): %s", name,
- strerror(errno));
- } else {
+ if (err) {
err_str[ERR_MAX_LEN - 1] = '\0';
p_err("can't mount BPF file system to pin the object (%s): %s",
name, err_str);
@@ -204,10 +193,20 @@ int do_pin_fd(int fd, const char *name)
out_free:
free(file);
-out:
return err;
}
+int do_pin_fd(int fd, const char *name)
+{
+ int err;
+
+ err = mount_bpffs_for_pin(name);
+ if (err)
+ return err;
+
+ return bpf_obj_pin(fd, name);
+}
+
int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
{
unsigned int id;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28322ace2856..1383824c9baf 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -129,6 +129,7 @@ const char *get_fd_type_name(enum bpf_obj_type type);
char *get_fdinfo(int fd, const char *key);
int open_obj_pinned(char *path);
int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
+int mount_bpffs_for_pin(const char *name);
int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
int do_pin_fd(int fd, const char *name);
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5302ee282409..751a90ccfdab 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -792,15 +792,16 @@ static int do_detach(int argc, char **argv)
jsonw_null(json_wtr);
return 0;
}
-static int do_load(int argc, char **argv)
+
+static int load_with_options(int argc, char **argv, bool first_prog_only)
{
enum bpf_attach_type expected_attach_type;
struct bpf_object_open_attr attr = {
.prog_type = BPF_PROG_TYPE_UNSPEC,
};
struct map_replace *map_replace = NULL;
+ struct bpf_program *prog = NULL, *pos;
unsigned int old_map_fds = 0;
- struct bpf_program *prog;
struct bpf_object *obj;
struct bpf_map *map;
const char *pinfile;
@@ -918,26 +919,25 @@ static int do_load(int argc, char **argv)
goto err_free_reuse_maps;
}
- prog = bpf_program__next(NULL, obj);
- if (!prog) {
- p_err("object file doesn't contain any bpf program");
- goto err_close_obj;
- }
+ bpf_object__for_each_program(pos, obj) {
+ enum bpf_prog_type prog_type = attr.prog_type;
- bpf_program__set_ifindex(prog, ifindex);
- if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
- const char *sec_name = bpf_program__title(prog, false);
+ if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
+ const char *sec_name = bpf_program__title(pos, false);
- err = libbpf_prog_type_by_name(sec_name, &attr.prog_type,
- &expected_attach_type);
- if (err < 0) {
- p_err("failed to guess program type based on section name %s\n",
- sec_name);
- goto err_close_obj;
+ err = libbpf_prog_type_by_name(sec_name, &prog_type,
+ &expected_attach_type);
+ if (err < 0) {
+ p_err("failed to guess program type based on section name %s\n",
+ sec_name);
+ goto err_close_obj;
+ }
}
+
+ bpf_program__set_ifindex(pos, ifindex);
+ bpf_program__set_type(pos, prog_type);
+ bpf_program__set_expected_attach_type(pos, expected_attach_type);
}
- bpf_program__set_type(prog, attr.prog_type);
- bpf_program__set_expected_attach_type(prog, expected_attach_type);
qsort(map_replace, old_map_fds, sizeof(*map_replace),
map_replace_compar);
@@ -1001,9 +1001,31 @@ static int do_load(int argc, char **argv)
goto err_close_obj;
}
- if (do_pin_fd(bpf_program__fd(prog), pinfile))
+ err = mount_bpffs_for_pin(pinfile);
+ if (err)
goto err_close_obj;
+ if (first_prog_only) {
+ prog = bpf_program__next(NULL, obj);
+ if (!prog) {
+ p_err("object file doesn't contain any bpf program");
+ goto err_close_obj;
+ }
+
+ err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
+ if (err) {
+ p_err("failed to pin program %s",
+ bpf_program__title(prog, false));
+ goto err_close_obj;
+ }
+ } else {
+ err = bpf_object__pin_programs(obj, pinfile);
+ if (err) {
+ p_err("failed to pin all programs");
+ goto err_close_obj;
+ }
+ }
+
if (json_output)
jsonw_null(json_wtr);
@@ -1023,6 +1045,16 @@ static int do_load(int argc, char **argv)
return -1;
}
+static int do_load(int argc, char **argv)
+{
+ return load_with_options(argc, argv, true);
+}
+
+static int do_loadall(int argc, char **argv)
+{
+ return load_with_options(argc, argv, false);
+}
+
static int do_help(int argc, char **argv)
{
if (json_output) {
@@ -1035,7 +1067,8 @@ static int do_help(int argc, char **argv)
" %s %s dump xlated PROG [{ file FILE | opcodes | visual }]\n"
" %s %s dump jited PROG [{ file FILE | opcodes }]\n"
" %s %s pin PROG FILE\n"
- " %s %s load OBJ FILE [type TYPE] [dev NAME] \\\n"
+ " %s %s { load | loadall } OBJ FILE \\\n"
+ " [type TYPE] [dev NAME] \\\n"
" [map { idx IDX | name NAME } MAP]\n"
" %s %s attach PROG ATTACH_TYPE MAP\n"
" %s %s detach PROG ATTACH_TYPE MAP\n"
@@ -1067,6 +1100,7 @@ static const struct cmd cmds[] = {
{ "dump", do_dump },
{ "pin", do_pin },
{ "load", do_load },
+ { "loadall", do_loadall },
{ "attach", do_attach },
{ "detach", do_detach },
{ 0 }
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 4/7] libbpf: add internal pin_name
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
pin_name is the same as section_name where '/' is replaced
by '_'. bpf_object__pin_programs is converted to use pin_name
to avoid the situation where section_name would require creating another
subdirectory for a pin (as, for example, when calling bpf_object__pin_programs
for programs in sections like "cgroup/connect6").
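As an illustration of the name mapping described above, here is a small stand-alone sketch. make_pin_name() is a hypothetical name chosen for this example; it mirrors the idea of the helper in the diff below but is not the libbpf function itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone helper (not the libbpf function).
 *
 * Returns a malloc'ed copy of section_name with every '/' replaced
 * by '_', or NULL if allocation fails. The caller frees the result.
 */
static char *make_pin_name(const char *section_name)
{
	size_t len = strlen(section_name) + 1;
	char *name, *p;

	name = malloc(len);
	if (!name)
		return NULL;
	memcpy(name, section_name, len);

	for (p = name; *p; p++)
		if (*p == '/')
			*p = '_';

	return name;
}
```

A section named "cgroup/connect6" maps to "cgroup_connect6", so the pin lands directly in the target directory instead of needing a "cgroup" subdirectory.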
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/lib/bpf/libbpf.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cfa269c91e11..38dbeb113eeb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -124,6 +124,10 @@ struct bpf_program {
char *name;
int prog_ifindex;
char *section_name;
+ /* section_name with / replaced by _; makes recursive pinning
+ * in bpf_object__pin_programs easier
+ */
+ char *pin_name;
struct bpf_insn *insns;
size_t insns_cnt, main_prog_cnt;
enum bpf_prog_type type;
@@ -253,6 +257,7 @@ static void bpf_program__exit(struct bpf_program *prog)
bpf_program__unload(prog);
zfree(&prog->name);
zfree(&prog->section_name);
+ zfree(&prog->pin_name);
zfree(&prog->insns);
zfree(&prog->reloc_desc);
@@ -261,6 +266,18 @@ static void bpf_program__exit(struct bpf_program *prog)
prog->idx = -1;
}
+static char *__bpf_program__pin_name(struct bpf_program *prog)
+{
+ char *name;
+
+ name = strdup(prog->section_name);
+ for (char *p = name; p && *p; p++)
+ if (*p == '/')
+ *p = '_';
+
+ return name;
+}
+
static int
bpf_program__init(void *data, size_t size, char *section_name, int idx,
struct bpf_program *prog)
@@ -279,6 +296,13 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
goto errout;
}
+ prog->pin_name = __bpf_program__pin_name(prog);
+ if (!prog->pin_name) {
+ pr_warning("failed to alloc pin name for prog under section(%d) %s\n",
+ idx, section_name);
+ goto errout;
+ }
+
prog->insns = malloc(size);
if (!prog->insns) {
pr_warning("failed to alloc insns for prog under section %s\n",
@@ -2008,7 +2032,7 @@ int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path,
- prog->section_name);
+ prog->pin_name);
if (len < 0) {
err = -EINVAL;
goto err_unpin_programs;
@@ -2032,7 +2056,7 @@ int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path,
- prog->section_name);
+ prog->pin_name);
if (len < 0)
continue;
else if (len >= PATH_MAX)
@@ -2057,7 +2081,7 @@ int bpf_object__unpin_programs(struct bpf_object *obj, const char *path)
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path,
- prog->section_name);
+ prog->pin_name);
if (len < 0)
return -EINVAL;
else if (len >= PATH_MAX)
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 3/7] libbpf: bpf_program__pin: add special case for instances.nr == 1
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
When bpf_program has only one instance, don't create a subdirectory with
per-instance pin files (<prog>/0). Instead, just create a single pin file
for that single instance. This simplifies object pinning by not creating
unnecessary subdirectories.
This can potentially break existing users that depend on the case
where '/0' is always created. However, I couldn't find any serious
usage of bpf_program__pin inside the kernel tree and I suppose there
should be none outside.
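The resulting path layout can be sketched as follows. This is a stand-alone illustration only; instance_pin_path() is a hypothetical helper that computes the path an instance would be pinned at, it does not pin anything.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libbpf.
 *
 * Computes the pin path of one program instance:
 *   - a single-instance program is pinned at 'path' itself;
 *   - otherwise each instance gets 'path/<instance>'.
 *
 * Returns 0 on success or a negative errno value.
 */
static int instance_pin_path(char *buf, size_t size, const char *path,
			     int nr_instances, int instance)
{
	int len;

	if (instance < 0 || instance >= nr_instances)
		return -EINVAL;

	if (nr_instances == 1)
		len = snprintf(buf, size, "%s", path);
	else
		len = snprintf(buf, size, "%s/%d", path, instance);

	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= size)
		return -ENAMETOOLONG;

	return 0;
}
```

For a program with one instance and path /sys/fs/bpf/prog the pin is /sys/fs/bpf/prog; with three instances the pins are /sys/fs/bpf/prog/0 through /sys/fs/bpf/prog/2.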
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/lib/bpf/libbpf.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f8590490a9dd..cfa269c91e11 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1761,6 +1761,11 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
return -EINVAL;
}
+ if (prog->instances.nr == 1) {
+ /* don't create subdirs when pinning single instance */
+ return bpf_program__pin_instance(prog, path, 0);
+ }
+
err = make_dir(path);
if (err)
return err;
@@ -1823,6 +1828,11 @@ int bpf_program__unpin(struct bpf_program *prog, const char *path)
return -EINVAL;
}
+ if (prog->instances.nr == 1) {
+ /* don't create subdirs when pinning single instance */
+ return bpf_program__unpin_instance(prog, path, 0);
+ }
+
for (i = 0; i < prog->instances.nr; i++) {
char buf[PATH_MAX];
int len;
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
bpftool will use bpf_object__pin in the next commits to pin all programs
and maps from the file; in case of a partial failure, we need to get
back to the clean state (undo previous program/map pins).
As part of a cleanup, I've added and exported separate routines to
pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs)
of an object.
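The cleanup pattern is: pin items in order, and on the first failure walk back over the items already pinned and undo them. A minimal stand-alone sketch of that pattern follows; the demo_* names and the in-memory "pinned" table are hypothetical stand-ins for real pin/unpin calls, not libbpf code.

```c
#include <errno.h>

#define DEMO_ITEMS 4

/* In-memory stand-ins for real pin/unpin operations (hypothetical). */
static int demo_pinned[DEMO_ITEMS];
static int demo_fail_at = -1;	/* index whose pin fails, -1 for none */

static int demo_pin(int i)
{
	if (i == demo_fail_at)
		return -EIO;
	demo_pinned[i] = 1;
	return 0;
}

static void demo_unpin(int i)
{
	demo_pinned[i] = 0;
}

/* Pin items 0..n-1 in order. If pinning item i fails, unpin items
 * i-1..0 in reverse order and return the error, so the caller never
 * sees a half-pinned set.
 */
static int pin_all_or_nothing(int n)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		err = demo_pin(i);
		if (err)
			goto err_unpin;
	}

	return 0;

err_unpin:
	for (i = i - 1; i >= 0; i--)
		demo_unpin(i);

	return err;
}
```

With demo_fail_at set to 2, items 0 and 1 are pinned and then unpinned again before the error is returned, which is the same shape as the err_unpin_maps and err_unpin_programs paths in the diff below.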
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/lib/bpf/libbpf.c | 328 ++++++++++++++++++++++++++++++++++++++---
tools/lib/bpf/libbpf.h | 18 +++
2 files changed, 323 insertions(+), 23 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d6e62e90e8d4..f8590490a9dd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1699,6 +1699,34 @@ int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
return 0;
}
+int bpf_program__unpin_instance(struct bpf_program *prog, const char *path,
+ int instance)
+{
+ int err;
+
+ err = check_path(path);
+ if (err)
+ return err;
+
+ if (prog == NULL) {
+ pr_warning("invalid program pointer\n");
+ return -EINVAL;
+ }
+
+ if (instance < 0 || instance >= prog->instances.nr) {
+ pr_warning("invalid prog instance %d of prog %s (max %d)\n",
+ instance, prog->section_name, prog->instances.nr);
+ return -EINVAL;
+ }
+
+ err = unlink(path);
+ if (err != 0)
+ return -errno;
+ pr_debug("unpinned program '%s'\n", path);
+
+ return 0;
+}
+
static int make_dir(const char *path)
{
char *cp, errmsg[STRERR_BUFSIZE];
@@ -1737,6 +1765,64 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
if (err)
return err;
+ for (i = 0; i < prog->instances.nr; i++) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
+ if (len < 0) {
+ err = -EINVAL;
+ goto err_unpin;
+ } else if (len >= PATH_MAX) {
+ err = -ENAMETOOLONG;
+ goto err_unpin;
+ }
+
+ err = bpf_program__pin_instance(prog, buf, i);
+ if (err)
+ goto err_unpin;
+ }
+
+ return 0;
+
+err_unpin:
+ for (i = i - 1; i >= 0; i--) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
+ if (len < 0)
+ continue;
+ else if (len >= PATH_MAX)
+ continue;
+
+ bpf_program__unpin_instance(prog, buf, i);
+ }
+
+ rmdir(path);
+
+ return err;
+}
+
+int bpf_program__unpin(struct bpf_program *prog, const char *path)
+{
+ int i, err;
+
+ err = check_path(path);
+ if (err)
+ return err;
+
+ if (prog == NULL) {
+ pr_warning("invalid program pointer\n");
+ return -EINVAL;
+ }
+
+ if (prog->instances.nr <= 0) {
+ pr_warning("no instances of prog %s to pin\n",
+ prog->section_name);
+ return -EINVAL;
+ }
+
for (i = 0; i < prog->instances.nr; i++) {
char buf[PATH_MAX];
int len;
@@ -1747,11 +1833,15 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
- err = bpf_program__pin_instance(prog, buf, i);
+ err = bpf_program__unpin_instance(prog, buf, i);
if (err)
return err;
}
+ err = rmdir(path);
+ if (err)
+ return -errno;
+
return 0;
}
@@ -1776,12 +1866,33 @@ int bpf_map__pin(struct bpf_map *map, const char *path)
}
pr_debug("pinned map '%s'\n", path);
+
return 0;
}
-int bpf_object__pin(struct bpf_object *obj, const char *path)
+int bpf_map__unpin(struct bpf_map *map, const char *path)
+{
+ int err;
+
+ err = check_path(path);
+ if (err)
+ return err;
+
+ if (map == NULL) {
+ pr_warning("invalid map pointer\n");
+ return -EINVAL;
+ }
+
+ err = unlink(path);
+ if (err != 0)
+ return -errno;
+ pr_debug("unpinned map '%s'\n", path);
+
+ return 0;
+}
+
+int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
{
- struct bpf_program *prog;
struct bpf_map *map;
int err;
@@ -1797,6 +1908,55 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
if (err)
return err;
+ bpf_map__for_each(map, obj) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%s", path,
+ bpf_map__name(map));
+ if (len < 0) {
+ err = -EINVAL;
+ goto err_unpin_maps;
+ } else if (len >= PATH_MAX) {
+ err = -ENAMETOOLONG;
+ goto err_unpin_maps;
+ }
+
+ err = bpf_map__pin(map, buf);
+ if (err)
+ goto err_unpin_maps;
+ }
+
+ return 0;
+
+err_unpin_maps:
+ for (map = bpf_map__prev(map, obj);
+ map != NULL;
+ map = bpf_map__prev(map, obj)) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%s", path,
+ bpf_map__name(map));
+ if (len < 0)
+ continue;
+ else if (len >= PATH_MAX)
+ continue;
+
+ bpf_map__unpin(map, buf);
+ }
+
+ return err;
+}
+
+int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
+{
+ struct bpf_map *map;
+ int err;
+
+ if (!obj)
+ return -ENOENT;
+
bpf_map__for_each(map, obj) {
char buf[PATH_MAX];
int len;
@@ -1808,11 +1968,80 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
- err = bpf_map__pin(map, buf);
+ err = bpf_map__unpin(map, buf);
if (err)
return err;
}
+ return 0;
+}
+
+int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
+{
+ struct bpf_program *prog;
+ int err;
+
+ if (!obj)
+ return -ENOENT;
+
+ if (!obj->loaded) {
+ pr_warning("object not yet loaded; load it first\n");
+ return -ENOENT;
+ }
+
+ err = make_dir(path);
+ if (err)
+ return err;
+
+ bpf_object__for_each_program(prog, obj) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%s", path,
+ prog->section_name);
+ if (len < 0) {
+ err = -EINVAL;
+ goto err_unpin_programs;
+ } else if (len >= PATH_MAX) {
+ err = -ENAMETOOLONG;
+ goto err_unpin_programs;
+ }
+
+ err = bpf_program__pin(prog, buf);
+ if (err)
+ goto err_unpin_programs;
+ }
+
+ return 0;
+
+err_unpin_programs:
+ for (prog = bpf_program__prev(prog, obj);
+ prog != NULL;
+ prog = bpf_program__prev(prog, obj)) {
+ char buf[PATH_MAX];
+ int len;
+
+ len = snprintf(buf, PATH_MAX, "%s/%s", path,
+ prog->section_name);
+ if (len < 0)
+ continue;
+ else if (len >= PATH_MAX)
+ continue;
+
+ bpf_program__unpin(prog, buf);
+ }
+
+ return err;
+}
+
+int bpf_object__unpin_programs(struct bpf_object *obj, const char *path)
+{
+ struct bpf_program *prog;
+ int err;
+
+ if (!obj)
+ return -ENOENT;
+
bpf_object__for_each_program(prog, obj) {
char buf[PATH_MAX];
int len;
@@ -1824,7 +2053,7 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
- err = bpf_program__pin(prog, buf);
+ err = bpf_program__unpin(prog, buf);
if (err)
return err;
}
@@ -1832,6 +2061,23 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
return 0;
}
+int bpf_object__pin(struct bpf_object *obj, const char *path)
+{
+ int err;
+
+ err = bpf_object__pin_maps(obj, path);
+ if (err)
+ return err;
+
+ err = bpf_object__pin_programs(obj, path);
+ if (err) {
+ bpf_object__unpin_maps(obj, path);
+ return err;
+ }
+
+ return 0;
+}
+
void bpf_object__close(struct bpf_object *obj)
{
size_t i;
@@ -1918,23 +2164,20 @@ void *bpf_object__priv(struct bpf_object *obj)
}
static struct bpf_program *
-__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
+__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
{
- size_t idx;
+ ssize_t idx;
if (!obj->programs)
return NULL;
- /* First handler */
- if (prev == NULL)
- return &obj->programs[0];
- if (prev->obj != obj) {
+ if (p->obj != obj) {
pr_warning("error: program handler doesn't match object\n");
return NULL;
}
- idx = (prev - obj->programs) + 1;
- if (idx >= obj->nr_programs)
+ idx = (p - obj->programs) + i;
+ if (idx >= obj->nr_programs || idx < 0)
return NULL;
return &obj->programs[idx];
}
@@ -1944,8 +2187,29 @@ bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
{
struct bpf_program *prog = prev;
+ if (prev == NULL)
+ return obj->programs;
+
do {
- prog = __bpf_program__next(prog, obj);
+ prog = __bpf_program__iter(prog, obj, 1);
+ } while (prog && bpf_program__is_function_storage(prog, obj));
+
+ return prog;
+}
+
+struct bpf_program *
+bpf_program__prev(struct bpf_program *next, struct bpf_object *obj)
+{
+ struct bpf_program *prog = next;
+
+ if (next == NULL) {
+ if (!obj->nr_programs)
+ return NULL;
+ return obj->programs + obj->nr_programs - 1;
+ }
+
+ do {
+ prog = __bpf_program__iter(prog, obj, -1);
} while (prog && bpf_program__is_function_storage(prog, obj));
return prog;
@@ -2272,10 +2536,10 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
map->map_ifindex = ifindex;
}
-struct bpf_map *
-bpf_map__next(struct bpf_map *prev, struct bpf_object *obj)
+static struct bpf_map *
+__bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
{
- size_t idx;
+ ssize_t idx;
struct bpf_map *s, *e;
if (!obj || !obj->maps)
@@ -2284,21 +2548,39 @@ bpf_map__next(struct bpf_map *prev, struct bpf_object *obj)
s = obj->maps;
e = obj->maps + obj->nr_maps;
- if (prev == NULL)
- return s;
-
- if ((prev < s) || (prev >= e)) {
+ if ((m < s) || (m >= e)) {
pr_warning("error in %s: map handler doesn't belong to object\n",
__func__);
return NULL;
}
- idx = (prev - obj->maps) + 1;
- if (idx >= obj->nr_maps)
+ idx = (m - obj->maps) + i;
+ if (idx >= obj->nr_maps || idx < 0)
return NULL;
return &obj->maps[idx];
}
+struct bpf_map *
+bpf_map__next(struct bpf_map *prev, struct bpf_object *obj)
+{
+ if (prev == NULL)
+ return obj->maps;
+
+ return __bpf_map__iter(prev, obj, 1);
+}
+
+struct bpf_map *
+bpf_map__prev(struct bpf_map *next, struct bpf_object *obj)
+{
+ if (next == NULL) {
+ if (!obj->nr_maps)
+ return NULL;
+ return obj->maps + obj->nr_maps - 1;
+ }
+
+ return __bpf_map__iter(next, obj, -1);
+}
+
struct bpf_map *
bpf_object__find_map_by_name(struct bpf_object *obj, const char *name)
{
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 1f3468dad8b2..b1686a787102 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -71,6 +71,13 @@ struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
LIBBPF_API struct bpf_object *bpf_object__open_buffer(void *obj_buf,
size_t obj_buf_sz,
const char *name);
+LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
+LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
+ const char *path);
+LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
+ const char *path);
+LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
+ const char *path);
LIBBPF_API int bpf_object__pin(struct bpf_object *object, const char *path);
LIBBPF_API void bpf_object__close(struct bpf_object *object);
@@ -112,6 +119,9 @@ LIBBPF_API struct bpf_program *bpf_program__next(struct bpf_program *prog,
(pos) != NULL; \
(pos) = bpf_program__next((pos), (obj)))
+LIBBPF_API struct bpf_program *bpf_program__prev(struct bpf_program *prog,
+ struct bpf_object *obj);
+
typedef void (*bpf_program_clear_priv_t)(struct bpf_program *,
void *);
@@ -131,7 +141,11 @@ LIBBPF_API int bpf_program__fd(struct bpf_program *prog);
LIBBPF_API int bpf_program__pin_instance(struct bpf_program *prog,
const char *path,
int instance);
+LIBBPF_API int bpf_program__unpin_instance(struct bpf_program *prog,
+ const char *path,
+ int instance);
LIBBPF_API int bpf_program__pin(struct bpf_program *prog, const char *path);
+LIBBPF_API int bpf_program__unpin(struct bpf_program *prog, const char *path);
LIBBPF_API void bpf_program__unload(struct bpf_program *prog);
struct bpf_insn;
@@ -260,6 +274,9 @@ bpf_map__next(struct bpf_map *map, struct bpf_object *obj);
(pos) != NULL; \
(pos) = bpf_map__next((pos), (obj)))
+LIBBPF_API struct bpf_map *
+bpf_map__prev(struct bpf_map *map, struct bpf_object *obj);
+
LIBBPF_API int bpf_map__fd(struct bpf_map *map);
LIBBPF_API const struct bpf_map_def *bpf_map__def(struct bpf_map *map);
LIBBPF_API const char *bpf_map__name(struct bpf_map *map);
@@ -274,6 +291,7 @@ LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
LIBBPF_API bool bpf_map__is_offload_neutral(struct bpf_map *map);
LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
+LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
LIBBPF_API long libbpf_get_error(const void *ptr);
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 1/7] selftests/bpf: rename flow dissector section to flow_dissector
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181109002213.5914-1-s@fomichev.me>
From: Stanislav Fomichev <sdf@google.com>
Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
tools/testing/selftests/bpf/test_flow_dissector.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/bpf_flow.c b/tools/testing/selftests/bpf/bpf_flow.c
index 107350a7821d..b9798f558ca7 100644
--- a/tools/testing/selftests/bpf/bpf_flow.c
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -116,7 +116,7 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
return BPF_DROP;
}
-SEC("dissect")
+SEC("flow_dissector")
int _dissect(struct __sk_buff *skb)
{
if (!skb->vlan_present)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index c0fb073b5eab..d23d4da66b83 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -59,7 +59,7 @@ else
fi
# Attach BPF program
-./flow_dissector_load -p bpf_flow.o -s dissect
+./flow_dissector_load -p bpf_flow.o -s flow_dissector
# Setup
tc qdisc add dev lo ingress
--
2.19.1.930.g4563a0d9d0-goog
* [PATCH v4 bpf-next 0/7] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-09 0:22 UTC
To: netdev, linux-kselftest, ast, daniel, shuah, jakub.kicinski,
quentin.monnet
Cc: guro, jiong.wang, sdf, bhole_prashant_q7, john.fastabend, jbenc,
treeze.taeung, yhs, osk, sandipan
From: Stanislav Fomichev <sdf@google.com>
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_object__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* fourth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
Stanislav Fomichev (7):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
libbpf: add internal pin_name
bpftool: add loadall command
bpftool: add pinmaps argument to the load/loadall
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 42 +-
tools/bpf/bpftool/bash-completion/bpftool | 21 +-
tools/bpf/bpftool/common.c | 31 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 185 ++++++---
tools/lib/bpf/libbpf.c | 364 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 18 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 546 insertions(+), 120 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
^ permalink raw reply
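The pin_name change described in the v4 notes above can be illustrated with a small C sketch. The cover letter only says that pin_name lets bpf_program__pin work with section names containing '/'; the sketch assumes this is done by replacing each '/' with '_' so the result is a single path component. The helper sketch_pin_name is hypothetical and is not the libbpf implementation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not libbpf code): build a name that
 * can be used as one path component from an ELF section name by replacing
 * every '/' with '_'.
 *
 * Returns a heap-allocated string that the caller must free(), or NULL if
 * the allocation fails.
 */
char *sketch_pin_name(const char *section_name)
{
	size_t len = strlen(section_name);
	char *name = malloc(len + 1);
	char *p;

	if (!name)
		return NULL;

	/* Copy the section name including its terminating NUL. */
	memcpy(name, section_name, len + 1);

	/* Walk over each '/' in turn and overwrite it in place. */
	for (p = name; (p = strchr(p, '/')) != NULL; p++)
		*p = '_';

	return name;
}
```

With a made-up section name such as "dissect/tcp", the helper yields "dissect_tcp", which can be pinned under a directory without creating an unintended subdirectory.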
* Re: [PATCH net-next 0/7] net: sched: prepare for more Qdisc offloads
From: David Miller @ 2018-11-09 0:20 UTC (permalink / raw)
To: jakub.kicinski
Cc: netdev, oss-drivers, jiri, xiyou.wangcong, jhs, nogah.frankel,
yuvalm
In-Reply-To: <20181108013340.20983-1-jakub.kicinski@netronome.com>
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 7 Nov 2018 17:33:33 -0800
> This series refactors the "switchdev" Qdisc offloads a little. We have
> a few Qdiscs which can be fully offloaded today to the forwarding plane
> of switching devices.
>
> First patch adds a helper for handing statistic dumps, the code seems
> to be copy pasted between PRIO and RED. Second patch removes unnecessary
> parameter from RED offload function. Third patch makes the MQ offload
> use the dump helper which helps it behave much like PRIO and RED when
> it comes to the TCQ_F_OFFLOADED flag. Patch 4 adds a graft helper,
> similar to the dump helper.
>
> Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a
> small refactor - no functional changes there.
>
> Last two patches move the qdisc_put() call outside of the sch_tree_lock
> section for RED and PRIO. The child Qdiscs will get removed from the
> hierarchy under the lock, but having the put (and potentially destroy)
> called outside of the lock helps offload which may choose to sleep,
> and it should generally lower the Qdisc change impact.
Series applied, thanks Jakub.
^ permalink raw reply
* Re: [PATCH net] net: mvneta: Don't advertise 2.5G modes
From: Russell King - ARM Linux @ 2018-11-09 9:57 UTC (permalink / raw)
To: Maxime Chevallier
Cc: davem, netdev, linux-kernel, Antoine Tenart, thomas.petazzoni,
gregory.clement, Andrew Lunn, linux-arm-kernel
In-Reply-To: <20181109081733.24458-1-maxime.chevallier@bootlin.com>
On Fri, Nov 09, 2018 at 09:17:33AM +0100, Maxime Chevallier wrote:
> Using 2.5G speed relies on the SerDes lanes being configured
> accordingly. The lanes have to be reconfigured to switch between
> 1G and 2.5G, and for now only the bootloader does this configuration.
>
> In the case we add a Comphy driver to handle switching the lanes
> dynamically, it's better for now to stick with supporting only 1G and
> add advertisement for 2.5G once we really are capable of handling both
> speeds without problem.
>
> Since the interface mode is initialy taken from the DT, we want to make
> sure that adding comphy support won't break boards that don't update
> their dtb.
>
> Fixes: da58a931f248 ("net: mvneta: Add support for 2500Mbps SGMII")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Reported-by: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
As we discussed on IRC... however, please can we wait until Monday/
Tuesday before merging this patch to allow for some further thought
and discussion - by which time I expect to have a comphy driver.
Thanks.
> ---
> drivers/net/ethernet/marvell/mvneta.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 5bfd349bf41a..c19ecd153499 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3343,7 +3343,6 @@ static void mvneta_validate(struct net_device *ndev, unsigned long *supported,
> if (state->interface != PHY_INTERFACE_MODE_NA &&
> state->interface != PHY_INTERFACE_MODE_QSGMII &&
> state->interface != PHY_INTERFACE_MODE_SGMII &&
> - state->interface != PHY_INTERFACE_MODE_2500BASEX &&
> !phy_interface_mode_is_8023z(state->interface) &&
> !phy_interface_mode_is_rgmii(state->interface)) {
> bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
> @@ -3357,14 +3356,9 @@ static void mvneta_validate(struct net_device *ndev, unsigned long *supported,
> /* Asymmetric pause is unsupported */
> phylink_set(mask, Pause);
>
> - /* We cannot use 1Gbps when using the 2.5G interface. */
> - if (state->interface == PHY_INTERFACE_MODE_2500BASEX) {
> - phylink_set(mask, 2500baseT_Full);
> - phylink_set(mask, 2500baseX_Full);
> - } else {
> - phylink_set(mask, 1000baseT_Full);
> - phylink_set(mask, 1000baseX_Full);
> - }
> + /* Half-duplex at speeds higher than 100Mbit is unsupported */
> + phylink_set(mask, 1000baseT_Full);
> + phylink_set(mask, 1000baseX_Full);
>
> if (!phy_interface_mode_is_8023z(state->interface)) {
> /* 10M and 100M are only supported in non-802.3z mode */
> --
> 2.11.0
>
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply
* Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code
From: Andrew Lunn @ 2018-11-08 23:23 UTC (permalink / raw)
To: Wang Dongsheng; +Cc: timur, yu.zheng, f.fainelli, rjw, linux-acpi, netdev
In-Reply-To: <cover.1541660504.git.dongsheng.wang@hxt-semitech.com>
On Thu, Nov 08, 2018 at 03:21:29PM +0800, Wang Dongsheng wrote:
> Originally I just pushed "phy-handle" support for ACPI on the QCOM QDF2400
> platform. After some discussion and following Andrew's advice, I am sending
> out a generic ACPI version.
>
> Currently there is no clear documentation about MDIO/PHY for ACPI, so after
> reading some documents about ACPI [1], I think we just need to reuse the
> DT binding in ACPI [2]. However, this series of patches is not
> fully compatible with all contents specified in the DT binding.
>
> The most important thing in this series is linking the phy device and
> the ACPI fwnode. Besides, we need to carry out a bus scan when the mdio
> bus is registered. Therefore, this series does not support more of the DT
> binding properties. More support will come in follow-up patches, or from
> other contributors.
>
> Example:
> Based on ACPI doc:
> Documentation/acpi/dsd/data-node-references.txt
> Documentation/acpi/dsd/graph.txt
> With _DSD device properties we can finally do this:
> Device (MDIO) {
> Name (_DSD, Package () {
> ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> Package () { Package () { "ethernet-phy@0", PHY0 }, }
> })
> Name (PHY0, Package() {
> ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "reg", 0x0 }, }
> })
I don't know much about ACPI. I do know DT. MDIO busses can have
multiple PHYs on them. Is the following valid to list two PHYs?
Device (MDIO) {
Name (_DSD, Package () {
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () { Package () { "ethernet-phy@0", PHY0 }, }
})
Name (PHY0, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0x0 }, }
})
Name (_DSD, Package () {
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () { Package () { "ethernet-phy@10", PHY1 }, }
})
Name (PHY1, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0x10 }, }
})
}
An MDIO bus can also have more than just PHYs on it. There can be
Ethernet switches. Broadcom also has some with generic PHY devices on
them, and other odd things. That means whatever is on an MDIO bus is a
device in the Linux device model. How does that work? Do we need some
form of Device (PHY) {}?
Device (MDIO) {
Device (PHY) {
Name (_DSD, Package () {
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () { Package () { "ethernet-phy@0", PHY0 }, }
})
Name (PHY0, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0x0 }, }
})
}
Device (PHY) {
Name (_DSD, Package () {
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () { Package () { "ethernet-phy@10", PHY1 }, }
})
Name (PHY1, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0x10 }, }
})
}
Device (SWITCH) {
Name (_DSD, Package () {
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () { Package () { "switch@11", SWITCH0 }, }
})
Name (SWITCH0, Package() {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () { Package () { "reg", 0x11 }, }
})
}
}
I'm just trying to ensure whatever is defined is flexible enough that
we really can later support everything which DT does. We have PHYs on
MDIO busses, inside switches, which are on MDIO busses, which are
inside Ethernet interfaces, etc.
An MDIO bus is very similar to an i2c bus. How is that described in
ACPI? Anything we can learn from that?
Thanks
Andrew
^ permalink raw reply
* [PATCH net-next 9/9] sky2: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: Mirko Lindner, Stephen Hemminger, Ajit Khaparde, Alexey Kuznetsov,
bridge, Cong Wang, coreteam, David S. Miller, Florian Westphal,
Hideaki YOSHIFUJI, Jamal Hadi Salim, Jiri Pirko, Jozsef Kadlecsik,
linux-rdma, netfilter-devel, Nikolay Aleksandrov,
Pablo Neira Ayuso, Roopa Prabhu, Sathya Perla,
Somnath Kotur <somnath.
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
drivers/net/ethernet/marvell/sky2.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 697d9b374f5e..c7cd0081058e 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2485,13 +2485,11 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
skb_copy_hash(skb, re->skb);
- skb->vlan_proto = re->skb->vlan_proto;
- skb->vlan_tci = re->skb->vlan_tci;
+ __vlan_hwaccel_copy_tag(skb, re->skb);
pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
length, PCI_DMA_FROMDEVICE);
- re->skb->vlan_proto = 0;
- re->skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(re->skb);
skb_clear_hash(re->skb);
re->skb->ip_summed = CHECKSUM_NONE;
skb_put(skb, length);
--
2.19.1
^ permalink raw reply related
* [PATCH net-next 7/9] benet: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
Alexey Kuznetsov, bridge, Cong Wang, coreteam, David S. Miller,
Florian Westphal, Hideaki YOSHIFUJI, Jamal Hadi Salim, Jiri Pirko,
Jozsef Kadlecsik, linux-rdma, Mirko Lindner, netfilter-devel,
Nikolay Aleksandrov, Pablo Neira Ayuso, Roopa
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
drivers/net/ethernet/emulex/benet/be_main.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c5ad7a4f4d83..80b2bd3747ce 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1049,30 +1049,35 @@ static struct sk_buff *be_insert_vlan_in_pkt(struct be_adapter *adapter,
struct be_wrb_params
*wrb_params)
{
+ bool insert_vlan = false;
u16 vlan_tag = 0;
skb = skb_share_check(skb, GFP_ATOMIC);
if (unlikely(!skb))
return skb;
- if (skb_vlan_tag_present(skb))
+ if (skb_vlan_tag_present(skb)) {
vlan_tag = be_get_tx_vlan_tag(adapter, skb);
+ insert_vlan = true;
+ }
if (qnq_async_evt_rcvd(adapter) && adapter->pvid) {
- if (!vlan_tag)
+ if (!insert_vlan) {
vlan_tag = adapter->pvid;
+ insert_vlan = true;
+ }
/* f/w workaround to set skip_hw_vlan = 1, informs the F/W to
* skip VLAN insertion
*/
BE_WRB_F_SET(wrb_params->features, VLAN_SKIP_HW, 1);
}
- if (vlan_tag) {
+ if (insert_vlan) {
skb = vlan_insert_tag_set_proto(skb, htons(ETH_P_8021Q),
vlan_tag);
if (unlikely(!skb))
return skb;
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
}
/* Insert the outer VLAN, if any */
--
2.19.1
^ permalink raw reply related
* [PATCH net-next 8/9] mlx4: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: Tariq Toukan, linux-rdma, Ajit Khaparde, Alexey Kuznetsov, bridge,
Cong Wang, coreteam, David S. Miller, Florian Westphal,
Hideaki YOSHIFUJI, Jamal Hadi Salim, Jiri Pirko, Jozsef Kadlecsik,
Mirko Lindner, netfilter-devel, Nikolay Aleksandrov,
Pablo Neira Ayuso, Roopa Prabhu, Sathya Perla, Somnath Kotur
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index db00bf1c23f5..fd09ba98c0a6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -875,7 +875,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
skb->data_len = length;
napi_gro_frags(&cq->napi);
} else {
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
skb_clear_hash(skb);
}
next:
--
2.19.1
^ permalink raw reply related
* [PATCH net-next 6/9] ipv4/tunnel: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Ajit Khaparde, bridge, Cong Wang, coreteam, Florian Westphal,
Jamal Hadi Salim, Jiri Pirko, Jozsef Kadlecsik, linux-rdma,
Mirko Lindner, netfilter-devel, Nikolay Aleksandrov,
Pablo Neira Ayuso, Roopa Prabhu, Sathya Perla, Somnath Kotur,
Sriharsha Basavapatna <srih
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
net/ipv4/ip_tunnel_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index dde671e97829..f45b96d715f0 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -120,7 +120,7 @@ int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
}
skb_clear_hash_if_not_l4(skb);
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
skb_set_queue_mapping(skb, 0);
skb_scrub_packet(skb, xnet);
--
2.19.1
^ permalink raw reply related
* [PATCH net-next 5/9] bridge: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Roopa Prabhu, Nikolay Aleksandrov, netfilter-devel, coreteam,
bridge, Ajit Khaparde, Alexey Kuznetsov, Cong Wang,
David S. Miller, Hideaki YOSHIFUJI, Jamal Hadi Salim, Jiri Pirko,
linux-rdma, Mirko Lindner, Sathya Perla, Somnath Kotur,
Sriharsha Basavapatna <srih
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
This removes the assumption that vlan_tci != 0 when a tag is present.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
net/bridge/br_netfilter_hooks.c | 15 +++++++++------
net/bridge/br_private.h | 2 +-
net/bridge/br_vlan.c | 6 +++---
3 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index b1b5e8516724..c9383c470a83 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -671,10 +671,8 @@ static int br_nf_push_frag_xmit(struct net *net, struct sock *sk, struct sk_buff
return 0;
}
- if (data->vlan_tci) {
- skb->vlan_tci = data->vlan_tci;
- skb->vlan_proto = data->vlan_proto;
- }
+ if (data->vlan_proto)
+ __vlan_hwaccel_put_tag(skb, data->vlan_proto, data->vlan_tci);
skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
__skb_push(skb, data->encap_size);
@@ -740,8 +738,13 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
data = this_cpu_ptr(&brnf_frag_data_storage);
- data->vlan_tci = skb->vlan_tci;
- data->vlan_proto = skb->vlan_proto;
+ if (skb_vlan_tag_present(skb)) {
+ data->vlan_tci = skb->vlan_tci;
+ data->vlan_proto = skb->vlan_proto;
+ } else {
+ data->vlan_proto = 0;
+ }
+
data->encap_size = nf_bridge_encap_header_len(skb);
data->size = ETH_HLEN + data->encap_size;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2920e06a5403..67105c66584a 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -905,7 +905,7 @@ static inline int br_vlan_get_tag(const struct sk_buff *skb, u16 *vid)
int err = 0;
if (skb_vlan_tag_present(skb)) {
- *vid = skb_vlan_tag_get(skb) & VLAN_VID_MASK;
+ *vid = skb_vlan_tag_get_id(skb);
} else {
*vid = 0;
err = -EINVAL;
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 8c9297a01947..a7e869da21bf 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -420,7 +420,7 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
}
if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
if (p && (p->flags & BR_VLAN_TUNNEL) &&
br_handle_egress_vlan_tunnel(skb, v)) {
@@ -493,8 +493,8 @@ static bool __allowed_ingress(const struct net_bridge *br,
__vlan_hwaccel_put_tag(skb, br->vlan_proto, pvid);
else
/* Priority-tagged Frame.
- * At this point, We know that skb->vlan_tci had
- * VLAN_TAG_PRESENT bit and its VID field was 0x000.
+ * At this point, we know that skb->vlan_tci VID
+ * field was 0.
* We update only VID field and preserve PCP field.
*/
skb->vlan_tci |= pvid;
--
2.19.1
^ permalink raw reply related
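The reason the bridge patch stops testing vlan_tci against zero can be shown with a toy model in C. This is a sketch under assumptions: struct sketch_skb and the sketch_* helpers are hypothetical stand-ins that only mirror the naming of the kernel's __vlan_hwaccel_* helpers, and "tag present" is modelled as a separate flag instead of a bit inside the TCI. With that model, a tag whose TCI is 0 is still a valid, present tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_VID_MASK 0x0fff	/* low 12 bits of the TCI hold the VID */

/* Toy stand-in for the VLAN fields of struct sk_buff (hypothetical). */
struct sketch_skb {
	uint16_t vlan_proto;
	uint16_t vlan_tci;
	bool vlan_present;	/* assumption: presence kept apart from the TCI */
};

/* Presence is read from the flag, never inferred from vlan_tci != 0. */
static inline bool sketch_tag_present(const struct sketch_skb *skb)
{
	return skb->vlan_present;
}

/* Store a tag; a TCI of 0 is allowed and still marks the tag as present. */
static inline void sketch_put_tag(struct sketch_skb *skb, uint16_t proto,
				  uint16_t tci)
{
	skb->vlan_proto = proto;
	skb->vlan_tci = tci;
	skb->vlan_present = true;
}

/* Drop the tag; callers must not look at vlan_tci afterwards. */
static inline void sketch_clear_tag(struct sketch_skb *skb)
{
	skb->vlan_present = false;
}

/* Extract only the VLAN ID from the stored TCI. */
static inline uint16_t sketch_tag_get_id(const struct sketch_skb *skb)
{
	return skb->vlan_tci & SKETCH_VLAN_VID_MASK;
}
```

Code written as "if (skb->vlan_tci)" would treat the TCI-0 tag as absent in this model, which is why call sites in the patch switch to presence checks and to helper calls for setting and clearing the tag.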
* [PATCH net-next 4/9] 8021q: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Ajit Khaparde, Alexey Kuznetsov, bridge,
Cong Wang, coreteam, Florian Westphal, Hideaki YOSHIFUJI,
Jamal Hadi Salim, Jiri Pirko, Jozsef Kadlecsik, linux-rdma,
Mirko Lindner, netfilter-devel, Nikolay Aleksandrov,
Pablo Neira Ayuso, Roopa Prabhu, Sathya Perla, Somnath Kotur,
Sriharsha Basavapatna <srih
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
net/8021q/vlan_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 4f60e86f4b8d..dd39489c829a 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -57,7 +57,7 @@ bool vlan_do_receive(struct sk_buff **skbp)
}
skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
rx_stats = this_cpu_ptr(vlan_dev_priv(vlan_dev)->vlan_pcpu_stats);
--
2.19.1
^ permalink raw reply related
* [PATCH net-next 3/9] net/core: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2018-11-08 23:18 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
Ajit Khaparde, Alexey Kuznetsov, bridge, coreteam,
Florian Westphal, Hideaki YOSHIFUJI, Jozsef Kadlecsik, linux-rdma,
Mirko Lindner, netfilter-devel, Nikolay Aleksandrov,
Pablo Neira Ayuso, Roopa Prabhu, Sathya Perla, Somnath Kotur,
Sriharsha Basavapatna <srih
In-Reply-To: <cover.1541718583.git.mirq-linux@rere.qmqm.pl>
This removes assumptions about the VLAN_TAG_PRESENT bit.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
net/core/dev.c | 8 +++++---
net/core/skbuff.c | 2 +-
net/sched/act_vlan.c | 2 +-
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ffcbdd55fa9..bf7e0a471186 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4889,7 +4889,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
* and set skb->priority like in vlan_do_receive()
* For the time being, just ignore Priority Code Point
*/
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
}
type = skb->protocol;
@@ -5386,7 +5386,9 @@ static struct list_head *gro_list_prepare(struct napi_struct *napi,
}
diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
- diffs |= p->vlan_tci ^ skb->vlan_tci;
+ diffs |= skb_vlan_tag_present(p) ^ skb_vlan_tag_present(skb);
+ if (skb_vlan_tag_present(p))
+ diffs |= p->vlan_tci ^ skb->vlan_tci;
diffs |= skb_metadata_dst_cmp(p, skb);
diffs |= skb_metadata_differs(p, skb);
if (maclen == ETH_HLEN)
@@ -5652,7 +5654,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
__skb_pull(skb, skb_headlen(skb));
/* restore the reserve we had after netdev_alloc_skb_ip_align() */
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN - skb_headroom(skb));
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
skb->dev = napi->dev;
skb->skb_iif = 0;
skb->encapsulation = 0;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b4ee5c8b928f..5bb5eb500605 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5123,7 +5123,7 @@ int skb_vlan_pop(struct sk_buff *skb)
int err;
if (likely(skb_vlan_tag_present(skb))) {
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
} else {
if (unlikely(!eth_type_vlan(skb->protocol)))
return 0;
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index ba677d54a7af..93fdaf707313 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -63,7 +63,7 @@ static int tcf_vlan_act(struct sk_buff *skb, const struct tc_action *a,
/* extract existing tag (and guarantee no hw-accel tag) */
if (skb_vlan_tag_present(skb)) {
tci = skb_vlan_tag_get(skb);
- skb->vlan_tci = 0;
+ __vlan_hwaccel_clear_tag(skb);
} else {
/* in-payload vlan tag, pop it */
err = __skb_vlan_pop(skb, &tci);
--
2.19.1
^ permalink raw reply related
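The gro_list_prepare() hunk in the patch above compares tag presence first and the TCI only when a tag is present. A minimal C sketch of that comparison follows; struct sketch_pkt and sketch_vlan_diffs are hypothetical names, and keeping presence in a separate flag is an assumption made for illustration, not a statement about the sk_buff layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy packet carrying only the VLAN metadata relevant here (hypothetical). */
struct sketch_pkt {
	uint16_t vlan_tci;
	bool vlan_present;
};

/*
 * Return nonzero when the two packets differ in VLAN metadata.
 * The TCI is compared only if the first packet carries a tag, so a
 * leftover TCI value on an untagged packet cannot cause a mismatch.
 */
unsigned long sketch_vlan_diffs(const struct sketch_pkt *p,
				const struct sketch_pkt *skb)
{
	unsigned long diffs = 0;

	/* A tagged and an untagged packet never match. */
	diffs |= (unsigned long)(p->vlan_present ^ skb->vlan_present);

	/* Both tagged (or already marked different): compare the TCI. */
	if (p->vlan_present)
		diffs |= (unsigned long)(p->vlan_tci ^ skb->vlan_tci);

	return diffs;
}
```

Two untagged packets compare equal in this sketch even if their vlan_tci fields hold different leftover values, whereas a plain XOR of the two TCIs would report a difference.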