* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: Nicolas Dichtel @ 2016-04-27 15:38 UTC (permalink / raw)
To: Jiri Pirko, Florian Westphal; +Cc: davem, netdev
In-Reply-To: <20160427151336.GB1962@nanopsycho.orion>
Le 27/04/2016 17:14, Jiri Pirko a écrit :
> Wed, Apr 27, 2016 at 11:56:15AM CEST, fw@strlen.de wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>> This patch adds the support of the 6WIND SHULTI switch. It is a software
>>> switch doing L2 forwarding.
>>>
>>> This first version implements the minimum needed to get the device working.
>>> It also implements, via switchdev and rtnetlink, bridge forwarding offload,
>>> including FDB static entries, FDB learning and FDB ageing.
>>
>> How is this different from net/bridge?
>> How is this different from openvswitch?
>
> The difference is that it this tries to allow userspace crap to mirror
> setting user does for bridge/ovs. Basically this looks to me like an
> attempt to enable userspace SDKs and such.
>
It is software switch, allowed by the switchdev model (see
Documentation/networking/switchdev.txt), same design as mellanox spectrum.
What's wrong with that?
^ permalink raw reply
* Re: [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum offloads and TSO
From: Tariq Toukan @ 2016-04-27 15:39 UTC (permalink / raw)
To: Alexander Duyck, Saeed Mahameed
Cc: Alex Duyck, Tal Alon, Linux Kernel Network Developers,
David Miller, Gal Pressman, Or Gerlitz, Eran Ben Elisha
In-Reply-To: <CAKgT0UfWUOFs8NvjWFQTtL_3Ure8MWbe5tJhEWFYnowb1kz1Dw@mail.gmail.com>
On 27/04/2016 12:01 AM, Alexander Duyck wrote:
> On Tue, Apr 26, 2016 at 1:23 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Tue, Apr 26, 2016 at 6:50 PM, Alex Duyck <aduyck@mirantis.com> wrote:
>>> The setup is pretty straight forward. Basically I left the first port
>>> in the default namespace and moved the second int a secondary
>>> namespace referred to below as $netns. I then assigned the IPv6
>>> addresses fec0::10:1 and fec0::10:2. After that I ran the following:
>>>
>>> VXLAN=vx$net
>>> echo $VXLAN ${test_options[$i]}
>>> ip link add $VXLAN type vxlan id $net \
>>> local fec0::10:1 remote $addr6 dev $PF0 \
>>> ${test_options[$i]} dstport `expr 8800 + $net`
>>> ip netns exec $netns ip link add $VXLAN type vxlan id $net \
>>> local $addr6 remote fec0::10:1 dev $port \
>>> ${test_options[$i]} dstport `expr 8800 + $net`
>>> ifconfig $VXLAN 192.168.${net}.1/24
>>> ip netns exec $netns ifconfig $VXLAN 192.168.${net}.2/24
>>>
>> Thanks, indeed i see that GSO is not working with vxlan over IPv6 over
>> mlx5 device.
>> We will test out those patches on both mlx4 and mlx5, and debug mlx4
>> IPv6 issue you see.
>>
>>>> Anyway, I suspect it might be related to a driver bug most likely in
>>>> get_real_size function @en_tx.c
>>>> specifically in : *lso_header_size = (skb_inner_transport_header(skb) -
>>>> skb->data) + inner_tcp_hdrlen(skb);
>>>>
>>>> will check this and get back to you.
>>> I'm not entirely convinced. What I was seeing is t hat the hardware
>>> itself was performing Rx checksum offload only on tunnels with an
>>> outer IPv4 header and ignoring tunnels with an outer IPv6 header.
>> I don't get it, are you trying to say that the issue is in the RX side ?
>> what do you mean by ignoring ? Dropping ? or just not validating checksum ?
>> if so why would you disable GSO and IPv6 checksumming on TX ?
> I'm suspecting that whatever parsing logic exists in either the
> hardware or firmware may not be configured to parse tunnels with outer
> IPv6 headers. The tell-tale sign is what occurs with an IPv6 based
> tunnel with no outer checksum. The hardware is not performing a
> checksum on the inner headers so it reports it as a UDP frame with no
> checksum to the stack which ends up preventing us from doing GRO.
> That tells me that the hardware is not parsing IPv6 based tunnels on
> Rx. I am assuming that if the Rx side doesn't work then there is a
> good chance that the Tx won't.
>
>>>>> @@ -2431,7 +2435,18 @@ static netdev_features_t
>>>>> mlx4_en_features_check(struct sk_buff *skb,
>>>>> netdev_features_t
>>>>> features)
>>>>> {
>>>>> features = vlan_features_check(skb, features);
>>>>> - return vxlan_features_check(skb, features);
>>>>> + features = vxlan_features_check(skb, features);
>>>>> +
>>>>> + /* The ConnectX-3 doesn't support outer IPv6 checksums but it does
>>>>> + * support inner IPv6 checksums and segmentation so we need to
>>>>> + * strip that feature if this is an IPv6 encapsulated frame.
>>>>> + */
>>>>> + if (skb->encapsulation &&
>>>>> + (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>>> + (ip_hdr(skb)->version != 4))
>>>>> + features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>>>> Dejavu, didn't you fix this already in harmonize_features, in
>>>> i.e, it is enough to do here:
>>>>
>>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL))
>>>> features &= ~NETIF_F_IPV6_CSUM;
>>>>
>>> So what this patch is doing is enabling an inner IPv6 header offloads.
>>> Up above we set the NETIF_F_IPV6_CSUM bit and we want it to stay set
>>> unless we have an outer IPv6 header because the inner headers may
>>> still need that bit set. If I did what you suggest it strips IPv6
>>> checksum support for inner headers and if we have to use GSO partial I
>>> ended up encountering some of the other bugs that I have fixed for GSO
>>> partial where either sg or csum are not defined.
>>>
>> I see, you mean that you want to disable checksumming and GSO only for
>> packets with Outer(IPv6):Inner(X) and keep it in case for
>> Outer(IPv4):Inner(IPv6)
>> but i think it is weird that the driver decides to disable features it
>> didn't declare in first place (NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK)
>>
>> Retry:
>>
>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL) &&
>> (ip_hdr(skb)->version != 4))
>> features &= ~NETIF_F_IPV6_CSUM;
>>
>> will this work ?
> Sort of. All that would happen is that you would fall through to
> harmonize_features where NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK gets
> cleared. I just figured I would short-cut things since we cannot
> support inner checksum or any GSO offloads if the tunnel has an outer
> IPv6 header. In addition this happens to effectively be the same code
> I am using in vxlan_features_check to disable things if we cannot
> checksum a protocol so it should help to keep the code size smaller
> for the function if the compiler is smart enough to coalesce similar
> code.
>
>> Anyway i prefer to debug the mlx4 issue first before we discuss the
>> best approach to disable checksumming & GSO for outer IPv6 in mlx4.
> The current code as-is already has it disabled. All I am doing is
> enabling IPv6 checksums for inner headers as it seems like it doesn't
> work for outer headers.
Hi Alex,
I will be working on the mlx4 issue next week after the holidays.
I will check this offline in-house, without blocking the series.
Regards,
Tariq
^ permalink raw reply
* Re: [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit
From: Nicolas Dichtel @ 2016-04-27 15:46 UTC (permalink / raw)
To: Balbir Singh, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
aar-bIcnvbaLZ9MEGnE8C9+IrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
kadlec-K40Dz/62t/MgiyqX0sVFJYdd74u8MsAO,
kuznet-v/Mj1YrvjDBInbfyfbPRSQ, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
linux-wpan-u79uwXL29TY76Z2rM5mHXA, kaber-dcUjhNyLwpNeoWH0uzbU5w,
pablo-Cap9r6Oaw4JrovVCs/uTlw
In-Reply-To: <5720B0A2.1030405-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Le 27/04/2016 14:29, Balbir Singh a écrit :
[snip]
> Please try
>
> https://www.kernel.org/doc/Documentation/accounting/getdelays.c
A patch follows this mail to fix that.
>
> iotop uses it as well. My concern is ABI breakage of user space.
My test is ok here, I didn't see a problem.
Code review (from here http://repo.or.cz/w/iotop.git) is also ok.
Regards,
Nicolas
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* [PATCH net-next] taskstats: fix nl parsing in accounting/getdelays.c
From: Nicolas Dichtel @ 2016-04-27 15:47 UTC (permalink / raw)
To: bsingharora
Cc: netdev, davem, linux-kernel, linux-wpan, aar, pablo, kaber,
kadlec, pshelar, kuznet, jmorris, yoshfuji, netfilter-devel, dev,
steffen.klassert, herbert, Nicolas Dichtel
In-Reply-To: <5720DED8.4090706@6wind.com>
The type TASKSTATS_TYPE_NULL should always be ignored.
When jumping to the next attribute, only the length of the current
attribute should be added, not the length of all nested attributes.
This last bug was not visible before commit 80df554275c2, because the
kernel didn't put more than two nested attributes.
Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
Documentation/accounting/getdelays.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index 7785fb5eb93f..d3caa6748a46 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -512,7 +512,7 @@ int main(int argc, char *argv[])
break;
}
len2 += NLA_ALIGN(na->nla_len);
- na = (struct nlattr *) ((char *) na + len2);
+ na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
}
break;
--
2.8.1
^ permalink raw reply related
* Re: [PATCH net-next] taskstats: fix nl parsing in accounting/getdelays.c
From: Nicolas Dichtel @ 2016-04-27 15:49 UTC (permalink / raw)
To: bsingharora-Re5JQEeQqe8AvxtiuMwx3w
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
aar-bIcnvbaLZ9MEGnE8C9+IrQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kaber-dcUjhNyLwpNeoWH0uzbU5w,
yoshfuji-VfPWfsRibaP+Ru+s062T9g,
netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
kadlec-K40Dz/62t/MgiyqX0sVFJYdd74u8MsAO,
kuznet-v/Mj1YrvjDBInbfyfbPRSQ, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
linux-wpan-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
pablo-Cap9r6Oaw4JrovVCs/uTlw
In-Reply-To: <1461772077-3216-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
Le 27/04/2016 17:47, Nicolas Dichtel a écrit :
> The type TASKSTATS_TYPE_NULL should always be ignored.
>
> When jumping to the next attribute, only the length of the current
> attribute should be added, not the length of all nested attributes.
> This last bug was not visible before commit 80df554275c2, because the
> kernel didn't put more than two nested attributes.
>
> Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
> Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Please, drop this version. I fatfingered my rebase.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* [PATCH net-next v2] taskstats: fix nl parsing in accounting/getdelays.c
From: Nicolas Dichtel @ 2016-04-27 15:53 UTC (permalink / raw)
To: bsingharora-Re5JQEeQqe8AvxtiuMwx3w
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
aar-bIcnvbaLZ9MEGnE8C9+IrQ, netdev-u79uwXL29TY76Z2rM5mHXA,
Nicolas Dichtel, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
kaber-dcUjhNyLwpNeoWH0uzbU5w, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
kadlec-K40Dz/62t/MgiyqX0sVFJYdd74u8MsAO,
kuznet-v/Mj1YrvjDBInbfyfbPRSQ, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
linux-wpan-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
pablo-Cap9r6Oaw4JrovVCs/uTlw
In-Reply-To: <5720DFA1.3080503-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
The type TASKSTATS_TYPE_NULL should always be ignored.
When jumping to the next attribute, only the length of the current
attribute should be added, not the length of all nested attributes.
This last bug was not visible before commit 80df554275c2, because the
kernel didn't put more than two nested attributes.
Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
Documentation/accounting/getdelays.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index 7785fb5eb93f..b5ca536e56a8 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -505,6 +505,8 @@ int main(int argc, char *argv[])
if (!loop)
goto done;
break;
+ case TASKSTATS_TYPE_NULL:
+ break;
default:
fprintf(stderr, "Unknown nested"
" nla_type %d\n",
@@ -512,7 +514,8 @@ int main(int argc, char *argv[])
break;
}
len2 += NLA_ALIGN(na->nla_len);
- na = (struct nlattr *) ((char *) na + len2);
+ na = (struct nlattr *)((char *)na +
+ NLA_ALIGN(na->nla_len));
}
break;
--
2.8.1
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply related
* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: Jiri Pirko @ 2016-04-27 15:54 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: Florian Westphal, davem, netdev
In-Reply-To: <5720DCE1.8060803@6wind.com>
Wed, Apr 27, 2016 at 05:38:09PM CEST, nicolas.dichtel@6wind.com wrote:
>Le 27/04/2016 17:14, Jiri Pirko a écrit :
>> Wed, Apr 27, 2016 at 11:56:15AM CEST, fw@strlen.de wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>> This patch adds the support of the 6WIND SHULTI switch. It is a software
>>>> switch doing L2 forwarding.
>>>>
>>>> This first version implements the minimum needed to get the device working.
>>>> It also implements, via switchdev and rtnetlink, bridge forwarding offload,
>>>> including FDB static entries, FDB learning and FDB ageing.
>>>
>>> How is this different from net/bridge?
>>> How is this different from openvswitch?
>>
>> The difference is that it this tries to allow userspace crap to mirror
>> setting user does for bridge/ovs. Basically this looks to me like an
>> attempt to enable userspace SDKs and such.
>>
>It is software switch, allowed by the switchdev model (see
>Documentation/networking/switchdev.txt), same design as mellanox spectrum.
Switchdev purpose is to offload stuff down to HW.
You say your switch is software, so integrate it properly into kernel.
Easy.
>
>What's wrong with that?
What's wrong is that your driver allows many userspace proprietary SDKs
to work on out-of-box kernel. This is called "trampoline model". You
basically enable userspace drivers for switches and stuff like that.
This was discussed many many times.
^ permalink raw reply
* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
From: Ben Greear @ 2016-04-27 15:59 UTC (permalink / raw)
To: Ben Hutchings, linux-kernel, stable
Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita
In-Reply-To: <lsq.1461711744.699003961@decadent.org.uk>
On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> 3.2.80-rc1 review patch. If anyone has any objections, please let me know.
I would be careful about this. It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.
There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.
http://www.spinics.net/lists/netdev/msg370436.html
Thanks,
Ben
>
> ------------------
>
> From: Vijay Pandurangan <vijayp@vijayp.ca>
>
> [ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]
>
> Packets that arrive from real hardware devices have ip_summed ==
> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
> current version of veth will replace CHECKSUM_NONE with
> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
> a veth device to be delivered to the application. This caused applications
> at Twitter to receive corrupt data when network hardware was corrupting
> packets.
>
> We believe this was added as an optimization to skip computing and
> verifying checksums for communication between containers. However, locally
> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
> written does nothing for them. As far as we can tell, after removing this
> code, these packets are transmitted from one stack to another unmodified
> (tcpdump shows invalid checksums on both sides, as expected), and they are
> delivered correctly to applications. We didn’t test every possible network
> configuration, but we tried a few common ones such as bridging containers,
> using NAT between the host and a container, and routing from hardware
> devices to containers. We have effectively deployed this in production at
> Twitter (by disabling RX checksum offloading on veth devices).
>
> This code dates back to the first version of the driver, commit
> <e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
> suspect this bug occurred mostly because the driver API has evolved
> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
> packet checksumming") (in December 2010) fixed this for packets that get
> created locally and sent to hardware devices, by not changing
> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
> in from hardware devices.
>
> Co-authored-by: Evan Jones <ej@evanjones.ca>
> Signed-off-by: Evan Jones <ej@evanjones.ca>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Cc: Phil Sutter <phil@nwl.cc>
> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
> Acked-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [bwh: Backported to 3.2: adjust context]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
> drivers/net/veth.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
> stats = this_cpu_ptr(priv->stats);
> rcv_stats = this_cpu_ptr(rcv_priv->stats);
>
> - /* don't change ip_summed == CHECKSUM_PARTIAL, as that
> - will cause bad checksum on forwarded packets */
> - if (skb->ip_summed == CHECKSUM_NONE &&
> - rcv->features & NETIF_F_RXCSUM)
> - skb->ip_summed = CHECKSUM_UNNECESSARY;
>
> length = skb->len;
> if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* [PATCH net-next 0/7] bridge: per-vlan stats
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
Hi,
This set adds support for bridge per-vlan statistics.
In order to be able to dump statistics we need a way to continue
dumping after reaching maximum size, thus patches 01-03 extend the new
stats API with a per-device extended link stats attribute and callback
which can save its local state and continue where it left off afterwards.
I considered using the already existing "fill_xstats" callback but it gets
confusing since we need to separate the linkinfo dump from the new stats
api dump and adding a flag/argument to do that just looks messy. I don't
think the rtnl_link_ops size is an issue, so adding these seemed like the
cleaner approach.
Patch 05 converts the pvid to a pointer so we can consolidate the vlan
stats accounting paths later, also allows to simplify the pvid code.
Patches 06 and 07 add the stats support and netlink dump support
respectively.
I've tested this set with both old and modified iproute2, kmemleak on and
some traffic stress tests while adding/removing vlans and ports.
Thank you,
Nik
Note: Jamal I haven't forgotten about the per-port per-vlan stats, I've got
a follow-up patch that adds it. You can easily see that the infrastructure
for private port/vlan stats is in place after this set. Though the stats
api will need some more changes to support that.
Nikolay Aleksandrov (7):
net: rtnetlink: allow rtnl_fill_statsinfo to save private state
counter
net: rtnetlink: allow only one idx saving stats attribute
net: rtnetlink: add linkxstats callbacks and attribute
net: constify is_skb_forwardable's arguments
bridge: vlan: RCUify pvid
bridge: vlan: learn to count
bridge: netlink: export per-vlan stats
include/linux/netdevice.h | 3 +-
include/net/rtnetlink.h | 10 +++
include/uapi/linux/if_bridge.h | 8 +++
include/uapi/linux/if_link.h | 9 +++
net/bridge/br_netlink.c | 80 ++++++++++++++++++++----
net/bridge/br_private.h | 32 +++++-----
net/bridge/br_vlan.c | 134 +++++++++++++++++++++++++++++------------
net/core/dev.c | 2 +-
net/core/rtnetlink.c | 64 +++++++++++++++++---
9 files changed, 266 insertions(+), 76 deletions(-)
--
2.4.11
^ permalink raw reply
* [PATCH net-next 1/7] net: rtnetlink: allow rtnl_fill_statsinfo to save private state counter
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
The new lidx argument allows the current dumping device to save a
private state counter which would enable it to continue dumping from
where it left off.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
net/core/rtnetlink.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ec059d52823..aeb2fa9b1cda 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3446,11 +3446,13 @@ out:
static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
- unsigned int flags, unsigned int filter_mask)
+ unsigned int flags, unsigned int filter_mask,
+ int *lidx)
{
struct if_stats_msg *ifsm;
struct nlmsghdr *nlh;
struct nlattr *attr;
+ int s_lidx = *lidx;
ASSERT_RTNL();
@@ -3480,7 +3482,11 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
return 0;
nla_put_failure:
- nlmsg_cancel(skb, nlh);
+ /* If we haven't made progress, it's a real error */
+ if (s_lidx == *lidx)
+ nlmsg_cancel(skb, nlh);
+ else
+ nlmsg_end(skb, nlh);
return -EMSGSIZE;
}
@@ -3507,6 +3513,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
struct net_device *dev = NULL;
struct sk_buff *nskb;
u32 filter_mask;
+ int lidx = 0;
int err;
ifsm = nlmsg_data(nlh);
@@ -3528,7 +3535,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
err = rtnl_fill_statsinfo(nskb, dev, RTM_NEWSTATS,
NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
- 0, filter_mask);
+ 0, filter_mask, &lidx);
if (err < 0) {
/* -EMSGSIZE implies BUG in if_nlmsg_stats_size */
WARN_ON(err == -EMSGSIZE);
@@ -3545,7 +3552,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
struct net *net = sock_net(skb->sk);
struct if_stats_msg *ifsm;
int h, s_h;
- int idx = 0, s_idx;
+ int idx = 0, s_idx, s_lidx;
struct net_device *dev;
struct hlist_head *head;
unsigned int flags = NLM_F_MULTI;
@@ -3554,6 +3561,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
s_h = cb->args[0];
s_idx = cb->args[1];
+ s_lidx = cb->args[2];
cb->seq = net->dev_base_seq;
@@ -3571,7 +3579,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
err = rtnl_fill_statsinfo(skb, dev, RTM_NEWSTATS,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq, 0,
- flags, filter_mask);
+ flags, filter_mask, &s_lidx);
/* If we ran out of room on the first message,
* we're in trouble
*/
@@ -3579,13 +3587,14 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
if (err < 0)
goto out;
-
+ s_lidx = 0;
nl_dump_check_consistent(cb, nlmsg_hdr(skb));
cont:
idx++;
}
}
out:
+ cb->args[2] = s_lidx;
cb->args[1] = idx;
cb->args[0] = h;
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 2/7] net: rtnetlink: allow only one idx saving stats attribute
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
We can't allow more than one stats attribute which uses the local idx
since the result will be a mess. This is a simple check to make sure
only one is being used at a time. Later when the filter_mask's 32 bits
are over we can switch to a bitmap.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
include/net/rtnetlink.h | 6 ++++++
net/core/rtnetlink.c | 17 +++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1ba13de..3f3b0b1b8722 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -150,4 +150,10 @@ int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len);
#define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)
+/* at most one attribute which can save a local idx is allowed to be set
+ * IFLA_STATS_IDX_ATTR_MASK has all the idx saving attributes set and is
+ * used to check if more than one is being requested
+ */
+#define IFLA_STATS_IDX_ATTR_MASK 0
+
#endif
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index aeb2fa9b1cda..ea03b6cd3d3c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3512,7 +3512,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
struct if_stats_msg *ifsm;
struct net_device *dev = NULL;
struct sk_buff *nskb;
- u32 filter_mask;
+ u32 filter_mask, lidx_filter;
int lidx = 0;
int err;
@@ -3529,6 +3529,14 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
if (!filter_mask)
return -EINVAL;
+ /* only one attribute which can save a local idx is allowed at a time
+ * even though rtnl_stats_get doesn't save the lidx, we need to be
+ * consistent with the dump side and error out
+ */
+ lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
+ if (lidx_filter && !is_power_of_2(lidx_filter))
+ return -EINVAL;
+
nskb = nlmsg_new(if_nlmsg_stats_size(dev, filter_mask), GFP_KERNEL);
if (!nskb)
return -ENOBUFS;
@@ -3556,7 +3564,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
struct net_device *dev;
struct hlist_head *head;
unsigned int flags = NLM_F_MULTI;
- u32 filter_mask = 0;
+ u32 filter_mask = 0, lidx_filter;
int err;
s_h = cb->args[0];
@@ -3570,6 +3578,11 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
if (!filter_mask)
return -EINVAL;
+ /* only one attribute which can save a local idx is allowed at a time */
+ lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
+ if (lidx_filter && !is_power_of_2(lidx_filter))
+ return -EINVAL;
+
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
head = &net->dev_index_head[h];
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 3/7] net: rtnetlink: add linkxstats callbacks and attribute
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
Add callbacks to calculate the size and fill link extended statistics
which can be split into multiple messages and are dumped via the new
rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute.
Also add that attribute to the idx mask check since it is expected to
be able to save state and resume dumping (e.g. future bridge per-vlan
stats will be dumped via this attribute and callbacks).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
include/net/rtnetlink.h | 6 +++++-
include/uapi/linux/if_link.h | 8 ++++++++
net/core/rtnetlink.c | 26 ++++++++++++++++++++++++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 3f3b0b1b8722..b449c1f3416f 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -95,6 +95,10 @@ struct rtnl_link_ops {
const struct net_device *dev,
const struct net_device *slave_dev);
struct net *(*get_link_net)(const struct net_device *dev);
+ size_t (*get_linkxstats_size)(const struct net_device *dev);
+ int (*fill_linkxstats)(struct sk_buff *skb,
+ const struct net_device *dev,
+ int *lidx);
};
int __rtnl_link_register(struct rtnl_link_ops *ops);
@@ -154,6 +158,6 @@ int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len);
* IFLA_STATS_IDX_ATTR_MASK has all the idx saving attributes set and is
* used to check if more than one is being requested
*/
-#define IFLA_STATS_IDX_ATTR_MASK 0
+#define IFLA_STATS_IDX_ATTR_MASK IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_XSTATS)
#endif
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ba69d4447249..1b874e26b15b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -798,6 +798,7 @@ struct if_stats_msg {
enum {
IFLA_STATS_UNSPEC, /* also used as 64bit pad attribute */
IFLA_STATS_LINK_64,
+ IFLA_STATS_LINK_XSTATS,
__IFLA_STATS_MAX,
};
@@ -805,4 +806,11 @@ enum {
#define IFLA_STATS_FILTER_BIT(ATTR) (1 << (ATTR - 1))
+/* These are embedded into IFLA_STATS_LINK_XSTATS */
+enum {
+ LINK_XSTATS_UNSPEC,
+ __LINK_XSTATS_MAX
+};
+#define LINK_XSTATS_MAX (__LINK_XSTATS_MAX - 1)
+
#endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ea03b6cd3d3c..9637618c408d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3477,6 +3477,23 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
dev_get_stats(dev, sp);
}
+ if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_XSTATS)) {
+ const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
+
+ if (ops && ops->fill_linkxstats) {
+ int err;
+
+ attr = nla_nest_start(skb, IFLA_STATS_LINK_XSTATS);
+ if (!attr)
+ goto nla_put_failure;
+
+ err = ops->fill_linkxstats(skb, dev, lidx);
+ nla_nest_end(skb, attr);
+ if (err)
+ goto nla_put_failure;
+ }
+ }
+
nlmsg_end(skb, nlh);
return 0;
@@ -3503,6 +3520,15 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64))
size += nla_total_size_64bit(sizeof(struct rtnl_link_stats64));
+ if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_XSTATS)) {
+ const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
+
+ if (ops && ops->get_linkxstats_size)
+ size += nla_total_size(ops->get_linkxstats_size(dev));
+ /* anything dumped is embedded in IFLA_STATS_LINK_XSTATS */
+ size += nla_total_size(0);
+ }
+
return size;
}
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 4/7] net: constify is_skb_forwardable's arguments
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
is_skb_forwardable is not supposed to change anything so constify its
arguments
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
include/linux/netdevice.h | 3 ++-
net/core/dev.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1f6d5db471a2..85d56e7cd46b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3263,7 +3263,8 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq, int *ret);
int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
-bool is_skb_forwardable(struct net_device *dev, struct sk_buff *skb);
+bool is_skb_forwardable(const struct net_device *dev,
+ const struct sk_buff *skb);
extern int netdev_budget;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6324bc9267f7..36f0170a0754 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1741,7 +1741,7 @@ static inline void net_timestamp_set(struct sk_buff *skb)
__net_timestamp(SKB); \
} \
-bool is_skb_forwardable(struct net_device *dev, struct sk_buff *skb)
+bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
{
unsigned int len;
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 5/7] bridge: vlan: RCUify pvid
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
Make pvid a pointer to a vlan struct and RCUify the access to it. Vlans
are already RCU-protected so the pvid vlan entry cannot disappear
without being initialized to NULL and going through a grace period first.
This change is necessary for the upcoming vlan counters and also would
serve to later move to vlan passing via a pointer instead of id.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
net/bridge/br_netlink.c | 23 ++++++++++-----------
net/bridge/br_private.h | 16 +--------------
net/bridge/br_vlan.c | 54 +++++++++++++++++++++++--------------------------
3 files changed, 37 insertions(+), 56 deletions(-)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index e9c635eae24d..f33d95b0f5d3 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -24,22 +24,22 @@
static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg,
u32 filter_mask)
{
- struct net_bridge_vlan *v;
+ struct net_bridge_vlan *v, *pvid;
u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
- u16 flags, pvid;
int num_vlans = 0;
+ u16 flags;
if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED))
return 0;
- pvid = br_get_pvid(vg);
+ pvid = rcu_dereference(vg->pvid);
/* Count number of vlan infos */
list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
flags = 0;
/* only a context, bridge vlan not activated */
if (!br_vlan_should_use(v))
continue;
- if (v->vid == pvid)
+ if (v == pvid)
flags |= BRIDGE_VLAN_INFO_PVID;
if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
@@ -243,21 +243,21 @@ nla_put_failure:
static int br_fill_ifvlaninfo_compressed(struct sk_buff *skb,
struct net_bridge_vlan_group *vg)
{
- struct net_bridge_vlan *v;
+ struct net_bridge_vlan *v, *pvid;
u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
- u16 flags, pvid;
int err = 0;
+ u16 flags;
/* Pack IFLA_BRIDGE_VLAN_INFO's for every vlan
* and mark vlan info with begin and end flags
* if vlaninfo represents a range
*/
- pvid = br_get_pvid(vg);
+ pvid = rcu_dereference(vg->pvid);
list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
flags = 0;
if (!br_vlan_should_use(v))
continue;
- if (v->vid == pvid)
+ if (v == pvid)
flags |= BRIDGE_VLAN_INFO_PVID;
if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
@@ -298,18 +298,17 @@ initvars:
static int br_fill_ifvlaninfo(struct sk_buff *skb,
struct net_bridge_vlan_group *vg)
{
+ struct net_bridge_vlan *v, *pvid;
struct bridge_vlan_info vinfo;
- struct net_bridge_vlan *v;
- u16 pvid;
- pvid = br_get_pvid(vg);
+ pvid = rcu_dereference(vg->pvid);
list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
if (!br_vlan_should_use(v))
continue;
vinfo.vid = v->vid;
vinfo.flags = 0;
- if (v->vid == pvid)
+ if (v == pvid)
vinfo.flags |= BRIDGE_VLAN_INFO_PVID;
if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1b5d145dfcbf..50d70b5eb307 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -130,8 +130,8 @@ struct net_bridge_vlan {
struct net_bridge_vlan_group {
struct rhashtable vlan_hash;
struct list_head vlan_list;
+ struct net_bridge_vlan __rcu *pvid;
u16 num_vlans;
- u16 pvid;
};
struct net_bridge_fdb_entry
@@ -741,15 +741,6 @@ static inline int br_vlan_get_tag(const struct sk_buff *skb, u16 *vid)
return err;
}
-static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
-{
- if (!vg)
- return 0;
-
- smp_rmb();
- return vg->pvid;
-}
-
static inline int br_vlan_enabled(struct net_bridge *br)
{
return br->vlan_enabled;
@@ -835,11 +826,6 @@ static inline u16 br_vlan_get_tag(const struct sk_buff *skb, u16 *tag)
return 0;
}
-static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
-{
- return 0;
-}
-
static inline int br_vlan_enabled(struct net_bridge *br)
{
return 0;
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index e001152d6ad1..4fab7665df8c 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -31,22 +31,11 @@ static struct net_bridge_vlan *br_vlan_lookup(struct rhashtable *tbl, u16 vid)
return rhashtable_lookup_fast(tbl, &vid, br_vlan_rht_params);
}
-static void __vlan_add_pvid(struct net_bridge_vlan_group *vg, u16 vid)
+/* __vlan_delete_pvid is just __vlan_set_pvid(vg, NULL) */
+static void __vlan_set_pvid(struct net_bridge_vlan_group *vg,
+ struct net_bridge_vlan *v)
{
- if (vg->pvid == vid)
- return;
-
- smp_wmb();
- vg->pvid = vid;
-}
-
-static void __vlan_delete_pvid(struct net_bridge_vlan_group *vg, u16 vid)
-{
- if (vg->pvid != vid)
- return;
-
- smp_wmb();
- vg->pvid = 0;
+ rcu_assign_pointer(vg->pvid, v);
}
static void __vlan_add_flags(struct net_bridge_vlan *v, u16 flags)
@@ -59,9 +48,9 @@ static void __vlan_add_flags(struct net_bridge_vlan *v, u16 flags)
vg = nbp_vlan_group(v->port);
if (flags & BRIDGE_VLAN_INFO_PVID)
- __vlan_add_pvid(vg, v->vid);
- else
- __vlan_delete_pvid(vg, v->vid);
+ __vlan_set_pvid(vg, v);
+ else if (rtnl_dereference(vg->pvid) == v)
+ __vlan_set_pvid(vg, NULL);
if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
v->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
@@ -285,7 +274,9 @@ static int __vlan_del(struct net_bridge_vlan *v)
masterv = v->brvlan;
}
- __vlan_delete_pvid(vg, v->vid);
+ if (rtnl_dereference(vg->pvid) == v)
+ __vlan_set_pvid(vg, NULL);
+
if (p) {
err = __vlan_vid_del(p->dev, p->br, v->vid);
if (err)
@@ -320,7 +311,7 @@ static void __vlan_flush(struct net_bridge_vlan_group *vg)
{
struct net_bridge_vlan *vlan, *tmp;
- __vlan_delete_pvid(vg, vg->pvid);
+ __vlan_set_pvid(vg, NULL);
list_for_each_entry_safe(vlan, tmp, &vg->vlan_list, vlist)
__vlan_del(vlan);
}
@@ -404,29 +395,29 @@ static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
}
if (!*vid) {
- u16 pvid = br_get_pvid(vg);
+ v = rcu_dereference(vg->pvid);
/* Frame had a tag with VID 0 or did not have a tag.
* See if pvid is set on this port. That tells us which
* vlan untagged or priority-tagged traffic belongs to.
*/
- if (!pvid)
+ if (!v)
goto drop;
/* PVID is set on this port. Any untagged or priority-tagged
* ingress frame is considered to belong to this vlan.
*/
- *vid = pvid;
+ *vid = v->vid;
if (likely(!tagged))
/* Untagged Frame. */
- __vlan_hwaccel_put_tag(skb, proto, pvid);
+ __vlan_hwaccel_put_tag(skb, proto, v->vid);
else
/* Priority-tagged Frame.
* At this point, We know that skb->vlan_tci had
* VLAN_TAG_PRESENT bit and its VID field was 0x000.
* We update only VID field and preserve PCP field.
*/
- skb->vlan_tci |= pvid;
+ skb->vlan_tci |= v->vid;
return true;
}
@@ -451,6 +442,9 @@ bool br_allowed_ingress(const struct net_bridge *br,
BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
return true;
}
+ /* if there's no vlan_group, there's nothing to match against */
+ if (!vg)
+ return false;
return __allowed_ingress(vg, br->vlan_proto, skb, vid);
}
@@ -492,9 +486,11 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
*vid = 0;
if (!*vid) {
- *vid = br_get_pvid(vg);
- if (!*vid)
+ struct net_bridge_vlan *v = rcu_dereference(vg->pvid);
+
+ if (!v)
return false;
+ *vid = v->vid;
return true;
}
@@ -713,9 +709,9 @@ int br_vlan_set_proto(struct net_bridge *br, unsigned long val)
static bool vlan_default_pvid(struct net_bridge_vlan_group *vg, u16 vid)
{
- struct net_bridge_vlan *v;
+ struct net_bridge_vlan *v = rtnl_dereference(vg->pvid);
- if (vid != vg->pvid)
+ if (v && vid != v->vid)
return false;
v = br_vlan_lookup(&vg->vlan_hash, vid);
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 6/7] bridge: vlan: learn to count
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
allocated a per-cpu stats which is then set in each per-port vlan context
for quick access. The br_allowed_ingress() common function is used to
account for Rx packets and the br_handle_vlan() common function is used
to account for Tx packets.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Note: maybe in the future it'd be better to rename br_allowed_ingress to
br_vlan_ingress() or something similar as it's not doing only checks.
net/bridge/br_private.h | 11 +++++++++-
net/bridge/br_vlan.c | 53 +++++++++++++++++++++++++++++++++++++++----------
2 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 50d70b5eb307..f6876ed718a5 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -77,12 +77,21 @@ struct bridge_mcast_querier {
};
#endif
+struct br_vlan_stats {
+ u64 rx_bytes;
+ u64 rx_packets;
+ u64 tx_bytes;
+ u64 tx_packets;
+ struct u64_stats_sync syncp;
+};
+
/**
* struct net_bridge_vlan - per-vlan entry
*
* @vnode: rhashtable member
* @vid: VLAN id
* @flags: bridge vlan flags
+ * @stats: per-cpu VLAN statistics
* @br: if MASTER flag set, this points to a bridge struct
* @port: if MASTER flag unset, this points to a port struct
* @refcnt: if MASTER flag set, this is bumped for each port referencing it
@@ -100,6 +109,7 @@ struct net_bridge_vlan {
struct rhash_head vnode;
u16 vid;
u16 flags;
+ struct br_vlan_stats __percpu *stats;
union {
struct net_bridge *br;
struct net_bridge_port *port;
@@ -866,7 +876,6 @@ static inline struct net_bridge_vlan_group *nbp_vlan_group_rcu(
{
return NULL;
}
-
#endif
struct nf_br_ops {
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 4fab7665df8c..d7a70c2ea3ec 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -151,6 +151,17 @@ static struct net_bridge_vlan *br_vlan_get_master(struct net_bridge *br, u16 vid
return masterv;
}
+static void br_master_vlan_rcu_free(struct rcu_head *rcu)
+{
+ struct net_bridge_vlan *v;
+
+ v = container_of(rcu, struct net_bridge_vlan, rcu);
+ WARN_ON(!br_vlan_is_master(v));
+ free_percpu(v->stats);
+ v->stats = NULL;
+ kfree(v);
+}
+
static void br_vlan_put_master(struct net_bridge_vlan *masterv)
{
struct net_bridge_vlan_group *vg;
@@ -163,7 +174,7 @@ static void br_vlan_put_master(struct net_bridge_vlan *masterv)
rhashtable_remove_fast(&vg->vlan_hash,
&masterv->vnode, br_vlan_rht_params);
__vlan_del_list(masterv);
- kfree_rcu(masterv, rcu);
+ call_rcu(&masterv->rcu, br_master_vlan_rcu_free);
}
}
@@ -219,6 +230,7 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
if (!masterv)
goto out_filt;
v->brvlan = masterv;
+ v->stats = masterv->stats;
}
/* Add the dev mac and count the vlan only if it's usable */
@@ -320,6 +332,7 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
struct net_bridge_vlan_group *vg,
struct sk_buff *skb)
{
+ struct br_vlan_stats *stats;
struct net_bridge_vlan *v;
u16 vid;
@@ -346,9 +359,14 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
return NULL;
}
}
+ stats = this_cpu_ptr(v->stats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->tx_bytes += skb->len;
+ stats->tx_packets++;
+ u64_stats_update_end(&stats->syncp);
+
if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
skb->vlan_tci = 0;
-
out:
return skb;
}
@@ -357,7 +375,8 @@ out:
static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
struct sk_buff *skb, u16 *vid)
{
- const struct net_bridge_vlan *v;
+ struct br_vlan_stats *stats;
+ struct net_bridge_vlan *v;
bool tagged;
BR_INPUT_SKB_CB(skb)->vlan_filtered = true;
@@ -418,14 +437,21 @@ static bool __allowed_ingress(struct net_bridge_vlan_group *vg, __be16 proto,
* We update only VID field and preserve PCP field.
*/
skb->vlan_tci |= v->vid;
-
- return true;
+ } else {
+ /* Frame had a valid vlan tag. See if vlan is allowed */
+ v = br_vlan_find(vg, *vid);
}
+ if (!v || !br_vlan_should_use(v))
+ goto drop;
+
+ stats = this_cpu_ptr(v->stats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->rx_bytes += skb->len;
+ stats->rx_packets++;
+ u64_stats_update_end(&stats->syncp);
+
+ return true;
- /* Frame had a valid vlan tag. See if vlan is allowed */
- v = br_vlan_find(vg, *vid);
- if (v && br_vlan_should_use(v))
- return true;
drop:
kfree_skb(skb);
return false;
@@ -538,6 +564,11 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags)
if (!vlan)
return -ENOMEM;
+ vlan->stats = netdev_alloc_pcpu_stats(struct br_vlan_stats);
+ if (!vlan->stats) {
+ kfree(vlan);
+ return -ENOMEM;
+ }
vlan->vid = vid;
vlan->flags = flags | BRIDGE_VLAN_INFO_MASTER;
vlan->flags &= ~BRIDGE_VLAN_INFO_PVID;
@@ -545,8 +576,10 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags)
if (flags & BRIDGE_VLAN_INFO_BRENTRY)
atomic_set(&vlan->refcnt, 1);
ret = __vlan_add(vlan, flags);
- if (ret)
+ if (ret) {
+ free_percpu(vlan->stats);
kfree(vlan);
+ }
return ret;
}
--
2.4.11
^ permalink raw reply related
* [PATCH net-next 7/7] bridge: netlink: export per-vlan stats
From: Nikolay Aleksandrov @ 2016-04-27 16:18 UTC (permalink / raw)
To: netdev; +Cc: roopa, davem, stephen, jhs, Nikolay Aleksandrov
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
Add a new LINK_XSTATS_BRIDGE_VLAN attribute and implement the
RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
get_linkxstats_size) in order to export the per-vlan stats.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
include/uapi/linux/if_bridge.h | 8 ++++++
include/uapi/linux/if_link.h | 1 +
net/bridge/br_netlink.c | 57 ++++++++++++++++++++++++++++++++++++++++++
net/bridge/br_private.h | 7 ++++++
net/bridge/br_vlan.c | 27 ++++++++++++++++++++
5 files changed, 100 insertions(+)
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 0536eefff9bf..3eb4e7145825 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -134,6 +134,14 @@ struct bridge_vlan_info {
__u16 vid;
};
+struct bridge_vlan_xstats {
+ __u16 vid;
+ __u64 rx_bytes;
+ __u64 rx_packets;
+ __u64 tx_bytes;
+ __u64 tx_packets;
+};
+
/* Bridge multicast database attributes
* [MDBA_MDB] = {
* [MDBA_MDB_ENTRY] = {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1b874e26b15b..7a9420a19720 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -809,6 +809,7 @@ enum {
/* These are embedded into IFLA_STATS_LINK_XSTATS */
enum {
LINK_XSTATS_UNSPEC,
+ LINK_XSTATS_BRIDGE_VLAN,
__LINK_XSTATS_MAX
};
#define LINK_XSTATS_MAX (__LINK_XSTATS_MAX - 1)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f33d95b0f5d3..34b4fa6fd693 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1212,6 +1212,61 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
return 0;
}
+static size_t br_get_linkxstats_size(const struct net_device *dev)
+{
+ struct net_bridge *br = netdev_priv(dev);
+ struct net_bridge_vlan_group *vg;
+ struct net_bridge_vlan *v;
+ int numvls = 0;
+
+ vg = br_vlan_group(br);
+ if (!vg || !vg->num_vlans)
+ return 0;
+
+ /* we need to count all, even placeholder entries */
+ list_for_each_entry(v, &vg->vlan_list, vlist)
+ numvls++;
+
+ return numvls * nla_total_size(sizeof(struct bridge_vlan_xstats));
+}
+
+static int br_fill_linkxstats(struct sk_buff *skb, const struct net_device *dev,
+ int *lidx)
+{
+ struct net_bridge *br = netdev_priv(dev);
+ struct net_bridge_vlan *v, *pvid;
+ struct net_bridge_vlan_group *vg;
+ struct bridge_vlan_xstats vxi;
+ int vl_idx = 0;
+
+ vg = br_vlan_group(br);
+ if (!vg || !vg->num_vlans)
+ goto out;
+ pvid = rtnl_dereference(vg->pvid);
+ list_for_each_entry(v, &vg->vlan_list, vlist) {
+ struct br_vlan_stats stats;
+
+ if (vl_idx++ < *lidx)
+ continue;
+ memset(&vxi, 0, sizeof(vxi));
+ vxi.vid = v->vid;
+ br_vlan_get_stats(v, &stats);
+ vxi.rx_bytes = stats.rx_bytes;
+ vxi.rx_packets = stats.rx_packets;
+ vxi.tx_bytes = stats.tx_bytes;
+ vxi.tx_packets = stats.tx_packets;
+
+ if (nla_put(skb, LINK_XSTATS_BRIDGE_VLAN, sizeof(vxi), &vxi))
+ goto nla_put_failure;
+ }
+ *lidx = 0;
+out:
+ return 0;
+
+nla_put_failure:
+ *lidx = vl_idx;
+ return -EMSGSIZE;
+}
static struct rtnl_af_ops br_af_ops __read_mostly = {
.family = AF_BRIDGE,
@@ -1230,6 +1285,8 @@ struct rtnl_link_ops br_link_ops __read_mostly = {
.dellink = br_dev_delete,
.get_size = br_get_size,
.fill_info = br_fill_info,
+ .fill_linkxstats = br_fill_linkxstats,
+ .get_linkxstats_size = br_get_linkxstats_size,
.slave_maxtype = IFLA_BRPORT_MAX,
.slave_policy = br_port_policy,
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f6876ed718a5..a10f7ed26f3b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -709,6 +709,8 @@ int nbp_vlan_delete(struct net_bridge_port *port, u16 vid);
void nbp_vlan_flush(struct net_bridge_port *port);
int nbp_vlan_init(struct net_bridge_port *port);
int nbp_get_num_vlan_infos(struct net_bridge_port *p, u32 filter_mask);
+void br_vlan_get_stats(const struct net_bridge_vlan *v,
+ struct br_vlan_stats *stats);
static inline struct net_bridge_vlan_group *br_vlan_group(
const struct net_bridge *br)
@@ -876,6 +878,11 @@ static inline struct net_bridge_vlan_group *nbp_vlan_group_rcu(
{
return NULL;
}
+
+static inline void br_vlan_get_stats(const struct net_bridge_vlan *v,
+ struct br_vlan_stats *stats)
+{
+}
#endif
struct nf_br_ops {
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index d7a70c2ea3ec..b39d9f5761d9 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1029,3 +1029,30 @@ void nbp_vlan_flush(struct net_bridge_port *port)
synchronize_rcu();
__vlan_group_free(vg);
}
+
+void br_vlan_get_stats(const struct net_bridge_vlan *v,
+ struct br_vlan_stats *stats)
+{
+ int i;
+
+ memset(stats, 0, sizeof(*stats));
+ for_each_possible_cpu(i) {
+ u64 rxpackets, rxbytes, txpackets, txbytes;
+ struct br_vlan_stats *cpu_stats;
+ unsigned int start;
+
+ cpu_stats = per_cpu_ptr(v->stats, i);
+ do {
+ start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+ rxpackets = cpu_stats->rx_packets;
+ rxbytes = cpu_stats->rx_bytes;
+ txbytes = cpu_stats->tx_bytes;
+ txpackets = cpu_stats->tx_packets;
+ } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
+
+ stats->rx_packets += rxpackets;
+ stats->rx_bytes += rxbytes;
+ stats->tx_bytes += txbytes;
+ stats->tx_packets += txpackets;
+ }
+}
--
2.4.11
^ permalink raw reply related
* Re: [PATCH v2] net: Add Qualcomm IPC router
From: David Miller @ 2016-04-27 16:22 UTC (permalink / raw)
To: bjorn.andersson
Cc: linux-kernel, netdev, linux-arm-msm, courtney.cavin,
bjorn.andersson
In-Reply-To: <1461736085-24862-1-git-send-email-bjorn.andersson@linaro.org>
From: Bjorn Andersson <bjorn.andersson@linaro.org>
Date: Tue, 26 Apr 2016 22:48:05 -0700
> + rc = qcom_smd_send(qdev->channel, skb->data, skb->len);
I truly dislike adding networking protocols that depend upon some
piece of infrastructure that only some platforms can enable, it's even
worse when that set of platforms doesn't intersect with x86-64.
When you do things like this, it's quite hard to make protocol wide
changes to APIs because build testing becomes an issue.
This code can now only be build tested on ARCH_QCOM architectures, and
that's a serious negative downside.
^ permalink raw reply
* RE: [PATCH v2 0/2] pegasus: correct buffer sizes
From: David Laight @ 2016-04-27 16:26 UTC (permalink / raw)
To: 'Johannes Berg', Petko Manolov
Cc: netdev@vger.kernel.org, davem@davemloft.net, a1291762@gmail.com
In-Reply-To: <1461750245.3723.8.camel@sipsolutions.net>
From: Johannes Berg
> Sent: 27 April 2016 10:44
> On Wed, 2016-04-27 at 12:33 +0300, Petko Manolov wrote:
> >
> > Your guess turned out to be not so wild.;)All Pegasus devices are
> > configured (by the driver) to append CRC at the end of each RX
> > packet.However, the driver reports packet length that does not
> > include it.
>
> Interesting, then my guess was wrong though, since the length is
> reported without it, or am I misunderstanding this?
...
It is even possible that the crc is written into the rx buffer even
though the length the hardware reports excludes it.
David
^ permalink raw reply
* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: David Miller @ 2016-04-27 16:34 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: netdev, jiri
In-Reply-To: <1461749838-4613-1-git-send-email-nicolas.dichtel@6wind.com>
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 27 Apr 2016 11:37:18 +0200
> +/* rtnl linkinfo attributes */
> +enum {
> + IFLA_SHULTI_UNSPEC,
> + IFLA_SHULTI_SWDEV_ID,
> + IFLA_SHULTI_SWDEV_PORTID,
> + IFLA_SHULTI_PHYS_PORT_NAME,
> +
> + __IFLA_SHULTI_MAX
> +#define IFLA_SHULTI_MAX (__IFLA_SHULTI_MAX - 1)
> +};
> +
...
> +enum {
> + SHULTI_A_UNSPEC,
> + SHULTI_A_NL_PORTIDS,
> + SHULTI_A_SWDEV_PORTID,
> + SHULTI_A_RX_QUEUES,
> + SHULTI_A_ERRCODE,
> + SHULTI_A_IFINDEX,
> + SHULTI_A_LINK_STATUS,
> + SHULTI_A_PACKET,
> + SHULTI_A_STATS,
> + SHULTI_A_VLAN_VID,
> + SHULTI_A_DRVINFO,
> + SHULTI_A_SETTINGS,
> + SHULTI_A_RX_MODES,
> + SHULTI_A_UC_ADDR,
> + SHULTI_A_MC_ADDR,
> + SHULTI_A_STP_STATE,
> + SHULTI_A_BR_STATE_LEARNING,
> + SHULTI_A_BR_AGEING_TIME,
> + SHULTI_A_BR_FDB,
> +
> + __SHULTI_A_MAX
> +#define SHULTI_A_MAX (__SHULTI_A_MAX - 1)
> +};
> +
If our existing netlink facilities for switchdevs are not sufficient to properly
report and configure the state of your device, extend it rather than making custom
netlink stuff.
^ permalink raw reply
* Re: [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit
From: David Miller @ 2016-04-27 16:41 UTC (permalink / raw)
To: bsingharora
Cc: nicolas.dichtel, netdev, linux-kernel, linux-wpan, aar, pablo,
kaber, kadlec, pshelar, kuznet, jmorris, yoshfuji,
netfilter-devel, dev, steffen.klassert, herbert
In-Reply-To: <5720B0A2.1030405@gmail.com>
From: Balbir Singh <bsingharora@gmail.com>
Date: Wed, 27 Apr 2016 22:29:22 +1000
> My concern is ABI breakage of user space.
The "ABI" is that unrecognized attributes must be silently ignored by
userspace.
^ permalink raw reply
* Re: [PATCH] can: m_can: fix bitrate setup on latest silicon
From: Florian Vallée @ 2016-04-27 16:47 UTC (permalink / raw)
To: Oliver Hartkopp
Cc: Wolfgang Grandegger, Marc Kleine-Budde, linux-can, netdev,
linux-kernel, Eric Bénard
In-Reply-To: <571FBD5E.9040807@hartkopp.net>
On 26 April 2016 at 21:11, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
> I wonder whether this small change covers the updates made between
> v3.0.1 and v3.1.0.
>
Probably not, I was mainly interested in fixing basic functionality here :)
(ie: with the default settings we can exchange data frames with
another controller)
>
> Your patch looks very good so far. I would appreciate if you could update the other register changes too as I don't have a hardware to test. I can provide the ISO/NON_ISO config for the netlink interface updates then :-)
>
Ok, I'll have another look at the changes. Thank you for the spec
history btw, it seems bosch only keeps the latest one publicly
available.
Regards,
Florian
^ permalink raw reply
* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: David Miller @ 2016-04-27 16:55 UTC (permalink / raw)
To: jiri; +Cc: fw, nicolas.dichtel, netdev
In-Reply-To: <20160427151336.GB1962@nanopsycho.orion>
From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 27 Apr 2016 17:14:04 +0200
> Wed, Apr 27, 2016 at 11:56:15AM CEST, fw@strlen.de wrote:
>>Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>> This patch adds the support of the 6WIND SHULTI switch. It is a software
>>> switch doing L2 forwarding.
>>>
>>> This first version implements the minimum needed to get the device working.
>>> It also implements, via switchdev and rtnetlink, bridge forwarding offload,
>>> including FDB static entries, FDB learning and FDB ageing.
>>
>>How is this different from net/bridge?
>>How is this different from openvswitch?
>
> The difference is that it this tries to allow userspace crap to mirror
> setting user does for bridge/ovs. Basically this looks to me like an
> attempt to enable userspace SDKs and such.
+1
There is no way I'm applying this.
^ permalink raw reply
* Re: [PATCH net-next v2] taskstats: fix nl parsing in accounting/getdelays.c
From: David Miller @ 2016-04-27 16:56 UTC (permalink / raw)
To: nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
aar-bIcnvbaLZ9MEGnE8C9+IrQ, netdev-u79uwXL29TY76Z2rM5mHXA,
bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jmorris-gx6/JNMH7DfYtjvyW6yDsg, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
kadlec-K40Dz/62t/MgiyqX0sVFJYdd74u8MsAO,
kuznet-v/Mj1YrvjDBInbfyfbPRSQ, linux-wpan-u79uwXL29TY76Z2rM5mHXA,
kaber-dcUjhNyLwpNeoWH0uzbU5w, pablo-Cap9r6Oaw4JrovVCs/uTlw
In-Reply-To: <1461772388-3763-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 27 Apr 2016 17:53:08 +0200
> The type TASKSTATS_TYPE_NULL should always be ignored.
>
> When jumping to the next attribute, only the length of the current
> attribute should be added, not the length of all nested attributes.
> This last bug was not visible before commit 80df554275c2, because the
> kernel didn't put more than two nested attributes.
>
> Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
> Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Applied, thanks.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH net-next 0/7] bridge: per-vlan stats
From: Stephen Hemminger @ 2016-04-27 17:06 UTC (permalink / raw)
To: Nikolay Aleksandrov; +Cc: netdev, roopa, davem, jhs
In-Reply-To: <1461773902-13528-1-git-send-email-nikolay@cumulusnetworks.com>
On Wed, 27 Apr 2016 18:18:15 +0200
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> Hi,
> This set adds support for bridge per-vlan statistics.
> In order to be able to dump statistics we need a way to continue
> dumping after reaching maximum size, thus patches 01-03 extend the new
> stats API with a per-device extended link stats attribute and callback
> which can save its local state and continue where it left off afterwards.
> I considered using the already existing "fill_xstats" callback but it gets
> confusing since we need to separate the linkinfo dump from the new stats
> api dump and adding a flag/argument to do that just looks messy. I don't
> think the rtnl_link_ops size is an issue, so adding these seemed like the
> cleaner approach.
>
> Patch 05 converts the pvid to a pointer so we can consolidate the vlan
> stats accounting paths later, also allows to simplify the pvid code.
> Patches 06 and 07 add the stats support and netlink dump support
> respectively.
> I've tested this set with both old and modified iproute2, kmemleak on and
> some traffic stress tests while adding/removing vlans and ports.
>
> Thank you,
> Nik
>
> Note: Jamal I haven't forgotten about the per-port per-vlan stats, I've got
> a follow-up patch that adds it. You can easily see that the infrastructure
> for private port/vlan stats is in place after this set. Though the stats
> api will need some more changes to support that.
>
>
> Nikolay Aleksandrov (7):
> net: rtnetlink: allow rtnl_fill_statsinfo to save private state
> counter
> net: rtnetlink: allow only one idx saving stats attribute
> net: rtnetlink: add linkxstats callbacks and attribute
> net: constify is_skb_forwardable's arguments
> bridge: vlan: RCUify pvid
> bridge: vlan: learn to count
> bridge: netlink: export per-vlan stats
>
> include/linux/netdevice.h | 3 +-
> include/net/rtnetlink.h | 10 +++
> include/uapi/linux/if_bridge.h | 8 +++
> include/uapi/linux/if_link.h | 9 +++
> net/bridge/br_netlink.c | 80 ++++++++++++++++++++----
> net/bridge/br_private.h | 32 +++++-----
> net/bridge/br_vlan.c | 134 +++++++++++++++++++++++++++++------------
> net/core/dev.c | 2 +-
> net/core/rtnetlink.c | 64 +++++++++++++++++---
> 9 files changed, 266 insertions(+), 76 deletions(-)
>
I am concerned this adds unnecessary complexity (more bugs)
and overhead (slower performance). Statistics are not free, and having
them in a convenient place maybe unnecessary duplication.
^ permalink raw reply
* Re: [PATCH] ip: add udp_csum, udp6_csum_tx, udp6_csum_rx control flags to ip l2tp add tunnel
From: Wang Shanker @ 2016-04-27 17:06 UTC (permalink / raw)
To: James Chapman; +Cc: netdev
In-Reply-To: <FD67CA88-0731-4DD7-AC78-5A4B922810D6@gmail.com>
> 在 2016年4月27日,23:33,Wang Shanker <shankerwangmiao@gmail.com> 写道:
>
>
>
>> 在 2016年4月27日,20:21,James Chapman <jchapman@katalix.com> 写道:
>>
>> On 26 April 2016 at 15:15, Wang Shanker <shankerwangmiao@gmail.com> wrote:
>>> Hi, all
>>>
>>> It’s my first time to contribute to such an important open source project. Things began when I upgraded my server, called "Server A", form ubuntu 14.04 to 16.04, which is shipped with new kernel version, 4.4. After upgrade, I soon found a l2tp tunnel between this server and another linux server, called "Server B", via ipv6 broke down. Here is the network topology:
>>>
>>> +----------+ +----------+
>>> | Server A | -- IPV6 Network -- | Server B |
>>> +----------+ +----------+
>>>
>>> The l2tp tunnel was encapsulated in udp datagrams. All the configuration was normal and could work after I reverted the kernel on Server A to original version.
>>>
>>> Here is what i did to create the tunnel:
>>>
>>> ```
>>> on Server A:
>>>
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>> ip l s l2tpeth0 up
>>>
>>> on Server B:
>>>
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 local 2001:db8::aaaa remote 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>> ip l s l2tpeth0 up
>>>
>>> ```
>>>
>>> When I used tcpdump to diagnose the problem, I got such result:
>>>
>>> ```
>>> on Server A:
>>>
>>> arping -i l2tpeth0 -0 1.2.3.4
>>>
>>> on Server B:
>>>
>>> tcpdump -i eth0 -n port 1086 -v
>>>
>>> 21:35:57.818810 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>> 21:35:58.820572 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>> 21:35:59.822216 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>>
>>> ```
>>>
>>> After looking into kernel source, I found out that in this commit a new feature to set udp6 checksum to zero in commit 6b649fe, which added `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_TX`.
>>>
>>> As a result, I added `udp_csum`, `udp6_csum_tx`, `udp6_csum_rx` control flags to `ip l2tp add tunnel` to control those attributes about checksum.
>>>
>>> Using this to create the tunnel instead on Server A:
>>>
>>> ```
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086 udp6_csum_tx on udp6_csum_rx on
>>> ```
>>>
>>> I finally got:
>>>
>>> ```
>>> on Server A:
>>>
>>> arping -i l2tpeth0 -0 1.2.3.4
>>>
>>> on Server B:
>>>
>>> tcpdump -i eth0 -n port 1086 -v
>>>
>>> 22:07:03.844297 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>> 22:07:04.845717 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>> 22:07:05.847965 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>>
>>> tcpdump -i l2tpeth0 -v
>>>
>>> 22:10:35.691326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:36.693627 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:37.695010 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:38.697121 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>>
>>> ```
>>>
>>> It seems to work. However, is it the real point that should be fixed and does my patch fix it well? I need your suggestion.
>>
>> This seems reasonable to me. It is good to provide user control of
>> L2TP checksum options.
>>
>> However, there is a problem with your patch. The netlink attributes
>> you are accessing to control checksums are flags, not u8 values.
>
> I’m not so familiar with kernel code. However, in `<linux/l2tp.h>` :
>
> ```
>
> /*
> * ATTR types defined for L2TP
> */
> enum {
> L2TP_ATTR_NONE, /* no data */
> // ...
> L2TP_ATTR_IP6_DADDR, /* struct in6_addr */
> L2TP_ATTR_UDP_ZERO_CSUM6_TX, /* u8 */
> L2TP_ATTR_UDP_ZERO_CSUM6_RX, /* u8 */
> // ...
>
> }
> ```
>
> isn’t L2TP_ATTR_UDP_ZERO_CSUM6_TX a u8 value? Or should I use `addattr` instead of `addattr8`?
>
>>
>> Maybe the default checksum setting for such l2tp tunnels should be
>> changed in the l2tp kernel code to match the previous behaviour where
>> IPv6 checksums were disabled?
>
> I think so. However, I’m confused with those code.
>
> From the name of the attrs `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_RX`, I
> can tell that when those flags are set, the checksum will be zero. Also, according to the
> comment of commit 6b649fe in kernel source, “Added new L2TP configuration options to allow
> TX and RX of zero checksums in IPv6. Default is not to use them.”, checksums shouldn't have
> been zero by default. However, in fact, they are. I think there may be some bugs in kernel
> source.
I think I’ve got the bug. Here is the patch for kernel
---
net/l2tp/l2tp_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index afca2eb..6edfa99 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1376,9 +1376,9 @@ static int l2tp_tunnel_sock_create(struct net *net,
memcpy(&udp_conf.peer_ip6, cfg->peer_ip6,
sizeof(udp_conf.peer_ip6));
udp_conf.use_udp6_tx_checksums =
- cfg->udp6_zero_tx_checksums;
+ ! cfg->udp6_zero_tx_checksums;
udp_conf.use_udp6_rx_checksums =
- cfg->udp6_zero_rx_checksums;
+ ! cfg->udp6_zero_rx_checksums;
} else
#endif
{
--
>>
>>>
>>>
>>> ---
>>> ip/ipl2tp.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 45 insertions(+)
>>>
>>> diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
>>> index 3c8ee93..67a6482 100644
>>> --- a/ip/ipl2tp.c
>>> +++ b/ip/ipl2tp.c
>>> @@ -56,6 +56,8 @@ struct l2tp_parm {
>>>
>>> uint16_t pw_type;
>>> uint16_t mtu;
>>> + int udp6_csum_tx:1;
>>> + int udp6_csum_rx:1;
>>> int udp_csum:1;
>>> int recv_seq:1;
>>> int send_seq:1;
>>> @@ -117,6 +119,9 @@ static int create_tunnel(struct l2tp_parm *p)
>>> if (p->encap == L2TP_ENCAPTYPE_UDP) {
>>> addattr16(&req.n, 1024, L2TP_ATTR_UDP_SPORT, p->local_udp_port);
>>> addattr16(&req.n, 1024, L2TP_ATTR_UDP_DPORT, p->peer_udp_port);
>>> + addattr8 (&req.n, 1024, L2TP_ATTR_UDP_CSUM, p->udp_csum);
>>> + addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_TX, p->udp6_csum_tx);
>>> + addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_RX, p->udp6_csum_rx);
>>> }
>>>
>>> if (rtnl_talk(&genl_rth, &req.n, NULL, 0) < 0)
>>> @@ -282,6 +287,14 @@ static int get_response(struct nlmsghdr *n, void *arg)
>>> p->l2spec_len = rta_getattr_u8(attrs[L2TP_ATTR_L2SPEC_LEN]);
>>>
>>> p->udp_csum = !!attrs[L2TP_ATTR_UDP_CSUM];
>>> + if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX])
>>> + p->udp6_csum_tx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
>>> + else
>>> + p->udp6_csum_tx = 1;
>>> + if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX])
>>> + p->udp6_csum_rx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
>>> + else
>>> + p->udp6_csum_rx = 1;
>>> if (attrs[L2TP_ATTR_COOKIE])
>>> memcpy(p->cookie, RTA_DATA(attrs[L2TP_ATTR_COOKIE]),
>>> p->cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_COOKIE]));
>>> @@ -470,6 +483,9 @@ static void usage(void)
>>> fprintf(stderr, " tunnel_id ID peer_tunnel_id ID\n");
>>> fprintf(stderr, " [ encap { ip | udp } ]\n");
>>> fprintf(stderr, " [ udp_sport PORT ] [ udp_dport PORT ]\n");
>>> + fprintf(stderr, " [ udp_csum { on | off } ]\n");
>>> + fprintf(stderr, " [ udp6_csum_tx { on | off } ]\n");
>>> + fprintf(stderr, " [ udp6_csum_rx { on | off } ]\n");
>>> fprintf(stderr, "Usage: ip l2tp add session [ name NAME ]\n");
>>> fprintf(stderr, " tunnel_id ID\n");
>>> fprintf(stderr, " session_id ID peer_session_id ID\n");
>>> @@ -500,6 +516,8 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>> /* Defaults */
>>> p->l2spec_type = L2TP_L2SPECTYPE_DEFAULT;
>>> p->l2spec_len = 4;
>>> + p->udp6_csum_rx = 1;
>>> + p->udp6_csum_tx = 1;
>>>
>>> while (argc > 0) {
>>> if (strcmp(*argv, "encap") == 0) {
>>> @@ -569,6 +587,33 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>> if (get_u16(&uval, *argv, 0))
>>> invarg("invalid port\n", *argv);
>>> p->peer_udp_port = uval;
>>> + } else if (strcmp(*argv, "udp_csum") == 0) {
>>> + NEXT_ARG();
>>> + if (strcmp(*argv, "on") == 0) {
>>> + p->udp_csum = 1;
>>> + } else if (strcmp(*argv, "off") == 0) {
>>> + p->udp_csum = 0;
>>> + } else {
>>> + invarg("invalid option for udp_csum\n", *argv);
>>> + }
>>> + } else if (strcmp(*argv, "udp6_csum_rx") == 0) {
>>> + NEXT_ARG();
>>> + if (strcmp(*argv, "on") == 0) {
>>> + p->udp6_csum_rx = 1;
>>> + } else if (strcmp(*argv, "off") == 0) {
>>> + p->udp6_csum_rx = 0;
>>> + } else {
>>> + invarg("invalid option for udp6_csum_rx\n", *argv);
>>> + }
>>> + } else if (strcmp(*argv, "udp6_csum_tx") == 0) {
>>> + NEXT_ARG();
>>> + if (strcmp(*argv, "on") == 0) {
>>> + p->udp6_csum_tx = 1;
>>> + } else if (strcmp(*argv, "off") == 0) {
>>> + p->udp6_csum_tx = 0;
>>> + } else {
>>> + invarg("invalid option for udp6_csum_tx\n", *argv);
>>> + }
>>> } else if (strcmp(*argv, "offset") == 0) {
>>> __u8 uval;
>>>
>>> --
>>> 2.5.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox