* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Eric Dumazet @ 2017-08-25 18:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <20170825205653-mutt-send-email-mst@kernel.org>
On Fri, 2017-08-25 at 21:03 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > As found by syzkaller, malicious users can set whatever tx_queue_len
> > on a tun device and eventually crash the kernel.
> >
> > Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> > ring buffer is not fast anyway.
>
> I'm not sure it's worth changing for small rings.
>
> Does kmalloc_array guarantee cache line alignment for big buffers
> then? If the ring is misaligned it will likely cause false sharing
> as it's designed to be accessed from two CPUs.
I specifically said that in the changelog :
"since a small ring buffer is not fast anyway."
If one user sets up a pathological small ring buffer, kernel should not
try to fix it.
In this case, you would have to setup a ring of 2 or 4 slots to
eventually hit false sharing.
^ permalink raw reply
* [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2017-08-25 18:47 UTC (permalink / raw)
To: Linus Torvalds
Cc: kvm, mst, netdev, linux-kernel, stable, virtualization, stefanha,
yasu.isimatu, hch
The following changes since commit 14ccee78fc82f5512908f4424f541549a5705b89:
Linux 4.13-rc6 (2017-08-20 14:13:52 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to ba74b6f7fcc07355d087af6939712eed4a454821:
virtio_pci: fix cpu affinity support (2017-08-25 21:38:26 +0300)
----------------------------------------------------------------
virtio: bugfix
Fixes two obvious bugs in virtio pci.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Christoph Hellwig (1):
virtio_pci: fix cpu affinity support
Stefan Hajnoczi (1):
virtio_blk: fix incorrect message when disk is resized
drivers/block/virtio_blk.c | 16 ++++++++++------
drivers/virtio/virtio_pci_common.c | 10 +++++++---
2 files changed, 17 insertions(+), 9 deletions(-)
^ permalink raw reply
* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Michael S. Tsirkin @ 2017-08-25 18:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <1502905007.4936.133.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> As found by syzkaller, malicious users can set whatever tx_queue_len
> on a tun device and eventually crash the kernel.
>
> Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> ring buffer is not fast anyway.
I'm not sure it's worth changing for small rings.
Does kmalloc_array guarantee cache line alignment for big buffers
then? If the ring is misaligned it will likely cause false sharing
as it's designed to be accessed from two CPUs.
> Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> ---
> include/linux/ptr_ring.h | 9 +++++----
> include/linux/skb_array.h | 3 ++-
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index d8c97ec8a8e6..37b4bb2545b3 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -436,9 +436,9 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> __PTR_RING_PEEK_CALL_v; \
> })
>
> -static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
> +static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
> {
> - return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
> + return kcalloc(size, sizeof(void *), gfp);
> }
>
> static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> @@ -582,7 +582,8 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
> * In particular if you consume ring in interrupt or BH context, you must
> * disable interrupts/BH when doing so.
> */
> -static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
> +static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
> + unsigned int nrings,
> int size,
> gfp_t gfp, void (*destroy)(void *))
> {
> @@ -590,7 +591,7 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
> void ***queues;
> int i;
>
> - queues = kmalloc(nrings * sizeof *queues, gfp);
> + queues = kmalloc_array(nrings, sizeof(*queues), gfp);
> if (!queues)
> goto noqueues;
>
> diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
> index 35226cd4efb0..8621ffdeecbf 100644
> --- a/include/linux/skb_array.h
> +++ b/include/linux/skb_array.h
> @@ -193,7 +193,8 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
> }
>
> static inline int skb_array_resize_multiple(struct skb_array **rings,
> - int nrings, int size, gfp_t gfp)
> + int nrings, unsigned int size,
> + gfp_t gfp)
> {
> BUILD_BUG_ON(offsetof(struct skb_array, ring));
> return ptr_ring_resize_multiple((struct ptr_ring **)rings,
>
^ permalink raw reply
* Re: Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 18:03 UTC (permalink / raw)
To: Chenbo Feng, SELinux, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CABXk95ATb_AFk+4GX9Xw+HEU6No8irb0mOoLE9O4EBuLAgA-1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1207 bytes --]
Disregard this email. Re-sending in plain-text mode to prevent rejection by
netdev list.
On Fri, Aug 25, 2017 at 10:56 AM Jeffrey Vander Stoep <jeffv-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:
> I’d like to get your thoughts on adding LSM permission checks on BPF
> objects.
>
> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
> bpf() functions. This seems like poor granularity. [2]
>
> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
>
> Let me know what you think. Are there simpler alternative approaches that
> we haven’t considered?
>
> Thanks!
> Jeff
>
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively
> allowing all processes permission to use bpf() goes against the principle
> of least privilege exposing a lot of kernel attack surface to processes
> that do not actually need it.
>
[-- Attachment #2: Type: text/html, Size: 1660 bytes --]
^ permalink raw reply
* Permissions for eBPF objects
From: Jeffrey Vander Stoep @ 2017-08-25 18:01 UTC (permalink / raw)
To: Chenbo Feng, netdev, SELinux
I’d like to get your thoughts on adding LSM permission checks on BPF objects.
By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
to bpf() functions. This seems like poor granularity. [2]
Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.
Let me know what you think. Are there simpler alternative approaches
that we haven’t considered?
Thanks!
Jeff
[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges.
Alternatively allowing all processes permission to use bpf() goes
against the principle of least privilege exposing a lot of kernel
attack surface to processes that do not actually need it.
^ permalink raw reply
* Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 17:56 UTC (permalink / raw)
To: Chenbo Feng, SELinux, netdev-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 957 bytes --]
I’d like to get your thoughts on adding LSM permission checks on BPF
objects.
By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
bpf() functions. This seems like poor granularity. [2]
Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.
Let me know what you think. Are there simpler alternative approaches that
we haven’t considered?
Thanks!
Jeff
[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively
allowing all processes permission to use bpf() goes against the principle
of least privilege exposing a lot of kernel attack surface to processes
that do not actually need it.
[-- Attachment #2: Type: text/html, Size: 1175 bytes --]
^ permalink raw reply
* [PATCH net 3/3] nfp: remove incorrect mask check for vlan matching
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>
Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.
Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
drivers/net/ethernet/netronome/nfp/flower/match.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index b365110..d25b503 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -42,6 +42,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
struct tc_cls_flower_offload *flow, u8 key_type,
bool mask_version)
{
+ struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
struct flow_dissector_key_vlan *flow_vlan;
u16 tmp_tci;
@@ -50,15 +51,10 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
frame->nfp_flow_key_layer = key_type;
frame->mask_id = ~0;
- if (mask_version) {
- frame->tci = cpu_to_be16(~0);
- return;
- }
-
if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
flow_vlan = skb_flow_dissector_target(flow->dissector,
FLOW_DISSECTOR_KEY_VLAN,
- flow->key);
+ target);
/* Populate the tci field. */
if (flow_vlan->vlan_id) {
tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
--
2.7.4
^ permalink raw reply related
* [PATCH net 2/3] nfp: fix supported key layers calculation
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.
Fixes: af9d842c1354 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
drivers/net/ethernet/netronome/nfp/flower/offload.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 6c8ecc2..74a96d6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -107,6 +107,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
{
struct flow_dissector_key_basic *mask_basic = NULL;
struct flow_dissector_key_basic *key_basic = NULL;
+ struct flow_dissector_key_ip *mask_ip = NULL;
u32 key_layer_two;
u8 key_layer;
int key_size;
@@ -132,6 +133,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
flow->key);
}
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_IP))
+ mask_ip = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_IP,
+ flow->mask);
+
key_layer_two = 0;
key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
key_size = sizeof(struct nfp_flower_meta_one) +
@@ -142,11 +148,19 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
/* Ethernet type is present in the key. */
switch (key_basic->n_proto) {
case cpu_to_be16(ETH_P_IP):
+ if (mask_ip && mask_ip->tos)
+ return -EOPNOTSUPP;
+ if (mask_ip && mask_ip->ttl)
+ return -EOPNOTSUPP;
key_layer |= NFP_FLOWER_LAYER_IPV4;
key_size += sizeof(struct nfp_flower_ipv4);
break;
case cpu_to_be16(ETH_P_IPV6):
+ if (mask_ip && mask_ip->tos)
+ return -EOPNOTSUPP;
+ if (mask_ip && mask_ip->ttl)
+ return -EOPNOTSUPP;
key_layer |= NFP_FLOWER_LAYER_IPV6;
key_size += sizeof(struct nfp_flower_ipv6);
break;
@@ -157,6 +171,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
case cpu_to_be16(ETH_P_ARP):
return -EOPNOTSUPP;
+ /* Currently we do not offload MPLS. */
+ case cpu_to_be16(ETH_P_MPLS_UC):
+ case cpu_to_be16(ETH_P_MPLS_MC):
+ return -EOPNOTSUPP;
+
/* Will be included in layer 2. */
case cpu_to_be16(ETH_P_8021Q):
break;
--
2.7.4
^ permalink raw reply related
* [PATCH net 1/3] nfp: fix unchecked flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>
Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.
Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
drivers/net/ethernet/netronome/nfp/flower/match.c | 133 +++++++++++----------
.../net/ethernet/netronome/nfp/flower/offload.c | 41 ++++---
2 files changed, 93 insertions(+), 81 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 0e08404..b365110 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -45,6 +45,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
struct flow_dissector_key_vlan *flow_vlan;
u16 tmp_tci;
+ memset(frame, 0, sizeof(struct nfp_flower_meta_two));
/* Populate the metadata frame. */
frame->nfp_flow_key_layer = key_type;
frame->mask_id = ~0;
@@ -54,21 +55,20 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
return;
}
- flow_vlan = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_VLAN,
- flow->key);
-
- /* Populate the tci field. */
- if (!flow_vlan->vlan_id) {
- tmp_tci = 0;
- } else {
- tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
- flow_vlan->vlan_priority) |
- FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
- flow_vlan->vlan_id) |
- NFP_FLOWER_MASK_VLAN_CFI;
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
+ flow_vlan = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_VLAN,
+ flow->key);
+ /* Populate the tci field. */
+ if (flow_vlan->vlan_id) {
+ tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
+ flow_vlan->vlan_priority) |
+ FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
+ flow_vlan->vlan_id) |
+ NFP_FLOWER_MASK_VLAN_CFI;
+ frame->tci = cpu_to_be16(tmp_tci);
+ }
}
- frame->tci = cpu_to_be16(tmp_tci);
}
static void
@@ -99,17 +99,18 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
bool mask_version)
{
struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
- struct flow_dissector_key_eth_addrs *flow_mac;
-
- flow_mac = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_ETH_ADDRS,
- target);
+ struct flow_dissector_key_eth_addrs *addr;
memset(frame, 0, sizeof(struct nfp_flower_mac_mpls));
- /* Populate mac frame. */
- ether_addr_copy(frame->mac_dst, &flow_mac->dst[0]);
- ether_addr_copy(frame->mac_src, &flow_mac->src[0]);
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ addr = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_ETH_ADDRS,
+ target);
+ /* Populate mac frame. */
+ ether_addr_copy(frame->mac_dst, &addr->dst[0]);
+ ether_addr_copy(frame->mac_src, &addr->src[0]);
+ }
if (mask_version)
frame->mpls_lse = cpu_to_be32(~0);
@@ -121,14 +122,17 @@ nfp_flower_compile_tport(struct nfp_flower_tp_ports *frame,
bool mask_version)
{
struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
- struct flow_dissector_key_ports *flow_tp;
+ struct flow_dissector_key_ports *tp;
- flow_tp = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_PORTS,
- target);
+ memset(frame, 0, sizeof(struct nfp_flower_tp_ports));
- frame->port_src = flow_tp->src;
- frame->port_dst = flow_tp->dst;
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
+ tp = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_PORTS,
+ target);
+ frame->port_src = tp->src;
+ frame->port_dst = tp->dst;
+ }
}
static void
@@ -137,25 +141,27 @@ nfp_flower_compile_ipv4(struct nfp_flower_ipv4 *frame,
bool mask_version)
{
struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
- struct flow_dissector_key_ipv4_addrs *flow_ipv4;
- struct flow_dissector_key_basic *flow_basic;
-
- flow_ipv4 = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_IPV4_ADDRS,
- target);
+ struct flow_dissector_key_ipv4_addrs *addr;
+ struct flow_dissector_key_basic *basic;
- flow_basic = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- target);
-
- /* Populate IPv4 frame. */
- frame->reserved = 0;
- frame->ipv4_src = flow_ipv4->src;
- frame->ipv4_dst = flow_ipv4->dst;
- frame->proto = flow_basic->ip_proto;
/* Wildcard TOS/TTL for now. */
- frame->tos = 0;
- frame->ttl = 0;
+ memset(frame, 0, sizeof(struct nfp_flower_ipv4));
+
+ if (dissector_uses_key(flow->dissector,
+ FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
+ addr = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_IPV4_ADDRS,
+ target);
+ frame->ipv4_src = addr->src;
+ frame->ipv4_dst = addr->dst;
+ }
+
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ basic = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ target);
+ frame->proto = basic->ip_proto;
+ }
}
static void
@@ -164,26 +170,27 @@ nfp_flower_compile_ipv6(struct nfp_flower_ipv6 *frame,
bool mask_version)
{
struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
- struct flow_dissector_key_ipv6_addrs *flow_ipv6;
- struct flow_dissector_key_basic *flow_basic;
-
- flow_ipv6 = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_IPV6_ADDRS,
- target);
-
- flow_basic = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- target);
+ struct flow_dissector_key_ipv6_addrs *addr;
+ struct flow_dissector_key_basic *basic;
- /* Populate IPv6 frame. */
- frame->reserved = 0;
- frame->ipv6_src = flow_ipv6->src;
- frame->ipv6_dst = flow_ipv6->dst;
- frame->proto = flow_basic->ip_proto;
/* Wildcard LABEL/TOS/TTL for now. */
- frame->ipv6_flow_label_exthdr = 0;
- frame->tos = 0;
- frame->ttl = 0;
+ memset(frame, 0, sizeof(struct nfp_flower_ipv6));
+
+ if (dissector_uses_key(flow->dissector,
+ FLOW_DISSECTOR_KEY_IPV6_ADDRS)) {
+ addr = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_IPV6_ADDRS,
+ target);
+ frame->ipv6_src = addr->src;
+ frame->ipv6_dst = addr->dst;
+ }
+
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ basic = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ target);
+ frame->proto = basic->ip_proto;
+ }
}
int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 4ad10bd..6c8ecc2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -105,35 +105,40 @@ static int
nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow)
{
- struct flow_dissector_key_control *mask_enc_ctl;
- struct flow_dissector_key_basic *mask_basic;
- struct flow_dissector_key_basic *key_basic;
+ struct flow_dissector_key_basic *mask_basic = NULL;
+ struct flow_dissector_key_basic *key_basic = NULL;
u32 key_layer_two;
u8 key_layer;
int key_size;
- mask_enc_ctl = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_ENC_CONTROL,
- flow->mask);
+ if (dissector_uses_key(flow->dissector,
+ FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+ struct flow_dissector_key_control *mask_enc_ctl =
+ skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_ENC_CONTROL,
+ flow->mask);
+ /* We are expecting a tunnel. For now we ignore offloading. */
+ if (mask_enc_ctl->addr_type)
+ return -EOPNOTSUPP;
+ }
+
+ if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ mask_basic = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ flow->mask);
- mask_basic = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- flow->mask);
+ key_basic = skb_flow_dissector_target(flow->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ flow->key);
+ }
- key_basic = skb_flow_dissector_target(flow->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- flow->key);
key_layer_two = 0;
key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
key_size = sizeof(struct nfp_flower_meta_one) +
sizeof(struct nfp_flower_in_port) +
sizeof(struct nfp_flower_mac_mpls);
- /* We are expecting a tunnel. For now we ignore offloading. */
- if (mask_enc_ctl->addr_type)
- return -EOPNOTSUPP;
-
- if (mask_basic->n_proto) {
+ if (mask_basic && mask_basic->n_proto) {
/* Ethernet type is present in the key. */
switch (key_basic->n_proto) {
case cpu_to_be16(ETH_P_IP):
@@ -166,7 +171,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
}
}
- if (mask_basic->ip_proto) {
+ if (mask_basic && mask_basic->ip_proto) {
/* Ethernet type is present in the key. */
switch (key_basic->ip_proto) {
case IPPROTO_TCP:
--
2.7.4
^ permalink raw reply related
* [PATCH net 0/3] fix layer calculation and flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
Pieter Jansen van Vuuren
Hi,
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.
Pieter Jansen van Vuuren (3):
nfp: fix unchecked flow dissector use
nfp: fix supported key layers calculation
nfp: remove incorrect mask check for vlan matching
drivers/net/ethernet/netronome/nfp/flower/match.c | 139 +++++++++++----------
.../net/ethernet/netronome/nfp/flower/offload.c | 60 ++++++---
2 files changed, 113 insertions(+), 86 deletions(-)
--
2.7.4
^ permalink raw reply
* Re: [PATCH] net: stmmac: dwmac-sun8i: Use reset exclusive
From: Maxime Ripard @ 2017-08-25 17:12 UTC (permalink / raw)
To: Corentin Labbe
Cc: peppe.cavallaro, alexandre.torgue, wens, netdev, linux-arm-kernel,
linux-kernel
In-Reply-To: <20170825151733.GB9475@Red>
[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]
On Fri, Aug 25, 2017 at 05:17:33PM +0200, Corentin Labbe wrote:
> On Fri, Aug 25, 2017 at 04:48:32PM +0200, Maxime Ripard wrote:
> > On Fri, Aug 25, 2017 at 04:38:05PM +0200, Corentin Labbe wrote:
> > > The current dwmac_sun8i module cannot be rmmod/modprobe due to that
> > > the reset controller was not released when removed.
> > >
> > > This patch remove ambiguity, by using of_reset_control_get_exclusive and
> > > add the missing reset_control_put().
> > >
> > > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > ---
> > > drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > index fffd6d5fc907..675a09629d85 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > @@ -782,6 +782,7 @@ static int sun8i_dwmac_unpower_internal_phy(struct sunxi_priv_data *gmac)
> > >
> > > clk_disable_unprepare(gmac->ephy_clk);
> > > reset_control_assert(gmac->rst_ephy);
> > > + reset_control_put(gmac->rst_ephy);
> > > return 0;
> > > }
> > >
> > > @@ -942,7 +943,7 @@ static int sun8i_dwmac_probe(struct platform_device *pdev)
> > > return -EINVAL;
> > > }
> > >
> > > - gmac->rst_ephy = of_reset_control_get(plat_dat->phy_node, NULL);
> > > + gmac->rst_ephy = of_reset_control_get_exclusive(plat_dat->phy_node, NULL);
> >
> > Why not just use devm_reset_control_get?
> >
>
> Because there no devm_ functions with of_
devm_reset_control_get uses of_reset_control_get internally.
Maxime
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
^ permalink raw reply
* [PATCH net-next] xen-netback: update ubuf_info initialization to anonymous union
From: Willem de Bruijn @ 2017-08-25 17:10 UTC (permalink / raw)
To: netdev; +Cc: davem, wei.liu2, paul.durrant, kbuild-all, Willem de Bruijn
From: Willem de Bruijn <willemb@google.com>
The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.
This breaks compilation of xen-netback with older compilers.
>From kbuild bot with gcc-4.4.7:
drivers/net//xen-netback/interface.c: In function
'xenvif_init_queue':
>> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
>> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
>> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
>> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer
Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.
Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
drivers/net/xen-netback/interface.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e322a862ddfe..ee8ed9da00ad 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -551,8 +551,8 @@ int xenvif_init_queue(struct xenvif_queue *queue)
for (i = 0; i < MAX_PENDING_REQS; i++) {
queue->pending_tx_info[i].callback_struct = (struct ubuf_info)
{ .callback = xenvif_zerocopy_callback,
- .ctx = NULL,
- .desc = i };
+ { { .ctx = NULL,
+ .desc = i } } };
queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
}
--
2.14.1.342.g6490525c54-goog
^ permalink raw reply related
* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Antoine Tenart @ 2017-08-25 17:09 UTC (permalink / raw)
To: Florian Fainelli
Cc: Antoine Tenart, davem, kishon, andrew, jason,
sebastian.hesselbarth, gregory.clement, thomas.petazzoni, nadavh,
linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <1faa09d8-305a-f6d2-facf-d28f336c849a@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 821 bytes --]
Hi Florian,
On Fri, Aug 25, 2017 at 09:49:15AM -0700, Florian Fainelli wrote:
> On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> > This patch adds an extra check when the link_event function is called,
> > so that it won't do anything when the netif isn't running.
>
> Why is this needed? Are you possibly starting the PHY state machine
> earlier than your ndo_open() call? Looking quickly through the driver
> does not suggest this is going on since you properly connect to the PHY
> in mvpp2_open() and start the PHY there.
I added some checks while working on this, and kept this one. But I
looked at the driver again and I assume you're right and the patch could
be dropped.
Thanks!
Antoine
--
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Florian Fainelli @ 2017-08-25 16:49 UTC (permalink / raw)
To: Antoine Tenart, davem, kishon, andrew, jason,
sebastian.hesselbarth, gregory.clement
Cc: thomas.petazzoni, nadavh, linux, linux-kernel, mw, stefanc,
miquel.raynal, netdev
In-Reply-To: <20170825144821.31129-9-antoine.tenart@free-electrons.com>
On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> This patch adds an extra check when the link_event function is called,
> so that it won't do anything when the netif isn't running.
Why is this needed? Are you possibly starting the PHY state machine
earlier than your ndo_open() call? Looking quickly through the driver
does not suggest this is going on since you properly connect to the PHY
in mvpp2_open() and start the PHY there.
>
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
> drivers/net/ethernet/marvell/mvpp2.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
> index 1b26f5ed994f..49a6789a4142 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5741,6 +5741,9 @@ static void mvpp2_link_event(struct net_device *dev)
> struct mvpp2_port *port = netdev_priv(dev);
> struct phy_device *phydev = dev->phydev;
>
> + if (!netif_running(dev))
> + return;
> +
> if (phydev->link) {
> if ((port->speed != phydev->speed) ||
> (port->duplex != phydev->duplex)) {
>
--
Florian
^ permalink raw reply
* Re: Possible race in nsc-ircc.ko
From: Jean Tourrilhes @ 2017-08-25 16:48 UTC (permalink / raw)
To: Anton Volkov
Cc: dagb, samuel, netdev, linux-kernel, ldv-project,
Alexey Khoroshilov
In-Reply-To: <4a33a281-f8bd-305b-e580-0e594feea799@ispras.ru>
On Fri, Aug 25, 2017 at 05:05:25PM +0300, Anton Volkov wrote:
> Hello.
>
> While searching for races in the Linux kernel I've come across
> "drivers/net/irda/nsc-ircc.ko" module. Here is a question that I came up
> with while analyzing results. Lines are given using the info from Linux
> v4.12.
>
> Consider the following case:
>
> Thread 1: Thread 2:
> nsc_ircc_init
> ->nsc_ircc_open
> self = netdev_priv(dev)
> register_netdev(dev)
> nsc_ircc_net_ioctl
> ->nsc_ircc_change_speed
> self->dongle_id = ... <READ self->io.dongle_id>
> (nsc-ircc.c: line 485) (nsc-ircc.c: line 1318)
> platform_device_register_simple
>
> Before the initialization of self->dongle_id in msc_ircc_open() its value is
> 0. Thus if read access to its value in nsc_ircc_change_speed occurs before
> the initialization there will be an attempt to change speed of dongle with
> undesired id (if the dongle with id 0 exists). Is this case feasible from
> your point of view?
>
> Thank you for your time.
>
> -- Anton Volkov
A first glance, that seems like a valid race. I'm not sure if
there is a netdev lock/status to protect the driver, because it looks
like doing any operation on a device before "open" has completed would
be dangerous for most drivers.
I don't have time to check the code paths, as I have not
looked at that code in ages.
Good luck !
Jean
^ permalink raw reply
* Re: [PATCH net-next v2] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25 16:47 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, Jeff Kirsher,
moderated list:INTEL ETHERNET DRIVERS, open list
In-Reply-To: <20170825035824.26935-1-f.fainelli@gmail.com>
On 08/24/2017 08:58 PM, Florian Fainelli wrote:
> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb_any() to be drop monitor friendly.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 327dfe5bedc0..a90e459c5b8a 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
> buffer_info->dma = 0;
> }
> if (buffer_info->skb) {
> - dev_kfree_skb_any(buffer_info->skb);
> + dev_consume_skb_any(buffer_info->skb);
> buffer_info->skb = NULL;
> }
> buffer_info->time_stamp = 0;
> @@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct *work)
> wmb(); /* force write prior to skb_tstamp_tx */
>
> skb_tstamp_tx(skb, &shhwtstamps);
> - dev_kfree_skb_any(skb);
> + dev_consume_skb_any(skb);
> } else if (time_after(jiffies, adapter->tx_hwtstamp_start
> + adapter->tx_timeout_factor * HZ)) {
> dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
> @@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
> }
>
> if (buffer_info->skb) {
> - dev_kfree_skb(buffer_info->skb);
> + dev_consume_skb_any(buffer_info->skb);
This one is not actually needed since dev_kfree_skb() is an alias for
consume_skb().
> buffer_info->skb = NULL;
> }
>
> @@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
>
> /* there also may be some cached data from a chained receive */
> if (rx_ring->rx_skb_top) {
> - dev_kfree_skb(rx_ring->rx_skb_top);
> + dev_consume_skb_any(rx_ring->rx_skb_top);
Same here.
I will make a repost of this patch later, Jeff please drop it if you
have it queued already, thanks!
--
Florian
^ permalink raw reply
* Re: [PATCH] net: stmmac: Handle possible fixed-link with need_mdio_ids
From: Florian Fainelli @ 2017-08-25 16:45 UTC (permalink / raw)
To: Andrew Lunn, Corentin Labbe
Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <20170825162827.GA4207@lunn.ch>
On 08/25/2017 09:28 AM, Andrew Lunn wrote:
> On Fri, Aug 25, 2017 at 04:42:08PM +0200, Corentin Labbe wrote:
>> In case of fixed link, there are no mdio node.
>> This patch add a test for fixed-link for bypassing MDIO node register.
>
> The two are not mutually exclusive. E.g.
> vf610-zii-dev.dtsi/vf610-zii-dev-rev-b.dts. It has a fixed-link on
> the FEC ethernet controller, and an Ethernet switch on the MDIO bus.
>
> If anybody ever wants to use a switch with the stmmac, this will be
> required.
This is already done in the Lamobo R1 DTS file so it would be nice not
to break this use case:
&gmac {
pinctrl-names = "default";
pinctrl-0 = <&gmac_pins_rgmii_a>;
phy-mode = "rgmii";
phy-supply = <®_gmac_3v3>;
status = "okay";
fixed-link {
speed = <1000>;
full-duplex;
};
mdio {
compatible = "snps,dwmac-mdio";
#address-cells = <1>;
#size-cells = <0>;
switch: ethernet-switch@1e {
compatible = "brcm,bcm53125";
reg = <30>;
#address-cells = <1>;
#size-cells = <0>;
>
> Andrew
>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25 16:44 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, andrew, Sebastian Hesselbarth, open list
In-Reply-To: <1503635699.2499.100.camel@edumazet-glaptop3.roam.corp.google.com>
On 08/24/2017 09:34 PM, Eric Dumazet wrote:
> On Thu, 2017-08-24 at 20:55 -0700, Florian Fainelli wrote:
>> txq_reclaim() does the normal transmit queue reclamation and
>> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
>> so use dev_consume_skb() for both locations.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> index fb2d533ae4ef..81c1fac00d33 100644
>> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
>> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> @@ -1121,7 +1121,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
>> struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
>>
>> if (!WARN_ON(!skb))
>> - dev_kfree_skb(skb);
>> + dev_consume_skb_any(skb);
>> }
>>
>> if (cmd_sts & ERROR_SUMMARY) {
>> @@ -2024,7 +2024,7 @@ static void rxq_deinit(struct rx_queue *rxq)
>>
>> for (i = 0; i < rxq->rx_ring_size; i++) {
>> if (rxq->rx_skb[i]) {
>> - dev_kfree_skb(rxq->rx_skb[i]);
>> + dev_consume_skb_any(rxq->rx_skb[i]);
>> rxq->rx_desc_count--;
>> }
>> }
>
>
> I do not believe this patch is needed.
>
> dev_kfree_skb() is an alias of consume_skb(), which is already drop
> monitor ready ;)
You are right, this patch is/was not needed, I have been too trigger
happy with the dev_kfree_skb*() replacement, but only
dev_kfree_skb_{any,irq} are valid candidates for replacement by
dev_consume_skb_{any,irq}. I will re-audit my other two submissions to
e1000e and r8169. Thanks Eric.
--
Florian
^ permalink raw reply
* Re: [PATCH] net: stmmac: Handle possible fixed-link with need_mdio_ids
From: Andrew Lunn @ 2017-08-25 16:28 UTC (permalink / raw)
To: Corentin Labbe; +Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <20170825144208.24503-1-clabbe.montjoie@gmail.com>
On Fri, Aug 25, 2017 at 04:42:08PM +0200, Corentin Labbe wrote:
> In case of fixed link, there are no mdio node.
> This patch add a test for fixed-link for bypassing MDIO node register.
The two are not mutually exclusive. E.g.
vf610-zii-dev.dtsi/vf610-zii-dev-rev-b.dts. It has a fixed-link on
the FEC ethernet controller, and an Ethernet switch on the MDIO bus.
If anybody ever wants to use a switch with the stmmac, this will be
required.
Andrew
^ permalink raw reply
* Re: [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
From: Jiri Benc @ 2017-08-25 16:25 UTC (permalink / raw)
To: Yi Yang; +Cc: netdev, dev, e, blp, jan.scheurich
In-Reply-To: <1503670805-31051-3-git-send-email-yi.y.yang@intel.com>
On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -766,7 +766,7 @@ struct sk_buff {
> __u8 ndisc_nodetype:2;
> #endif
> __u8 ipvs_property:1;
> - __u8 inner_protocol_type:1;
> + __u8 inner_protocol_type:2;
Adding anything to sk_buff is pretty much forbidden. You can't add more
bytes to it and there are no more free bits to use, either.
Luckily, we still have one byte hole next to inner_ipproto that we can
use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3
and storing the L3 type in the unused byte. It's not beautiful (would
be better to use ethertype than a custom enum) but it will work.
While looking at this, I realized that GSO for VXLAN-GPE is broken,
too. Let me fix it by implementing what I described above which will
make your patch much easier.
Jiri
^ permalink raw reply
* RE: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more
From: Keller, Jacob E @ 2017-08-25 16:24 UTC (permalink / raw)
To: Stephen Hemminger, Waskiewicz Jr, Peter; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170825085816.3425a70c@xeon-e3>
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, August 25, 2017 8:58 AM
> To: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
>
> xmit_more is only a hint to the device. The device driver should ignore it unless
> there are hardware advantages. The device driver is the place with HW specific
> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this
> device).
>
> Anything that pushes that optimization out to the user is only useful for
> benchmarks
> and embedded devices.
Right so most drivers I've seen simply take it as a "avoid bumping tail of a ring" whenever they see xmit_more. But unfortunately in some circumstances, this results in potentially several hundred packets being set with xmit_more in a row, and then the driver doesn't bump the tail for a long time, resulting in high latency spikes..
I was trying to find a way to fix this potentially in multiple drivers, rather than just a single driver, since I figured the same sort of code might need to be needed.
So you're suggesting we should just perform some check in the device driver, even if it might be duplication?
We could also instead make it a setting in the netdev struct or something which would be set by the driver and then tell stack code to limit how many it sends at once (so that we don't need to duplicate that checking code in every driver?)
Thanks,
Jake
^ permalink raw reply
* [PATCH net-next 3/3] samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
samples/bpf/tcbpf2_kern.c | 63 +++++++++++++++++++++++++++++++++++++++++-
samples/bpf/test_tunnel_bpf.sh | 29 +++++++++++++++++++
2 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
index 270edcc149a1..370b749f5ee6 100644
--- a/samples/bpf/tcbpf2_kern.c
+++ b/samples/bpf/tcbpf2_kern.c
@@ -17,6 +17,7 @@
#include <uapi/linux/pkt_cls.h>
#include <net/ipv6.h>
#include "bpf_helpers.h"
+#include "bpf_endian.h"
#define _htonl __builtin_bswap32
#define ERROR(ret) do {\
@@ -38,6 +39,10 @@ struct vxlan_metadata {
u32 gbp;
};
+struct erspan_metadata {
+ __be32 index;
+};
+
SEC("gre_set_tunnel")
int _gre_set_tunnel(struct __sk_buff *skb)
{
@@ -76,6 +81,63 @@ int _gre_get_tunnel(struct __sk_buff *skb)
return TC_ACT_OK;
}
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+ struct bpf_tunnel_key key;
+ struct erspan_metadata md;
+ int ret;
+
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+ key.tunnel_id = 2;
+ key.tunnel_tos = 0;
+ key.tunnel_ttl = 64;
+
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ md.index = htonl(123);
+ ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+ char fmt[] = "key %d remote ip 0x%x erspan index 0x%x\n";
+ struct bpf_tunnel_key key;
+ struct erspan_metadata md;
+ u32 index;
+ int ret;
+
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ index = bpf_ntohl(md.index);
+ bpf_trace_printk(fmt, sizeof(fmt),
+ key.tunnel_id, key.remote_ipv4, index);
+
+ return TC_ACT_OK;
+}
+
SEC("vxlan_set_tunnel")
int _vxlan_set_tunnel(struct __sk_buff *skb)
{
@@ -378,5 +440,4 @@ int _ip6ip6_get_tunnel(struct __sk_buff *skb)
return TC_ACT_OK;
}
-
char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
index a70d2ea90313..410052d9fc37 100755
--- a/samples/bpf/test_tunnel_bpf.sh
+++ b/samples/bpf/test_tunnel_bpf.sh
@@ -32,6 +32,19 @@ function add_gre_tunnel {
ip addr add dev $DEV 10.1.1.200/24
}
+function add_erspan_tunnel {
+ # in namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 local 172.16.1.100 remote 172.16.1.200 erspan 123
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+ # out of namespace
+ ip link add dev $DEV type $TYPE external
+ ip link set dev $DEV up
+ ip addr add dev $DEV 10.1.1.200/24
+}
+
function add_vxlan_tunnel {
# Set static ARP entry here because iptables set-mark works
# on L3 packet, as a result not applying to ARP packets,
@@ -99,6 +112,18 @@ function test_gre {
cleanup
}
+function test_erspan {
+ TYPE=erspan
+ DEV_NS=erspan00
+ DEV=erspan11
+ config_device
+ add_erspan_tunnel
+ attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+ ping -c 1 10.1.1.100
+ ip netns exec at_ns0 ping -c 1 10.1.1.200
+ cleanup
+}
+
function test_vxlan {
TYPE=vxlan
DEV_NS=vxlan00
@@ -151,14 +176,18 @@ function cleanup {
ip link del gretap11
ip link del vxlan11
ip link del geneve11
+ ip link del erspan11
pkill tcpdump
pkill cat
set -ex
}
+trap cleanup 0 2 3 6 9
cleanup
echo "Testing GRE tunnel..."
test_gre
+echo "Testing ERSPAN tunnel..."
+test_erspan
echo "Testing VXLAN tunnel..."
test_vxlan
echo "Testing GENEVE tunnel..."
--
2.7.4
^ permalink raw reply related
* [PATCH net-next 2/3] gre: add collect_md mode to ERSPAN tunnel
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away. OVS can use it as well in the future.
Signed-off-by: William Tu <u9012063@gmail.com>
---
include/net/ip_tunnels.h | 4 +-
net/ipv4/ip_gre.c | 102 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 101 insertions(+), 5 deletions(-)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 625c29329372..992652856fe8 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -154,8 +154,10 @@ struct ip_tunnel {
#define TUNNEL_GENEVE_OPT __cpu_to_be16(0x0800)
#define TUNNEL_VXLAN_OPT __cpu_to_be16(0x1000)
#define TUNNEL_NOCACHE __cpu_to_be16(0x2000)
+#define TUNNEL_ERSPAN_OPT __cpu_to_be16(0x4000)
-#define TUNNEL_OPTIONS_PRESENT (TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT)
+#define TUNNEL_OPTIONS_PRESENT \
+ (TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
struct tnl_ptk_info {
__be16 flags;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 453b7925b940..0162fb955b33 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -113,6 +113,8 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
static struct rtnl_link_ops ipgre_link_ops __read_mostly;
static int ipgre_tunnel_init(struct net_device *dev);
+static void erspan_build_header(struct sk_buff *skb,
+ __be32 id, u32 index, bool truncate);
static unsigned int ipgre_net_id __read_mostly;
static unsigned int gre_tap_net_id __read_mostly;
@@ -287,7 +289,33 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
false, false) < 0)
goto drop;
- tunnel->index = ntohl(index);
+ if (tunnel->collect_md) {
+ struct ip_tunnel_info *info;
+ struct erspan_metadata *md;
+ __be64 tun_id;
+ __be16 flags;
+
+ tpi->flags |= TUNNEL_KEY;
+ flags = tpi->flags;
+ tun_id = key32_to_tunnel_id(tpi->key);
+
+ tun_dst = ip_tun_rx_dst(skb, flags,
+ tun_id, sizeof(*md));
+ if (!tun_dst)
+ return PACKET_REJECT;
+
+ md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
+ if (!md)
+ return PACKET_REJECT;
+
+ md->index = index;
+ info = &tun_dst->u.tun_info;
+ info->key.tun_flags |= TUNNEL_ERSPAN_OPT;
+ info->options_len = sizeof(*md);
+ } else {
+ tunnel->index = ntohl(index);
+ }
+
skb_reset_mac_header(skb);
ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
return PACKET_RCVD;
@@ -523,6 +551,64 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
dev->stats.tx_dropped++;
}
+static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+ __be16 proto)
+{
+ struct ip_tunnel *tunnel = netdev_priv(dev);
+ struct ip_tunnel_info *tun_info;
+ const struct ip_tunnel_key *key;
+ struct erspan_metadata *md;
+ struct rtable *rt = NULL;
+ bool truncate = false;
+ struct flowi4 fl;
+ int tunnel_hlen;
+ __be16 df;
+
+ tun_info = skb_tunnel_info(skb);
+ if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+ ip_tunnel_info_af(tun_info) != AF_INET))
+ goto err_free_skb;
+
+ key = &tun_info->key;
+
+ /* ERSPAN has fixed 8 byte GRE header */
+ tunnel_hlen = 8 + sizeof(struct erspanhdr);
+
+ rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+ if (!rt)
+ return;
+
+ if (gre_handle_offloads(skb, false))
+ goto err_free_rt;
+
+ if (skb->len > dev->mtu) {
+ pskb_trim(skb, dev->mtu);
+ truncate = true;
+ }
+
+ md = ip_tunnel_info_opts(tun_info);
+ if (!md)
+ goto err_free_rt;
+
+ erspan_build_header(skb, tunnel_id_to_key32(key->tun_id),
+ ntohl(md->index), truncate);
+
+ gre_build_header(skb, 8, TUNNEL_SEQ,
+ htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++));
+
+ df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+
+ iptunnel_xmit(skb->sk, rt, skb, fl.saddr, key->u.ipv4.dst, IPPROTO_GRE,
+ key->tos, key->ttl, df, false);
+ return;
+
+err_free_rt:
+ ip_rt_put(rt);
+err_free_skb:
+ kfree_skb(skb);
+ dev->stats.tx_dropped++;
+}
+
static int gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
{
struct ip_tunnel_info *info = skb_tunnel_info(skb);
@@ -636,6 +722,11 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
struct ip_tunnel *tunnel = netdev_priv(dev);
bool truncate = false;
+ if (tunnel->collect_md) {
+ erspan_fb_xmit(skb, dev, skb->protocol);
+ return NETDEV_TX_OK;
+ }
+
if (gre_handle_offloads(skb, false))
goto free_skb;
@@ -998,9 +1089,12 @@ static int erspan_validate(struct nlattr *tb[], struct nlattr *data[],
return ret;
/* ERSPAN should only have GRE sequence and key flag */
- flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
- flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
- if (flags != (GRE_SEQ | GRE_KEY))
+ if (data[IFLA_GRE_OFLAGS])
+ flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
+ if (data[IFLA_GRE_IFLAGS])
+ flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
+ if (!data[IFLA_GRE_COLLECT_METADATA] &&
+ flags != (GRE_SEQ | GRE_KEY))
return -EINVAL;
/* ERSPAN Session ID only has 10-bit. Since we reuse
--
2.7.4
^ permalink raw reply related
* [PATCH net-next 1/3] gre: refactor the gre_fb_xmit
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>
The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.
Signed-off-by: William Tu <u9012063@gmail.com>
---
net/ipv4/ip_gre.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 40 insertions(+), 15 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f70674799fdd..453b7925b940 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -432,39 +432,33 @@ static struct rtable *gre_get_rt(struct sk_buff *skb,
return ip_route_output_key(net, fl);
}
-static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
- __be16 proto)
+static struct rtable *prepare_fb_xmit(struct sk_buff *skb,
+ struct net_device *dev,
+ struct flowi4 *fl,
+ int tunnel_hlen)
{
struct ip_tunnel_info *tun_info;
const struct ip_tunnel_key *key;
struct rtable *rt = NULL;
- struct flowi4 fl;
int min_headroom;
- int tunnel_hlen;
- __be16 df, flags;
bool use_cache;
int err;
tun_info = skb_tunnel_info(skb);
- if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
- ip_tunnel_info_af(tun_info) != AF_INET))
- goto err_free_skb;
-
key = &tun_info->key;
use_cache = ip_tunnel_dst_cache_usable(skb, tun_info);
+
if (use_cache)
- rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl.saddr);
+ rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl->saddr);
if (!rt) {
- rt = gre_get_rt(skb, dev, &fl, key);
+ rt = gre_get_rt(skb, dev, fl, key);
if (IS_ERR(rt))
- goto err_free_skb;
+ goto err_free_skb;
if (use_cache)
dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst,
- fl.saddr);
+ fl->saddr);
}
- tunnel_hlen = gre_calc_hlen(key->tun_flags);
-
min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
+ tunnel_hlen + sizeof(struct iphdr);
if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
@@ -476,6 +470,37 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
if (unlikely(err))
goto err_free_rt;
}
+ return rt;
+
+err_free_rt:
+ ip_rt_put(rt);
+err_free_skb:
+ kfree_skb(skb);
+ dev->stats.tx_dropped++;
+ return NULL;
+}
+
+static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+ __be16 proto)
+{
+ struct ip_tunnel_info *tun_info;
+ const struct ip_tunnel_key *key;
+ struct rtable *rt = NULL;
+ struct flowi4 fl;
+ int tunnel_hlen;
+ __be16 df, flags;
+
+ tun_info = skb_tunnel_info(skb);
+ if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+ ip_tunnel_info_af(tun_info) != AF_INET))
+ goto err_free_skb;
+
+ key = &tun_info->key;
+ tunnel_hlen = gre_calc_hlen(key->tun_flags);
+
+ rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+ if (!rt)
+ return;
/* Push Tunnel header. */
if (gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM)))
--
2.7.4
^ permalink raw reply related
* [PATCH net-next 0/3] gre: add collect_md mode for ERSPAN tunnel
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
To: netdev
This patch series provide collect_md mode for ERSPAN tunnel. The fist patch
refactors the existing gre_fb_xmit function by exacting the route cache
portion into a new function called prepare_fb_xmit. The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic. The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.
Thank you
William Tu (3):
gre: refactor the gre_fb_xmit
gre: add collect_md mode to ERSPAN tunnel
samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
include/net/ip_tunnels.h | 4 +-
net/ipv4/ip_gre.c | 157 ++++++++++++++++++++++++++++++++++++-----
samples/bpf/tcbpf2_kern.c | 63 ++++++++++++++++-
samples/bpf/test_tunnel_bpf.sh | 29 ++++++++
4 files changed, 232 insertions(+), 21 deletions(-)
--
2.7.4
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox