Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert
In-Reply-To: <20170823075831.27031-1-jkbs@redhat.com>

When forwarding or sending out an ICMPv6 error, look at the embedded
packet that triggered the error and compute a flow hash over its
headers.

This let's us route the ICMP error together with the flow it belongs to
when multipath (ECMP) routing is in use, which in turn makes Path MTU
Discovery work in ECMP load-balanced or anycast setups (RFC 7690).

Granted, end-hosts behind the ECMP router (aka servers) need to reflect
the IPv6 Flow Label for PMTUD to work.

The code is organized to be in parallel with ipv4 stack:

  ip_multipath_l3_keys -> ip6_multipath_l3_keys
  fib_multipath_hash   -> rt6_multipath_hash

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 include/net/ip6_route.h |  1 +
 net/ipv6/icmp.c         |  1 +
 net/ipv6/route.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 907d39a..882bc3c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -115,6 +115,7 @@ static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
 
 struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
 			    const struct in6_addr *saddr, int oif, int flags);
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb);
 
 struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 *fl6);
 
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4f82830..dd7608c 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -519,6 +519,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	fl6.fl6_icmp_type = type;
 	fl6.fl6_icmp_code = code;
 	fl6.flowi6_uid = sock_net_uid(net, NULL);
+	fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
 
 	sk = icmpv6_xmit_lock(net);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9b02064..6c4dd57 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1214,6 +1214,54 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
 }
 EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
 
+static void ip6_multipath_l3_keys(const struct sk_buff *skb,
+				  struct flow_keys *keys)
+{
+	const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
+	const struct ipv6hdr *key_iph = outer_iph;
+	const struct ipv6hdr *inner_iph;
+	const struct icmp6hdr *icmph;
+	struct ipv6hdr _inner_iph;
+
+	if (likely(outer_iph->nexthdr != IPPROTO_ICMPV6))
+		goto out;
+
+	icmph = icmp6_hdr(skb);
+	if (icmph->icmp6_type != ICMPV6_DEST_UNREACH &&
+	    icmph->icmp6_type != ICMPV6_PKT_TOOBIG &&
+	    icmph->icmp6_type != ICMPV6_TIME_EXCEED &&
+	    icmph->icmp6_type != ICMPV6_PARAMPROB)
+		goto out;
+
+	inner_iph = skb_header_pointer(skb,
+				       skb_transport_offset(skb) + sizeof(*icmph),
+				       sizeof(_inner_iph), &_inner_iph);
+	if (!inner_iph)
+		goto out;
+
+	key_iph = inner_iph;
+out:
+	memset(keys, 0, sizeof(*keys));
+	keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+	keys->addrs.v6addrs.src = key_iph->saddr;
+	keys->addrs.v6addrs.dst = key_iph->daddr;
+	keys->tags.flow_label = ip6_flowinfo(key_iph);
+	keys->basic.ip_proto = key_iph->nexthdr;
+}
+
+/* if skb is set it will be used and fl6 can be NULL */
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb)
+{
+	struct flow_keys hash_keys;
+
+	if (skb) {
+		ip6_multipath_l3_keys(skb, &hash_keys);
+		return flow_hash_from_keys(&hash_keys);
+	}
+
+	return get_hash_from_flowi6(fl6);
+}
+
 void ip6_route_input(struct sk_buff *skb)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1232,6 +1280,8 @@ void ip6_route_input(struct sk_buff *skb)
 	tun_info = skb_tunnel_info(skb);
 	if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
 		fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
+	if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
+		fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
 	skb_dst_drop(skb);
 	skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
-- 
2.9.4

^ permalink raw reply related

* [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert
In-Reply-To: <20170823075831.27031-1-jkbs@redhat.com>

Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.

This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 include/net/flow.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index f3dc61b..eb60cee3 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -149,6 +149,7 @@ struct flowi6 {
 #define fl6_ipsec_spi		uli.spi
 #define fl6_mh_type		uli.mht.type
 #define fl6_gre_key		uli.gre_key
+	__u32			mp_hash;
 } __attribute__((__aligned__(BITS_PER_LONG/8)));

 struct flowidn {
-- 
2.9.4

^ permalink raw reply related

* [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

This patch set is another take at making Path MTU Discovery work when
server nodes are behind a router employing multipath routing in a
load-balance or anycast setup (that is, when not every end-node can be
reached by every path). The problem has been well described in RFC 7690
[1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
to be routed back to the server node that sent a reply that exceeds path
MTU.

The proposed solution is two-fold:

 (1) on the server side - reflect the Flow Label [2]. This can be done
     without modifying the application using a new per-netns sysctl knob
     that has been proposed independently of this patchset in the patch
     entitled "ipv6: Add sysctl for per namespace flow label
     reflection" [3].

 (2) on the ECMP router - make the ipv6 routing subsystem look into the
     ICMPv6 error packets and compute the flow-hash from its payload,
     i.e. the offending packet that triggered the error. This is the
     same behavior as ipv4 stack has already.

With both parts in place Path MTU Discovery can work past the ECMP
router when using IPv6.

[1] https://tools.ietf.org/html/rfc7690
[2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
[3] http://patchwork.ozlabs.org/patch/804870/

v1 -> v2:
 - don't use "extern" in external function declaration in header file
 - style change, put as many arguments as possible on the first line of
   a function call, and align consecutive lines to the first argument
 - expand the cover letter based on the feedback

v2 -> v3:
 - switch to computing flow-hash using flow dissector to align with
   recent changes to multipath routing in ipv4 stack
 - add a sysctl knob for enabling flow label reflection per netns

---

Testing has covered multipath routing of ICMPv6 PTB errors in forward
and local output path in a simple use-case of an HTTP server sending a
reply which is over the path MTU size [3]. I have also checked if the
flows get evenly spread over multiple paths (i.e. if there are no
regressions) [4].

[3] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud
[4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance


Jakub Sitnicki (4):
  net: Extend struct flowi6 with multipath hash
  ipv6: Compute multipath hash for ICMP errors from offending packet
  ipv6: Fold rt6_info_hash_nhsfn() into its only caller
  ipv6: Use multipath hash from flow info if available

 include/net/flow.h      |  1 +
 include/net/ip6_route.h |  1 +
 net/ipv6/icmp.c         |  1 +
 net/ipv6/route.c        | 68 +++++++++++++++++++++++++++++++++++++++++--------
 4 files changed, 60 insertions(+), 11 deletions(-)

-- 
2.9.4

^ permalink raw reply

* [PATCH net-next] ipv6: Add sysctl for per namespace flow label reflection
From: Jakub Sitnicki @ 2017-08-23  7:55 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As "IPv6
Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
messages generated in response to a downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.

Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Same could be earlier achieved only per
socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.

[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 Documentation/networking/ip-sysctl.txt | 9 +++++++++
 include/net/netns/ipv6.h               | 1 +
 net/ipv6/af_inet6.c                    | 1 +
 net/ipv6/sysctl_net_ipv6.c             | 8 ++++++++
 4 files changed, 19 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 84c9b8c..6b0bc0f 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1350,6 +1350,15 @@ flowlabel_state_ranges - BOOLEAN
 	FALSE: disabled
 	Default: true
 
+flowlabel_reflect - BOOLEAN
+	Automatically reflect the flow label. Needed for Path MTU
+	Discovery to work with Equal Cost Multipath Routing in anycast
+	environments. See RFC 7690 and:
+	https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
+	TRUE: enabled
+	FALSE: disabled
+	Default: FALSE
+
 anycast_src_echo_reply - BOOLEAN
 	Controls the use of anycast addresses as source addresses for ICMPv6
 	echo reply
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 0e50bf3..2544f97 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -36,6 +36,7 @@ struct netns_sysctl_ipv6 {
 	int idgen_retries;
 	int idgen_delay;
 	int flowlabel_state_ranges;
+	int flowlabel_reflect;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b58ee7..fe5262f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -211,6 +211,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	np->mc_loop	= 1;
 	np->pmtudisc	= IPV6_PMTUDISC_WANT;
 	np->autoflowlabel = ip6_default_np_autolabel(net);
+	np->repflow	= net->ipv6.sysctl.flowlabel_reflect;
 	sk->sk_ipv6only	= net->ipv6.sysctl.bindv6only;
 
 	/* Init the ipv4 part of the socket since we can have sockets
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 69c50e7..6fbf8ae 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -90,6 +90,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "flowlabel_reflect",
+		.data		= &init_net.ipv6.sysctl.flowlabel_reflect,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
 
@@ -149,6 +156,7 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	ipv6_table[6].data = &net->ipv6.sysctl.idgen_delay;
 	ipv6_table[7].data = &net->ipv6.sysctl.flowlabel_state_ranges;
 	ipv6_table[8].data = &net->ipv6.sysctl.ip_nonlocal_bind;
+	ipv6_table[9].data = &net->ipv6.sysctl.flowlabel_reflect;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
-- 
2.9.4

^ permalink raw reply related

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Maxime Ripard @ 2017-08-23  7:49 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Corentin Labbe, Chen-Yu Tsai, Andrew Lunn, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <352ae66b-78a4-e88b-4544-a8edd9390b0c@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]

Hi Florian,

On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
> >>> So I think what you are saying is either impossible or engineering-wise
> >>> a very stupid design, like using an external MAC with a discrete PHY
> >>> connected to the internal MAC's MDIO bus, while using the internal MAC
> >>> with the internal PHY.
> >>>
> >>> Now can we please decide on something? We're a week and a half from
> >>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> >>> nodes (internal-mdio & external-mdio).
> >>
> >> I really don't see a need for a mdio-mux in the first place, just have
> >> one MDIO controller (current state) sub-node which describes the
> >> built-in STMMAC MDIO controller and declare the internal PHY as a child
> >> node (along with 'phy-is-integrated'). If a different configuration is
> >> used, then just put the external PHY as a child node there.
> >>
> >> If fixed-link is required, the mdio node becomes unused anyway.
> >>
> >> Works for everyone?
> > 
> > If we put an external PHY with reg=1 as a child of internal MDIO,
> > il will be merged with internal PHY node and get
> > phy-is-integrated.
> 
> Then have the .dtsi file contain just the mdio node, but no internal or
> external PHY and push all the internal and external PHY node definition
> (in its entirety) to the per-board DTS file, does not that work?

If possible, I'd really like to have the internal PHY in the
DTSI. It's always there in hardware anyway, and duplicating the PHY,
with its clock, reset line, and whatever info we might need in the
future in each and every board DTS that uses it will be very error
prone and we will have the usual bunch of issues that come up with
duplication.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* RE: [PATCH v2 net-next 1/3] ipv6: Prevent unexpected sk->sk_prot changes
From: Ilya Lesokhin @ 2017-08-23  7:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	Aviad Yehezkel, Boris Pismenny
In-Reply-To: <1502808336.4936.78.camel@edumazet-glaptop3.roam.corp.google.com>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, August 15, 2017 5:46 PM
> To: Boris Pismenny <borisp@mellanox.com>
> Cc: Ilya Lesokhin <ilyal@mellanox.com>; netdev@vger.kernel.org;
> davem@davemloft.net; davejwatson@fb.com; Aviad Yehezkel
> <aviadye@mellanox.com>
> Subject: Re: [PATCH v2 net-next 1/3] ipv6: Prevent unexpected sk->sk_prot
> changes
> 
> I am just saying IPV6_ADDRFORM is becoming spaghetti code, and maybe
> this is time to make it modular. 
> 

Hi Eric,
I understand your concern, but I would like to postpone implementing this
feature until the ulp infrastructure stabilizes.
I don't even know how many users want to use IPV6_ADDRFORM on sockets with ulp.
I think that simply returning an error for ulp sockets in the meantime is a reasonable 
compromise.

what do you think about the patch below?

Subject: [PATCH 1/1] ipv6: Disable IPV6_ADDRFORM on sockets with ulp attached

The IPV6_ADDRFORM setsockopt works by overriding sk->sk_prot.
The current KTLS ulp also overrides sk->sk_prot.
Consequently, using IPV6_ADDRFORM when there is a ulp attached is
unsafe and this patch disables it.

Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 net/ipv6/ipv6_sockglue.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 02d795f..d935948 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -185,8 +185,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 					retv = -EBUSY;
 					break;
 				}
-			} else if (sk->sk_protocol != IPPROTO_TCP)
+			} else if (sk->sk_protocol == IPPROTO_TCP) {
+				if (inet_csk(sk)->icsk_ulp_ops)
+					break;
+			} else {
 				break;
+			}
 
 			if (sk->sk_state != TCP_ESTABLISHED) {
 				retv = -ENOTCONN;


Thanks,
Ilya

^ permalink raw reply related

* [PATCH net-next 3/3] net: mvpp2: software tso support
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The patch uses the tso API to implement the tso functionality in Marvell
PPv2 driver.

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 171 ++++++++++++++++++++++++++++++++---
 1 file changed, 157 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 0f5a2a4f9ddf..90103bce6c66 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -37,6 +37,7 @@
 #include <uapi/linux/ppp_defs.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
+#include <net/tso.h>
 
 /* RX Fifo Registers */
 #define MVPP2_RX_DATA_FIFO_SIZE_REG(port)	(0x00 + 4 * (port))
@@ -1033,6 +1034,10 @@ struct mvpp2_txq_pcpu {
 
 	/* Index of the TX DMA descriptor to be cleaned up */
 	int txq_get_index;
+
+	/* DMA buffer for TSO headers */
+	char *tso_headers;
+	dma_addr_t tso_headers_dma;
 };
 
 struct mvpp2_tx_queue {
@@ -5623,6 +5628,14 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
 		txq_pcpu->reserved_num = 0;
 		txq_pcpu->txq_put_index = 0;
 		txq_pcpu->txq_get_index = 0;
+
+		txq_pcpu->tso_headers =
+			dma_alloc_coherent(port->dev->dev.parent,
+					   MVPP2_AGGR_TXQ_SIZE * TSO_HEADER_SIZE,
+					   &txq_pcpu->tso_headers_dma,
+					   GFP_KERNEL);
+		if (!txq_pcpu->tso_headers)
+			goto cleanup;
 	}
 
 	return 0;
@@ -5630,6 +5643,11 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
 	for_each_present_cpu(cpu) {
 		txq_pcpu = per_cpu_ptr(txq->pcpu, cpu);
 		kfree(txq_pcpu->buffs);
+
+		dma_free_coherent(port->dev->dev.parent,
+				  MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
+				  txq_pcpu->tso_headers,
+				  txq_pcpu->tso_headers_dma);
 	}
 
 	dma_free_coherent(port->dev->dev.parent,
@@ -5649,6 +5667,11 @@ static void mvpp2_txq_deinit(struct mvpp2_port *port,
 	for_each_present_cpu(cpu) {
 		txq_pcpu = per_cpu_ptr(txq->pcpu, cpu);
 		kfree(txq_pcpu->buffs);
+
+		dma_free_coherent(port->dev->dev.parent,
+				  MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
+				  txq_pcpu->tso_headers,
+				  txq_pcpu->tso_headers_dma);
 	}
 
 	if (txq->descs)
@@ -6238,6 +6261,123 @@ static int mvpp2_tx_frag_process(struct mvpp2_port *port, struct sk_buff *skb,
 	return -ENOMEM;
 }
 
+static inline void mvpp2_tso_put_hdr(struct sk_buff *skb,
+				     struct net_device *dev,
+				     struct mvpp2_tx_queue *txq,
+				     struct mvpp2_tx_queue *aggr_txq,
+				     struct mvpp2_txq_pcpu *txq_pcpu,
+				     int hdr_sz)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct mvpp2_tx_desc *tx_desc = mvpp2_txq_next_desc_get(aggr_txq);
+	dma_addr_t addr;
+
+	mvpp2_txdesc_txq_set(port, tx_desc, txq->id);
+	mvpp2_txdesc_size_set(port, tx_desc, hdr_sz);
+
+	addr = txq_pcpu->tso_headers_dma +
+	       txq_pcpu->txq_put_index * TSO_HEADER_SIZE;
+	mvpp2_txdesc_offset_set(port, tx_desc, addr & MVPP2_TX_DESC_ALIGN);
+	mvpp2_txdesc_dma_addr_set(port, tx_desc, addr & ~MVPP2_TX_DESC_ALIGN);
+
+	mvpp2_txdesc_cmd_set(port, tx_desc, mvpp2_skb_tx_csum(port, skb) |
+					    MVPP2_TXD_F_DESC |
+					    MVPP2_TXD_PADDING_DISABLE);
+	mvpp2_txq_inc_put(port, txq_pcpu, NULL, tx_desc);
+}
+
+static inline int mvpp2_tso_put_data(struct sk_buff *skb,
+				     struct net_device *dev, struct tso_t *tso,
+				     struct mvpp2_tx_queue *txq,
+				     struct mvpp2_tx_queue *aggr_txq,
+				     struct mvpp2_txq_pcpu *txq_pcpu,
+				     int sz, bool left, bool last)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct mvpp2_tx_desc *tx_desc = mvpp2_txq_next_desc_get(aggr_txq);
+	dma_addr_t buf_dma_addr;
+
+	mvpp2_txdesc_txq_set(port, tx_desc, txq->id);
+	mvpp2_txdesc_size_set(port, tx_desc, sz);
+
+	buf_dma_addr = dma_map_single(dev->dev.parent, tso->data, sz,
+				      DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(dev->dev.parent, buf_dma_addr))) {
+		mvpp2_txq_desc_put(txq);
+		return -ENOMEM;
+	}
+
+	mvpp2_txdesc_offset_set(port, tx_desc,
+				buf_dma_addr & MVPP2_TX_DESC_ALIGN);
+	mvpp2_txdesc_dma_addr_set(port, tx_desc,
+				  buf_dma_addr & ~MVPP2_TX_DESC_ALIGN);
+
+	if (!left) {
+		mvpp2_txdesc_cmd_set(port, tx_desc, MVPP2_TXD_L_DESC);
+		if (last) {
+			mvpp2_txq_inc_put(port, txq_pcpu, skb, tx_desc);
+			return 0;
+		}
+	} else {
+		mvpp2_txdesc_cmd_set(port, tx_desc, 0);
+	}
+
+	mvpp2_txq_inc_put(port, txq_pcpu, NULL, tx_desc);
+	return 0;
+}
+
+static int mvpp2_tx_tso(struct sk_buff *skb, struct net_device *dev,
+			struct mvpp2_tx_queue *txq,
+			struct mvpp2_tx_queue *aggr_txq,
+			struct mvpp2_txq_pcpu *txq_pcpu)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct tso_t tso;
+	int hdr_sz = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	int i, len, descs = 0;
+
+	/* Check number of available descriptors */
+	if (mvpp2_aggr_desc_num_check(port->priv, aggr_txq,
+				      tso_count_descs(skb)) ||
+	    mvpp2_txq_reserved_desc_num_proc(port->priv, txq, txq_pcpu,
+					     tso_count_descs(skb)))
+		return 0;
+
+	tso_start(skb, &tso);
+	len = skb->len - hdr_sz;
+	while (len > 0) {
+		int left = min_t(int, skb_shinfo(skb)->gso_size, len);
+		char *hdr = txq_pcpu->tso_headers +
+			    txq_pcpu->txq_put_index * TSO_HEADER_SIZE;
+
+		len -= left;
+		descs++;
+
+		tso_build_hdr(skb, hdr, &tso, left, len == 0);
+		mvpp2_tso_put_hdr(skb, dev, txq, aggr_txq, txq_pcpu, hdr_sz);
+
+		while (left > 0) {
+			int sz = min_t(int, tso.size, left);
+			left -= sz;
+			descs++;
+
+			if (mvpp2_tso_put_data(skb, dev, &tso, txq, aggr_txq,
+					       txq_pcpu, sz, left, len == 0))
+				goto release;
+			tso_build_data(skb, &tso, sz);
+		}
+	}
+
+	return descs;
+
+release:
+	for (i = descs - 1; i >= 0; i--) {
+		struct mvpp2_tx_desc *tx_desc = txq->descs + i;
+		tx_desc_unmap_put(port, txq, tx_desc);
+	}
+	return 0;
+}
+
 /* Main tx processing */
 static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -6255,6 +6395,10 @@ static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 	txq_pcpu = this_cpu_ptr(txq->pcpu);
 	aggr_txq = &port->priv->aggr_txqs[smp_processor_id()];
 
+	if (skb_is_gso(skb)) {
+		frags = mvpp2_tx_tso(skb, dev, txq, aggr_txq, txq_pcpu);
+		goto out;
+	}
 	frags = skb_shinfo(skb)->nr_frags + 1;
 
 	/* Check number of available descriptors */
@@ -6304,22 +6448,21 @@ static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	txq_pcpu->reserved_num -= frags;
-	txq_pcpu->count += frags;
-	aggr_txq->count += frags;
-
-	/* Enable transmit */
-	wmb();
-	mvpp2_aggr_txq_pend_desc_add(port, frags);
-
-	if (txq_pcpu->size - txq_pcpu->count < MAX_SKB_FRAGS + 1) {
-		struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
-
-		netif_tx_stop_queue(nq);
-	}
 out:
 	if (frags > 0) {
 		struct mvpp2_pcpu_stats *stats = this_cpu_ptr(port->stats);
+		struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
+
+		txq_pcpu->reserved_num -= frags;
+		txq_pcpu->count += frags;
+		aggr_txq->count += frags;
+
+		/* Enable transmit */
+		wmb();
+		mvpp2_aggr_txq_pend_desc_add(port, frags);
+
+		if (txq_pcpu->size - txq_pcpu->count < MAX_SKB_FRAGS + 1)
+			netif_tx_stop_queue(nq);
 
 		u64_stats_update_begin(&stats->syncp);
 		stats->tx_packets++;
@@ -7496,7 +7639,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		}
 	}
 
-	features = NETIF_F_SG | NETIF_F_IP_CSUM;
+	features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 	dev->features = features | NETIF_F_RXCSUM;
 	dev->hw_features |= features | NETIF_F_RXCSUM | NETIF_F_GRO;
 	dev->vlan_features |= features;
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 0/3] net: mvpp2: software TSO support
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev

Hi all,

This series adds the s/w TSO support in the PPv2 driver, in addition to
two cosmetic commits. As stated in patch 3/3:

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).

Thanks!
Antoine

Antoine Tenart (3):
  net: define the TSO header size in net/tso.h
  net: mvpp2: unify the txq size define use
  net: mvpp2: software tso support

 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |   1 -
 drivers/net/ethernet/freescale/fec_main.c          |   1 -
 drivers/net/ethernet/marvell/mv643xx_eth.c         |   2 -
 drivers/net/ethernet/marvell/mvneta.c              |   3 -
 drivers/net/ethernet/marvell/mvpp2.c               | 182 ++++++++++++++++++---
 include/net/tso.h                                  |   2 +
 6 files changed, 164 insertions(+), 27 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net-next 1/3] net: define the TSO header size in net/tso.h
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The TSO header size was defined in many drivers. Factorize the code and
define its size in net/tso.h.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 1 -
 drivers/net/ethernet/freescale/fec_main.c          | 1 -
 drivers/net/ethernet/marvell/mv643xx_eth.c         | 2 --
 drivers/net/ethernet/marvell/mvneta.c              | 3 ---
 include/net/tso.h                                  | 2 ++
 5 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index 57858522c33c..67d1a3230773 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -277,7 +277,6 @@ struct snd_queue {
 	u16		xdp_free_cnt;
 	bool		is_xdp;
 
-#define	TSO_HEADER_SIZE	128
 	/* For TSO segment's header */
 	char		*tso_hdrs;
 	dma_addr_t	tso_hdrs_phys;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index df09b254553d..56f56d6ada9c 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -226,7 +226,6 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 
 #define COPYBREAK_DEFAULT	256
 
-#define TSO_HEADER_SIZE		128
 /* Max number of allowed TCP segments for software TSO */
 #define FEC_MAX_TSO_SEGS	100
 #define FEC_MAX_SKB_DESCS	(FEC_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 9c94ea9b2b80..fb2d533ae4ef 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -183,8 +183,6 @@ static char mv643xx_eth_driver_version[] = "1.4";
 #define DEFAULT_TX_QUEUE_SIZE	512
 #define SKB_DMA_REALIGN		((PAGE_SIZE - NET_SKB_PAD) % SMP_CACHE_BYTES)
 
-#define TSO_HEADER_SIZE		128
-
 /* Max number of allowed TCP segments for software TSO */
 #define MV643XX_MAX_TSO_SEGS 100
 #define MV643XX_MAX_SKB_DESCS (MV643XX_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0aab74c2a209..35ff1ecfcff0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -281,9 +281,6 @@
  */
 #define MVNETA_RSS_LU_TABLE_SIZE	1
 
-/* TSO header size */
-#define TSO_HEADER_SIZE 128
-
 /* Max number of Rx descriptors */
 #define MVNETA_MAX_RXD 128
 
diff --git a/include/net/tso.h b/include/net/tso.h
index b7be852bfe9d..9a56c39e6d0a 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -3,6 +3,8 @@
 
 #include <net/ip.h>
 
+#define TSO_HEADER_SIZE		128
+
 struct tso_t {
 	int next_frag_idx;
 	void *data;
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 2/3] net: mvpp2: unify the txq size define use
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The txq size is defined by MVPP2_AGGR_TXQ_SIZE, which is sometime not
used directly but through variables. As it is a fixed value use the
define everywhere in the driver.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 4a2ed5c66bb4..0f5a2a4f9ddf 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5413,15 +5413,14 @@ static unsigned int mvpp2_tx_done(struct mvpp2_port *port, u32 cause,
 
 /* Allocate and initialize descriptors for aggr TXQ */
 static int mvpp2_aggr_txq_init(struct platform_device *pdev,
-			       struct mvpp2_tx_queue *aggr_txq,
-			       int desc_num, int cpu,
+			       struct mvpp2_tx_queue *aggr_txq, int cpu,
 			       struct mvpp2 *priv)
 {
 	u32 txq_dma;
 
 	/* Allocate memory for TX descriptors */
 	aggr_txq->descs = dma_alloc_coherent(&pdev->dev,
-				desc_num * MVPP2_DESC_ALIGNED_SIZE,
+				MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
 				&aggr_txq->descs_dma, GFP_KERNEL);
 	if (!aggr_txq->descs)
 		return -ENOMEM;
@@ -5442,7 +5441,8 @@ static int mvpp2_aggr_txq_init(struct platform_device *pdev,
 			MVPP22_AGGR_TXQ_DESC_ADDR_OFFS;
 
 	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_ADDR_REG(cpu), txq_dma);
-	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_SIZE_REG(cpu), desc_num);
+	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_SIZE_REG(cpu),
+		    MVPP2_AGGR_TXQ_SIZE);
 
 	return 0;
 }
@@ -7691,8 +7691,7 @@ static int mvpp2_init(struct platform_device *pdev, struct mvpp2 *priv)
 	for_each_present_cpu(i) {
 		priv->aggr_txqs[i].id = i;
 		priv->aggr_txqs[i].size = MVPP2_AGGR_TXQ_SIZE;
-		err = mvpp2_aggr_txq_init(pdev, &priv->aggr_txqs[i],
-					  MVPP2_AGGR_TXQ_SIZE, i, priv);
+		err = mvpp2_aggr_txq_init(pdev, &priv->aggr_txqs[i], i, priv);
 		if (err < 0)
 			return err;
 	}
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
From: Mickaël Salaün @ 2017-08-23  7:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170823024452.zvizovwfd7xjucsx@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 6655 bytes --]


On 23/08/2017 04:44, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:25AM +0200, Mickaël Salaün wrote:
>> The goal of the program subtype is to be able to have different static
>> fine-grained verifications for a unique program type.
>>
>> The struct bpf_verifier_ops gets a new optional function:
>> is_valid_subtype(). This new verifier is called at the beginning of the
>> eBPF program verification to check if the (optional) program subtype is
>> valid.
>>
>> For now, only Landlock eBPF programs are using a program subtype (see
>> next commit) but this could be used by other program types in the future.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
>> ---
>>
>> Changes since v6:
>> * rename Landlock version to ABI to better reflect its purpose
>> * fix unsigned integer checks
>> * fix pointer cast
>> * constify pointers
>> * rebase
>>
>> Changes since v5:
>> * use a prog_subtype pointer and make it future-proof
>> * add subtype test
>> * constify bpf_load_program()'s subtype argument
>> * cleanup subtype initialization
>> * rebase
>>
>> Changes since v4:
>> * replace the "status" field with "version" (more generic)
>> * replace the "access" field with "ability" (less confusing)
>>
>> Changes since v3:
>> * remove the "origin" field
>> * add an "option" field
>> * cleanup comments
>> ---
>>  include/linux/bpf.h                         |  7 ++-
>>  include/linux/filter.h                      |  2 +
>>  include/uapi/linux/bpf.h                    | 11 +++++
>>  kernel/bpf/syscall.c                        | 22 ++++++++-
>>  kernel/bpf/verifier.c                       | 17 +++++--
>>  kernel/trace/bpf_trace.c                    | 15 ++++--
>>  net/core/filter.c                           | 71 ++++++++++++++++++-----------
>>  samples/bpf/bpf_load.c                      |  3 +-
>>  samples/bpf/cookie_uid_helper_example.c     |  2 +-
>>  samples/bpf/fds_example.c                   |  2 +-
>>  samples/bpf/sock_example.c                  |  3 +-
>>  samples/bpf/test_cgrp2_attach.c             |  2 +-
>>  samples/bpf/test_cgrp2_attach2.c            |  2 +-
>>  samples/bpf/test_cgrp2_sock.c               |  2 +-
>>  tools/include/uapi/linux/bpf.h              | 11 +++++
>>  tools/lib/bpf/bpf.c                         | 10 +++-
>>  tools/lib/bpf/bpf.h                         |  5 +-
>>  tools/lib/bpf/libbpf.c                      |  4 +-
>>  tools/perf/tests/bpf.c                      |  2 +-
>>  tools/testing/selftests/bpf/test_align.c    |  2 +-
>>  tools/testing/selftests/bpf/test_tag.c      |  2 +-
>>  tools/testing/selftests/bpf/test_verifier.c | 17 ++++++-
>>  22 files changed, 158 insertions(+), 56 deletions(-)
> 
> ...
> 
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 7015116331af..0c3fadbb5a58 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -464,6 +464,8 @@ struct bpf_prog {
>>  	u32			len;		/* Number of filter blocks */
>>  	u32			jited_len;	/* Size of jited insns in bytes */
>>  	u8			tag[BPF_TAG_SIZE];
>> +	u8			has_subtype;
>> +	union bpf_prog_subtype	subtype;	/* Fine-grained verifications */
> 
> these burn a hole in very performance sensitive structure.
> Also there are bits rigth above. use them instead of u8 has_subtype?
> or can these two fields be part of bpf_prog_aux ?

OK, I'll create one bit variable and a bpf_prog_subtype field in the
bpf_prog_aux struct then.


> 
>>  	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
>>  	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
>>  	unsigned int		(*bpf_func)(const void *ctx,
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 843818dff96d..8541ab85e432 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -177,6 +177,15 @@ enum bpf_attach_type {
>>  /* Specify numa node during map creation */
>>  #define BPF_F_NUMA_NODE		(1U << 2)
>>  
>> +union bpf_prog_subtype {
>> +	struct {
>> +		__u32		abi; /* minimal ABI version, cf. user doc */
> 
> the concept of abi (version) sounds a bit weird to me.
> Why bother with it at all?
> Once the first set of patches lands the kernel as whole will have landlock feature
> with a set of helpers, actions, event types.
> Some future patches will extend the landlock feature step by step.
> This abi concept assumes that anyone who adds new helper would need
> to keep incrementing this 'abi'. What value does it give to user or to kernel?
> The users will already know that landlock is present in kernel 4.14 or whatever
> and the kernel 4.18 has more landlock features. Why bother with extra abi number?

That's right for helpers and context fields, but we can't check the use
of one field's content. The status field is intended to be a bitfield
extendable in the future. For example, one use case is to set a flag to
inform the eBPF program that it was already called with the same context
and can skip most of its check (if not related to maps). Same goes for
the FS action bitfield, one may want to add more of them. Another
example may be the check for abilities. We may want to relax/remove the
capability require to set one of them. With an ABI version, the user can
easily check if the current kernel support that.

> 
>> +		__u32		event; /* enum landlock_subtype_event */
>> +		__aligned_u64	ability; /* LANDLOCK_SUBTYPE_ABILITY_* */
>> +		__aligned_u64	option; /* LANDLOCK_SUBTYPE_OPTION_* */
>> +	} landlock_rule;
>> +} __attribute__((aligned(8)));
>> +
>>  union bpf_attr {
>>  	struct { /* anonymous struct used by BPF_MAP_CREATE command */
>>  		__u32	map_type;	/* one of enum bpf_map_type */
>> @@ -212,6 +221,8 @@ union bpf_attr {
>>  		__aligned_u64	log_buf;	/* user supplied buffer */
>>  		__u32		kern_version;	/* checked when prog_type=kprobe */
>>  		__u32		prog_flags;
>> +		__aligned_u64	prog_subtype;	/* bpf_prog_subtype address */
>> +		__u32		prog_subtype_size;
>>  	};
> 
> more general question: what is the status of security/ bits?
> I'm assuming they still need to be reviewed and explicitly acked by James, right?

Right, the review process is ongoing. :)
BTW, I'll be at Linux Security Summit (co-located with Plumbers) next
month. We'll be able to clarify some points there too.

Regards,
 Mickaël


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 5/5] xdp: get tracepoints xdp_exception and xdp_redirect in sync
From: Jesper Dangaard Brouer @ 2017-08-23  7:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, John Fastabend, brouer
In-Reply-To: <599CA26D.7060003@iogearbox.net>

On Tue, 22 Aug 2017 23:30:21 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> > Remove the net_device string name from the xdp_exception tracepoint,
> > like the xdp_redirect tracepoint.
> >
> > Align the TP_STRUCT to have common entries between these two
> > tracepoint.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   include/trace/events/xdp.h |   24 ++++++++++++------------
> >   1 file changed, 12 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> > index 7511bed80558..6495b0d9d5c7 100644
> > --- a/include/trace/events/xdp.h
> > +++ b/include/trace/events/xdp.h
> > @@ -31,22 +31,22 @@ TRACE_EVENT(xdp_exception,
> >   	TP_ARGS(dev, xdp, act),
> >
> >   	TP_STRUCT__entry(
> > -		__string(name, dev->name)
> >   		__array(u8, prog_tag, 8)
> >   		__field(u32, act)
> > +		__field(int, ifindex)
> >   	),
> >
> >   	TP_fast_assign(
> >   		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
> >   		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
> > -		__assign_str(name, dev->name);
> > -		__entry->act = act;
> > +		__entry->act		= act;
> > +		__entry->ifindex	= dev->ifindex;
> >   	),
> >  
> [...]
> >   TRACE_EVENT(xdp_redirect,
> > @@ -57,26 +57,26 @@ TRACE_EVENT(xdp_redirect,
> >   	TP_ARGS(from_index, to_index, xdp, act, err),
> >
> >   	TP_STRUCT__entry(
> > -		__field(int, from_index)
> > -		__field(int, to_index)
> >   		__array(u8, prog_tag, 8)
> >   		__field(u32, act)
> > +		__field(int, from_index)
> > +		__field(int, to_index)
> >   		__field(int, err)
> >   	),
> >
> >   	TP_fast_assign(
> >   		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
> >   		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
> > +		__entry->act		= act;
> >   		__entry->from_index	= from_index;
> >   		__entry->to_index	= to_index;
> > -		__entry->act = act;  
> 
> If you already get them in sync, could you also make it consistent
> that for both tracepoints in TP_ARGS() we use either ifindex
> directly or device pointer and extract it from TP_fast_assign().
> Right now, it's mixed.

I did this because, in the (bpf_redirect)_map case, I was hoping that
"from_index" could be come the index from the map.  Looking closer at
the devmap design, I cannot see how we can ever know what (the bpf_prog
thinks) is the corresponding map-ingress/from_index.

Based on that, I think we can/should use the device pointer as arg (as
you suggested). And then rename "from_index" to "ifindex", where
ifindex is the XDP ifindex the xdp_buff arrived on.

I guess we can keep the "to_index".  We also need to extend the
tracepoint args with a "map", to let the trace user tell whether the
"to_index" is an ifindex or a map-index.

I'll code this up and submit at V2.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH V8 net-next 00/22] Huawei HiNIC Ethernet Driver
From: Arnd Bergmann @ 2017-08-23  7:37 UTC (permalink / raw)
  To: Aviad Krawczyk
  Cc: David Miller, Linux Kernel Mailing List, Networking, bc.y,
	victor.gissin, zhaochen6, tony.qu
In-Reply-To: <cover.1503330613.git.aviad.krawczyk@huawei.com>

On Mon, Aug 21, 2017 at 5:55 PM, Aviad Krawczyk
<aviad.krawczyk@huawei.com> wrote:
> The patch-set contains the support of the HiNIC Ethernet driver for
> hinic family of PCIE Network Interface Cards.
>
> The Huawei's PCIE HiNIC card is a new Ethernet card and hence there was
> a need of a new driver.
>
> The current driver is meant to be used for the Physical Function and there
> would soon be a support for Virtual Function and more features once the
> basic PF driver has been accepted.

Sorry I didn't comment before it got merged, but once it appeared in
linux-next I saw it and wondered why this is grouped under huawei
unlike the network drivers for the parts integrated into the hip0x
SoCs from hisilicon. Both appear to be made by Huawei's HiSilicon
subsidiary.

I did not check whether the two devices are related at all or could
share some of the source code as well, but it might be good to
move this one into the existing directory to spare confusion later.

         Arnd

^ permalink raw reply

* Stable apply request [was: Bluetooth: bnep: fix possible might sleep error in bnep_session]
From: Jiri Slaby @ 2017-08-23  7:29 UTC (permalink / raw)
  To: Marcel Holtmann, Jeffy Chen, stable
  Cc: LKML, open list:BLUETOOTH DRIVERS, rvaswani, Brian Norris,
	dmitry.torokhov, Douglas Anderson, acho, Johan Hedberg,
	Network Development, David S. Miller, Gustavo F. Padovan
In-Reply-To: <0815411E-62D7-42DC-A0BF-78FD150C0949@holtmann.org>

On 06/27/2017, 07:32 PM, Marcel Holtmann wrote:
>> It looks like bnep_session has same pattern as the issue reported in
>> old rfcomm:
>>
>> 	while (1) {
>> 		set_current_state(TASK_INTERRUPTIBLE);
>> 		if (condition)
>> 			break;
>> 		// may call might_sleep here
>> 		schedule();
>> 	}
>> 	__set_current_state(TASK_RUNNING);
>>
>> Which fixed at:
>> 	dfb2fae Bluetooth: Fix nested sleeps
>>
>> So let's fix it at the same way, also follow the suggestion of:
>> https://lwn.net/Articles/628628/

...

> all 3 patches have been applied to bluetooth-next tree.

Hi,

given users are hitting it in at least 4.4 and 4.12, can we have all
three in all stables where this applies?

5da8e47d849d Bluetooth: hidp: fix possible might sleep error in
hidp_session_thread
f06d977309d0 Bluetooth: cmtp: fix possible might sleep error in cmtp_session
25717382c1dd Bluetooth: bnep: fix possible might sleep error in bnep_session

I am not sure: to stable directly or via net stable?

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH net-next v2 10/10] arm64: dts: marvell: mcbin: enable more networking ports
From: Baruch Siach @ 2017-08-23  7:28 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, jason, andrew, gregory.clement, sebastian.hesselbarth,
	thomas.petazzoni, netdev, linux, nadavh, stefanc, mw,
	linux-arm-kernel
In-Reply-To: <20170822170830.32413-11-antoine.tenart@free-electrons.com>

Hi Antoine,

On Tue, Aug 22, 2017 at 07:08:30PM +0200, Antoine Tenart wrote:
> This patch enables the two GE/SFP ports. They are configured in 10GKR
> mode by default. To do this the cpm_xdmio is enabled as well, and two
> phy descriptions are added.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> Tested-by: Marcin Wojtas <mw@semihalf.com>
> ---
>  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 30 +++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> index abd39d1c1739..6cb4b000e1ac 100644
> --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> @@ -127,6 +127,30 @@
>  	};
>  };
>  
> +&cpm_xmdio {
> +	status = "okay";
> +
> +	phy0: ethernet-phy@0 {
> +		compatible = "ethernet-phy-ieee802.3-c45";
> +		reg = <0>;
> +	};
> +
> +	phy1: ethernet-phy@1 {

Should be named 'ethernet-phy@8' to match the 'reg' property.

> +		compatible = "ethernet-phy-ieee802.3-c45";
> +		reg = <8>;
> +	};
> +};

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-23  7:26 UTC (permalink / raw)
  To: Yang, Yi
  Cc: netdev@vger.kernel.org, dev@openvswitch.org, blp@ovn.org,
	e@erig.me, jan.scheurich@ericsson.com
In-Reply-To: <20170822093825.GA94865@cran64.bj.intel.com>

On Tue, 22 Aug 2017 17:38:26 +0800, Yang, Yi wrote:
> I checked GSO support code, we have two cases to handle, one case is to
> output NSH packet to VxLAN-gpe port, the other case is to output NSH packet
> to Ethernet port (tap, veth, vhost, physical eth, etc.), I don't think
> it is a good way to do GSO support in OVS module, because we can't know
> the final output port at this point, if the final output port is
> VxLAN-gpe port, it will still face GSO issue, but I didn't see vxlan
> module handles this, maybe udp tunnel did this. I think a NSH packet can
> be split into two packets, one has NSH header, another one doesn't,
> so I'm not sure how vxlan module can avoid this situation.

As I said before, by either implementing GSO support for NSH or by
segmenting in software before pushing the NSH header. In the first
case, the packet will be software segmented before being pushed to the
hardware, in the second case, it will be software segmented before
pushing the NSH header.

> For the second case, we can add a nsh GSO module in net/nsh/nsh_gso.c
> (as MPLS did in net/mpls/mpls_gso.c), that is the best way to handle
> this, what do you think about it?

Yes, that's one of the two options that we've been talking about. It's
the better and the more efficient one.

 Jiri

^ permalink raw reply

* [PATCH RESEND v12 7/8] wireless: ipw2200: Replace PCI pool old API
From: Romain Perier @ 2017-08-23  7:16 UTC (permalink / raw)
  To: Stanislav Yakovlev, Kalle Valo
  Cc: linux-wireless, netdev, linux-kernel, Romain Perier

The PCI pool API is deprecated. This commit replaces the PCI pool old
API by the appropriate function with the DMA pool API.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Stanislav Yakovlev <stas.yakovlev@gmail.com>
---

Note: As requested by Kalle Valo, resend and add linux-wireless to Cc:

 drivers/net/wireless/intel/ipw2x00/ipw2200.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index c311b1a994c1..c57c567add4d 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -3209,7 +3209,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 	struct fw_chunk *chunk;
 	int total_nr = 0;
 	int i;
-	struct pci_pool *pool;
+	struct dma_pool *pool;
 	void **virts;
 	dma_addr_t *phys;
 
@@ -3226,9 +3226,10 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 		kfree(virts);
 		return -ENOMEM;
 	}
-	pool = pci_pool_create("ipw2200", priv->pci_dev, CB_MAX_LENGTH, 0, 0);
+	pool = dma_pool_create("ipw2200", &priv->pci_dev->dev, CB_MAX_LENGTH, 0,
+			       0);
 	if (!pool) {
-		IPW_ERROR("pci_pool_create failed\n");
+		IPW_ERROR("dma_pool_create failed\n");
 		kfree(phys);
 		kfree(virts);
 		return -ENOMEM;
@@ -3253,7 +3254,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 
 		nr = (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
 		for (i = 0; i < nr; i++) {
-			virts[total_nr] = pci_pool_alloc(pool, GFP_KERNEL,
+			virts[total_nr] = dma_pool_alloc(pool, GFP_KERNEL,
 							 &phys[total_nr]);
 			if (!virts[total_nr]) {
 				ret = -ENOMEM;
@@ -3297,9 +3298,9 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 	}
  out:
 	for (i = 0; i < total_nr; i++)
-		pci_pool_free(pool, virts[i], phys[i]);
+		dma_pool_free(pool, virts[i], phys[i]);
 
-	pci_pool_destroy(pool);
+	dma_pool_destroy(pool);
 	kfree(phys);
 	kfree(virts);
 
-- 
2.11.0

^ permalink raw reply related

* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-23  6:59 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Duyck, Alexander H, john.fastabend@gmail.com, brouer@redhat.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <CAKgT0Uf=ZfVeK-wtvNxSyGEVZ3UseUOHiP3ZOg-SrzmqsR=LtQ@mail.gmail.com>

On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>
>> Right, but it's conceivable to add an API to "return" the buffer to
>> the input device, right?
>
> You could, it is just added complexity. "just free the buffer" in
> ixgbe usually just amounts to one atomic operation to decrement the
> total page count since page recycling is already implemented in the
> driver. You still would have to unmap the buffer regardless of if you
> were recycling it or not so all you would save is 1.000015259 atomic
> operations per packet. The fraction is because once every 64K uses we
> have to bulk update the count on the page.
>

If the buffer is returned to the input device, the input device can
keep the DMA mapping.  All it needs to do is to dma_sync it back to
the input device when the buffer is returned.

^ permalink raw reply

* [PATCH net 3/3] nfp: avoid buffer leak when representor is missing
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

When driver receives a muxed frame, but it can't find the representor
netdev it is destined to it will try to "drop" that frame, i.e. reuse
the buffer.  The issue is that the replacement buffer has already been
allocated at this point, and reusing the buffer from received frame
will leak it.  Change the code to put the new buffer on the ring
earlier and not reuse the old buffer (make the buffer parameter
to nfp_net_rx_drop() a NULL).

Fixes: 91bf82ca9eed ("nfp: add support for tx/rx with metadata portid")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9f77ce038a4a..1ff0c577819e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1751,6 +1751,10 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			continue;
 		}
 
+		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
+
+		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
+
 		if (likely(!meta.portid)) {
 			netdev = dp->netdev;
 		} else {
@@ -1759,16 +1763,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			nn = netdev_priv(dp->netdev);
 			netdev = nfp_app_repr_get(nn->app, meta.portid);
 			if (unlikely(!netdev)) {
-				nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf, skb);
+				nfp_net_rx_drop(dp, r_vec, rx_ring, NULL, skb);
 				continue;
 			}
 			nfp_repr_inc_rx_stats(netdev, pkt_len);
 		}
 
-		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
-
-		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
-
 		skb_reserve(skb, pkt_off);
 		skb_put(skb, pkt_len);
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH net 2/3] nfp: make sure representors are destroyed before their lower netdev
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

App start/stop callbacks can perform application initialization.
Unfortunately, flower app started using them for creating and
destroying representors.  This can lead to a situation where
lower vNIC netdev is destroyed while representors still try
to pass traffic.  This will most likely lead to a NULL-dereference
on the lower netdev TX path.

Move the start/stop callbacks, so that representors are created/
destroyed when vNICs are fully initialized.

Fixes: 5de73ee46704 ("nfp: general representor implementation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 5797dbf2b507..1aca4e57bf41 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -456,10 +456,6 @@ static int nfp_net_pf_app_start(struct nfp_pf *pf)
 {
 	int err;
 
-	err = nfp_net_pf_app_start_ctrl(pf);
-	if (err)
-		return err;
-
 	err = nfp_app_start(pf->app, pf->ctrl_vnic);
 	if (err)
 		goto err_ctrl_stop;
@@ -484,7 +480,6 @@ static void nfp_net_pf_app_stop(struct nfp_pf *pf)
 	if (pf->num_vfs)
 		nfp_app_sriov_disable(pf->app);
 	nfp_app_stop(pf->app);
-	nfp_net_pf_app_stop_ctrl(pf);
 }
 
 static void nfp_net_pci_unmap_mem(struct nfp_pf *pf)
@@ -559,7 +554,7 @@ static int nfp_net_pci_map_mem(struct nfp_pf *pf)
 
 static void nfp_net_pci_remove_finish(struct nfp_pf *pf)
 {
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 	/* stop app first, to avoid double free of ctrl vNIC's ddir */
 	nfp_net_debugfs_dir_clean(&pf->ddir);
 
@@ -690,6 +685,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 {
 	struct nfp_net_fw_version fw_ver;
 	u8 __iomem *ctrl_bar, *qc_bar;
+	struct nfp_net *nn;
 	int stride;
 	int err;
 
@@ -766,7 +762,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_free_vnics;
 
-	err = nfp_net_pf_app_start(pf);
+	err = nfp_net_pf_app_start_ctrl(pf);
 	if (err)
 		goto err_free_irqs;
 
@@ -774,12 +770,20 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_stop_app;
 
+	err = nfp_net_pf_app_start(pf);
+	if (err)
+		goto err_clean_vnics;
+
 	mutex_unlock(&pf->lock);
 
 	return 0;
 
+err_clean_vnics:
+	list_for_each_entry(nn, &pf->vnics, vnic_list)
+		if (nfp_net_is_data_vnic(nn))
+			nfp_net_pf_clean_vnic(pf, nn);
 err_stop_app:
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 err_free_irqs:
 	nfp_net_pf_free_irqs(pf);
 err_free_vnics:
@@ -803,6 +807,8 @@ void nfp_net_pci_remove(struct nfp_pf *pf)
 	if (list_empty(&pf->vnics))
 		goto out;
 
+	nfp_net_pf_app_stop(pf);
+
 	list_for_each_entry(nn, &pf->vnics, vnic_list)
 		if (nfp_net_is_data_vnic(nn))
 			nfp_net_pf_clean_vnic(pf, nn);
-- 
2.11.0

^ permalink raw reply related

* [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

Enabling SR-IOV VFs will cause the PCI subsystem to schedule a
work and flush its workqueue.  Since the nfp driver schedules its
own work we can't enable VFs while holding driver load.  Commit
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
tried to avoid this deadlock by creating a separate workqueue.
Unfortunately, due to the architecture of workqueue subsystem this
does not guarantee a separate thread of execution.  Luckily
we can simply take pci_enable_sriov() from under the driver lock.

Take pci_disable_sriov() from under the lock too for symmetry.

Fixes: 6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index d67969d3e484..3f199db2002e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -98,21 +98,20 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	struct nfp_pf *pf = pci_get_drvdata(pdev);
 	int err;
 
-	mutex_lock(&pf->lock);
-
 	if (num_vfs > pf->limit_vfs) {
 		nfp_info(pf->cpp, "Firmware limits number of VFs to %u\n",
 			 pf->limit_vfs);
-		err = -EINVAL;
-		goto err_unlock;
+		return -EINVAL;
 	}
 
 	err = pci_enable_sriov(pdev, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev, "Failed to enable PCI SR-IOV: %d\n", err);
-		goto err_unlock;
+		return err;
 	}
 
+	mutex_lock(&pf->lock);
+
 	err = nfp_app_sriov_enable(pf->app, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev,
@@ -129,9 +128,8 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	return num_vfs;
 
 err_sriov_disable:
-	pci_disable_sriov(pdev);
-err_unlock:
 	mutex_unlock(&pf->lock);
+	pci_disable_sriov(pdev);
 	return err;
 #endif
 	return 0;
@@ -158,10 +156,10 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
 
 	pf->num_vfs = 0;
 
+	mutex_unlock(&pf->lock);
+
 	pci_disable_sriov(pdev);
 	dev_dbg(&pdev->dev, "Removed VFs.\n");
-
-	mutex_unlock(&pf->lock);
 #endif
 	return 0;
 }
-- 
2.11.0

^ permalink raw reply related

* [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

This series tackles the bug I've already tried to fix in commit 
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work").
I created a separate workqueue to avoid possible deadlock, and
the lockdep error disappeared, coincidentally.  The way workqueues
are operating, separate workqueue doesn't necessarily mean separate
thread of execution.  Luckily we can safely forego the lock.

Second fix changes the order in which vNIC netdevs and representors
are created/destroyed.  The fix is kept small and should be sufficient
for net because of how flower uses representors, a more thorough fix 
will be targeted at net-next.

Third fix avoids leaking mapped frame buffers if FW sent a frame with
unknown portid.

Jakub Kicinski (3):
  nfp: don't hold PF lock while enabling SR-IOV
  nfp: make sure representors are destroyed before their lower netdev
  nfp: avoid buffer leak when representor is missing

 drivers/net/ethernet/netronome/nfp/nfp_main.c      | 16 +++++++---------
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 10 +++++-----
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 22 ++++++++++++++--------
 3 files changed, 26 insertions(+), 22 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH] dt-binding: net/phy: fix interrupts description
From: Baruch Siach @ 2017-08-23  6:11 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, David S . Miller
  Cc: devicetree, netdev, Baruch Siach

Commit b053dc5a722ea (powerpc: Refactor device tree binding) split the
Ethernet PHY binding documentation out of the big booting-without-of.txt
file, leaving a dangling reference to "section 2" in the 'interrupts'
property description. Drop that reference, and make the description look
more like the rest.

While at it, make the example interrupt-parent phandle look more like a
real world phandle, and use an IRQ_TYPE_ macro for the 'interrupts'
type.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 Documentation/devicetree/bindings/net/phy.txt | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index b55857696fc3..86ba72af6163 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -2,11 +2,7 @@ PHY nodes

 Required properties:

- - interrupts : <a b> where a is the interrupt number and b is a
-   field that represents an encoding of the sense and level
-   information for the interrupt.  This should be encoded based on
-   the information in section 2) depending on the type of interrupt
-   controller you have.
+ - interrupts : interrupt specifier for the sole interrupt.
  - interrupt-parent : the phandle for the interrupt controller that
    services interrupts for this device.
  - reg : The ID number for the phy, usually a small integer
@@ -56,7 +52,7 @@ Example:

 ethernet-phy@0 {
 	compatible = "ethernet-phy-id0141.0e90", "ethernet-phy-ieee802.3-c22";
-	interrupt-parent = <40000>;
-	interrupts = <35 1>;
+	interrupt-parent = <&PIC>;
+	interrupts = <35 IRQ_TYPE_EDGE_RISING>;
 	reg = <0>;
 };
-- 
2.14.1

^ permalink raw reply related

* Re: linux-next: manual merge of the net-next tree with the net tree
From: Ido Schimmel @ 2017-08-23  5:41 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List, Wei Wang, Ido Schimmel, iri Pirko
In-Reply-To: <20170823113105.0bd692ea@canb.auug.org.au>

On Wed, Aug 23, 2017 at 11:31:05AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   net/ipv6/ip6_fib.c
> 
> between commit:
> 
>   c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
> 
> from the net tree and commit:
> 
>   a460aa83963b ("ipv6: fib: Add helpers to hold / drop a reference on rt6_info")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks good to me.

Thanks!

^ permalink raw reply

* RE: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
From: Madalin-cristian Bucur @ 2017-08-23  5:16 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Florinel Iordache
In-Reply-To: <20170822.214645.953115103494976385.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, August 23, 2017 7:47 AM
> To: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> Cc: netdev@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
> 
> From: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> Date: Wed, 23 Aug 2017 04:36:56 +0000
> 
> > The struct fman is only visible in the fman file, the fman port
> > module uses struct fman as an opaque pointer, thus this export.
> 
> Don't use that programming model.
> 
> Export the datastructure properly to it's users.
> 
> This abstraction scheme is so wasteful and costly.

Normally does not come with this cost, it's this case where one of the
sub-modules needs to call into another that gets things complicated.
I'll move struct fman to the header file.

Thanks,
Madalin

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox