Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] bpf: Optimize lpm trie delete
From: Daniel Mack @ 2017-09-20 16:51 UTC (permalink / raw)
  To: Craig Gallek, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller; +Cc: netdev
In-Reply-To: <20170920162247.63787-1-kraigatgoog@gmail.com>

Hi Craig,

Thanks, this looks much cleaner already :)

On 09/20/2017 06:22 PM, Craig Gallek wrote:
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 9d58a576b2ae..b5a7d70ec8b5 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
>  	struct lpm_trie_node __rcu **trim;
>  	struct lpm_trie_node *node;
>  	unsigned long irq_flags;
> -	unsigned int next_bit;
> +	unsigned int next_bit = 0;

This default assignment seems wrong, and I guess you only added it to
squelch a compiler warning?

[...]

> +	/* If the node has one child, we may be able to collapse the tree
> +	 * while removing this node if the node's child is in the same
> +	 * 'next bit' slot as this node was in its parent or if the node
> +	 * itself is the root.
> +	 */
> +	if (trim == &trie->root) {
> +		next_bit = node->child[0] ? 0 : 1;
> +		rcu_assign_pointer(trie->root, node->child[next_bit]);
> +		kfree_rcu(node, rcu);

I don't think you should treat this 'root' case special.

Instead, move the 'next_bit' assignment outside of the condition ...

> +	} else if (rcu_access_pointer(node->child[next_bit])) {
> +		rcu_assign_pointer(*trim, node->child[next_bit]);
> +		kfree_rcu(node, rcu);

... and then this branch would handle the case just fine. Correct?

Otherwise, looks good to me!



Thanks,
Daniel

^ permalink raw reply

* Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.
From: Samudrala, Sridhar @ 2017-09-20 16:51 UTC (permalink / raw)
  To: Tom Herbert, Eric Dumazet
  Cc: Alexander Duyck, Linux Kernel Network Developers
In-Reply-To: <CALx6S374dN944bdJ87Za+MzFH3YV_6S5L3ZVGKD9503fp=-6Bg@mail.gmail.com>



On 9/20/2017 7:18 AM, Tom Herbert wrote:
> On Tue, Sep 19, 2017 at 10:13 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2017-09-19 at 21:59 -0700, Samudrala, Sridhar wrote:
>>> On 9/19/2017 5:48 PM, Tom Herbert wrote:
>>>> On Tue, Sep 19, 2017 at 5:34 PM, Samudrala, Sridhar
>>>> <sridhar.samudrala@intel.com> wrote:
>>>>> On 9/12/2017 3:53 PM, Tom Herbert wrote:
>>>>>> On Tue, Sep 12, 2017 at 3:31 PM, Samudrala, Sridhar
>>>>>> <sridhar.samudrala@intel.com> wrote:
>>>>>>> On 9/12/2017 8:47 AM, Eric Dumazet wrote:
>>>>>>>> On Mon, 2017-09-11 at 23:27 -0700, Samudrala, Sridhar wrote:
>>>>>>>>> On 9/11/2017 8:53 PM, Eric Dumazet wrote:
>>>>>>>>>> On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
>>>>>>>>>>
>>>>>>>>>>> Two ints in sock_common for this purpose is quite expensive and the
>>>>>>>>>>> use case for this is limited-- even if a RX->TX queue mapping were
>>>>>>>>>>> introduced to eliminate the queue pair assumption this still won't
>>>>>>>>>>> help if the receive and transmit interfaces are different for the
>>>>>>>>>>> connection. I think we really need to see some very compelling
>>>>>>>>>>> results
>>>>>>>>>>> to be able to justify this.
>>>>>>>>> Will try to collect and post some perf data with symmetric queue
>>>>>>>>> configuration.
>>>>> Here is some performance data i collected with memcached workload over
>>>>> ixgbe 10Gb NIC with mcblaster benchmark.
>>>>> ixgbe is configured with 16 queues and rx-usecs is set to 1000 for a very
>>>>> low
>>>>> interrupt rate.
>>>>>        ethtool -L p1p1 combined 16
>>>>>        ethtool -C p1p1 rx-usecs 1000
>>>>> and busy poll is set to 1000usecs
>>>>>        sysctl net.core.busy_poll = 1000
>>>>>
>>>>> 16 threads  800K requests/sec
>>>>> =============================
>>>>>                    rtt(min/avg/max)usecs     intr/sec contextswitch/sec
>>>>> -----------------------------------------------------------------------
>>>>> Default                2/182/10641            23391 61163
>>>>> Symmetric Queues       2/50/6311              20457 32843
>>>>>
>>>>> 32 threads  800K requests/sec
>>>>> =============================
>>>>>                   rtt(min/avg/max)usecs     intr/sec contextswitch/sec
>>>>> ------------------------------------------------------------------------
>>>>> Default                2/162/6390            32168 69450
>>>>> Symmetric Queues        2/50/3853            35044 35847
>>>>>
>>>> No idea what "Default" configuration is. Please report how xps_cpus is
>>>> being set, how many RSS queues there are, and what the mapping is
>>>> between RSS queues and CPUs and shared caches. Also, whether and
>>>> threads are pinned.
>>> Default is linux 4.13 with the settings i listed above.
>>>          ethtool -L p1p1 combined 16
>>>          ethtool -C p1p1 rx-usecs 1000
>>>          sysctl net.core.busy_poll = 1000
>>>
>>> # ethtool -x p1p1
>>> RX flow hash indirection table for p1p1 with 16 RX ring(s):
>>>      0:      0     1     2     3     4     5     6     7
>>>      8:      8     9    10    11    12    13    14    15
>>>     16:      0     1     2     3     4     5     6     7
>>>     24:      8     9    10    11    12    13    14    15
>>>     32:      0     1     2     3     4     5     6     7
>>>     40:      8     9    10    11    12    13    14    15
>>>     48:      0     1     2     3     4     5     6     7
>>>     56:      8     9    10    11    12    13    14    15
>>>     64:      0     1     2     3     4     5     6     7
>>>     72:      8     9    10    11    12    13    14    15
>>>     80:      0     1     2     3     4     5     6     7
>>>     88:      8     9    10    11    12    13    14    15
>>>     96:      0     1     2     3     4     5     6     7
>>>    104:      8     9    10    11    12    13    14    15
>>>    112:      0     1     2     3     4     5     6     7
>>>    120:      8     9    10    11    12    13    14    15
>>>
>>> smp_affinity for the 16 queuepairs
>>>          141 p1p1-TxRx-0 0000,00000001
>>>          142 p1p1-TxRx-1 0000,00000002
>>>          143 p1p1-TxRx-2 0000,00000004
>>>          144 p1p1-TxRx-3 0000,00000008
>>>          145 p1p1-TxRx-4 0000,00000010
>>>          146 p1p1-TxRx-5 0000,00000020
>>>          147 p1p1-TxRx-6 0000,00000040
>>>          148 p1p1-TxRx-7 0000,00000080
>>>          149 p1p1-TxRx-8 0000,00000100
>>>          150 p1p1-TxRx-9 0000,00000200
>>>          151 p1p1-TxRx-10 0000,00000400
>>>          152 p1p1-TxRx-11 0000,00000800
>>>          153 p1p1-TxRx-12 0000,00001000
>>>          154 p1p1-TxRx-13 0000,00002000
>>>          155 p1p1-TxRx-14 0000,00004000
>>>          156 p1p1-TxRx-15 0000,00008000
>>> xps_cpus for the 16 Tx queues
>>>          0000,00000001
>>>          0000,00000002
>>>          0000,00000004
>>>          0000,00000008
>>>          0000,00000010
>>>          0000,00000020
>>>          0000,00000040
>>>          0000,00000080
>>>          0000,00000100
>>>          0000,00000200
>>>          0000,00000400
>>>          0000,00000800
>>>          0000,00001000
>>>          0000,00002000
>>>          0000,00004000
>>>          0000,00008000
>>> memcached threads are not pinned.
>>>
>> ...
>>
>> I urge you to take the time to properly tune this host.
>>
>> linux kernel does not do automagic configuration. This is user policy.
>>
>> Documentation/networking/scaling.txt has everything you need.
>>
> Yes, tuning a system for optimal performance is difficult. Even if you
> find a performance benefit for a configuration on one system, that
> might not translate to another. In other words, if you've produced
> some code that seems to perform better than previous implementation on
> a test machine it's not enough to be satisfied with that. We want
> understand _why_ there is a difference. If you can show there is
> intrinsic benefits to the queue-pair model that we can't achieve with
> existing implementation _and_ can show there are ill effects in other
> circumstances, then you should have a good case to make changes.
>
> In the case of memcached, threads inevitably migrate off the CPU they
> were created on, the data follows the thread but the RX-queue does not
> change which means that the receive path is crosses CPUs or caches.
> But, then in the queuepair case that also means transmit completions
> are crossing CPUs. We don't normally expect that to be a good thing.
> However, transmit completion processing does not happen in the
> critical path, so if that work is being deferred to a less busy CPU
> there may benefits. That's only a theory, analysis and experimentation
> should be able to get to the root cause.
>
With regards to tuning, forgot to mention that memcached is updated to
select thethread based on incoming queue via SO_INCOMING_NAPI_ID and
is started with16 threads to match the number of RX queues.
If i do pinning of memcached threads to each of the 16 cores, i do get
similar performance as symmetric queues. But this symmetric queues 
configuration
is to support scenarios where it is not possible to pin the threads of the
application.

Thanks
Sridhar

^ permalink raw reply

* Re: [PATCH v4 4/4] samples/bpf: Add documentation on cross compilation
From: Randy Dunlap @ 2017-09-20 16:50 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: netdev, alison, juri.lelli, fengc, daniel, davem, ast,
	kernel-team
In-Reply-To: <20170920161159.25747-4-joelaf@google.com>

On 09/20/17 09:11, Joel Fernandes wrote:
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  samples/bpf/README.rst | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
> index 79f9a58f1872..2b906127ef54 100644
> --- a/samples/bpf/README.rst
> +++ b/samples/bpf/README.rst
> @@ -64,3 +64,13 @@ It is also possible to point make to the newly compiled 'llc' or
>  'clang' command via redefining LLC or CLANG on the make command line::
>  
>   make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> +
> +Cross compiling samples
> +-----------------------
> +Inorder to cross-compile, say for arm64 targets, export CROSS_COMPILE and ARCH

   In order to

> +environment variables before calling make. This will direct make to build
> +samples for the cross target.
> +
> +export ARCH=arm64
> +export CROSS_COMPILE="aarch64-linux-gnu-"
> +make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> 


-- 
~Randy

^ permalink raw reply

* Re: [Patch net] net_sched: remove cls_flower idr on failure
From: Jiri Pirko @ 2017-09-20 16:40 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Chris Mi, Jiri Pirko
In-Reply-To: <20170920161845.28753-1-xiyou.wangcong@gmail.com>

Wed, Sep 20, 2017 at 06:18:45PM CEST, xiyou.wangcong@gmail.com wrote:
>Fixes: c15ab236d69d ("net/sched: Change cls_flower to use IDR")
>Cc: Chris Mi <chrism@mellanox.com>
>Cc: Jiri Pirko <jiri@mellanox.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Looks fine.
Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next] net: avoid a full fib lookup when rp_filter is disabled.
From: Paolo Abeni @ 2017-09-20 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <597db26b5e57eb89d2ff27454cce997fa6c0f5aa.1505924560.git.pabeni@redhat.com>

Dumb me and dumb my scripts. 

This is actually a v2, v1 was at:

https://patchwork.ozlabs.org/project/netdev/list/?series=3835

David, please let me know if you prefer I'll repost with a more
appropriate subject line.

Sorry for the noise,

Paolo

^ permalink raw reply

* Re: [PATCH,net-next,0/2] Improve code coverage of syzkaller
From: David Miller @ 2017-09-20 16:36 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: peterpenkov96, netdev
In-Reply-To: <CAF=yD-J_qL=hkgnbgPmHW3o_iC6WKmqFUZmmzQKs-+AUhEWY-Q@mail.gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 20 Sep 2017 11:38:45 -0400

> I think that the compile time option was chosen because of the ns_capable
> check, so that with user namespaces unprivileged processes can control this
> path. Perhaps we can require capable() only to set IFF_NAPI_FRAGS.
> 
> Then we can convert the napi_gro_receive path to be conditional on a new
> IFF_NAPI flag instead of this compile time option.

That works for me.

^ permalink raw reply

* [PATCH net-next] net: avoid a full fib lookup when rp_filter is disabled.
From: Paolo Abeni @ 2017-09-20 16:26 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

Since commit 1dced6a85482 ("ipv4: Restore accept_local behaviour
in fib_validate_source()") a full fib lookup is needed even if
the rp_filter is disabled, if accept_local is false - which is
the default.

What we really need in the above scenario is just checking
that the source IP address is not local, and in most case we
can do that is a cheaper way looking up the ifaddr hash table.

This commit adds a helper for such lookup, and uses it to
validate the src address when rp_filter is disabled and no
'local' routes are created by the user space in the relevant
namespace.

A new ipv4 netns flag is added to account for such routes.
We need that to preserve the same behavior we had before this
patch.

It also drops the checks to bail early from __fib_validate_source,
added by the commit 1dced6a85482 ("ipv4: Restore accept_local
behaviour in fib_validate_source()") they do not give any
measurable performance improvement: if we do the lookup with are
on a slower path.

This improves UDP performances for unconnected sockets
when rp_filter is disabled by 5% and also gives small but
measurable performance improvement for TCP flood scenarios.

v1 -> v2:
 - use the ifaddr lookup helper in __ip_dev_find(), as suggested
   by Eric
 - fall-back to full lookup if custom local routes are present

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/inetdevice.h |  1 +
 include/net/netns/ipv4.h   |  1 +
 net/ipv4/devinet.c         | 30 ++++++++++++++++++------------
 net/ipv4/fib_frontend.c    | 22 +++++++++++++++++-----
 4 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index fb3f809e34e4..751d051f0bc7 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -179,6 +179,7 @@ __be32 inet_confirm_addr(struct net *net, struct in_device *in_dev, __be32 dst,
 			 __be32 local, int scope);
 struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix,
 				    __be32 mask);
+struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr);
 static __inline__ bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
 {
 	return !((addr^ifa->ifa_address)&ifa->ifa_mask);
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c805e3..20720721da4b 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -49,6 +49,7 @@ struct netns_ipv4 {
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_rules_ops	*rules_ops;
 	bool			fib_has_custom_rules;
+	bool			fib_has_custom_local_routes;
 	struct fib_table __rcu	*fib_main;
 	struct fib_table __rcu	*fib_default;
 #endif
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index d7adc0616599..7ce22a2c07ce 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -137,22 +137,12 @@ static void inet_hash_remove(struct in_ifaddr *ifa)
  */
 struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 {
-	u32 hash = inet_addr_hash(net, addr);
 	struct net_device *result = NULL;
 	struct in_ifaddr *ifa;
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(ifa, &inet_addr_lst[hash], hash) {
-		if (ifa->ifa_local == addr) {
-			struct net_device *dev = ifa->ifa_dev->dev;
-
-			if (!net_eq(dev_net(dev), net))
-				continue;
-			result = dev;
-			break;
-		}
-	}
-	if (!result) {
+	ifa = inet_lookup_ifaddr_rcu(net, addr);
+	if (!ifa) {
 		struct flowi4 fl4 = { .daddr = addr };
 		struct fib_result res = { 0 };
 		struct fib_table *local;
@@ -165,6 +155,8 @@ struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 		    !fib_table_lookup(local, &fl4, &res, FIB_LOOKUP_NOREF) &&
 		    res.type == RTN_LOCAL)
 			result = FIB_RES_DEV(res);
+	} else {
+		result = ifa->ifa_dev->dev;
 	}
 	if (result && devref)
 		dev_hold(result);
@@ -173,6 +165,20 @@ struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 }
 EXPORT_SYMBOL(__ip_dev_find);
 
+/* called under RCU lock */
+struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr)
+{
+	u32 hash = inet_addr_hash(net, addr);
+	struct in_ifaddr *ifa;
+
+	hlist_for_each_entry_rcu(ifa, &inet_addr_lst[hash], hash)
+		if (ifa->ifa_local == addr &&
+		    net_eq(dev_net(ifa->ifa_dev->dev), net))
+			return ifa;
+
+	return NULL;
+}
+
 static void rtmsg_ifa(int event, struct in_ifaddr *, struct nlmsghdr *, u32);
 
 static BLOCKING_NOTIFIER_HEAD(inetaddr_chain);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 37819ab4cc74..f02819134ba2 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -345,9 +345,6 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	if (res.type != RTN_UNICAST &&
 	    (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
 		goto e_inval;
-	if (!rpf && !fib_num_tclassid_users(net) &&
-	    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev)))
-		goto last_resort;
 	fib_combine_itag(itag, &res);
 	dev_match = false;
 
@@ -402,13 +399,26 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 			struct in_device *idev, u32 *itag)
 {
 	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
+	struct net *net = dev_net(dev);
 
-	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
-	    IN_DEV_ACCEPT_LOCAL(idev) &&
+	if (!r && !fib_num_tclassid_users(net) &&
 	    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
+		if (IN_DEV_ACCEPT_LOCAL(idev))
+			goto ok;
+		/* if no local routes are added from user space we can check
+		 * for local addresses looking-up the ifaddr table
+		 */
+		if (net->ipv4.fib_has_custom_local_routes)
+			goto full_check;
+		if (inet_lookup_ifaddr_rcu(net, src))
+			return -EINVAL;
+
+ok:
 		*itag = 0;
 		return 0;
 	}
+
+full_check:
 	return __fib_validate_source(skb, src, dst, tos, oif, dev, r, idev, itag);
 }
 
@@ -759,6 +769,8 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	err = fib_table_insert(net, tb, &cfg, extack);
+	if (!err && cfg.fc_type == RTN_LOCAL)
+		net->ipv4.fib_has_custom_local_routes = true;
 errout:
 	return err;
 }
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next] net: dsa: use dedicated CPU port
From: Vivien Didelot @ 2017-09-20 16:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

Each port in DSA has its own dedicated CPU port currently available in
its parent switch's ds->ports[port].cpu_dp. Use it instead of getting
the unique tree CPU port, which will be deprecated soon.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/b53/b53_common.c | 4 ++--
 drivers/net/dsa/bcm_sf2.c        | 6 +++---
 drivers/net/dsa/mt7530.c         | 4 ++--
 drivers/net/dsa/mv88e6060.c      | 2 +-
 drivers/net/dsa/qca8k.c          | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index a9f2a5b55a5e..d4ce092def83 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1336,7 +1336,7 @@ EXPORT_SYMBOL(b53_fdb_dump);
 int b53_br_join(struct dsa_switch *ds, int port, struct net_device *br)
 {
 	struct b53_device *dev = ds->priv;
-	s8 cpu_port = ds->dst->cpu_dp->index;
+	s8 cpu_port = ds->ports[port].cpu_dp->index;
 	u16 pvlan, reg;
 	unsigned int i;
 
@@ -1382,7 +1382,7 @@ void b53_br_leave(struct dsa_switch *ds, int port, struct net_device *br)
 {
 	struct b53_device *dev = ds->priv;
 	struct b53_vlan *vl = &dev->vlans[0];
-	s8 cpu_port = ds->dst->cpu_dp->index;
+	s8 cpu_port = ds->ports[port].cpu_dp->index;
 	unsigned int i;
 	u16 pvlan, reg, pvid;
 
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 0072a959db5b..898d5642b516 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -661,7 +661,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
 			       struct ethtool_wolinfo *wol)
 {
-	struct net_device *p = ds->dst->cpu_dp->netdev;
+	struct net_device *p = ds->ports[port].cpu_dp->netdev;
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 	struct ethtool_wolinfo pwol;
 
@@ -684,9 +684,9 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
 static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
 			      struct ethtool_wolinfo *wol)
 {
-	struct net_device *p = ds->dst->cpu_dp->netdev;
+	struct net_device *p = ds->ports[port].cpu_dp->netdev;
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-	s8 cpu_port = ds->dst->cpu_dp->index;
+	s8 cpu_port = ds->ports[port].cpu_dp->index;
 	struct ethtool_wolinfo pwol;
 
 	p->ethtool_ops->get_wol(p, &pwol);
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index c142b97add2c..faa3b88d2206 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -928,11 +928,11 @@ mt7530_setup(struct dsa_switch *ds)
 	struct device_node *dn;
 	struct mt7530_dummy_poll p;
 
-	/* The parent node of cpu_dp->netdev which holds the common system
+	/* The parent node of master netdev which holds the common system
 	 * controller also is the container for two GMACs nodes representing
 	 * as two netdev instances.
 	 */
-	dn = ds->dst->cpu_dp->netdev->dev.of_node->parent;
+	dn = ds->ports[MT7530_CPU_PORT].netdev->dev.of_node->parent;
 	priv->ethernet = syscon_node_to_regmap(dn);
 	if (IS_ERR(priv->ethernet))
 		return PTR_ERR(priv->ethernet);
diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index dce7fa57eb55..621cdc46ad81 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -176,7 +176,7 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, int p)
 		  ((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
 		   (dsa_is_cpu_port(ds, p) ?
 			ds->enabled_port_mask :
-			BIT(ds->dst->cpu_dp->index)));
+			BIT(ds->ports[p].cpu_dp->index)));
 
 	/* Port Association Vector: when learning source addresses
 	 * of packets, add the address to the address database using
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 5ada7a41449c..82f09711ac1a 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -506,7 +506,7 @@ qca8k_setup(struct dsa_switch *ds)
 		pr_warn("regmap initialization failed");
 
 	/* Initialize CPU port pad mode (xMII type, delays...) */
-	phy_mode = of_get_phy_mode(ds->dst->cpu_dp->dn);
+	phy_mode = of_get_phy_mode(ds->ports[QCA8K_CPU_PORT].dn);
 	if (phy_mode < 0) {
 		pr_err("Can't find phy-mode for master device\n");
 		return phy_mode;
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Tom Herbert @ 2017-09-20 16:24 UTC (permalink / raw)
  To: Andreas Schultz
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Harald Welte, Rohit Seth
In-Reply-To: <56e339a8-fa2d-6f8d-a89c-c0f3f242763a@tpip.net>

On Wed, Sep 20, 2017 at 9:07 AM, Andreas Schultz <aschultz@tpip.net> wrote:
>
>
> On 20/09/17 17:57, Tom Herbert wrote:
>>
>> On Wed, Sep 20, 2017 at 8:27 AM, Andreas Schultz <aschultz@tpip.net>
>> wrote:
>>>
>>> On 19/09/17 02:38, Tom Herbert wrote:
>>>>
>>>>
>>>> Add new configuration of GTP interfaces that allow specifying a port to
>>>> listen on (as opposed to having to get sockets from a userspace control
>>>> plane). This allows GTP interfaces to be configured and the data path
>>>> tested without requiring a GTP-C daemon.
>>>
>>>
>>>
>>> This would imply that you can have multiple independent GTP sockets on
>>> the
>>> same IP address.That is not permitted by the GTP specifications. 3GPP TS
>>> 29.281, section 4.3 states clearly that there is "only" one GTP entity
>>> per
>>> IP address.A PDP context is defined by the destination IP and the TEID.
>>> The
>>> destination port is not part of the identity of a PDP context.
>>>
>> We are in no way trying change GTP, if someone runs this in a real GTP
>> network then they need to abide by the specification. However, there
>> is nothing inconsistent and it breaks nothing if someone wishes to use
>> different port numbers in their own private network for testing or
>> development purposes. Every other UDP application that has assigned
>> port number allows configurable ports, I don't see that GTP is so
>> special that it should be an exception.
>
>
> GTP isn't special, I just don't like to have testing only features in there
> when the same goal can be reached without having to add extra stuff. Adding
> code that is not going to be useful in real production setups (or in this
> case would even break production setups when enabled accidentally) makes the
> implementation more complex than it needs to be.

Well, you could make the same argument that allowing GTP to configured
as standalone interface is a problem since GTP is only allowed to be
with used with GTP-C. But, then we have something in the kernel that
the community is expected to support, but requires jumping through a
whole bunch of hoops just to run a simple netperf. The more that
patches and features look like other things in the kernel that are
already well established, the better the chances we can accept them
and support them. It's probably a natural consequence of any large
open source project, so sometimes it's worth the effort to add a few
lines of complexity to get the benefits of community contribution and
support.

Tom

^ permalink raw reply

* [PATCH net-next] bpf: Optimize lpm trie delete
From: Craig Gallek @ 2017-09-20 16:22 UTC (permalink / raw)
  To: Daniel Mack, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller; +Cc: netdev

From: Craig Gallek <kraig@google.com>

Before the delete operator was added, this datastructure maintained
an invariant that intermediate nodes were only present when necessary
to build the tree.  This patch updates the delete operation to reinstate
that invariant by removing unnecessary intermediate nodes after a node is
removed and thus keeping the tree structure at a minimal size.

Suggested-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Craig Gallek <kraig@google.com>
---
 kernel/bpf/lpm_trie.c | 55 +++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 9d58a576b2ae..b5a7d70ec8b5 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 	struct lpm_trie_node __rcu **trim;
 	struct lpm_trie_node *node;
 	unsigned long irq_flags;
-	unsigned int next_bit;
+	unsigned int next_bit = 0;
 	size_t matchlen = 0;
 	int ret = 0;
 
@@ -408,14 +408,12 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 
 	/* Walk the tree looking for an exact key/length match and keeping
 	 * track of where we could begin trimming the tree.  The trim-point
-	 * is the sub-tree along the walk consisting of only single-child
-	 * intermediate nodes and ending at a leaf node that we want to
-	 * remove.
+	 * is the location of the pointer where we will remove a node from the
+	 * tree.
 	 */
 	trim = &trie->root;
-	node = rcu_dereference_protected(
-		trie->root, lockdep_is_held(&trie->lock));
-	while (node) {
+	while ((node = rcu_dereference_protected(
+		       *trim, lockdep_is_held(&trie->lock)))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
@@ -423,15 +421,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 			break;
 
 		next_bit = extract_bit(key->data, node->prefixlen);
-		/* If we hit a node that has more than one child or is a valid
-		 * prefix itself, do not remove it. Reset the root of the trim
-		 * path to its descendant on our path.
-		 */
-		if (!(node->flags & LPM_TREE_NODE_FLAG_IM) ||
-		    (node->child[0] && node->child[1]))
-			trim = &node->child[next_bit];
-		node = rcu_dereference_protected(
-			node->child[next_bit], lockdep_is_held(&trie->lock));
+		trim = &node->child[next_bit];
 	}
 
 	if (!node || node->prefixlen != key->prefixlen ||
@@ -442,25 +432,38 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 
 	trie->n_entries--;
 
-	/* If the node we are removing is not a leaf node, simply mark it
+	/* If the node we are removing has two children, simply mark it
 	 * as intermediate and we are done.
 	 */
-	if (rcu_access_pointer(node->child[0]) ||
+	if (rcu_access_pointer(node->child[0]) &&
 	    rcu_access_pointer(node->child[1])) {
 		node->flags |= LPM_TREE_NODE_FLAG_IM;
 		goto out;
 	}
 
-	/* trim should now point to the slot holding the start of a path from
-	 * zero or more intermediate nodes to our leaf node for deletion.
-	 */
-	while ((node = rcu_dereference_protected(
-			*trim, lockdep_is_held(&trie->lock)))) {
+	/* If the node has no children, it can be completely removed */
+	if (!rcu_access_pointer(node->child[0]) &&
+	    !rcu_access_pointer(node->child[1])) {
 		RCU_INIT_POINTER(*trim, NULL);
-		trim = rcu_access_pointer(node->child[0]) ?
-			&node->child[0] :
-			&node->child[1];
 		kfree_rcu(node, rcu);
+		goto out;
+	}
+
+	/* If the node has one child, we may be able to collapse the tree
+	 * while removing this node if the node's child is in the same
+	 * 'next bit' slot as this node was in its parent or if the node
+	 * itself is the root.
+	 */
+	if (trim == &trie->root) {
+		next_bit = node->child[0] ? 0 : 1;
+		rcu_assign_pointer(trie->root, node->child[next_bit]);
+		kfree_rcu(node, rcu);
+	} else if (rcu_access_pointer(node->child[next_bit])) {
+		rcu_assign_pointer(*trim, node->child[next_bit]);
+		kfree_rcu(node, rcu);
+	} else {
+		/* If we can't collapse, just mark this node as intermediate */
+		node->flags |= LPM_TREE_NODE_FLAG_IM;
 	}
 
 out:
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* [Patch net] net_sched: remove cls_flower idr on failure
From: Cong Wang @ 2017-09-20 16:18 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Chris Mi, Jiri Pirko

Fixes: c15ab236d69d ("net/sched: Change cls_flower to use IDR")
Cc: Chris Mi <chrism@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_flower.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 1a267e77c6de..d230cb4c8094 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -922,28 +922,28 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 
 		if (!tc_flags_valid(fnew->flags)) {
 			err = -EINVAL;
-			goto errout;
+			goto errout_idr;
 		}
 	}
 
 	err = fl_set_parms(net, tp, fnew, &mask, base, tb, tca[TCA_RATE], ovr);
 	if (err)
-		goto errout;
+		goto errout_idr;
 
 	err = fl_check_assign_mask(head, &mask);
 	if (err)
-		goto errout;
+		goto errout_idr;
 
 	if (!tc_skip_sw(fnew->flags)) {
 		if (!fold && fl_lookup(head, &fnew->mkey)) {
 			err = -EEXIST;
-			goto errout;
+			goto errout_idr;
 		}
 
 		err = rhashtable_insert_fast(&head->ht, &fnew->ht_node,
 					     head->ht_params);
 		if (err)
-			goto errout;
+			goto errout_idr;
 	}
 
 	if (!tc_skip_hw(fnew->flags)) {
@@ -952,7 +952,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 					   &mask.key,
 					   fnew);
 		if (err)
-			goto errout;
+			goto errout_idr;
 	}
 
 	if (!tc_in_hw(fnew->flags))
@@ -981,6 +981,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	kfree(tb);
 	return 0;
 
+errout_idr:
+	if (fnew->handle)
+		idr_remove_ext(&head->handle_idr, fnew->handle);
 errout:
 	tcf_exts_destroy(&fnew->exts);
 	kfree(fnew);
-- 
2.13.0

^ permalink raw reply related

* [PATCH] ipv6: Use ipv6_authlen for len in ipv6_skip_exthdr
From: Xiang Gao @ 2017-09-20 16:18 UTC (permalink / raw)
  To: trivial, netdev, davem, kuznet, yoshfuji; +Cc: qasdfgtyuiop

In ipv6_skip_exthdr, the lengh of AH header is computed manually
as (hp->hdrlen+2)<<2. However, in include/linux/ipv6.h, a macro
named ipv6_authlen is already defined for exactly the same job. This
commit replaces the manual computation code with the macro.

Signed-off-by: Xiang Gao <qasdfgtyuiop@gmail.com>
---
 net/ipv6/exthdrs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 305e2ed730bf..115d60919f72 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -99,7 +99,7 @@ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
 				break;
 			hdrlen = 8;
 		} else if (nexthdr == NEXTHDR_AUTH)
-			hdrlen = (hp->hdrlen+2)<<2;
+			hdrlen = ipv6_authlen(hp);
 		else
 			hdrlen = ipv6_optlen(hp);
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH] ipv6_skip_exthdr: use ipv6_authlen for AH hdrlen
From: Xiang Gao @ 2017-09-20 16:17 UTC (permalink / raw)
  To: David Miller; +Cc: trivial, netdev, Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <20170919.153208.1575676406897804623.davem@davemloft.net>

Hi David,

Thanks for your time and all your suggestions. I will resend a new patch soon.

Xiang Gao
Xiang Gao


2017-09-19 18:32 GMT-04:00 David Miller <davem@davemloft.net>:
> From: Xiang Gao <qasdfgtyuiop@gmail.com>
> Date: Tue, 19 Sep 2017 08:59:50 -0400
>
>> In ipv6_skip_exthdr, the lengh of AH header is computed manually
>> as (hp->hdrlen+2)<<2. However, in include/linux/ipv6.h, a macro
>> named ipv6_authlen is already defined for exactly the same job. This
>> commit replaces the manual computation code with the macro.
>
> All patch submissions must have a proper signoff.
>
> Also, please use a proper subsystem prefix in your Subject
> line "[PATCH] ipv6: Use ipv6_authlen for AH hdrlen in ipv6_skip_exthdr()"
> would have been much better as "ipv6: " is the appropriate
> subsystem prefix to use here.
>
> Thanks.

^ permalink raw reply

* [PATCH v4 4/4] samples/bpf: Add documentation on cross compilation
From: Joel Fernandes @ 2017-09-20 16:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, alison, juri.lelli, fengc, daniel, davem, ast,
	kernel-team, Joel Fernandes
In-Reply-To: <20170920161159.25747-1-joelaf@google.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 samples/bpf/README.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index 79f9a58f1872..2b906127ef54 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -64,3 +64,13 @@ It is also possible to point make to the newly compiled 'llc' or
 'clang' command via redefining LLC or CLANG on the make command line::
 
  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+
+Cross compiling samples
+-----------------------
+Inorder to cross-compile, say for arm64 targets, export CROSS_COMPILE and ARCH
+environment variables before calling make. This will direct make to build
+samples for the cross target.
+
+export ARCH=arm64
+export CROSS_COMPILE="aarch64-linux-gnu-"
+make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* [PATCH v4 3/4] samples/bpf: Fix pt_regs issues when cross-compiling
From: Joel Fernandes @ 2017-09-20 16:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, alison, juri.lelli, fengc, daniel, davem, ast,
	kernel-team, Joel Fernandes
In-Reply-To: <20170920161159.25747-1-joelaf@google.com>

BPF samples fail to build when cross-compiling for ARM64 because of incorrect
pt_regs param selection. This is because clang defines __x86_64__ and
bpf_headers thinks we're building for x86. Since clang is building for the BPF
target, it shouldn't make assumptions about what target the BPF program is
going to run on. To fix this, lets pass ARCH so the header knows which target
the BPF program is being compiled for and can use the correct pt_regs code.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 samples/bpf/Makefile                      |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h | 56 +++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 13f74b67ca44..ebc2ad69b62c 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -230,7 +230,7 @@ $(obj)/%.o: $(src)/%.c
 	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
 		-I$(srctree)/tools/testing/selftests/bpf/ \
 		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-Wno-compare-distinct-pointer-types \
+		-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
 		-Wno-gnu-variable-sized-type-not-at-end \
 		-Wno-address-of-packed-member -Wno-tautological-compare \
 		-Wno-unknown-warning-option \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb9161b34a..4875395b0b52 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -109,7 +109,47 @@ static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
 static int (*bpf_skb_change_head)(void *, int len, int flags) =
 	(void *) BPF_FUNC_skb_change_head;
 
+/* Scan the ARCH passed in from ARCH env variable (see Makefile) */
+#if defined(__TARGET_ARCH_x86)
+	#define bpf_target_x86
+	#define bpf_target_defined
+#elif defined(__TARGET_ARCH_s930x)
+	#define bpf_target_s930x
+	#define bpf_target_defined
+#elif defined(__TARGET_ARCH_arm64)
+	#define bpf_target_arm64
+	#define bpf_target_defined
+#elif defined(__TARGET_ARCH_mips)
+	#define bpf_target_mips
+	#define bpf_target_defined
+#elif defined(__TARGET_ARCH_powerpc)
+	#define bpf_target_powerpc
+	#define bpf_target_defined
+#elif defined(__TARGET_ARCH_sparc)
+	#define bpf_target_sparc
+	#define bpf_target_defined
+#else
+	#undef bpf_target_defined
+#endif
+
+/* Fall back to what the compiler says */
+#ifndef bpf_target_defined
 #if defined(__x86_64__)
+	#define bpf_target_x86
+#elif defined(__s390x__)
+	#define bpf_target_s930x
+#elif defined(__aarch64__)
+	#define bpf_target_arm64
+#elif defined(__mips__)
+	#define bpf_target_mips
+#elif defined(__powerpc__)
+	#define bpf_target_powerpc
+#elif defined(__sparc__)
+	#define bpf_target_sparc
+#endif
+#endif
+
+#if defined(bpf_target_x86)
 
 #define PT_REGS_PARM1(x) ((x)->di)
 #define PT_REGS_PARM2(x) ((x)->si)
@@ -122,7 +162,7 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_SP(x) ((x)->sp)
 #define PT_REGS_IP(x) ((x)->ip)
 
-#elif defined(__s390x__)
+#elif defined(bpf_target_s390x)
 
 #define PT_REGS_PARM1(x) ((x)->gprs[2])
 #define PT_REGS_PARM2(x) ((x)->gprs[3])
@@ -135,7 +175,7 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_SP(x) ((x)->gprs[15])
 #define PT_REGS_IP(x) ((x)->psw.addr)
 
-#elif defined(__aarch64__)
+#elif defined(bpf_target_arm64)
 
 #define PT_REGS_PARM1(x) ((x)->regs[0])
 #define PT_REGS_PARM2(x) ((x)->regs[1])
@@ -148,7 +188,7 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_SP(x) ((x)->sp)
 #define PT_REGS_IP(x) ((x)->pc)
 
-#elif defined(__mips__)
+#elif defined(bpf_target_mips)
 
 #define PT_REGS_PARM1(x) ((x)->regs[4])
 #define PT_REGS_PARM2(x) ((x)->regs[5])
@@ -161,7 +201,7 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_SP(x) ((x)->regs[29])
 #define PT_REGS_IP(x) ((x)->cp0_epc)
 
-#elif defined(__powerpc__)
+#elif defined(bpf_target_powerpc)
 
 #define PT_REGS_PARM1(x) ((x)->gpr[3])
 #define PT_REGS_PARM2(x) ((x)->gpr[4])
@@ -172,7 +212,7 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_SP(x) ((x)->sp)
 #define PT_REGS_IP(x) ((x)->nip)
 
-#elif defined(__sparc__)
+#elif defined(bpf_target_sparc)
 
 #define PT_REGS_PARM1(x) ((x)->u_regs[UREG_I0])
 #define PT_REGS_PARM2(x) ((x)->u_regs[UREG_I1])
@@ -182,6 +222,8 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 #define PT_REGS_RET(x) ((x)->u_regs[UREG_I7])
 #define PT_REGS_RC(x) ((x)->u_regs[UREG_I0])
 #define PT_REGS_SP(x) ((x)->u_regs[UREG_FP])
+
+/* Should this also be a bpf_target check for the sparc case? */
 #if defined(__arch64__)
 #define PT_REGS_IP(x) ((x)->tpc)
 #else
@@ -190,10 +232,10 @@ static int (*bpf_skb_change_head)(void *, int len, int flags) =
 
 #endif
 
-#ifdef __powerpc__
+#ifdef bpf_target_powerpc
 #define BPF_KPROBE_READ_RET_IP(ip, ctx)		({ (ip) = (ctx)->link; })
 #define BPF_KRETPROBE_READ_RET_IP		BPF_KPROBE_READ_RET_IP
-#elif defined(__sparc__)
+#elif bpf_target_sparc
 #define BPF_KPROBE_READ_RET_IP(ip, ctx)		({ (ip) = PT_REGS_RET(ctx); })
 #define BPF_KRETPROBE_READ_RET_IP		BPF_KPROBE_READ_RET_IP
 #else
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* [PATCH v4 2/4] samples/bpf: Enable cross compiler support
From: Joel Fernandes @ 2017-09-20 16:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, alison, juri.lelli, fengc, daniel, davem, ast,
	kernel-team, Joel Fernandes
In-Reply-To: <20170920161159.25747-1-joelaf@google.com>

When cross compiling, bpf samples use HOSTCC for compiling the non-BPF part of
the sample, however what we really want is to use the cross compiler to build
for the cross target since that is what will load and run the BPF sample.
Detect this and compile samples correctly.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 samples/bpf/Makefile | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index cf17c7932a6e..13f74b67ca44 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -177,6 +177,11 @@ HOSTLOADLIBES_syscall_tp += -lelf
 LLC ?= llc
 CLANG ?= clang
 
+# Detect that we're cross compiling and use the cross compiler
+ifdef CROSS_COMPILE
+HOSTCC = $(CROSS_COMPILE)gcc
+endif
+
 # Trick to allow make to be run from this directory
 all:
 	$(MAKE) -C ../../ $(CURDIR)/
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* [PATCH v4 1/4] samples/bpf: Use getppid instead of getpgrp for array map stress
From: Joel Fernandes @ 2017-09-20 16:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, alison, juri.lelli, fengc, daniel, davem, ast,
	kernel-team, Joel Fernandes

When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
that, just use a different syscall (getppid) for the array map stress test.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 samples/bpf/map_perf_test_kern.c | 2 +-
 samples/bpf/map_perf_test_user.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 098c857f1eda..2b2ffb97018b 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -266,7 +266,7 @@ int stress_hash_map_lookup(struct pt_regs *ctx)
 	return 0;
 }
 
-SEC("kprobe/sys_getpgrp")
+SEC("kprobe/sys_getppid")
 int stress_array_map_lookup(struct pt_regs *ctx)
 {
 	u32 key = 1, i;
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index f388254896f6..a0310fc70057 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -282,7 +282,7 @@ static void test_array_lookup(int cpu)
 
 	start_time = time_get_ns();
 	for (i = 0; i < max_cnt; i++)
-		syscall(__NR_getpgrp, 0);
+		syscall(__NR_getppid, 0);
 	printf("%d:array_lookup %lld lookups per sec\n",
 	       cpu, max_cnt * 1000000000ll * 64 / (time_get_ns() - start_time));
 }
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* re: mac80211: avoid allocating TXQs that won't be used
From: Colin Ian King @ 2017-09-20 16:08 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David S. Miller, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org

Johannes,

Static analysis with CoverityScan on linux-next today detected a null
pointer dereference issue on commit:

>From 0fc4b3403d215ecd3c05505ec1f0028a227ed319 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Thu, 22 Jun 2017 12:20:29 +0200
Subject: [PATCH] mac80211: avoid allocating TXQs that won't be used

Issue: sdata is null when the sdata is dereferenced by:

                   sdata->vif.type != NL80211_IFTYPE_AP_VLAN &&
                   sdata->vif.type != NL80211_IFTYPE_MONITOR)

note that sdata is assigned a non-null much later with the statement
sdata = netdev_priv(ndev).

Detected by CoverityScan CID#1456974 ("Explicit null dereferenced")

Colin

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Andreas Schultz @ 2017-09-20 16:07 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Harald Welte, Rohit Seth
In-Reply-To: <CALx6S3771X4QUm2mJG=aJJ0f4Tm98NpGA7060vF1P8iuCPgicQ@mail.gmail.com>



On 20/09/17 17:57, Tom Herbert wrote:
> On Wed, Sep 20, 2017 at 8:27 AM, Andreas Schultz <aschultz@tpip.net> wrote:
>> On 19/09/17 02:38, Tom Herbert wrote:
>>>
>>> Add new configuration of GTP interfaces that allow specifying a port to
>>> listen on (as opposed to having to get sockets from a userspace control
>>> plane). This allows GTP interfaces to be configured and the data path
>>> tested without requiring a GTP-C daemon.
>>
>>
>> This would imply that you can have multiple independent GTP sockets on the
>> same IP address.That is not permitted by the GTP specifications. 3GPP TS
>> 29.281, section 4.3 states clearly that there is "only" one GTP entity per
>> IP address.A PDP context is defined by the destination IP and the TEID. The
>> destination port is not part of the identity of a PDP context.
>>
> We are in no way trying change GTP, if someone runs this in a real GTP
> network then they need to abide by the specification. However, there
> is nothing inconsistent and it breaks nothing if someone wishes to use
> different port numbers in their own private network for testing or
> development purposes. Every other UDP application that has assigned
> port number allows configurable ports, I don't see that GTP is so
> special that it should be an exception.

GTP isn't special, I just don't like to have testing only features in 
there when the same goal can be reached without having to add extra 
stuff. Adding code that is not going to be useful in real production 
setups (or in this case would even break production setups when enabled 
accidentally) makes the implementation more complex than it needs to be.

You can always add multiple IP's to your test system and have the same 
effect without having to change the ports.

Regards
Andreas

> 
> Tom
> 

^ permalink raw reply

* [PATCH v4 0/4] Add cross-compilation support to eBPF samples
From: Joel Fernandes @ 2017-09-20 16:04 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: alison, juri.lelli, fengc, daniel, davem, ast, kernel-team,
	Joel Fernandes

These patches fix issues seen when cross-compiling eBPF samples on arm64.
Compared to [1], I dropped the controversial inline-asm patch and exploring
other options to fix it. However these patches are a step in the right
direction and I look forward to getting them into -next and the merge window.

Changes since v3:
- just a repost with acks

[1] https://lkml.org/lkml/2017/8/7/417

Joel Fernandes (4):
  samples/bpf: Use getppid instead of getpgrp for array map stress
  samples/bpf: Enable cross compiler support
  samples/bpf: Fix pt_regs issues when cross-compiling
  samples/bpf: Add documentation on cross compilation

 samples/bpf/Makefile                      |  7 +++-
 samples/bpf/README.rst                    | 10 ++++++
 samples/bpf/map_perf_test_kern.c          |  2 +-
 samples/bpf/map_perf_test_user.c          |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h | 56 +++++++++++++++++++++++++++----
 5 files changed, 67 insertions(+), 10 deletions(-)

-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Tom Herbert @ 2017-09-20 15:57 UTC (permalink / raw)
  To: Andreas Schultz
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Harald Welte, Rohit Seth
In-Reply-To: <a6d1dad0-e922-2913-d4c8-592d403ee1cb@tpip.net>

On Wed, Sep 20, 2017 at 8:27 AM, Andreas Schultz <aschultz@tpip.net> wrote:
> On 19/09/17 02:38, Tom Herbert wrote:
>>
>> Add new configuration of GTP interfaces that allow specifying a port to
>> listen on (as opposed to having to get sockets from a userspace control
>> plane). This allows GTP interfaces to be configured and the data path
>> tested without requiring a GTP-C daemon.
>
>
> This would imply that you can have multiple independent GTP sockets on the
> same IP address.That is not permitted by the GTP specifications. 3GPP TS
> 29.281, section 4.3 states clearly that there is "only" one GTP entity per
> IP address.A PDP context is defined by the destination IP and the TEID. The
> destination port is not part of the identity of a PDP context.
>
We are in no way trying change GTP, if someone runs this in a real GTP
network then they need to abide by the specification. However, there
is nothing inconsistent and it breaks nothing if someone wishes to use
different port numbers in their own private network for testing or
development purposes. Every other UDP application that has assigned
port number allows configurable ports, I don't see that GTP is so
special that it should be an exception.

Tom

^ permalink raw reply

* [PATCH v5 net 3/3] lan78xx: Use default values loaded from EEPROM/OTP after reset
From: Nisar Sayed @ 2017-09-20 21:06 UTC (permalink / raw)
  To: davem; +Cc: UNGLinuxDriver, netdev
In-Reply-To: <20170920210638.11150-1-Nisar.Sayed@microchip.com>

Use default value of auto duplex and auto speed values loaded
from EEPROM/OTP after reset. The LAN78xx allows platform
configurations to be loaded from EEPROM/OTP.
Ex: When external phy is connected, the MAC can be configured to
have correct auto speed, auto duplex, auto polarity configured
from the EEPROM/OTP.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
---
 drivers/net/usb/lan78xx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f8c63eec8353..0161f77641fa 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2449,7 +2449,6 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 	/* LAN7801 only has RGMII mode */
 	if (dev->chipid == ID_REV_CHIP_ID_7801_)
 		buf &= ~MAC_CR_GMII_EN_;
-	buf |= MAC_CR_AUTO_DUPLEX_ | MAC_CR_AUTO_SPEED_;
 	ret = lan78xx_write_reg(dev, MAC_CR, buf);
 
 	ret = lan78xx_read_reg(dev, MAC_TX, &buf);
-- 
2.14.1

^ permalink raw reply related

* [PATCH v5 net 2/3] lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
From: Nisar Sayed @ 2017-09-20 21:06 UTC (permalink / raw)
  To: davem; +Cc: UNGLinuxDriver, netdev
In-Reply-To: <20170920210638.11150-1-Nisar.Sayed@microchip.com>

Allow EEPROM write for less than MAX_EEPROM_SIZE

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
---
 drivers/net/usb/lan78xx.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index fcf85ae37435..f8c63eec8353 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1290,11 +1290,10 @@ static int lan78xx_ethtool_set_eeprom(struct net_device *netdev,
 	if (ret)
 		return ret;
 
-	/* Allow entire eeprom update only */
-	if ((ee->magic == LAN78XX_EEPROM_MAGIC) &&
-	    (ee->offset == 0) &&
-	    (ee->len == 512) &&
-	    (data[0] == EEPROM_INDICATOR))
+	/* Invalid EEPROM_INDICATOR at offset zero will result in a failure
+	 * to load data from EEPROM
+	 */
+	if (ee->magic == LAN78XX_EEPROM_MAGIC)
 		ret = lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
 	else if ((ee->magic == LAN78XX_OTP_MAGIC) &&
 		 (ee->offset == 0) &&
-- 
2.14.1

^ permalink raw reply related

* [PATCH v5 net 1/3] lan78xx: Fix for eeprom read/write when device auto suspend
From: Nisar Sayed @ 2017-09-20 21:06 UTC (permalink / raw)
  To: davem; +Cc: UNGLinuxDriver, netdev
In-Reply-To: <20170920210638.11150-1-Nisar.Sayed@microchip.com>

Fix for eeprom read/write when device auto suspend

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
---
 drivers/net/usb/lan78xx.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index b99a7fb09f8e..fcf85ae37435 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1265,30 +1265,46 @@ static int lan78xx_ethtool_get_eeprom(struct net_device *netdev,
 				      struct ethtool_eeprom *ee, u8 *data)
 {
 	struct lan78xx_net *dev = netdev_priv(netdev);
+	int ret;
+
+	ret = usb_autopm_get_interface(dev->intf);
+	if (ret)
+		return ret;
 
 	ee->magic = LAN78XX_EEPROM_MAGIC;
 
-	return lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data);
+	ret = lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data);
+
+	usb_autopm_put_interface(dev->intf);
+
+	return ret;
 }
 
 static int lan78xx_ethtool_set_eeprom(struct net_device *netdev,
 				      struct ethtool_eeprom *ee, u8 *data)
 {
 	struct lan78xx_net *dev = netdev_priv(netdev);
+	int ret;
+
+	ret = usb_autopm_get_interface(dev->intf);
+	if (ret)
+		return ret;
 
 	/* Allow entire eeprom update only */
 	if ((ee->magic == LAN78XX_EEPROM_MAGIC) &&
 	    (ee->offset == 0) &&
 	    (ee->len == 512) &&
 	    (data[0] == EEPROM_INDICATOR))
-		return lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
+		ret = lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
 	else if ((ee->magic == LAN78XX_OTP_MAGIC) &&
 		 (ee->offset == 0) &&
 		 (ee->len == 512) &&
 		 (data[0] == OTP_INDICATOR_1))
-		return lan78xx_write_raw_otp(dev, ee->offset, ee->len, data);
+		ret = lan78xx_write_raw_otp(dev, ee->offset, ee->len, data);
 
-	return -EINVAL;
+	usb_autopm_put_interface(dev->intf);
+
+	return ret;
 }
 
 static void lan78xx_get_strings(struct net_device *netdev, u32 stringset,
-- 
2.14.1

^ permalink raw reply related

* [PATCH v5 net 0/3] lan78xx: This series of patches are for lan78xx driver.
From: Nisar Sayed @ 2017-09-20 21:06 UTC (permalink / raw)
  To: davem; +Cc: UNGLinuxDriver, netdev

This series of patches are for lan78xx driver.

These patches fixes potential issues associated with lan78xx driver.

v5
- Updated changes as per comments

v4
- Updated changes to handle return values as per comments
- Updated EEPROM write handling as per comments

v3
- Updated chagnes as per comments

v2
- Added patch version information
- Added fixes tag
- Updated patch description
- Updated chagnes as per comments

v1
- Splitted patches as per comments
- Dropped "fixed_phy device support" and "Fix for system suspend" changes

Nisar Sayed (3):
  lan78xx: Fix for eeprom read/write when device auto suspend
  lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
  lan78xx: Use default values loaded from EEPROM/OTP after reset

 drivers/net/usb/lan78xx.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

-- 
2.14.1

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox