Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v4 0/9] bring back stack frame warning with KASAN
From: Arnd Bergmann @ 2017-09-22 21:29 UTC (permalink / raw)
  Cc: Arnd Bergmann, Mauro Carvalho Chehab, Jiri Pirko,
	Arend van Spriel, Kalle Valo, David S. Miller, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Masahiro Yamada, Michal Marek,
	Andrew Morton, Kees Cook, Geert Uytterhoeven, Greg Kroah-Hartman,
	linux-media, linux-kernel, netdev, linux-wireless,
	brcm80211-dev-list.pdl

This is a new version of patches I originally submitted back in March
[1], and last time in June [2]. This time I have basically rewritten
the entire patch series based on a new approach that came out of GCC
PR81715 that I opened[3]. The upcoming gcc-8 release is now much better
at consolidating stack slots for inline function arguments and would
obsolete most of my workaround patches here, but we still need the
workarounds for gcc-5, gcc-6 and gcc-7. Many thanks to Jakub Jelinek
for the analysis and the gcc-8 patch!

This minimal set of patches only makes sure that we do get frame size
warnings in allmodconfig for x86_64 and arm64 again with a 2048 byte
limit, even with KASAN enabled, but without the new KASAN_EXTRA option.

I set the warning limit with KASAN_EXTRA to 3072, limiting the
allmodconfig+KASAN_EXTRA build output to around 50 legitimate warnings.
These are for stack frames up to 31KB that will cause an immediate stack
overflow, and fixing them would require bringing back my older patches
and more. We can debate whether we want to apply those as a follow-up,
or instead remove the option entirely.

Another follow-up series I have reduces the warning limit with
KASAN to 1536, and without KASAN to 1280 for 64-bit architectures.

I hope we can get all patches merged for v4.14 and most of them
backported into stable kernels. Since we no longer have a dependency
on a preparation patch, my preference would be for the respective
subsystem maintainers to pick up the individual patches.
The last patch introduces a couple of "allmodconfig" build warnings
on x86 and arm64 unless the other patches get merged first, I'll
send that again separately once everything else has been taken
care of.

The remaining contents are:
- -fsanitize-address-use-after-scope is moved to a separate
  CONFIG_KASAN_EXTRA option that increases the warning limit
- CONFIG_KASAN_EXTRA is disabled with CONFIG_COMPILE_TEST,
  improving compile speed and disabling code that leads to
  valid warnings on gcc-7.0.1
- KMEMCHECK conflicts with CONFIG_KASAN
- my inline function workaround is applied to netlink, one
  ethernet driver and a few media drivers.
- The rework for the brcmsmac driver from previous versions is
  still there.

Changes since v3:
- I dropped all "noinline_if_stackbloat" annotations and used
  a workaround that introduces additional local variables in the inline
  functions to copy the function arguments, resulting in much better
  object code at the expense of having rather odd-looking functions.
- The v4 patches now don't help with KASAN_EXTRA any more at all,
  CONFIG_KASAN_EXTRA now depends on CONFIG_DEBUG_KERNEL, as it
  is more dangerous in production systems than it was before
- Rewrote the "em28xx" patch to be small enough for a stable backport.
- The rewritten vt-keyboard patches got merged and are now in
  stable kernels as well.

Changes since v2:
- rewrote the vt-keyboard patch based on feedback
- and made KMEMCHECK mutually exclusive with KASAN
  (rather than KASAN_EXTRA)

Changes since v1:
- dropped patches to fix all the CONFIG_KASAN_EXTRA warnings:
 - READ_ONCE/WRITE_ONCE cause problems in lots of code
 - typecheck() causes huge problems in a few places
 - many more uses of noinline_if_stackbloat

     Arnd

[1] https://www.spinics.net/lists/linux-wireless/msg159819.html
[2] https://www.spinics.net/lists/netdev/msg441918.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715

Arnd Bergmann (9):
  brcmsmac: make some local variables 'static const' to reduce stack
    size
  brcmsmac: split up wlc_phy_workarounds_nphy
  brcmsmac: reindent split functions
  em28xx: fix em28xx_dvb_init for KASAN
  r820t: fix r820t_write_reg for KASAN
  dvb-frontends: fix i2c access helpers for KASAN
  rocker: fix rocker_tlv_put_* functions for KASAN
  netlink: fix nla_put_{u8,u16,u32} for KASAN
  kasan: rework Kconfig settings

 drivers/media/dvb-frontends/ascot2e.c              |    4 +-
 drivers/media/dvb-frontends/cxd2841er.c            |    4 +-
 drivers/media/dvb-frontends/helene.c               |    4 +-
 drivers/media/dvb-frontends/horus3a.c              |    4 +-
 drivers/media/dvb-frontends/itd1000.c              |    5 +-
 drivers/media/dvb-frontends/mt312.c                |    4 +-
 drivers/media/dvb-frontends/stb0899_drv.c          |    3 +-
 drivers/media/dvb-frontends/stb6100.c              |    6 +-
 drivers/media/dvb-frontends/stv0367.c              |    4 +-
 drivers/media/dvb-frontends/stv090x.c              |    4 +-
 drivers/media/dvb-frontends/stv6110x.c             |    4 +-
 drivers/media/dvb-frontends/zl10039.c              |    4 +-
 drivers/media/tuners/r820t.c                       |   13 +-
 drivers/media/usb/em28xx/em28xx-dvb.c              |   30 +-
 drivers/net/ethernet/rocker/rocker_tlv.h           |   48 +-
 .../broadcom/brcm80211/brcmsmac/phy/phy_n.c        | 1856 ++++++++++----------
 include/net/netlink.h                              |   73 +-
 lib/Kconfig.debug                                  |    4 +-
 lib/Kconfig.kasan                                  |   13 +-
 lib/Kconfig.kmemcheck                              |    1 +
 scripts/Makefile.kasan                             |    3 +
 21 files changed, 1047 insertions(+), 1044 deletions(-)

-- 
2.9.0

Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <mmarek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: brcm80211-dev-list@cypress.com
Cc: kasan-dev@googlegroups.com
Cc: linux-kbuild@vger.kernel.org
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Cc: Martin Liška <marxin@gcc.gnu.org>

^ permalink raw reply

* [RFC PATCH 09/11] route: add ipv4/6 helpers to do partial route lookup vs local dst
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

For ipv4 also implement the proper source address validation, even
against martian addresses and return an error code accordingly.

Will be used by later patches to perform dst lookup in early
demux for unconnected sockets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ip6_route.h |  1 +
 include/net/route.h     |  2 ++
 net/ipv4/route.c        | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/route.c        | 13 +++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index ee96f402cb75..edb24456a609 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -65,6 +65,7 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr)
 		(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
 }
 
+void ip6_route_try_local_rcu_bh(struct net *net, struct sk_buff *skb);
 void ip6_route_input(struct sk_buff *skb);
 struct dst_entry *ip6_route_input_lookup(struct net *net,
 					 struct net_device *dev,
diff --git a/include/net/route.h b/include/net/route.h
index ec09c3d73581..21927231cc14 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -178,6 +178,8 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 
 struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
 				    u32 itag, unsigned char type, bool docache);
+int ip_route_try_local_rcu(struct net *net, struct sk_buff *skb,
+			   const struct iphdr *iph);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
 int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 515589f1b3d1..84248dd41da6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2079,6 +2079,49 @@ out:	return err;
 	goto out;
 }
 
+/* try to resolve and set the route for the ingress packet in the local
+ * destination, looking-up the destination address against the local ones
+ * and performing source validation
+ * return an error only if the local look up is successful and validation fails
+ * Called under RCU
+ */
+int ip_route_try_local_rcu(struct net *net, struct sk_buff *skb,
+			   const struct iphdr *iph)
+{
+	__be32 saddr = iph->saddr;
+	struct in_device *in_dev;
+	struct dst_entry *dst;
+	int err = -EINVAL;
+	u32 itag;
+
+	dst = inet_get_ifaddr_dst_rcu(net, iph->daddr);
+	if (!dst)
+		return 0;
+
+	in_dev = __in_dev_get_rcu(skb->dev);
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	/* check for zeronet only after successful lookup, so that we don't trip
+	 * over limited broadcast destination, see ip_route_input_slow()
+	 */
+	if (ipv4_is_zeronet(saddr) || (ipv4_is_loopback(saddr) &&
+				       !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)))
+		goto martian_source;
+
+	err = fib_validate_source(skb, saddr, iph->daddr, iph->tos, 0, skb->dev,
+				  in_dev, &itag);
+	if (err < 0)
+		goto martian_source;
+
+	skb_dst_set_noref(skb, dst);
+	return 0;
+
+martian_source:
+	ip_handle_martian_source(skb->dev, in_dev, skb, iph->daddr, iph->saddr);
+	return err;
+}
+
 int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..d957e30b1cbe 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1283,6 +1283,19 @@ void ip6_route_input(struct sk_buff *skb)
 	skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
 
+/* try to resolve and set the route for the ingress packet in the local
+ * destination
+ * Called under RCU
+ */
+void ip6_route_try_local_rcu_bh(struct net *net, struct sk_buff *skb)
+{
+	struct dst_entry *dst;
+
+	dst = inet6_get_ifaddr_dst_rcu_bh(net, &ipv6_hdr(skb)->daddr);
+	if (dst)
+		skb_dst_set_noref(skb, dst);
+}
+
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi6 *fl6, int flags)
 {
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 08/11] net: implement local route cache inside ifaddr
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

add storage and helpers to associate an ipv{4,6} address
with the local route to self. This will be used by a
later patch to implement early demux for unconnected UDP
sockets.

The caches are filled on address creation, with DST_OBSOLETE_NONE.
Ipv6 cache are explicitly clearered and refreshed on underlaying
device down/up events.

The above schema is simpler than refreshing the cache every
time the dst expires under the default obsolete schema.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/inetdevice.h |  4 ++++
 include/net/addrconf.h     |  3 +++
 include/net/if_inet6.h     |  4 ++++
 net/ipv4/devinet.c         | 29 ++++++++++++++++++++++++++++-
 net/ipv6/addrconf.c        | 44 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 751d051f0bc7..c29982f178bb 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -130,6 +130,8 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
 
+struct dst_entry;
+
 struct in_ifaddr {
 	struct hlist_node	hash;
 	struct in_ifaddr	*ifa_next;
@@ -149,6 +151,7 @@ struct in_ifaddr {
 	__u32			ifa_preferred_lft;
 	unsigned long		ifa_cstamp; /* created timestamp */
 	unsigned long		ifa_tstamp; /* updated timestamp */
+	struct dst_entry	*dst; /* local route to self */
 };
 
 struct in_validator_info {
@@ -180,6 +183,7 @@ __be32 inet_confirm_addr(struct net *net, struct in_device *in_dev, __be32 dst,
 struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix,
 				    __be32 mask);
 struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr);
+struct dst_entry *inet_get_ifaddr_dst_rcu(struct net *net, __be32 addr);
 static __inline__ bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
 {
 	return !((addr^ifa->ifa_address)&ifa->ifa_mask);
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 87981cd63180..bdfa3306a4c5 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -87,6 +87,9 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net,
 				     const struct in6_addr *addr,
 				     struct net_device *dev, int strict);
 
+struct dst_entry *inet6_get_ifaddr_dst_rcu_bh(struct net *net,
+					      const struct in6_addr *addr);
+
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dev,
 		       const struct in6_addr *daddr, unsigned int srcprefs,
 		       struct in6_addr *saddr);
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index d4088d1a688d..1dd42e7c17a4 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -39,6 +39,8 @@ enum {
 	INET6_IFADDR_STATE_DEAD,
 };
 
+struct dst_entry;
+
 struct inet6_ifaddr {
 	struct in6_addr		addr;
 	__u32			prefix_len;
@@ -77,6 +79,8 @@ struct inet6_ifaddr {
 
 	struct rcu_head		rcu;
 	struct in6_addr		peer_addr;
+
+	struct dst_entry	*dst; /* local route to self */
 };
 
 struct ip6_sf_socklist {
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7ce22a2c07ce..a7748f787866 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -179,6 +179,17 @@ struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr)
 	return NULL;
 }
 
+/* called under RCU lock */
+struct dst_entry *inet_get_ifaddr_dst_rcu(struct net *net, __be32 addr)
+{
+	struct in_ifaddr *ifa = inet_lookup_ifaddr_rcu(net, addr);
+
+	if (!ifa)
+		return NULL;
+
+	return dst_access(&ifa->dst, 0);
+}
+
 static void rtmsg_ifa(int event, struct in_ifaddr *, struct nlmsghdr *, u32);
 
 static BLOCKING_NOTIFIER_HEAD(inetaddr_chain);
@@ -337,6 +348,7 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 	struct in_ifaddr *last_prim = in_dev->ifa_list;
 	struct in_ifaddr *prev_prom = NULL;
 	int do_promote = IN_DEV_PROMOTE_SECONDARIES(in_dev);
+	struct dst_entry *dst;
 
 	ASSERT_RTNL();
 
@@ -395,7 +407,12 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 	*ifap = ifa1->ifa_next;
 	inet_hash_remove(ifa1);
 
-	/* 3. Announce address deletion */
+	/* 3. Clear dst cache */
+
+	dst = xchg(&ifa1->dst, NULL);
+	dst_release(dst);
+
+	/* 4. Announce address deletion */
 
 	/* Send message first, then call notifier.
 	   At first sight, FIB update triggered by notifier
@@ -449,6 +466,7 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 	struct in_device *in_dev = ifa->ifa_dev;
 	struct in_ifaddr *ifa1, **ifap, **last_primary;
 	struct in_validator_info ivi;
+	struct rtable *rt;
 	int ret;
 
 	ASSERT_RTNL();
@@ -516,6 +534,15 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 	rtmsg_ifa(RTM_NEWADDR, ifa, nlh, portid);
 	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
 
+	/* fill the dst cache and transfer the dst ownership to it */
+	rt = ip_local_route_alloc(in_dev->dev, 0, 0, RTN_LOCAL, false);
+	if (rt) {
+		/* the local route will be valid for till the address will be
+		 * up
+		 */
+		rt->dst.obsolete = DST_OBSOLETE_NONE;
+		__dst_update(&ifa->dst, &rt->dst);
+	}
 	return 0;
 }
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5940062cac8d..5fa8d1b764ca 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -904,6 +904,26 @@ static int addrconf_fixup_linkdown(struct ctl_table *table, int *p, int newf)
 
 #endif
 
+static void inet6_addr_dst_clear(struct inet6_ifaddr *ifp)
+{
+	dst_release(xchg(&ifp->dst, NULL));
+}
+
+static void inet6_addr_dst_update(struct inet6_dev *idev,
+				  struct inet6_ifaddr *ifp)
+{
+	struct rt6_info *rt = addrconf_dst_alloc(idev, &ifp->addr, false);
+
+	if (IS_ERR(rt))
+		return;
+
+	/* we are going to manully clear the cache when the related dev will
+	 * go down
+	 */
+	rt->dst.obsolete = DST_OBSOLETE_NONE;
+	__dst_update(&ifp->dst, &rt->dst);
+}
+
 /* Nobody refers to this ifaddr, destroy it */
 void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 {
@@ -914,6 +934,7 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 #endif
 
 	in6_dev_put(ifp->idev);
+	inet6_addr_dst_clear(ifp);
 
 	if (cancel_delayed_work(&ifp->dad_work))
 		pr_notice("delayed DAD work was pending while freeing ifa=%p\n",
@@ -1907,6 +1928,18 @@ int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev)
 }
 EXPORT_SYMBOL(ipv6_chk_prefix);
 
+/* called under RCU lock */
+struct dst_entry *inet6_get_ifaddr_dst_rcu_bh(struct net *net,
+					      const struct in6_addr *addr)
+{
+	struct inet6_ifaddr *ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr);
+
+	if (!ifp)
+		return NULL;
+
+	return dst_access(&ifp->dst, 0);
+}
+
 struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *addr,
 				     struct net_device *dev, int strict)
 {
@@ -3300,6 +3333,15 @@ static int fixup_permanent_addr(struct inet6_dev *idev,
 		ifp->rt = rt;
 		spin_unlock(&ifp->lock);
 
+		/* if dad is not going to start, we must cache the new route
+		 * elsewhere we can just clear the old one
+		 */
+		if (ifp->state == INET6_IFADDR_STATE_PREDAD ||
+		    ifp->flags & IFA_F_TENTATIVE)
+			inet6_addr_dst_clear(ifp);
+		else
+			inet6_addr_dst_update(idev, ifp);
+
 		ip6_rt_put(prev);
 	}
 
@@ -3679,6 +3721,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 		spin_unlock_bh(&ifa->lock);
 
+		inet6_addr_dst_clear(ifa);
 		if (rt)
 			ip6_del_rt(rt);
 
@@ -4009,6 +4052,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id)
 	 */
 
 	ipv6_ifa_notify(RTM_NEWADDR, ifp);
+	inet6_addr_dst_update(ifp->idev, ifp);
 
 	/* If added prefix is link local and we are prepared to process
 	   router advertisements, start sending router solicitations.
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 07/11] ipv6/addrconf: add an helper for inet6 address lookup
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

reduce code duplication and will simplify follow-up patch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/addrconf.c | 65 +++++++++++++++++++++++++----------------------------
 1 file changed, 31 insertions(+), 34 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c2e2a78787ec..5940062cac8d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1796,35 +1796,46 @@ int ipv6_chk_addr(struct net *net, const struct in6_addr *addr,
 }
 EXPORT_SYMBOL(ipv6_chk_addr);
 
+/* called under RCU lock with bh disabled */
+static struct inet6_ifaddr *ipv6_lookup_ifaddr_rcu_bh(struct net *net,
+						    const struct in6_addr *addr)
+{
+	unsigned int hash = inet6_addr_hash(addr);
+	struct inet6_ifaddr *ifp;
+
+	hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[hash], addr_lst)
+		if (net_eq(dev_net(ifp->idev->dev), net) &&
+		    ipv6_addr_equal(&ifp->addr, addr))
+			return ifp;
+
+	return NULL;
+}
+
 int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr,
 			    const struct net_device *dev, int strict,
 			    u32 banned_flags)
 {
 	struct inet6_ifaddr *ifp;
-	unsigned int hash = inet6_addr_hash(addr);
 	u32 ifp_flags;
+	int ret = 0;
 
 	rcu_read_lock_bh();
-	hlist_for_each_entry_rcu(ifp, &inet6_addr_lst[hash], addr_lst) {
-		if (!net_eq(dev_net(ifp->idev->dev), net))
-			continue;
+	ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr);
+	if (ifp) {
 		/* Decouple optimistic from tentative for evaluation here.
 		 * Ban optimistic addresses explicitly, when required.
 		 */
 		ifp_flags = (ifp->flags&IFA_F_OPTIMISTIC)
 			    ? (ifp->flags&~IFA_F_TENTATIVE)
 			    : ifp->flags;
-		if (ipv6_addr_equal(&ifp->addr, addr) &&
-		    !(ifp_flags&banned_flags) &&
+		if (!(ifp_flags&banned_flags) &&
 		    (!dev || ifp->idev->dev == dev ||
-		     !(ifp->scope&(IFA_LINK|IFA_HOST) || strict))) {
-			rcu_read_unlock_bh();
-			return 1;
-		}
+		     !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)))
+			ret = 1;
 	}
 
 	rcu_read_unlock_bh();
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(ipv6_chk_addr_and_flags);
 
@@ -1900,20 +1911,13 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
 				     struct net_device *dev, int strict)
 {
 	struct inet6_ifaddr *ifp, *result = NULL;
-	unsigned int hash = inet6_addr_hash(addr);
 
 	rcu_read_lock_bh();
-	hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[hash], addr_lst) {
-		if (!net_eq(dev_net(ifp->idev->dev), net))
-			continue;
-		if (ipv6_addr_equal(&ifp->addr, addr)) {
-			if (!dev || ifp->idev->dev == dev ||
-			    !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) {
-				result = ifp;
-				in6_ifa_hold(ifp);
-				break;
-			}
-		}
+	ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr);
+	if (ifp && (!dev || ifp->idev->dev == dev ||
+		    !(ifp->scope & (IFA_LINK|IFA_HOST) || strict))) {
+		result = ifp;
+		in6_ifa_hold(ifp);
 	}
 	rcu_read_unlock_bh();
 
@@ -4226,20 +4230,13 @@ void if6_proc_exit(void)
 /* Check if address is a home address configured on any interface. */
 int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr)
 {
-	int ret = 0;
 	struct inet6_ifaddr *ifp = NULL;
-	unsigned int hash = inet6_addr_hash(addr);
+	int ret = 0;
 
 	rcu_read_lock_bh();
-	hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[hash], addr_lst) {
-		if (!net_eq(dev_net(ifp->idev->dev), net))
-			continue;
-		if (ipv6_addr_equal(&ifp->addr, addr) &&
-		    (ifp->flags & IFA_F_HOMEADDRESS)) {
-			ret = 1;
-			break;
-		}
-	}
+	ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr);
+	if (ifp && ifp->flags & IFA_F_HOMEADDRESS)
+		ret = 1;
 	rcu_read_unlock_bh();
 	return ret;
 }
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 06/11] ip/route: factor out helper for local route creation
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

Will be used by a later patch to build the ifaddr dst cache.
No functional changes are introduced here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/route.h |  2 ++
 net/ipv4/route.c    | 30 ++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 1b09a9368c68..ec09c3d73581 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -176,6 +176,8 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 	return ip_route_output_key(net, fl4);
 }
 
+struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
+				    u32 itag, unsigned char type, bool docache);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
 int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..515589f1b3d1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1859,6 +1859,27 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
+				    u32 itag, unsigned char type, bool do_cache)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	struct rtable *rth;
+
+	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
+			   flags | RTCF_LOCAL, type,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
+	if (!rth)
+		return NULL;
+
+	rth->dst.output= ip_rt_bug;
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	rth->dst.tclassid = itag;
+#endif
+	rth->rt_is_input = 1;
+	return rth;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
@@ -1996,17 +2017,10 @@ out:	return err;
 		}
 	}
 
-	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
-			   flags | RTCF_LOCAL, res->type,
-			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
+	rth = ip_local_route_alloc(dev, flags, itag, res->type, do_cache);
 	if (!rth)
 		goto e_nobufs;
 
-	rth->dst.output= ip_rt_bug;
-#ifdef CONFIG_IP_ROUTE_CLASSID
-	rth->dst.tclassid = itag;
-#endif
-	rth->rt_is_input = 1;
 	if (res->table)
 		rth->rt_table_id = res->table->tb_id;
 
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 05/11] udp: perform full socket lookup in early demux
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

Since UDP early demux lookup fetches noref socket references,
we can safely be optimistic about it and set the sk reference
even if the skb is not going to land on such socket, avoiding
the rx dst cache usage for unconnected unicast sockets.

This avoids a second lookup for unconnected sockets, and clean
up a bit the whole udp early demux code.

After this change, on hosts not acting as routers, the UDP
early demux never affect negatively the receive performances,
while before this change UDP early demux caused measurable
performance impact for unconnected sockets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h |  2 ++
 net/ipv4/udp.c      | 62 +++++++++++++++++++----------------------------------
 net/ipv6/udp.c      | 57 ++++++++++++------------------------------------
 3 files changed, 38 insertions(+), 83 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index eaea63bc79bb..9c68b57543cc 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -92,6 +92,8 @@ static inline struct udp_sock *udp_sk(const struct sock *sk)
 	return (struct udp_sock *)sk;
 }
 
+void udp_set_skb_rx_dst(struct sock *sk, struct sk_buff *skb, u32 cookie);
+
 static inline void udp_set_no_check6_tx(struct sock *sk, bool val)
 {
 	udp_sk(sk)->no_check6_tx = val;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ba49d5aa9f09..5cbbd78024dc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2043,6 +2043,11 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 							 inet_compute_pseudo);
 }
 
+static bool udp_use_rx_dst_cache(struct sock *sk, struct sk_buff *skb)
+{
+	return sk->sk_state == TCP_ESTABLISHED || skb->pkt_type != PACKET_HOST;
+}
+
 /*
  *	All we need to do is get the socket, and then do a checksum.
  */
@@ -2088,8 +2093,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
-			udp_sk_rx_dst_set(sk, dst);
+		if (udp_use_rx_dst_cache(sk, skb))
+			dst_update(&sk->sk_rx_dst, dst);
 
 		ret = udp_queue_rcv_skb(sk, skb);
 		if (!noref_sk)
@@ -2196,42 +2201,28 @@ static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net,
 	return result;
 }
 
-/* For unicast we should only early demux connected sockets or we can
- * break forwarding setups.  The chains here can be long so only check
- * if the first socket is an exact match and if not move on.
- */
-static struct sock *__udp4_lib_demux_lookup(struct net *net,
-					    __be16 loc_port, __be32 loc_addr,
-					    __be16 rmt_port, __be32 rmt_addr,
-					    int dif, int sdif)
+void udp_set_skb_rx_dst(struct sock *sk, struct sk_buff *skb, u32 cookie)
 {
-	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum);
-	unsigned int slot2 = hash2 & udp_table.mask;
-	struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
-	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr);
-	const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
-	struct sock *sk;
+	struct dst_entry *dst = dst_access(&sk->sk_rx_dst, cookie);
 
-	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		if (INET_MATCH(sk, net, acookie, rmt_addr,
-			       loc_addr, ports, dif, sdif))
-			return sk;
-		/* Only check first socket in chain */
-		break;
+	if (dst) {
+		/* set noref for now.
+		 * any place which wants to hold dst has to call
+		 * dst_hold_safe()
+		 */
+		skb_dst_set_noref(skb, dst);
 	}
-	return NULL;
 }
+EXPORT_SYMBOL_GPL(udp_set_skb_rx_dst);
 
 void udp_v4_early_demux(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
+	int dif = skb->dev->ifindex;
+	int sdif = inet_sdif(skb);
 	const struct iphdr *iph;
 	const struct udphdr *uh;
 	struct sock *sk = NULL;
-	struct dst_entry *dst;
-	int dif = skb->dev->ifindex;
-	int sdif = inet_sdif(skb);
 	int ours;
 
 	/* validate the packet */
@@ -2260,25 +2251,16 @@ void udp_v4_early_demux(struct sk_buff *skb)
 						   uh->source, iph->saddr,
 						   dif, sdif);
 	} else if (skb->pkt_type == PACKET_HOST) {
-		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
-					     uh->source, iph->saddr, dif, sdif);
+		sk = __udp4_lib_lookup(net, iph->saddr, uh->source, iph->daddr,
+				       uh->dest, dif, sdif, &udp_table, skb);
 	}
 
 	if (!sk)
 		return;
 
 	skb_set_noref_sk(skb, sk);
-	dst = READ_ONCE(sk->sk_rx_dst);
-
-	if (dst)
-		dst = dst_check(dst, 0);
-	if (dst) {
-		/* set noref for now.
-		 * any place which wants to hold dst has to call
-		 * dst_hold_safe()
-		 */
-		skb_dst_set_noref(skb, dst);
-	}
+	if (udp_use_rx_dst_cache(sk, skb))
+		udp_set_skb_rx_dst(sk, skb, 0);
 }
 
 int udp_rcv(struct sk_buff *skb)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8f62392c4c35..67d340679c3a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -773,13 +773,18 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 
 static void udp6_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 {
-	if (udp_sk_rx_dst_set(sk, dst)) {
+	if (unlikely(dst_update(&sk->sk_rx_dst, dst))) {
 		const struct rt6_info *rt = (const struct rt6_info *)dst;
 
 		inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
 	}
 }
 
+static bool udp6_use_rx_dst_cache(struct sock *sk)
+{
+	return sk->sk_state == TCP_ESTABLISHED;
+}
+
 int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		   int proto)
 {
@@ -830,7 +835,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
+		if (udp6_use_rx_dst_cache(sk))
 			udp6_sk_rx_dst_set(sk, dst);
 
 		ret = udpv6_queue_rcv_skb(sk, skb);
@@ -905,37 +910,13 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	return 0;
 }
 
-
-static struct sock *__udp6_lib_demux_lookup(struct net *net,
-			__be16 loc_port, const struct in6_addr *loc_addr,
-			__be16 rmt_port, const struct in6_addr *rmt_addr,
-			int dif, int sdif)
-{
-	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2 = udp6_portaddr_hash(net, loc_addr, hnum);
-	unsigned int slot2 = hash2 & udp_table.mask;
-	struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
-	const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
-	struct sock *sk;
-
-	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-		if (sk->sk_state == TCP_ESTABLISHED &&
-		    INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif, sdif))
-			return sk;
-		/* Only check first socket in chain */
-		break;
-	}
-	return NULL;
-}
-
 static void udp_v6_early_demux(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
-	const struct udphdr *uh;
-	struct sock *sk;
-	struct dst_entry *dst;
 	int dif = skb->dev->ifindex;
 	int sdif = inet6_sdif(skb);
+	const struct udphdr *uh;
+	struct sock *sk;
 
 	if (!pskb_may_pull(skb, skb_transport_offset(skb) +
 	    sizeof(struct udphdr)))
@@ -944,10 +925,9 @@ static void udp_v6_early_demux(struct sk_buff *skb)
 	uh = udp_hdr(skb);
 
 	if (skb->pkt_type == PACKET_HOST)
-		sk = __udp6_lib_demux_lookup(net, uh->dest,
-					     &ipv6_hdr(skb)->daddr,
-					     uh->source, &ipv6_hdr(skb)->saddr,
-					     dif, sdif);
+		sk = __udp6_lib_lookup(net, &ipv6_hdr(skb)->saddr, uh->source,
+				       &ipv6_hdr(skb)->daddr, uh->dest, dif,
+				       sdif, &udp_table, skb);
 	else
 		return;
 
@@ -955,17 +935,8 @@ static void udp_v6_early_demux(struct sk_buff *skb)
 		return;
 
 	skb_set_noref_sk(skb, sk);
-	dst = READ_ONCE(sk->sk_rx_dst);
-
-	if (dst)
-		dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
-	if (dst) {
-		/* set noref for now.
-		 * any place which wants to hold dst has to call
-		 * dst_hold_safe()
-		 */
-		skb_dst_set_noref(skb, dst);
-	}
+	if (udp6_use_rx_dst_cache(sk))
+		udp_set_skb_rx_dst(sk, skb, inet6_sk(sk)->rx_dst_cookie);
 }
 
 static __inline__ int udpv6_rcv(struct sk_buff *skb)
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 04/11] net: add simple socket-like dst cache helpers
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

It will be used by later patches to reduce code duplication.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/dst.h | 20 ++++++++++++++++++++
 net/core/dst.c    | 12 ++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..4fcca0e368c6 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -485,6 +485,26 @@ static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
 	return dst;
 }
 
+/* update the cache with dst, assuming the latter already carries a refcount */
+static inline bool __dst_update(struct dst_entry **cache, struct dst_entry *dst)
+{
+	struct dst_entry *old = xchg(cache, dst);
+
+	dst_release(old);
+	return old != dst;
+}
+bool dst_update(struct dst_entry **cache, struct dst_entry *dst);
+static inline struct dst_entry *dst_access(struct dst_entry **cache,
+					      u32 cookie)
+{
+	struct dst_entry *dst = READ_ONCE(*cache);
+
+	if (!dst)
+		return NULL;
+
+	return dst_check(dst, cookie);
+}
+
 /* Flags for xfrm_lookup flags argument. */
 enum {
 	XFRM_LOOKUP_ICMP = 1 << 0,
diff --git a/net/core/dst.c b/net/core/dst.c
index a6c47da7d0f8..4076f9af45d7 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -205,6 +205,18 @@ void dst_release_immediate(struct dst_entry *dst)
 }
 EXPORT_SYMBOL(dst_release_immediate);
 
+/* update the cache with dst, assuming the latter does not carry a refcount */
+bool dst_update(struct dst_entry **cache, struct dst_entry *dst)
+{
+	if (likely(*cache == dst))
+		return false;
+
+	if (dst_hold_safe(dst))
+		return __dst_update(cache, dst);
+	return false;
+}
+EXPORT_SYMBOL_GPL(dst_update);
+
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
 	struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC);
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 03/11] udp: do not touch socket refcount in early demux
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

use noref sockets instead. This gives some small performance
improvements and will allow efficient early demux for unconnected
sockets in a later patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 18 ++++++++++--------
 net/ipv6/udp.c | 10 ++++++----
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 784ced0b9150..ba49d5aa9f09 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2050,12 +2050,13 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		   int proto)
 {
-	struct sock *sk;
-	struct udphdr *uh;
-	unsigned short ulen;
+	struct net *net = dev_net(skb->dev);
 	struct rtable *rt = skb_rtable(skb);
+	unsigned short ulen;
 	__be32 saddr, daddr;
-	struct net *net = dev_net(skb->dev);
+	struct udphdr *uh;
+	struct sock *sk;
+	bool noref_sk;
 
 	/*
 	 *  Validate the packet.
@@ -2081,6 +2082,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (udp4_csum_init(skb, uh, proto))
 		goto csum_error;
 
+	noref_sk = skb_has_noref_sk(skb);
 	sk = skb_steal_sock(skb);
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
@@ -2090,7 +2092,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 			udp_sk_rx_dst_set(sk, dst);
 
 		ret = udp_queue_rcv_skb(sk, skb);
-		sock_put(sk);
+		if (!noref_sk)
+			sock_put(sk);
 		/* a return value > 0 means to resubmit the input, but
 		 * it wants the return to be -protocol, or 0
 		 */
@@ -2261,11 +2264,10 @@ void udp_v4_early_demux(struct sk_buff *skb)
 					     uh->source, iph->saddr, dif, sdif);
 	}
 
-	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
+	if (!sk)
 		return;
 
-	skb->sk = sk;
-	skb->destructor = sock_efree;
+	skb_set_noref_sk(skb, sk);
 	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e2ecfb137297..8f62392c4c35 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -787,6 +787,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct net *net = dev_net(skb->dev);
 	struct udphdr *uh;
 	struct sock *sk;
+	bool noref_sk;
 	u32 ulen = 0;
 
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
@@ -823,6 +824,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		goto csum_error;
 
 	/* Check if the socket is already available, e.g. due to early demux */
+	noref_sk = skb_has_noref_sk(skb);
 	sk = skb_steal_sock(skb);
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
@@ -832,7 +834,8 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 			udp6_sk_rx_dst_set(sk, dst);
 
 		ret = udpv6_queue_rcv_skb(sk, skb);
-		sock_put(sk);
+		if (!noref_sk)
+			sock_put(sk);
 
 		/* a return value > 0 means to resubmit the input */
 		if (ret > 0)
@@ -948,11 +951,10 @@ static void udp_v6_early_demux(struct sk_buff *skb)
 	else
 		return;
 
-	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
+	if (!sk)
 		return;
 
-	skb->sk = sk;
-	skb->destructor = sock_efree;
+	skb_set_noref_sk(skb, sk);
 	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 02/11] net: allow early demux to fetch noref socket
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

We must be careful to avoid leaking such sockets outside
the RCU section containing the early demux call; we clear
them on nonlocal delivery.

For ipv4 we clear sknoref even for multicast traffic entering
the ip_mr_input() path; we will lose the mcast early demux
optimization when the host is acting as multicast router, but
that will help to keep to code simple.

Also update all iptables/nftables extension that can
happen in the input chain and can transmit the skb outside
such patch, namely TEE, nft_dup and nfqueue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/ip_input.c              | 8 ++++++++
 net/ipv4/netfilter/nf_dup_ipv4.c | 3 +++
 net/ipv6/ip6_input.c             | 4 ++++
 net/ipv6/netfilter/nf_dup_ipv6.c | 3 +++
 net/netfilter/nf_queue.c         | 3 +++
 5 files changed, 21 insertions(+)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fa2dc8f692c6..5690ef09da28 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -351,6 +351,14 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
+	/* Since the sk has no reference to the socket, we must
+	 * clear it before escaping this RCU section.
+	 * The sk is just an hint and we know we are not going to use
+	 * it outside the input path.
+	 */
+	if (skb_dst(skb)->input != ip_local_deliver)
+		skb_clear_noref_sk(skb);
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (unlikely(skb_dst(skb)->tclassid)) {
 		struct ip_rt_acct *st = this_cpu_ptr(ip_rt_acct);
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index 39895b9ddeb9..bf8b78492fc8 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -71,6 +71,9 @@ void nf_dup_ipv4(struct net *net, struct sk_buff *skb, unsigned int hooknum,
 	nf_reset(skb);
 	nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 #endif
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	/*
 	 * If we are in PREROUTING/INPUT, decrease the TTL to mitigate potential
 	 * loops between two hosts.
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a348f5..e15ec2d36b9e 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -68,6 +68,10 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb_valid_dst(skb))
 		ip6_route_input(skb);
 
+	/* see comment on ipv4 edmux */
+	if (skb_dst(skb)->input != ip6_input)
+		skb_clear_noref_sk(skb);
+
 	return dst_input(skb);
 }
 
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index 4a7ddeddbaab..939f6a2238f9 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -60,6 +60,9 @@ void nf_dup_ipv6(struct net *net, struct sk_buff *skb, unsigned int hooknum,
 	nf_reset(skb);
 	nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 #endif
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	if (hooknum == NF_INET_PRE_ROUTING ||
 	    hooknum == NF_INET_LOCAL_IN) {
 		struct ipv6hdr *iph = ipv6_hdr(skb);
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index f7e21953b1de..100eff08cb51 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -145,6 +145,9 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 		.size	= sizeof(*entry) + afinfo->route_key_size,
 	};
 
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	nf_queue_entry_get_refs(entry);
 	skb_dst_force(skb);
 	afinfo->saveroute(skb, entry);
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 01/11] net: add support for noref skb->sk
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <cover.1506114055.git.pabeni@redhat.com>

Noref sk do not carry a socket refcount, are valid
only inside the current RCU section and must be
explicitly cleared before exiting such section.

They will be used in a later patch to allow early demux
without sock refcounting.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/skbuff.h | 31 +++++++++++++++++++++++++++++++
 net/core/sock.c        |  7 +++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 492828801acb..c3fc32636690 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -922,6 +922,37 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 	return (struct rtable *)skb_dst(skb);
 }
 
+void sock_dummyfree(struct sk_buff *skb);
+
+/* only early demux can set noref socks
+ * noref socks do not carry any refcount and must be
+ * cleared before exiting the current RCU section
+ */
+static inline void skb_set_noref_sk(struct sk_buff *skb, struct sock *sk)
+{
+	skb->sk = sk;
+	skb->destructor = sock_dummyfree;
+}
+
+static inline bool skb_has_noref_sk(struct sk_buff *skb)
+{
+	return skb->destructor == sock_dummyfree;
+}
+
+static inline struct sock *skb_clear_noref_sk(struct sk_buff *skb)
+{
+	struct sock *ret;
+
+	if (!skb_has_noref_sk(skb))
+		return NULL;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	ret = skb->sk;
+	skb->sk = NULL;
+	skb->destructor = NULL;
+	return ret;
+}
+
 /* For mangling skb->pkt_type from user space side from applications
  * such as nft, tc, etc, we only allow a conservative subset of
  * possible pkt_types to be set.
diff --git a/net/core/sock.c b/net/core/sock.c
index 9b7b6bbb2a23..33da8e7e58a0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1893,6 +1893,13 @@ void sock_efree(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_efree);
 
+/* dummy destructor used by noref sockets */
+void sock_dummyfree(struct sk_buff *skb)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+}
+EXPORT_SYMBOL(sock_dummyfree);
+
 kuid_t sock_i_uid(struct sock *sk)
 {
 	kuid_t uid;
-- 
2.13.5

^ permalink raw reply related

* [RFC PATCH 00/11] udp: full early demux for unconnected sockets
From: Paolo Abeni @ 2017-09-22 21:06 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Pablo Neira Ayuso, Florian Westphal,
	Eric Dumazet, Hannes Frederic Sowa

This series refactor the UDP early demux code so that:

* full socket lookup is performed for unicast packets
* a sk is grabbed even for unconnected socket match
* a dst cache is used even in such scenario

To perform this tasks a couple of facilities are added:

* noref socket references, scoped inside the current RCU section, to be
  explicitly cleared before leaving such section
* a dst cache inside the inet and inet6 local addresses tables, caching the
  related local dst entry

The measured performance gain under small packet UDP flood is as follow:

ingress NIC	vanilla		patched		delta
rx queues	(kpps)		(kpps)		(%)
[ipv4]
1		2177		2414		10
2		2527		2892		14
3		3050		3733		22
4		3918		4643		18
5		5074		5699		12
6		5654		6869		21

[ipv6]
1		2002		2821		40
2		2087		3148		50
3		2583		4008		55
4		3072		4963		61
5		3719		5992		61
6		4314		6910		60

The number of user space process in use is equal to the number of
NIC rx queue; when multiple user space processes the SO_REUSEPORT 
options is used, as described below:

ethtool  -L em2 combined $n
MASK=1
for I in `seq 0 $((n - 1))`; do
        udp_sink  --reuse-port --recvfrom --count 1000000000 --port 9 $1 &
        taskset -p $((MASK << ($I + $n) )) $!
done

Paolo Abeni (11):
  net: add support for noref skb->sk
  net: allow early demux to fetch noref socket
  udp: do not touch socket refcount in early demux
  net: add simple socket-like dst cache helpers
  udp: perform full socket lookup in early demux
  ip/route: factor out helper for local route creation
  ipv6/addrconf: add an helper for inet6 address lookup
  net: implement local route cache inside ifaddr
  route: add ipv4/6 helpers to do partial route lookup vs local dst
  IP: early demux can return an error code
  udp: dst lookup in early demux for unconnected sockets

 include/linux/inetdevice.h       |   4 ++
 include/linux/skbuff.h           |  31 +++++++++++
 include/linux/udp.h              |   2 +
 include/net/addrconf.h           |   3 ++
 include/net/dst.h                |  20 +++++++
 include/net/if_inet6.h           |   4 ++
 include/net/ip6_route.h          |   1 +
 include/net/protocol.h           |   4 +-
 include/net/route.h              |   4 ++
 include/net/tcp.h                |   2 +-
 include/net/udp.h                |   2 +-
 net/core/dst.c                   |  12 +++++
 net/core/sock.c                  |   7 +++
 net/ipv4/devinet.c               |  29 ++++++++++-
 net/ipv4/ip_input.c              |  33 ++++++++----
 net/ipv4/netfilter/nf_dup_ipv4.c |   3 ++
 net/ipv4/route.c                 |  73 +++++++++++++++++++++++---
 net/ipv4/tcp_ipv4.c              |   9 ++--
 net/ipv4/udp.c                   |  95 +++++++++++++++-------------------
 net/ipv6/addrconf.c              | 109 +++++++++++++++++++++++++++------------
 net/ipv6/ip6_input.c             |   4 ++
 net/ipv6/netfilter/nf_dup_ipv6.c |   3 ++
 net/ipv6/route.c                 |  13 +++++
 net/ipv6/udp.c                   |  72 ++++++++++----------------
 net/netfilter/nf_queue.c         |   3 ++
 25 files changed, 383 insertions(+), 159 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH,v3,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
From: Petar Penkov @ 2017-09-22 20:49 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, maheshb, willemb, davem, ppenkov, Petar Penkov
In-Reply-To: <20170922204915.7889-1-peterpenkov96@gmail.com>

Add a TUN/TAP receive mode that exercises the napi_gro_frags()
interface. This mode is available only in TAP mode, as the interface
expects packets with Ethernet headers.

Furthermore, packets follow the layout of the iovec_iter that was
received. The first iovec is the linear data, and every one after the
first is a fragment. If there are more fragments than the max number,
drop the packet. Additionally, invoke eth_get_headlen() to exercise flow
dissector code and to verify that the header resides in the linear data.

The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option.
This is imposed because this mode is intended for testing via tools like
syzkaller and packetdrill, and the increased flexibility it provides can
introduce security vulnerabilities. This flag is accepted only if the
device is in TAP mode and has the IFF_NAPI flag set as well. This is
done because both of these are explicit requirements for correct
operation in this mode.

Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: davem@davemloft.net
Cc: ppenkov@stanford.edu
---
 drivers/net/tun.c           | 134 ++++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_tun.h |   1 +
 2 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f16407242b18..9880b3bc8fa5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -75,6 +75,7 @@
 #include <linux/skb_array.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/mutex.h>
 
 #include <linux/uaccess.h>
 
@@ -121,7 +122,8 @@ do {								\
 #define TUN_VNET_BE     0x40000000
 
 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
-		      IFF_MULTI_QUEUE | IFF_NAPI)
+		      IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
+
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
@@ -173,6 +175,7 @@ struct tun_file {
 		unsigned int ifindex;
 	};
 	struct napi_struct napi;
+	struct mutex napi_mutex;	/* Protects access to the above napi */
 	struct list_head next;
 	struct tun_struct *detached;
 	struct skb_array tx_array;
@@ -277,6 +280,7 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
 		netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,
 			       NAPI_POLL_WEIGHT);
 		napi_enable(&tfile->napi);
+		mutex_init(&tfile->napi_mutex);
 	}
 }
 
@@ -292,6 +296,11 @@ static void tun_napi_del(struct tun_struct *tun, struct tun_file *tfile)
 		netif_napi_del(&tfile->napi);
 }
 
+static bool tun_napi_frags_enabled(const struct tun_struct *tun)
+{
+	return READ_ONCE(tun->flags) & IFF_NAPI_FRAGS;
+}
+
 #ifdef CONFIG_TUN_VNET_CROSS_LE
 static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
 {
@@ -1036,7 +1045,8 @@ static void tun_poll_controller(struct net_device *dev)
 	 * supports polling, which enables bridge devices in virt setups to
 	 * still use netconsole
 	 * If NAPI is enabled, however, we need to schedule polling for all
-	 * queues.
+	 * queues unless we are using napi_gro_frags(), which we call in
+	 * process context and not in NAPI context.
 	 */
 	struct tun_struct *tun = netdev_priv(dev);
 
@@ -1044,6 +1054,9 @@ static void tun_poll_controller(struct net_device *dev)
 		struct tun_file *tfile;
 		int i;
 
+		if (tun_napi_frags_enabled(tun))
+			return;
+
 		rcu_read_lock();
 		for (i = 0; i < tun->numqueues; i++) {
 			tfile = rcu_dereference(tun->tfiles[i]);
@@ -1266,6 +1279,64 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
 	return mask;
 }
 
+static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
+					    size_t len,
+					    const struct iov_iter *it)
+{
+	struct sk_buff *skb;
+	size_t linear;
+	int err;
+	int i;
+
+	if (it->nr_segs > MAX_SKB_FRAGS + 1)
+		return ERR_PTR(-ENOMEM);
+
+	local_bh_disable();
+	skb = napi_get_frags(&tfile->napi);
+	local_bh_enable();
+	if (!skb)
+		return ERR_PTR(-ENOMEM);
+
+	linear = iov_iter_single_seg_count(it);
+	err = __skb_grow(skb, linear);
+	if (err)
+		goto free;
+
+	skb->len = len;
+	skb->data_len = len - linear;
+	skb->truesize += skb->data_len;
+
+	for (i = 1; i < it->nr_segs; i++) {
+		size_t fragsz = it->iov[i].iov_len;
+		unsigned long offset;
+		struct page *page;
+		void *data;
+
+		if (fragsz == 0 || fragsz > PAGE_SIZE) {
+			err = -EINVAL;
+			goto free;
+		}
+
+		local_bh_disable();
+		data = napi_alloc_frag(fragsz);
+		local_bh_enable();
+		if (!data) {
+			err = -ENOMEM;
+			goto free;
+		}
+
+		page = virt_to_head_page(data);
+		offset = data - page_address(page);
+		skb_fill_page_desc(skb, i - 1, page, offset, fragsz);
+	}
+
+	return skb;
+free:
+	/* frees skb and all frags allocated with napi_alloc_frag() */
+	napi_free_frags(&tfile->napi);
+	return ERR_PTR(err);
+}
+
 /* prepad is the amount to reserve at front.  len is length after that.
  * linear is a hint as to how much to copy (usually headers). */
 static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
@@ -1478,6 +1549,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int err;
 	u32 rxhash;
 	int skb_xdp = 1;
+	bool frags = tun_napi_frags_enabled(tun);
 
 	if (!(tun->dev->flags & IFF_UP))
 		return -EIO;
@@ -1535,7 +1607,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			zerocopy = true;
 	}
 
-	if (tun_can_build_skb(tun, tfile, len, noblock, zerocopy)) {
+	if (!frags && tun_can_build_skb(tun, tfile, len, noblock, zerocopy)) {
 		/* For the packet that is not easy to be processed
 		 * (e.g gso or jumbo packet), we will do it at after
 		 * skb was created with generic XDP routine.
@@ -1556,10 +1628,24 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 				linear = tun16_to_cpu(tun, gso.hdr_len);
 		}
 
-		skb = tun_alloc_skb(tfile, align, copylen, linear, noblock);
+		if (frags) {
+			mutex_lock(&tfile->napi_mutex);
+			skb = tun_napi_alloc_frags(tfile, copylen, from);
+			/* tun_napi_alloc_frags() enforces a layout for the skb.
+			 * If zerocopy is enabled, then this layout will be
+			 * overwritten by zerocopy_sg_from_iter().
+			 */
+			zerocopy = false;
+		} else {
+			skb = tun_alloc_skb(tfile, align, copylen, linear,
+					    noblock);
+		}
+
 		if (IS_ERR(skb)) {
 			if (PTR_ERR(skb) != -EAGAIN)
 				this_cpu_inc(tun->pcpu_stats->rx_dropped);
+			if (frags)
+				mutex_unlock(&tfile->napi_mutex);
 			return PTR_ERR(skb);
 		}
 
@@ -1571,6 +1657,11 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		if (err) {
 			this_cpu_inc(tun->pcpu_stats->rx_dropped);
 			kfree_skb(skb);
+			if (frags) {
+				tfile->napi.skb = NULL;
+				mutex_unlock(&tfile->napi_mutex);
+			}
+
 			return -EFAULT;
 		}
 	}
@@ -1578,6 +1669,11 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	if (virtio_net_hdr_to_skb(skb, &gso, tun_is_little_endian(tun))) {
 		this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
 		kfree_skb(skb);
+		if (frags) {
+			tfile->napi.skb = NULL;
+			mutex_unlock(&tfile->napi_mutex);
+		}
+
 		return -EINVAL;
 	}
 
@@ -1603,7 +1699,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb->dev = tun->dev;
 		break;
 	case IFF_TAP:
-		skb->protocol = eth_type_trans(skb, tun->dev);
+		if (!frags)
+			skb->protocol = eth_type_trans(skb, tun->dev);
 		break;
 	}
 
@@ -1638,7 +1735,23 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	rxhash = __skb_get_hash_symmetric(skb);
 
-	if (tun->flags & IFF_NAPI) {
+	if (frags) {
+		/* Exercise flow dissector code path. */
+		u32 headlen = eth_get_headlen(skb->data, skb_headlen(skb));
+
+		if (headlen > skb_headlen(skb) || headlen < ETH_HLEN) {
+			this_cpu_inc(tun->pcpu_stats->rx_dropped);
+			napi_free_frags(&tfile->napi);
+			mutex_unlock(&tfile->napi_mutex);
+			WARN_ON(1);
+			return -ENOMEM;
+		}
+
+		local_bh_disable();
+		napi_gro_frags(&tfile->napi);
+		local_bh_enable();
+		mutex_unlock(&tfile->napi_mutex);
+	} else if (tun->flags & IFF_NAPI) {
 		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
 		int queue_len;
 
@@ -2061,6 +2174,15 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	if (tfile->detached)
 		return -EINVAL;
 
+	if ((ifr->ifr_flags & IFF_NAPI_FRAGS)) {
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
+		if (!(ifr->ifr_flags & IFF_NAPI) ||
+		    (ifr->ifr_flags & TUN_TYPE_MASK) != IFF_TAP)
+			return -EINVAL;
+	}
+
 	dev = __dev_get_by_name(net, ifr->ifr_name);
 	if (dev) {
 		if (ifr->ifr_flags & IFF_TUN_EXCL)
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 30b6184884eb..365ade5685c9 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -61,6 +61,7 @@
 #define IFF_TUN		0x0001
 #define IFF_TAP		0x0002
 #define IFF_NAPI	0x0010
+#define IFF_NAPI_FRAGS	0x0020
 #define IFF_NO_PI	0x1000
 /* This flag has no real effect */
 #define IFF_ONE_QUEUE	0x2000
-- 
2.11.0

^ permalink raw reply related

* [PATCH,v3,net-next 1/2] tun: enable NAPI for TUN/TAP driver
From: Petar Penkov @ 2017-09-22 20:49 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, maheshb, willemb, davem, ppenkov, Petar Penkov
In-Reply-To: <20170922204915.7889-1-peterpenkov96@gmail.com>

Changes TUN driver to use napi_gro_receive() upon receiving packets
rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these
changes and operation is not affected if the flag is disabled.  SKBs
are constructed upon packet arrival and are queued to be processed
later.

The new path was evaluated with a benchmark with the following setup:
Open two tap devices and a receiver thread that reads in a loop for
each device. Start one sender thread and pin all threads to different
CPUs. Send 1M minimum UDP packets to each device and measure sending
time for each of the sending methods:
	napi_gro_receive():	4.90s
	netif_rx_ni():		4.90s
	netif_receive_skb():	7.20s

Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: davem@davemloft.net
Cc: ppenkov@stanford.edu
---
 drivers/net/tun.c           | 133 +++++++++++++++++++++++++++++++++++++++-----
 include/uapi/linux/if_tun.h |   1 +
 2 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3c9985f29950..f16407242b18 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -121,7 +121,7 @@ do {								\
 #define TUN_VNET_BE     0x40000000
 
 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
-		      IFF_MULTI_QUEUE)
+		      IFF_MULTI_QUEUE | IFF_NAPI)
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
@@ -172,6 +172,7 @@ struct tun_file {
 		u16 queue_index;
 		unsigned int ifindex;
 	};
+	struct napi_struct napi;
 	struct list_head next;
 	struct tun_struct *detached;
 	struct skb_array tx_array;
@@ -229,6 +230,68 @@ struct tun_struct {
 	struct bpf_prog __rcu *xdp_prog;
 };
 
+static int tun_napi_receive(struct napi_struct *napi, int budget)
+{
+	struct tun_file *tfile = container_of(napi, struct tun_file, napi);
+	struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+	struct sk_buff_head process_queue;
+	struct sk_buff *skb;
+	int received = 0;
+
+	__skb_queue_head_init(&process_queue);
+
+	spin_lock(&queue->lock);
+	skb_queue_splice_tail_init(queue, &process_queue);
+	spin_unlock(&queue->lock);
+
+	while (received < budget && (skb = __skb_dequeue(&process_queue))) {
+		napi_gro_receive(napi, skb);
+		++received;
+	}
+
+	if (!skb_queue_empty(&process_queue)) {
+		spin_lock(&queue->lock);
+		skb_queue_splice(&process_queue, queue);
+		spin_unlock(&queue->lock);
+	}
+
+	return received;
+}
+
+static int tun_napi_poll(struct napi_struct *napi, int budget)
+{
+	unsigned int received;
+
+	received = tun_napi_receive(napi, budget);
+
+	if (received < budget)
+		napi_complete_done(napi, received);
+
+	return received;
+}
+
+static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
+			  bool napi_en)
+{
+	if (napi_en) {
+		netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,
+			       NAPI_POLL_WEIGHT);
+		napi_enable(&tfile->napi);
+	}
+}
+
+static void tun_napi_disable(struct tun_struct *tun, struct tun_file *tfile)
+{
+	if (tun->flags & IFF_NAPI)
+		napi_disable(&tfile->napi);
+}
+
+static void tun_napi_del(struct tun_struct *tun, struct tun_file *tfile)
+{
+	if (tun->flags & IFF_NAPI)
+		netif_napi_del(&tfile->napi);
+}
+
 #ifdef CONFIG_TUN_VNET_CROSS_LE
 static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
 {
@@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 
 	tun = rtnl_dereference(tfile->tun);
 
+	if (tun && clean) {
+		tun_napi_disable(tun, tfile);
+		tun_napi_del(tun, tfile);
+	}
+
 	if (tun && !tfile->detached) {
 		u16 index = tfile->queue_index;
 		BUG_ON(index >= tun->numqueues);
@@ -598,6 +666,7 @@ static void tun_detach_all(struct net_device *dev)
 	for (i = 0; i < n; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
 		BUG_ON(!tfile);
+		tun_napi_disable(tun, tfile);
 		tfile->socket.sk->sk_shutdown = RCV_SHUTDOWN;
 		tfile->socket.sk->sk_data_ready(tfile->socket.sk);
 		RCU_INIT_POINTER(tfile->tun, NULL);
@@ -613,6 +682,7 @@ static void tun_detach_all(struct net_device *dev)
 	synchronize_net();
 	for (i = 0; i < n; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
+		tun_napi_del(tun, tfile);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
@@ -631,7 +701,8 @@ static void tun_detach_all(struct net_device *dev)
 		module_put(THIS_MODULE);
 }
 
-static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filter)
+static int tun_attach(struct tun_struct *tun, struct file *file,
+		      bool skip_filter, bool napi)
 {
 	struct tun_file *tfile = file->private_data;
 	struct net_device *dev = tun->dev;
@@ -677,10 +748,12 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	tun->numqueues++;
 
-	if (tfile->detached)
+	if (tfile->detached) {
 		tun_enable_queue(tfile);
-	else
+	} else {
 		sock_hold(&tfile->sk);
+		tun_napi_init(tun, tfile, napi);
+	}
 
 	tun_set_real_num_queues(tun);
 
@@ -956,13 +1029,28 @@ static void tun_poll_controller(struct net_device *dev)
 	 * Tun only receives frames when:
 	 * 1) the char device endpoint gets data from user space
 	 * 2) the tun socket gets a sendmsg call from user space
-	 * Since both of those are synchronous operations, we are guaranteed
-	 * never to have pending data when we poll for it
-	 * so there is nothing to do here but return.
+	 * If NAPI is not enabled, since both of those are synchronous
+	 * operations, we are guaranteed never to have pending data when we poll
+	 * for it so there is nothing to do here but return.
 	 * We need this though so netpoll recognizes us as an interface that
 	 * supports polling, which enables bridge devices in virt setups to
 	 * still use netconsole
+	 * If NAPI is enabled, however, we need to schedule polling for all
+	 * queues.
 	 */
+	struct tun_struct *tun = netdev_priv(dev);
+
+	if (tun->flags & IFF_NAPI) {
+		struct tun_file *tfile;
+		int i;
+
+		rcu_read_lock();
+		for (i = 0; i < tun->numqueues; i++) {
+			tfile = rcu_dereference(tun->tfiles[i]);
+			napi_schedule(&tfile->napi);
+		}
+		rcu_read_unlock();
+	}
 	return;
 }
 #endif
@@ -1549,11 +1637,25 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	}
 
 	rxhash = __skb_get_hash_symmetric(skb);
-#ifndef CONFIG_4KSTACKS
-	tun_rx_batched(tun, tfile, skb, more);
-#else
-	netif_rx_ni(skb);
-#endif
+
+	if (tun->flags & IFF_NAPI) {
+		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+		int queue_len;
+
+		spin_lock_bh(&queue->lock);
+		__skb_queue_tail(queue, skb);
+		queue_len = skb_queue_len(queue);
+		spin_unlock(&queue->lock);
+
+		if (!more || queue_len > NAPI_POLL_WEIGHT)
+			napi_schedule(&tfile->napi);
+
+		local_bh_enable();
+	} else if (!IS_ENABLED(CONFIG_4KSTACKS)) {
+		tun_rx_batched(tun, tfile, skb, more);
+	} else {
+		netif_rx_ni(skb);
+	}
 
 	stats = get_cpu_ptr(tun->pcpu_stats);
 	u64_stats_update_begin(&stats->syncp);
@@ -1980,7 +2082,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (err < 0)
 			return err;
 
-		err = tun_attach(tun, file, ifr->ifr_flags & IFF_NOFILTER);
+		err = tun_attach(tun, file, ifr->ifr_flags & IFF_NOFILTER,
+				 ifr->ifr_flags & IFF_NAPI);
 		if (err < 0)
 			return err;
 
@@ -2066,7 +2169,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 				       NETIF_F_HW_VLAN_STAG_TX);
 
 		INIT_LIST_HEAD(&tun->disabled);
-		err = tun_attach(tun, file, false);
+		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI);
 		if (err < 0)
 			goto err_free_flow;
 
@@ -2216,7 +2319,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 		ret = security_tun_dev_attach_queue(tun->security);
 		if (ret < 0)
 			goto unlock;
-		ret = tun_attach(tun, file, false);
+		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI);
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
 		tun = rtnl_dereference(tfile->tun);
 		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 3cb5e1d85ddd..30b6184884eb 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -60,6 +60,7 @@
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
 #define IFF_TAP		0x0002
+#define IFF_NAPI	0x0010
 #define IFF_NO_PI	0x1000
 /* This flag has no real effect */
 #define IFF_ONE_QUEUE	0x2000
-- 
2.11.0

^ permalink raw reply related

* [PATCH,v3,net-next 0/2] Improve code coverage of syzkaller
From: Petar Penkov @ 2017-09-22 20:49 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, maheshb, willemb, davem, ppenkov, Petar Penkov

This patch series is intended to improve code coverage of syzkaller on
the early receive path, specifically including flow dissector, GRO,
and GRO with frags parts of the networking stack. Syzkaller exercises
the stack through the TUN driver and this is therefore where changes
reside. Current coverage through netif_receive_skb() is limited as it
does not touch on any of the aforementioned code paths. Furthermore,
for full coverage, it is necessary to have more flexibility over the
linear and non-linear data of the skbs.

The following patches address this by providing the user(syzkaller)
with the ability to send via napi_gro_receive() and napi_gro_frags().
Additionally, syzkaller can specify how many fragments there are and
how much data per fragment there is. This is done by exploiting the
convenient structure of iovecs. Finally, this patch series adds
support for exercising the flow dissector during fuzzing.

The code path including napi_gro_receive() can be enabled via the
IFF_NAPI flag.  The remainder of the changes in this patch series give
the user significantly more control over packets entering the kernel.
To avoid potential security vulnerabilities, hide the ability to send
custom skbs and the flow dissector code paths behind a
capable(CAP_NET_ADMIN) check to require special user privileges.

Changes since v2 based on feedback from Willem de Bruijn and Mahesh
Bandewar:

Patch 1/ No changes.
Patch 2/ Check if the preconditions for IFF_NAPI_FRAGS (IFF_NAPI and
	 IFF_TAP) are met before opening/attaching rather than after.
	 If they are not, change the behavior from discarding the
	 flag to rejecting the command with EINVAL.

Petar Penkov (2):
  tun: enable NAPI for TUN/TAP driver
  tun: enable napi_gro_frags() for TUN/TAP driver

 drivers/net/tun.c           | 261 +++++++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/if_tun.h |   2 +
 2 files changed, 245 insertions(+), 18 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH] [for 4.14] net: qcom/emac: specify the correct size when mapping a DMA buffer
From: Timur Tabi @ 2017-09-22 20:32 UTC (permalink / raw)
  To: David S. Miller, netdev, stable; +Cc: timur

When mapping the RX DMA buffers, the driver was accidentally specifying
zero for the buffer length.  Under normal circumstances, SWIOTLB does not
need to allocate a bounce buffer, so the address is just mapped without
checking the size field.  This is why the error was not detected earlier.

Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver")
Cc: stable@vger.kernel.org
Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 0ea3ca09c689..3ed9033e56db 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -898,7 +898,8 @@ static void emac_mac_rx_descs_refill(struct emac_adapter *adpt,

 		curr_rxbuf->dma_addr =
 			dma_map_single(adpt->netdev->dev.parent, skb->data,
-				       curr_rxbuf->length, DMA_FROM_DEVICE);
+				       adpt->rxbuf_size, DMA_FROM_DEVICE);
+
 		ret = dma_mapping_error(adpt->netdev->dev.parent,
 					curr_rxbuf->dma_addr);
 		if (ret) {
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related

* Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
From: Vivien Didelot @ 2017-09-22 20:11 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <2262f266-6fb9-cce4-f83a-933dd41b9062@gmail.com>

Hi Florian,

Florian Fainelli <f.fainelli@gmail.com> writes:

> On 09/22/2017 12:40 PM, Vivien Didelot wrote:
>> There is no need to store a phy_device in dsa_slave_priv since
>> net_device already provides one. Simply s/p->phy/dev->phydev/.
>
> You can therefore remove the phy_device from dsa_slave_priv, see below
> for more comments. I will have to regress test the heck out of this,
> this should take a few hours.

OK, since this is a sensible topic, I will respin a v3 without this
patch, so that a future patchset can address your comments below and
also gives you time to test this one patch alone.

>>  static int dsa_slave_port_attr_set(struct net_device *dev,
>> @@ -435,12 +433,10 @@ static int
>>  dsa_slave_get_link_ksettings(struct net_device *dev,
>>  			     struct ethtool_link_ksettings *cmd)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>  
>> -	if (!p->phy)
>> -		return -EOPNOTSUPP;
>> -
>> -	phy_ethtool_ksettings_get(p->phy, cmd);
>> +	phy_ethtool_ksettings_get(dev->phydev, cmd);
>
> This can be replaced by phy_ethtool_get_link_ksettings()
>
>>  
>>  	return 0;
>>  }
>> @@ -449,12 +445,10 @@ static int
>>  dsa_slave_set_link_ksettings(struct net_device *dev,
>>  			     const struct ethtool_link_ksettings *cmd)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>  
>> -	if (p->phy != NULL)
>> -		return phy_ethtool_ksettings_set(p->phy, cmd);
>> -
>> -	return -EOPNOTSUPP;
>> +	return phy_ethtool_ksettings_set(dev->phydev, cmd);
>>  }
>
> This can disappear and you can assign this ethtool operation to
> phy_ethtool_set_link_ksettings()
>
>>  
>>  static void dsa_slave_get_drvinfo(struct net_device *dev,
>> @@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p)
>>  
>>  static int dsa_slave_nway_reset(struct net_device *dev)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>  
>> -	if (p->phy != NULL)
>> -		return genphy_restart_aneg(p->phy);
>> -
>> -	return -EOPNOTSUPP;
>> +	return genphy_restart_aneg(dev->phydev);
>>  }
>
> This can now disappear and you can use phy_ethtool_nway_reset() directly
> in ethtool_ops
>
>>  
>>  static u32 dsa_slave_get_link(struct net_device *dev)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>  
>> -	if (p->phy != NULL) {
>> -		genphy_update_link(p->phy);
>> -		return p->phy->link;
>> -	}
>> +	genphy_update_link(dev->phydev);
>>  
>> -	return -EOPNOTSUPP;
>> +	return dev->phydev->link;
>>  }
>
> This should certainly be just ethtool_op_get_link(), not sure why we
> kept that around here...

Haaa, good to read that! I wasn't sure about this, but with this patch
the slave phy ethtool functions seemed indeed quite generic...


Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic
From: Andrew Lunn @ 2017-09-22 20:08 UTC (permalink / raw)
  To: Egil Hjelmeland; +Cc: vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <b27d2dd4-84e3-b930-a6fe-1e5b36a7d213@egil-hjelmeland.no>

> >I'm wondering how this is supposed to work. Please add a good comment
> >here, since the hardware is forcing you to do something odd.
> >
> >Maybe it would be a good idea to save the STP state in chip.  And then
> >when chip->is_bridged is set true, change the state in the hardware to
> >the saved value?
> >
> >What happens when port 0 is added to the bridge, there is then a
> >minute pause and then port 1 is added? I would expect that as soon as
> >port 0 is added, the STP state machine for port 0 will start and move
> >into listening and then forwarding. Due to hardware limitations it
> >looks like you cannot do this. So what state is the hardware
> >effectively in? Blocking? Forwarding?
> >
> >Then port 1 is added. You can then can respect the states. port 1 will
> >do blocking->listening->forwarding, but what about port 0? The calls
> >won't get repeated? How does it transition to forwarding?
> >
> >   Andrew
> >
> 
> I see your point with the "minute pause" argument. Although a bit
> contrived use case, it is easy to fix by caching the STP state, as
> you suggest. So I can do that.

I don't think it is contrived. I've done bridge configuration by hand
for testing purposes. I've also set the forwarding delay to very small
values, so there is a clear race condition here.

> How does other DSA HW chips handle port separation? Knowing that
> could perhaps help me know what to look for.

They have better hardware :-)

Generally each port is totally independent. You can change the STP
state per port without restrictions.

      Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
From: Andrew Lunn @ 2017-09-22 20:03 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170922194045.18814-2-vivien.didelot@savoirfairelinux.com>

On Fri, Sep 22, 2017 at 03:40:43PM -0400, Vivien Didelot wrote:
> There is no need to store a phy_device in dsa_slave_priv since
> net_device already provides one. Simply s/p->phy/dev->phydev/.
> 
> While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP.

I just did a quick poll for calling phy_mii_ioctl(). ENODEV seems the
most popular, second to EINVAL. Marvell drivers all use EOPNOTSUPP.

>  static int dsa_slave_nway_reset(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>  
> -	if (p->phy != NULL)
> -		return genphy_restart_aneg(p->phy);
> -
> -	return -EOPNOTSUPP;
> +	return genphy_restart_aneg(dev->phydev);
>  }

It looks like this can now be replaced with phy_ethtool_nway_reset().

It could be there are other phy_ethtool_ helpers which can be used,
now that we have phydev in ndev.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
From: Florian Fainelli @ 2017-09-22 20:00 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170922194045.18814-2-vivien.didelot@savoirfairelinux.com>

On 09/22/2017 12:40 PM, Vivien Didelot wrote:
> There is no need to store a phy_device in dsa_slave_priv since
> net_device already provides one. Simply s/p->phy/dev->phydev/.

You can therefore remove the phy_device from dsa_slave_priv, see below
for more comments. I will have to regress test the heck out of this,
this should take a few hours.

> 
> While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---

>  static int dsa_slave_port_attr_set(struct net_device *dev,
> @@ -435,12 +433,10 @@ static int
>  dsa_slave_get_link_ksettings(struct net_device *dev,
>  			     struct ethtool_link_ksettings *cmd)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>  
> -	if (!p->phy)
> -		return -EOPNOTSUPP;
> -
> -	phy_ethtool_ksettings_get(p->phy, cmd);
> +	phy_ethtool_ksettings_get(dev->phydev, cmd);

This can be replaced by phy_ethtool_get_link_ksettings()

>  
>  	return 0;
>  }
> @@ -449,12 +445,10 @@ static int
>  dsa_slave_set_link_ksettings(struct net_device *dev,
>  			     const struct ethtool_link_ksettings *cmd)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>  
> -	if (p->phy != NULL)
> -		return phy_ethtool_ksettings_set(p->phy, cmd);
> -
> -	return -EOPNOTSUPP;
> +	return phy_ethtool_ksettings_set(dev->phydev, cmd);
>  }

This can disappear and you can assign this ethtool operation to
phy_ethtool_set_link_ksettings()

>  
>  static void dsa_slave_get_drvinfo(struct net_device *dev,
> @@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p)
>  
>  static int dsa_slave_nway_reset(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>  
> -	if (p->phy != NULL)
> -		return genphy_restart_aneg(p->phy);
> -
> -	return -EOPNOTSUPP;
> +	return genphy_restart_aneg(dev->phydev);
>  }

This can now disappear and you can use phy_ethtool_nway_reset() directly
in ethtool_ops

>  
>  static u32 dsa_slave_get_link(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>  
> -	if (p->phy != NULL) {
> -		genphy_update_link(p->phy);
> -		return p->phy->link;
> -	}
> +	genphy_update_link(dev->phydev);
>  
> -	return -EOPNOTSUPP;
> +	return dev->phydev->link;
>  }

This should certainly be just ethtool_op_get_link(), not sure why we
kept that around here...
-- 
Florian

^ permalink raw reply

* [PATCH net-next v2 3/3] net: dsa: add port enable and disable helpers
From: Vivien Didelot @ 2017-09-22 19:40 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170922194045.18814-1-vivien.didelot@savoirfairelinux.com>

Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c     | 31 ++++++++++++++++++++++++++++++-
 net/dsa/slave.c    | 19 +++++--------------
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 9803952a5b40..0298a0f6a349 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev);
 /* port.c */
 int dsa_port_set_state(struct dsa_port *dp, u8 state,
 		       struct switchdev_trans *trans);
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state);
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 76d43a82d397..72c8dbd3d3f2 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state,
 	return 0;
 }
 
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 {
 	int err;
 
@@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 		pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy)
+{
+	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	int err;
+
+	if (ds->ops->port_enable) {
+		err = ds->ops->port_enable(ds, port, phy);
+		if (err)
+			return err;
+	}
+
+	dsa_port_set_state_now(dp, stp_state);
+
+	return 0;
+}
+
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+
+	dsa_port_set_state_now(dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, port, phy);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
 	struct dsa_notifier_bridge_info info = {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0aab29928152..4ea1c6eb0da8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -73,9 +73,7 @@ static int dsa_slave_open(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_port *dp = p->dp;
-	struct dsa_switch *ds = dp->ds;
 	struct net_device *master = dsa_master_netdev(p);
-	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	if (!(master->flags & IFF_UP))
@@ -98,13 +96,9 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_allmulti;
 	}
 
-	if (ds->ops->port_enable) {
-		err = ds->ops->port_enable(ds, p->dp->index, dev->phydev);
-		if (err)
-			goto clear_promisc;
-	}
-
-	dsa_port_set_state_now(p->dp, stp_state);
+	err = dsa_port_enable(dp, dev->phydev);
+	if (err)
+		goto clear_promisc;
 
 	if (dev->phydev)
 		phy_start(dev->phydev);
@@ -128,15 +122,12 @@ static int dsa_slave_close(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct net_device *master = dsa_master_netdev(p);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *dp = p->dp;
 
 	if (dev->phydev)
 		phy_stop(dev->phydev);
 
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
+	dsa_port_disable(dp, dev->phydev);
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v2 2/3] net: dsa: make slave close symmetrical to open
From: Vivien Didelot @ 2017-09-22 19:40 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170922194045.18814-1-vivien.didelot@savoirfairelinux.com>

The DSA slave open function configures the unicast MAC addresses on the
master device, enable the switch port, change its STP state, then start
the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restore the
master device.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/slave.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3760472bf41d..0aab29928152 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -133,6 +133,11 @@ static int dsa_slave_close(struct net_device *dev)
 	if (dev->phydev)
 		phy_stop(dev->phydev);
 
+	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
+
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
 	if (dev->flags & IFF_ALLMULTI)
@@ -143,11 +148,6 @@ static int dsa_slave_close(struct net_device *dev)
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
 
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
-
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
 	return 0;
 }
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v2 1/3] net: dsa: use slave device phydev
From: Vivien Didelot @ 2017-09-22 19:40 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170922194045.18814-1-vivien.didelot@savoirfairelinux.com>

There is no need to store a phy_device in dsa_slave_priv since
net_device already provides one. Simply s/p->phy/dev->phydev/.

While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 126 ++++++++++++++++++++++++++------------------------------
 1 file changed, 58 insertions(+), 68 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 02ace7d462c4..3760472bf41d 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -99,15 +99,15 @@ static int dsa_slave_open(struct net_device *dev)
 	}
 
 	if (ds->ops->port_enable) {
-		err = ds->ops->port_enable(ds, p->dp->index, p->phy);
+		err = ds->ops->port_enable(ds, p->dp->index, dev->phydev);
 		if (err)
 			goto clear_promisc;
 	}
 
 	dsa_port_set_state_now(p->dp, stp_state);
 
-	if (p->phy)
-		phy_start(p->phy);
+	if (dev->phydev)
+		phy_start(dev->phydev);
 
 	return 0;
 
@@ -130,8 +130,8 @@ static int dsa_slave_close(struct net_device *dev)
 	struct net_device *master = dsa_master_netdev(p);
 	struct dsa_switch *ds = p->dp->ds;
 
-	if (p->phy)
-		phy_stop(p->phy);
+	if (dev->phydev)
+		phy_stop(dev->phydev);
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
@@ -144,7 +144,7 @@ static int dsa_slave_close(struct net_device *dev)
 		dev_uc_del(master, dev->dev_addr);
 
 	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, p->phy);
+		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
 
 	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
 
@@ -273,12 +273,10 @@ dsa_slave_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 
 static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
+	if (!dev->phydev)
+		return -ENODEV;
 
-	if (p->phy != NULL)
-		return phy_mii_ioctl(p->phy, ifr, cmd);
-
-	return -EOPNOTSUPP;
+	return phy_mii_ioctl(dev->phydev, ifr, cmd);
 }
 
 static int dsa_slave_port_attr_set(struct net_device *dev,
@@ -435,12 +433,10 @@ static int
 dsa_slave_get_link_ksettings(struct net_device *dev,
 			     struct ethtool_link_ksettings *cmd)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
+	if (!dev->phydev)
+		return -ENODEV;
 
-	if (!p->phy)
-		return -EOPNOTSUPP;
-
-	phy_ethtool_ksettings_get(p->phy, cmd);
+	phy_ethtool_ksettings_get(dev->phydev, cmd);
 
 	return 0;
 }
@@ -449,12 +445,10 @@ static int
 dsa_slave_set_link_ksettings(struct net_device *dev,
 			     const struct ethtool_link_ksettings *cmd)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
+	if (!dev->phydev)
+		return -ENODEV;
 
-	if (p->phy != NULL)
-		return phy_ethtool_ksettings_set(p->phy, cmd);
-
-	return -EOPNOTSUPP;
+	return phy_ethtool_ksettings_set(dev->phydev, cmd);
 }
 
 static void dsa_slave_get_drvinfo(struct net_device *dev,
@@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p)
 
 static int dsa_slave_nway_reset(struct net_device *dev)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
+	if (!dev->phydev)
+		return -ENODEV;
 
-	if (p->phy != NULL)
-		return genphy_restart_aneg(p->phy);
-
-	return -EOPNOTSUPP;
+	return genphy_restart_aneg(dev->phydev);
 }
 
 static u32 dsa_slave_get_link(struct net_device *dev)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
+	if (!dev->phydev)
+		return -ENODEV;
 
-	if (p->phy != NULL) {
-		genphy_update_link(p->phy);
-		return p->phy->link;
-	}
+	genphy_update_link(dev->phydev);
 
-	return -EOPNOTSUPP;
+	return dev->phydev->link;
 }
 
 static int dsa_slave_get_eeprom_len(struct net_device *dev)
@@ -640,7 +630,7 @@ static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e)
 	int ret;
 
 	/* Port's PHY and MAC both need to be EEE capable */
-	if (!p->phy)
+	if (!dev->phydev)
 		return -ENODEV;
 
 	if (!ds->ops->set_mac_eee)
@@ -651,12 +641,12 @@ static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e)
 		return ret;
 
 	if (e->eee_enabled) {
-		ret = phy_init_eee(p->phy, 0);
+		ret = phy_init_eee(dev->phydev, 0);
 		if (ret)
 			return ret;
 	}
 
-	return phy_ethtool_set_eee(p->phy, e);
+	return phy_ethtool_set_eee(dev->phydev, e);
 }
 
 static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
@@ -666,7 +656,7 @@ static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
 	int ret;
 
 	/* Port's PHY and MAC both need to be EEE capable */
-	if (!p->phy)
+	if (!dev->phydev)
 		return -ENODEV;
 
 	if (!ds->ops->get_mac_eee)
@@ -676,7 +666,7 @@ static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
 	if (ret)
 		return ret;
 
-	return phy_ethtool_get_eee(p->phy, e);
+	return phy_ethtool_get_eee(dev->phydev, e);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -985,26 +975,26 @@ static void dsa_slave_adjust_link(struct net_device *dev)
 	struct dsa_switch *ds = p->dp->ds;
 	unsigned int status_changed = 0;
 
-	if (p->old_link != p->phy->link) {
+	if (p->old_link != dev->phydev->link) {
 		status_changed = 1;
-		p->old_link = p->phy->link;
+		p->old_link = dev->phydev->link;
 	}
 
-	if (p->old_duplex != p->phy->duplex) {
+	if (p->old_duplex != dev->phydev->duplex) {
 		status_changed = 1;
-		p->old_duplex = p->phy->duplex;
+		p->old_duplex = dev->phydev->duplex;
 	}
 
-	if (p->old_pause != p->phy->pause) {
+	if (p->old_pause != dev->phydev->pause) {
 		status_changed = 1;
-		p->old_pause = p->phy->pause;
+		p->old_pause = dev->phydev->pause;
 	}
 
 	if (ds->ops->adjust_link && status_changed)
-		ds->ops->adjust_link(ds, p->dp->index, p->phy);
+		ds->ops->adjust_link(ds, p->dp->index, dev->phydev);
 
 	if (status_changed)
-		phy_print_status(p->phy);
+		phy_print_status(dev->phydev);
 }
 
 static int dsa_slave_fixed_link_update(struct net_device *dev,
@@ -1029,17 +1019,18 @@ static int dsa_slave_phy_connect(struct net_device *slave_dev, int addr)
 	struct dsa_slave_priv *p = netdev_priv(slave_dev);
 	struct dsa_switch *ds = p->dp->ds;
 
-	p->phy = mdiobus_get_phy(ds->slave_mii_bus, addr);
-	if (!p->phy) {
+	slave_dev->phydev = mdiobus_get_phy(ds->slave_mii_bus, addr);
+	if (!slave_dev->phydev) {
 		netdev_err(slave_dev, "no phy at %d\n", addr);
 		return -ENODEV;
 	}
 
 	/* Use already configured phy mode */
 	if (p->phy_interface == PHY_INTERFACE_MODE_NA)
-		p->phy_interface = p->phy->interface;
-	return phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
-				  p->phy_interface);
+		p->phy_interface = slave_dev->phydev->interface;
+
+	return phy_connect_direct(slave_dev, slave_dev->phydev,
+				  dsa_slave_adjust_link, p->phy_interface);
 }
 
 static int dsa_slave_phy_setup(struct net_device *slave_dev)
@@ -1091,22 +1082,23 @@ static int dsa_slave_phy_setup(struct net_device *slave_dev)
 				return ret;
 			}
 		} else {
-			p->phy = of_phy_connect(slave_dev, phy_dn,
-						dsa_slave_adjust_link,
-						phy_flags,
-						p->phy_interface);
+			slave_dev->phydev = of_phy_connect(slave_dev, phy_dn,
+							   dsa_slave_adjust_link,
+							   phy_flags,
+							   p->phy_interface);
 		}
 
 		of_node_put(phy_dn);
 	}
 
-	if (p->phy && phy_is_fixed)
-		fixed_phy_set_link_update(p->phy, dsa_slave_fixed_link_update);
+	if (slave_dev->phydev && phy_is_fixed)
+		fixed_phy_set_link_update(slave_dev->phydev,
+					  dsa_slave_fixed_link_update);
 
 	/* We could not connect to a designated PHY, so use the switch internal
 	 * MDIO bus instead
 	 */
-	if (!p->phy) {
+	if (!slave_dev->phydev) {
 		ret = dsa_slave_phy_connect(slave_dev, p->dp->index);
 		if (ret) {
 			netdev_err(slave_dev, "failed to connect to port %d: %d\n",
@@ -1117,7 +1109,7 @@ static int dsa_slave_phy_setup(struct net_device *slave_dev)
 		}
 	}
 
-	phy_attached_info(p->phy);
+	phy_attached_info(slave_dev->phydev);
 
 	return 0;
 }
@@ -1137,12 +1129,12 @@ int dsa_slave_suspend(struct net_device *slave_dev)
 
 	netif_device_detach(slave_dev);
 
-	if (p->phy) {
-		phy_stop(p->phy);
+	if (slave_dev->phydev) {
+		phy_stop(slave_dev->phydev);
 		p->old_pause = -1;
 		p->old_link = -1;
 		p->old_duplex = -1;
-		phy_suspend(p->phy);
+		phy_suspend(slave_dev->phydev);
 	}
 
 	return 0;
@@ -1150,13 +1142,11 @@ int dsa_slave_suspend(struct net_device *slave_dev)
 
 int dsa_slave_resume(struct net_device *slave_dev)
 {
-	struct dsa_slave_priv *p = netdev_priv(slave_dev);
-
 	netif_device_attach(slave_dev);
 
-	if (p->phy) {
-		phy_resume(p->phy);
-		phy_start(p->phy);
+	if (slave_dev->phydev) {
+		phy_resume(slave_dev->phydev);
+		phy_start(slave_dev->phydev);
 	}
 
 	return 0;
@@ -1249,8 +1239,8 @@ void dsa_slave_destroy(struct net_device *slave_dev)
 	port_dn = p->dp->dn;
 
 	netif_carrier_off(slave_dev);
-	if (p->phy) {
-		phy_disconnect(p->phy);
+	if (slave_dev->phydev) {
+		phy_disconnect(slave_dev->phydev);
 
 		if (of_phy_is_fixed_link(port_dn))
 			of_phy_deregister_fixed_link(port_dn);
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v2 0/3] net: dsa: use slave device phydev
From: Vivien Didelot @ 2017-09-22 19:40 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

This patchset removes the private phy_device in favor of the one
provided by the slave net_device, makes slave open and close symmetrical
and finally provides helpers for enabling or disabling a DSA port.

Changes in v2:
  - do not remove the phy argument from port enable/disable

Vivien Didelot (3):
  net: dsa: use slave device phydev
  net: dsa: make slave close symmetrical to open
  net: dsa: add port enable and disable helpers

 net/dsa/dsa_priv.h |   3 +-
 net/dsa/port.c     |  31 +++++++++++-
 net/dsa/slave.c    | 143 +++++++++++++++++++++++------------------------------
 3 files changed, 94 insertions(+), 83 deletions(-)

-- 
2.14.1

^ permalink raw reply

* Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
From: Andrew Lunn @ 2017-09-22 19:11 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller
In-Reply-To: <6d7799bb-2b33-0696-1805-63cea5e52667@gmail.com>

> Historical reasons mostly. Considering the complexity of
> dsa_slave_phy_setup(), I would certainly be extremely careful in
> changing any of this, the potential for breakage is pretty big.

Yes, i took a look at this, wondering how to convert to phylink. I
went away and got a stiff drink :-)

     Andrew

^ permalink raw reply

* [PATCH] r8152:  add Linksys USB3GIGV1 id
From: Grant Grundler @ 2017-09-22 19:06 UTC (permalink / raw)
  To: Hayes Wang
  Cc: linux-usb, David S . Miller, LKML, netdev-u79uwXL29TY76Z2rM5mHXA,
	Grant Grundler

This Linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
   Bus 002 Device 002: ID 13b1:0041 Linksys

Signed-off-by: Grant Grundler <grundler-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
 drivers/net/usb/r8152.c | 2 ++
 1 file changed, 2 insertions(+)

This was tested on chromeos-3.14, chromeos-3.18, and chromeos-4.4 kernels
with a mix of ARM/x86-64 systems.

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ceb78e2ea4f0..941ece08ba78 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -613,6 +613,7 @@ enum rtl8152_flags {
 #define VENDOR_ID_MICROSOFT		0x045e
 #define VENDOR_ID_SAMSUNG		0x04e8
 #define VENDOR_ID_LENOVO		0x17ef
+#define VENDOR_ID_LINKSYS		0x13b1
 #define VENDOR_ID_NVIDIA		0x0955
 
 #define MCU_TYPE_PLA			0x0100
@@ -5316,6 +5317,7 @@ static const struct usb_device_id rtl8152_table[] = {
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x720c)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7214)},
+	{REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA,  0x09ff)},
 	{}
 };
-- 
2.14.1.821.g8fa685d3b7-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox