Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net-next  1/3] tipc: set default MTU for UDP media
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

Currently, all bearers are configured with MTU value same as the
underlying L2 device. However, in case of bearers with media type
UDP, higher throughput is possible with a fixed and higher emulated
MTU value than adapting to the underlying L2 MTU.

In this commit, we introduce a parameter mtu in struct tipc_media
and a default value is set for UDP. A default value of 14k
was determined by experimentation and found to have a higher throughput
than 16k. MTU for UDP bearers are assigned the above set value of
media MTU.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
 include/uapi/linux/tipc_config.h | 5 +++++
 net/tipc/udp_media.c             | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/tipc_config.h b/include/uapi/linux/tipc_config.h
index 3f29e3c8ed06..4b2c93b1934c 100644
--- a/include/uapi/linux/tipc_config.h
+++ b/include/uapi/linux/tipc_config.h
@@ -185,6 +185,11 @@
 #define TIPC_DEF_LINK_WIN 50
 #define TIPC_MAX_LINK_WIN 8191
 
+/*
+ * Default MTU for UDP media
+ */
+
+#define TIPC_DEF_LINK_UDP_MTU 14000
 
 struct tipc_node_info {
 	__be32 addr;			/* network address of node */
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index e7d91f5d5cae..9783101bc4a9 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -713,8 +713,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
 			err = -EINVAL;
 			goto err;
 		}
-		b->mtu = dev->mtu - sizeof(struct iphdr)
-			- sizeof(struct udphdr);
+		b->mtu = b->media->mtu;
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (local.proto == htons(ETH_P_IPV6)) {
 		udp_conf.family = AF_INET6;
@@ -803,6 +802,7 @@ struct tipc_media udp_media_info = {
 	.priority	= TIPC_DEF_LINK_PRI,
 	.tolerance	= TIPC_DEF_LINK_TOL,
 	.window		= TIPC_DEF_LINK_WIN,
+	.mtu		= TIPC_DEF_LINK_UDP_MTU,
 	.type_id	= TIPC_MEDIA_TYPE_UDP,
 	.hwaddr_len	= 0,
 	.name		= "udp"
-- 
2.1.4

^ permalink raw reply related

* [net-next  0/3] tipc: Confgiuration of MTU for media UDP
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem

Systematic measurements have shown that an emulated MTU of 14k for
UDP bearers is the optimal value for maximal throughput. Accordingly,
the default MTU of UDP bearers is changed to 14k.

We also provide users with a fallback option from this value,
by providing support to configure MTU for UDP bearers. The following
options are introduced which are symmetrical to the design of
confguring link tolerance.

- Configure media with new MTU value, which will take effect on
links going up after the moment it was configured. Alternatively,
the bearer has to be disabled and re-enabled, for existing links to
reflect the configured value.

- Configure bearer with new MTU value, which take effect on 
running links dynamically.

Please note:
- User has to change MTU at both endpoints, otherwise the link 
will fall back to smallest MTU after a reset.
- Failover from a link with higher MTU to a link with lower MTU

GhantaKrishnamurthy MohanKrishna (3):
  tipc: set default MTU for UDP media
  tipc: implement configuration of UDP media MTU
  tipc: confgiure and apply UDP bearer MTU on running links

 include/uapi/linux/tipc_config.h  |  5 +++++
 include/uapi/linux/tipc_netlink.h |  1 +
 net/tipc/bearer.c                 | 29 ++++++++++++++++++++++++++++-
 net/tipc/bearer.h                 |  3 +++
 net/tipc/node.c                   | 12 +++++++++---
 net/tipc/node.h                   |  2 +-
 net/tipc/udp_media.c              |  4 ++--
 net/tipc/udp_media.h              | 14 ++++++++++++++
 8 files changed, 63 insertions(+), 7 deletions(-)

-- 
2.1.4

^ permalink raw reply

* RE: [PATCH] net: phy: marvell: clear wol event before setting it
From: Bhadram Varka @ 2018-04-19  9:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <20180419165351.5388021e@xhacker.debian>

Hi,

> -----Original Message-----
> From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> Sent: Thursday, April 19, 2018 2:24 PM
> To: Bhadram Varka <vbhadram@nvidia.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> David S. Miller <davem@davemloft.net>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Jingju Hou <Jingju.Hou@synaptics.com>
> Subject: Re: [PATCH] net: phy: marvell: clear wol event before setting it
> 
> Hi,
> 
> On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:
> 
> > Hi,
> >
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > > Behalf Of Jisheng Zhang
> > > Sent: Thursday, April 19, 2018 1:33 PM
> > > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
> > > <f.fainelli@gmail.com>; David S. Miller <davem@davemloft.net>
> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > > <Jingju.Hou@synaptics.com>
> > > Subject: [PATCH] net: phy: marvell: clear wol event before setting
> > > it
> > >
> > > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > >
> > > If WOL event happened once, the LED[2] interrupt pin will not be
> > > cleared unless reading the CSISR register. So clear the WOL event before
> enabling it.
> > >
> > > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > > ---
> > >  drivers/net/phy/marvell.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > index c22e8e383247..b6abe1cbc84b 100644
> > > --- a/drivers/net/phy/marvell.c
> > > +++ b/drivers/net/phy/marvell.c
> > > @@ -115,6 +115,9 @@
> > >  /* WOL Event Interrupt Enable */
> > >  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> > >
> > > +/* Copper Specific Interrupt Status Register */
> > > +#define MII_88E1318S_PHY_CSISR				0x13
> > > +
> >
> > There is already macro to represent this register - MII_M1011_IEVENT. Do we
> need this macro ?
> 
> Good point. Will use MII_M1011_IEVENT instead in v2.
> 
> >
> > >  /* LED Timer Control Register */
> > >  #define MII_88E1318S_PHY_LED_TCR			0x12
> > >  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> > > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > > *phydev,
> > >  		if (err < 0)
> > >  			goto error;
> > >
> > > +		/* If WOL event happened once, the LED[2] interrupt pin
> > > +		 * will not be cleared unless reading the CSISR register.
> > > +		 * So clear the WOL event first before enabling it.
> > > +		 */
> > > +		phy_read(phydev, MII_88E1318S_PHY_CSISR);
> >
> > This part of the operation already taken care by ack_interrupt and
> > did_interrupt [....] .ack_interrupt = &marvell_ack_interrupt,
> > .did_interrupt = &m88e1121_did_interrupt, [...]
> >
> > If at all WOL event occurred marvell_ack_interrupt will take care of clearing the
> interrupt status register.
> > Am I missing anything here ?
> 
> If there's no valid irq for phy, the ack_interrupt/did_interrupt won't be called.

Which means that the PHY is not having Interrupt pin ?

Generally through PHY interrupt will wake up the system right. If there is no interrupt pin then how the system will wake up the from suspend for the magic packet.?

Thanks!

^ permalink raw reply

* [PATCH 5/5] ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Vincent Bernat, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Vincent Bernat <vincent@bernat.im>

The sh/dh/lblc/lblcr algorithms are using Knuth's multiplicative
hashing incorrectly. Replace its use by the hash_32() macro, which
correctly implements this algorithm. It doesn't use the same constant,
but it shouldn't matter.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_dh.c    | 3 ++-
 net/netfilter/ipvs/ip_vs_lblc.c  | 3 ++-
 net/netfilter/ipvs/ip_vs_lblcr.c | 3 ++-
 net/netfilter/ipvs/ip_vs_sh.c    | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 75f798f8e83b..07459e71d907 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -43,6 +43,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
+#include <linux/hash.h>
 
 #include <net/ip_vs.h>
 
@@ -81,7 +82,7 @@ static inline unsigned int ip_vs_dh_hashkey(int af, const union nf_inet_addr *ad
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_DH_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_DH_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 3057e453bf31..08147fc6400c 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -48,6 +48,7 @@
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/jiffies.h>
+#include <linux/hash.h>
 
 /* for sysctl */
 #include <linux/fs.h>
@@ -160,7 +161,7 @@ ip_vs_lblc_hashkey(int af, const union nf_inet_addr *addr)
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLC_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_LBLC_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 92adc04557ed..9b6a6c9e9cfa 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -47,6 +47,7 @@
 #include <linux/jiffies.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/hash.h>
 
 /* for sysctl */
 #include <linux/fs.h>
@@ -323,7 +324,7 @@ ip_vs_lblcr_hashkey(int af, const union nf_inet_addr *addr)
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLCR_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_LBLCR_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 16aaac6eedc9..1e01c782583a 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -96,7 +96,8 @@ ip_vs_sh_hashkey(int af, const union nf_inet_addr *addr,
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (offset + (ntohs(port) + ntohl(addr_fold))*2654435761UL) &
+	return (offset + hash_32(ntohs(port) + ntohl(addr_fold),
+				 IP_VS_SH_TAB_BITS)) &
 		IP_VS_SH_TAB_MASK;
 }
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH 3/5] netfilter: ipvs: Add Maglev hashing scheduler
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 15318 bytes --]

From: Inju Song <inju.song@navercorp.com>

Implements the Google's Maglev hashing algorithm as a IPVS scheduler.

Basically it provides consistent hashing butÂ offers some special
features about disruption and load balancing.

 1)Â minimal disruption: when the set of destinations changes,
    a connection will likely be sent to the same destination
    as it was before.

 2)Â load balancing: each destination will receive an almost
    equal number of connections.

Seel also for detail: [3.4 Consistent Hasing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_mh.c | 540 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 540 insertions(+)
 create mode 100644 net/netfilter/ipvs/ip_vs_mh.c

diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
new file mode 100644
index 000000000000..0f795b186eb3
--- /dev/null
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -0,0 +1,540 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPVS:	Maglev Hashing scheduling module
+ *
+ * Authors:	Inju Song <inju.song@navercorp.com>
+ *
+ */
+
+/* The mh algorithm is to assign a preference list of all the lookup
+ * table positions to each destination and populate the table with
+ * the most-preferred position of destinations. Then it is to select
+ * destination with the hash key of source IP address through looking
+ * up a the lookup table.
+ *
+ * The algorithm is detailed in:
+ * [3.4 Consistent Hasing]
+https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
+ *
+ */
+
+#define KMSG_COMPONENT "IPVS"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/ip.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+
+#include <net/ip_vs.h>
+
+#include <linux/siphash.h>
+#include <linux/bitops.h>
+#include <linux/gcd.h>
+
+#define IP_VS_SVC_F_SCHED_MH_FALLBACK	IP_VS_SVC_F_SCHED1 /* MH fallback */
+#define IP_VS_SVC_F_SCHED_MH_PORT	IP_VS_SVC_F_SCHED2 /* MH use port */
+
+struct ip_vs_mh_lookup {
+	struct ip_vs_dest __rcu	*dest;	/* real server (cache) */
+};
+
+struct ip_vs_mh_dest_setup {
+	unsigned int	offset; /* starting offset */
+	unsigned int	skip;	/* skip */
+	unsigned int	perm;	/* next_offset */
+	int		turns;	/* weight / gcd() and rshift */
+};
+
+/* Available prime numbers for MH table */
+static int primes[] = {251, 509, 1021, 2039, 4093,
+		       8191, 16381, 32749, 65521, 131071};
+
+/* For IPVS MH entry hash table */
+#ifndef CONFIG_IP_VS_MH_TAB_INDEX
+#define CONFIG_IP_VS_MH_TAB_INDEX	12
+#endif
+#define IP_VS_MH_TAB_BITS		(CONFIG_IP_VS_MH_TAB_INDEX / 2)
+#define IP_VS_MH_TAB_INDEX		(CONFIG_IP_VS_MH_TAB_INDEX - 8)
+#define IP_VS_MH_TAB_SIZE               primes[IP_VS_MH_TAB_INDEX]
+
+struct ip_vs_mh_state {
+	struct rcu_head			rcu_head;
+	struct ip_vs_mh_lookup		*lookup;
+	struct ip_vs_mh_dest_setup	*dest_setup;
+	hsiphash_key_t			hash1, hash2;
+	int				gcd;
+	int				rshift;
+};
+
+static inline void generate_hash_secret(hsiphash_key_t *hash1,
+					hsiphash_key_t *hash2)
+{
+	hash1->key[0] = 2654435761UL;
+	hash1->key[1] = 2654435761UL;
+
+	hash2->key[0] = 2654446892UL;
+	hash2->key[1] = 2654446892UL;
+}
+
+/* Helper function to determine if server is unavailable */
+static inline bool is_unavailable(struct ip_vs_dest *dest)
+{
+	return atomic_read(&dest->weight) <= 0 ||
+	       dest->flags & IP_VS_DEST_F_OVERLOAD;
+}
+
+/* Returns hash value for IPVS MH entry */
+static inline unsigned int
+ip_vs_mh_hashkey(int af, const union nf_inet_addr *addr,
+		 __be16 port, hsiphash_key_t *key, unsigned int offset)
+{
+	unsigned int v;
+	__be32 addr_fold = addr->ip;
+
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6)
+		addr_fold = addr->ip6[0] ^ addr->ip6[1] ^
+			    addr->ip6[2] ^ addr->ip6[3];
+#endif
+	v = (offset + ntohs(port) + ntohl(addr_fold));
+	return hsiphash(&v, sizeof(v), key);
+}
+
+/* Reset all the hash buckets of the specified table. */
+static void ip_vs_mh_reset(struct ip_vs_mh_state *s)
+{
+	int i;
+	struct ip_vs_mh_lookup *l;
+	struct ip_vs_dest *dest;
+
+	l = &s->lookup[0];
+	for (i = 0; i < IP_VS_MH_TAB_SIZE; i++) {
+		dest = rcu_dereference_protected(l->dest, 1);
+		if (dest) {
+			ip_vs_dest_put(dest);
+			RCU_INIT_POINTER(l->dest, NULL);
+		}
+		l++;
+	}
+}
+
+static int ip_vs_mh_permutate(struct ip_vs_mh_state *s,
+			      struct ip_vs_service *svc)
+{
+	struct list_head *p;
+	struct ip_vs_mh_dest_setup *ds;
+	struct ip_vs_dest *dest;
+	int lw;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, skip
+	 * permutation for the dests.
+	 */
+	if (s->gcd < 1)
+		return 0;
+
+	/* Set dest_setup for the dests permutation */
+	p = &svc->destinations;
+	ds = &s->dest_setup[0];
+	while ((p = p->next) != &svc->destinations) {
+		dest = list_entry(p, struct ip_vs_dest, n_list);
+
+		ds->offset = ip_vs_mh_hashkey(svc->af, &dest->addr,
+					      dest->port, &s->hash1, 0) %
+					      IP_VS_MH_TAB_SIZE;
+		ds->skip = ip_vs_mh_hashkey(svc->af, &dest->addr,
+					    dest->port, &s->hash2, 0) %
+					    (IP_VS_MH_TAB_SIZE - 1) + 1;
+		ds->perm = ds->offset;
+
+		lw = atomic_read(&dest->last_weight);
+		ds->turns = ((lw / s->gcd) >> s->rshift) ? : (lw != 0);
+		ds++;
+	}
+
+	return 0;
+}
+
+static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
+			     struct ip_vs_service *svc)
+{
+	int n, c, dt_count;
+	unsigned long *table;
+	struct list_head *p;
+	struct ip_vs_mh_dest_setup *ds;
+	struct ip_vs_dest *dest, *new_dest;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, skip
+	 * the population for the dests and reset lookup table.
+	 */
+	if (s->gcd < 1) {
+		ip_vs_mh_reset(s);
+		return 0;
+	}
+
+	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+			 sizeof(unsigned long), GFP_KERNEL);
+	if (!table)
+		return -ENOMEM;
+
+	p = &svc->destinations;
+	n = 0;
+	dt_count = 0;
+	while (n < IP_VS_MH_TAB_SIZE) {
+		if (p == &svc->destinations)
+			p = p->next;
+
+		ds = &s->dest_setup[0];
+		while (p != &svc->destinations) {
+			/* Ignore added server with zero weight */
+			if (ds->turns < 1) {
+				p = p->next;
+				ds++;
+				continue;
+			}
+
+			c = ds->perm;
+			while (test_bit(c, table)) {
+				/* Add skip, mod IP_VS_MH_TAB_SIZE */
+				ds->perm += ds->skip;
+				if (ds->perm >= IP_VS_MH_TAB_SIZE)
+					ds->perm -= IP_VS_MH_TAB_SIZE;
+				c = ds->perm;
+			}
+
+			__set_bit(c, table);
+
+			dest = rcu_dereference_protected(s->lookup[c].dest, 1);
+			new_dest = list_entry(p, struct ip_vs_dest, n_list);
+			if (dest != new_dest) {
+				if (dest)
+					ip_vs_dest_put(dest);
+				ip_vs_dest_hold(new_dest);
+				RCU_INIT_POINTER(s->lookup[c].dest, new_dest);
+			}
+
+			if (++n == IP_VS_MH_TAB_SIZE)
+				goto out;
+
+			if (++dt_count >= ds->turns) {
+				dt_count = 0;
+				p = p->next;
+				ds++;
+			}
+		}
+	}
+
+out:
+	kfree(table);
+	return 0;
+}
+
+/* Get ip_vs_dest associated with supplied parameters. */
+static inline struct ip_vs_dest *
+ip_vs_mh_get(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+	     const union nf_inet_addr *addr, __be16 port)
+{
+	unsigned int hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1, 0)
+					     % IP_VS_MH_TAB_SIZE;
+	struct ip_vs_dest *dest = rcu_dereference(s->lookup[hash].dest);
+
+	return (!dest || is_unavailable(dest)) ? NULL : dest;
+}
+
+/* As ip_vs_mh_get, but with fallback if selected server is unavailable */
+static inline struct ip_vs_dest *
+ip_vs_mh_get_fallback(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+		      const union nf_inet_addr *addr, __be16 port)
+{
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
+	struct ip_vs_dest *dest;
+
+	/* First try the dest it's supposed to go to */
+	ihash = ip_vs_mh_hashkey(svc->af, addr, port,
+				 &s->hash1, 0) % IP_VS_MH_TAB_SIZE;
+	dest = rcu_dereference(s->lookup[ihash].dest);
+	if (!dest)
+		return NULL;
+	if (!is_unavailable(dest))
+		return dest;
+
+	IP_VS_DBG_BUF(6, "MH: selected unavailable server %s:%u, reselecting",
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
+
+	/* If the original dest is unavailable, loop around the table
+	 * starting from ihash to find a new dest
+	 */
+	for (offset = 0; offset < IP_VS_MH_TAB_SIZE; offset++) {
+		roffset = (offset + ihash) % IP_VS_MH_TAB_SIZE;
+		hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1,
+					roffset) % IP_VS_MH_TAB_SIZE;
+		dest = rcu_dereference(s->lookup[hash].dest);
+		if (!dest)
+			break;
+		if (!is_unavailable(dest))
+			return dest;
+		IP_VS_DBG_BUF(6,
+			      "MH: selected unavailable server %s:%u (offset %u), reselecting",
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+			      ntohs(dest->port), roffset);
+	}
+
+	return NULL;
+}
+
+/* Assign all the hash buckets of the specified table with the service. */
+static int ip_vs_mh_reassign(struct ip_vs_mh_state *s,
+			     struct ip_vs_service *svc)
+{
+	int ret;
+
+	if (svc->num_dests > IP_VS_MH_TAB_SIZE)
+		return -EINVAL;
+
+	if (svc->num_dests >= 1) {
+		s->dest_setup = kcalloc(svc->num_dests,
+					sizeof(struct ip_vs_mh_dest_setup),
+					GFP_KERNEL);
+		if (!s->dest_setup)
+			return -ENOMEM;
+	}
+
+	ip_vs_mh_permutate(s, svc);
+
+	ret = ip_vs_mh_populate(s, svc);
+	if (ret < 0)
+		goto out;
+
+	IP_VS_DBG_BUF(6, "MH: reassign lookup table of %s:%u\n",
+		      IP_VS_DBG_ADDR(svc->af, &svc->addr),
+		      ntohs(svc->port));
+
+out:
+	if (svc->num_dests >= 1) {
+		kfree(s->dest_setup);
+		s->dest_setup = NULL;
+	}
+	return ret;
+}
+
+static int ip_vs_mh_gcd_weight(struct ip_vs_service *svc)
+{
+	struct ip_vs_dest *dest;
+	int weight;
+	int g = 0;
+
+	list_for_each_entry(dest, &svc->destinations, n_list) {
+		weight = atomic_read(&dest->last_weight);
+		if (weight > 0) {
+			if (g > 0)
+				g = gcd(weight, g);
+			else
+				g = weight;
+		}
+	}
+	return g;
+}
+
+/* To avoid assigning huge weight for the MH table,
+ * calculate shift value with gcd.
+ */
+static int ip_vs_mh_shift_weight(struct ip_vs_service *svc, int gcd)
+{
+	struct ip_vs_dest *dest;
+	int new_weight, weight = 0;
+	int mw, shift;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, return
+	 * shift value as zero.
+	 */
+	if (gcd < 1)
+		return 0;
+
+	list_for_each_entry(dest, &svc->destinations, n_list) {
+		new_weight = atomic_read(&dest->last_weight);
+		if (new_weight > weight)
+			weight = new_weight;
+	}
+
+	/* Because gcd is greater than zero,
+	 * the maximum weight and gcd are always greater than zero
+	 */
+	mw = weight / gcd;
+
+	/* shift = occupied bits of weight/gcd - MH highest bits */
+	shift = fls(mw) - IP_VS_MH_TAB_BITS;
+	return (shift >= 0) ? shift : 0;
+}
+
+static void ip_vs_mh_state_free(struct rcu_head *head)
+{
+	struct ip_vs_mh_state *s;
+
+	s = container_of(head, struct ip_vs_mh_state, rcu_head);
+	kfree(s->lookup);
+	kfree(s);
+}
+
+static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
+{
+	int ret;
+	struct ip_vs_mh_state *s;
+
+	/* Allocate the MH table for this service */
+	s = kzalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
+			    GFP_KERNEL);
+	if (!s->lookup) {
+		kfree(s);
+		return -ENOMEM;
+	}
+
+	generate_hash_secret(&s->hash1, &s->hash2);
+	s->gcd = ip_vs_mh_gcd_weight(svc);
+	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+	IP_VS_DBG(6,
+		  "MH lookup table (memory=%zdbytes) allocated for current service\n",
+		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+
+	/* Assign the lookup table with current dests */
+	ret = ip_vs_mh_reassign(s, svc);
+	if (ret < 0) {
+		ip_vs_mh_reset(s);
+		ip_vs_mh_state_free(&s->rcu_head);
+		return ret;
+	}
+
+	/* No more failures, attach state */
+	svc->sched_data = s;
+	return 0;
+}
+
+static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
+{
+	struct ip_vs_mh_state *s = svc->sched_data;
+
+	/* Got to clean up lookup entry here */
+	ip_vs_mh_reset(s);
+
+	call_rcu(&s->rcu_head, ip_vs_mh_state_free);
+	IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
+		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+}
+
+static int ip_vs_mh_dest_changed(struct ip_vs_service *svc,
+				 struct ip_vs_dest *dest)
+{
+	struct ip_vs_mh_state *s = svc->sched_data;
+
+	s->gcd = ip_vs_mh_gcd_weight(svc);
+	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+	/* Assign the lookup table with the updated service */
+	return ip_vs_mh_reassign(s, svc);
+}
+
+/* Helper function to get port number */
+static inline __be16
+ip_vs_mh_get_port(const struct sk_buff *skb, struct ip_vs_iphdr *iph)
+{
+	__be16 _ports[2], *ports;
+
+	/* At this point we know that we have a valid packet of some kind.
+	 * Because ICMP packets are only guaranteed to have the first 8
+	 * bytes, let's just grab the ports.  Fortunately they're in the
+	 * same position for all three of the protocols we care about.
+	 */
+	switch (iph->protocol) {
+	case IPPROTO_TCP:
+	case IPPROTO_UDP:
+	case IPPROTO_SCTP:
+		ports = skb_header_pointer(skb, iph->len, sizeof(_ports),
+					   &_ports);
+		if (unlikely(!ports))
+			return 0;
+
+		if (likely(!ip_vs_iph_inverse(iph)))
+			return ports[0];
+		else
+			return ports[1];
+	default:
+		return 0;
+	}
+}
+
+/* Maglev Hashing scheduling */
+static struct ip_vs_dest *
+ip_vs_mh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
+		  struct ip_vs_iphdr *iph)
+{
+	struct ip_vs_dest *dest;
+	struct ip_vs_mh_state *s;
+	__be16 port = 0;
+	const union nf_inet_addr *hash_addr;
+
+	hash_addr = ip_vs_iph_inverse(iph) ? &iph->daddr : &iph->saddr;
+
+	IP_VS_DBG(6, "%s : Scheduling...\n", __func__);
+
+	if (svc->flags & IP_VS_SVC_F_SCHED_MH_PORT)
+		port = ip_vs_mh_get_port(skb, iph);
+
+	s = (struct ip_vs_mh_state *)svc->sched_data;
+
+	if (svc->flags & IP_VS_SVC_F_SCHED_MH_FALLBACK)
+		dest = ip_vs_mh_get_fallback(svc, s, hash_addr, port);
+	else
+		dest = ip_vs_mh_get(svc, s, hash_addr, port);
+
+	if (!dest) {
+		ip_vs_scheduler_err(svc, "no destination available");
+		return NULL;
+	}
+
+	IP_VS_DBG_BUF(6, "MH: source IP address %s:%u --> server %s:%u\n",
+		      IP_VS_DBG_ADDR(svc->af, hash_addr),
+		      ntohs(port),
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+		      ntohs(dest->port));
+
+	return dest;
+}
+
+/* IPVS MH Scheduler structure */
+static struct ip_vs_scheduler ip_vs_mh_scheduler = {
+	.name =			"mh",
+	.refcnt =		ATOMIC_INIT(0),
+	.module =		THIS_MODULE,
+	.n_list	 =		LIST_HEAD_INIT(ip_vs_mh_scheduler.n_list),
+	.init_service =		ip_vs_mh_init_svc,
+	.done_service =		ip_vs_mh_done_svc,
+	.add_dest =		ip_vs_mh_dest_changed,
+	.del_dest =		ip_vs_mh_dest_changed,
+	.upd_dest =		ip_vs_mh_dest_changed,
+	.schedule =		ip_vs_mh_schedule,
+};
+
+static int __init ip_vs_mh_init(void)
+{
+	return register_ip_vs_scheduler(&ip_vs_mh_scheduler);
+}
+
+static void __exit ip_vs_mh_cleanup(void)
+{
+	unregister_ip_vs_scheduler(&ip_vs_mh_scheduler);
+	rcu_barrier();
+}
+
+module_init(ip_vs_mh_init);
+module_exit(ip_vs_mh_cleanup);
+MODULE_DESCRIPTION("Maglev hashing ipvs scheduler");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Inju Song <inju.song@navercorp.com>");
-- 
2.11.0

^ permalink raw reply related

* [PATCH 4/5] netfilter: ipvs: Add configurations of Maglev hashing
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 3306 bytes --]

From: Inju Song <inju.song@navercorp.com>

To build the maglev hashing scheduler, add some configuration
to Kconfig and Makefile.

 - The compile configurations of MH are added to the Kconfig.

 - The MH build rule is added to the Makefile.

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/Kconfig  | 37 +++++++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/Makefile |  1 +
 2 files changed, 38 insertions(+)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index b32fb0dbe237..05dc1b77e466 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -225,6 +225,25 @@ config	IP_VS_SH
 	  If you want to compile it in kernel, say Y. To compile it as a
 	  module, choose M here. If unsure, say N.
 
+config	IP_VS_MH
+	tristate "maglev hashing scheduling"
+	---help---
+	  The maglev consistent hashing scheduling algorithm provides the
+	  Google's Maglev hashing algorithm as a IPVS scheduler. It assigns
+	  network connections to the servers through looking up a statically
+	  assigned special hash table called the lookup table. Maglev hashing
+	  is to assign a preference list of all the lookup table positions
+	  to each destination.
+
+	  Through this operation, The maglev hashing gives an almost equal
+	  share of the lookup table to each of the destinations and provides
+	  minimal disruption by using the lookup table. When the set of
+	  destinations changes, a connection will likely be sent to the same
+	  destination as it was before.
+
+	  If you want to compile it in kernel, say Y. To compile it as a
+	  module, choose M here. If unsure, say N.
+
 config	IP_VS_SED
 	tristate "shortest expected delay scheduling"
 	---help---
@@ -266,6 +285,24 @@ config IP_VS_SH_TAB_BITS
 	  needs to be large enough to effectively fit all the destinations
 	  multiplied by their respective weights.
 
+comment 'IPVS MH scheduler'
+
+config IP_VS_MH_TAB_INDEX
+	int "IPVS maglev hashing table index of size (the prime numbers)"
+	range 8 17
+	default 12
+	---help---
+	  The maglev hashing scheduler maps source IPs to destinations
+	  stored in a hash table. This table is assigned by a preference
+	  list of the positions to each destination until all slots in
+	  the table are filled. The index determines the prime for size of
+	  the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
+	  65521 or 131071. When using weights to allow destinations to
+	  receive more connections, the table is assigned an amount
+	  proportional to the weights specified. The table needs to be large
+	  enough to effectively fit all the destinations multiplied by their
+	  respective weights.
+
 comment 'IPVS application helper'
 
 config	IP_VS_FTP
diff --git a/net/netfilter/ipvs/Makefile b/net/netfilter/ipvs/Makefile
index c552993fa4b9..bfce2677fda2 100644
--- a/net/netfilter/ipvs/Makefile
+++ b/net/netfilter/ipvs/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_IP_VS_LBLC) += ip_vs_lblc.o
 obj-$(CONFIG_IP_VS_LBLCR) += ip_vs_lblcr.o
 obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o
 obj-$(CONFIG_IP_VS_SH) += ip_vs_sh.o
+obj-$(CONFIG_IP_VS_MH) += ip_vs_mh.o
 obj-$(CONFIG_IP_VS_SED) += ip_vs_sed.o
 obj-$(CONFIG_IP_VS_NQ) += ip_vs_nq.o
 
-- 
2.11.0

^ permalink raw reply related

* [GIT PULL 0/5] IPVS Updates for v4.18
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider these IPVS enhancements for v4.18.

* Whitepace cleanup

* Add Maglev hashing algorithm as a IPVS scheduler

  Inju Song says "Implements the Google's Maglev hashing algorithm as a
  IPVS scheduler.  Basically it provides consistent hashing but offers some
  special features about disruption and load balancing.

  1) minimal disruption: when the set of destinations changes,
     a connection will likely be sent to the same destination
     as it was before.

  2) load balancing: each destination will receive an almost
     equal number of connections.

 Seel also: [3.4 Consistent Hasing] in
 https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
 "

* Fix to correct implementation of Knuth's multiplicative hashing
  which is used in sh/dh/lblc/lblcr algorithms. Instead the
  implementation provided by the hash_32() macro is used.

The following changes since commit 159f02977b2feb18a4bece5e586c838a6d26d44b:

  Merge branch 'net-mvneta-improve-suspend-resume' (2018-04-02 11:14:03 -0400)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v4.18

for you to fetch changes up to 9a17740e0ea1c9b1edd89836bb27c76272f54641:

  ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms (2018-04-09 10:15:27 +0300)

----------------------------------------------------------------
Arvind Yadav (1):
      netfilter: ipvs: Fix space before '[' error.

Inju Song (3):
      netfilter: ipvs: Keep latest weight of destination
      netfilter: ipvs: Add Maglev hashing scheduler
      netfilter: ipvs: Add configurations of Maglev hashing

Vincent Bernat (1):
      ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms

 include/net/ip_vs.h                  |   1 +
 net/netfilter/ipvs/Kconfig           |  37 +++
 net/netfilter/ipvs/Makefile          |   1 +
 net/netfilter/ipvs/ip_vs_ctl.c       |   4 +
 net/netfilter/ipvs/ip_vs_dh.c        |   3 +-
 net/netfilter/ipvs/ip_vs_lblc.c      |   3 +-
 net/netfilter/ipvs/ip_vs_lblcr.c     |   3 +-
 net/netfilter/ipvs/ip_vs_mh.c        | 540 +++++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/ip_vs_proto_tcp.c |   4 +-
 net/netfilter/ipvs/ip_vs_sh.c        |   3 +-
 10 files changed, 593 insertions(+), 6 deletions(-)
 create mode 100644 net/netfilter/ipvs/ip_vs_mh.c

^ permalink raw reply

* [PATCH 2/5] netfilter: ipvs: Keep latest weight of destination
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Inju Song <inju.song@navercorp.com>

The hashing table in scheduler such as source hash or maglev hash
should ignore the changed weight to 0 and allow changing the weight
from/to non-0 values. So, struct ip_vs_dest needs to keep weight
with latest non-0 weight.

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h            | 1 +
 net/netfilter/ipvs/ip_vs_ctl.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index eb0bec043c96..0ac795b41ab8 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -668,6 +668,7 @@ struct ip_vs_dest {
 	volatile unsigned int	flags;		/* dest status flags */
 	atomic_t		conn_flags;	/* flags to copy to conn */
 	atomic_t		weight;		/* server weight */
+	atomic_t		last_weight;	/* server latest weight */
 
 	refcount_t		refcnt;		/* reference counter */
 	struct ip_vs_stats      stats;          /* statistics */
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5ebde4b15810..b91bb70ece92 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -821,6 +821,10 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 	if (add && udest->af != svc->af)
 		ipvs->mixed_address_family_dests++;
 
+	/* keep the last_weight with latest non-0 weight */
+	if (add || udest->weight != 0)
+		atomic_set(&dest->last_weight, udest->weight);
+
 	/* set the weight and the flags */
 	atomic_set(&dest->weight, udest->weight);
 	conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK;
-- 
2.11.0

^ permalink raw reply related

* [PATCH 1/5] netfilter: ipvs: Fix space before '[' error.
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Arvind Yadav, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Arvind Yadav <arvind.yadav.cs@gmail.com>

Fix checkpatch.pl error:
ERROR: space prohibited before open square bracket '['.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto_tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index bcd9b7bde4ee..569631d2b2a1 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -436,7 +436,7 @@ static bool tcp_state_active(int state)
 	return tcp_state_active_table[state];
 }
 
-static struct tcp_states_t tcp_states [] = {
+static struct tcp_states_t tcp_states[] = {
 /*	INPUT */
 /*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA	*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
@@ -459,7 +459,7 @@ static struct tcp_states_t tcp_states [] = {
 /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
 };
 
-static struct tcp_states_t tcp_states_dos [] = {
+static struct tcp_states_t tcp_states_dos[] = {
 /*	INPUT */
 /*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA	*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSA }},
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19  8:53 UTC (permalink / raw)
  To: Bhadram Varka
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <96e77eac86794bef9a5b772147527c67@bgmail102.nvidia.com>

Hi,

On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:

> Hi,
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > Behalf Of Jisheng Zhang
> > Sent: Thursday, April 19, 2018 1:33 PM
> > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> > David S. Miller <davem@davemloft.net>
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > <Jingju.Hou@synaptics.com>
> > Subject: [PATCH] net: phy: marvell: clear wol event before setting it
> > 
> > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > 
> > If WOL event happened once, the LED[2] interrupt pin will not be cleared unless
> > reading the CSISR register. So clear the WOL event before enabling it.
> > 
> > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > ---
> >  drivers/net/phy/marvell.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index
> > c22e8e383247..b6abe1cbc84b 100644
> > --- a/drivers/net/phy/marvell.c
> > +++ b/drivers/net/phy/marvell.c
> > @@ -115,6 +115,9 @@
> >  /* WOL Event Interrupt Enable */
> >  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> > 
> > +/* Copper Specific Interrupt Status Register */
> > +#define MII_88E1318S_PHY_CSISR				0x13
> > +  
> 
> There is already macro to represent this register - MII_M1011_IEVENT. Do we need this macro ?

Good point. Will use MII_M1011_IEVENT instead in v2.

> 
> >  /* LED Timer Control Register */
> >  #define MII_88E1318S_PHY_LED_TCR			0x12
> >  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > *phydev,
> >  		if (err < 0)
> >  			goto error;
> > 
> > +		/* If WOL event happened once, the LED[2] interrupt pin
> > +		 * will not be cleared unless reading the CSISR register.
> > +		 * So clear the WOL event first before enabling it.
> > +		 */
> > +		phy_read(phydev, MII_88E1318S_PHY_CSISR);  
> 
> This part of the operation already taken care by ack_interrupt and did_interrupt
> [....]
> .ack_interrupt = &marvell_ack_interrupt,
> .did_interrupt = &m88e1121_did_interrupt,
> [...]
> 
> If at all WOL event occurred marvell_ack_interrupt will take care of clearing the interrupt status register.
> Am I missing anything here ?

If there's no valid irq for phy, the ack_interrupt/did_interrupt won't
be called.


Thanks

^ permalink raw reply

* RE: [PATCH] net: phy: marvell: clear wol event before setting it
From: Bhadram Varka @ 2018-04-19  8:38 UTC (permalink / raw)
  To: Jisheng Zhang, Andrew Lunn, Florian Fainelli, David S. Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <20180419160232.519d15be@xhacker.debian>

Hi,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jisheng Zhang
> Sent: Thursday, April 19, 2018 1:33 PM
> To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> <Jingju.Hou@synaptics.com>
> Subject: [PATCH] net: phy: marvell: clear wol event before setting it
> 
> From: Jingju Hou <Jingju.Hou@synaptics.com>
> 
> If WOL event happened once, the LED[2] interrupt pin will not be cleared unless
> reading the CSISR register. So clear the WOL event before enabling it.
> 
> Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> ---
>  drivers/net/phy/marvell.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index
> c22e8e383247..b6abe1cbc84b 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -115,6 +115,9 @@
>  /* WOL Event Interrupt Enable */
>  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> 
> +/* Copper Specific Interrupt Status Register */
> +#define MII_88E1318S_PHY_CSISR				0x13
> +

There is already macro to represent this register - MII_M1011_IEVENT. Do we need this macro ?

>  /* LED Timer Control Register */
>  #define MII_88E1318S_PHY_LED_TCR			0x12
>  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> *phydev,
>  		if (err < 0)
>  			goto error;
> 
> +		/* If WOL event happened once, the LED[2] interrupt pin
> +		 * will not be cleared unless reading the CSISR register.
> +		 * So clear the WOL event first before enabling it.
> +		 */
> +		phy_read(phydev, MII_88E1318S_PHY_CSISR);

This part of the operation already taken care by ack_interrupt and did_interrupt
[....]
.ack_interrupt = &marvell_ack_interrupt,
.did_interrupt = &m88e1121_did_interrupt,
[...]

If at all WOL event occurred marvell_ack_interrupt will take care of clearing the interrupt status register.
Am I missing anything here ?

>  		/* Enable the WOL interrupt */
>  		err = __phy_modify(phydev, MII_88E1318S_PHY_CSIER, 0,
>  				   MII_88E1318S_PHY_CSIER_WOL_EIE);
> --
> 2.17.0

^ permalink raw reply

* [PATCH] net: phy: TLK10X initial driver submission
From: Måns Andersson @ 2018-04-19  8:28 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Andrew Lunn, Florian Fainelli, netdev,
	devicetree, linux-kernel
  Cc: Mans Andersson

From: Mans Andersson <mans.andersson@nibe.se>

Add suport for the TI TLK105 and TLK106 10/100Mbit ethernet phys.

In addition the TLK10X needs to be removed from DP83848 driver as the
power back off support is added here for this device.

Datasheet:
http://www.ti.com/lit/gpn/tlk106
---
 .../devicetree/bindings/net/ti,tlk10x.txt          |  27 +++
 drivers/net/phy/Kconfig                            |   5 +
 drivers/net/phy/Makefile                           |   1 +
 drivers/net/phy/dp83848.c                          |   3 -
 drivers/net/phy/tlk10x.c                           | 209 +++++++++++++++++++++
 5 files changed, 242 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ti,tlk10x.txt
 create mode 100644 drivers/net/phy/tlk10x.c

diff --git a/Documentation/devicetree/bindings/net/ti,tlk10x.txt b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
new file mode 100644
index 0000000..371d0d7
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
@@ -0,0 +1,27 @@
+* Texas Instruments - TLK105 / TLK106 ethernet PHYs
+
+Required properties:
+	- reg - The ID number for the phy, usually a small integer
+
+Optional properties:
+	- ti,power-back-off - Power Back Off Level
+		Please refer to data sheet chapter 8.6 and TI Application
+		Note SLLA3228
+		0 - Normal Operation
+		1 - Level 1 (up to 140m cable between TLK link partners)
+		2 - Level 2 (up to 100m cable between TLK link partners)
+		3 - Level 3 (up to 80m cable between TLK link partners)
+
+Default child nodes are standard Ethernet PHY device
+nodes as described in Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+	ethernet-phy@0 {
+		reg = <0>;
+		ti,power-back-off = <2>;
+	};
+
+Datasheets and documentation can be found at:
+http://www.ti.com/lit/gpn/tlk106
+http://www.ti.com/lit/an/slla328/slla328.pdf
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb..c980240 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -295,6 +295,11 @@ config DP83867_PHY
 	---help---
 	  Currently supports the DP83867 PHY.
 
+config TLK10X_PHY
+	tristate "Texas Instruments TLK10x PHY"
+	---help---
+	  Supports the TLK105 and TLK106 PHYs.
+
 config FIXED_PHY
 	tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
 	depends on PHYLIB
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb..37e4e02 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -79,5 +79,6 @@ obj-$(CONFIG_ROCKCHIP_PHY)	+= rockchip.o
 obj-$(CONFIG_SMSC_PHY)		+= smsc.o
 obj-$(CONFIG_STE10XP)		+= ste10Xp.o
 obj-$(CONFIG_TERANETICS_PHY)	+= teranetics.o
+obj-$(CONFIG_TLK10X_PHY)	+= tlk10x.o
 obj-$(CONFIG_VITESSE_PHY)	+= vitesse.o
 obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index cd09c3a..435f401 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -19,7 +19,6 @@
 #define TI_DP83848C_PHY_ID		0x20005ca0
 #define TI_DP83620_PHY_ID		0x20005ce0
 #define NS_DP83848C_PHY_ID		0x20005c90
-#define TLK10X_PHY_ID			0x2000a210
 
 /* Registers */
 #define DP83848_MICR			0x11 /* MII Interrupt Control Register */
@@ -78,7 +77,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
 	{ TI_DP83848C_PHY_ID, 0xfffffff0 },
 	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
 	{ TI_DP83620_PHY_ID, 0xfffffff0 },
-	{ TLK10X_PHY_ID, 0xfffffff0 },
 	{ }
 };
 MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
@@ -105,7 +103,6 @@ static struct phy_driver dp83848_driver[] = {
 	DP83848_PHY_DRIVER(TI_DP83848C_PHY_ID, "TI DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
-	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
 };
 module_phy_driver(dp83848_driver);
 
diff --git a/drivers/net/phy/tlk10x.c b/drivers/net/phy/tlk10x.c
new file mode 100644
index 0000000..1efc81e
--- /dev/null
+++ b/drivers/net/phy/tlk10x.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Driver for the Texas Instruments TLK105 / TLK106
+ *
+ * Copyright (C) 2018 NIBE Industrier AB - http://www.nibe.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/of.h>
+
+#define TLK10X_PHY_ID			0x2000a210
+
+/* Registers */
+#define TLK10X_MICR			0x11 /* MII Interrupt Control Reg */
+#define TLK10X_MISR			0x12 /* MII Interrupt Status Reg */
+#define TLK10X_REGCR			0x0d /* Register Control Register */
+#define TLK10X_ADDAR			0x0e /* Data Register */
+#define TLK10X_PWRBOCR			0xae /* Power Backoff Register */
+
+/* MICR Register Fields */
+#define TLK10X_MICR_INT_OE		BIT(0) /* Interrupt Output Enable */
+#define TLK10X_MICR_INTEN		BIT(1) /* Interrupt Enable */
+
+/* MISR Register Fields */
+#define TLK10X_MISR_RHF_INT_EN		BIT(0) /* Receive Error Counter */
+#define TLK10X_MISR_FHF_INT_EN		BIT(1) /* False Carrier Counter */
+#define TLK10X_MISR_ANC_INT_EN		BIT(2) /* Auto-negotiation complete */
+#define TLK10X_MISR_DUP_INT_EN		BIT(3) /* Duplex Status */
+#define TLK10X_MISR_SPD_INT_EN		BIT(4) /* Speed status */
+#define TLK10X_MISR_LINK_INT_EN		BIT(5) /* Link status */
+#define TLK10X_MISR_ED_INT_EN		BIT(6) /* Energy detect */
+#define TLK10X_MISR_LQM_INT_EN		BIT(7) /* Link Quality Monitor */
+
+/* PWRBOCR Register Fields */
+#define TLK10X_PWRBOCR_MASK		0xe0 /* Power Backoff Mask */
+
+#define TLK10X_INT_EN_MASK		\
+	(TLK10X_MISR_ANC_INT_EN |	\
+	 TLK10X_MISR_DUP_INT_EN |	\
+	 TLK10X_MISR_SPD_INT_EN |	\
+	 TLK10X_MISR_LINK_INT_EN)
+
+struct tlk10x_private {
+	int pwrbo_level;
+};
+
+static int tlk10x_read(struct phy_device *phydev, int reg)
+{
+	if (reg & ~0x1f) {
+		/* Extended register */
+		phy_write(phydev, TLK10X_REGCR, 0x001F);
+		phy_write(phydev, TLK10X_ADDAR, reg);
+		phy_write(phydev, TLK10X_REGCR, 0x401F);
+		reg = TLK10X_ADDAR;
+	}
+
+	return phy_read(phydev, reg);
+}
+
+static int tlk10x_write(struct phy_device *phydev, int reg, int val)
+{
+	if (reg & ~0x1f) {
+		/* Extended register */
+		phy_write(phydev, TLK10X_REGCR, 0x001F);
+		phy_write(phydev, TLK10X_ADDAR, reg);
+		phy_write(phydev, TLK10X_REGCR, 0x401F);
+		reg = TLK10X_ADDAR;
+	}
+
+	return phy_write(phydev, reg, val);
+}
+
+#ifdef CONFIG_OF_MDIO
+static int tlk10x_of_init(struct phy_device *phydev)
+{
+	struct tlk10x_private *tlk10x = phydev->priv;
+	struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+	int ret;
+
+	if (!of_node)
+		return 0;
+
+	ret = of_property_read_u32(of_node, "ti,power-back-off",
+				   &tlk10x->pwrbo_level);
+	if (ret) {
+		dev_err(dev, "missing ti,power-back-off property");
+		tlk10x->pwrbo_level = 0;
+	}
+
+	return 0;
+}
+#else
+static int tlk10x_of_init(struct phy_device *phydev)
+{
+	return 0;
+}
+#endif /* CONFIG_OF_MDIO */
+
+static int tlk10x_config_init(struct phy_device *phydev)
+{
+	int ret, reg;
+	struct tlk10x_private *tlk10x;
+
+	ret = genphy_config_init(phydev);
+	if (ret < 0)
+		return ret;
+
+	if (!phydev->priv) {
+		tlk10x = devm_kzalloc(&phydev->mdio.dev, sizeof(*tlk10x),
+				      GFP_KERNEL);
+		if (!tlk10x)
+			return -ENOMEM;
+
+		phydev->priv = tlk10x;
+		ret = tlk10x_of_init(phydev);
+		if (ret)
+			return ret;
+	} else {
+		tlk10x = (struct tlk10x_private *)phydev->priv;
+	}
+
+	// Power back off
+	if (tlk10x->pwrbo_level < 0 || tlk10x->pwrbo_level > 3)
+		tlk10x->pwrbo_level = 0;
+	reg = tlk10x_read(phydev, TLK10X_PWRBOCR);
+	reg = ((reg & ~TLK10X_PWRBOCR_MASK)
+		| (tlk10x->pwrbo_level << 6));
+	ret = tlk10x_write(phydev, TLK10X_PWRBOCR, reg);
+	if (ret < 0) {
+		dev_err(&phydev->mdio.dev,
+			"unable to set power back-off (err=%d)\n", ret);
+		return ret;
+	}
+	dev_info(&phydev->mdio.dev, "power back-off set to level %d\n",
+		 tlk10x->pwrbo_level);
+
+	return 0;
+}
+
+static int tlk10x_ack_interrupt(struct phy_device *phydev)
+{
+	int err = tlk10x_read(phydev, TLK10X_MISR);
+
+	return err < 0 ? err : 0;
+}
+
+static int tlk10x_config_intr(struct phy_device *phydev)
+{
+	int control, ret;
+
+	control = tlk10x_read(phydev, TLK10X_MICR);
+	if (control < 0)
+		return control;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		control |= TLK10X_MICR_INT_OE;
+		control |= TLK10X_MICR_INTEN;
+
+		ret = tlk10x_write(phydev, TLK10X_MISR, TLK10X_INT_EN_MASK);
+		if (ret < 0)
+			return ret;
+	} else {
+		control &= ~TLK10X_MICR_INTEN;
+	}
+
+	return tlk10x_write(phydev, TLK10X_MICR, control);
+}
+
+static struct phy_driver tlk10x_driver[] = {
+	{
+		.phy_id		= TLK10X_PHY_ID,
+		.phy_id_mask	= 0xfffffff0,
+		.name		= "TI TLK10X 10/100 Mbps PHY",
+		.features	= PHY_BASIC_FEATURES,
+		.flags		= PHY_HAS_INTERRUPT,
+
+		.config_init	= tlk10x_config_init,
+		.soft_reset	= genphy_soft_reset,
+
+		/* IRQ related */
+		.ack_interrupt	= tlk10x_ack_interrupt,
+		.config_intr	= tlk10x_config_intr,
+
+		.suspend	= genphy_suspend,
+		.resume		= genphy_resume,
+	},
+};
+module_phy_driver(tlk10x_driver);
+
+static struct mdio_device_id __maybe_unused tlk10x_tbl[] = {
+	{ TLK10X_PHY_ID, 0xfffffff0 },
+	{ }
+};
+MODULE_DEVICE_TABLE(mdio, tlk10x_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments TLK105 / TLK106 PHY driver");
+MODULE_AUTHOR("Mans Andersson <mans.andersson@nibe.se>");
+MODULE_LICENSE("GPL");
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next v4 1/3] vmcore: add API to collect hardware dump in second kernel
From: Greg KH @ 2018-04-19  8:24 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: netdev, kexec, linux-fsdevel, linux-kernel, davem, viro, ebiederm,
	stephen, akpm, torvalds, ganeshgr, nirranjan, indranil
In-Reply-To: <e69b49f1406d2dcd9cd68e4300939e96cd745559.1523950324.git.rahul.lakkireddy@chelsio.com>

On Tue, Apr 17, 2018 at 01:14:17PM +0530, Rahul Lakkireddy wrote:
> +config PROC_VMCORE_DEVICE_DUMP
> +	bool "Device Hardware/Firmware Log Collection"
> +	depends on PROC_VMCORE
> +	default y

Only things that require the machine to keep working should be 'default
y', please remove this, it's an option.

> +	help
> +	  Device drivers can collect the device specific snapshot of
> +	  their hardware or firmware before they are initialized in
> +	  crash recovery kernel. If you say Y here, the device dumps
> +	  will be added as ELF notes to /proc/vmcore

Which exact "device drivers" are you referring to here?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19  8:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kbuild-all, linux-kernel, Mikulas Patocka, Eric Dumazet,
	David Miller, Jason Wang, virtualization, netdev
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]

Hi Michael,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: i386-randconfig-a0-201815 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers/net/virtio_net.c:2365:10: error: 'struct virtnet_info' has no member named 'err_ctrl'
     kfree(vi->err_ctrl);
             ^
   drivers/net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers/net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function)
     kfree(vq->ctrl);
           ^
   drivers/net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in

vim +2365 drivers/net/virtio_net.c

  2347	
  2348	static void virtnet_free_queues(struct virtnet_info *vi)
  2349	{
  2350		int i;
  2351	
  2352		for (i = 0; i < vi->max_queue_pairs; i++) {
  2353			napi_hash_del(&vi->rq[i].napi);
  2354			netif_napi_del(&vi->rq[i].napi);
  2355			netif_napi_del(&vi->sq[i].napi);
  2356		}
  2357	
  2358		/* We called napi_hash_del() before netif_napi_del(),
  2359		 * we need to respect an RCU grace period before freeing vi->rq
  2360		 */
  2361		synchronize_net();
  2362	
  2363		kfree(vi->rq);
  2364		kfree(vi->sq);
> 2365		kfree(vi->err_ctrl);
  2366	}
  2367	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31332 bytes --]

^ permalink raw reply

* [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19  8:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David S. Miller
  Cc: netdev, linux-kernel, Jingju Hou

From: Jingju Hou <Jingju.Hou@synaptics.com>

If WOL event happened once, the LED[2] interrupt pin will not be
cleared unless reading the CSISR register. So clear the WOL event
before enabling it.

Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
---
 drivers/net/phy/marvell.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index c22e8e383247..b6abe1cbc84b 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -115,6 +115,9 @@
 /* WOL Event Interrupt Enable */
 #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
 
+/* Copper Specific Interrupt Status Register */
+#define MII_88E1318S_PHY_CSISR				0x13
+
 /* LED Timer Control Register */
 #define MII_88E1318S_PHY_LED_TCR			0x12
 #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
@@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device *phydev,
 		if (err < 0)
 			goto error;
 
+		/* If WOL event happened once, the LED[2] interrupt pin
+		 * will not be cleared unless reading the CSISR register.
+		 * So clear the WOL event first before enabling it.
+		 */
+		phy_read(phydev, MII_88E1318S_PHY_CSISR);
+
 		/* Enable the WOL interrupt */
 		err = __phy_modify(phydev, MII_88E1318S_PHY_CSIER, 0,
 				   MII_88E1318S_PHY_CSIER_WOL_EIE);
-- 
2.17.0

^ permalink raw reply related

* Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast
From: Paolo Abeni @ 2018-04-19  8:00 UTC (permalink / raw)
  To: John Fastabend, Cong Wang
  Cc: Eric Dumazet, Jiri Pirko, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <36a89ed1-d6ff-ddad-c736-4e68909d61c4@gmail.com>

On Wed, 2018-04-18 at 09:44 -0700, John Fastabend wrote:
> Thanks for bringing this up. I'll think about it for a bit maybe
> there is something we can do here. There is a set of conditions
> that if met we can run without the lock. Possibly ONETXQUEUE and
> aligned cpu_map is sufficient. 

I think you mean "root qdisc is mq and aligned cpu_map": AFAICS we can
have ONETXQUEUE when root qdisc is e.g. pfifo_fast which would not help
here.

> We could detect this case and drop
> the locking. For existing systems and high Gbps NICs I think (feel
> free to correct me) assuming a core per cpu is OK. 

I'm sorry, I'm lost. Do you mean "a tx queue per core" instead ?!? 

I'm unsure we can assume the above. In my experiments, at least in some
scenarios it's preferrable configuring a limited number of rx/tx
queues, confine BH processing to the related cores and let user space
processes run on the others, with a many to 1 relationship between the
cores "assigned" to user-space and the cores "assigned" to BH
processing. 

Can't we somewhat try to leverage TCQ_F_CAN_BYPASS even with NOLOCK
qdisc? I *think* we can avoid the qdisc_run() call after
sch_direct_xmit() in the bypass scenario, and that will avoid the
blamed atomic ops above.

Cheers,

Paolo

^ permalink raw reply

* Re: [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
From: Paolo Abeni @ 2018-04-19  7:40 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David S. Miller
In-Reply-To: <3270c995-4eea-b3e1-128c-82921d89eb79@gmail.com>

Hi,

On Wed, 2018-04-18 at 12:21 -0700, Eric Dumazet wrote:
> 
> On 04/18/2018 10:15 AM, Paolo Abeni wrote:
> is not appealing to me :/
> > 
> > Thank you for the feedback.
> > Sorry for not being clear about it, but knotd is using SO_REUSEPORT and
> > the above tests are leveraging it.
> > 
> > That 5% is on top of that 300%.
> 
> Then there is something wrong.
> 
> Adding copies should not increase performance.

The skb and data are copied into the UDP skb cache only if the socket
is under memory pressure, and that happens if and only if the receiver
is slower than the BH/IP receive path.

The copy slows down the RX path - which was dropping packets - and
makes the udp_recvmsg() considerably faster, as consuming skb becomes
almost a no-op.

AFAICS, this is similar to the strategy you used in:

ommit c8c8b127091b758f5768f906bcdeeb88bc9951ca
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 7 09:19:33 2016 -0800

    udp: under rx pressure, try to condense skbs

with the difference that with the UDP skb cache there is an hard limit
to the amount of memory the BH is allowed to copy.

> If it does, there is certainly another way, reaching 10% instead of 5%

I benchmarked vs a DNS server to test and verify that we get measurable
benefits in real life scenario. The measured performance gain for the
RX path with reasonable configurations is ~20%.

Any suggestions for better results are more than welcome!

Cheers,

Paolo

^ permalink raw reply

* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19  7:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kbuild-all, linux-kernel, Mikulas Patocka, Eric Dumazet,
	David Miller, Jason Wang, virtualization, netdev
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]

Hi Michael,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: x86_64-randconfig-x006-201815 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers//net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers//net/virtio_net.c:2365:12: error: 'struct virtnet_info' has no member named 'err_ctrl'; did you mean 'ctrl'?
     kfree(vi->err_ctrl);
               ^~~~~~~~
               ctrl
   drivers//net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers//net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function); did you mean 'vi'?
     kfree(vq->ctrl);
           ^~
           vi
   drivers//net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in

vim +2365 drivers//net/virtio_net.c

  2347	
  2348	static void virtnet_free_queues(struct virtnet_info *vi)
  2349	{
  2350		int i;
  2351	
  2352		for (i = 0; i < vi->max_queue_pairs; i++) {
  2353			napi_hash_del(&vi->rq[i].napi);
  2354			netif_napi_del(&vi->rq[i].napi);
  2355			netif_napi_del(&vi->sq[i].napi);
  2356		}
  2357	
  2358		/* We called napi_hash_del() before netif_napi_del(),
  2359		 * we need to respect an RCU grace period before freeing vi->rq
  2360		 */
  2361		synchronize_net();
  2362	
  2363		kfree(vi->rq);
  2364		kfree(vi->sq);
> 2365		kfree(vi->err_ctrl);
  2366	}
  2367	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27230 bytes --]

^ permalink raw reply

* Greetings
From: "Miss Zeliha ömer faruk" @ 2018-04-19  7:37 UTC (permalink / raw)





Hello

Greetings to you please i have a business proposal for you contact me
for more detailes asap thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey

^ permalink raw reply

* [PATCH net-next v2 3/3] net: ethernet: ave: add support for phy-mode setting of system controller
From: Kunihiko Hayashi @ 2018-04-19  7:24 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1524122695-19597-1-git-send-email-hayashi.kunihiko@socionext.com>

This patch adds support for specifying system controller that configures
phy-mode setting.

According to the DT property "phy-mode", it's necessary to configure the
controller, which is used to choose the settings of the MAC suitable,
for example, mdio pin connections, internal clocks, and so on.

Supported phy-modes are SoC-dependent. The driver allows phy-mode to set
"internal" if the SoC has a built-in PHY, and {"mii", "rmii", "rgmii"}
if the SoC supports each mode. So we have to check whether the phy-mode
is valid or not.

This adds the following features for each SoC:
- check whether the SoC supports the specified phy-mode
- configure the controller accroding to phy-mode

The DT property accepts one argument to distinguish them for multiple MAC
instances.

ethernet@65000000 {
	...
	socionext,syscon-phy-mode = <&soc_glue 0>;
};

ethernet@65200000 {
	...
	socionext,syscon-phy-mode = <&soc_glue 1>;
};

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
 drivers/net/ethernet/socionext/Kconfig   |   2 +
 drivers/net/ethernet/socionext/sni_ave.c | 150 ++++++++++++++++++++++++++++---
 2 files changed, 140 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig
index 6bcfe27..b80048c 100644
--- a/drivers/net/ethernet/socionext/Kconfig
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -14,6 +14,8 @@ if NET_VENDOR_SOCIONEXT
 config SNI_AVE
 	tristate "Socionext AVE ethernet support"
 	depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+	depends on HAS_IOMEM
+	select MFD_SYSCON
 	select PHYLIB
 	---help---
 	  Driver for gigabit ethernet MACs, called AVE, in the
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 52940bd..f7eccee 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
+#include <linux/mfd/syscon.h>
 #include <linux/mii.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
@@ -18,6 +19,7 @@
 #include <linux/of_mdio.h>
 #include <linux/of_platform.h>
 #include <linux/phy.h>
+#include <linux/regmap.h>
 #include <linux/reset.h>
 #include <linux/types.h>
 #include <linux/u64_stats_sync.h>
@@ -197,6 +199,11 @@
 #define AVE_INTM_COUNT		20
 #define AVE_FORCE_TXINTCNT	1
 
+/* SG */
+#define SG_ETPINMODE		0x540
+#define SG_ETPINMODE_EXTPHY	BIT(1)	/* for LD11 */
+#define SG_ETPINMODE_RMII(ins)	BIT(ins)
+
 #define IS_DESC_64BIT(p)	((p)->data->is_desc_64bit)
 
 #define AVE_MAX_CLKS		4
@@ -228,12 +235,6 @@ struct ave_desc_info {
 	struct ave_desc *desc;	/* skb info related descriptor */
 };
 
-struct ave_soc_data {
-	bool	is_desc_64bit;
-	const char	*clock_names[AVE_MAX_CLKS];
-	const char	*reset_names[AVE_MAX_RSTS];
-};
-
 struct ave_stats {
 	struct	u64_stats_sync	syncp;
 	u64	packets;
@@ -257,6 +258,9 @@ struct ave_private {
 	phy_interface_t		phy_mode;
 	struct phy_device	*phydev;
 	struct mii_bus		*mdio;
+	struct regmap		*regmap;
+	unsigned int		pinmode_mask;
+	unsigned int		pinmode_val;
 
 	/* stats */
 	struct ave_stats	stats_rx;
@@ -279,6 +283,14 @@ struct ave_private {
 	const struct ave_soc_data *data;
 };
 
+struct ave_soc_data {
+	bool	is_desc_64bit;
+	const char	*clock_names[AVE_MAX_CLKS];
+	const char	*reset_names[AVE_MAX_RSTS];
+	int	(*get_pinmode)(struct ave_private *priv,
+			       phy_interface_t phy_mode, u32 arg);
+};
+
 static u32 ave_desc_read(struct net_device *ndev, enum desc_id id, int entry,
 			 int offset)
 {
@@ -1179,6 +1191,11 @@ static int ave_init(struct net_device *ndev)
 		}
 	}
 
+	ret = regmap_update_bits(priv->regmap, SG_ETPINMODE,
+				 priv->pinmode_mask, priv->pinmode_val);
+	if (ret)
+		return ret;
+
 	ave_global_reset(ndev);
 
 	mdio_np = of_get_child_by_name(np, "mdio");
@@ -1537,6 +1554,7 @@ static int ave_probe(struct platform_device *pdev)
 	const struct ave_soc_data *data;
 	struct device *dev = &pdev->dev;
 	char buf[ETHTOOL_FWVERS_LEN];
+	struct of_phandle_args args;
 	phy_interface_t phy_mode;
 	struct ave_private *priv;
 	struct net_device *ndev;
@@ -1559,12 +1577,6 @@ static int ave_probe(struct platform_device *pdev)
 		dev_err(dev, "phy-mode not found\n");
 		return -EINVAL;
 	}
-	if ((!phy_interface_mode_is_rgmii(phy_mode)) &&
-	    phy_mode != PHY_INTERFACE_MODE_RMII &&
-	    phy_mode != PHY_INTERFACE_MODE_MII) {
-		dev_err(dev, "phy-mode is invalid\n");
-		return -EINVAL;
-	}
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
@@ -1656,6 +1668,26 @@ static int ave_probe(struct platform_device *pdev)
 		priv->nrsts++;
 	}
 
+	ret = of_parse_phandle_with_fixed_args(np,
+					       "socionext,syscon-phy-mode",
+					       1, 0, &args);
+	if (ret) {
+		netdev_err(ndev, "can't get syscon-phy-mode property\n");
+		goto out_free_netdev;
+	}
+	priv->regmap = syscon_node_to_regmap(args.np);
+	of_node_put(args.np);
+	if (IS_ERR(priv->regmap)) {
+		netdev_err(ndev, "can't map syscon-phy-mode\n");
+		ret = PTR_ERR(priv->regmap);
+		goto out_free_netdev;
+	}
+	ret = priv->data->get_pinmode(priv, phy_mode, args.args[0]);
+	if (ret) {
+		netdev_err(ndev, "invalid phy-mode setting\n");
+		goto out_free_netdev;
+	}
+
 	priv->mdio = devm_mdiobus_alloc(dev);
 	if (!priv->mdio) {
 		ret = -ENOMEM;
@@ -1715,6 +1747,95 @@ static int ave_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static int ave_pro4_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(0);
+		break;
+	case PHY_INTERFACE_MODE_MII:
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_ld11_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_INTERNAL:
+		priv->pinmode_val = 0;
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_ld20_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(0);
+		break;
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_pxs3_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 1)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(arg);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(arg);
+		break;
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static const struct ave_soc_data ave_pro4_data = {
 	.is_desc_64bit = false,
 	.clock_names = {
@@ -1723,6 +1844,7 @@ static const struct ave_soc_data ave_pro4_data = {
 	.reset_names = {
 		"gio", "ether",
 	},
+	.get_pinmode = ave_pro4_get_pinmode,
 };
 
 static const struct ave_soc_data ave_pxs2_data = {
@@ -1733,6 +1855,7 @@ static const struct ave_soc_data ave_pxs2_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_pro4_get_pinmode,
 };
 
 static const struct ave_soc_data ave_ld11_data = {
@@ -1743,6 +1866,7 @@ static const struct ave_soc_data ave_ld11_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_ld11_get_pinmode,
 };
 
 static const struct ave_soc_data ave_ld20_data = {
@@ -1753,6 +1877,7 @@ static const struct ave_soc_data ave_ld20_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_ld20_get_pinmode,
 };
 
 static const struct ave_soc_data ave_pxs3_data = {
@@ -1763,6 +1888,7 @@ static const struct ave_soc_data ave_pxs3_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_pxs3_get_pinmode,
 };
 
 static const struct of_device_id of_ave_match[] = {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v2 2/3] dt-bindings: net: ave: add syscon-phy-mode property to configure phy-mode setting
From: Kunihiko Hayashi @ 2018-04-19  7:24 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1524122695-19597-1-git-send-email-hayashi.kunihiko@socionext.com>

Add "socionext,syscon-phy-mode" property to specify system controller that
configures the settings about phy-mode.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
index 85e0c49..fc8f017 100644
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -13,7 +13,8 @@ Required properties:
  - reg: Address where registers are mapped and size of region.
  - interrupts: Should contain the MAC interrupt.
  - phy-mode: See ethernet.txt in the same directory. Allow to choose
-	"rgmii", "rmii", or "mii" according to the PHY.
+	"rgmii", "rmii", "mii", or "internal" according to the PHY.
+	The acceptable mode is SoC-dependent.
  - phy-handle: Should point to the external phy device.
 	See ethernet.txt file in the same directory.
  - clocks: A phandle to the clock for the MAC.
@@ -27,6 +28,8 @@ Required properties:
  - reset-names: Should contain
 	- "ether", "gio" for Pro4 SoC
 	- "ether" for others
+ - socionext,syscon-phy-mode: A phandle to syscon with one argument
+	that configures phy mode. The argument is the ID of MAC instance.
 
 Optional properties:
  - local-mac-address: See ethernet.txt in the same directory.
@@ -47,6 +50,7 @@ Example:
 		clocks = <&sys_clk 6>;
 		reset-names = "ether";
 		resets = <&sys_rst 6>;
+		socionext,syscon-phy-mode = <&soc_glue 0>;
 		local-mac-address = [00 00 00 00 00 00];
 
 		mdio {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v2 1/3] net: ethernet: ave: add multiple clocks and resets support as required property
From: Kunihiko Hayashi @ 2018-04-19  7:24 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1524122695-19597-1-git-send-email-hayashi.kunihiko@socionext.com>

When the link is becoming up for Pro4 SoC, the kernel is stalled
due to some missing clocks and resets.

The AVE block for Pro4 is connected to the GIO bus in the SoC.
Without its clock/reset, the access to the AVE register makes the
system stall.

In the same way, another MAC clock for Giga-bit Connection and
the PHY clock are also required for Pro4 to activate the Giga-bit feature
and to recognize the PHY.

To satisfy these requirements, this patch adds support for multiple clocks
and resets, and adds the clock-names and reset-names to the binding because
we need to distinguish clock/reset for the AVE main block and the others.

Also, make the resets a required property. Currently, "reset is
optional" relies on that the bootloader or firmware has deasserted
the reset before booting the kernel.  Drivers should work without
such expectation.

Fixes: 4c270b55a5af ("net: ethernet: socionext: add AVE ethernet driver")
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 .../bindings/net/socionext,uniphier-ave4.txt       |  13 ++-
 drivers/net/ethernet/socionext/sni_ave.c           | 108 ++++++++++++++++-----
 2 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
index 96398cc..85e0c49 100644
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -17,9 +17,18 @@ Required properties:
  - phy-handle: Should point to the external phy device.
 	See ethernet.txt file in the same directory.
  - clocks: A phandle to the clock for the MAC.
+	For Pro4 SoC, that is "socionext,uniphier-pro4-ave4",
+	another MAC clock, GIO bus clock and PHY clock are also required.
+ - clock-names: Should contain
+	- "ether", "ether-gb", "gio", "ether-phy" for Pro4 SoC
+	- "ether" for others
+ - resets: A phandle to the reset control for the MAC. For Pro4 SoC,
+	GIO bus reset is also required.
+ - reset-names: Should contain
+	- "ether", "gio" for Pro4 SoC
+	- "ether" for others
 
 Optional properties:
- - resets: A phandle to the reset control for the MAC.
  - local-mac-address: See ethernet.txt in the same directory.
 
 Required subnode:
@@ -34,7 +43,9 @@ Example:
 		interrupts = <0 66 4>;
 		phy-mode = "rgmii";
 		phy-handle = <&ethphy>;
+		clock-names = "ether";
 		clocks = <&sys_clk 6>;
+		reset-names = "ether";
 		resets = <&sys_rst 6>;
 		local-mac-address = [00 00 00 00 00 00];
 
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 0b3b7a4..52940bd 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -199,6 +199,9 @@
 
 #define IS_DESC_64BIT(p)	((p)->data->is_desc_64bit)
 
+#define AVE_MAX_CLKS		4
+#define AVE_MAX_RSTS		2
+
 enum desc_id {
 	AVE_DESCID_RX,
 	AVE_DESCID_TX,
@@ -227,6 +230,8 @@ struct ave_desc_info {
 
 struct ave_soc_data {
 	bool	is_desc_64bit;
+	const char	*clock_names[AVE_MAX_CLKS];
+	const char	*reset_names[AVE_MAX_RSTS];
 };
 
 struct ave_stats {
@@ -245,8 +250,10 @@ struct ave_private {
 	int			phy_id;
 	unsigned int		desc_size;
 	u32			msg_enable;
-	struct clk		*clk;
-	struct reset_control	*rst;
+	int			nclks;
+	struct clk		*clk[AVE_MAX_CLKS];
+	int			nrsts;
+	struct reset_control	*rst[AVE_MAX_RSTS];
 	phy_interface_t		phy_mode;
 	struct phy_device	*phydev;
 	struct mii_bus		*mdio;
@@ -1153,18 +1160,23 @@ static int ave_init(struct net_device *ndev)
 	struct device_node *np = dev->of_node;
 	struct device_node *mdio_np;
 	struct phy_device *phydev;
-	int ret;
+	int nc, nr, ret;
 
 	/* enable clk because of hw access until ndo_open */
-	ret = clk_prepare_enable(priv->clk);
-	if (ret) {
-		dev_err(dev, "can't enable clock\n");
-		return ret;
+	for (nc = 0; nc < priv->nclks; nc++) {
+		ret = clk_prepare_enable(priv->clk[nc]);
+		if (ret) {
+			dev_err(dev, "can't enable clock\n");
+			goto out_clk_disable;
+		}
 	}
-	ret = reset_control_deassert(priv->rst);
-	if (ret) {
-		dev_err(dev, "can't deassert reset\n");
-		goto out_clk_disable;
+
+	for (nr = 0; nr < priv->nrsts; nr++) {
+		ret = reset_control_deassert(priv->rst[nr]);
+		if (ret) {
+			dev_err(dev, "can't deassert reset\n");
+			goto out_reset_assert;
+		}
 	}
 
 	ave_global_reset(ndev);
@@ -1207,9 +1219,11 @@ static int ave_init(struct net_device *ndev)
 out_mdio_unregister:
 	mdiobus_unregister(priv->mdio);
 out_reset_assert:
-	reset_control_assert(priv->rst);
+	while (--nr >= 0)
+		reset_control_assert(priv->rst[nr]);
 out_clk_disable:
-	clk_disable_unprepare(priv->clk);
+	while (--nc >= 0)
+		clk_disable_unprepare(priv->clk[nc]);
 
 	return ret;
 }
@@ -1217,13 +1231,16 @@ static int ave_init(struct net_device *ndev)
 static void ave_uninit(struct net_device *ndev)
 {
 	struct ave_private *priv = netdev_priv(ndev);
+	int i;
 
 	phy_disconnect(priv->phydev);
 	mdiobus_unregister(priv->mdio);
 
 	/* disable clk because of hw access after ndo_stop */
-	reset_control_assert(priv->rst);
-	clk_disable_unprepare(priv->clk);
+	for (i = 0; i < priv->nrsts; i++)
+		reset_control_assert(priv->rst[i]);
+	for (i = 0; i < priv->nclks; i++)
+		clk_disable_unprepare(priv->clk[i]);
 }
 
 static int ave_open(struct net_device *ndev)
@@ -1527,8 +1544,9 @@ static int ave_probe(struct platform_device *pdev)
 	struct resource	*res;
 	const void *mac_addr;
 	void __iomem *base;
+	const char *name;
+	int i, irq, ret;
 	u64 dma_mask;
-	int irq, ret;
 	u32 ave_id;
 
 	data = of_device_get_match_data(dev);
@@ -1614,16 +1632,28 @@ static int ave_probe(struct platform_device *pdev)
 	u64_stats_init(&priv->stats_tx.syncp);
 	u64_stats_init(&priv->stats_rx.syncp);
 
-	priv->clk = devm_clk_get(dev, NULL);
-	if (IS_ERR(priv->clk)) {
-		ret = PTR_ERR(priv->clk);
-		goto out_free_netdev;
+	for (i = 0; i < AVE_MAX_CLKS; i++) {
+		name = priv->data->clock_names[i];
+		if (!name)
+			break;
+		priv->clk[i] = devm_clk_get(dev, name);
+		if (IS_ERR(priv->clk[i])) {
+			ret = PTR_ERR(priv->clk[i]);
+			goto out_free_netdev;
+		}
+		priv->nclks++;
 	}
 
-	priv->rst = devm_reset_control_get_optional_shared(dev, NULL);
-	if (IS_ERR(priv->rst)) {
-		ret = PTR_ERR(priv->rst);
-		goto out_free_netdev;
+	for (i = 0; i < AVE_MAX_RSTS; i++) {
+		name = priv->data->reset_names[i];
+		if (!name)
+			break;
+		priv->rst[i] = devm_reset_control_get_shared(dev, name);
+		if (IS_ERR(priv->rst[i])) {
+			ret = PTR_ERR(priv->rst[i]);
+			goto out_free_netdev;
+		}
+		priv->nrsts++;
 	}
 
 	priv->mdio = devm_mdiobus_alloc(dev);
@@ -1687,22 +1717,52 @@ static int ave_remove(struct platform_device *pdev)
 
 static const struct ave_soc_data ave_pro4_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"gio", "ether", "ether-gb", "ether-phy",
+	},
+	.reset_names = {
+		"gio", "ether",
+	},
 };
 
 static const struct ave_soc_data ave_pxs2_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_ld11_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_ld20_data = {
 	.is_desc_64bit = true,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_pxs3_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct of_device_id of_ave_match[] = {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v2 0/3] ave: fix the activation issues for some UniPhier SoCs
From: Kunihiko Hayashi @ 2018-04-19  7:24 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi

This add the following stuffs to fix the activation issues and satisfy
requirements for AVE ethernet driver implemented on some UniPhier SoCs.

- Add support for additional necessary clocks and resets, because the kernel
  is stalled on Pro4 due to lack of them.

- Check whether the SoC supports the specified phy-mode

- Add DT property support indicating system controller that has the feature
  for configurating phy-mode including built-in phy on LD11.

v1: https://www.spinics.net/lists/netdev/msg494904.html

Changes since v1:
- Add 'Reviewed-by' lines

Kunihiko Hayashi (3):
  net: ethernet: ave: add multiple clocks and resets support as required
    property
  dt-bindings: net: ave: add syscon-phy-mode property to configure
    phy-mode setting
  net: ethernet: ave: add support for phy-mode setting of system
    controller

 .../bindings/net/socionext,uniphier-ave4.txt       |  19 +-
 drivers/net/ethernet/socionext/Kconfig             |   2 +
 drivers/net/ethernet/socionext/sni_ave.c           | 252 ++++++++++++++++++---
 3 files changed, 238 insertions(+), 35 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [RFC PATCH net-next v6 2/4] net: Introduce generic bypass module
From: Jiri Pirko @ 2018-04-19  7:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh
In-Reply-To: <20180419070752-mutt-send-email-mst@kernel.org>

Thu, Apr 19, 2018 at 06:08:58AM CEST, mst@redhat.com wrote:
>On Wed, Apr 18, 2018 at 10:32:06PM +0200, Jiri Pirko wrote:
>> >> >> > With regards to alternate names for 'active', you suggested 'stolen', but i
>> >> >> > am not too happy with it.
>> >> >> > netvsc uses vf_netdev, are you OK with this? Or another option is 'passthru'
>> >> >> No. The netdev could be any netdevice. It does not have to be a "VF".
>> >> >> I think "stolen" is quite appropriate since it describes the modus
>> >> >> operandi. The bypass master steals some netdevice according to some
>> >> >> match.
>> >> >> 
>> >> >> But I don't insist on "stolen". Just sounds right.
>> >> >
>> >> >We are adding VIRTIO_NET_F_BACKUP as a new feature bit to enable this feature, So i think
>> >> >'backup' name is consistent.
>> >> 
>> >> It perhaps makes sense from the view of virtio device. However, as I
>> >> described couple of times, for master/slave device the name "backup" is
>> >> highly misleading.
>> >
>> >virtio is the backup. You are supposed to use another
>> >(typically passthrough) device, if that fails use virtio.
>> >It does seem appropriate to me. If you like, we can
>> >change that to "standby".  Active I don't like either. "main"?
>> 
>> Sounds much better, yes.
>
>Excuse me, which of the versions are better in your eyes?

standby is okay. main/primary is fine too.

>
>
>> 
>> >
>> >In fact would failover be better than bypass?
>> 
>> Also, much better.
>> 

^ permalink raw reply

* [PATCH net] bnxt_en: Fix memory fault in bnxt_ethtool_init()
From: Michael Chan @ 2018-04-19  7:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel-team

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type
could be greater than the fixed buffer length of 4096 bytes allocated by
the driver.  This was causing HWRM_NVM_READ to copy more data to the buffer
than the allocated size, causing general protection fault.

Fix the issue by allocating the exact buffer length returned by
HWRM_NVM_FIND_DIR_ENTRY, instead of 4096.  Move the kzalloc() call
into the bnxt_get_pkgver() function.

Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  | 49 ++++++++++++----------
 drivers/net/ethernet/broadcom/bnxt/bnxt_nvm_defs.h |  2 -
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 1f622ca..8ba14ae 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1927,22 +1927,39 @@ static char *bnxt_parse_pkglog(int desired_field, u8 *data, size_t datalen)
 	return retval;
 }
 
-static char *bnxt_get_pkgver(struct net_device *dev, char *buf, size_t buflen)
+static void bnxt_get_pkgver(struct net_device *dev)
 {
+	struct bnxt *bp = netdev_priv(dev);
 	u16 index = 0;
-	u32 datalen;
+	char *pkgver;
+	u32 pkglen;
+	u8 *pkgbuf;
+	int len;
 
 	if (bnxt_find_nvram_item(dev, BNX_DIR_TYPE_PKG_LOG,
 				 BNX_DIR_ORDINAL_FIRST, BNX_DIR_EXT_NONE,
-				 &index, NULL, &datalen) != 0)
-		return NULL;
+				 &index, NULL, &pkglen) != 0)
+		return;
 
-	memset(buf, 0, buflen);
-	if (bnxt_get_nvram_item(dev, index, 0, datalen, buf) != 0)
-		return NULL;
+	pkgbuf = kzalloc(pkglen, GFP_KERNEL);
+	if (!pkgbuf) {
+		dev_err(&bp->pdev->dev, "Unable to allocate memory for pkg version, length = %u\n",
+			pkglen);
+		return;
+	}
+
+	if (bnxt_get_nvram_item(dev, index, 0, pkglen, pkgbuf))
+		goto err;
 
-	return bnxt_parse_pkglog(BNX_PKG_LOG_FIELD_IDX_PKG_VERSION, buf,
-		datalen);
+	pkgver = bnxt_parse_pkglog(BNX_PKG_LOG_FIELD_IDX_PKG_VERSION, pkgbuf,
+				   pkglen);
+	if (pkgver && *pkgver != 0 && isdigit(*pkgver)) {
+		len = strlen(bp->fw_ver_str);
+		snprintf(bp->fw_ver_str + len, FW_VER_STR_LEN - len - 1,
+			 "/pkg %s", pkgver);
+	}
+err:
+	kfree(pkgbuf);
 }
 
 static int bnxt_get_eeprom(struct net_device *dev,
@@ -2615,22 +2632,10 @@ void bnxt_ethtool_init(struct bnxt *bp)
 	struct hwrm_selftest_qlist_input req = {0};
 	struct bnxt_test_info *test_info;
 	struct net_device *dev = bp->dev;
-	char *pkglog;
 	int i, rc;
 
-	pkglog = kzalloc(BNX_PKG_LOG_MAX_LENGTH, GFP_KERNEL);
-	if (pkglog) {
-		char *pkgver;
-		int len;
+	bnxt_get_pkgver(dev);
 
-		pkgver = bnxt_get_pkgver(dev, pkglog, BNX_PKG_LOG_MAX_LENGTH);
-		if (pkgver && *pkgver != 0 && isdigit(*pkgver)) {
-			len = strlen(bp->fw_ver_str);
-			snprintf(bp->fw_ver_str + len, FW_VER_STR_LEN - len - 1,
-				 "/pkg %s", pkgver);
-		}
-		kfree(pkglog);
-	}
 	if (bp->hwrm_spec_code < 0x10704 || !BNXT_SINGLE_PF(bp))
 		return;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_nvm_defs.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_nvm_defs.h
index 73f2249..8344481 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_nvm_defs.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_nvm_defs.h
@@ -59,8 +59,6 @@ enum bnxt_nvm_directory_type {
 #define BNX_DIR_ATTR_NO_CHKSUM			(1 << 0)
 #define BNX_DIR_ATTR_PROP_STREAM		(1 << 1)
 
-#define BNX_PKG_LOG_MAX_LENGTH			4096
-
 enum bnxnvm_pkglog_field_index {
 	BNX_PKG_LOG_FIELD_IDX_INSTALLED_TIMESTAMP	= 0,
 	BNX_PKG_LOG_FIELD_IDX_PKG_DESCRIPTION		= 1,
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox