Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v5 net-next] net: ipv4: Consider failed nexthops in multipath routes
From: David Miller @ 2016-04-06 15:35 UTC (permalink / raw)
  To: dsa; +Cc: netdev, ja
In-Reply-To: <1459728547-36371-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Sun,  3 Apr 2016 17:09:07 -0700

> +		rcu_read_lock_bh();
> +
> +		n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, nh->nh_dev);
> +		if (n)
> +			state = n->nud_state;
> +
> +		rcu_read_unlock_bh();

Please use __ipv4_neigh_lookup_noref() (from net/arp.h), it inlines
the comparison and hash functions and is therefore several orders of
magnitude faster.

^ permalink raw reply

* Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Eric Dumazet @ 2016-04-06 15:39 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, David Miller, Herbert Xu, Alexander Duyck,
	Alex Duyck, Jesse Gross, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <57051C79.7010303@solarflare.com>

On Wed, 2016-04-06 at 15:26 +0100, Edward Cree wrote:
> On 06/04/16 14:53, Tom Herbert wrote:
> > But again, this scheme is optimizing for forwarding case and doesn't
> > help (and probably hurts) the use case of locally terminated
> > connections
> Not really.  I think this has a chance to outperform GRO for locally
> terminated connections, for a number of reasons:
> * Doesn't look at higher-layer or inner headers until later in packet
>   processing, for instance we (maybe) process every L3 header in a NAPI poll
>   before looking at a single L4.  This could delay touching the second
>   cacheline of some packets.
> * Doesn't have to re-write headers to produce a coherent superframe.
>   Instead, each segment carries its original headers around with it.  Also
>   means we can skip _checking_ some headers to see if we're 'allowed' to
>   coalesce (e.g. TCP TS differences, and the current fun with IP IDs).
> * Can be used for protocols like UDP where the original packet boundaries
>   are important (so you can't coalesce into a superframe).
> Really the last of those was the original reason for this idea, helping with
> forwarding is just another nice bonus that we (might) get from it.
> And it's all speculative and I don't know for sure what the performance
> would be like, but I won't know until I try it!

Look at the mess of some helpers in net/core/skbuff.c, and imagine the
super mess it would be if using a concept of 'super packet with various
headers on each segment'. netfilter is already complex, it would become
a nightmare.

GRO on the other hand presents one virtual set of headers, then the
payload in one or multiple frags.

^ permalink raw reply

* Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: David Miller @ 2016-04-06 15:43 UTC (permalink / raw)
  To: tom; +Cc: ecree, herbert, alexander.duyck, aduyck, jesse, edumazet, netdev
In-Reply-To: <CALx6S34Sqp7tnmkPRWH0RF_CKrBLLNWFjHbr6Uq2r=6=scMMow@mail.gmail.com>

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 6 Apr 2016 10:53:52 -0300

> Packets that are forwarded really should not be GRO'ed in the first
> place because of the loss of information and added latency.

First of all GRO is supposed to be lossless, so please stop saying this
would be a reason to turn it off on a router.

Second of all, the biggest piece of overhead is the routing lookup,
therefore GRO batching helps enormously with routing workloads, and
therefore is appropriate to be enabled on routers.

Yes, I agree that for locally terminated stuff it helps more, but don't
turn this into a "GRO on routers, meh..." type argument.  It simply is
not true at all.

^ permalink raw reply

* [PATCH net-next v2 0/3] net: dsa: voidify STP setter and FDB/VLAN add ops
From: Vivien Didelot @ 2016-04-06 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Florian Fainelli, Jiri Pirko, Scott Feldman, Vivien Didelot

Neither the DSA layer nor the bridge code (see br_set_state) really care
about eventual errors from STP state setters, so make it void.

The DSA layer separates the prepare and commit phases of switchdev in
two different functions. Logical errors must not happen in commit
routines, so make them void.

Changes v1 -> v2:
  - rename port_stp_update to port_stp_state_set
  - don't change code flow of bcm_sf2_sw_br_set_stp_state
  - prefer netdev_err over netdev_warn

Vivien Didelot (3):
  net: dsa: make the STP state function return void
  net: dsa: make the FDB add function return void
  net: dsa: make the VLAN add function return void

 Documentation/networking/dsa/dsa.txt |  2 +-
 drivers/net/dsa/bcm_sf2.c            | 25 +++++++--------
 drivers/net/dsa/mv88e6171.c          |  2 +-
 drivers/net/dsa/mv88e6352.c          |  2 +-
 drivers/net/dsa/mv88e6xxx.c          | 44 +++++++++++----------------
 drivers/net/dsa/mv88e6xxx.h          | 14 ++++-----
 include/net/dsa.h                    |  8 ++---
 net/dsa/slave.c                      | 59 ++++++++++++++++--------------------
 8 files changed, 69 insertions(+), 87 deletions(-)

-- 
2.8.0

^ permalink raw reply

* [PATCH net-next v2 3/3] net: dsa: make the VLAN add function return void
From: Vivien Didelot @ 2016-04-06 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Florian Fainelli, Jiri Pirko, Scott Feldman, Vivien Didelot
In-Reply-To: <1459958105-1090-1-git-send-email-vivien.didelot@savoirfairelinux.com>

The switchdev design implies that a software error should not happen in
the commit phase since it must have been previously reported in the
prepare phase. If an hardware error occurs during the commit phase,
there is nothing switchdev can do about it.

The DSA layer separates port_vlan_prepare and port_vlan_add for
simplicity and convenience. If an hardware error occurs during the
commit phase, there is no need to report it outside the driver itself.

Make the DSA port_vlan_add routine return void for explicitness.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx.c | 26 +++++++++++---------------
 drivers/net/dsa/mv88e6xxx.h |  6 +++---
 include/net/dsa.h           |  2 +-
 net/dsa/slave.c             | 11 +++--------
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ef36bf6..62320fc 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1908,31 +1908,27 @@ static int _mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, u16 vid,
 	return _mv88e6xxx_vtu_loadpurge(ds, &vlan);
 }
 
-int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
-			    const struct switchdev_obj_port_vlan *vlan,
-			    struct switchdev_trans *trans)
+void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
+			     const struct switchdev_obj_port_vlan *vlan,
+			     struct switchdev_trans *trans)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
 	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
 	u16 vid;
-	int err = 0;
 
 	mutex_lock(&ps->smi_mutex);
 
-	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
-		err = _mv88e6xxx_port_vlan_add(ds, port, vid, untagged);
-		if (err)
-			goto unlock;
-	}
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
+		if (_mv88e6xxx_port_vlan_add(ds, port, vid, untagged))
+			netdev_err(ds->ports[port], "failed to add VLAN %d%c\n",
+				   vid, untagged ? 'u' : 't');
 
-	/* no PVID with ranges, otherwise it's a bug */
-	if (pvid)
-		err = _mv88e6xxx_port_pvid_set(ds, port, vlan->vid_end);
-unlock:
-	mutex_unlock(&ps->smi_mutex);
+	if (pvid && _mv88e6xxx_port_pvid_set(ds, port, vlan->vid_end))
+		netdev_err(ds->ports[port], "failed to set PVID %d\n",
+			   vlan->vid_end);
 
-	return err;
+	mutex_unlock(&ps->smi_mutex);
 }
 
 static int _mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port, u16 vid)
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index a7dccbe..236bcaa 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -503,9 +503,9 @@ int mv88e6xxx_port_vlan_filtering(struct dsa_switch *ds, int port,
 int mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, int port,
 				const struct switchdev_obj_port_vlan *vlan,
 				struct switchdev_trans *trans);
-int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
-			    const struct switchdev_obj_port_vlan *vlan,
-			    struct switchdev_trans *trans);
+void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
+			     const struct switchdev_obj_port_vlan *vlan,
+			     struct switchdev_trans *trans);
 int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
 			    const struct switchdev_obj_port_vlan *vlan);
 int mv88e6xxx_port_vlan_dump(struct dsa_switch *ds, int port,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index f1670a4..18d1be3 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -310,7 +310,7 @@ struct dsa_switch_driver {
 	int	(*port_vlan_prepare)(struct dsa_switch *ds, int port,
 				     const struct switchdev_obj_port_vlan *vlan,
 				     struct switchdev_trans *trans);
-	int	(*port_vlan_add)(struct dsa_switch *ds, int port,
+	void	(*port_vlan_add)(struct dsa_switch *ds, int port,
 				 const struct switchdev_obj_port_vlan *vlan,
 				 struct switchdev_trans *trans);
 	int	(*port_vlan_del)(struct dsa_switch *ds, int port,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 90bc744..2dae0d0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -207,21 +207,16 @@ static int dsa_slave_port_vlan_add(struct net_device *dev,
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = p->parent;
-	int err;
 
 	if (switchdev_trans_ph_prepare(trans)) {
 		if (!ds->drv->port_vlan_prepare || !ds->drv->port_vlan_add)
 			return -EOPNOTSUPP;
 
-		err = ds->drv->port_vlan_prepare(ds, p->port, vlan, trans);
-		if (err)
-			return err;
-	} else {
-		err = ds->drv->port_vlan_add(ds, p->port, vlan, trans);
-		if (err)
-			return err;
+		return ds->drv->port_vlan_prepare(ds, p->port, vlan, trans);
 	}
 
+	ds->drv->port_vlan_add(ds, p->port, vlan, trans);
+
 	return 0;
 }
 
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next v2 1/3] net: dsa: make the STP state function return void
From: Vivien Didelot @ 2016-04-06 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Florian Fainelli, Jiri Pirko, Scott Feldman, Vivien Didelot
In-Reply-To: <1459958105-1090-1-git-send-email-vivien.didelot@savoirfairelinux.com>

The DSA layer doesn't care about the return code of the port_stp_update
routine, so make it void in the layer and the DSA drivers.

Replace the useless dsa_slave_stp_update function with a
dsa_slave_stp_state function used to reply to the switchdev
SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute.

In the meantime, rename port_stp_update to port_stp_state_set to
explicit the state change.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 Documentation/networking/dsa/dsa.txt |  2 +-
 drivers/net/dsa/bcm_sf2.c            | 16 ++++++----------
 drivers/net/dsa/mv88e6171.c          |  2 +-
 drivers/net/dsa/mv88e6352.c          |  2 +-
 drivers/net/dsa/mv88e6xxx.c          |  6 ++----
 drivers/net/dsa/mv88e6xxx.h          |  2 +-
 include/net/dsa.h                    |  4 ++--
 net/dsa/slave.c                      | 32 +++++++++++++++-----------------
 8 files changed, 29 insertions(+), 37 deletions(-)

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 013b670..ba698c5 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -533,7 +533,7 @@ Bridge layer
   out at the switch hardware for the switch to (re) learn MAC addresses behind
   this port.
 
-- port_stp_update: bridge layer function invoked when a given switch port STP
+- port_stp_state_set: bridge layer function invoked when a given switch port STP
   state is computed by the bridge layer and should be propagated to switch
   hardware to forward/block/learn traffic. The switch driver is responsible for
   computing a STP state change based on current and asked parameters and perform
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 95944d5..2bba1d9 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -545,12 +545,11 @@ static void bcm_sf2_sw_br_leave(struct dsa_switch *ds, int port)
 	priv->port_sts[port].bridge_dev = NULL;
 }
 
-static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
-				       u8 state)
+static void bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
+					u8 state)
 {
 	struct bcm_sf2_priv *priv = ds_to_priv(ds);
 	u8 hw_state, cur_hw_state;
-	int ret = 0;
 	u32 reg;
 
 	reg = core_readl(priv, CORE_G_PCTL_PORT(port));
@@ -574,7 +573,7 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
 		break;
 	default:
 		pr_err("%s: invalid STP state: %d\n", __func__, state);
-		return -EINVAL;
+		return;
 	}
 
 	/* Fast-age ARL entries if we are moving a port from Learning or
@@ -584,10 +583,9 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
 	if (cur_hw_state != hw_state) {
 		if (cur_hw_state >= G_MISTP_LEARN_STATE &&
 		    hw_state <= G_MISTP_LISTEN_STATE) {
-			ret = bcm_sf2_sw_fast_age_port(ds, port);
-			if (ret) {
+			if (bcm_sf2_sw_fast_age_port(ds, port)) {
 				pr_err("%s: fast-ageing failed\n", __func__);
-				return ret;
+				return;
 			}
 		}
 	}
@@ -596,8 +594,6 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
 	reg &= ~(G_MISTP_STATE_MASK << G_MISTP_STATE_SHIFT);
 	reg |= hw_state;
 	core_writel(priv, reg, CORE_G_PCTL_PORT(port));
-
-	return 0;
 }
 
 /* Address Resolution Logic routines */
@@ -1387,7 +1383,7 @@ static struct dsa_switch_driver bcm_sf2_switch_driver = {
 	.set_eee		= bcm_sf2_sw_set_eee,
 	.port_bridge_join	= bcm_sf2_sw_br_join,
 	.port_bridge_leave	= bcm_sf2_sw_br_leave,
-	.port_stp_update	= bcm_sf2_sw_br_set_stp_state,
+	.port_stp_state_set	= bcm_sf2_sw_br_set_stp_state,
 	.port_fdb_prepare	= bcm_sf2_sw_fdb_prepare,
 	.port_fdb_add		= bcm_sf2_sw_fdb_add,
 	.port_fdb_del		= bcm_sf2_sw_fdb_del,
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index c0164b9..0e62f3b 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -105,7 +105,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 	.get_regs		= mv88e6xxx_get_regs,
 	.port_bridge_join	= mv88e6xxx_port_bridge_join,
 	.port_bridge_leave	= mv88e6xxx_port_bridge_leave,
-	.port_stp_update        = mv88e6xxx_port_stp_update,
+	.port_stp_state_set	= mv88e6xxx_port_stp_state_set,
 	.port_vlan_filtering	= mv88e6xxx_port_vlan_filtering,
 	.port_vlan_prepare	= mv88e6xxx_port_vlan_prepare,
 	.port_vlan_add		= mv88e6xxx_port_vlan_add,
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 5f528ab..7f452e4 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -326,7 +326,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.get_regs		= mv88e6xxx_get_regs,
 	.port_bridge_join	= mv88e6xxx_port_bridge_join,
 	.port_bridge_leave	= mv88e6xxx_port_bridge_leave,
-	.port_stp_update	= mv88e6xxx_port_stp_update,
+	.port_stp_state_set	= mv88e6xxx_port_stp_state_set,
 	.port_vlan_filtering	= mv88e6xxx_port_vlan_filtering,
 	.port_vlan_prepare	= mv88e6xxx_port_vlan_prepare,
 	.port_vlan_add		= mv88e6xxx_port_vlan_add,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 0dda281..53c545c 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1193,7 +1193,7 @@ static int _mv88e6xxx_port_based_vlan_map(struct dsa_switch *ds, int port)
 	return _mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_BASE_VLAN, reg);
 }
 
-int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
+void mv88e6xxx_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	int stp_state;
@@ -1215,14 +1215,12 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
 		break;
 	}
 
-	/* mv88e6xxx_port_stp_update may be called with softirqs disabled,
+	/* mv88e6xxx_port_stp_state_set may be called with softirqs disabled,
 	 * so we can not update the port state directly but need to schedule it.
 	 */
 	ps->ports[port].state = stp_state;
 	set_bit(port, ps->port_state_update_mask);
 	schedule_work(&ps->bridge_work);
-
-	return 0;
 }
 
 static int _mv88e6xxx_port_pvid(struct dsa_switch *ds, int port, u16 *new,
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 26a424a..4944855 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -497,7 +497,7 @@ int mv88e6xxx_set_eee(struct dsa_switch *ds, int port,
 int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 			       struct net_device *bridge);
 void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port);
-int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state);
+void mv88e6xxx_port_stp_state_set(struct dsa_switch *ds, int port, u8 state);
 int mv88e6xxx_port_vlan_filtering(struct dsa_switch *ds, int port,
 				  bool vlan_filtering);
 int mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, int port,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6463bb2..2123981 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -299,8 +299,8 @@ struct dsa_switch_driver {
 	int	(*port_bridge_join)(struct dsa_switch *ds, int port,
 				    struct net_device *bridge);
 	void	(*port_bridge_leave)(struct dsa_switch *ds, int port);
-	int	(*port_stp_update)(struct dsa_switch *ds, int port,
-				   u8 state);
+	void	(*port_stp_state_set)(struct dsa_switch *ds, int port,
+				      u8 state);
 
 	/*
 	 * VLAN support
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index a575f03..088215c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -104,8 +104,8 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_promisc;
 	}
 
-	if (ds->drv->port_stp_update)
-		ds->drv->port_stp_update(ds, p->port, stp_state);
+	if (ds->drv->port_stp_state_set)
+		ds->drv->port_stp_state_set(ds, p->port, stp_state);
 
 	if (p->phy)
 		phy_start(p->phy);
@@ -147,8 +147,8 @@ static int dsa_slave_close(struct net_device *dev)
 	if (ds->drv->port_disable)
 		ds->drv->port_disable(ds, p->port, p->phy);
 
-	if (ds->drv->port_stp_update)
-		ds->drv->port_stp_update(ds, p->port, BR_STATE_DISABLED);
+	if (ds->drv->port_stp_state_set)
+		ds->drv->port_stp_state_set(ds, p->port, BR_STATE_DISABLED);
 
 	return 0;
 }
@@ -305,16 +305,19 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_stp_update(struct net_device *dev, u8 state)
+static int dsa_slave_stp_state_set(struct net_device *dev,
+				   const struct switchdev_attr *attr,
+				   struct switchdev_trans *trans)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = p->parent;
-	int ret = -EOPNOTSUPP;
 
-	if (ds->drv->port_stp_update)
-		ret = ds->drv->port_stp_update(ds, p->port, state);
+	if (switchdev_trans_ph_prepare(trans))
+		return ds->drv->port_stp_state_set ? 0 : -EOPNOTSUPP;
 
-	return ret;
+	ds->drv->port_stp_state_set(ds, p->port, attr->u.stp_state);
+
+	return 0;
 }
 
 static int dsa_slave_vlan_filtering(struct net_device *dev,
@@ -339,17 +342,11 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 				   const struct switchdev_attr *attr,
 				   struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->parent;
 	int ret;
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
-		if (switchdev_trans_ph_prepare(trans))
-			ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
-		else
-			ret = ds->drv->port_stp_update(ds, p->port,
-						       attr->u.stp_state);
+		ret = dsa_slave_stp_state_set(dev, attr, trans);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
 		ret = dsa_slave_vlan_filtering(dev, attr, trans);
@@ -468,7 +465,8 @@ static void dsa_slave_bridge_port_leave(struct net_device *dev)
 	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 	 */
-	dsa_slave_stp_update(dev, BR_STATE_FORWARDING);
+	if (ds->drv->port_stp_state_set)
+		ds->drv->port_stp_state_set(ds, p->port, BR_STATE_FORWARDING);
 }
 
 static int dsa_slave_port_attr_get(struct net_device *dev,
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next v2 2/3] net: dsa: make the FDB add function return void
From: Vivien Didelot @ 2016-04-06 15:55 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Florian Fainelli, Jiri Pirko, Scott Feldman, Vivien Didelot
In-Reply-To: <1459958105-1090-1-git-send-email-vivien.didelot@savoirfairelinux.com>

The switchdev design implies that a software error should not happen in
the commit phase since it must have been previously reported in the
prepare phase. If an hardware error occurs during the commit phase,
there is nothing switchdev can do about it.

The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity
and convenience. If an hardware error occurs during the commit phase,
there is no need to report it outside the DSA driver itself.

Make the DSA port_fdb_add routine return void for explicitness.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/bcm_sf2.c   |  9 +++++----
 drivers/net/dsa/mv88e6xxx.c | 12 +++++-------
 drivers/net/dsa/mv88e6xxx.h |  6 +++---
 include/net/dsa.h           |  2 +-
 net/dsa/slave.c             | 16 ++++++++--------
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 2bba1d9..780f228 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -724,13 +724,14 @@ static int bcm_sf2_sw_fdb_prepare(struct dsa_switch *ds, int port,
 	return 0;
 }
 
-static int bcm_sf2_sw_fdb_add(struct dsa_switch *ds, int port,
-			      const struct switchdev_obj_port_fdb *fdb,
-			      struct switchdev_trans *trans)
+static void bcm_sf2_sw_fdb_add(struct dsa_switch *ds, int port,
+			       const struct switchdev_obj_port_fdb *fdb,
+			       struct switchdev_trans *trans)
 {
 	struct bcm_sf2_priv *priv = ds_to_priv(ds);
 
-	return bcm_sf2_arl_op(priv, 0, port, fdb->addr, fdb->vid, true);
+	if (bcm_sf2_arl_op(priv, 0, port, fdb->addr, fdb->vid, true))
+		pr_err("%s: failed to add MAC address\n", __func__);
 }
 
 static int bcm_sf2_sw_fdb_del(struct dsa_switch *ds, int port,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 53c545c..ef36bf6 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2090,21 +2090,19 @@ int mv88e6xxx_port_fdb_prepare(struct dsa_switch *ds, int port,
 	return 0;
 }
 
-int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
-			   const struct switchdev_obj_port_fdb *fdb,
-			   struct switchdev_trans *trans)
+void mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
+			    const struct switchdev_obj_port_fdb *fdb,
+			    struct switchdev_trans *trans)
 {
 	int state = is_multicast_ether_addr(fdb->addr) ?
 		GLOBAL_ATU_DATA_STATE_MC_STATIC :
 		GLOBAL_ATU_DATA_STATE_UC_STATIC;
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-	int ret;
 
 	mutex_lock(&ps->smi_mutex);
-	ret = _mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state);
+	if (_mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state))
+		netdev_err(ds->ports[port], "failed to load MAC address\n");
 	mutex_unlock(&ps->smi_mutex);
-
-	return ret;
 }
 
 int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 4944855..a7dccbe 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -514,9 +514,9 @@ int mv88e6xxx_port_vlan_dump(struct dsa_switch *ds, int port,
 int mv88e6xxx_port_fdb_prepare(struct dsa_switch *ds, int port,
 			       const struct switchdev_obj_port_fdb *fdb,
 			       struct switchdev_trans *trans);
-int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
-			   const struct switchdev_obj_port_fdb *fdb,
-			   struct switchdev_trans *trans);
+void mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
+			    const struct switchdev_obj_port_fdb *fdb,
+			    struct switchdev_trans *trans);
 int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
 			   const struct switchdev_obj_port_fdb *fdb);
 int mv88e6xxx_port_fdb_dump(struct dsa_switch *ds, int port,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 2123981..f1670a4 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -325,7 +325,7 @@ struct dsa_switch_driver {
 	int	(*port_fdb_prepare)(struct dsa_switch *ds, int port,
 				    const struct switchdev_obj_port_fdb *fdb,
 				    struct switchdev_trans *trans);
-	int	(*port_fdb_add)(struct dsa_switch *ds, int port,
+	void	(*port_fdb_add)(struct dsa_switch *ds, int port,
 				const struct switchdev_obj_port_fdb *fdb,
 				struct switchdev_trans *trans);
 	int	(*port_fdb_del)(struct dsa_switch *ds, int port,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 088215c..90bc744 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -256,17 +256,17 @@ static int dsa_slave_port_fdb_add(struct net_device *dev,
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = p->parent;
-	int ret;
 
-	if (!ds->drv->port_fdb_prepare || !ds->drv->port_fdb_add)
-		return -EOPNOTSUPP;
+	if (switchdev_trans_ph_prepare(trans)) {
+		if (!ds->drv->port_fdb_prepare || !ds->drv->port_fdb_add)
+			return -EOPNOTSUPP;
 
-	if (switchdev_trans_ph_prepare(trans))
-		ret = ds->drv->port_fdb_prepare(ds, p->port, fdb, trans);
-	else
-		ret = ds->drv->port_fdb_add(ds, p->port, fdb, trans);
+		return ds->drv->port_fdb_prepare(ds, p->port, fdb, trans);
+	}
 
-	return ret;
+	ds->drv->port_fdb_add(ds, p->port, fdb, trans);
+
+	return 0;
 }
 
 static int dsa_slave_port_fdb_del(struct net_device *dev,
-- 
2.8.0

^ permalink raw reply related

* Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Edward Cree @ 2016-04-06 15:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tom Herbert, David Miller, Herbert Xu, Alexander Duyck,
	Alex Duyck, Jesse Gross, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <1459957158.6473.363.camel@edumazet-glaptop3.roam.corp.google.com>

On 06/04/16 16:39, Eric Dumazet wrote:
> Look at the mess of some helpers in net/core/skbuff.c, and imagine the
> super mess it would be if using a concept of 'super packet with various
> headers on each segment'.
Maybe I'm still not explaining this very well, but there is _no_ concept of
'super packet [anything]' in this idea.  There is just 'list of skbs that
were all received in the same NAPI poll, and have not yet been determined to
be different'.

Any layer that doesn't want to deal with this stuff will always have the
option of "while (skb = skb_dequeue(list)) my_normal_receive_function(skb);"
and in fact I'd make that happen by default for anything that hadn't
registered a function to take a list.
> netfilter is already complex, it would become a nightmare.
A netfilter hook could, for instance, run on each packet in the list, then
partition the list into sub-lists of packets that all had the same verdict
(letting go of any that were DROP or STOLEN).  That doesn't seem like it
should be nightmarish.

-Ed

^ permalink raw reply

* Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Eric Dumazet @ 2016-04-06 16:03 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, David Miller, Herbert Xu, Alexander Duyck,
	Alex Duyck, Jesse Gross, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <57053183.6030002@solarflare.com>

On Wed, 2016-04-06 at 16:55 +0100, Edward Cree wrote:
> On 06/04/16 16:39, Eric Dumazet wrote:
> > Look at the mess of some helpers in net/core/skbuff.c, and imagine the
> > super mess it would be if using a concept of 'super packet with various
> > headers on each segment'.
> Maybe I'm still not explaining this very well, but there is _no_ concept of
> 'super packet [anything]' in this idea.  There is just 'list of skbs that
> were all received in the same NAPI poll, and have not yet been determined to
> be different'.
> 
> Any layer that doesn't want to deal with this stuff will always have the
> option of "while (skb = skb_dequeue(list)) my_normal_receive_function(skb);"
> and in fact I'd make that happen by default for anything that hadn't
> registered a function to take a list.
> > netfilter is already complex, it would become a nightmare.
> A netfilter hook could, for instance, run on each packet in the list, then
> partition the list into sub-lists of packets that all had the same verdict
> (letting go of any that were DROP or STOLEN).  That doesn't seem like it
> should be nightmarish.

Okay, I will let you try this alone ;)

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] net: dsa: voidify STP setter and FDB/VLAN add ops
From: Andrew Lunn @ 2016-04-06 16:19 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Jiri Pirko, Scott Feldman
In-Reply-To: <1459958105-1090-1-git-send-email-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 06, 2016 at 11:55:02AM -0400, Vivien Didelot wrote:
> Neither the DSA layer nor the bridge code (see br_set_state) really care
> about eventual errors from STP state setters, so make it void.
> 
> The DSA layer separates the prepare and commit phases of switchdev in
> two different functions. Logical errors must not happen in commit
> routines, so make them void.

Looks good.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: document missing functions
From: Andrew Lunn @ 2016-04-06 16:21 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <1459955180-12083-1-git-send-email-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 06, 2016 at 11:06:20AM -0400, Vivien Didelot wrote:
> Add description for the missing port_vlan_prepare, port_fdb_prepare,
> port_fdb_dump functions in the DSA documentation.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

	     Andrew

^ permalink raw reply

* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Troy Kisky @ 2016-04-06 16:43 UTC (permalink / raw)
  To: Fugang Duan, netdev@vger.kernel.org, davem@davemloft.net,
	lznuaa@gmail.com
  Cc: Fabio Estevam, l.stach@pengutronix.de, andrew@lunn.ch,
	tremyfr@gmail.com, gerg@uclinux.org,
	linux-arm-kernel@lists.infradead.org, johannes@sipsolutions.net,
	stillcompiling@gmail.com, sergei.shtylyov@cogentembedded.com,
	arnd@arndb.de
In-Reply-To: <VI1PR0401MB1855C389F2265E30252D62B5FF9F0@VI1PR0401MB1855.eurprd04.prod.outlook.com>

On 4/6/2016 1:51 AM, Fugang Duan wrote:
> From: Troy Kisky <troy.kisky@boundarydevices.com> Sent: Wednesday, April 06, 2016 10:26 AM
>> To: netdev@vger.kernel.org; davem@davemloft.net; Fugang Duan
>> <fugang.duan@nxp.com>; lznuaa@gmail.com
>> Cc: Fabio Estevam <fabio.estevam@nxp.com>; l.stach@pengutronix.de;
>> andrew@lunn.ch; tremyfr@gmail.com; gerg@uclinux.org; linux-arm-
>> kernel@lists.infradead.org; johannes@sipsolutions.net;
>> stillcompiling@gmail.com; sergei.shtylyov@cogentembedded.com;
>> arnd@arndb.de; Troy Kisky <troy.kisky@boundarydevices.com>
>> Subject: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
>>
>> V3 has
>>
>> 1 dropped patch "net: fec: print more debug info in fec_timeout"
>> 2 new patches
>> 0002-net-fec-remove-unused-interrupt-FEC_ENET_TS_TIMER.patch
>> 0003-net-fec-return-IRQ_HANDLED-if-fec_ptp_check_pps_even.patch
>>
>> 1 combined patch
>> 0004-net-fec-pass-rxq-txq-to-fec_enet_rx-tx_queue-instead.patch
>>
>> The changes are noted on individual patches
>>
>> My measured performance of this series is
>>
>> before patch set
>> 365 Mbits/sec Tx/407 RX
>>
>> after patch set
>> 374 Tx/427 Rx
>>
> 
> I doubt the performance data,  I validate it on i.MX6q sabresd board on the latest commit(4da46cebbd3b) in net tree.



I was doing UDP tests, as outlined in my V2 cover letter. Also, my cpu is 1G. Is yours 1.2G?




> root@imx6qdlsolo:~# uname -a
> Linux imx6qdlsolo 4.6.0-rc1-00318-g4da46ce #180 SMP Wed Apr 6 16:24:09 CST 2016 armv7l GNU/Linux


This is the V2 patch that I dropped.

I will force update my local net-next_master branch, to make testing this series easier.
Note that my local net-next_master branch has about 19 patches on top of this series.
so,

tkisky@office-server2:~/linux-imx6$ git reset --hard HEAD~19
HEAD is now at a125da7 net: fec: don't set cbd_bufaddr unless no mapping error


> 
> TCP RX performance is 602Mbps, TX is only 325Mbps,   TX path has some performance issue in net tree.
> I will dig out it.
> 
> 


More testing is always better. Thanks


>>
>> Troy Kisky (16):
>>   net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
>>   net: fec: remove unused interrupt FEC_ENET_TS_TIMER
>>   net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
>>   net: fec: pass rxq/txq to fec_enet_rx/tx_queue instead of queue_id
>>   net: fec: reduce interrupts
>>   net: fec: split off napi routine with 3 queues
>>   net: fec: don't clear all rx queue bits when just one is being checked
>>   net: fec: set cbd_sc without relying on previous value
>>   net: fec: eliminate calls to fec_enet_get_prevdesc
>>   net: fec: move restart test for efficiency
>>   net: fec: clear cbd_sc after transmission to help with debugging
>>   net: fec: dump all tx queues in fec_dump
>>   net: fec: detect tx int lost
>>   net: fec: create subroutine reset_tx_queue
>>   net: fec: call dma_unmap_single on mapped tx buffers at restart
>>   net: fec: don't set cbd_bufaddr unless no mapping error
>>
>>  drivers/net/ethernet/freescale/fec.h      |  10 +-
>>  drivers/net/ethernet/freescale/fec_main.c | 410 ++++++++++++++++------------
>> --
>>  2 files changed, 218 insertions(+), 202 deletions(-)
>>
>> --
>> 2.5.0
> 

^ permalink raw reply

* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Troy Kisky @ 2016-04-06 16:47 UTC (permalink / raw)
  To: Fugang Duan, netdev@vger.kernel.org, davem@davemloft.net,
	lznuaa@gmail.com
  Cc: Fabio Estevam, l.stach@pengutronix.de, andrew@lunn.ch,
	tremyfr@gmail.com, gerg@uclinux.org,
	linux-arm-kernel@lists.infradead.org, johannes@sipsolutions.net,
	stillcompiling@gmail.com, sergei.shtylyov@cogentembedded.com,
	arnd@arndb.de
In-Reply-To: <57053C94.2000009@boundarydevices.com>

On 4/6/2016 9:43 AM, Troy Kisky wrote:
> On 4/6/2016 1:51 AM, Fugang Duan wrote:
>> From: Troy Kisky <troy.kisky@boundarydevices.com> Sent: Wednesday, April 06, 2016 10:26 AM
>>> To: netdev@vger.kernel.org; davem@davemloft.net; Fugang Duan
>>> <fugang.duan@nxp.com>; lznuaa@gmail.com
>>> Cc: Fabio Estevam <fabio.estevam@nxp.com>; l.stach@pengutronix.de;
>>> andrew@lunn.ch; tremyfr@gmail.com; gerg@uclinux.org; linux-arm-
>>> kernel@lists.infradead.org; johannes@sipsolutions.net;
>>> stillcompiling@gmail.com; sergei.shtylyov@cogentembedded.com;
>>> arnd@arndb.de; Troy Kisky <troy.kisky@boundarydevices.com>
>>> Subject: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
>>>
>>> V3 has
>>>
>>> 1 dropped patch "net: fec: print more debug info in fec_timeout"
>>> 2 new patches
>>> 0002-net-fec-remove-unused-interrupt-FEC_ENET_TS_TIMER.patch
>>> 0003-net-fec-return-IRQ_HANDLED-if-fec_ptp_check_pps_even.patch
>>>
>>> 1 combined patch
>>> 0004-net-fec-pass-rxq-txq-to-fec_enet_rx-tx_queue-instead.patch
>>>
>>> The changes are noted on individual patches
>>>
>>> My measured performance of this series is
>>>
>>> before patch set
>>> 365 Mbits/sec Tx/407 RX
>>>
>>> after patch set
>>> 374 Tx/427 Rx
>>>
>>
>> I doubt the performance data,  I validate it on i.MX6q sabresd board on the latest commit(4da46cebbd3b) in net tree.
> 
> 
> 
> I was doing UDP tests, as outlined in my V2 cover letter. Also, my cpu is 1G. Is yours 1.2G?
> 
> 
> 
> 
>> root@imx6qdlsolo:~# uname -a
>> Linux imx6qdlsolo 4.6.0-rc1-00318-g4da46ce #180 SMP Wed Apr 6 16:24:09 CST 2016 armv7l GNU/Linux
> 
> 
> This is the V2 patch that I dropped.


Sorry, your right. It is the current head, without this series.


> 
> I will force update my local net-next_master branch, to make testing this series easier.
> Note that my local net-next_master branch has about 19 patches on top of this series.
> so,
> 
> tkisky@office-server2:~/linux-imx6$ git reset --hard HEAD~19
> HEAD is now at a125da7 net: fec: don't set cbd_bufaddr unless no mapping error
> 
> 
>>
>> TCP RX performance is 602Mbps, TX is only 325Mbps,   TX path has some performance issue in net tree.
>> I will dig out it.
>>
>>
> 
> 
> More testing is always better. Thanks
> 
> 
>>>
>>> Troy Kisky (16):
>>>   net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
>>>   net: fec: remove unused interrupt FEC_ENET_TS_TIMER
>>>   net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
>>>   net: fec: pass rxq/txq to fec_enet_rx/tx_queue instead of queue_id
>>>   net: fec: reduce interrupts
>>>   net: fec: split off napi routine with 3 queues
>>>   net: fec: don't clear all rx queue bits when just one is being checked
>>>   net: fec: set cbd_sc without relying on previous value
>>>   net: fec: eliminate calls to fec_enet_get_prevdesc
>>>   net: fec: move restart test for efficiency
>>>   net: fec: clear cbd_sc after transmission to help with debugging
>>>   net: fec: dump all tx queues in fec_dump
>>>   net: fec: detect tx int lost
>>>   net: fec: create subroutine reset_tx_queue
>>>   net: fec: call dma_unmap_single on mapped tx buffers at restart
>>>   net: fec: don't set cbd_bufaddr unless no mapping error
>>>
>>>  drivers/net/ethernet/freescale/fec.h      |  10 +-
>>>  drivers/net/ethernet/freescale/fec_main.c | 410 ++++++++++++++++------------
>>> --
>>>  2 files changed, 218 insertions(+), 202 deletions(-)
>>>
>>> --
>>> 2.5.0
>>

^ permalink raw reply

* [PATCH net-next] ibmvnic: Enable use of multiple tx/rx scrqs
From: John Allen @ 2016-04-06 16:49 UTC (permalink / raw)
  To: Thomas Falcon, netdev, davem; +Cc: linuxppc-dev

Enables the use of multiple transmit and receive scrqs allowing the ibmvnic
driver to take advantage of multiqueue functionality. To achieve this, the
driver must implement the process of negotiating the maximum number of
queues allowed by the server. Initially, the driver will attempt to login
with the maximum number of tx and rx queues supported by the server. If
the server fails to allocate the requested number of scrqs, it will return
partial success in the login response. In this case, we must reinitiate
the login process from the request capabilities stage and attempt to login
requesting fewer scrqs.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 21bccf6..6e9b91d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -800,11 +800,12 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 			ret = NETDEV_TX_BUSY;
 			goto out;
 		}
-		lpar_rc = send_subcrq_indirect(adapter, handle_array[0],
+		lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
 					       (u64)tx_buff->indir_dma,
 					       (u64)num_entries);
 	} else {
-		lpar_rc = send_subcrq(adapter, handle_array[0], &tx_crq);
+		lpar_rc = send_subcrq(adapter, handle_array[queue_num],
+				      &tx_crq);
 	}
 	if (lpar_rc != H_SUCCESS) {
 		dev_err(dev, "tx failed with code %ld\n", lpar_rc);
@@ -989,7 +990,7 @@ restart_poll:
 		netdev->stats.rx_bytes += length;
 		frames_processed++;
 	}
-	replenish_pools(adapter);
+	replenish_rx_pool(adapter, &adapter->rx_pool[scrq_num]);

 	if (frames_processed < budget) {
 		enable_scrq_irq(adapter, adapter->rx_scrq[scrq_num]);
@@ -1426,9 +1427,9 @@ static void init_sub_crqs(struct ibmvnic_adapter *adapter, int retry)
 		    entries_page : adapter->max_rx_add_entries_per_subcrq;

 		/* Choosing the maximum number of queues supported by firmware*/
-		adapter->req_tx_queues = adapter->min_tx_queues;
-		adapter->req_rx_queues = adapter->min_rx_queues;
-		adapter->req_rx_add_queues = adapter->min_rx_add_queues;
+		adapter->req_tx_queues = adapter->max_tx_queues;
+		adapter->req_rx_queues = adapter->max_rx_queues;
+		adapter->req_rx_add_queues = adapter->max_rx_add_queues;

 		adapter->req_mtu = adapter->max_mtu;
 	}
@@ -1776,13 +1777,11 @@ static void send_login(struct ibmvnic_adapter *adapter)
 		goto buf_map_failed;
 	}

-	rsp_buffer_size =
-	    sizeof(struct ibmvnic_login_rsp_buffer) +
-	    sizeof(u64) * (adapter->req_tx_queues +
-			   adapter->req_rx_queues *
-			   adapter->req_rx_add_queues + adapter->
-			   req_rx_add_queues) +
-	    sizeof(u8) * (IBMVNIC_TX_DESC_VERSIONS);
+	rsp_buffer_size = sizeof(struct ibmvnic_login_rsp_buffer) +
+			  sizeof(u64) * adapter->req_tx_queues +
+			  sizeof(u64) * adapter->req_rx_queues +
+			  sizeof(u64) * adapter->req_rx_queues +
+			  sizeof(u8) * IBMVNIC_TX_DESC_VERSIONS;

 	login_rsp_buffer = kmalloc(rsp_buffer_size, GFP_ATOMIC);
 	if (!login_rsp_buffer)
@@ -2401,6 +2400,16 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
 	dma_unmap_single(dev, adapter->login_rsp_buf_token,
 			 adapter->login_rsp_buf_sz, DMA_BIDIRECTIONAL);

+	/* If the number of queues requested can't be allocated by the
+	 * server, the login response will return with code 1. We will need
+	 * to resend the login buffer with fewer queues requested.
+	 */
+	if (login_rsp_crq->generic.rc.code) {
+		adapter->renegotiate = true;
+		complete(&adapter->init_done);
+		return 0;
+	}
+
 	netdev_dbg(adapter->netdev, "Login Response Buffer:\n");
 	for (i = 0; i < (adapter->login_rsp_buf_sz - 1) / 8 + 1; i++) {
 		netdev_dbg(adapter->netdev, "%016lx\n",
@@ -3627,15 +3636,22 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)

 	init_completion(&adapter->init_done);
 	wait_for_completion(&adapter->init_done);
+
+	do {
+		adapter->renegotiate = false;

-	/* needed to pull init_sub_crqs outside of an interrupt context
-	 * because it creates IRQ mappings for the subCRQ queues, causing
-	 * a kernel warning
-	 */
-	init_sub_crqs(adapter, 0);
+		init_sub_crqs(adapter, 0);
+		reinit_completion(&adapter->init_done);
+		wait_for_completion(&adapter->init_done);

-	reinit_completion(&adapter->init_done);
-	wait_for_completion(&adapter->init_done);
+		if (adapter->renegotiate) {
+			release_sub_crqs(adapter);
+			send_cap_queries(adapter);
+
+			reinit_completion(&adapter->init_done);
+			wait_for_completion(&adapter->init_done);
+		}
+	} while (adapter->renegotiate);

 	/* if init_sub_crqs is partially successful, retry */
 	while (!adapter->tx_scrq || !adapter->rx_scrq) {
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 5af8a79..0b66a50 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -980,6 +980,7 @@ struct ibmvnic_adapter {
 	struct ibmvnic_sub_crq_queue **tx_scrq;
 	struct ibmvnic_sub_crq_queue **rx_scrq;
 	int requested_caps;
+	bool renegotiate;

 	/* rx structs */
 	struct napi_struct *napi;

^ permalink raw reply related

* [PATCH] packet: uses kfree_skb() for drop.
From: Weongyo Jeong @ 2016-04-06 16:54 UTC (permalink / raw)
  To: netdev; +Cc: Weongyo Jeong, David S. Miller, Willem de Bruijn

consume_skb() isn't for drop or error cases.  kfree_skb() is more proper
one.

Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..a75d5bf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2141,7 +2141,7 @@ drop_n_restore:
 		skb->len = skb_len;
 	}
 drop:
-	consume_skb(skb);
+	kfree_skb(skb);
 	return 0;
 }

-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH net] net: sched: do not requeue a NULL skb
From: Cong Wang @ 2016-04-06 17:14 UTC (permalink / raw)
  To: Lars Persson
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, LKML,
	Lars Persson
In-Reply-To: <1459948214-735-1-git-send-email-larper@axis.com>

On Wed, Apr 6, 2016 at 6:10 AM, Lars Persson <lars.persson@axis.com> wrote:
> A failure in validate_xmit_skb_list() triggered an unconditional call
> to dev_requeue_skb with skb=NULL. This slowly grows the queue
> discipline's qlen count until all traffic through the queue stops.
>

Sounds reasonable.

> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
> Signed-off-by: Lars Persson <larper@axis.com>
> ---
>  net/sched/sch_generic.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f18c350..1031536 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -165,6 +165,9 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>                         skb = dev_hard_start_xmit(skb, dev, txq, &ret);
>
>                 HARD_TX_UNLOCK(dev, txq);
> +       } else {
> +               spin_lock(root_lock);
> +               return qdisc_qlen(q);

I think we should return 0 for this failure case so that qdisc_restart()
will stop. How about teaching dev_requeue_skb() to skip skb==NULL
case?

^ permalink raw reply

* Re: [PATCH] packet: uses kfree_skb() for drop.
From: Willem de Bruijn @ 2016-04-06 17:27 UTC (permalink / raw)
  To: Weongyo Jeong; +Cc: Network Development, David S. Miller, Willem de Bruijn
In-Reply-To: <1459961686-14281-1-git-send-email-weongyo.linux@gmail.com>

On Wed, Apr 6, 2016 at 12:54 PM, Weongyo Jeong <weongyo.linux@gmail.com> wrote:
> consume_skb() isn't for drop or error cases.  kfree_skb() is more proper
> one.
> Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
> ---
>  net/packet/af_packet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 1ecfa71..a75d5bf 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2141,7 +2141,7 @@ drop_n_restore:
>                 skb->len = skb_len;
>         }
>  drop:
> -       consume_skb(skb);
> +       kfree_skb(skb);

This does show an inconsistency between packet_rcv and tpacket_rcv,
which calls kfree_skb.

A comment at consume_skb mentions that kfree_skb is intended for drops
that signal a failure condition, and indeed, that makes it a useful
way to track errors (e.g., with perf record -a -g -e skb:kfree_skb).

This drop path is not always an error path, though. These network taps
will legitimately drop references to any packets not destined to them.
To be precise, only the drop_n_acct label cases are delivery errors
(drops after the filter accepted the packet). Changing unconditionally
to kfree_skb does pollute that useful counter with false positives. A
pedantic solution is to change both functions to only call kfree_skb
on drop_n_acct and consume_skb otherwise.

This shorthand change does at least makes packet_rcv and tpacket_rcv more alike.

^ permalink raw reply

* Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Tom Herbert @ 2016-04-06 17:42 UTC (permalink / raw)
  To: David Miller
  Cc: Edward Cree, Herbert Xu, Alexander Duyck, Alex Duyck, Jesse Gross,
	Eric Dumazet, Linux Kernel Network Developers
In-Reply-To: <20160406.114339.1083222124758102992.davem@davemloft.net>

On Wed, Apr 6, 2016 at 12:43 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Wed, 6 Apr 2016 10:53:52 -0300
>
>> Packets that are forwarded really should not be GRO'ed in the first
>> place because of the loss of information and added latency.
>
> First of all GRO is supposed to be lossless, so please stop saying this
> would be a reason to turn it off on a router.
>
> Second of all, the biggest piece of overhead is the routing lookup,
> therefore GRO batching helps enormously with routing workloads, and
> therefore is appropriate to be enabled on routers.
>
> Yes, I agree that for locally terminated stuff it helps more, but don't
> turn this into a "GRO on routers, meh..." type argument.  It simply is
> not true at all.
>
GRO on routers will help in a limited case where there is little load
and the traffic is nicely groomed high tput TCP connections. But for
routers with significant load, handling large quantities other
protocols like UDP, GRO is not necessarily helpful and presents a
nondeterministic performance improvement. For instance, I cannot
provision a router with any assumptions that GRO will be effective for
any % of packets to save any % of CPU, we need to provision based
purely on ability to forward by pps assuming no benefit from GRO.
Technically, this provisioning argument applies to end hosts also. We
have seen real cases on video servers where servers were
mis-provisioned with the assumption that GSO is always effective. So
when were getting good GSO benefits we might be using something like
80% CPU, but if some connectivity event occurs forcing all the cwnds
of the active connections drop, we find we need 110% of CPU to
recover.

This discussion is relevant because there is a big push now to replace
dedicated HW with commodity HW running Linux, this is already happened
significantly in load balancers but I expect this to extend to even
some cases of basic switching. Besides, I seriously doubt you'll find
any commercial router in the world that does GRO.

Yes, GRO needs to work correctly in all cases, but for system
performance in routing we a bound to per packet actions like route
lookup. As I said, optimizing GRO as Edward is suggesting for
forwarding case seems to have diminishing benefits.

Tom

^ permalink raw reply

* [PATCH v2 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-04-06 17:53 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp

1st patch is a preparation for the 2nd. The idea is to not call
->sk_data_ready() for every data chunk processed while processing
packets but only once before releasing the socket.

v2: patchset re-checked, small changelog fixes

Marcelo Ricardo Leitner (2):
  sctp: compress bit-wide flags to a bitfield on sctp_sock
  sctp: delay calls to sk_data_ready() as much as possible

 include/net/sctp/structs.h | 13 +++++++------
 net/sctp/sm_sideeffect.c   |  5 +++++
 net/sctp/ulpqueue.c        |  4 ++--
 3 files changed, 14 insertions(+), 8 deletions(-)

-- 
2.5.0

^ permalink raw reply

* [PATCH v2 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
From: Marcelo Ricardo Leitner @ 2016-04-06 17:53 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
In-Reply-To: <cover.1459952558.git.marcelo.leitner@gmail.com>

It wastes space and gets worse as we add new flags, so convert bit-wide
flags to a bitfield.

Currently it already saves 4 bytes in sctp_sock, which are left as holes
in it for now. The whole struct needs packing, which should be done in
another patch.

Note that do_auto_asconf cannot be merged, as explained in the comment
before it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -210,14 +210,14 @@ struct sctp_sock {
 	int user_frag;
 
 	__u32 autoclose;
-	__u8 nodelay;
-	__u8 disable_fragments;
-	__u8 v4mapped;
-	__u8 frag_interleave;
 	__u32 adaptation_ind;
 	__u32 pd_point;
-	__u8 recvrcvinfo;
-	__u8 recvnxtinfo;
+	__u16	nodelay:1,
+		disable_fragments:1,
+		v4mapped:1,
+		frag_interleave:1,
+		recvrcvinfo:1,
+		recvnxtinfo:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
-- 
2.5.0

^ permalink raw reply related

* [PATCH v2 2/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-04-06 17:53 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp
In-Reply-To: <cover.1459952558.git.marcelo.leitner@gmail.com>

Currently, the processing of multiple chunks in a single SCTP packet
leads to multiple calls to sk_data_ready, causing multiple wake up
signals which are costly and doesn't make it wake up any faster.

With this patch it will notice that the wake up is pending and will do it
before leaving the state machine interpreter, latest place possible to
do it realiably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 3 ++-
 net/sctp/sm_sideeffect.c   | 5 +++++
 net/sctp/ulpqueue.c        | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1a6a626904bba4223b7921bbb4be41c2550271a7..21cb11107e378b4da1e7efde22fab4349496e35a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,8 @@ struct sctp_sock {
 		v4mapped:1,
 		frag_interleave:1,
 		recvrcvinfo:1,
-		recvnxtinfo:1;
+		recvnxtinfo:1,
+		pending_data_ready:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..e7042f9ce63b0cfca50cae252f51b60b68cb5731 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1742,6 +1742,11 @@ out:
 			error = sctp_outq_uncork(&asoc->outqueue, gfp);
 	} else if (local_cork)
 		error = sctp_outq_uncork(&asoc->outqueue, gfp);
+
+	if (sctp_sk(ep->base.sk)->pending_data_ready) {
+		ep->base.sk->sk_data_ready(ep->base.sk);
+		sctp_sk(ep->base.sk)->pending_data_ready = 0;
+	}
 	return error;
 nomem:
 	error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ce469d648ffbe166f9ae1c5650f481256f31a7f8..72e5b3e41cddf9d79371de8ab01484e4601b97b6 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -264,7 +264,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 		sctp_ulpq_clear_pd(ulpq);
 
 	if (queue == &sk->sk_receive_queue)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 	return 1;
 
 out_free:
@@ -1140,5 +1140,5 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 
 	/* If there is data waiting, send it up the socket now. */
 	if (sctp_ulpq_clear_pd(ulpq) || ev)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 }
-- 
2.5.0

^ permalink raw reply related

* Re: [RFC PATCH net 3/4] ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update
From: Cong Wang @ 2016-04-06 17:58 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, Eric Dumazet, Wei Wang, Kernel Team
In-Reply-To: <20160406001141.GA97608@kafai-mba.local>

On Tue, Apr 5, 2016 at 5:11 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Mon, Apr 04, 2016 at 01:45:02PM -0700, Cong Wang wrote:
>> I see your point, but calling __ip6_datagram_connect() seems overkill
>> here, we don't need to update so many things in the pmtu update context,
>> at least IPv4 doesn't do that either. I don't think you have to do that.
>>
>> So why just updating the dst cache (also some addr cache) here is not
>> enough?
> I am not sure I understand.  I could be missing something.
>
> This patch uses ip6_datagram_dst_update() to do the route lookup and
> sk->sk_dst_cache update.  ip6_datagram_dst_update() is
> created in the first two refactoring patches and is also used by
> __ip6_datagram_connect().
>
> Which operations in ip6_datagram_dst_update() could be saved
> during the pmtu update?

I thought you call the same ip6_datagram_dst_update() for both
pmtu update and __ip6_datagram_connect(), but you actually skip
some sk operations for pmtu case, which means you don't need
to worry about parallel ip6_datagram_connect().

IPv6 UDP sendmsg() path stores the dst without sock lock anyway,
we don't cope with a concurrent connect() on another cpu. But
still, I don't see this is a problem here, because even if we store
an obsolete address in cache, it would be corrected later.

^ permalink raw reply

* Re: [PATCH] packet: uses kfree_skb() for drop.
From: Weongyo Jeong @ 2016-04-06 18:10 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David S. Miller, Willem de Bruijn
In-Reply-To: <CAF=yD-JrVXKsJTUcWSmuVmPbdZV=_h8hsVPQVQTPF64R1k684w@mail.gmail.com>

On Wed, Apr 06, 2016 at 01:27:11PM -0400, Willem de Bruijn wrote:
> On Wed, Apr 6, 2016 at 12:54 PM, Weongyo Jeong <weongyo.linux@gmail.com> wrote:
> > consume_skb() isn't for drop or error cases.  kfree_skb() is more proper
> > one.
> > Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
> > ---
> >  net/packet/af_packet.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 1ecfa71..a75d5bf 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -2141,7 +2141,7 @@ drop_n_restore:
> >                 skb->len = skb_len;
> >         }
> >  drop:
> > -       consume_skb(skb);
> > +       kfree_skb(skb);
> 
> This does show an inconsistency between packet_rcv and tpacket_rcv,
> which calls kfree_skb.
> 
> A comment at consume_skb mentions that kfree_skb is intended for drops
> that signal a failure condition, and indeed, that makes it a useful
> way to track errors (e.g., with perf record -a -g -e skb:kfree_skb).
> 
> This drop path is not always an error path, though. These network taps
> will legitimately drop references to any packets not destined to them.
> To be precise, only the drop_n_acct label cases are delivery errors
> (drops after the filter accepted the packet). Changing unconditionally
> to kfree_skb does pollute that useful counter with false positives. A
> pedantic solution is to change both functions to only call kfree_skb
> on drop_n_acct and consume_skb otherwise.
> 
> This shorthand change does at least makes packet_rcv and tpacket_rcv more alike.

Thank you for comments.  I'll try to submit patch v2 for this case.

Regards,
Weongyo Jeong

^ permalink raw reply

* [PATCH v2] sctp: avoid refreshing heartbeat timer too often
From: Marcelo Ricardo Leitner @ 2016-04-06 18:15 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yasevich, David Laight, linux-sctp

Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

As suggested by David Laight this patch now removes this timer update
from hot path by leaving the timer on and re-evaluating upon its
expiration if the heartbeat is still needed or not, similarly to what is
done for TCP. If it's not needed anymore the timer is re-scheduled to
the new timeout, considering the time already elapsed.

For this, we now record the last tx timestamp per transport, updated in
the same spots as hb timer was restarted on tx. Also split up
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
heartbeat one.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
  Overhead  Command  Shared Object      Symbol
+    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
      - 96,54% _raw_spin_unlock_irqrestore
         - 36,14% mod_timer
            + 97,24% sctp_transport_reset_timers
            + 2,76% sctp_do_sm
         + 33,65% __wake_up_sync_key
         + 28,77% sctp_ulpq_tail_event
         + 1,40% del_timer
      - 1,84% mod_timer
         + 99,03% sctp_transport_reset_timers
         + 0,97% sctp_do_sm
      + 1,50% sctp_ulpq_tail_event

And after this patch, now with netperf -l 60:

Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
  Overhead  Command  Shared Object      Symbol
+    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
+    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      + 49,89% __wake_up_sync_key
      + 45,68% sctp_ulpq_tail_event
      - 2,85% mod_timer
         + 76,51% sctp_transport_reset_t3_rtx
         + 23,49% sctp_do_sm
      + 1,55% del_timer
+    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
+    2,26%  netperf  [sctp]             [k] sctp_sendmsg

Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
~3.7%.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
v2:
  patch re-checked, fixed indentation issue.

 include/net/sctp/structs.h |  8 +++++++-
 net/sctp/outqueue.c        | 15 ++++++++++-----
 net/sctp/sm_make_chunk.c   |  3 +--
 net/sctp/sm_sideeffect.c   | 36 ++++++++++++++++--------------------
 net/sctp/transport.c       | 19 +++++++++++++------
 5 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..5a404c354f4c787e57a6ab6ca68b3ffc537eb1cc 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -847,6 +847,11 @@ struct sctp_transport {
 	 */
 	ktime_t last_time_heard;
 
+	/* When was the last time that we sent a chunk using this
+	 * transport? We use this to check for idle transports
+	 */
+	unsigned long last_time_sent;
+
 	/* Last time(in jiffies) when cwnd is reduced due to the congestion
 	 * indication based on ECNE chunk.
 	 */
@@ -952,7 +957,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 			  struct sctp_sock *);
 void sctp_transport_pmtu(struct sctp_transport *, struct sock *sk);
 void sctp_transport_free(struct sctp_transport *);
-void sctp_transport_reset_timers(struct sctp_transport *);
+void sctp_transport_reset_t3_rtx(struct sctp_transport *);
+void sctp_transport_reset_hb_timer(struct sctp_transport *);
 int sctp_transport_hold(struct sctp_transport *);
 void sctp_transport_put(struct sctp_transport *);
 void sctp_transport_update_rto(struct sctp_transport *, __u32);
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 8d3d3625130ee0fd294998554a9290d57eae56e7..084718f9b3dad09e21e41e34b989e25627058c98 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -866,8 +866,10 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 				 * sender MUST assure that at least one T3-rtx
 				 * timer is running.
 				 */
-				if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN)
-					sctp_transport_reset_timers(transport);
+				if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN) {
+					sctp_transport_reset_t3_rtx(transport);
+					transport->last_time_sent = jiffies;
+				}
 			}
 			break;
 
@@ -924,8 +926,10 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			error = sctp_outq_flush_rtx(q, packet,
 						    rtx_timeout, &start_timer);
 
-			if (start_timer)
-				sctp_transport_reset_timers(transport);
+			if (start_timer) {
+				sctp_transport_reset_t3_rtx(transport);
+				transport->last_time_sent = jiffies;
+			}
 
 			/* This can happen on COOKIE-ECHO resend.  Only
 			 * one chunk can get bundled with a COOKIE-ECHO.
@@ -1062,7 +1066,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			list_add_tail(&chunk->transmitted_list,
 				      &transport->transmitted);
 
-			sctp_transport_reset_timers(transport);
+			sctp_transport_reset_t3_rtx(transport);
+			transport->last_time_sent = jiffies;
 
 			/* Only let one DATA chunk get bundled with a
 			 * COOKIE-ECHO chunk.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 7f0bf798205ba3c5311bf57ace66aaaa57cb0367..56f364d8f93270f31867333585fb28317f9d87ad 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3080,8 +3080,7 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 			return SCTP_ERROR_RSRC_LOW;
 
 		/* Start the heartbeat timer. */
-		if (!mod_timer(&peer->hb_timer, sctp_transport_timeout(peer)))
-			sctp_transport_hold(peer);
+		sctp_transport_reset_hb_timer(peer);
 		asoc->new_transport = peer;
 		break;
 	case SCTP_PARAM_DEL_IP:
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..41b081a64752da3e4de5dafe6d3afac84bf4923d 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -69,8 +69,6 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype,
 			     sctp_cmd_seq_t *commands,
 			     gfp_t gfp);
 
-static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
-				     struct sctp_transport *t);
 /********************************************************************
  * Helper functions
  ********************************************************************/
@@ -367,6 +365,7 @@ void sctp_generate_heartbeat_event(unsigned long data)
 	struct sctp_association *asoc = transport->asoc;
 	struct sock *sk = asoc->base.sk;
 	struct net *net = sock_net(sk);
+	u32 elapsed, timeout;
 
 	bh_lock_sock(sk);
 	if (sock_owned_by_user(sk)) {
@@ -378,6 +377,16 @@ void sctp_generate_heartbeat_event(unsigned long data)
 		goto out_unlock;
 	}
 
+	/* Check if we should still send the heartbeat or reschedule */
+	elapsed = jiffies - transport->last_time_sent;
+	timeout = sctp_transport_timeout(transport);
+	if (elapsed < timeout) {
+		elapsed = timeout - elapsed;
+		if (!mod_timer(&transport->hb_timer, jiffies + elapsed))
+			sctp_transport_hold(transport);
+		goto out_unlock;
+	}
+
 	error = sctp_do_sm(net, SCTP_EVENT_T_TIMEOUT,
 			   SCTP_ST_TIMEOUT(SCTP_EVENT_TIMEOUT_HEARTBEAT),
 			   asoc->state, asoc->ep, asoc,
@@ -507,7 +516,7 @@ static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
 					     0);
 
 		/* Update the hb timer to resend a heartbeat every rto */
-		sctp_cmd_hb_timer_update(commands, transport);
+		sctp_transport_reset_hb_timer(transport);
 	}
 
 	if (transport->state != SCTP_INACTIVE &&
@@ -634,11 +643,8 @@ static void sctp_cmd_hb_timers_start(sctp_cmd_seq_t *cmds,
 	 * hold a reference on the transport to make sure none of
 	 * the needed data structures go away.
 	 */
-	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) {
-
-		if (!mod_timer(&t->hb_timer, sctp_transport_timeout(t)))
-			sctp_transport_hold(t);
-	}
+	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
+		sctp_transport_reset_hb_timer(t);
 }
 
 static void sctp_cmd_hb_timers_stop(sctp_cmd_seq_t *cmds,
@@ -669,15 +675,6 @@ static void sctp_cmd_t3_rtx_timers_stop(sctp_cmd_seq_t *cmds,
 }
 
 
-/* Helper function to update the heartbeat timer. */
-static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
-				     struct sctp_transport *t)
-{
-	/* Update the heartbeat timer.  */
-	if (!mod_timer(&t->hb_timer, sctp_transport_timeout(t)))
-		sctp_transport_hold(t);
-}
-
 /* Helper function to handle the reception of an HEARTBEAT ACK.  */
 static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 				  struct sctp_association *asoc,
@@ -742,8 +739,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	sctp_transport_update_rto(t, (jiffies - hbinfo->sent_at));
 
 	/* Update the heartbeat timer.  */
-	if (!mod_timer(&t->hb_timer, sctp_transport_timeout(t)))
-		sctp_transport_hold(t);
+	sctp_transport_reset_hb_timer(t);
 
 	if (was_unconfirmed && asoc->peer.transport_count == 1)
 		sctp_transport_immediate_rtx(t);
@@ -1614,7 +1610,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 
 		case SCTP_CMD_HB_TIMER_UPDATE:
 			t = cmd->obj.transport;
-			sctp_cmd_hb_timer_update(commands, t);
+			sctp_transport_reset_hb_timer(t);
 			break;
 
 		case SCTP_CMD_HB_TIMERS_STOP:
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 9b6b48c7524e4b441a151b80f0babec81f539d49..81b86678be4d6fccc527d3c3e509c12576b2194c 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -183,7 +183,7 @@ static void sctp_transport_destroy(struct sctp_transport *transport)
 /* Start T3_rtx timer if it is not already running and update the heartbeat
  * timer.  This routine is called every time a DATA chunk is sent.
  */
-void sctp_transport_reset_timers(struct sctp_transport *transport)
+void sctp_transport_reset_t3_rtx(struct sctp_transport *transport)
 {
 	/* RFC 2960 6.3.2 Retransmission Timer Rules
 	 *
@@ -197,11 +197,18 @@ void sctp_transport_reset_timers(struct sctp_transport *transport)
 		if (!mod_timer(&transport->T3_rtx_timer,
 			       jiffies + transport->rto))
 			sctp_transport_hold(transport);
+}
+
+void sctp_transport_reset_hb_timer(struct sctp_transport *transport)
+{
+	unsigned long expires;
 
 	/* When a data chunk is sent, reset the heartbeat interval.  */
-	if (!mod_timer(&transport->hb_timer,
-		       sctp_transport_timeout(transport)))
-	    sctp_transport_hold(transport);
+	expires = jiffies + sctp_transport_timeout(transport);
+	if (time_before(transport->hb_timer.expires, expires) &&
+	    !mod_timer(&transport->hb_timer,
+		       expires + prandom_u32_max(transport->rto)))
+		sctp_transport_hold(transport);
 }
 
 /* This transport has been assigned to an association.
@@ -595,13 +602,13 @@ void sctp_transport_burst_reset(struct sctp_transport *t)
 unsigned long sctp_transport_timeout(struct sctp_transport *trans)
 {
 	/* RTO + timer slack +/- 50% of RTO */
-	unsigned long timeout = (trans->rto >> 1) + prandom_u32_max(trans->rto);
+	unsigned long timeout = trans->rto >> 1;
 
 	if (trans->state != SCTP_UNCONFIRMED &&
 	    trans->state != SCTP_PF)
 		timeout += trans->hbinterval;
 
-	return timeout + jiffies;
+	return timeout;
 }
 
 /* Reset transport variables to their initial values */
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH] neigh: remove duplicated log msg
From: David Miller @ 2016-04-06 18:20 UTC (permalink / raw)
  To: abdelmajidx.mlayeh
  Cc: linux-kernel, dsa, edumazet, martinbj2008, koct9i, ja,
	rick.jones2, netdev
In-Reply-To: <1459946535-6015-1-git-send-email-abdelmajidx.mlayeh@intel.com>

From: Abdelmajid Mlayeh <abdelmajidx.mlayeh@intel.com>
Date: Wed,  6 Apr 2016 14:42:06 +0200

> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

We will not read any patches with attachments like this, sorry.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox