Netdev List
* Re: drop all fragments inside tx queue if one gets dropped
From: Michael Richardson @ 2016-04-20 20:15 UTC (permalink / raw)
  To: netdev, linux-wpan; +Cc: Alexander Aring
In-Reply-To: <57175156.3050501@pengutronix.de>



{adding some more comments from the -wpan side of things}

Alexander Aring <aar@pengutronix.de> wrote:
    > On linux-wpan we had a discussion about setting the right tx_queue_len
    > and came to some issues in 802.15.4 6LoWPAN networks.

...

    > And then a lot of fragments sit inside the tx_queue, waiting to be
    > transferred to the transceiver, which has only one framebuffer, transmits
    > one frame at a time, and waits for tx completion before taking the next.

    > My question is: if qdisc drops some fragment because the queue is full
    > or for some other reason, is there some way to remove all related
    > fragments from the queue? If one fragment is dropped and all related
    > ones are still inside the queue, then we send mostly garbage.

The big concern is that if we make tx_queue_len too big, we are effectively
introducing bloat.
If we make it too small, then we might drop one fragment, when we would
prefer to drop the entire packet.

It seems that maybe we ought to have a queue in the upper interface and fill
the lower interface with at most two packets' worth of fragments.

    > I want to add a behaviour which drops all related fragments for 6LoWPAN
    > fragmentation at first. If the payload is above 1280 bytes, then we
    > also have IPv6 fragmentation on top of it. In future I would also like
    > to remove all 6LoWPAN fragments which are related through the IPv6
    > fragment.

It would still be useful to be able to do this in general: this kind of
operation would also benefit sending large UDP packets over ethernet when we
have to do IP-layer fragmentation.
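
A minimal userspace sketch of the behaviour discussed above, assuming every queued fragment carries a tag that identifies its datagram. The names (`frag_queue`, `fq_enqueue`, `fq_purge`, `dgram_tag`) and the queue limit are hypothetical; this is not qdisc or 6LoWPAN code, it only illustrates purging the siblings of a dropped fragment.

```c
#include <assert.h>
#include <stddef.h>

#define FQ_CAP 8	/* illustrative queue limit, cf. tx_queue_len */

struct frag {
	unsigned int dgram_tag;	/* identifies the datagram of this fragment */
	unsigned int offset;	/* position of the fragment in the datagram */
};

struct frag_queue {
	struct frag slots[FQ_CAP];
	size_t len;
};

/* Remove every queued fragment carrying the given tag; returns the count. */
static size_t fq_purge(struct frag_queue *q, unsigned int tag)
{
	size_t in, out = 0, removed;

	for (in = 0; in < q->len; in++)
		if (q->slots[in].dgram_tag != tag)
			q->slots[out++] = q->slots[in];

	removed = q->len - out;
	q->len = out;
	return removed;
}

/* Enqueue one fragment. On overflow, drop it together with its queued
 * siblings, since the receiver could not reassemble them anyway.
 * Returns 0 on success, -1 if the datagram was dropped.
 */
static int fq_enqueue(struct frag_queue *q, struct frag f)
{
	assert(q->len <= FQ_CAP);

	if (q->len == FQ_CAP) {
		fq_purge(q, f.dgram_tag);
		return -1;
	}

	q->slots[q->len++] = f;
	return 0;
}
```

A caller would also have to stop submitting the remaining fragments of a datagram once `fq_enqueue()` has returned -1 for it; the sketch does not track that.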

^ permalink raw reply

* Re: [PATCH iproute2 WIP] ifstat: use new RTM_GETSTATS api
From: Roopa Prabhu @ 2016-04-20 20:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20160420115347.6a43d7f7@xeon-e3>

On 4/20/16, 11:53 AM, Stephen Hemminger wrote:
> On Wed, 20 Apr 2016 09:16:15 -0700
> Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
>> +int rtnl_wilddump_stats_req_filter(struct rtnl_handle *rth, int family, int type,
>> +				   __u32 filt_mask)
>> +{
>> +	struct {
>> +		struct nlmsghdr nlh;
>> +		struct if_stats_msg ifsm;
>> +	} req;
> Please use C99 initialization instead of memset in new code.

yes, ack.
>
>> +	int err;
>> +
>> +	memset(&req, 0, sizeof(req));
>> +	req.nlh.nlmsg_len = sizeof(req);
>> +	req.nlh.nlmsg_type = type;
>> +	req.nlh.nlmsg_flags = NLM_F_DUMP|NLM_F_REQUEST;
>> +	req.nlh.nlmsg_pid = 0;
>> +	req.nlh.nlmsg_seq = rth->dump = ++rth->seq;
>> +	req.ifsm.family = family;
>> +	req.ifsm.filter_mask = filt_mask;
>> +
>> +	err = send(rth->fd, (void*)&req, sizeof(req), 0);
>> +
>> +	return err;
> Why not just:
>         return send(rth->fd, &req, sizeof(req), 0);

yes, i had that initially. and then changed it to add some debugs before returning.

this is all WIP. will clean it up.

thanks.
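
For reference, a minimal sketch of the C99 initialization the review asks for. The structs below are local stand-ins for struct nlmsghdr and struct if_stats_msg (field names follow the quoted patch, layouts and flag values are illustrative assumptions), so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for struct nlmsghdr and struct if_stats_msg. */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct demo_if_stats_msg {
	uint8_t  family;
	uint32_t filter_mask;
};

struct demo_stats_req {
	struct demo_nlmsghdr nlh;
	struct demo_if_stats_msg ifsm;
};

#define DEMO_F_REQUEST	0x001	/* stand-in for NLM_F_REQUEST */
#define DEMO_F_DUMP	0x300	/* stand-in for NLM_F_DUMP */

/* Build the request with designated initializers. Members that are not
 * named (here nlmsg_pid) are zero-initialized, so no memset() is needed.
 */
static struct demo_stats_req demo_build_req(int family, int type,
					    uint32_t filt_mask, uint32_t seq)
{
	struct demo_stats_req req = {
		.nlh.nlmsg_len = sizeof(req),
		.nlh.nlmsg_type = (uint16_t)type,
		.nlh.nlmsg_flags = DEMO_F_DUMP | DEMO_F_REQUEST,
		.nlh.nlmsg_seq = seq,
		.ifsm.family = (uint8_t)family,
		.ifsm.filter_mask = filt_mask,
	};

	assert(type >= 0 && type <= UINT16_MAX);
	return req;
}
```

One caveat: designated initializers zero the unnamed members, but the C standard does not promise anything about padding bytes, which memset() does clear.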

^ permalink raw reply

* [RFC 0/3] net: dsa: cross-chip operations
From: Vivien Didelot @ 2016-04-20 20:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Jiri Pirko, Vivien Didelot

This patchset aims to start a thread on cross-chip operations in DSA; there is
no need to spend time reviewing the details of the code (especially for
mv88e6xxx).

So when several switch chips are interconnected, we need to configure them all
to ensure correct hardware switching. We can think about this case:

          sw0             sw1             sw2
    [ 0 1 2 3 4 5 ] [ 0 1 2 3 4 5 ] [ 0 1 2 3 4 5 ]
      |   '     ^     ^         ^     ^     '
      v   '     |     |         |     |     '
     CPU  '     `-DSA-'         `-DSA-'     '
          '                                 '
          + - - - - - - - br0 - - - - - - - +

Here sw1 needs to be aware of br0, to configure itself with MAC addresses,
VIDs, or whatever to ensure hardware frame bridging between sw0 and sw2.

Two cross-chip unbridged ports (e.g. sw0p3 and sw1p1) of mv88e6xxx-supported
devices can currently talk to each other, because the chips are configured to
allow frames to ingress from any external ports. This is not what we want, and
this patchset fixes that. The only important part for the thread is 1/3 though.

Some Marvell switches have a cross-chip port-based VLAN table used to allow or
forbid external frames to egress their internal ports. So a new switch-level
operation needs to be added in order to inform the other switches that a port
joined or left a bridge group. This is what dsa_slave_broadcast_bridge() does.

But this is not enough. When a port joins a bridge group, its switch driver
needs to learn the existing cross-chip members, so that ingressing frames from
them can be allowed. This is what dsa_tree_broadcast_bridge() does.

But that is ugly. This adds yet another DSA function, and makes the DSA layer
code quite complex. Also, similar notifications need to be implemented to
configure cross-chip VLANs (for VLAN filtering aware systems where br0 is
implemented with a 802.1Q VLAN), FDB additions/deletions so that frames get
switched correctly by the hardware, etc.

Actually the DSA drivers functions are just switchdev ops with a bit of
syntactic sugar, but no real value added. The purpose of the DSA layer is to
scale the switchdev ops "horizontally" to every tree port. To avoid numerous
operations and keep it simple for drivers, I think we need 2 things:

  1) The scope of DSA switch driver ops should be the DSA tree, not the switch.
  This means having each dsa_switch_driver implement functions such as:

      int (*port_bridge_join)(struct dsa_switch *ds, int sw_index, int sw_port,
           struct net_device *bridge);

  instead of the current:

      int (*port_bridge_join)(struct dsa_switch *ds, int port,
           struct net_device *bridge);

  So that drivers can configure their in-chip or cross-chip settings, and return
  0 or -EOPNOTSUPP if ds->index != sw_index. This would replace
  dsa_slave_broadcast_bridge.

  2) To replace dsa_tree_broadcast_bridge, drivers need to access public info
  in the tree, such as bridge membership of every port. That can be achieved
  with a bit of refactoring like the following:

      /* include/net/dsa.h */
      struct dsa_port {
          struct list_head    list;
          struct dsa_switch   *ds;
          int                 port;
          struct net_device   *bridge_dev;
      };

      struct dsa_switch_tree {
          ...
          struct list_head ports;
      };

      /* net/dsa/dsa_priv.h */
      struct dsa_slave_priv {
          ...
          struct dsa_port dp;
      };

  Then DSA switch drivers can implement tree-level ops such as:

      int (*port_bridge_join)(struct dsa_switch *ds, struct dsa_port *dp,
           struct net_device *bridge);

I'm working on an RFC for the above. Let me know what you think and if this
seems correct to you.
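
A minimal sketch of what a tree-scoped port_bridge_join from point 1) could look like on the driver side. The types (`demo_switch`, `demo_bridge`) and the `has_cross_chip_table` flag are hypothetical stand-ins, not the DSA structures; the sketch only illustrates the in-chip versus cross-chip branch and the -EOPNOTSUPP return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_NUM_PORTS 6

struct demo_bridge { int id; };	/* stand-in for struct net_device */

struct demo_switch {
	int index;		/* position of this chip in the tree */
	int has_cross_chip_table;	/* e.g. a PVT-like table is present */
	int cross_chip_members;	/* remote bridge ports seen so far */
	const struct demo_bridge *port_bridge[DEMO_NUM_PORTS];
};

/* Called on every switch of the tree for one joining port, identified by
 * (sw_index, sw_port). Each chip decides whether the event is local.
 */
static int demo_port_bridge_join(struct demo_switch *ds, int sw_index,
				 int sw_port, const struct demo_bridge *bridge)
{
	assert(sw_port >= 0 && sw_port < DEMO_NUM_PORTS);

	if (ds->index == sw_index) {
		/* In-chip: record the membership of the local port */
		ds->port_bridge[sw_port] = bridge;
		return 0;
	}

	/* Cross-chip: only chips with a suitable table can act on it */
	if (!ds->has_cross_chip_table)
		return -EOPNOTSUPP;

	ds->cross_chip_members++;
	return 0;
}
```

With this shape, the DSA layer would loop over the switches of the tree and call the same op on each, treating -EOPNOTSUPP as "nothing to do on this chip".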

Cheers,

Vivien Didelot (3):
  net: dsa: add cross-chip notification for bridge
  net: dsa: mv88e6xxx: initialize PVT
  net: dsa: mv88e6xxx: setup PVT

 drivers/net/dsa/mv88e6352.c |   1 +
 drivers/net/dsa/mv88e6xxx.c | 181 ++++++++++++++++++++++++++++++++++++++++++--
 drivers/net/dsa/mv88e6xxx.h |   7 ++
 include/net/dsa.h           |   6 ++
 net/dsa/slave.c             |  60 ++++++++++++++-
 5 files changed, 246 insertions(+), 9 deletions(-)

-- 
2.8.0

^ permalink raw reply

* [RFC 1/3] net: dsa: add cross-chip notification for bridge
From: Vivien Didelot @ 2016-04-20 20:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Jiri Pirko, Vivien Didelot
In-Reply-To: <1461183969-24610-1-git-send-email-vivien.didelot@savoirfairelinux.com>

When multiple switch chips are chained together, one needs to know about
the bridge membership of others. For instance, switches like Marvell
6352 have cross-chip port-based VLAN table to allow or forbid cross-chip
frames to egress.

Add a cross_chip_bridge DSA driver function, used to notify a switch
about bridge membership configured in other chips.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/net/dsa.h |  6 ++++++
 net/dsa/slave.c   | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index c4bc42b..1994fa7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -340,6 +340,12 @@ struct dsa_switch_driver {
 	int	(*port_fdb_dump)(struct dsa_switch *ds, int port,
 				 struct switchdev_obj_port_fdb *fdb,
 				 int (*cb)(struct switchdev_obj *obj));
+
+	/*
+	 * Cross-chip notifications
+	 */
+	void	(*cross_chip_bridge)(struct dsa_switch *ds, int sw_index,
+				     int sw_port, struct net_device *bridge);
 };
 
 void register_switch_driver(struct dsa_switch_driver *type);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3b6750f..bd8f4e2 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -431,19 +431,68 @@ static int dsa_slave_port_obj_dump(struct net_device *dev,
 	return err;
 }
 
+static void dsa_slave_broadcast_bridge(struct net_device *dev)
+{
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	int chip;
+
+	for (chip = 0; chip < ds->dst->pd->nr_chips; ++chip) {
+		struct dsa_switch *sw = ds->dst->ds[chip];
+
+		if (sw->index == ds->index)
+			continue;
+
+		if (sw->drv->cross_chip_bridge)
+			sw->drv->cross_chip_bridge(sw, ds->index, p->port,
+						   p->bridge_dev);
+	}
+}
+
+static void dsa_tree_broadcast_bridge(struct dsa_switch_tree *dst,
+				      struct net_device *bridge)
+{
+	struct net_device *dev;
+	struct dsa_slave_priv *p;
+	struct dsa_switch *ds;
+	int chip, port;
+
+	for (chip = 0; chip < dst->pd->nr_chips; ++chip) {
+		ds = dst->ds[chip];
+
+		for (port = 0; port < DSA_MAX_PORTS; ++port) {
+			if (!ds->ports[port])
+				continue;
+
+			dev = ds->ports[port];
+			p = netdev_priv(dev);
+
+			if (p->bridge_dev == bridge)
+				dsa_slave_broadcast_bridge(dev);
+		}
+	}
+}
+
 static int dsa_slave_bridge_port_join(struct net_device *dev,
 				      struct net_device *br)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = p->parent;
-	int ret = -EOPNOTSUPP;
+	int err;
 
 	p->bridge_dev = br;
 
-	if (ds->drv->port_bridge_join)
-		ret = ds->drv->port_bridge_join(ds, p->port, br);
+	/* In-chip hardware bridging */
+	if (ds->drv->port_bridge_join) {
+		err = ds->drv->port_bridge_join(ds, p->port, br);
+		if (err && err != -EOPNOTSUPP)
+			return err;
+	}
+
+	/* Broadcast bridge membership across chips */
+	dsa_tree_broadcast_bridge(ds->dst, br);
 
-	return ret == -EOPNOTSUPP ? 0 : ret;
+	return 0;
 }
 
 static void dsa_slave_bridge_port_leave(struct net_device *dev)
@@ -462,6 +511,9 @@ static void dsa_slave_bridge_port_leave(struct net_device *dev)
 	 */
 	if (ds->drv->port_stp_state_set)
 		ds->drv->port_stp_state_set(ds, p->port, BR_STATE_FORWARDING);
+
+	/* Notify the port leaving to other chips */
+	dsa_slave_broadcast_bridge(dev);
 }
 
 static int dsa_slave_port_attr_get(struct net_device *dev,
-- 
2.8.0

^ permalink raw reply related

* [RFC 2/3] net: dsa: mv88e6xxx: initialize PVT
From: Vivien Didelot @ 2016-04-20 20:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Jiri Pirko, Vivien Didelot
In-Reply-To: <1461183969-24610-1-git-send-email-vivien.didelot@savoirfairelinux.com>

Expand the Cross-chip Port Based VLAN Table initialization code, and make
sure the "5 Bit Port" bit is cleared.

This commit doesn't make any functional change to the current code.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
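A standalone sketch of how the PVT address register value is composed, mirroring the bit layout used by _mv88e6xxx_pvt_cmd() in this patch. The DEMO_ names are local copies made for illustration; the helper is not part of the driver. It shows that the "init ones" operation for device 0, port 0 yields the same 0x9000 that the old code wrote directly.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PVT_BUSY		(1u << 15)
#define DEMO_PVT_OP_INIT_ONES	((0x01u << 12) | DEMO_PVT_BUSY)
#define DEMO_PVT_OP_WRITE_PVLAN	((0x03u << 12) | DEMO_PVT_BUSY)
#define DEMO_PVT_OP_READ	((0x04u << 12) | DEMO_PVT_BUSY)

/* Compose the register value: operation bits on top, then a 9-bit pointer
 * made of a 5-bit source device and a 4-bit source port.
 */
static uint16_t demo_pvt_addr(uint16_t op, int src_dev, int src_port)
{
	assert((op & 0x01ff) == 0);	/* op must not overlap the pointer */

	return (uint16_t)(op | ((src_dev & 0x1f) << 4) | (src_port & 0xf));
}
```

For example, demo_pvt_addr(DEMO_PVT_OP_READ, 1, 3) selects the entry for port 3 of switch 1.
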
 drivers/net/dsa/mv88e6xxx.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
 drivers/net/dsa/mv88e6xxx.h |  5 +++++
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 1dd525d..e35bc9f 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2203,6 +2203,47 @@ unlock:
 	return err;
 }
 
+static int _mv88e6xxx_pvt_wait(struct dsa_switch *ds)
+{
+	return _mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_PVT_ADDR,
+			       GLOBAL2_PVT_ADDR_BUSY);
+}
+
+static int _mv88e6xxx_pvt_cmd(struct dsa_switch *ds, int src_dev, int src_port,
+			      u16 op)
+{
+	u16 reg = op;
+	int err;
+
+	/* 9-bit Cross-chip PVT pointer: with GLOBAL2_MISC_5_BIT_PORT cleared,
+	 * source device is 5-bit, source port is 4-bit.
+	 */
+	reg |= (src_dev & 0x1f) << 4;
+	reg |= (src_port & 0xf);
+
+	err = _mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_PVT_ADDR, reg);
+	if (err)
+		return err;
+
+	return _mv88e6xxx_pvt_wait(ds);
+}
+
+static int _mv88e6xxx_pvt_init(struct dsa_switch *ds)
+{
+	int err;
+
+	/* Clear 5 Bit Port for usage with Marvell Link Street devices:
+	 * use 4 bits for the Src_Port/Src_Trunk and 5 bits for the Src_Dev.
+	 */
+	err = _mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_MISC,
+				   0 & ~GLOBAL2_MISC_5_BIT_PORT);
+	if (err)
+		return err;
+
+	/* Allow any external frame to egress any internal port */
+	return _mv88e6xxx_pvt_cmd(ds, 0, 0, GLOBAL2_PVT_ADDR_OP_INIT_ONES);
+}
+
 int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 			       struct net_device *bridge)
 {
@@ -2747,11 +2788,8 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 		if (err)
 			goto unlock;
 
-		/* Initialise cross-chip port VLAN table to reset
-		 * defaults.
-		 */
-		err = _mv88e6xxx_reg_write(ds, REG_GLOBAL2,
-					   GLOBAL2_PVT_ADDR, 0x9000);
+		/* Initialize Cross-chip Port VLAN Table (PVT) */
+		err = _mv88e6xxx_pvt_init(ds);
 		if (err)
 			goto unlock;
 
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 0dbe2d1..dd63377 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -298,6 +298,10 @@
 #define GLOBAL2_INGRESS_OP	0x09
 #define GLOBAL2_INGRESS_DATA	0x0a
 #define GLOBAL2_PVT_ADDR	0x0b
+#define GLOBAL2_PVT_ADDR_BUSY	BIT(15)
+#define GLOBAL2_PVT_ADDR_OP_INIT_ONES	((0x01 << 12) | GLOBAL2_PVT_ADDR_BUSY)
+#define GLOBAL2_PVT_ADDR_OP_WRITE_PVLAN	((0x03 << 12) | GLOBAL2_PVT_ADDR_BUSY)
+#define GLOBAL2_PVT_ADDR_OP_READ	((0x04 << 12) | GLOBAL2_PVT_ADDR_BUSY)
 #define GLOBAL2_PVT_DATA	0x0c
 #define GLOBAL2_SWITCH_MAC	0x0d
 #define GLOBAL2_SWITCH_MAC_BUSY BIT(15)
@@ -335,6 +339,7 @@
 #define GLOBAL2_WDOG_CONTROL	0x1b
 #define GLOBAL2_QOS_WEIGHT	0x1c
 #define GLOBAL2_MISC		0x1d
+#define GLOBAL2_MISC_5_BIT_PORT	BIT(14)
 
 #define MV88E6XXX_N_FID		4096
 
-- 
2.8.0

^ permalink raw reply related

* [RFC 3/3] net: dsa: mv88e6xxx: setup PVT
From: Vivien Didelot @ 2016-04-20 20:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Jiri Pirko, Vivien Didelot
In-Reply-To: <1461183969-24610-1-git-send-email-vivien.didelot@savoirfairelinux.com>

Instead of allowing any external frame to egress any internal port,
configure the Cross-chip Port VLAN Table (PVT) to forbid that.

When an external source port joins or leaves a bridge crossing this
switch, mask it in the PVT to allow or forbid frames to egress.

Add support for the cross-chip bridge notification to the 6352 family.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
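A standalone sketch of the egress mask computed by _mv88e6xxx_pvt_map() below, assuming a simplified port model. `demo_port`, `demo_port_kind` and `demo_pvt_mask` are hypothetical names; the real code queries dsa_is_cpu_port(), dsa_is_dsa_port() and the per-port bridge_dev instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_port_kind { DEMO_PORT_USER, DEMO_PORT_CPU, DEMO_PORT_DSA };

struct demo_port {
	enum demo_port_kind kind;
	const void *bridge;	/* bridge this port is a member of, or NULL */
};

/* Build the bitmask of local ports that frames from one external source
 * port may egress. A NULL bridge leaves only CPU and DSA ports set.
 */
static uint16_t demo_pvt_mask(const struct demo_port *ports, int num_ports,
			      const void *bridge)
{
	uint16_t pvlan = 0;
	int port;

	assert(num_ports >= 0 && num_ports <= 16);

	for (port = 0; port < num_ports; ++port) {
		/* Frames from external ports can egress DSA and CPU ports */
		if (ports[port].kind != DEMO_PORT_USER)
			pvlan |= (uint16_t)(1u << port);

		/* Frames can egress bridge group members */
		if (bridge && ports[port].bridge == bridge)
			pvlan |= (uint16_t)(1u << port);
	}

	return pvlan;
}
```

With a CPU port 0, a DSA port 5 and ports 1 and 2 in the bridge, the mask is 0x27 for that bridge and 0x21 when no bridge is given.
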
 drivers/net/dsa/mv88e6352.c |   1 +
 drivers/net/dsa/mv88e6xxx.c | 137 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/dsa/mv88e6xxx.h |   2 +
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 4afc24d..03ab309 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -364,6 +364,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.port_fdb_add		= mv88e6xxx_port_fdb_add,
 	.port_fdb_del		= mv88e6xxx_port_fdb_del,
 	.port_fdb_dump		= mv88e6xxx_port_fdb_dump,
+	.cross_chip_bridge	= mv88e6xxx_cross_chip_bridge,
 };
 
 MODULE_ALIAS("platform:mv88e6172");
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index e35bc9f..dccefdb 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -481,6 +481,14 @@ static bool mv88e6xxx_has_stu(struct dsa_switch *ds)
 	return false;
 }
 
+static bool mv88e6xxx_has_pvt(struct dsa_switch *ds)
+{
+	if (mv88e6xxx_6185_family(ds))
+		return false;
+
+	return true;
+}
+
 /* We expect the switch to perform auto negotiation if there is a real
  * phy. However, in the case of a fixed link phy, we force the port
  * settings from the fixed link settings.
@@ -2228,8 +2236,69 @@ static int _mv88e6xxx_pvt_cmd(struct dsa_switch *ds, int src_dev, int src_port,
 	return _mv88e6xxx_pvt_wait(ds);
 }
 
+static int _mv88e6xxx_pvt_read(struct dsa_switch *ds, int src_dev, int src_port,
+			       u16 *data)
+{
+	int ret;
+
+	ret = _mv88e6xxx_pvt_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_pvt_cmd(ds, src_dev, src_port,
+				GLOBAL2_PVT_ADDR_OP_READ);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL2, GLOBAL2_PVT_DATA);
+	if (ret < 0)
+		return ret;
+
+	*data = ret;
+
+	return 0;
+}
+
+static int _mv88e6xxx_pvt_write(struct dsa_switch *ds, int src_dev,
+				int src_port, u16 data)
+{
+	int err;
+
+	err = _mv88e6xxx_pvt_wait(ds);
+	if (err)
+		return err;
+
+	err = _mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_PVT_DATA, data);
+	if (err)
+		return err;
+
+	return _mv88e6xxx_pvt_cmd(ds, src_dev, src_port,
+				GLOBAL2_PVT_ADDR_OP_WRITE_PVLAN);
+}
+
+static int _mv88e6xxx_pvt_map(struct dsa_switch *ds, int src_dev, int src_port,
+			      struct net_device *bridge)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	u16 pvlan = 0;
+	int port;
+
+	for (port = 0; port < ps->info->num_ports; ++port) {
+		/* Frames from external ports can egress DSA and CPU ports */
+		if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+			pvlan |= BIT(port);
+
+		/* Frames can egress bridge group members */
+		if (bridge && ps->ports[port].bridge_dev == bridge)
+			pvlan |= BIT(port);
+	}
+
+	return _mv88e6xxx_pvt_write(ds, src_dev, src_port, pvlan);
+}
+
 static int _mv88e6xxx_pvt_init(struct dsa_switch *ds)
 {
+	int src_dev, src_port;
 	int err;
 
 	/* Clear 5 Bit Port for usage with Marvell Link Street devices:
@@ -2240,8 +2309,21 @@ static int _mv88e6xxx_pvt_init(struct dsa_switch *ds)
 	if (err)
 		return err;
 
-	/* Allow any external frame to egress any internal port */
-	return _mv88e6xxx_pvt_cmd(ds, 0, 0, GLOBAL2_PVT_ADDR_OP_INIT_ONES);
+	/* Forbid every port of potential neighbor switches to egress frames on
+	 * the normal ports of this switch.
+	 */
+	for (src_dev = 0; src_dev < 32; ++src_dev) {
+		if (src_dev == ds->index)
+			continue;
+
+		for (src_port = 0; src_port < 16; ++src_port) {
+			err = _mv88e6xxx_pvt_map(ds, src_dev, src_port, NULL);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
 }
 
 int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
@@ -2286,6 +2368,35 @@ unlock:
 	return err;
 }
 
+static int _mv88e6xxx_pvt_unmap_local(struct dsa_switch *ds, int port)
+{
+	u16 pvlan;
+	int src_dev, src_port, err;
+
+	for (src_dev = 0; src_dev < 32; ++src_dev) {
+		if (src_dev == ds->index)
+			continue;
+
+		for (src_port = 0; src_port < 16; ++src_port) {
+			err = _mv88e6xxx_pvt_read(ds, src_dev, src_port,
+						  &pvlan);
+			if (err)
+				return err;
+
+			/* Forbid external normal frames to egress this port */
+			if (pvlan & BIT(port)) {
+				err = _mv88e6xxx_pvt_write(ds, src_dev,
+							   src_port,
+							   pvlan & ~BIT(port));
+				if (err)
+					return err;
+			}
+		}
+	}
+
+	return 0;
+}
+
 void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2308,6 +2419,28 @@ void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 			if (_mv88e6xxx_port_based_vlan_map(ds, i))
 				netdev_warn(ds->ports[i], "failed to remap\n");
 
+	if (mv88e6xxx_has_pvt(ds) && _mv88e6xxx_pvt_unmap_local(ds, port))
+		netdev_err(ds->ports[port], "failed to unmap\n");
+
+	mutex_unlock(&ps->smi_mutex);
+}
+
+void mv88e6xxx_cross_chip_bridge(struct dsa_switch *ds, int sw_index,
+				 int sw_port, struct net_device *bridge)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+
+	if (!mv88e6xxx_has_pvt(ds))
+		return;
+
+	/* Update the Cross-chip Port VLAN Table (PVT) entry for this external
+	 * source port to map which internal ports frames are allowed to egress.
+	 */
+
+	mutex_lock(&ps->smi_mutex);
+	if (_mv88e6xxx_pvt_map(ds, sw_index, sw_port, bridge))
+		dev_err(ds->master_dev, "failed to access PVT for sw%dp%d\n",
+			sw_index, sw_port);
 	mutex_unlock(&ps->smi_mutex);
 }
 
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index dd63377..ea214f2 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -523,6 +523,8 @@ int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
 int mv88e6xxx_port_fdb_dump(struct dsa_switch *ds, int port,
 			    struct switchdev_obj_port_fdb *fdb,
 			    int (*cb)(struct switchdev_obj *obj));
+void mv88e6xxx_cross_chip_bridge(struct dsa_switch *ds, int sw_index,
+				 int sw_port, struct net_device *bridge);
 int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page, int reg);
 int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
 			     int reg, int val);
-- 
2.8.0

^ permalink raw reply related

* Re: [PATCH net-next v6] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Roopa Prabhu @ 2016-04-20 20:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jhs, tgraf, nicolas.dichtel, nikolay
In-Reply-To: <20160420.160849.1254968687102415877.davem@davemloft.net>

On 4/20/16, 1:08 PM, David Miller wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> Date: Wed, 20 Apr 2016 08:43:43 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via netlink
>> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
>> returns a lot more than just stats and is expensive in some cases when
>> frequent polling for stats from userspace is a common operation.
> With nla_align_64bit() now working properly, I've applied this and it works
> on sparc64 too.
>
> Thanks!
Thank you.

^ permalink raw reply

* Re: drop all fragments inside tx queue if one gets dropped
From: Rick Jones @ 2016-04-20 20:45 UTC (permalink / raw)
  To: Michael Richardson, netdev, linux-wpan; +Cc: Alexander Aring
In-Reply-To: <11312.1461183334@obiwan.sandelman.ca>

For the "everything old is new again" files: back in the 1990s, it was
noticed that on the likes of a netperf UDP_STREAM test on HP-UX, with
fragmentation taking place, it was possible to consume 100% of the link
bandwidth and have 0% effective throughput. The transmit queue was kept
full of IP datagram fragments which could not possibly be reassembled (*),
because one or more of the fragments of a datagram had been dropped when
the transmit queue was full.

HP-UX implemented "packet trains" where all the fragments of a 
fragmented datagram were presented to the driver, which then either 
queued them all, or none of them.

I don't recall seeing similar poor behaviour in Linux; I would have 
assumed that the intra-stack flow-control "took care" of it.  Perhaps 
there is something specific to wpan which precludes that?
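
A minimal sketch of the all-or-nothing admission described above, assuming a queue that only tracks its fill level. `txq` and `txq_admit_train` are hypothetical names; this is not the HP-UX implementation, just the "queue them all, or none of them" rule in isolation.

```c
#include <assert.h>
#include <stddef.h>

struct txq {
	size_t cap;	/* maximum number of queued frames */
	size_t len;	/* frames currently queued */
};

/* Admit all nfrags fragments of one datagram, or none of them.
 * Returns the number of fragments queued (nfrags or 0).
 */
static size_t txq_admit_train(struct txq *q, size_t nfrags)
{
	assert(q->len <= q->cap);

	if (nfrags > q->cap - q->len)
		return 0;	/* not enough room: reject the whole datagram */

	q->len += nfrags;
	return nfrags;
}
```

Compared with dropping single fragments, this never spends link time on fragments whose siblings are already lost.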

happy benchmarking,

rick jones

^ permalink raw reply

* [net-next resubmit PATCH v2 0/3] Feature tweaks/fixes follow-up to GSO partial patches
From: Alexander Duyck @ 2016-04-20 20:50 UTC (permalink / raw)
  To: netdev, davem, alexander.duyck

This patch series is a set of minor fix-ups and tweaks following the GSO
partial and TSO with IPv4 ID mangling patches.  It mostly is just meant to
make certain that if we have GSO partial support at the device we can make
use of it from the far end of the tunnel.

I submitted this earlier today but it was set as RFC in patchwork.  This is
a submission for net-next and not an RFC so I am resubmitting.

v2: Added cover page which was forgotten with first submission.
    Added patch that enables TSOv4 IP ID mangling w/ tunnels and/or VLANs.

---

Alexander Duyck (3):
      netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
      veth: Update features to include all tunnel GSO types
      net: Add support for IP ID mangling TSO in cases that require encapsulation


 drivers/net/veth.c              |    7 +++----
 include/linux/netdev_features.h |    8 +++-----
 net/core/dev.c                  |   11 +++++++++++
 3 files changed, 17 insertions(+), 9 deletions(-)

^ permalink raw reply

* [net-next resubmit PATCH v2 1/3] netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
From: Alexander Duyck @ 2016-04-20 20:50 UTC (permalink / raw)
  To: netdev, davem, alexander.duyck
In-Reply-To: <20160420204900.4029.42938.stgit@ahduyck-xeon-server>

This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
The idea is to avoid duplication of defines since the only difference
between the two was the GSO_UDP bit.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
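A small standalone check of the folding, under two assumptions: the bit positions are made up, and the ALL_TSO mask is taken to consist of the four TSO bits, as the commit message implies. The DEMO_ names are not the kernel defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_F_TSO		(UINT64_C(1) << 0)
#define DEMO_F_TSO6		(UINT64_C(1) << 1)
#define DEMO_F_TSO_ECN		(UINT64_C(1) << 2)
#define DEMO_F_TSO_MANGLEID	(UINT64_C(1) << 3)
#define DEMO_F_UFO		(UINT64_C(1) << 4)

/* Assumed composition of the ALL_TSO mask. */
#define DEMO_F_ALL_TSO	(DEMO_F_TSO | DEMO_F_TSO6 | DEMO_F_TSO_ECN | \
			 DEMO_F_TSO_MANGLEID)

/* Old and new spelling of the software GSO mask. */
#define DEMO_F_GSO_SOFTWARE_OLD	(DEMO_F_TSO | DEMO_F_TSO_ECN | \
				 DEMO_F_TSO_MANGLEID | \
				 DEMO_F_TSO6 | DEMO_F_UFO)
#define DEMO_F_GSO_SOFTWARE_NEW	(DEMO_F_ALL_TSO | DEMO_F_UFO)

/* Returns the feature bits present in exactly one of the two masks. */
static uint64_t demo_mask_diff(uint64_t a, uint64_t b)
{
	return a ^ b;
}

/* Aborts if the fold changed the resulting mask. */
static void demo_check_fold(void)
{
	assert(demo_mask_diff(DEMO_F_GSO_SOFTWARE_OLD,
			      DEMO_F_GSO_SOFTWARE_NEW) == 0);
}
```
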
 include/linux/netdev_features.h |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 15eb0b12fff9..bc8736266749 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -152,11 +152,6 @@ enum {
 #define NETIF_F_GSO_MASK	(__NETIF_F_BIT(NETIF_F_GSO_LAST + 1) - \
 		__NETIF_F_BIT(NETIF_F_GSO_SHIFT))
 
-/* List of features with software fallbacks. */
-#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
-				 NETIF_F_TSO_MANGLEID | \
-				 NETIF_F_TSO6 | NETIF_F_UFO)
-
 /* List of IP checksum features. Note that NETIF_F_ HW_CSUM should not be
  * set in features when NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM are set--
  * this would be contradictory
@@ -170,6 +165,9 @@ enum {
 #define NETIF_F_ALL_FCOE	(NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
 				 NETIF_F_FSO)
 
+/* List of features with software fallbacks. */
+#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | NETIF_F_UFO)
+
 /*
  * If one device supports one of these features, then enable them
  * for all in netdev_increment_features.

^ permalink raw reply related

* [net-next resubmit PATCH v2 2/3] veth: Update features to include all tunnel GSO types
From: Alexander Duyck @ 2016-04-20 20:50 UTC (permalink / raw)
  To: netdev, davem, alexander.duyck
In-Reply-To: <20160420204900.4029.42938.stgit@ahduyck-xeon-server>

This patch adds support for the checksum enabled versions of UDP and GRE
tunnels.  With this change we should be able to send and receive GSO frames
of these types over the veth pair without needing to segment the packets.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/veth.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 4f30a6ae50d0..f37a6e61d4ad 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,10 +312,9 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_set_rx_headroom	= veth_set_rx_headroom,
 };
 
-#define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |    \
-		       NETIF_F_HW_CSUM | NETIF_F_RXCSUM | NETIF_F_HIGHDMA | \
-		       NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |	    \
-		       NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT | NETIF_F_UFO	|   \
+#define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
+		       NETIF_F_RXCSUM | NETIF_F_HIGHDMA | \
+		       NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ENCAP_ALL | \
 		       NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | \
 		       NETIF_F_HW_VLAN_STAG_TX | NETIF_F_HW_VLAN_STAG_RX )
 

^ permalink raw reply related

* [net-next resubmit PATCH v2 3/3] net: Add support for IP ID mangling TSO in cases that require encapsulation
From: Alexander Duyck @ 2016-04-20 20:51 UTC (permalink / raw)
  To: netdev, davem, alexander.duyck
In-Reply-To: <20160420204900.4029.42938.stgit@ahduyck-xeon-server>

This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports
NETIF_F_TSO.  This way if needed a device can then later enable the TSO
with IP ID mangling and the tunnels on top of that device can then also
make use of the IP ID mangling as well.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
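A standalone sketch of the rule this patch applies in register_netdevice(): every feature set that advertises TSO also gets the MANGLEID bit. `demo_netdev`, the DEMO_ bit values and `demo_allow_mangleid` are hypothetical and only model the four masks touched below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_F_TSO		(UINT64_C(1) << 0)	/* hypothetical bits */
#define DEMO_F_TSO_MANGLEID	(UINT64_C(1) << 1)

struct demo_netdev {
	uint64_t hw_features;
	uint64_t vlan_features;
	uint64_t mpls_features;
	uint64_t hw_enc_features;
};

/* For each capability mask with TSO set, also allow TSO with IP ID
 * mangling. Masks without TSO are left untouched.
 */
static void demo_allow_mangleid(struct demo_netdev *dev)
{
	uint64_t *sets[] = {
		&dev->hw_features,
		&dev->vlan_features,
		&dev->mpls_features,
		&dev->hw_enc_features,
	};
	size_t i;

	assert(dev != NULL);

	for (i = 0; i < sizeof(sets) / sizeof(sets[0]); i++)
		if (*sets[i] & DEMO_F_TSO)
			*sets[i] |= DEMO_F_TSO_MANGLEID;
}
```

As the comment in the patch notes, these masks describe what may be enabled; the sketch likewise does not touch an "active features" field.
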
 net/core/dev.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 52d446b2cb99..6324bc9267f7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7029,8 +7029,19 @@ int register_netdevice(struct net_device *dev)
 	if (!(dev->flags & IFF_LOOPBACK))
 		dev->hw_features |= NETIF_F_NOCACHE_COPY;
 
+	/* If IPv4 TCP segmentation offload is supported we should also
+	 * allow the device to enable segmenting the frame with the option
+	 * of ignoring a static IP ID value.  This doesn't enable the
+	 * feature itself but allows the user to enable it later.
+	 */
 	if (dev->hw_features & NETIF_F_TSO)
 		dev->hw_features |= NETIF_F_TSO_MANGLEID;
+	if (dev->vlan_features & NETIF_F_TSO)
+		dev->vlan_features |= NETIF_F_TSO_MANGLEID;
+	if (dev->mpls_features & NETIF_F_TSO)
+		dev->mpls_features |= NETIF_F_TSO_MANGLEID;
+	if (dev->hw_enc_features & NETIF_F_TSO)
+		dev->hw_enc_features |= NETIF_F_TSO_MANGLEID;
 
 	/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.
 	 */

^ permalink raw reply related

* Re: [PATCH net-next v6] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Elad Raz @ 2016-04-20 20:57 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com, davem@davemloft.net,
	tgraf@suug.ch, nicolas.dichtel@6wind.com,
	nikolay@cumulusnetworks.com
In-Reply-To: <1461167023-7640-1-git-send-email-roopa@cumulusnetworks.com>


> On 20 Apr 2016, at 6:43 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> 
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> This patch adds a new RTM_GETSTATS message to query link stats via netlink
> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
> returns a lot more than just stats and is expensive in some cases when
> frequent polling for stats from userspace is a common operation.
> 
> RTM_GETSTATS is an attempt to provide a lightweight netlink message
> to explicitly query only link stats from the kernel on an interface.
> The idea is to also keep it extensible so that new kinds of stats can be
> added to it in the future.
> 
> This patch adds the following attribute for NETDEV stats:
> struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
>        [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
> };
> 
> Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
> a single interface or all interfaces with NLM_F_DUMP.
> 
> Future possible new types of stat attributes:
> link af stats:
>    - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
>    - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
> extended stats:
>    - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
>      vlan, vxlan etc)
>    - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
>      available via ethtool today)

I think it would be better to have an IFLA_STATS_LINK_CPU_ONLY attribute. The default stat should be the aggregate of HW-only packets and packets that got trapped to the CPU.

> 
> This patch also declares a filter mask for all stat attributes.
> User has to provide a mask of stats attributes to query. filter mask
> can be specified in the new hdr 'struct if_stats_msg' for stats messages.
> Other important field in the header is the ifindex.
> 
> This api can also include attributes for global stats (eg tcp) in the future.
> When global stats are included in a stats msg, the ifindex in the header
> must be zero. A single stats message cannot contain both global and
> netdev specific stats. To easily distinguish them, netdev specific stat
> attributes name are prefixed with IFLA_STATS_LINK_
> 
> Without any attributes in the filter_mask, no stats will be returned.
> 
> This patch has been tested with modified iproute2 ifstat.
> 
> Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

Nice work! Thank you Roopa!
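
The filter-mask behaviour described in the quoted patch text can be
sketched in userspace C. This is only an illustration under stated
assumptions: the attribute numbering, the "bit = attr - 1" mapping, the
two-field header layout and the stats_* helper names are all made up
here, and IFLA_STATS_LINK_CPU_ONLY is the attribute proposed in this
reply, not something the patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numbering; the patch itself only adds IFLA_STATS_LINK_64. */
enum {
	IFLA_STATS_UNSPEC,
	IFLA_STATS_LINK_64,
	IFLA_STATS_LINK_CPU_ONLY,	/* proposed in this reply, not in the patch */
	__IFLA_STATS_MAX,
};
#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)

/* Assumed layout: the patch text only names ifindex and filter_mask. */
struct if_stats_msg_sketch {
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Assumed mapping from attribute number to filter bit. */
static uint32_t stats_filter_bit(int attr)
{
	assert(attr > IFLA_STATS_UNSPEC && attr <= IFLA_STATS_MAX);
	return UINT32_C(1) << (attr - 1);
}

/* Returns 1 if the reply should carry this attribute, 0 otherwise. */
static int stats_attr_wanted(const struct if_stats_msg_sketch *hdr, int attr)
{
	return (hdr->filter_mask & stats_filter_bit(attr)) != 0;
}
```

With an empty filter_mask every lookup returns 0, which matches the
statement that no stats are returned without any attributes in the mask.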


^ permalink raw reply

* [RESEND] Re: updating carl9170-1.fw in linux-firmware.git
From: Christian Lamparter @ 2016-04-20 21:11 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Lauri Kasanen, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Linux Firmware Maintainers,
	Xose Vazquez Perez
In-Reply-To: <87a8koiubz.fsf-HodKDYzPHsUD5k0oWYwrnHL1okKdlPRT@public.gmane.org>

On Wednesday, April 20, 2016 10:59:44 AM Kalle Valo wrote:
> Christian Lamparter <chunkeey-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> writes:
> 
> > On Monday, April 18, 2016 07:42:05 PM Kalle Valo wrote:
> >> Christian Lamparter <chunkeey-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> writes:
> >> 
> >> > On Monday, April 18, 2016 06:45:09 PM Kalle Valo wrote:
> >> >
> >> >> Why even mention anything about a "special firmware" as the firmware is
> >> >> already available from linux-firmware.git? 
> >> >
> >> > Yes and no. 1.9.6 is in linux-firmware.git. I've tried to add 1.9.9 too
> >> > but that failed.
> >> > <http://comments.gmane.org/gmane.linux.kernel.wireless.general/114639>
> >> 
> >> Rick's comment makes sense to me, better just to provide the latest
> >> version. No need to unnecessarily confuse the users. And if someone really
> >> wants to use an older version then she can retrieve it from the git
> >> history.
> >
> > Part of the fun here is that firmware is GPLv2. The linux-firmware.git has
> > to point to or add the firmware source to their tree. They have added every
> > single source file to it.... instead of "packaging" it in a tar.bz2/gz/xz
> > like you normally do for release sources.
> >
> > If you want to read more about it:
> > <http://www.spinics.net/lists/linux-wireless/msg101868.html>
> 
> Yeah, that's more work. I get that. But I'm still not understanding
> what's the actual problem which prevents us from updating carl9170
> firmware in linux-firmware.
I'm not sure, but why not ask? I've added the cc'ed Linux Firmware
Maintainers. So for those people reading the fw list:

What would it take to update the carl9170-1.fw firmware file in your
repository to the latest version?

Who has to send the firmware update? Does it have to be the person who
sent the first request (Xose)? The maintainer of the firmware (me)?
Someone from Qualcomm Atheros? Or someone else (specific)? (The
firmware is licensed as GPLv2, so in theory anyone should be able to
do that.)

How should the firmware source update be handled? Currently the latest
.tar.xz of the firmware is ~130kb. The formatted patches from 1.9.6 to
latest are about 100kb (182 individual patches).

How does linux-firmware handle new binary firmware images and new 
sources? What if carl9170fw-2.bin is added. Do we need another
source directory for this in the current tree then? Because 
carl9170fw-1.bin will still be needed for backwards compatibility
so we basically need to duplicate parts of the source?

Also, how's the situation with ath9k_htc? The 1.4.0 image contains
some GPLv2 code as well? So, why is there no source in the tree, but 
just the link to it? Because, I would like to do basically the same
for carl9170fw and just add a link to the carl9170fw repository and
save everyone this source update "song and dance".

Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 02/19] io-mapping: Specify mapping size for io_mapping_map_wc()
From: Luis R. Rodriguez @ 2016-04-20 21:23 UTC (permalink / raw)
  To: Chris Wilson, Luis R. Rodriguez, intel-gfx, Tvrtko Ursulin,
	Mika Kuoppala, Joonas Lahtinen, Tvrtko Ursulin, Daniel Vetter,
	Jani Nikula, David Airlie, Yishai Hadas, Dan Williams,
	Ingo Molnar, Peter Zijlstra (Intel), David Hildenbrand, dri-devel,
	netdev, linux-rdma, linux-kernel
In-Reply-To: <20160420191432.GK17454@nuc-i3427.alporthouse.com>

On Wed, Apr 20, 2016 at 08:14:32PM +0100, Chris Wilson wrote:
> On Wed, Apr 20, 2016 at 08:58:44PM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 20, 2016 at 07:42:13PM +0100, Chris Wilson wrote:
> > > The ioremap() hidden behind the io_mapping_map_wc() convenience helper
> > > can be used for remapping multiple pages. Extend the helper so that
> > > future callers can use it for larger ranges.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> > > Cc: David Airlie <airlied@linux.ie>
> > > Cc: Yishai Hadas <yishaih@mellanox.com>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > > Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
> > > Cc: Luis R. Rodriguez <mcgrof@kernel.org>
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: linux-rdma@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > 
> > We have 2 callers today, in the future, can you envision
> > this API getting more options? If so, in order to avoid the
> > pain of collateral evolutions I can suggest a descriptor
> > being passed with the required settings / options. This lets
> > you evolve the API without needing to go in and modify
> > old users. If you choose not to that's fine too, just
> > figured I'd chime in with that as I've seen the pain
> > with other APIs, and I'm putting an end to the needless
> > set of collateral evolutions this way.
> 
> Do you have a good example in mind? I've one more patch to try and take
> advantage of the io-mapping (that may or not be such a good idea in
> practice) but I may as well see if I can make io_mapping more useful
> when I do.

Sure, here's my current version of the revamp of the firmware API
to a more flexible API, which lets us compartmentalize the
usermode helper and, through the new API, avoids the issues with further
future collateral evolutions. It is still being baked; I'm fine-tuning
the SmPL so folks can do the conversion automatically if they want:

https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20160417-sysdata-api-v1

It also has a test driver (which I'd also recommend if you can pull it off).
It would be kind of hard to do something like a lib/io-mapping_test.c
given there is no real device to ioremap -- _but_ perhaps regular
RAM can be used to fake a device's MMIO. I am not sure if it's even
possible... but if so it would not only be useful for something
like your API but also for testing ioremap() and friends, and
any possible aliasing bombs we may want to vet for. It also hints
at how we may in the future be able to automatically write test drivers
for APIs through inference, but that needs a lot more love
to make it tangible.
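
The descriptor idea from earlier in this thread can be sketched in plain
C. This is a minimal illustration, not the io-mapping API and not the
sysdata API from the branch above: struct map_req, map_range() and
SKETCH_PAGE_SIZE are hypothetical names. The point is that a caller
using designated initializers keeps compiling, and keeps its old
behaviour, when a new option field is added to the descriptor later.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical request descriptor; zeroed fields mean "use the default". */
struct map_req {
	unsigned long offset;	/* start of the range inside the mapping */
	unsigned long size;	/* bytes to map; 0 selects a single page */
	int write_combining;	/* option added later; old callers leave it 0 */
};

/* Returns the number of bytes that would be mapped for this request. */
static unsigned long map_range(const struct map_req *req)
{
	assert(req != NULL);
	return req->size ? req->size : SKETCH_PAGE_SIZE;
}
```

An old caller that only sets .offset still gets a one-page mapping,
while a new caller can set .size and .write_combining without any change
to the function signature.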

  Luis
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply

* Re: qdisc spin lock
From: Michael Ma @ 2016-04-20 21:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <1460125157.6473.434.camel@edumazet-glaptop3.roam.corp.google.com>

2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
>> I didn't really know that multiple qdiscs can be isolated using MQ so
>> that each txq can be associated with a particular qdisc. Also we don't
>> really have multiple interfaces...
>>
>> With this MQ solution we'll still need to assign transmit queues to
>> different classes by doing some math on the bandwidth limit if I
>> understand correctly, which seems to be less convenient compared with
>> a solution purely within HTB.
>>
>> I assume that with this solution I can still share qdisc among
>> multiple transmit queues - please let me know if this is not the case.
>
> Note that this MQ + HTB thing works well, unless you use a bonding
> device. (Or you need the MQ+HTB on the slaves, with no way of sharing
> tokens between the slaves)

Actually MQ+HTB works well for small packets: a flow of 512-byte
packets can be throttled by HTB using one txq without being affected
by other flows with small packets. However, I found that with this
solution large packets (10k for example) achieve only very limited
bandwidth. In my test I used MQ to assign one txq to an HTB that sets
the rate at 1Gbit/s; 512-byte packets can reach the ceiling rate
using 30 threads, but sending 10k packets using 10 threads reaches only
10 Mbit/s with the same TC configuration. If I increase burst and cburst
of HTB to some extremely large value (like 50MB), the ceiling rate can be
hit.

The strange thing is that I don't see this problem when using HTB as
the root. So the txq number seems to be a factor here; however, it's
really hard to understand why it would only affect larger packets. Is
this a known issue? Any suggestions on how to investigate the issue
further? Profiling shows that the CPU utilization is pretty low.

>
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bb1d912323d5dd50e1079e389f4e964be14f0ae3
>
> bonding can not really be used as a true MQ device yet.
>
> I might send a patch to disable this 'bonding feature' if no slave sets
> a queue_id.
>
>

^ permalink raw reply

* Re: [Intel-gfx] [PATCH 4/4] drm/i915: Move ioremap_wc tracking onto VMA
From: Luis R. Rodriguez @ 2016-04-20 21:27 UTC (permalink / raw)
  To: Luis R. Rodriguez, Chris Wilson, David Airlie,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar,
	Peter Zijlstra (Intel),
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Vetter, Dan Williams,
	Yishai Hadas, David Hildenbrand
In-Reply-To: <20160420111730.GL2510-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>

On Wed, Apr 20, 2016 at 01:17:30PM +0200, Daniel Vetter wrote:
> On Wed, Apr 20, 2016 at 11:10:54AM +0200, Luis R. Rodriguez wrote:
> > Reason I ask is since I noticed a while ago a lot of drivers
> > were using info->fix.smem_start and info->fix.smem_len consistently
> > for their ioremap'd areas it might make sense instead to let the
> > internal framebuffer (register_framebuffer()) optionally manage the
> > ioremap_wc() for drivers, given that this is pretty generic stuff.
> 
> All that legacy fbdev stuff is just for legacy support, and I prefer to
> have that as dumb as possible. There's been some discussion even around
> lifting the "kick out firmware fb driver" out of fbdev, since we'd need it
> to have a simple drm driver for e.g. uefi.
> 
> But I definitely don't want a legacy horror show like fbdev to
> automagically take care of device mappings for drivers.

Makes sense, it also still begs the question if more modern APIs
could manage the ioremap for you. Evidence shows people get
sloppy and if things were done internally with helpers it may
be easier to later make adjustments.

  Luis

^ permalink raw reply

* [PATCH net-next] macvlan: fix failure during registration v2
From: Francesco Ruggeri @ 2016-04-20 21:38 UTC (permalink / raw)
  To: netdev
  Cc: Francesco Ruggeri, David S. Miller, Eric W. Biederman,
	Mahesh Bandewar

If macvlan_common_newlink fails in register_netdevice after macvlan_init
then it decrements port->count twice, first in macvlan_uninit (from
register_netdevice or rollback_registered) and then again in
macvlan_common_newlink.
A similar problem may exist in the ipvlan driver.
This patch consolidates modifications to port->count into macvlan_init
and macvlan_uninit (thanks to Eric Biederman for suggesting this approach).
In macvtap_device_event it also avoids cleaning up in NETDEV_UNREGISTER
if NETDEV_REGISTER had previously failed.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 drivers/net/macvlan.c | 10 ++++------
 drivers/net/macvtap.c |  2 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2bcf1f3..cb01023 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -795,6 +795,7 @@ static int macvlan_init(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	const struct net_device *lowerdev = vlan->lowerdev;
+	struct macvlan_port *port = vlan->port;
 
 	dev->state		= (dev->state & ~MACVLAN_STATE_MASK) |
 				  (lowerdev->state & MACVLAN_STATE_MASK);
@@ -812,6 +813,8 @@ static int macvlan_init(struct net_device *dev)
 	if (!vlan->pcpu_stats)
 		return -ENOMEM;
 
+	port->count += 1;
+
 	return 0;
 }
 
@@ -1312,10 +1315,9 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 			return err;
 	}
 
-	port->count += 1;
 	err = register_netdevice(dev);
 	if (err < 0)
-		goto destroy_port;
+		return err;
 
 	dev->priv_flags |= IFF_MACVLAN;
 	err = netdev_upper_dev_link(lowerdev, dev);
@@ -1330,10 +1332,6 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 
 unregister_netdev:
 	unregister_netdevice(dev);
-destroy_port:
-	port->count -= 1;
-	if (!port->count)
-		macvlan_port_destroy(lowerdev);
 
 	return err;
 }
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 95394ed..e770221 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1303,6 +1303,8 @@ static int macvtap_device_event(struct notifier_block *unused,
 		}
 		break;
 	case NETDEV_UNREGISTER:
+		if (vlan->minor == 0)
+			break;
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
 		device_destroy(macvtap_class, devt);
 		macvtap_free_minor(vlan);
-- 
1.8.1.4
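
The accounting rule in the commit message above (only init and uninit
touch port->count) can be modelled in userspace. This is a rough sketch
with made-up names (struct port_model, model_*), not the macvlan code;
it assumes, as the commit message says, that a registration failure
after init already runs uninit, so the newlink path must not decrement
again.

```c
#include <assert.h>
#include <stdbool.h>

struct port_model {
	int count;		/* number of devices attached to the port */
	bool destroyed;		/* set when the last device goes away */
};

static void model_init(struct port_model *port)
{
	port->count += 1;
}

static void model_uninit(struct port_model *port)
{
	assert(port->count > 0);	/* a second decrement would trip here */
	port->count -= 1;
	if (!port->count)
		port->destroyed = true;
}

/* Stand-in for registration: a failure after init unwinds through uninit. */
static int model_register(struct port_model *port, bool fail_after_init)
{
	model_init(port);
	if (fail_after_init) {
		model_uninit(port);
		return -1;
	}
	return 0;
}

/* The newlink path no longer touches count on its own error path. */
static int model_newlink(struct port_model *port, bool fail_after_init)
{
	return model_register(port, fail_after_init);
}
```

Starting from a port that already carries one device, a failed newlink
leaves count at 1 and the port alive, which is the behaviour the patch
is after.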

^ permalink raw reply related

* Re: [PATCH V2] net: stmmac: socfpga: Remove re-registration of reset controller
From: Dinh Nguyen @ 2016-04-20 21:17 UTC (permalink / raw)
  To: Marek Vasut, netdev
  Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
	David S . Miller
In-Reply-To: <1461110753-7641-1-git-send-email-marex@denx.de>

On 04/19/2016 07:05 PM, Marek Vasut wrote:
> Both socfpga_dwmac_parse_data() in dwmac-socfpga.c and stmmac_dvr_probe()
> in stmmac_main.c functions call devm_reset_control_get() to register an
> reset controller for the stmmac. This results in an attempt to register
> two reset controllers for the same non-shared reset line.
> 
> The first attempt to register the reset controller works fine. The second
> attempt fails with warning from the reset controller core, see below.
> The warning is produced because the reset line is non-shared and thus
> it is allowed to have only up-to one reset controller associated with
> that reset line, not two or more.
> 
> The solution is not great. Since the hardware needs to toggle the reset
> before calling stmmac_dvr_probe() to perform mandatory preconfiguration,
> this patch splits socfpga_dwmac_init_probe() from socfpga_dwmac_init().
> 
> The socfpga_dwmac_init_probe() temporarily registers the reset controller,
> performs the pre-configuration and unregisters the reset controller again.
> This function is only called from the socfpga_dwmac_probe().
> 
> The original socfpga_dwmac_init() is tweaked to use reset controller
> pointer from the stmmac_priv (private data of the stmmac core) instead
> of the local instance, which was used before.
> 
> Finally, plat_dat->exit and socfpga_dwmac_exit() is no longer necessary,
> since the functionality is already performed by the stmmac core.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x218/0x270
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160419-00015-gabb2477-dirty #4
> Hardware name: Altera SOCFPGA
> [<c010f290>] (unwind_backtrace) from [<c010b82c>] (show_stack+0x10/0x14)
> [<c010b82c>] (show_stack) from [<c0373da4>] (dump_stack+0x94/0xa8)
> [<c0373da4>] (dump_stack) from [<c011bcc0>] (__warn+0xec/0x104)
> [<c011bcc0>] (__warn) from [<c011bd88>] (warn_slowpath_null+0x20/0x28)
> [<c011bd88>] (warn_slowpath_null) from [<c03a6eb4>] (__of_reset_control_get+0x218/0x270)
> [<c03a6eb4>] (__of_reset_control_get) from [<c03a701c>] (__devm_reset_control_get+0x54/0x90)
> [<c03a701c>] (__devm_reset_control_get) from [<c041fa30>] (stmmac_dvr_probe+0x1b4/0x8e8)
> [<c041fa30>] (stmmac_dvr_probe) from [<c04298c8>] (socfpga_dwmac_probe+0x1b8/0x28c)
> [<c04298c8>] (socfpga_dwmac_probe) from [<c03d6ffc>] (platform_drv_probe+0x4c/0xb0)
> [<c03d6ffc>] (platform_drv_probe) from [<c03d54ec>] (driver_probe_device+0x224/0x2bc)
> [<c03d54ec>] (driver_probe_device) from [<c03d5630>] (__driver_attach+0xac/0xb0)
> [<c03d5630>] (__driver_attach) from [<c03d382c>] (bus_for_each_dev+0x6c/0xa0)
> [<c03d382c>] (bus_for_each_dev) from [<c03d4ad4>] (bus_add_driver+0x1a4/0x21c)
> [<c03d4ad4>] (bus_add_driver) from [<c03d60ac>] (driver_register+0x78/0xf8)
> [<c03d60ac>] (driver_register) from [<c0101760>] (do_one_initcall+0x40/0x170)
> [<c0101760>] (do_one_initcall) from [<c0800e38>] (kernel_init_freeable+0x1dc/0x27c)
> [<c0800e38>] (kernel_init_freeable) from [<c05d1bd4>] (kernel_init+0x8/0x114)
> [<c05d1bd4>] (kernel_init) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
> ---[ end trace 059d2fbe87608fa9 ]---
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Matthew Gerlach <mgerlach@opensource.altera.com>
> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> V2: Add missing stmmac_rst = NULL; into socfpga_dwmac_init_probe()
> ---
>  .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    | 70 ++++++++++++----------
>  1 file changed, 39 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> index 76d671e..5885a2e 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> @@ -49,7 +49,6 @@ struct socfpga_dwmac {
>  	u32	reg_shift;
>  	struct	device *dev;
>  	struct regmap *sys_mgr_base_addr;
> -	struct reset_control *stmmac_rst;
>  	void __iomem *splitter_base;
>  	bool f2h_ptp_ref_clk;
>  };
> @@ -92,15 +91,6 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *
>  	struct device_node *np_splitter;
>  	struct resource res_splitter;
>  
> -	dwmac->stmmac_rst = devm_reset_control_get(dev,
> -						  STMMAC_RESOURCE_NAME);
> -	if (IS_ERR(dwmac->stmmac_rst)) {
> -		dev_info(dev, "Could not get reset control!\n");
> -		if (PTR_ERR(dwmac->stmmac_rst) == -EPROBE_DEFER)
> -			return -EPROBE_DEFER;
> -		dwmac->stmmac_rst = NULL;
> -	}
> -
>  	dwmac->interface = of_get_phy_mode(np);
>  
>  	sys_mgr_base_addr = syscon_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon");
> @@ -194,30 +184,23 @@ static int socfpga_dwmac_setup(struct socfpga_dwmac *dwmac)
>  	return 0;
>  }
>  
> -static void socfpga_dwmac_exit(struct platform_device *pdev, void *priv)
> -{
> -	struct socfpga_dwmac	*dwmac = priv;
> -
> -	/* On socfpga platform exit, assert and hold reset to the
> -	 * enet controller - the default state after a hard reset.
> -	 */
> -	if (dwmac->stmmac_rst)
> -		reset_control_assert(dwmac->stmmac_rst);
> -}
> -
>  static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>  {
> -	struct socfpga_dwmac	*dwmac = priv;
> +	struct socfpga_dwmac *dwmac = priv;
>  	struct net_device *ndev = platform_get_drvdata(pdev);
>  	struct stmmac_priv *stpriv = NULL;
>  	int ret = 0;
>  
> -	if (ndev)
> -		stpriv = netdev_priv(ndev);
> +	if (!ndev)
> +		return -EINVAL;
> +
> +	stpriv = netdev_priv(ndev);
> +	if (!stpriv)
> +		return -EINVAL;
>  
>  	/* Assert reset to the enet controller before changing the phy mode */
> -	if (dwmac->stmmac_rst)
> -		reset_control_assert(dwmac->stmmac_rst);
> +	if (stpriv->stmmac_rst)
> +		reset_control_assert(stpriv->stmmac_rst);
>  
>  	/* Setup the phy mode in the system manager registers according to
>  	 * devicetree configuration
> @@ -227,8 +210,8 @@ static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>  	/* Deassert reset for the phy configuration to be sampled by
>  	 * the enet controller, and operation to start in requested mode
>  	 */
> -	if (dwmac->stmmac_rst)
> -		reset_control_deassert(dwmac->stmmac_rst);
> +	if (stpriv->stmmac_rst)
> +		reset_control_deassert(stpriv->stmmac_rst);
>  
>  	/* Before the enet controller is suspended, the phy is suspended.
>  	 * This causes the phy clock to be gated. The enet controller is
> @@ -245,12 +228,38 @@ static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>  	 * control register 0, and can be modified by the phy driver
>  	 * framework.
>  	 */
> -	if (stpriv && stpriv->phydev)
> +	if (stpriv->phydev)
>  		phy_resume(stpriv->phydev);
>  
>  	return ret;
>  }
>  
> +static int socfpga_dwmac_init_probe(struct socfpga_dwmac *dwmac)
> +{
> +	struct reset_control *stmmac_rst;
> +	int ret;
> +
> +	stmmac_rst = reset_control_get(dwmac->dev, STMMAC_RESOURCE_NAME);
> +	if (IS_ERR(stmmac_rst)) {
> +		dev_info(dwmac->dev, "Could not get reset control!\n");
> +		if (PTR_ERR(stmmac_rst) == -EPROBE_DEFER)
> +			return -EPROBE_DEFER;
> +		stmmac_rst = NULL;
> +	}
> +
> +	if (stmmac_rst)
> +		reset_control_assert(stmmac_rst);
> +
> +	ret = socfpga_dwmac_setup(dwmac);
> +
> +	if (stmmac_rst) {
> +		reset_control_deassert(stmmac_rst);
> +		reset_control_put(stmmac_rst);
> +	}
> +
> +	return ret;
> +}

I don't think you need this function because...

> +
>  static int socfpga_dwmac_probe(struct platform_device *pdev)
>  {
>  	struct plat_stmmacenet_data *plat_dat;
> @@ -279,10 +288,9 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
>  
>  	plat_dat->bsp_priv = dwmac;
>  	plat_dat->init = socfpga_dwmac_init;
> -	plat_dat->exit = socfpga_dwmac_exit;
>  	plat_dat->fix_mac_speed = socfpga_dwmac_fix_mac_speed;
>  
> -	ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
> +	ret = socfpga_dwmac_init_probe(dwmac);
>  	if (ret)
>  		return ret;
>  
> 

if you modify the patch to call stmmac_dvr_probe() before calling
socfpga_dwmac_init(), then you would already have the reset control
information.

Something like this:

---------------------------------8<--------------------------------

@@ -269,14 +252,13 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)

        plat_dat->bsp_priv = dwmac;
        plat_dat->init = socfpga_dwmac_init;
-       plat_dat->exit = socfpga_dwmac_exit;
        plat_dat->fix_mac_speed = socfpga_dwmac_fix_mac_speed;

-       ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
-       if (ret)
-               return ret;
+       ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);

-       return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+       if (!ret)
+               ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
+       return ret;
 }


What do you think?

Dinh
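
The ordering proposed above can be modelled in a few lines of userspace
C. This is only a sketch with hypothetical names (struct reset_line,
line_get_exclusive(), core_probe(), glue_init(), glue_probe()); it is
not the reset controller API, and it leaves open whether the hardware
tolerates running the glue init after the core probe. It shows why a
single owner of an exclusive line avoids the second, failing request.

```c
#include <assert.h>
#include <stddef.h>

struct reset_line {
	int holders;	/* an exclusive line allows at most one holder */
	int asserted;
};

/* Returns the line, or NULL if somebody already holds it. */
static struct reset_line *line_get_exclusive(struct reset_line *line)
{
	if (line->holders > 0)
		return NULL;
	line->holders = 1;
	return line;
}

struct core_priv {
	struct reset_line *rst;	/* owned by the core driver */
};

/* Stand-in for the core probe: it is the only place that takes the line. */
static int core_probe(struct core_priv *priv, struct reset_line *line)
{
	priv->rst = line_get_exclusive(line);
	return priv->rst ? 0 : -1;
}

/* Stand-in for the glue init: reuses the core's handle. */
static int glue_init(struct core_priv *priv)
{
	assert(priv->rst != NULL);
	priv->rst->asserted = 1;
	/* the PHY mode would be programmed here */
	priv->rst->asserted = 0;
	return 0;
}

/* Probe order from the suggestion: core first, then glue init. */
static int glue_probe(struct core_priv *priv, struct reset_line *line)
{
	int ret = core_probe(priv, line);

	if (!ret)
		ret = glue_init(priv);
	return ret;
}
```

In this model a second line_get_exclusive() on the same line returns
NULL, which corresponds to the situation that triggers the warning in
the quoted trace.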

^ permalink raw reply

* [PATCH net] atl2: Disable unimplemented scatter/gather feature
From: Ben Hutchings @ 2016-04-20 22:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Justin Yackoski

[-- Attachment #1: Type: text/plain, Size: 1462 bytes --]

atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs.  This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.

Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checksum offload, this potentially leaks sensitive
information (before you notice that it just isn't working).  Therefore
this obscure bug has been designated CVE-2016-2117.

Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
---
 drivers/net/ethernet/atheros/atlx/atl2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/atlx/atl2.c b/drivers/net/ethernet/atheros/atlx/atl2.c
index 8f76f4558a88..2ff465848b65 100644
--- a/drivers/net/ethernet/atheros/atlx/atl2.c
+++ b/drivers/net/ethernet/atheros/atlx/atl2.c
@@ -1412,7 +1412,7 @@ static int atl2_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	err = -EIO;
 
-	netdev->hw_features = NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX;
+	netdev->hw_features = NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->features |= (NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX);
 
 	/* Init PHY as early as possible due to power saving issue  */

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
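
The effect of the one-line change can be illustrated with a small
userspace model. This is a simplified sketch, not the kernel's feature
negotiation: struct dev_model, model_set_features() and the F_* bit
values are invented for the example. It assumes only that a feature bit
can be toggled by the user when it is present in hw_features, so
dropping the SG bit there makes a request to enable SG fail.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit values standing in for the NETIF_F_* flags. */
#define F_SG		(UINT32_C(1) << 0)
#define F_VLAN_CTAG_RX	(UINT32_C(1) << 1)
#define F_VLAN_CTAG_TX	(UINT32_C(1) << 2)

struct dev_model {
	uint32_t hw_features;	/* bits the user is allowed to toggle */
	uint32_t features;	/* bits currently enabled */
};

/*
 * Apply a request to set the bits in 'mask' to the values in 'wanted'.
 * Returns 0 on success, -1 if the request touches a bit that is not
 * listed in hw_features; in that case features is left unchanged.
 */
static int model_set_features(struct dev_model *dev, uint32_t mask,
			      uint32_t wanted)
{
	assert((wanted & ~mask) == 0);
	if (mask & ~dev->hw_features)
		return -1;
	dev->features = (dev->features & ~mask) | wanted;
	return 0;
}
```

With F_SG in hw_features the request succeeds and SG ends up enabled;
with F_SG removed, as in the patch, the same request is rejected.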

^ permalink raw reply related

* Re: [PATCH V2] net: stmmac: socfpga: Remove re-registration of reset controller
From: Marek Vasut @ 2016-04-20 22:27 UTC (permalink / raw)
  To: Dinh Nguyen, netdev
  Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
	David S . Miller
In-Reply-To: <5717F1DF.1020004@opensource.altera.com>

On 04/20/2016 11:17 PM, Dinh Nguyen wrote:
> On 04/19/2016 07:05 PM, Marek Vasut wrote:
>> Both socfpga_dwmac_parse_data() in dwmac-socfpga.c and stmmac_dvr_probe()
>> in stmmac_main.c functions call devm_reset_control_get() to register an
>> reset controller for the stmmac. This results in an attempt to register
>> two reset controllers for the same non-shared reset line.
>>
>> The first attempt to register the reset controller works fine. The second
>> attempt fails with warning from the reset controller core, see below.
>> The warning is produced because the reset line is non-shared and thus
>> it is allowed to have only up-to one reset controller associated with
>> that reset line, not two or more.
>>
>> The solution is not great. Since the hardware needs to toggle the reset
>> before calling stmmac_dvr_probe() to perform mandatory preconfiguration,
>> this patch splits socfpga_dwmac_init_probe() from socfpga_dwmac_init().
>>
>> The socfpga_dwmac_init_probe() temporarily registers the reset controller,
>> performs the pre-configuration and unregisters the reset controller again.
>> This function is only called from the socfpga_dwmac_probe().
>>
>> The original socfpga_dwmac_init() is tweaked to use reset controller
>> pointer from the stmmac_priv (private data of the stmmac core) instead
>> of the local instance, which was used before.
>>
>> Finally, plat_dat->exit and socfpga_dwmac_exit() is no longer necessary,
>> since the functionality is already performed by the stmmac core.
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x218/0x270
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160419-00015-gabb2477-dirty #4
>> Hardware name: Altera SOCFPGA
>> [<c010f290>] (unwind_backtrace) from [<c010b82c>] (show_stack+0x10/0x14)
>> [<c010b82c>] (show_stack) from [<c0373da4>] (dump_stack+0x94/0xa8)
>> [<c0373da4>] (dump_stack) from [<c011bcc0>] (__warn+0xec/0x104)
>> [<c011bcc0>] (__warn) from [<c011bd88>] (warn_slowpath_null+0x20/0x28)
>> [<c011bd88>] (warn_slowpath_null) from [<c03a6eb4>] (__of_reset_control_get+0x218/0x270)
>> [<c03a6eb4>] (__of_reset_control_get) from [<c03a701c>] (__devm_reset_control_get+0x54/0x90)
>> [<c03a701c>] (__devm_reset_control_get) from [<c041fa30>] (stmmac_dvr_probe+0x1b4/0x8e8)
>> [<c041fa30>] (stmmac_dvr_probe) from [<c04298c8>] (socfpga_dwmac_probe+0x1b8/0x28c)
>> [<c04298c8>] (socfpga_dwmac_probe) from [<c03d6ffc>] (platform_drv_probe+0x4c/0xb0)
>> [<c03d6ffc>] (platform_drv_probe) from [<c03d54ec>] (driver_probe_device+0x224/0x2bc)
>> [<c03d54ec>] (driver_probe_device) from [<c03d5630>] (__driver_attach+0xac/0xb0)
>> [<c03d5630>] (__driver_attach) from [<c03d382c>] (bus_for_each_dev+0x6c/0xa0)
>> [<c03d382c>] (bus_for_each_dev) from [<c03d4ad4>] (bus_add_driver+0x1a4/0x21c)
>> [<c03d4ad4>] (bus_add_driver) from [<c03d60ac>] (driver_register+0x78/0xf8)
>> [<c03d60ac>] (driver_register) from [<c0101760>] (do_one_initcall+0x40/0x170)
>> [<c0101760>] (do_one_initcall) from [<c0800e38>] (kernel_init_freeable+0x1dc/0x27c)
>> [<c0800e38>] (kernel_init_freeable) from [<c05d1bd4>] (kernel_init+0x8/0x114)
>> [<c05d1bd4>] (kernel_init) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
>> ---[ end trace 059d2fbe87608fa9 ]---
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
>> Cc: Matthew Gerlach <mgerlach@opensource.altera.com>
>> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
>> Cc: David S. Miller <davem@davemloft.net>
>> ---
>> V2: Add missing stmmac_rst = NULL; into socfpga_dwmac_init_probe()
>> ---
>>  .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c    | 70 ++++++++++++----------
>>  1 file changed, 39 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
>> index 76d671e..5885a2e 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
>> @@ -49,7 +49,6 @@ struct socfpga_dwmac {
>>  	u32	reg_shift;
>>  	struct	device *dev;
>>  	struct regmap *sys_mgr_base_addr;
>> -	struct reset_control *stmmac_rst;
>>  	void __iomem *splitter_base;
>>  	bool f2h_ptp_ref_clk;
>>  };
>> @@ -92,15 +91,6 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *
>>  	struct device_node *np_splitter;
>>  	struct resource res_splitter;
>>  
>> -	dwmac->stmmac_rst = devm_reset_control_get(dev,
>> -						  STMMAC_RESOURCE_NAME);
>> -	if (IS_ERR(dwmac->stmmac_rst)) {
>> -		dev_info(dev, "Could not get reset control!\n");
>> -		if (PTR_ERR(dwmac->stmmac_rst) == -EPROBE_DEFER)
>> -			return -EPROBE_DEFER;
>> -		dwmac->stmmac_rst = NULL;
>> -	}
>> -
>>  	dwmac->interface = of_get_phy_mode(np);
>>  
>>  	sys_mgr_base_addr = syscon_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon");
>> @@ -194,30 +184,23 @@ static int socfpga_dwmac_setup(struct socfpga_dwmac *dwmac)
>>  	return 0;
>>  }
>>  
>> -static void socfpga_dwmac_exit(struct platform_device *pdev, void *priv)
>> -{
>> -	struct socfpga_dwmac	*dwmac = priv;
>> -
>> -	/* On socfpga platform exit, assert and hold reset to the
>> -	 * enet controller - the default state after a hard reset.
>> -	 */
>> -	if (dwmac->stmmac_rst)
>> -		reset_control_assert(dwmac->stmmac_rst);
>> -}
>> -
>>  static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>>  {
>> -	struct socfpga_dwmac	*dwmac = priv;
>> +	struct socfpga_dwmac *dwmac = priv;
>>  	struct net_device *ndev = platform_get_drvdata(pdev);
>>  	struct stmmac_priv *stpriv = NULL;
>>  	int ret = 0;
>>  
>> -	if (ndev)
>> -		stpriv = netdev_priv(ndev);
>> +	if (!ndev)
>> +		return -EINVAL;
>> +
>> +	stpriv = netdev_priv(ndev);
>> +	if (!stpriv)
>> +		return -EINVAL;
>>  
>>  	/* Assert reset to the enet controller before changing the phy mode */
>> -	if (dwmac->stmmac_rst)
>> -		reset_control_assert(dwmac->stmmac_rst);
>> +	if (stpriv->stmmac_rst)
>> +		reset_control_assert(stpriv->stmmac_rst);
>>  
>>  	/* Setup the phy mode in the system manager registers according to
>>  	 * devicetree configuration
>> @@ -227,8 +210,8 @@ static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>>  	/* Deassert reset for the phy configuration to be sampled by
>>  	 * the enet controller, and operation to start in requested mode
>>  	 */
>> -	if (dwmac->stmmac_rst)
>> -		reset_control_deassert(dwmac->stmmac_rst);
>> +	if (stpriv->stmmac_rst)
>> +		reset_control_deassert(stpriv->stmmac_rst);
>>  
>>  	/* Before the enet controller is suspended, the phy is suspended.
>>  	 * This causes the phy clock to be gated. The enet controller is
>> @@ -245,12 +228,38 @@ static int socfpga_dwmac_init(struct platform_device *pdev, void *priv)
>>  	 * control register 0, and can be modified by the phy driver
>>  	 * framework.
>>  	 */
>> -	if (stpriv && stpriv->phydev)
>> +	if (stpriv->phydev)
>>  		phy_resume(stpriv->phydev);
>>  
>>  	return ret;
>>  }
>>  
>> +static int socfpga_dwmac_init_probe(struct socfpga_dwmac *dwmac)
>> +{
>> +	struct reset_control *stmmac_rst;
>> +	int ret;
>> +
>> +	stmmac_rst = reset_control_get(dwmac->dev, STMMAC_RESOURCE_NAME);
>> +	if (IS_ERR(stmmac_rst)) {
>> +		dev_info(dwmac->dev, "Could not get reset control!\n");
>> +		if (PTR_ERR(stmmac_rst) == -EPROBE_DEFER)
>> +			return -EPROBE_DEFER;
>> +		stmmac_rst = NULL;
>> +	}
>> +
>> +	if (stmmac_rst)
>> +		reset_control_assert(stmmac_rst);
>> +
>> +	ret = socfpga_dwmac_setup(dwmac);
>> +
>> +	if (stmmac_rst) {
>> +		reset_control_deassert(stmmac_rst);
>> +		reset_control_put(stmmac_rst);
>> +	}
>> +
>> +	return ret;
>> +}
> 
> I don't think you need this function because...
> 
>> +
>>  static int socfpga_dwmac_probe(struct platform_device *pdev)
>>  {
>>  	struct plat_stmmacenet_data *plat_dat;
>> @@ -279,10 +288,9 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
>>  
>>  	plat_dat->bsp_priv = dwmac;
>>  	plat_dat->init = socfpga_dwmac_init;
>> -	plat_dat->exit = socfpga_dwmac_exit;
>>  	plat_dat->fix_mac_speed = socfpga_dwmac_fix_mac_speed;
>>  
>> -	ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
>> +	ret = socfpga_dwmac_init_probe(dwmac);
>>  	if (ret)
>>  		return ret;
>>  
>>
> 
> if you modify the patch to call stmmac_dvr_probe() before calling
> socfpga_dwmac_init(), then you would already have the reset control
> information.

I was under the impression that you must call socfpga_dwmac_init()
before stmmac_dvr_probe() for some hardware-related reason. If you
are absolutely certain this is not necessary, then that's perfect and
the patch can be simplified even further: just remove the call to
socfpga_dwmac_init() from probe altogether, since the dwmac core code
will call plat_dat->init at the end of probe.

So shall we do that? I am happy to spin V3 like that if you confirm
that it's legal to do things in the aforementioned order.

> Something like this:
> 
> ---------------------------------8<--------------------------------
> 
> @@ -269,14 +252,13 @@ static int socfpga_dwmac_probe(struct
> platform_device *pdev)
> 
>         plat_dat->bsp_priv = dwmac;
>         plat_dat->init = socfpga_dwmac_init;
> -       plat_dat->exit = socfpga_dwmac_exit;
>         plat_dat->fix_mac_speed = socfpga_dwmac_fix_mac_speed;
> 
> -       ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
> -       if (ret)
> -               return ret;
> +       ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
> 
> -       return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
> +       if (!ret)
> +               ret = socfpga_dwmac_init(pdev, plat_dat->bsp_priv);
> +       return ret;
>  }
> 
> 
> What do you think?



> Dinh
> 


-- 
Best regards,
Marek Vasut

^ permalink raw reply

* Re: qdisc spin lock
From: Eric Dumazet @ 2016-04-20 22:34 UTC (permalink / raw)
  To: Michael Ma; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <CAAmHdhzAEG-oXsA0P1mEJfseBTUdU7X_bEhgvkCLOLcqOHcQSw@mail.gmail.com>

On Wed, 2016-04-20 at 14:24 -0700, Michael Ma wrote:
> 2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> > On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
> >> I didn't really know that multiple qdiscs can be isolated using MQ so
> >> that each txq can be associated with a particular qdisc. Also we don't
> >> really have multiple interfaces...
> >>
> >> With this MQ solution we'll still need to assign transmit queues to
> >> different classes by doing some math on the bandwidth limit if I
> >> understand correctly, which seems to be less convenient compared with
> >> a solution purely within HTB.
> >>
> >> I assume that with this solution I can still share qdisc among
> >> multiple transmit queues - please let me know if this is not the case.
> >
> > Note that this MQ + HTB thing works well, unless you use a bonding
> > device. (Or you need the MQ+HTB on the slaves, with no way of sharing
> > tokens between the slaves)
> 
> Actually MQ+HTB works well for small packets: a flow of 512-byte
> packets can be throttled by HTB using one txq without being affected
> by other flows with small packets. However, I found that with this
> solution large packets (10k for example) achieve only very limited
> bandwidth. In my test I used MQ to assign one txq to an HTB which sets
> the rate at 1Gbit/s. 512-byte packets can reach the ceiling rate
> using 30 threads, but sending 10k packets using 10 threads reaches
> only 10 Mbit/s with the same TC configuration. If I increase burst and
> cburst of HTB to some extremely large value (like 50MB), the ceiling
> rate can be hit.
> 
> The strange thing is that I don't see this problem when using HTB as
> the root. So the txq number seems to be a factor here. However, it's
> really hard to understand why it would only affect larger packets. Is
> this a known issue? Any suggestions on how to investigate the issue
> further? Profiling shows that the CPU utilization is pretty low.

You could try 

perf record -a -g -e skb:kfree_skb sleep 5
perf report

So that you see where the packets are dropped.

Chances are that your UDP sockets' SO_SNDBUF is too big, and packets are
dropped at qdisc enqueue time instead of the sender seeing backpressure.
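
One way to check the SO_SNDBUF theory is to shrink the send buffer on the
test sockets and see whether the drops at enqueue go away. A minimal
sketch, assuming a plain IPv4 UDP sender; the helper names
udp_socket_with_sndbuf() and effective_sndbuf() are made up for
illustration and are not part of any existing tool:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Create an IPv4 UDP socket and request a send buffer of `bytes` bytes.
 * Returns the socket fd (>= 0) on success, -1 on failure.
 */
int udp_socket_with_sndbuf(int bytes)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read back the effective SO_SNDBUF value; returns -1 on failure. */
int effective_sndbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
		return -1;
	return val;
}
```

On Linux the value read back by getsockopt() is double what was
requested (the kernel doubles it to leave room for bookkeeping overhead),
and the request is capped by net.core.wmem_max, so compare the effective
value rather than the requested one when trying different sizes.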

^ permalink raw reply

* linux-next: zillions of lockdep whinges in include/net/sock.h:1408
From: Valdis Kletnieks @ 2016-04-21  0:30 UTC (permalink / raw)
  To: Hannes Frederic Sowa, David S. Miller; +Cc: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

linux-next 20160420 is whining at an incredible rate - in 20 minutes of
uptime, I piled up some 41,000 hits from all over the place (cleaned up
to skip the CPU and PID so the list isn't quite so long):

% grep include/net/sock.h /var/log/messages | cut -f5- -d: |  sed -e 's/PID: [0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr
  13468  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0
   9770  CPU: + PID: (elided) at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0
   7706  CPU: + PID: (elided) at include/net/sock.h:1408 sock_owned_by_user+0x91/0xa0
   2818  CPU: + PID: (elided) at include/net/sock.h:1408 udpv6_queue_rcv_skb+0x3b6/0x6d0
   1981  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_write_timer+0xf2/0x110
   1954  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_delack_timer+0x110/0x130
   1912  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_keepalive_timer+0x136/0x2c0
    882  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0
    804  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_tasklet_func+0x192/0x1e0
     28  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_child_process+0x17a/0x350
      2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660
      2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660

Seems to be from this commit, which is apparently over-stringent or
isn't handling some case correctly:

commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Fri Apr 8 15:11:27 2016 +0200

    sock: tigthen lockdep checks for sock_owned_by_user

    sock_owned_by_user should not be used without socket lock held. It seems
    to be a common practice to check .owned before lock reclassification, so
    provide a little help to abstract this check away.

    Cc: linux-cifs@vger.kernel.org
    Cc: linux-bluetooth@vger.kernel.org
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply

* Re: [PATCH net 4/4] net/mlx4_en: Split SW RX dropped counter per RX ring
From: Eric Dumazet @ 2016-04-21  1:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David S. Miller, netdev, Eran Ben Elisha, Yishai Hadas,
	Saeed Mahameed
In-Reply-To: <571799A3.402@mellanox.com>

On Wed, 2016-04-20 at 18:00 +0300, Or Gerlitz wrote:

> Just to be sure, you'd like me to re-spin this and fix the reporter name?

Absolutely not, I believe patchwork should handle this just fine.

Patchwork does not understand the "Fixes:" tag yet, but Reported-by: is
fine.

^ permalink raw reply

* [PATCH net] Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
From: Shrikrishna Khare @ 2016-04-21  1:12 UTC (permalink / raw)
  To: netdev, linux-kernel, pv-drivers; +Cc: Shrikrishna Khare, Jin Heo

For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.

Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
---
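
For reviewers, the decision the patch makes can be modelled outside the
driver. This is only a sketch: struct rx_desc_model and
csum_unnecessary() are hypothetical stand-ins, the bit positions are
assumptions for illustration, and the NETIF_F_RXCSUM/cnc gate and the
BUG_ON() checks are left out. IPv6 has no header checksum, so presumably
only the TCP/UDP checksum bit is meaningful there, while IPv4 still
requires both bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in dword[3]; illustration only. */
#define RCD_TUC_SHIFT 16 /* TCP/UDP checksum correct */
#define RCD_IPC_SHIFT 17 /* IP header checksum correct */
#define RCD_CSUM_OK ((1u << RCD_TUC_SHIFT) | (1u << RCD_IPC_SHIFT))

/* Hypothetical model of the rx completion descriptor fields used here. */
struct rx_desc_model {
	bool v4;
	bool v6;
	uint32_t dword3; /* already converted to CPU byte order */
};

/* Returns true where the patched code would set CHECKSUM_UNNECESSARY. */
bool csum_unnecessary(const struct rx_desc_model *d)
{
	/* IPv4: both the IP header and the TCP/UDP checksum must be good. */
	if (d->v4 && (d->dword3 & RCD_CSUM_OK) == RCD_CSUM_OK)
		return true;

	/* IPv6: only the TCP/UDP checksum bit is checked. */
	if (d->v6 && (d->dword3 & (1u << RCD_TUC_SHIFT)))
		return true;

	return false;
}
```

With this model, an IPv6 descriptor carrying only the TCP/UDP bit is
accepted, which the old combined check against both bits would have
rejected.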
 drivers/net/vmxnet3/vmxnet3_drv.c | 12 ++++++++----
 drivers/net/vmxnet3/vmxnet3_int.h |  4 ++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index fc895d0..4a67e4f 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1152,12 +1152,16 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter,
 		union Vmxnet3_GenericDesc *gdesc)
 {
 	if (!gdesc->rcd.cnc && adapter->netdev->features & NETIF_F_RXCSUM) {
-		/* typical case: TCP/UDP over IP and both csums are correct */
-		if ((le32_to_cpu(gdesc->dword[3]) & VMXNET3_RCD_CSUM_OK) ==
-							VMXNET3_RCD_CSUM_OK) {
+		if (gdesc->rcd.v4 &&
+		    (le32_to_cpu(gdesc->dword[3]) &
+		     VMXNET3_RCD_CSUM_OK) == VMXNET3_RCD_CSUM_OK) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+			BUG_ON(!(gdesc->rcd.tcp || gdesc->rcd.udp));
+			BUG_ON(gdesc->rcd.frg);
+		} else if (gdesc->rcd.v6 && (le32_to_cpu(gdesc->dword[3]) &
+					     (1 << VMXNET3_RCD_TUC_SHIFT))) {
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 			BUG_ON(!(gdesc->rcd.tcp || gdesc->rcd.udp));
-			BUG_ON(!(gdesc->rcd.v4  || gdesc->rcd.v6));
 			BUG_ON(gdesc->rcd.frg);
 		} else {
 			if (gdesc->rcd.csum) {
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 729c344..c482539 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.6.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.7.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01040600
+#define VMXNET3_DRIVER_VERSION_NUM      0x01040700
 
 #if defined(CONFIG_PCI_MSI)
 	/* RSS only makes sense if MSI-X is supported. */
-- 
1.9.1

^ permalink raw reply related

