Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Roopa Prabhu @ 2014-12-11 18:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
	vyasevic, netdev, davem, shm, gospo
In-Reply-To: <20141211180743.GA2010@nanopsycho.orion>

On 12/11/14, 10:07 AM, Jiri Pirko wrote:
> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>
>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>> to switch asic
>>>>>>
>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>> Will take any suggestions).
>>>>>>
>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>> pass bridge port attributes to the port device.
>>>>>>
>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>> are switch ports).
>>>>>>
>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>> ---
>>>>>> include/net/switchdev.h   |    5 +++-
>>>>>> net/switchdev/switchdev.c |   70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>> index 8a6d164..22676b6 100644
>>>>>> --- a/include/net/switchdev.h
>>>>>> +++ b/include/net/switchdev.h
>>>>>> @@ -17,7 +17,10 @@
>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>> 				struct netdev_phys_item_id *psid);
>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>> -
>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>> +				struct nlmsghdr *nlh, u16 flags);
>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>> +				struct nlmsghdr *nlh, u16 flags);
>>>>>> #else
>>>>>>
>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>> index d162b21..62317e1 100644
>>>>>> --- a/net/switchdev/switchdev.c
>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>> 	return ops->ndo_switch_port_stp_update(dev, state);
>>>>>> }
>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>> +
>>>>>> +/**
>>>>>> + *	netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>> + *	port attributes
>>>>>> + *
>>>>>> + *	@dev: port device
>>>>>> + *	@nlh: netlink msg with bridge port attributes
>>>>>> + *
>>>>>> + *	Notify switch device port of bridge port attributes
>>>>>> + */
>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>> +									  struct nlmsghdr *nlh, u16 flags)
>>>>>> +{
>>>>>> +	const struct net_device_ops *ops = dev->netdev_ops;
>>>>>> +	struct net_device *lower_dev;
>>>>>> +	struct list_head *iter;
>>>>>> +	int ret = 0, err = 0;
>>>>>> +
>>>>>> +	if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>> +		return err;
>>>>>> +
>>>>>> +	if (ops->ndo_bridge_setlink) {
>>>>>> +	    WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>> +	    return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>> 	You have to change ndo_bridge_setlink in netdevice.h first.
>>>>> 	Otherwise when only this patch is applied (during bisection)
>>>>> 	this won't compile.
>>>> ack, will fix it and keep that in mind next time.
>>>>>> +	}
>>>>>> +
>>>>>> +	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>> 	I do not understand why to iterate over lower devices. At this
>>>>> 	stage we don't know a thing about this upper or its lowers. Let
>>>>> 	the uppers (/masters) to decide if this needs to be propagated
>>>>> 	or not.
>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>> port attributes to switch device driver today (vlan and other bridge port
>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>> devices that can be bridge ports.
>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>> bonding for example and let it propagate the the call to slaves.
>> No, that will require bridge attribute support in all drivers. And that is no
>> good.
> Not all drivers, just all masters which want to support this. Like bond,
> team, macvlan etc. That would be the same as for
> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
> see any problem in that. It is much much clearer over big hammer iterate
> over lowers in my opinion.

You cannot avoid the lowerdev iteration in any case.
If you added it in the individual drivers: bond, macvlan and other 
drivers will all have to do the same thing.
ie Call bridge setlink on lowerdevs.
My patch avoids the need to modify these drivers. Besides it does this 
only when the OFFLOAD flag is set.

It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. 
It will be all other ndo_ops we will need for switch asics.
It will be l3 tomorrow, if the route is through a bond (But at that 
point, we may end up having to introduce switch device instead of going 
to the port. Lets see).

Today this patch introduces an abstract way to get to the switch driver 
by getting to the slave switch port (And only when the OFFLOAD flag is set).


>
>
>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>> might not make sense to propagate to "lowers".
>> This does not really propagate to lowers. It is just trying to get to a
>> switch port and from there to the switch driver.
>> Example, bond driver does not need to care if its a bridge port. It will
>> simply pass the call to its slave which
>> might be a switch port.
>>
>> bond driver does not care if its a bridge port. But the switch driver cares,
>> because it knows that the bond was created with switch ports.
>>
>>
>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>> the switch port with an offload flag. Your way of using the switch port to
>>>> get to the switch driver does not help in these cases.
>>> I do not follow how this is related to this case (stacked layout).
>>>
>>>> The other option is to use the 'switch device (not port)' to get to the
>>>> switch driver.
>>> That would not help this case (stacked layout) I believe.
>>>
>>>
>>>> This patch shows that you can still do this with the ndo ops.
>>>>>> +		err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>> +		if (err)
>>>>>> +			ret = err;
>>>>>> +    }
>>>>>   ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>> +
>>>>>> +/**
>>>>>> + *	netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>> + *	attribute delete
>>>>>> + *
>>>>>> + *	@dev: port device
>>>>>> + *	@nlh: netlink msg with bridge port attributes
>>>>>> + *
>>>>>> + *	Notify switch device port of bridge port attribute delete
>>>>>> + */
>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>> +									  struct nlmsghdr *nlh, u16 flags)
>>>>>> +{
>>>>>> +	const struct net_device_ops *ops = dev->netdev_ops;
>>>>>> +	struct net_device *lower_dev;
>>>>>> +	struct list_head *iter;
>>>>>> +	int ret = 0, err = 0;
>>>>>> +
>>>>>> +	if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>> +		return err;
>>>>>> +
>>>>>> +	if (ops->ndo_bridge_dellink) {
>>>>>> +		WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>> +		return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>> +	}
>>>>>> +
>>>>>> +	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>> +		err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>> +		if (err)
>>>>>> +			ret = err;
>>>>>> +	}
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>> -- 
>>>>>> 1.7.10.4
>>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v7 2/3] net: Add Keystone NetCP ethernet driver
From: Joe Perches @ 2014-12-11 18:34 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: David Miller, netdev, linux-arm-kernel, linux-kernel, devicetree,
	robh+dt, grant.likely, WingMan Kwok
In-Reply-To: <5489D196.4070703@ti.com>

On Thu, 2014-12-11 at 12:17 -0500, Murali Karicheri wrote:
> On 12/11/2014 12:01 PM, David Miller wrote:
> > From: Murali Karicheri<m-karicheri2@ti.com>
> > Date: Thu, 11 Dec 2014 09:14:37 -0500
> >
> >> BTW, could you provide any suggestions that would help us merge this
> >> series to upstream? This has been sitting on this list for a while
> >> now.
> >
> > You simply have to continue going through the review and revision
> > process until people no longer find problems with your changes.
> >
> > This could take several more weeks, you simply must be patient.
> 
> Ok. Thanks. That is encouraging to hear!

Perhaps these trivial things might be considered

(uncompiled)
---
 drivers/net/ethernet/ti/netcp_core.c     |  2 +-
 drivers/net/ethernet/ti/netcp_ethss.c    | 83 +++++++++++++++++---------------
 drivers/net/ethernet/ti/netcp_xgbepcsr.c | 48 +++++++++---------
 3 files changed, 71 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 60ad299..8f38fe8 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1094,7 +1094,7 @@ static void netcp_tx_notify(void *arg)
 	napi_schedule(&netcp->tx_napi);
 }
 
-static struct knav_dma_desc*
+static struct knav_dma_desc *
 netcp_tx_map_skb(struct sk_buff *skb, struct netcp_intf *netcp)
 {
 	struct knav_dma_desc *desc, *ndesc, *pdesc;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 036b886..3757957 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -486,21 +486,29 @@ struct netcp_ethtool_stat {
 	int offset;
 };
 
-#define GBE_STATSA_INFO(field)		"GBE_A:"#field, GBE_STATSA_MODULE,\
-				FIELD_SIZEOF(struct gbe_hw_stats, field), \
-				offsetof(struct gbe_hw_stats, field)
-
-#define GBE_STATSB_INFO(field)		"GBE_B:"#field, GBE_STATSB_MODULE,\
-				FIELD_SIZEOF(struct gbe_hw_stats, field), \
-				offsetof(struct gbe_hw_stats, field)
-
-#define GBE_STATSC_INFO(field)		"GBE_C:"#field, GBE_STATSC_MODULE,\
-				FIELD_SIZEOF(struct gbe_hw_stats, field), \
-				offsetof(struct gbe_hw_stats, field)
-
-#define GBE_STATSD_INFO(field)		"GBE_D:"#field, GBE_STATSD_MODULE,\
-				FIELD_SIZEOF(struct gbe_hw_stats, field), \
-				offsetof(struct gbe_hw_stats, field)
+#define GBE_STATSA_INFO(field)						\
+	.desc = "GBE_A:"#field,						\
+	.type = GBE_STATSA_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
+
+#define GBE_STATSB_INFO(field)						\
+	.desc = "GBE_B:"#field,						\
+	.type = GBE_STATSB_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
+
+#define GBE_STATSC_INFO(field)						\
+	.desc = "GBE_C:"#field,						\
+	.type = GBE_STATSC_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
+
+#define GBE_STATSD_INFO(field)						\
+	.desc = "GBE_D:"#field,						\
+	.type = GBE_STATSD_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
 
 static const struct netcp_ethtool_stat gbe13_et_stats[] = {
 	/* GBE module A */
@@ -645,17 +653,23 @@ static const struct netcp_ethtool_stat gbe13_et_stats[] = {
 	{GBE_STATSD_INFO(rx_dma_overruns)},
 };
 
-#define XGBE_STATS0_INFO(field)	"GBE_0:"#field, XGBE_STATS0_MODULE, \
-				FIELD_SIZEOF(struct xgbe_hw_stats, field), \
-				offsetof(struct xgbe_hw_stats, field)
+#define XGBE_STATS0_INFO(field)						\
+	.desc = "GBE_0:"#field,						\
+	.type = GBE_STATS0_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
 
-#define XGBE_STATS1_INFO(field)	"GBE_1:"#field, XGBE_STATS1_MODULE, \
-				FIELD_SIZEOF(struct xgbe_hw_stats, field), \
-				offsetof(struct xgbe_hw_stats, field)
+#define XGBE_STATS1_INFO(field)						\
+	.desc = "GBE_1:"#field,						\
+	.type = GBE_STATS1_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
 
-#define XGBE_STATS2_INFO(field)	"GBE_2:"#field, XGBE_STATS2_MODULE, \
-				FIELD_SIZEOF(struct xgbe_hw_stats, field), \
-				offsetof(struct xgbe_hw_stats, field)
+#define XGBE_STATS2_INFO(field)
+	.desc = "GBE_2:"#field,						\
+	.type = GBE_STATS2_MODULE,					\
+	.size = FIELD_SIZEOF(struct gbe_hw_stats, field),		\
+	.offset = offsetof(struct gbe_hw_stats, field)
 
 static const struct netcp_ethtool_stat xgbe10_et_stats[] = {
 	/* GBE module 0 */
@@ -883,11 +897,11 @@ static void gbe_update_stats_ver14(struct gbe_priv *gbe_dev, uint64_t *data)
 			case GBE_STATSA_MODULE:
 			case GBE_STATSC_MODULE:
 				base = gbe_statsa;
-			break;
+				break;
 			case GBE_STATSB_MODULE:
 			case GBE_STATSD_MODULE:
 				base  = gbe_statsb;
-			break;
+				break;
 			}
 
 			p = base + gbe_dev->et_stats[j].offset;
@@ -1639,11 +1653,8 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
 
 	for_each_child_of_node(node, port) {
 		slave = devm_kzalloc(dev, sizeof(*slave), GFP_KERNEL);
-		if (!slave) {
-			dev_err(dev, "memomry alloc failed for secondary port(%s), skipping...\n",
-				port->name);
+		if (!slave)
 			continue;
-		}
 
 		if (init_slave(gbe_dev, slave, port)) {
 			dev_err(dev, "Failed to initialize secondary port(%s), skipping...\n",
@@ -1763,10 +1774,8 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 					  XGBE10_NUM_STAT_ENTRIES *
 					  XGBE10_NUM_SLAVES * sizeof(u64),
 					  GFP_KERNEL);
-	if (!gbe_dev->hw_stats) {
-		dev_err(gbe_dev->dev, "hw_stats memory allocation failed\n");
+	if (!gbe_dev->hw_stats)
 		return -ENOMEM;
-	}
 
 	gbe_dev->ss_version = XGBE_SS_VERSION_10;
 	gbe_dev->sgmii_port_regs = gbe_dev->ss_regs +
@@ -1836,10 +1845,8 @@ static int set_gbe_ethss14_priv(struct gbe_priv *gbe_dev,
 					  GBE13_NUM_HW_STAT_ENTRIES *
 					  GBE13_NUM_SLAVES * sizeof(u64),
 					  GFP_KERNEL);
-	if (!gbe_dev->hw_stats) {
-		dev_err(gbe_dev->dev, "hw_stats memory allocation failed\n");
+	if (!gbe_dev->hw_stats)
 		return -ENOMEM;
-	}
 
 	gbe_dev->ss_version = GBE_SS_VERSION_14;
 	gbe_dev->sgmii_port_regs = regs + GBE13_SGMII_MODULE_OFFSET;
@@ -1995,10 +2002,10 @@ static int gbe_probe(struct netcp_device *netcp_device, struct device *dev,
 		dev_err(gbe_dev->dev, "error initializing ale engine\n");
 		ret = -ENODEV;
 		goto quit;
-	} else {
-		dev_dbg(gbe_dev->dev, "Created a gbe ale engine\n");
 	}
 
+	dev_dbg(gbe_dev->dev, "Created a gbe ale engine\n");
+
 	/* initialize host port */
 	gbe_init_host_port(gbe_dev);
 
diff --git a/drivers/net/ethernet/ti/netcp_xgbepcsr.c b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
index 33571ac..d93a6a4 100644
--- a/drivers/net/ethernet/ti/netcp_xgbepcsr.c
+++ b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
@@ -14,6 +14,9 @@
  * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  */
+
+#define pr_fmt(fmt) "XGBE: " fmt
+
 #include "netcp.h"
 
 /* XGBE registers */
@@ -43,7 +46,7 @@ struct serdes_cfg {
 	u32 mask;
 };
 
-static struct serdes_cfg cfg_phyb_1p25g_156p25mhz_cmu0[] = {
+static const struct serdes_cfg cfg_phyb_1p25g_156p25mhz_cmu0[] = {
 	{0x0000, 0x00800002, 0x00ff00ff},
 	{0x0014, 0x00003838, 0x0000ffff},
 	{0x0060, 0x1c44e438, 0xffffffff},
@@ -54,7 +57,7 @@ static struct serdes_cfg cfg_phyb_1p25g_156p25mhz_cmu0[] = {
 	{0x0000, 0x00000003, 0x000000ff},
 };
 
-static struct serdes_cfg cfg_phyb_10p3125g_156p25mhz_cmu1[] = {
+static const struct serdes_cfg cfg_phyb_10p3125g_156p25mhz_cmu1[] = {
 	{0x0c00, 0x00030002, 0x00ff00ff},
 	{0x0c14, 0x00005252, 0x0000ffff},
 	{0x0c28, 0x80000000, 0xff000000},
@@ -76,7 +79,7 @@ static struct serdes_cfg cfg_phyb_10p3125g_156p25mhz_cmu1[] = {
 	{0x0c00, 0x00000003, 0x000000ff},
 };
 
-static struct serdes_cfg cfg_phyb_10p3125g_16bit_lane[] = {
+static const struct serdes_cfg cfg_phyb_10p3125g_16bit_lane[] = {
 	{0x0204, 0x00000080, 0x000000ff},
 	{0x0208, 0x0000920d, 0x0000ffff},
 	{0x0204, 0xfc000000, 0xff000000},
@@ -106,7 +109,7 @@ static struct serdes_cfg cfg_phyb_10p3125g_16bit_lane[] = {
 	{0x03cc, 0x00000000, 0x000000ff},
 };
 
-static struct serdes_cfg cfg_phyb_10p3125g_comlane[] = {
+static const struct serdes_cfg cfg_phyb_10p3125g_comlane[] = {
 	{0x0a00, 0x00000800, 0x0000ff00},
 	{0x0a84, 0x00000000, 0x000000ff},
 	{0x0a8c, 0x00130000, 0x00ff0000},
@@ -124,7 +127,7 @@ static struct serdes_cfg cfg_phyb_10p3125g_comlane[] = {
 	{0x0ac0, 0x0000008b, 0x000000ff},
 };
 
-static struct serdes_cfg cfg_cm_c1_c2[] = {
+static const struct serdes_cfg cfg_cm_c1_c2[] = {
 	{0x0208, 0x00000000, 0x00000f00},
 	{0x0208, 0x00000000, 0x0000001f},
 	{0x0204, 0x00000000, 0x00040000},
@@ -185,8 +188,8 @@ static void netcp_xgbe_serdes_com_enable(void __iomem *serdes_regs)
 	}
 }
 
-static void netcp_xgbe_serdes_lane_enable(
-			void __iomem *serdes_regs, int lane)
+static void netcp_xgbe_serdes_lane_enable(void __iomem *serdes_regs,
+					  int lane)
 {
 	/* Set Lane Control Rate */
 	writel(0xe0e9e038, serdes_regs + 0x1fe0 + (4 * lane));
@@ -230,7 +233,7 @@ static int netcp_xgbe_wait_pll_locked(void __iomem *sw_regs)
 		cpu_relax();
 	} while (true);
 
-	pr_err("XGBE serdes not locked: time out.\n");
+	pr_err("serdes not locked: time out\n");
 	return ret;
 }
 
@@ -292,8 +295,7 @@ static void netcp_xgbe_serdes_reset_cdr(void __iomem *serdes_regs,
 	u32 tmp, dlpf, tbus;
 
 	/*Get the DLPF values */
-	tmp = netcp_xgbe_serdes_read_select_tbus(
-			serdes_regs, lane + 1, 5);
+	tmp = netcp_xgbe_serdes_read_select_tbus(serdes_regs, lane + 1, 5);
 
 	dlpf = tmp >> 2;
 
@@ -302,10 +304,10 @@ static void netcp_xgbe_serdes_reset_cdr(void __iomem *serdes_regs,
 		mdelay(1);
 		reg_rmw(sig_detect_reg, VAL_SH(0, 1), MASK_WID_SH(2, 1));
 	} else {
-		tbus = netcp_xgbe_serdes_read_select_tbus(serdes_regs, lane +
-							  1, 0xe);
+		tbus = netcp_xgbe_serdes_read_select_tbus(serdes_regs,
+							  lane + 1, 0xe);
 
-		pr_debug("XGBE: CDR centered, DLPF: %4d,%d,%d.\n",
+		pr_debug("CDR centered, DLPF: %4d,%d,%d\n",
 			 tmp >> 2, tmp & 3, (tbus >> 2) & 3);
 	}
 }
@@ -340,13 +342,13 @@ static int netcp_xgbe_check_link_status(void __iomem *serdes_regs,
 		case 0:
 			/* if good link lock the signal detect ON! */
 			if (!loss && blk_lock) {
-				pr_debug("XGBE PCSR Linked Lane: %d\n", i);
+				pr_debug("PCSR Linked Lane: %d\n", i);
 				reg_rmw(sig_detect_reg, VAL_SH(3, 1),
 					MASK_WID_SH(2, 1));
 				current_state[i] = 1;
 			} else if (!blk_lock) {
 				/* if no lock, then reset CDR */
-				pr_debug("XGBE PCSR Recover Lane: %d\n", i);
+				pr_debug("PCSR Recover Lane: %d\n", i);
 				netcp_xgbe_serdes_reset_cdr(serdes_regs,
 							    sig_detect_reg, i);
 			}
@@ -361,10 +363,10 @@ static int netcp_xgbe_check_link_status(void __iomem *serdes_regs,
 			break;
 
 		case 2:
-			if (blk_lock)
+			if (blk_lock) {
 				/* Nope just noise */
 				current_state[i] = 1;
-			else {
+			} else {
 				/* Lost the block lock, reset CDR if it is
 				 * not centered and go back to sync state
 				 */
@@ -375,7 +377,7 @@ static int netcp_xgbe_check_link_status(void __iomem *serdes_regs,
 			break;
 
 		default:
-			pr_err("XGBE: unknown current_state[%d] %d\n",
+			pr_err("unknown current_state[%d] %d\n",
 			       i, current_state[i]);
 			break;
 		}
@@ -417,19 +419,19 @@ static int netcp_xgbe_serdes_check_lane(void __iomem *serdes_regs,
 			break;
 
 		if (lane_down[0])
-			pr_debug("XGBE: detected link down on lane 0\n");
+			pr_debug("detected link down on lane 0\n");
 
 		if (lane_down[1])
-			pr_debug("XGBE: detected link down on lane 1\n");
+			pr_debug("detected link down on lane 1\n");
 
 		if (++retries > 1) {
-			pr_debug("XGBE: timeout waiting for serdes link up\n");
+			pr_debug("timeout waiting for serdes link up\n");
 			return -ETIMEDOUT;
 		}
 		mdelay(100);
 	} while (!link_up);
 
-	pr_debug("XGBE: PCSR link is up\n");
+	pr_debug("PCSR link is up\n");
 	return 0;
 }
 
@@ -494,7 +496,7 @@ int netcp_xgbe_serdes_init(void __iomem *serdes_regs, void __iomem *xgbe_regs)
 	/* read COMLANE bits 4:0 */
 	val = readl(serdes_regs + 0xa00);
 	if (val & 0x1f) {
-		pr_debug("XGBE: serdes already in operation - reset\n");
+		pr_debug("serdes already in operation - reset\n");
 		netcp_xgbe_reset_serdes(serdes_regs);
 	}
 	return netcp_xgbe_serdes_config(serdes_regs, xgbe_regs);

^ permalink raw reply related

* Re: [PATCH iproute2] ip: Simplify executing ip cmd within namespace
From: Marcelo Ricardo Leitner @ 2014-12-11 18:36 UTC (permalink / raw)
  To: Jiri Benc; +Cc: vadim4j, netdev
In-Reply-To: <20141211190846.1e25e7fa@griffin>

On 11-12-2014 16:08, Jiri Benc wrote:
> On Thu, 11 Dec 2014 15:33:34 -0200, Marcelo Ricardo Leitner wrote:
>> In that case, it would be interesting to also accelerate the original use
>> case, no? So all usages we currently have will benefit from this speed up
>> without a change.
>>
>> if (command to be executed == myself)
>>     switch namespace, continue without fork/exec..
>
> It's never good idea to do such tricks behind the user's back. This
> particular case could easily break for users wanting to execute a
> different ip binary (for whatever reason).

Then the if() above wouldn't match. That if means to check 
/proc/self/exe against the result of the path expansion. If that fails, 
continue with the normal path. If it matches, it is the same binary, and 
no need to re-exec itself.

> All programs should do what they are told to do, not try to outsmart
> the user.

It's not outsmarting, it's just not being dumb and doing it the proper 
way. Bash itself does this twist a lot. If you just type 'echo hi', it 
won't execute /bin/echo but use a built-in version. But if you write 
"/bin/echo hi", it will use /bin/echo..

We could use the same idea. "ip netns exec ip" -> ellipse it and avoid 
the fork/exec. But if it's cmd != "ip", execute it..

Now consider other applications that are user of this command. They will 
have to implement something like:

if (this ip command has --netns argument) {
    cmd="ip --netns ..."
} else {
    cmd="ip netns exec ..."
}

which is ugly and inconvenient.

   Marcelo

^ permalink raw reply

* RE: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Arad, Ronen @ 2014-12-11 18:47 UTC (permalink / raw)
  To: Hubert Sokolowski, Roopa Prabhu, netdev@vger.kernel.org; +Cc: Jamal Hadi Salim
In-Reply-To: <6e03100b17d3e17625f1c323d94a853e.squirrel@poczta.wsisiz.edu.pl>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Hubert Sokolowski
> Sent: Thursday, December 11, 2014 8:40 AM
> To: Roopa Prabhu
> Cc: netdev@vger.kernel.org; Jamal Hadi Salim
> Subject: Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if
> ndo_fdb_dump is defined.
> 
> > I think commit
> > "5e6d243587990a588143b9da3974833649595587 "bridge: netlink dump
> > interface at par with brctl" tried to make sure even the dflt entries
> > (ie dev->uc and dev->mc) were also printed in the below case. ie the
> > 'self' entries in the below output.
> >
> > # ./bridge fdb show brport eth1
> > 02:00:00:12:01:02 vlan 0 master br0 permanent
> > 00:17:42:8a:b4:05 vlan 0 master br0 permanent
> > 00:17:42:8a:b4:07 self permanent
> > 33:33:00:00:00:01 self permanent
> >
> > Am guessing reverting the patch is going to make the 'self' entries in
> > the above output to go away.
> > Can you please confirm ?.
> 
> I don't want you to revert the patch, as the main goal of the patch was to
> enable filtering in the kernel. I am only proposing to revert part of it that
> allows driver to implement own dump.
> This does not break the filtering in the kernel.
> Whether the 'self' entries will go away it depends if the driver overrides
> ndo_fdb_dump callback with its own. For cases where the driver does not
> implement the callback, the dflt callback is still called showing 'self' entries:
> [root@silpixa00378825 ~]# bridge fdb show
> 33:33:00:00:00:01 dev em1 self permanent
> 01:00:5e:00:00:01 dev em1 self permanent
> 33:33:00:00:00:01 dev p4p1 self permanent
> 01:00:5e:00:00:01 dev p4p1 self permanent 33:33:ff:81:56:db dev p4p1 self
> permanent 01:00:5e:00:00:fb dev p4p1 self permanent
> 33:33:00:00:00:01 dev dummy0 self permanent
> 
> >
> > Also, if i hear your concern correctly, for bridge ports that
> > implement ndo_fdb_dump, with commit
> > 5e6d243587990a588143b9da3974833649595587, we will get two entries
> for each 'self' entry above.
> > Can you also paste sample output for that ?.
> 
> My patch affects *only* drivers that implements own dump callback.
> Implementing own dump callback means the driver want to have a control
> over what is being dumped. For example you may want to dump a hardware
> MAC table only (my case) where 'self' entries created by kernel make no
> sense.
> Also there are drivers that calls dflt callback from inside own dump function.
> Please see following dump callback implemented for QLogic:
> static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
>                         struct net_device *netdev,
>                         struct net_device *filter_dev, int idx) {
>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
> 
>         if (!adapter->fdb_mac_learn)
>                 return ndo_dflt_fdb_dump(skb, ncb, netdev, filter_dev, idx); [...]
> 
> Another example of dflt being called twice is macvlan.c where
> ndo_fdb_dump is actually initialized with the dflt callback:
> macvlan.c:1022:        .ndo_fdb_dump           = ndo_dflt_fdb_dump,
> 
> Thanks,
> Hubert
>

I agree with Hubert on that. When a device defines its own ndo_fdb_dump it implies it wants control over the information that will be returned to user-space.
The proposed patch changes the recent status quo but it only restores it to the way it was before 5e6d243587990a588143b9da3974833649595587.
A compromise could be to also include in the same patch-set patches to the drivers that have benefited from the implicit call to ndo_dflt_fdb_dump.

-Ronen
 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH 1/3] lib: adding an Array-based Lock-Free (ALF) queue
From: David Miller @ 2014-12-11 19:15 UTC (permalink / raw)
  To: brouer-H+wXaHxf7aLQT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cl-vYTEC60ixJUAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
	alexander.duyck-Re5JQEeQqe8AvxtiuMwx3w,
	ast-uqk4Ao+rVK5Wk0Htik3J/w,
	paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	rostedt-nx8X9YLhiw1AfugRpC6u6w
In-Reply-To: <20141210141512.31779.96487.stgit@dragon>

From: Jesper Dangaard Brouer <brouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Wed, 10 Dec 2014 15:15:26 +0100

> +static inline int
> +alf_mp_enqueue(const u32 n;
> +	       struct alf_queue *q, void *ptr[n], const u32 n)
> +{
 ...
> +/* Main Multi-Consumer DEQUEUE */
> +static inline int
> +alf_mc_dequeue(const u32 n;
> +	       struct alf_queue *q, void *ptr[n], const u32 n)
> +{

I would seriously consider not inlining these.

^ permalink raw reply

* bind() should not return -EADDRINUSE
From: Phillip Susi @ 2014-12-11 19:19 UTC (permalink / raw)
  To: netdev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Attempting to establish a connection to a remote host using the same
local port, but a different remote port as the previous connection (
that is still in TIME_WAIT ) results in bind() returning -EADDRINUSE.
 By changing the remote port, you avoid the conflict with the other
connection that is in TIME_WAIT, but since the remote port is not
known when bind() is called, it incorrectly returns -EADDRINUSE.  This
check should not be done in bind(), but deferred until the remote port
is known in the call to connect(), or listen().

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUie4vAAoJENRVrw2cjl5R8CgH/jgdrBXs1Yso7XsHtrsUgO75
P8bY3Mf8eOuAkLvwbHopyGr19UZJJ+5zCQWc6bus9BRa7jSQvPCRw8fSjg3u+NXs
r9/qEqiDrYwAJqV5SwqDPGU8xGwq8v31bbT09spGIL9fABW3krBFtDHcZ/PzGA/p
UGDgELdkU7QIMKvC7FAV5RzvMXpmAI+m7iQiF936QkGZPN0AOCTea0sB+IpeidTO
/ChLd2iaEqAmS6UGh3jIQSNcTB3Ho3imHHMjTCel2v8+SgCBb2Tp3HzfmmIsGAnh
YeH+ZCNHD08c+k9yprEVjCH6fwMOo7CWzyPnSme+Ggk0L5r9oj8iz4xIeIMv43o=
=Ou3S
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: randconfig build error with next-20141210, in drivers/net/ethernet/broadcom/genet
From: David Miller @ 2014-12-11 19:22 UTC (permalink / raw)
  To: jim.epost; +Cc: sfr, linux-next, linux-kernel, f.fainelli, netdev
In-Reply-To: <CA+r1ZhjxtK8BtF16Mr9jXjqvFVV7KAAFjWqfVFUibNreZwZxTw@mail.gmail.com>

From: Jim Davis <jim.epost@gmail.com>
Date: Wed, 10 Dec 2014 09:10:45 -0700

> Building with the attached random configuration file,
> 
> ERROR: "fixed_phy_register"
> [drivers/net/ethernet/broadcom/genet/genet.ko] undefined!

Florian, I don't understand why FIXED_PHY is only selected in Kconfig
if the driver is statically built into the kernel.

That makes no sense at all, you should need that module regardless of
how the driver itself is enabled.

Can't we just remove the "XXX=y" in all of those silly:

	select FIXED_PHY if XXX=y

expressions?

There are three such cases right now:

drivers/net/dsa/Kconfig:	select FIXED_PHY if NET_DSA_BCM_SF2=y
drivers/net/ethernet/broadcom/Kconfig:	select FIXED_PHY if BCMGENET=y
drivers/net/ethernet/broadcom/Kconfig:	select FIXED_PHY if SYSTEMPORT=y

^ permalink raw reply

* Re: bind() should not return -EADDRINUSE
From: David Miller @ 2014-12-11 19:23 UTC (permalink / raw)
  To: psusi; +Cc: netdev
In-Reply-To: <5489EE2F.6030502@ubuntu.com>

From: Phillip Susi <psusi@ubuntu.com>
Date: Thu, 11 Dec 2014 14:19:11 -0500

> Attempting to establish a connection to a remote host using the same
> local port, but a different remote port as the previous connection (
> that is still in TIME_WAIT ) results in bind() returning -EADDRINUSE.
>  By changing the remote port, you avoid the conflict with the other
> connection that is in TIME_WAIT, but since the remote port is not
> known when bind() is called, it incorrectly returns -EADDRINUSE.  This
> check should not be done in bind(), but deferred until the remote port
> is known in the call to connect(), or listen().

Bind has to allocate and hold from everyone else on the system the
local port at bind() time, so we cannot defer this decision.

^ permalink raw reply

* Re: [PATCH] cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
From: David Miller @ 2014-12-11 19:25 UTC (permalink / raw)
  To: hariprasad
  Cc: netdev, linux-scsi, JBottomley, hch, leedom, anish, nirranjan,
	praveenm
In-Reply-To: <1418276503-23001-1-git-send-email-hariprasad@chelsio.com>

From: Hariprasad Shenai <hariprasad@chelsio.com>
Date: Thu, 11 Dec 2014 11:11:43 +0530

> Remove use of calls into t4_fw_hello() with MASTER_MUST, which results in
> FW_HELLO_CMD_MASTERFORCE being set. The firmware doesn't support this and of
> course any existing PF Drivers will totally go for a toss.
> 
> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net v2] gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
From: David Miller @ 2014-12-11 19:27 UTC (permalink / raw)
  To: haokexin; +Cc: netdev, claudiu.manoil
In-Reply-To: <1418278121-20209-1-git-send-email-haokexin@gmail.com>

From: Kevin Hao <haokexin@gmail.com>
Date: Thu, 11 Dec 2014 14:08:41 +0800

> We need to use dma_mapping_error() to check the dma address returned
> by dma_map_single/page(). Otherwise we would get warning like this:
 ...
> For TX, we need to unmap the pages which has already been mapped and
> free the skb before return.
> 
> For RX, move the dma mapping and error check to gfar_new_skb(). We
> would reuse the original skb in the rx ring when either allocating
> skb failure or dma mapping error.
> 
> Signed-off-by: Kevin Hao <haokexin@gmail.com>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
> ---
> v2: Just update the RX path to reuse the original skb when dma mapping error
>     occurs as suggested by David.

Looks good, applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 1/1] net/macb: add TX multiqueue support for gem
From: Thomas Petazzoni @ 2014-12-11 19:31 UTC (permalink / raw)
  To: Cyrille Pitchen
  Cc: nicolas.ferre, davem, linux-arm-kernel, netdev, soren.brinkmann,
	linux-kernel
In-Reply-To: <87a3098203ee6eaa7a60607713a293d3258e2b58.1418291637.git.cyrille.pitchen@atmel.com>

Dear Cyrille Pitchen,

On Thu, 11 Dec 2014 11:16:51 +0100, Cyrille Pitchen wrote:

> +#define GEM_ISR1				0x0400
> +#define GEM_ISR2				0x0404
> +#define GEM_ISR3				0x0408
> +#define GEM_ISR4				0x040c
> +#define GEM_ISR5				0x0410
> +#define GEM_ISR6				0x0414
> +#define GEM_ISR7				0x0418

What about doing instead:

#define GEM_ISR(q)				((q) == 0 ? MACB_ISR : 0x400 + (q) << 2)

And ditto for all other registers, which will save a lot of boring repeated code.

If you do that, then you can avoid the following fields in the
macb_queue structure:

+	unsigned int		ISR;
+	unsigned int		IER;
+	unsigned int		IDR;
+	unsigned int		IMR;
+	unsigned int		TBQP;

And the not very pleasant calculation of those offsets:

+	bp->queues[0].bp = bp;
+	bp->queues[0].ISR  = MACB_ISR;
+	bp->queues[0].IER  = MACB_IER;
+	bp->queues[0].IDR  = MACB_IDR;
+	bp->queues[0].IMR  = MACB_IMR;
+	bp->queues[0].TBQP = MACB_TBQP;
+	for (q = 1, queue = &bp->queues[1]; q < MACB_MAX_QUEUES; ++q) {
+		if (!(queue_mask & (1 << q)))
+			continue;
+
+		queue->bp = bp;
+		queue->ISR  = (q-1) * sizeof(u32) + GEM_ISR1;
+		queue->IER  = (q-1) * sizeof(u32) + GEM_IER1;
+		queue->IDR  = (q-1) * sizeof(u32) + GEM_IDR1;
+		queue->IMR  = (q-1) * sizeof(u32) + GEM_IMR1;
+		queue->TBQP = (q-1) * sizeof(u32) + GEM_TBQP1;
+		queue++;
+	}

You replace the ISR, IER, IDR, IMR and TBQP by an "id" field in
macb_queue, which contains the queue number, and then change your:

+#define queue_readl(queue, reg)				\
+	__raw_readl((queue)->bp->regs + queue->reg)
+#define queue_writel(queue, reg, value)			\
+	__raw_writel((value), (queue)->bp->regs + queue->reg)

to

+#define queue_readl(queue, reg)				\
+	__raw_readl((queue)->bp->regs + reg((queue)->id))
+#define queue_writel(queue, reg, value)			\
+	__raw_writel((value), (queue)->bp->regs + reg((queue)->id)

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: [net] gre: fix the inner mac header in nbma gre tunnels xmit path
From: David Miller @ 2014-12-11 19:36 UTC (permalink / raw)
  To: timo.teras; +Cc: netdev, therbert, alexander.h.duyck
In-Reply-To: <1418282079-30245-1-git-send-email-timo.teras@iki.fi>

From: Timo Teräs <timo.teras@iki.fi>
Date: Thu, 11 Dec 2014 09:14:39 +0200

> @@ -266,6 +262,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
>  		 * to gre header.
>  		 */
>  		skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> +		skb_reset_mac_header(mac);

Please explain to me how this compiles, let alone be functionally
tested.

^ permalink raw reply

* Re: [PATCH v2 net] be2net: Export tunnel offloads only when a VxLAN tunnel is created
From: David Miller @ 2014-12-11 19:44 UTC (permalink / raw)
  To: sathya.perla; +Cc: netdev
In-Reply-To: <1418286287-19738-1-git-send-email-sathya.perla@emulex.com>

From: Sathya Perla <sathya.perla@emulex.com>
Date: Thu, 11 Dec 2014 03:24:47 -0500

> From: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
> 
> The encapsulated offload flags shouldn't be unconditionally exported
> to the stack. The stack expects offloading to work across all tunnel
> types when those flags are set. This would break other tunnels (like
> GRE) since be2net currently supports tunnel offload for VxLAN only.
> 
> Also, with VxLANs Skyhawk-R can offload only 1 UDP dport. If more
> than 1 UDP port is added, we should disable offloads in that case too.
> 
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
> ---
> 
> v2 changes: fix a bad indentation pointed out by Dave M.

Applied, thanks.

^ permalink raw reply

* Re: [net] gre: fix the inner mac header in nbma gre tunnels xmit path
From: Timo Teras @ 2014-12-11 19:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, therbert, alexander.h.duyck
In-Reply-To: <20141211.143627.1516156524809595238.davem@davemloft.net>

On Thu, 11 Dec 2014 14:36:27 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Timo Teräs <timo.teras@iki.fi>
> Date: Thu, 11 Dec 2014 09:14:39 +0200
> 
> > @@ -266,6 +262,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff
> > *skb,
> >  		 * to gre header.
> >  		 */
> >  		skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
> > +		skb_reset_mac_header(mac);
> 
> Please explain to me how this compiles, let alone be functionally
> tested.

Sorry. I made the change twice; once on the build box. And again on my
git checkout on work station. Must've been on seriously coffee deprived
state.

Should be obviously:
+		skb_reset_mac_header(skb);

Is there other comments on it?

Or shall I resend?

^ permalink raw reply

* Re: [PATCH V2 net-next 00/10] mlx4 driver update
From: David Miller @ 2014-12-11 19:48 UTC (permalink / raw)
  To: ogerlitz; +Cc: netdev, matanb, amirv, talal, jackm
In-Reply-To: <1418288280-334-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 11 Dec 2014 10:57:50 +0200

> This series from Matan, Jenny, Dotan and myself is mostly about adding
> support to a new performance optimized flow steering mode (patches 4-10).
> 
> The 1st two patches are small fixes (one for VXLAN and one for SRIOV),
> and the third patch is a fix to avoid hard-lockup situation when many
> (hunderds) processes holding user-space QPs/CQs get events.
> 
> Matan and Or. 

Series applied, thanks.

^ permalink raw reply

* Bug: mv643xxx fails with highmem
From: Russell King - ARM Linux @ 2014-12-11 19:49 UTC (permalink / raw)
  To: Ezequiel Garcia, David S. Miller; +Cc: netdev

Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
all fragments with dma_map_single().  This fails when the driver is
used in an environment with highmem.

With this patch in place in 3.18:

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index d44560d1d268..14d1fc9ff485 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -882,7 +882,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
 		void *addr;
  
 		this_frag = &skb_shinfo(skb)->frags[frag];
+		BUG_ON(PageHighMem(this_frag->page.p));
 		addr = page_address(this_frag->page.p) + this_frag->page_offset;
+		BUG_ON(!addr);
 		tx_index = txq->tx_curr_desc++;
 		if (txq->tx_curr_desc == txq->tx_ring_size)
 			txq->tx_curr_desc = 0;

I regularly hit the first BUG_ON().  Without the BUG_ON(), the machine
either freezes or oopses in the DMA API functions due to them being
passed an invalid struct page *.

So, I'm unclear whether this is a driver bug, or whether it's a core
netdev bug: should netdev be passing skbuffs with highmem pages
attached when the driver has NETIF_F_HIGHDMA clear, or is Ezequiel's
patch removing the ability to map highmem pages from this driver
incorrect?

Here's the ethtool -k eth0 output for this driver:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related

* Re: bind() should not return -EADDRINUSE
From: Phillip Susi @ 2014-12-11 19:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141211.142355.854082841833367081.davem@davemloft.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/11/2014 2:23 PM, David Miller wrote:
> From: Phillip Susi <psusi@ubuntu.com> Date: Thu, 11 Dec 2014
> 14:19:11 -0500
> 
>> Attempting to establish a connection to a remote host using the
>> same local port, but a different remote port as the previous
>> connection ( that is still in TIME_WAIT ) results in bind()
>> returning -EADDRINUSE. By changing the remote port, you avoid the
>> conflict with the other connection that is in TIME_WAIT, but
>> since the remote port is not known when bind() is called, it
>> incorrectly returns -EADDRINUSE.  This check should not be done
>> in bind(), but deferred until the remote port is known in the
>> call to connect(), or listen().
> 
> Bind has to allocate and hold from everyone else on the system the 
> local port at bind() time, so we cannot defer this decision.

What on earth for?  If two processes are going to connect to different
remotes using the same source, that is perfectly fine.  The only
contention is if two processes want to listen() with the same local
addr and wildcard remote.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUifWYAAoJENRVrw2cjl5RpckH/2g9V0p3SFNvf1qmR1nxZjJ6
R19eosdoWvCL6B2pIqwvpy64ACqTf6MS3NrdZt6TwSz+8OKqBj/D8+IpKcAdo5kU
Yzhl3qASkm17HTeOmGhw00pqJkD4FYzzwb8hw/OxwxWUwpVLvv+nMYHhF7vwgOy6
h90Kmj6ycOeY/+sX5Woe5RheKq+AlDqJGAELk1Vs9lrRDDa/3o/HqEQX5HNsohyC
ib2dJRAdCuNN9KjSKuVLnCu4Tf1UB5W+efSK21qN/s1r0a0RWXMe2ph6JrYQsg2e
H2nl7Ifiv/4VbuvRkabSsrge8bJtw76c2uMpxw1GHNUN1rNnxhGAj4J1oOIp4rQ=
=LXdQ
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH 1/1] net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
From: David Miller @ 2014-12-11 19:52 UTC (permalink / raw)
  To: cyrille.pitchen
  Cc: nicolas.ferre, linux-arm-kernel, netdev, soren.brinkmann,
	linux-kernel
In-Reply-To: <efa28485b430e77f5254248cb396da431d03fc5b.1418292741.git.cyrille.pitchen@atmel.com>

From: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Date: Thu, 11 Dec 2014 11:15:54 +0100

> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net v2] Fix race condition between vxlan_sock_add and vxlan_sock_release
From: David Miller @ 2014-12-11 19:57 UTC (permalink / raw)
  To: mleitner; +Cc: netdev
In-Reply-To: <125fcb71bfb239711f03d3344b4275b304db6bb8.1418298054.git.mleitner@redhat.com>

From: Marcelo Ricardo Leitner <mleitner@redhat.com>
Date: Thu, 11 Dec 2014 10:02:22 -0200

> Currently, when trying to reuse a socket, vxlan_sock_add will grab
> vn->sock_lock, locate a reusable socket, inc refcount and release
> vn->sock_lock.
> 
> But vxlan_sock_release() will first decrement refcount, and then grab
> that lock. refcnt operations are atomic but as currently we have
> deferred works which hold vs->refcnt each, this might happen, leading to
> a use after free (specially after vxlan_igmp_leave):
> 
>   CPU 1                            CPU 2
> 
> deferred work                    vxlan_sock_add
>   ...                              ...
>                                    spin_lock(&vn->sock_lock)
>                                    vs = vxlan_find_sock();
>   vxlan_sock_release
>     dec vs->refcnt, reaches 0
>     spin_lock(&vn->sock_lock)
>                                    vxlan_sock_hold(vs), refcnt=1
>                                    spin_unlock(&vn->sock_lock)
>     hlist_del_rcu(&vs->hlist);
>     vxlan_notify_del_rx_port(vs)
>     spin_unlock(&vn->sock_lock)
> 
> 
> So when we look for a reusable socket, we check if it wasn't freed
> already before reusing it.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Fixes: 7c47cedf43a8b3 ("vxlan: move IGMP join/leave to work queue")

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net v9 0/7] cxgb4/cxgbi: misc. fixes for cxgb4i
From: David Miller @ 2014-12-11 20:04 UTC (permalink / raw)
  To: kxie; +Cc: linux-scsi, netdev, hariprasad, anish, hch, James.Bottomley,
	michaelc
In-Reply-To: <201412111525.sBBFPbfn012920@localhost6.localdomain6>

From: Karen Xie <kxie@chelsio.com>
Date: Thu, 11 Dec 2014 07:25:37 -0800

> This patch set fixes cxgb4i's tx credit calculation and adds handling of
> additional rx message and negative advice types. It also removes the duplicate
> code in cxgb4i to set the outgoing queues of a packet. 

This series does not apply cleanly at all to net-next, it gets many
rejects.

^ permalink raw reply

* [PATCH net-next] Avoid reducing cwnd when ACK+DSACK is received
From: Sébastien Barré @ 2014-12-11 19:58 UTC (permalink / raw)
  To: David Miller
  Cc: Sébastien Barré, netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng

When the peer has delayed ack enabled, it may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.

This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>

---
 net/ipv4/tcp_input.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 075ab4d..fb007cc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3369,23 +3369,23 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
 
 	/* Mark the end of TLP episode on receiving TLP dupack or when
 	 * ack is after tlp_high_seq.
+	 * With delayed acks, we may also get a regular ACK+DSACK, in which
+	 * case we don't want to reduce the cwnd either.
 	 */
-	if (is_tlp_dupack) {
+	if (is_tlp_dupack ||
+	    !before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK)) {
 		tp->tlp_high_seq = 0;
 		return;
 	}
 
 	if (after(ack, tp->tlp_high_seq)) {
 		tp->tlp_high_seq = 0;
-		/* Don't reduce cwnd if DSACK arrives for TLP retrans. */
-		if (!(flag & FLAG_DSACKING_ACK)) {
-			tcp_init_cwnd_reduction(sk);
-			tcp_set_ca_state(sk, TCP_CA_CWR);
-			tcp_end_cwnd_reduction(sk);
-			tcp_try_keep_open(sk);
-			NET_INC_STATS_BH(sock_net(sk),
-					 LINUX_MIB_TCPLOSSPROBERECOVERY);
-		}
+		tcp_init_cwnd_reduction(sk);
+		tcp_set_ca_state(sk, TCP_CA_CWR);
+		tcp_end_cwnd_reduction(sk);
+		tcp_try_keep_open(sk);
+		NET_INC_STATS_BH(sock_net(sk),
+				 LINUX_MIB_TCPLOSSPROBERECOVERY);
 	}
 }
 
-- 
tg: (52c9b12..) net-next/tlp-dsack-handling (depends on: net-next/master)

^ permalink raw reply related

* Re: [net] gre: fix the inner mac header in nbma gre tunnels xmit path
From: David Miller @ 2014-12-11 20:07 UTC (permalink / raw)
  To: timo.teras; +Cc: netdev, therbert, alexander.h.duyck
In-Reply-To: <20141211214456.7cc621e4@vostro>

From: Timo Teras <timo.teras@iki.fi>
Date: Thu, 11 Dec 2014 21:44:56 +0200

> On Thu, 11 Dec 2014 14:36:27 -0500 (EST)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Timo Teräs <timo.teras@iki.fi>
>> Date: Thu, 11 Dec 2014 09:14:39 +0200
>> 
>> > @@ -266,6 +262,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff
>> > *skb,
>> >  		 * to gre header.
>> >  		 */
>> >  		skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));
>> > +		skb_reset_mac_header(mac);
>> 
>> Please explain to me how this compiles, let alone be functionally
>> tested.
> 
> Sorry. I made the change twice; once on the build box. And again on my
> git checkout on work station. Must've been on seriously coffee deprived
> state.
> 
> Should be obviously:
> +		skb_reset_mac_header(skb);
> 
> Is there other comments on it?

I don't care where you applied the patch, if you didn't type make after
making the change, that is extremely careless.

^ permalink raw reply

* Re: Bug: mv643xxx fails with highmem
From: David Miller @ 2014-12-11 20:10 UTC (permalink / raw)
  To: linux; +Cc: ezequiel.garcia, netdev
In-Reply-To: <20141211194920.GR11285@n2100.arm.linux.org.uk>

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Thu, 11 Dec 2014 19:49:20 +0000

> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
> all fragments with dma_map_single().  This fails when the driver is
> used in an environment with highmem.

This change looks really buggy to me.

Unfortunately, all the changes he subsequently makes for software TSO
support depend upon this :-/

The change is definitely wrong.

^ permalink raw reply

* Re: bind() should not return -EADDRINUSE
From: David Miller @ 2014-12-11 20:13 UTC (permalink / raw)
  To: psusi; +Cc: netdev
In-Reply-To: <5489F598.9020708@ubuntu.com>

From: Phillip Susi <psusi@ubuntu.com>
Date: Thu, 11 Dec 2014 14:50:48 -0500

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 12/11/2014 2:23 PM, David Miller wrote:
>> From: Phillip Susi <psusi@ubuntu.com> Date: Thu, 11 Dec 2014
>> 14:19:11 -0500
>> 
>>> Attempting to establish a connection to a remote host using the
>>> same local port, but a different remote port as the previous
>>> connection ( that is still in TIME_WAIT ) results in bind()
>>> returning -EADDRINUSE. By changing the remote port, you avoid the
>>> conflict with the other connection that is in TIME_WAIT, but
>>> since the remote port is not known when bind() is called, it
>>> incorrectly returns -EADDRINUSE.  This check should not be done
>>> in bind(), but deferred until the remote port is known in the
>>> call to connect(), or listen().
>> 
>> Bind has to allocate and hold from everyone else on the system the 
>> local port at bind() time, so we cannot defer this decision.
> 
> What on earth for?  If two processes are going to connect to different
> remotes using the same source, that is perfectly fine.  The only
> contention is if two processes want to listen() with the same local
> addr and wildcard remote.

But you don't know ahead of time what the processes are going to
do, that's the problem.

You cannot leave the port available and pretend to another process
that he will be able to use it.

Port allocation failures must be signalled at bind() time.

^ permalink raw reply

* Re: Bug: mv643xxx fails with highmem
From: Ezequiel Garcia @ 2014-12-11 20:12 UTC (permalink / raw)
  To: David Miller, linux; +Cc: netdev
In-Reply-To: <20141211.151055.817876561546126576.davem@davemloft.net>

On 12/11/2014 05:10 PM, David Miller wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 19:49:20 +0000
> 
>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>> all fragments with dma_map_single().  This fails when the driver is
>> used in an environment with highmem.
> 
> This change looks really buggy to me.
> 
> Unfortunately, all the changes he subsequently makes for software TSO
> support depend upon this :-/
> 
> The change is definitely wrong.
> 

Got it. I'll take a closer look and will try to think a fix for this.

-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox