Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: less interrupt masking in NAPI
From: Yang Yingliang @ 2014-12-03  7:31 UTC (permalink / raw)
  To: David Miller, eric.dumazet; +Cc: netdev, willemb
In-Reply-To: <20141103.122538.387451917276174830.davem@davemloft.net>

On 2014/11/4 1:25, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 02 Nov 2014 06:19:33 -0800
> 
>> From: Eric Dumazet <edumazet@google.com>
>>
>> net_rx_action() can mask irqs a single time to transfert sd->poll_list
>> into a private list, for a very short duration.
>>
>> Then, napi_complete() can avoid masking irqs again,
>> and net_rx_action() only needs to mask irq again in slow path.
>>
>> This patch removes 2 couples of irq mask/unmask per typical NAPI run,
>> more if multiple napi were triggered.
>>
>> Note this also allows to give control back to caller (do_softirq())
>> more often, so that other softirq handlers can be called a bit earlier,
>> or ksoftirqd can be wakeup earlier under pressure.
>>
>> This was developed while testing an alternative to RX interrupt
>> mitigation to reduce latencies while keeping or improving GRO
>> aggregation on fast NIC.
>>
>> Idea is to test napi->gro_list at the end of a napi->poll() and
>> reschedule one NAPI poll, but after servicing a full round of
>> softirqs (timers, TX, rcu, ...). This will be allowed only if softirq
>> is currently serviced by idle task or ksoftirqd, and resched not needed.
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Also applied, thanks Eric.

This patch can resolve my performance problem.
Will/Can this patch queue for stable ?

Regards,
Yang

^ permalink raw reply

* Re: [PATCH net-next v3] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Eric Dumazet @ 2014-12-03  7:39 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: netdev, David Miller, Eric Dumazet, Roopa Prabhu, Toshiaki Makita
In-Reply-To: <1417591497-24246-1-git-send-email-maheshb@google.com>

On Tue, 2014-12-02 at 23:24 -0800, Mahesh Bandewar wrote:

> +EXPORT_SYMBOL(rtmsg_ifinfo_build_skb);
> +
> +void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
> +{
> +	struct net *net = dev_net(dev);
> +
> +	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
> +}
> +EXPORT_SYMBOL(rtmsg_ifinfo_send);
> +
> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> +		  gfp_t flags)
> +{
> +	struct sk_buff *skb;
> +
> +	skb = rtmsg_ifinfo_build_skb(type, dev, change, flags);
> +	if (skb)
> +		rtmsg_ifinfo_send(skb, dev, flags);
>  }
>  EXPORT_SYMBOL(rtmsg_ifinfo);

One last point :

Only rtmsg_ifinfo() needs the EXPORT_SYMBOL(...)

rtmsg_ifinfo_build_skb() and rtmsg_ifinfo_send() are used from core networking stack,
not a module.

^ permalink raw reply

* Re: [PATCH] net: less interrupt masking in NAPI
From: Eric Dumazet @ 2014-12-03  7:41 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: David Miller, netdev, willemb
In-Reply-To: <547EBC66.4040301@huawei.com>

On Wed, 2014-12-03 at 15:31 +0800, Yang Yingliang wrote:

> This patch can resolve my performance problem.
> Will/Can this patch queue for stable ?

Hmm.. please give us more details, I am surprised it can have
a big impact.

^ permalink raw reply

* Re: [PATCH v2 1/6] GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC
From: Roger @ 2014-12-03  7:57 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai
In-Reply-To: <1591535.72TyRu5yfQ@diego>

Hi! Heiko

about regulator, power gpio,  reset gpio and irq gpio
please refer to my comment inline, tks.

On 2014/12/2 7:44, Heiko Stübner wrote:
> Hi Roger,
>
> the comments inline are a rough first review. I hope to get a clearer picture
> for the stuff I'm not sure about in v3 once the big issues are fixed.
>
>
> Am Donnerstag, 27. November 2014, 10:52:08 schrieb Roger Chen:
>> This driver is based on stmmac driver.
>>
>> modification based on Giuseppe CAVALLARO's suggestion:
>> 1. use BIT()
>>
>> 	> +/*RK3288_GRF_SOC_CON3*/
>> 	> +#define GMAC_TXCLK_DLY_ENABLE   ((0x4000 << 16) | (0x4000))
>> 	> +#define GMAC_TXCLK_DLY_DISABLE  ((0x4000 << 16) | (0x0000))
>>
>> 	...
>>
>> 	why do not use BIT and BIT_MASK where possible?
>>
>> 	===>after modification:
>>
>> 	#define GRF_BIT(nr)     (BIT(nr) | BIT(nr+16))
>> 	#define GRF_CLR_BIT(nr) (BIT(nr+16))
>> 	#define GMAC_TXCLK_DLY_ENABLE   GRF_BIT(14)
>> 	#define GMAC_TXCLK_DLY_DISABLE  GRF_CLR_BIT(14)
>> 	...
>> 2.
>>
>> 	> +    regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> 	> +             GMAC_PHY_INTF_SEL_RGMII);
>> 	> +    regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> 	> +             GMAC_RMII_MODE_CLR);
>>
>> 	maybe you could perform just one write unless there is some HW
>> 	constraint.
>>
>> 	===>after modification:
>>
>> 	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> 			 GMAC_PHY_INTF_SEL_RGMII | GMAC_RMII_MODE_CLR);
>>
>> 3. use macros
>>
>> 	> +    regmap_write(bsp_priv->grf, RK3288_GRF_GPIO3D_E, 0xFFFFFFFF);
>> 	> +    regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E,
>> 	> +             0x3<<2<<16 | 0x3<<2);
>>
>> 	pls use macros, these shift sequence is really help to decode
>>
>> 	===>after modification:
>>
>> 	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4A_E, GPIO4A_12MA);
>> 	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E, GPIO4B_2_12MA);
>>
>> 4. remove grf fail check in rk_gmac_setup()
>>
>> 	> +    if (IS_ERR(bsp_priv->grf))
>> 	> +        dev_err(&pdev->dev, "Missing rockchip,grf property\n");
>>
>> 	I wonder if you can fail on here and save all the check in
>> 	set_rgmii_speed etc.
>> 	Maybe this can be considered a mandatory property for the glue-logic.
>>
>> 5. remove .tx_coe=1
>>
>> 	> +const struct stmmac_of_data rk_gmac_data = {
>> 	> +    .has_gmac = 1,
>> 	> +    .tx_coe = 1,
>>
>> 	FYI, on new gmac there is the HW capability register to dinamically
>> 	provide you if coe is supported.
>>
>> 	IMO you should add the OF "compatible" string and in case of mac
>> 	newer than the 3.50a you can remove coe.
> changelogs like these, should be compact and also not be in the commit message
> itself, but in the "comment"-section below the "---" and before the diffstat.
>
>
>> Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
>> ---
> changelog here ... the commonly used pattern is something like
>
> changes since v2:
> - ...
> - ...
>
> changes since v1:
> - ...
>
>>   drivers/net/ethernet/stmicro/stmmac/Makefile       |    2 +-
>>   drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c     |  636
>> ++++++++++++++++++++ .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |
>>   3 +
>>   .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |    1 +
>>   4 files changed, 641 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile
>> b/drivers/net/ethernet/stmicro/stmmac/Makefile index ac4d562..73c2715
>> 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/Makefile
>> +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
>> @@ -6,7 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o
>> ring_mode.o	\
>>
>>   obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o
>>   stmmac-platform-objs:= stmmac_platform.o dwmac-meson.o dwmac-sunxi.o	\
>> -		       dwmac-sti.o dwmac-socfpga.o
>> +		       dwmac-sti.o dwmac-socfpga.o dwmac-rk.o
>>
>>   obj-$(CONFIG_STMMAC_PCI) += stmmac-pci.o
>>   stmmac-pci-objs:= stmmac_pci.o
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c new file mode 100644
>> index 0000000..870563f
>> --- /dev/null
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>> @@ -0,0 +1,636 @@
>> +/**
>> + * dwmac-rk.c - Rockchip RK3288 DWMAC specific glue layer
>> + *
>> + * Copyright (C) 2014 Chen-Zhi (Roger Chen)
>> + *
>> + * Chen-Zhi (Roger Chen)  <roger.chen@rock-chips.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include <linux/stmmac.h>
>> +#include <linux/bitops.h>
>> +#include <linux/clk.h>
>> +#include <linux/phy.h>
>> +#include <linux/of_net.h>
>> +#include <linux/gpio.h>
>> +#include <linux/of_gpio.h>
>> +#include <linux/of_device.h>
>> +#include <linux/regulator/consumer.h>
>> +#include <linux/delay.h>
>> +#include <linux/mfd/syscon.h>
>> +#include <linux/regmap.h>
>> +
>> +struct rk_priv_data {
>> +	struct platform_device *pdev;
>> +	int phy_iface;
>> +	bool power_ctrl_by_pmu;
>> +	char pmu_regulator[32];
>> +	int pmu_enable_level;
>> +
>> +	int power_io;
>> +	int power_io_level;
>> +	int reset_io;
>> +	int reset_io_level;
>> +	int phyirq_io;
>> +	int phyirq_io_level;
>> +
>> +	bool clk_enabled;
>> +	bool clock_input;
>> +
>> +	struct clk *clk_mac;
>> +	struct clk *clk_mac_pll;
>> +	struct clk *gmac_clkin;
>> +	struct clk *mac_clk_rx;
>> +	struct clk *mac_clk_tx;
>> +	struct clk *clk_mac_ref;
>> +	struct clk *clk_mac_refout;
>> +	struct clk *aclk_mac;
>> +	struct clk *pclk_mac;
>> +
>> +	int tx_delay;
>> +	int rx_delay;
>> +
>> +	struct regmap *grf;
>> +};
>> +
>> +#define RK3288_GRF_SOC_CON1 0x0248
>> +#define RK3288_GRF_SOC_CON3 0x0250
>> +#define RK3288_GRF_GPIO3D_E 0x01ec
>> +#define RK3288_GRF_GPIO4A_E 0x01f0
>> +#define RK3288_GRF_GPIO4B_E 0x01f4
> here you're using a space instead of a tab, please select one pattern either
> tabs or space but do not mix them.
>
>
>> +#define GPIO3D_2MA	0xFFFF0000
>> +#define GPIO3D_4MA	0xFFFF5555
>> +#define GPIO3D_8MA	0xFFFFAAAA
>> +#define GPIO3D_12MA	0xFFFFFFFF
>> +
>> +#define GPIO4A_2MA	0xFFFF0000
>> +#define GPIO4A_4MA	0xFFFF5555
>> +#define GPIO4A_8MA	0xFFFFAAAA
>> +#define GPIO4A_12MA	0xFFFFFFFF
> see comment about pin settings below
>
>
>> +
>> +#define GRF_BIT(nr)	(BIT(nr) | BIT(nr+16))
>> +#define GRF_CLR_BIT(nr)	(BIT(nr+16))
>> +
>> +#define GPIO4B_2_2MA	(GRF_CLR_BIT(2) | GRF_CLR_BIT(3))
>> +#define GPIO4B_2_4MA	(GRF_BIT(2) | GRF_CLR_BIT(3))
>> +#define GPIO4B_2_8MA	(GRF_CLR_BIT(2) | GRF_BIT(3))
>> +#define GPIO4B_2_12MA	(GRF_BIT(2) | GRF_BIT(3))
>> +
>> +/*RK3288_GRF_SOC_CON1*/
>> +#define GMAC_PHY_INTF_SEL_RGMII	(GRF_BIT(6) | GRF_CLR_BIT(7) |
>> GRF_CLR_BIT(8))
>> +#define GMAC_PHY_INTF_SEL_RMII  (GRF_CLR_BIT(6) |
>> GRF_CLR_BIT(7) | GRF_BIT(8))
>> +#define GMAC_FLOW_CTRL          GRF_BIT(9)
>> +#define GMAC_FLOW_CTRL_CLR      GRF_CLR_BIT(9)
>> +#define GMAC_SPEED_10M          GRF_CLR_BIT(10)
>> +#define GMAC_SPEED_100M         GRF_BIT(10)
>> +#define GMAC_RMII_CLK_25M       GRF_BIT(11)
>> +#define GMAC_RMII_CLK_2_5M      GRF_CLR_BIT(11)
>> +#define GMAC_CLK_125M           (GRF_CLR_BIT(12) | GRF_CLR_BIT(13))
>> +#define GMAC_CLK_25M            (GRF_BIT(12) | GRF_BIT(13))
>> +#define GMAC_CLK_2_5M           (GRF_CLR_BIT(12) | GRF_BIT(13))
>> +#define GMAC_RMII_MODE          GRF_BIT(14)
>> +#define GMAC_RMII_MODE_CLR      GRF_CLR_BIT(14)
>> +
>> +/*RK3288_GRF_SOC_CON3*/
>> +#define GMAC_TXCLK_DLY_ENABLE   GRF_BIT(14)
>> +#define GMAC_TXCLK_DLY_DISABLE  GRF_CLR_BIT(14)
>> +#define GMAC_RXCLK_DLY_ENABLE   GRF_BIT(15)
>> +#define GMAC_RXCLK_DLY_DISABLE  GRF_CLR_BIT(15)
>> +#define GMAC_CLK_RX_DL_CFG(val) ((0x7F<<7<<16) | (val<<7))
>> +#define GMAC_CLK_TX_DL_CFG(val) ((0x7F<<16) | (val))
> again mixed tabs and spaces as delimiters.
>
> Also the _CFG macros are not well abstracted. You could take a look at the
> HIWORD_UPDATE macro in drivers/clk/rockchip/clk.h:
>
> #define GMAC_CLK_DL_MASK	0x7f
> #define GMAC_CLK_RX_DL_CFG(val)  HIWORD_UPDATE(val, GMAC_CLK_DL_MASK, 7)
> #define GMAC_CLK_TX_DL_CFG(val)  HIWORD_UPDATE(val, GMAC_CLK_DL_MASK, 0)
>
>
>> +
>> +static void set_to_rgmii(struct rk_priv_data *bsp_priv,
>> +			 int tx_delay, int rx_delay)
>> +{
>> +	if (IS_ERR(bsp_priv->grf)) {
>> +		pr_err("%s: Missing rockchip,grf property\n", __func__);
>> +		return;
>> +	}
>> +
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +		     GMAC_PHY_INTF_SEL_RGMII | GMAC_RMII_MODE_CLR);
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON3,
>> +		     GMAC_RXCLK_DLY_ENABLE | GMAC_TXCLK_DLY_ENABLE |
>> +		     GMAC_CLK_RX_DL_CFG(rx_delay) |
>> +		     GMAC_CLK_TX_DL_CFG(tx_delay));
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO3D_E, GPIO3D_12MA);
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4A_E, GPIO4A_12MA);
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E, GPIO4B_2_12MA);
> please don't write to parts controlled by other drivers - here the drive
> strength settings of pins is controlled by the pinctrl driver. Instead you can
> just set the drive-strength in the pinctrl settings.
>
>
>> +
>> +	pr_debug("%s: tx delay=0x%x; rx delay=0x%x;\n",
>> +		 __func__, tx_delay, rx_delay);
>> +}
>> +
>> +static void set_to_rmii(struct rk_priv_data *bsp_priv)
>> +{
>> +	if (IS_ERR(bsp_priv->grf)) {
>> +		pr_err("%s: Missing rockchip,grf property\n", __func__);
> you have a device-reference in rk_priv_data, so you could use dev_err here.
> Same for all other pr_err/pr_debug/pr_* calls in this file.
>
>
>> +		return;
>> +	}
>> +
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +		     GMAC_PHY_INTF_SEL_RMII);
>> +	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +		     GMAC_RMII_MODE);
> these two could be combined?
>
>
>> +}
>> +
>> +static void set_rgmii_speed(struct rk_priv_data *bsp_priv, int speed)
>> +{
>> +	if (IS_ERR(bsp_priv->grf)) {
>> +		pr_err("%s: Missing rockchip,grf property\n", __func__);
>> +		return;
>> +	}
>> +
>> +	if (speed == 10)
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_2_5M);
>> +	else if (speed == 100)
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_25M);
>> +	else if (speed == 1000)
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_125M);
>> +	else
>> +		pr_err("unknown speed value for RGMII! speed=%d", speed);
>> +}
>> +
>> +static void set_rmii_speed(struct rk_priv_data *bsp_priv, int speed)
>> +{
>> +	if (IS_ERR(bsp_priv->grf)) {
>> +		pr_err("%s: Missing rockchip,grf property\n", __func__);
>> +		return;
>> +	}
>> +
>> +	if (speed == 10) {
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +			     GMAC_RMII_CLK_2_5M);
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +			     GMAC_SPEED_10M);
> combine into one write?
>
>
>> +	} else if (speed == 100) {
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +			     GMAC_RMII_CLK_25M);
>> +		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
>> +			     GMAC_SPEED_100M);
> combine into one write?
>
>
>> +	} else {
>> +		pr_err("unknown speed value for RMII! speed=%d", speed);
>> +	}
>> +}
>> +
>> +#define MAC_CLK_RX	"mac_clk_rx"
>> +#define MAC_CLK_TX	"mac_clk_tx"
>> +#define CLK_MAC_REF	"clk_mac_ref"
>> +#define CLK_MAC_REF_OUT	"clk_mac_refout"
>> +#define CLK_MAC_PLL	"clk_mac_pll"
>> +#define ACLK_MAC	"aclk_mac"
>> +#define PCLK_MAC	"pclk_mac"
>> +#define MAC_CLKIN	"ext_gmac"
>> +#define CLK_MAC		"stmmaceth"
> why the need to extra constants for the clock names and not use the real names
> directly like most other drivers do?
>
>
>> +
>> +static int gmac_clk_init(struct rk_priv_data *bsp_priv)
>> +{
>> +	struct device *dev = &bsp_priv->pdev->dev;
>> +
>> +	bsp_priv->clk_enabled = false;
>> +
>> +	bsp_priv->mac_clk_rx = clk_get(dev, MAC_CLK_RX);
>> +	if (IS_ERR(bsp_priv->mac_clk_rx))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, MAC_CLK_RX);
>> +
>> +	bsp_priv->mac_clk_tx = clk_get(dev, MAC_CLK_TX);
>> +	if (IS_ERR(bsp_priv->mac_clk_tx))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, MAC_CLK_TX);
>> +
>> +	bsp_priv->clk_mac_ref = clk_get(dev, CLK_MAC_REF);
>> +	if (IS_ERR(bsp_priv->clk_mac_ref))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, CLK_MAC_REF);
>> +
>> +	bsp_priv->clk_mac_refout = clk_get(dev, CLK_MAC_REF_OUT);
>> +	if (IS_ERR(bsp_priv->clk_mac_refout))
>> +		pr_warn("%s: warning:cannot get clock %s\n",
>> +			__func__, CLK_MAC_REF_OUT);
>> +
>> +	bsp_priv->aclk_mac = clk_get(dev, ACLK_MAC);
>> +	if (IS_ERR(bsp_priv->aclk_mac))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, ACLK_MAC);
>> +
>> +	bsp_priv->pclk_mac = clk_get(dev, PCLK_MAC);
>> +	if (IS_ERR(bsp_priv->pclk_mac))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, PCLK_MAC);
>> +
>> +	bsp_priv->clk_mac_pll = clk_get(dev, CLK_MAC_PLL);
>> +	if (IS_ERR(bsp_priv->clk_mac_pll))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, CLK_MAC_PLL);
>> +
>> +	bsp_priv->gmac_clkin = clk_get(dev, MAC_CLKIN);
>> +	if (IS_ERR(bsp_priv->gmac_clkin))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, MAC_CLKIN);
>> +
>> +	bsp_priv->clk_mac = clk_get(dev, CLK_MAC);
>> +	if (IS_ERR(bsp_priv->clk_mac))
>> +		pr_warn("%s: warning: cannot get clock %s\n",
>> +			__func__, CLK_MAC);
> there is not clk_put in the _remove case ... maybe you could simply use
> devm_clk_get here so that all clocks are put on device removal.
>
> Also you're warning on every missing clock. Below it looks like you need a
> different set of them for rgmii and rmii, so maybe you should simply error out
> when core clocks for the selected phy-mode are missing.
>
>
>> +
>> +	if (bsp_priv->clock_input) {
>> +		pr_info("%s: clock input from PHY\n", __func__);
>> +	} else {
>> +		if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII)
>> +			clk_set_rate(bsp_priv->clk_mac_pll, 50000000);
>> +
>> +		clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll);
> why the explicit reparenting. The common clock-framework is intelligent enough
> to select the best suitable parent.
>
> In general I'm thinking the clock-handling inside this driver should be
> simplyfied, as the common-clock framework can handle most cases itself. I.e. if
> a 125MHz external clock is present and so on. But haven't looked to deep yet.
>
>
>
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int gmac_clk_enable(struct rk_priv_data *bsp_priv, bool enable)
>> +{
>> +	int phy_iface = phy_iface = bsp_priv->phy_iface;
>> +
>> +	if (enable) {
>> +		if (!bsp_priv->clk_enabled) {
>> +			if (phy_iface == PHY_INTERFACE_MODE_RMII) {
>> +				if (!IS_ERR(bsp_priv->mac_clk_rx))
>> +					clk_prepare_enable(
>> +						bsp_priv->mac_clk_rx);
>> +
>> +				if (!IS_ERR(bsp_priv->clk_mac_ref))
>> +					clk_prepare_enable(
>> +						bsp_priv->clk_mac_ref);
>> +
>> +				if (!IS_ERR(bsp_priv->clk_mac_refout))
>> +					clk_prepare_enable(
>> +						bsp_priv->clk_mac_refout);
>> +			}
>> +
>> +			if (!IS_ERR(bsp_priv->aclk_mac))
>> +				clk_prepare_enable(bsp_priv->aclk_mac);
>> +
>> +			if (!IS_ERR(bsp_priv->pclk_mac))
>> +				clk_prepare_enable(bsp_priv->pclk_mac);
>> +
>> +			if (!IS_ERR(bsp_priv->mac_clk_tx))
>> +				clk_prepare_enable(bsp_priv->mac_clk_tx);
>> +
>> +			/**
>> +			 * if (!IS_ERR(bsp_priv->clk_mac))
>> +			 *	clk_prepare_enable(bsp_priv->clk_mac);
>> +			 */
>> +			mdelay(5);
>> +			bsp_priv->clk_enabled = true;
>> +		}
>> +	} else {
>> +		if (bsp_priv->clk_enabled) {
>> +			if (phy_iface == PHY_INTERFACE_MODE_RMII) {
>> +				if (!IS_ERR(bsp_priv->mac_clk_rx))
>> +					clk_disable_unprepare(
>> +						bsp_priv->mac_clk_rx);
>> +
>> +				if (!IS_ERR(bsp_priv->clk_mac_ref))
>> +					clk_disable_unprepare(
>> +						bsp_priv->clk_mac_ref);
>> +
>> +				if (!IS_ERR(bsp_priv->clk_mac_refout))
>> +					clk_disable_unprepare(
>> +						bsp_priv->clk_mac_refout);
>> +			}
>> +
>> +			if (!IS_ERR(bsp_priv->aclk_mac))
>> +				clk_disable_unprepare(bsp_priv->aclk_mac);
>> +
>> +			if (!IS_ERR(bsp_priv->pclk_mac))
>> +				clk_disable_unprepare(bsp_priv->pclk_mac);
>> +
>> +			if (!IS_ERR(bsp_priv->mac_clk_tx))
>> +				clk_disable_unprepare(bsp_priv->mac_clk_tx);
>> +			/**
>> +			 * if (!IS_ERR(bsp_priv->clk_mac))
>> +			 *	clk_disable_unprepare(bsp_priv->clk_mac);
>> +			 */
>> +			bsp_priv->clk_enabled = false;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int power_on_by_pmu(struct rk_priv_data *bsp_priv, bool enable)
>> +{
>> +	struct regulator *ldo;
>> +	char *ldostr = bsp_priv->pmu_regulator;
>> +	int ret;
>> +
>> +	if (!ldostr) {
>> +		pr_err("%s: no ldo found\n", __func__);
>> +		return -1;
>> +	}
>> +
>> +	ldo = regulator_get(NULL, ldostr);
>> +	if (!ldo) {
>> +		pr_err("\n%s get ldo %s failed\n", __func__, ldostr);
>> +	} else {
>> +		if (enable) {
>> +			if (!regulator_is_enabled(ldo)) {
>> +				regulator_set_voltage(ldo, 3300000, 3300000);
>> +				ret = regulator_enable(ldo);
>> +				if (ret != 0)
>> +					pr_err("%s: fail to enable %s\n",
>> +					       __func__, ldostr);
>> +				else
>> +					pr_info("turn on ldo done.\n");
>> +			} else {
>> +				pr_warn("%s is enabled before enable", ldostr);
>> +			}
>> +		} else {
>> +			if (regulator_is_enabled(ldo)) {
>> +				ret = regulator_disable(ldo);
>> +				if (ret != 0)
>> +					pr_err("%s: fail to disable %s\n",
>> +					       __func__, ldostr);
>> +				else
>> +					pr_info("turn off ldo done.\n");
>> +			} else {
>> +				pr_warn("%s is disabled before disable",
>> +					ldostr);
>> +			}
>> +		}
>> +		regulator_put(ldo);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int power_on_by_gpio(struct rk_priv_data *bsp_priv, bool enable)
>> +{
>> +	if (enable) {
>> +		/*power on*/
>> +		if (gpio_is_valid(bsp_priv->power_io))
>> +			gpio_direction_output(bsp_priv->power_io,
>> +					      bsp_priv->power_io_level);
>> +	} else {
>> +		/*power off*/
>> +		if (gpio_is_valid(bsp_priv->power_io))
>> +			gpio_direction_output(bsp_priv->power_io,
>> +					      !bsp_priv->power_io_level);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int phy_power_on(struct rk_priv_data *bsp_priv, bool enable)
>> +{
>> +	int ret = -1;
>> +
>> +	pr_info("Ethernet PHY power %s\n", enable == 1 ? "on" : "off");
>> +
>> +	if (bsp_priv->power_ctrl_by_pmu)
>> +		ret = power_on_by_pmu(bsp_priv, enable);
>> +	else
>> +		ret =  power_on_by_gpio(bsp_priv, enable);
> this looks wrong. This should always be a regulator. Even a regulator + switch
> controlled by a gpio can still be modelled as regulator, so that you don't
> need this switch and assorted special handling - so just use the regulator API
>
In some case, it would be a switching circuit to control the power for PHY.
All I need to do is to control a GPIO to make switch on/off.  So...
>> +
>> +	if (enable) {
>> +		/*reset*/
>> +		if (gpio_is_valid(bsp_priv->reset_io)) {
>> +			gpio_direction_output(bsp_priv->reset_io,
>> +					      bsp_priv->reset_io_level);
>> +			mdelay(5);
>> +			gpio_direction_output(bsp_priv->reset_io,
>> +					      !bsp_priv->reset_io_level);
>> +		}
>> +		mdelay(30);
>> +
>> +	} else {
>> +		/*pull down reset*/
>> +		if (gpio_is_valid(bsp_priv->reset_io)) {
>> +			gpio_direction_output(bsp_priv->reset_io,
>> +					      bsp_priv->reset_io_level);
>> +		}
>> +	}
> I'm not sure yet if it would be better to use the reset framework for this.
> While it says it is also meant for reset-gpios, there does not seem a driver
> for this to exist yet.
>
What should I do?
>
>> +
>> +	return ret;
>> +}
>> +
>> +#define GPIO_PHY_POWER	"gmac_phy_power"
>> +#define GPIO_PHY_RESET	"gmac_phy_reset"
>> +#define GPIO_PHY_IRQ	"gmac_phy_irq"
> again I don't understand why these constants are necessary
>
>> +
>> +static void *rk_gmac_setup(struct platform_device *pdev)
>> +{
>> +	struct rk_priv_data *bsp_priv;
>> +	struct device *dev = &pdev->dev;
>> +	enum of_gpio_flags flags;
>> +	int ret;
>> +	const char *strings = NULL;
>> +	int value;
>> +	int irq;
>> +
>> +	bsp_priv = devm_kzalloc(dev, sizeof(*bsp_priv), GFP_KERNEL);
>> +	if (!bsp_priv)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	bsp_priv->phy_iface = of_get_phy_mode(dev->of_node);
>> +
>> +	ret = of_property_read_string(dev->of_node, "pmu_regulator", &strings);
>> +	if (ret) {
>> +		pr_err("%s: Can not read property: pmu_regulator.\n", __func__);
>> +		bsp_priv->power_ctrl_by_pmu = false;
>> +	} else {
>> +		pr_info("%s: ethernet phy power controlled by pmu(%s).\n",
>> +			__func__, strings);
>> +		bsp_priv->power_ctrl_by_pmu = true;
>> +		strcpy(bsp_priv->pmu_regulator, strings);
>> +	}
> There is a generic regulator-dt-binding for regulator-consumers available
> which you should of course use.
>
The same explanation as above
>> +
>> +	ret = of_property_read_u32(dev->of_node, "pmu_enable_level", &value);
>> +	if (ret) {
>> +		pr_err("%s: Can not read property: pmu_enable_level.\n",
>> +		       __func__);
>> +		bsp_priv->power_ctrl_by_pmu = false;
>> +	} else {
>> +		pr_info("%s: PHY power controlled by pmu(level = %s).\n",
>> +			__func__, (value == 1) ? "HIGH" : "LOW");
>> +		bsp_priv->power_ctrl_by_pmu = true;
>> +		bsp_priv->pmu_enable_level = value;
>> +	}
> What is this used for? Enabling should of course be done via regulator_enable
> and disabling using regulator_disable.
>
>
>> +
>> +	ret = of_property_read_string(dev->of_node, "clock_in_out", &strings);
>> +	if (ret) {
>> +		pr_err("%s: Can not read property: clock_in_out.\n", __func__);
>> +		bsp_priv->clock_input = true;
>> +	} else {
>> +		pr_info("%s: clock input or output? (%s).\n",
>> +			__func__, strings);
>> +		if (!strcmp(strings, "input"))
>> +			bsp_priv->clock_input = true;
>> +		else
>> +			bsp_priv->clock_input = false;
>> +	}
>> +
>> +	ret = of_property_read_u32(dev->of_node, "tx_delay", &value);
>> +	if (ret) {
>> +		bsp_priv->tx_delay = 0x30;
>> +		pr_err("%s: Can not read property: tx_delay.", __func__);
>> +		pr_err("%s: set tx_delay to 0x%x\n",
>> +		       __func__, bsp_priv->tx_delay);
>> +	} else {
>> +		pr_info("%s: TX delay(0x%x).\n", __func__, value);
>> +		bsp_priv->tx_delay = value;
>> +	}
>> +
>> +	ret = of_property_read_u32(dev->of_node, "rx_delay", &value);
>> +	if (ret) {
>> +		bsp_priv->rx_delay = 0x10;
>> +		pr_err("%s: Can not read property: rx_delay.", __func__);
>> +		pr_err("%s: set rx_delay to 0x%x\n",
>> +		       __func__, bsp_priv->rx_delay);
>> +	} else {
>> +		pr_info("%s: RX delay(0x%x).\n", __func__, value);
>> +		bsp_priv->rx_delay = value;
>> +	}
>> +
>> +	bsp_priv->grf = syscon_regmap_lookup_by_phandle(dev->of_node,
>> +							"rockchip,grf");
>> +	bsp_priv->phyirq_io =
>> +		of_get_named_gpio_flags(dev->of_node,
>> +					"phyirq-gpio", 0, &flags);
>> +	bsp_priv->phyirq_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
>> +
>> +	bsp_priv->reset_io =
>> +		of_get_named_gpio_flags(dev->of_node,
>> +					"reset-gpio", 0, &flags);
>> +	bsp_priv->reset_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
>> +
>> +	bsp_priv->power_io =
>> +		of_get_named_gpio_flags(dev->of_node, "power-gpio", 0, &flags);
>> +	bsp_priv->power_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
>> +
>> +	/*power*/
>> +	if (!gpio_is_valid(bsp_priv->power_io)) {
>> +		pr_err("%s: Failed to get GPIO %s.\n",
>> +		       __func__, "power-gpio");
>> +	} else {
>> +		ret = gpio_request(bsp_priv->power_io, GPIO_PHY_POWER);
>> +		if (ret)
>> +			pr_err("%s: ERROR: Failed to request GPIO %s.\n",
>> +			       __func__, GPIO_PHY_POWER);
>> +	}
> When everything power-related is handled using the regulator api, you don't
> need this
The same explanation as above
>
>> +
>> +	if (!gpio_is_valid(bsp_priv->reset_io)) {
>> +		pr_err("%s: ERROR: Get reset-gpio failed.\n", __func__);
>> +	} else {
>> +		ret = gpio_request(bsp_priv->reset_io, GPIO_PHY_RESET);
>> +		if (ret)
>> +			pr_err("%s: ERROR: Failed to request GPIO %s.\n",
>> +			       __func__, GPIO_PHY_RESET);
>> +	}
>> +
>> +	if (bsp_priv->phyirq_io > 0) {
> This is more for my understanding: why does the mac driver need to handle the
> phy interrupt - but I might be overlooking something.
>
phy interrupt is not mandatory.  In most of the time,  in order to find 
something happen in PHY, for example,
link is up or down,  we just use polling method to read the phy's 
register in a timer.
Buf if phy interrupt is in use, when link is up or down,  phy interrupt 
pin will be assert to inform the CPU.
I just implement the driver for phy interrupt gpio,  not enable it as 
default.

>> +		struct plat_stmmacenet_data *plat_dat;
>> +
>> +		pr_info("PHY irq in use\n");
>> +		ret = gpio_request(bsp_priv->phyirq_io, GPIO_PHY_IRQ);
>> +		if (ret < 0) {
>> +			pr_warn("%s: Failed to request GPIO %s\n",
>> +				__func__, GPIO_PHY_IRQ);
>> +			goto goon;
>> +		}
>> +
>> +		ret = gpio_direction_input(bsp_priv->phyirq_io);
>> +		if (ret < 0) {
>> +			pr_err("%s, Failed to set input for GPIO %s\n",
>> +			       __func__, GPIO_PHY_IRQ);
>> +			gpio_free(bsp_priv->phyirq_io);
>> +			goto goon;
>> +		}
>> +
>> +		irq = gpio_to_irq(bsp_priv->phyirq_io);
>> +		if (irq < 0) {
>> +			ret = irq;
>> +			pr_err("Failed to set irq for %s\n",
>> +			       GPIO_PHY_IRQ);
>> +			gpio_free(bsp_priv->phyirq_io);
>> +			goto goon;
>> +		}
>> +
>> +		plat_dat = dev_get_platdata(&pdev->dev);
>> +		if (plat_dat)
>> +			plat_dat->mdio_bus_data->probed_phy_irq = irq;
>> +		else
>> +			pr_err("%s: plat_data is NULL\n", __func__);
>> +	}
>> +
>> +goon:
>> +	/*rmii or rgmii*/
>> +	if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII) {
>> +		pr_info("%s: init for RGMII\n", __func__);
>> +		set_to_rgmii(bsp_priv, bsp_priv->tx_delay, bsp_priv->rx_delay);
>> +	} else if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII) {
>> +		pr_info("%s: init for RMII\n", __func__);
>> +		set_to_rmii(bsp_priv);
>> +	} else {
>> +		pr_err("%s: ERROR: NO interface defined!\n", __func__);
>> +	}
>> +
>> +	bsp_priv->pdev = pdev;
>> +
>> +	gmac_clk_init(bsp_priv);
>> +
>> +	return bsp_priv;
>> +}
>> +
>> +static int rk_gmac_init(struct platform_device *pdev, void *priv)
>> +{
>> +	struct rk_priv_data *bsp_priv = priv;
>> +	int ret;
>> +
>> +	ret = phy_power_on(bsp_priv, true);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = gmac_clk_enable(bsp_priv, true);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return 0;
>> +}
>> +
>> +static void rk_gmac_exit(struct platform_device *pdev, void *priv)
>> +{
>> +	struct rk_priv_data *gmac = priv;
>> +
>> +	phy_power_on(gmac, false);
>> +	gmac_clk_enable(gmac, false);
>> +}
>> +
>> +static void rk_fix_speed(void *priv, unsigned int speed)
>> +{
>> +	struct rk_priv_data *bsp_priv = priv;
>> +
>> +	if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII)
>> +		set_rgmii_speed(bsp_priv, speed);
>> +	else if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII)
>> +		set_rmii_speed(bsp_priv, speed);
>> +	else
>> +		pr_err("unsupported interface %d", bsp_priv->phy_iface);
>> +}
>> +
>> +const struct stmmac_of_data rk_gmac_data = {
>> +	.has_gmac = 1,
>> +	.fix_mac_speed = rk_fix_speed,
>> +	.setup = rk_gmac_setup,
>> +	.init = rk_gmac_init,
>> +	.exit = rk_gmac_exit,
>> +};
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index
>> 15814b7..b4dee96 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> @@ -33,6 +33,7 @@
>>
>>   static const struct of_device_id stmmac_dt_ids[] = {
>>   	/* SoC specific glue layers should come before generic bindings */
>> +	{ .compatible = "rockchip,rk3288-gmac", .data = &rk_gmac_data},
> please name that rk3288_gmac_data [of course the other occurences too]
> It makes it easier to see which soc it is meant for and it's also not save to
> assume that the next one will use the same register + bit positions in the
> grf.
>
>
>>   	{ .compatible = "amlogic,meson6-dwmac", .data = &meson6_dwmac_data},
>>   	{ .compatible = "allwinner,sun7i-a20-gmac", .data = &sun7i_gmac_data},
>>   	{ .compatible = "st,stih415-dwmac", .data = &stih4xx_dwmac_data},
>> @@ -291,6 +292,8 @@ static int stmmac_pltfr_probe(struct platform_device
>> *pdev) return  -ENOMEM;
>>   		}
>>
>> +		pdev->dev.platform_data = plat_dat;
>> +
>>   		ret = stmmac_probe_config_dt(pdev, plat_dat, &mac);
>>   		if (ret) {
>>   			pr_err("%s: main dt probe failed", __func__);
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h index
>> 25dd1f7..32a0516 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
>> @@ -24,5 +24,6 @@ extern const struct stmmac_of_data sun7i_gmac_data;
>>   extern const struct stmmac_of_data stih4xx_dwmac_data;
>>   extern const struct stmmac_of_data stid127_dwmac_data;
>>   extern const struct stmmac_of_data socfpga_gmac_data;
>> +extern const struct stmmac_of_data rk_gmac_data;
>>
>>   #endif /* __STMMAC_PLATFORM_H__ */
>
>

^ permalink raw reply

* Re: [PATCH net] cxgb4: Add a check for flashing FW using ethtool
From: Hariprasad S @ 2014-12-03  8:07 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, davem, leedom, anish, nirranjan, kumaras
In-Reply-To: <547DCA45.7060702@cogentembedded.com>

On Tue, Dec 02, 2014 at 17:18:45 +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 12/2/2014 3:09 PM, Hariprasad Shenai wrote:
> 
> >Don't let T4 firmware flash on a T5 adapter and vice-versa
> >using ethtool
> 
> >Based on original work by Casey Leedom <leedom@chelsio.com>
> 
> >Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
> >---
> >  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |   26 ++++++++++++++++++++++++++
> >  1 files changed, 26 insertions(+), 0 deletions(-)
> 
> >diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> >index 163a2a1..fae205a 100644
> >--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> >+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> >@@ -1131,6 +1131,27 @@ unsigned int t4_flash_cfg_addr(struct adapter *adapter)
> >  		return FLASH_CFG_START;
> >  }
> >
> >+/* Return TRUE if the specified firmware matches the adapter.  I.e. T4
> >+ * firmware for T4 adapters, T5 firmware for T5 adapters, etc.  We go ahead
> >+ * and emit an error message for mismatched firmware to save our caller the
> >+ * effort ...
> >+ */
> >+static int t4_fw_matches_chip(const struct adapter *adap,
> 
>    s/int/bool/?
> 
> >+			      const struct fw_hdr *hdr)
> >+{
> >+	/* The expression below will return FALSE for any unsupported adapter
> >+	 * which will keep us "honest" in the future ...
> >+	 */
> >+	if ((is_t4(adap->params.chip) && hdr->chip == FW_HDR_CHIP_T4) ||
> >+	    (is_t5(adap->params.chip) && hdr->chip == FW_HDR_CHIP_T5))
> >+		return 1;
> 
>    s/1/true/?
> 
> >+
> >+	dev_err(adap->pdev_dev,
> >+		"FW image (%d) is not suitable for this adapter (%d)\n",
> >+		hdr->chip, CHELSIO_CHIP_VERSION(adap->params.chip));
> >+	return 0;
> 
>    s/0/false/?
> 
> [...]
> 
> WBR, Sergei
> 

Thanks for the review comment, have sent a V2 based on your comments.

^ permalink raw reply

* What's the concern about setting irq thread's policy as SCHED_FIFO
From: Qin Chuanyu @ 2014-12-03  8:06 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

I am doing network performance test under suse11sp3 and intel 82599 nic,
Becasuse the softirq is out of schedule policy's control, so netserver 
thread couldn't always get 100% cpu usage, then packet dropped in kernel 
udp socket's receive queue.

In order to get a stable result, I did some patch in ixgbe driver and 
then use irq_thread instead of softirq to handle rx.
It seems work well, but irq_thread's SCHED_FIFO schedule policy cause 
that when the cpu is limited, netserver couldn't work at all.

So I change the irq_thread's schedule policy from SCHED_FIFO to 
SCHED_NORMAL, then the irq_thread could share the cpu usage with 
netserver thread.

the question is:
What's the concrete reason about setting irq thread's policy as SCHED_FIFO?
Except the priority affecting the cpu usage, any function would be 
broken if irq thread change to SCHED_NORMAL?

^ permalink raw reply

* wireless workshop - was: Netdev 0.1 Call for Proposals
From: Johannes Berg @ 2014-12-03  8:24 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: netdev, linux-wireless, lwn.net, netdev01
In-Reply-To: <20141203022815.GM4612@toccata2.tricolour.ca>

On Tue, 2014-12-02 at 21:28 -0500, Richard Guy Briggs wrote:

> 1) Workshops (day 1)

FWIW, we're planning a wireless summit as a workshop here, which should
give us a little more breathing room than e.g. the 3-hour slot at LPC.
I'll put up the wiki page and other things and start working on an
agenda when I'm back home next week.

Wireless-related submissions for the remainder of the conference are
also very welcome and encouraged!

johannes

^ permalink raw reply

* Re: linux-next: manual merge of the driver-core tree with the net-next tree
From: Jeremiah Mahler @ 2014-12-03  8:36 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Stephen Rothwell, Greg KH, John W. Linville, David Miller, netdev,
	linux-next, linux-kernel, Felix Fietkau, Ben Greear
In-Reply-To: <547C19ED.10201@broadcom.com>

all,

On Mon, Dec 01, 2014 at 08:34:05AM +0100, Arend van Spriel wrote:
> On 01-12-14 08:19, Stephen Rothwell wrote:
> > Hi Greg,
> > 
> > Today's linux-next merge of the driver-core tree got a conflict in
> > drivers/net/wireless/ath/ath9k/debug.c between commits 70e535ed0029
> > ("ath9k: clean up debugfs print of reset causes"), 7b8aaead958e
> > ("ath9k: restart hardware after noise floor calibration failure") and
> > 325e18817668 ("ath9k: fix misc debugfs when not using chan context")
> > from the net-next tree and commit 631bee257bd5 ("ath: use seq_file api
> > for ath9k debugfs files") from the driver-core tree.
> > 
> > I fixed it up (see below) and can carry the fix as necessary (no action
> > is required).
> > 
> > Greg, I am not sure why those 2 commits are even in your tree.  Do they
> > depend on something else in your tree?
> 
> They do. The three commits below are related:
> 
> d32394f ath: ath9k: use debugfs_create_devm_seqfile() helper for
> seq_file entrie
> 631bee2 ath: use seq_file api for ath9k debugfs files
> 98210b7 debugfs: add helper function to create device related seq_file
> 
> The ath patches were made to provide example of using the new helper
> function and get some idea about code savings. Greg and John discussed
> who would take them. I noticed other ath changes in net-next so I kinda
> expected this email ;-)
> 
> Regards,
> Arend
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

I just ran in to a problem with one of these commits.

On an Acer C720 laptop if a suspend is performed the screen freezes,
the machine locks up, and according to the indicator lights it does
not enter suspend.  A hard reset is required to get it running again.

I have bisected the kernel and found that the following is the first bad
commit.

  commit d32394fae95741d733b174ec1446f27765f80233
  Author: Arend van Spriel <arend@broadcom.com>
  Date:   Sun Nov 9 11:32:00 2014 +0100
  
      ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file
  entries
      
      Use the helper to get rid of the file operations per debugfs file.
  The
      struct ath9k_softc pointer is set as device driver data to be
  obtained
      in the seq_file read operation.
      
      Signed-off-by: Arend van Spriel <arend@broadcom.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Let me know if I can do anything else to help.

-- 
- Jeremiah Mahler

^ permalink raw reply

* [PATCH] bpf: arm64: lift restriction on last instruction
From: Zi Shen Lim @ 2014-12-03  8:38 UTC (permalink / raw)
  To: Alexei Starovoitov, David S. Miller, Catalin Marinas, Will Deacon
  Cc: Zi Shen Lim, Daniel Borkmann, netdev, linux-arm-kernel,
	linux-kernel

Earlier implementation assumed last instruction is BPF_EXIT.
Since this is no longer a restriction in eBPF, we remove this
limitation.

Per Alexei Starovoitov [1]:
> classic BPF has a restriction that last insn is always BPF_RET.
> eBPF doesn't have BPF_RET instruction and this restriction.
> It has BPF_EXIT insn which can appear anywhere in the program
> one or more times and it doesn't have to be last insn.

[1] https://lkml.org/lkml/2014/11/27/2

Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
---
 arch/arm64/net/bpf_jit_comp.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 41f1e3e..edba042 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -60,7 +60,7 @@ struct jit_ctx {
 	const struct bpf_prog *prog;
 	int idx;
 	int tmp_used;
-	int body_offset;
+	int epilogue_offset;
 	int *offset;
 	u32 *image;
 };
@@ -130,8 +130,8 @@ static void jit_fill_hole(void *area, unsigned int size)
 
 static inline int epilogue_offset(const struct jit_ctx *ctx)
 {
-	int to = ctx->offset[ctx->prog->len - 1];
-	int from = ctx->idx - ctx->body_offset;
+	int to = ctx->epilogue_offset;
+	int from = ctx->idx;
 
 	return to - from;
 }
@@ -463,6 +463,8 @@ emit_cond_jmp:
 	}
 	/* function return */
 	case BPF_JMP | BPF_EXIT:
+		/* Optimization: when last instruction is EXIT,
+		   simply fallthrough to epilogue. */
 		if (i == ctx->prog->len - 1)
 			break;
 		jmp_offset = epilogue_offset(ctx);
@@ -685,11 +687,13 @@ void bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* 1. Initial fake pass to compute ctx->idx. */
 
-	/* Fake pass to fill in ctx->offset. */
+	/* Fake pass to fill in ctx->offset and ctx->tmp_used. */
 	if (build_body(&ctx))
 		goto out;
 
 	build_prologue(&ctx);
+
+	ctx.epilogue_offset = ctx.idx;
 	build_epilogue(&ctx);
 
 	/* Now we know the actual image size. */
@@ -706,7 +710,6 @@ void bpf_int_jit_compile(struct bpf_prog *prog)
 
 	build_prologue(&ctx);
 
-	ctx.body_offset = ctx.idx;
 	if (build_body(&ctx)) {
 		bpf_jit_binary_free(header);
 		goto out;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Michael S. Tsirkin @ 2014-12-03  9:03 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Wang,
	Du, Fan, fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <CAEP_g=-86Z6pxNow-wjnbx_v9er_TSn6x5waigqVqYHa7tEQJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
> >> > > What about containers or any other virtualization environment that
> >> > > doesn't use Virtio?
> >> >
> >> > The host can dictate the MTU in that case for both veth or OVS
> >> > internal which would be primary container plumbing techniques.
> >>
> >> It typically can't do this easily for VMs with emulated devices:
> >> real ethernet uses a fixed MTU.
> >>
> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
> >> an unrelated optimization.
> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
> >
> > PMTU discovery only resolves the issue if an actual IP stack is
> > running inside the VM. This may not be the case at all.
> 
> It's also only really a correct thing to do if the ICMP packet is
> coming from an L3 node. If you are doing straight bridging then you
> have to resort to hacks like OVS had before, which I agree are not
> particularly desirable.

The issue seems to be that fundamentally, this is
bridging interfaces with variable MTUs (even if MTU values
on devices don't let us figure this out)-
that is already not straight bridging, and
I would argue sending ICMPs back is the right thing to do.


> > I agree that exposing an MTU towards the guest is not applicable
> > in all situations, in particular because it is difficult to decide
> > what MTU to expose. It is a relatively elegant solution in a lot
> > of virtualization host cases hooked up to an orchestration system
> > though.
> 
> I also think this is the right thing to do as a common case
> optimization and I know other platforms (such as Hyper-V) do it. It's
> not a complete solution so we still need the original patch in this
> thread to handle things transparently.

Well, as I believe David (and independently Jason) is saying, it looks like
the ICMPs we are sending back after applying the original patch have the
wrong MTU.

And if I understand what David is saying here, IP is also the wrong place to
do it.

-- 
MST
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply

* RE: [PATCH net-next] r8152: reduce memory copy for rx
From: Hayes Wang @ 2014-12-03  9:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org
In-Reply-To: <1417590925.5303.127.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> Sent: Wednesday, December 03, 2014 3:15 PM
[...]
> Have you tried using more concurrent RX flows, in a possibly lossy
> environment (so that TCP is forced to queue packets in out of order
> queue) ?

I don't do the test. I would check it next time.

> skb cloning prevents GRO and TCP coalescing from working.
> 
> netfilter might also be forced to copy whole frame in case a mangle is
> needed (eg with NAT ...)
> 
> I would rather try to implement GRO, and/or using fragments instead of
> pure linear skbs.
> 
> (skb->head would be around 128 or 256 bytes, and you attach to skb the
> frame as a page fragment)

Thanks for your response. I would study the GRO first.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: [PATCH] net: less interrupt masking in NAPI
From: Yang Yingliang @ 2014-12-03  9:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, willemb
In-Reply-To: <1417592482.5303.132.camel@edumazet-glaptop2.roam.corp.google.com>

On 2014/12/3 15:41, Eric Dumazet wrote:
> On Wed, 2014-12-03 at 15:31 +0800, Yang Yingliang wrote:
> 
>> This patch can resolve my performance problem.
>> Will/Can this patch queue for stable ?
> 
> Hmm.. please give us more details, I am surprised it can have
> a big impact.
> 
> 
> 
> 
Before this patch, when a large network flow arrives, some other processes
response slowly or even don't response because the cpu is dealing with softirq.

After this patch, under pressure, much more softirq is doing in ksoftirqd. The other
processes be scheduled.

My system has dual core.

Regards,
Yang

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: don't do header check for dodgy gso packets
From: Michael S. Tsirkin @ 2014-12-03  9:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1417588844-11095-1-git-send-email-jasowang@redhat.com>

On Wed, Dec 03, 2014 at 02:40:44PM +0800, Jason Wang wrote:
> There's no need to do header check for virito-net since:

s/virito/virtio/

> 
> - Host set dodgy for all gso packets from guest and check the header.

s/set/sets/

> - Host should prepare for all kinds of evil packets from guest, since

s/prepare/be prepared/

>   malicious guest can send any kinds of packet.
> 
> So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

with the comment fixes:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b0bc8ea..4cd242b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1760,6 +1760,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
>  			dev->hw_features |= NETIF_F_TSO_ECN;
>  
> +		dev->features |= NETIF_F_GSO_ROBUST;
> +
>  		if (gso)
>  			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
>  		/* (!csum && gso) case will be fixed by register_netdev() */
> -- 
> 1.9.1
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] net: allow large number of rx queues
From: Michael S. Tsirkin @ 2014-12-03  9:42 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	jiri, sergei.shtylyov
In-Reply-To: <1417591177-7985-2-git-send-email-pagupta@redhat.com>

On Wed, Dec 03, 2014 at 12:49:36PM +0530, Pankaj Gupta wrote:
> netif_alloc_rx_queues() uses kcalloc() to allocate memory
> for "struct netdev_queue *_rx" array.
> If we are doing large rx queue allocation kcalloc() might
> fail, so this patch does a fallback to vzalloc().
> Similar implementation is done for tx queue allocation in
> netif_alloc_netdev_queues().
> 
> We avoid failure of high order memory allocation
> with the help of vzalloc(), this allows us to do large
> rx and tx queue allocation which in turn helps us to
> increase the number of queues in tun.
> 
> As vmalloc() adds overhead on a critical network path,
> __GFP_REPEAT flag is used with kzalloc() to do this fallback
> only when really needed.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>
> ---
>  net/core/dev.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e916ba8..abe9560 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6059,17 +6059,25 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
>  EXPORT_SYMBOL(netif_stacked_transfer_operstate);
>  
>  #ifdef CONFIG_SYSFS
> +static void netif_free_rx_queues(struct net_device *dev)
> +{
> +	kvfree(dev->_rx);
> +}
> +

I would just open-code this.

>  static int netif_alloc_rx_queues(struct net_device *dev)
>  {
>  	unsigned int i, count = dev->num_rx_queues;
>  	struct netdev_rx_queue *rx;
> +	size_t sz = count * sizeof(*rx);
>  
>  	BUG_ON(count < 1);
>  
> -	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
> -	if (!rx)
> -		return -ENOMEM;
> -
> +	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +	if (!rx) {
> +		rx = vzalloc(sz);
> +		if (!rx)
> +			return -ENOMEM;
> +	}
>  	dev->_rx = rx;
>  
>  	for (i = 0; i < count; i++)
> @@ -6698,9 +6706,8 @@ void free_netdev(struct net_device *dev)
>  
>  	netif_free_tx_queues(dev);
>  #ifdef CONFIG_SYSFS
> -	kfree(dev->_rx);
> +	netif_free_rx_queues(dev);
>  #endif
> -

and I think it's nicer with the empty line.

>  	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
>  
>  	/* Flush device addresses */
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
From: Michael S. Tsirkin @ 2014-12-03  9:52 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	jiri, sergei.shtylyov
In-Reply-To: <1417591177-7985-3-git-send-email-pagupta@redhat.com>

On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
> Networking under kvm works best if we allocate a per-vCPU RX and TX
> queue in a virtual NIC. This requires a per-vCPU queue on the host side.
> 
> It is now safe to increase the maximum number of queues.
> Preceding patche: 'net: allow large number of rx queues'

s/patche/patch/

> made sure this won't cause failures due to high order memory
> allocations. Increase it to 256: this is the max number of vCPUs
> KVM supports.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>

Hmm it's kind of nasty that each tun device is now using x16 memory.
Maybe we should look at using a flex array instead, and removing the
limitation altogether (e.g. make it INT_MAX)?


> ---
>  drivers/net/tun.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e3fa65a..a19dc5f8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -113,10 +113,11 @@ struct tap_filter {
>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>  };
>  
> -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
> - * the netdevice to be fit in one page. So we can make sure the success of
> - * memory allocation. TODO: increase the limit. */
> -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
> +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
> + * to max number of vCPUS in guest. Also, we are making sure here
> + * queue memory allocation do not fail.

It's not queue memory allocation anymore, is it?
I would say "
This also helps the tfiles field fit in 4K, so the whole tun
device only needs an order-1 allocation.
"

> + */
> +#define MAX_TAP_QUEUES 256
>  #define MAX_TAP_FLOWS  4096
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 2/5] xfrm: add XFRMA_REPLAY_VAL attribute to SA messages
From: Steffen Klassert @ 2014-12-03 10:04 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1417601101-18625-1-git-send-email-steffen.klassert@secunet.com>

From: dingzhi <zhi.ding@6wind.com>

After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay
value is defined. Thus sequence number values are always notified to userspace.

Signed-off-by: dingzhi <zhi.ding@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index e812e98..8128594 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -824,13 +824,15 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 	ret = xfrm_mark_put(skb, &x->mark);
 	if (ret)
 		goto out;
-	if (x->replay_esn) {
+	if (x->replay_esn)
 		ret = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
 			      xfrm_replay_state_esn_len(x->replay_esn),
 			      x->replay_esn);
-		if (ret)
-			goto out;
-	}
+	else
+		ret = nla_put(skb, XFRMA_REPLAY_VAL, sizeof(x->replay),
+			      &x->replay);
+	if (ret)
+		goto out;
 	if (x->security)
 		ret = copy_sec_ctx(x->security, skb);
 out:
@@ -2569,6 +2571,8 @@ static inline size_t xfrm_sa_len(struct xfrm_state *x)
 		l += nla_total_size(sizeof(x->tfcpad));
 	if (x->replay_esn)
 		l += nla_total_size(xfrm_replay_state_esn_len(x->replay_esn));
+	else
+		l += nla_total_size(sizeof(struct xfrm_replay_state));
 	if (x->security)
 		l += nla_total_size(sizeof(struct xfrm_user_sec_ctx) +
 				    x->security->ctx_len);
-- 
1.9.1

^ permalink raw reply related

* pull request (net-next): ipsec-next 2014-12-03
From: Steffen Klassert @ 2014-12-03 10:04 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Fix a set but not used warning. From Fabian Frederick.

2) Currently we make sequence number values available to userspace
   only if we use ESN. Make the sequence number values also available
   for non ESN states. From Zhi Ding.

3) Remove socket policy hashing. We don't need it because socket
   policies are always looked up via a linked list. From Herbert Xu.

4) After removing socket policy hashing, we can use __xfrm_policy_link
   in xfrm_policy_insert. From Herbert Xu.

5) Add a lookup method for vti6 tunnels with wildcard endpoints.
   I forgot this when I initially implemented vti6.
   
Please pull or let me know if there are problems.

Thanks!

The following changes since commit 49cc91f919e2a94df8c9f99aae9b405444e88624:

  Merge branch 's390-next' (2014-10-26 22:21:45 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to fbe68ee87522f6eaa10f9076c0a7117e1613f2f7:

  vti6: Add a lookup method for tunnels with wildcard endpoints. (2014-11-20 10:03:07 +0100)

----------------------------------------------------------------
Fabian Frederick (1):
      xfrm: fix set but not used warning in xfrm_policy_queue_process()

Herbert Xu (2):
      xfrm: Do not hash socket policies
      xfrm: Use __xfrm_policy_link in xfrm_policy_insert

Steffen Klassert (1):
      vti6: Add a lookup method for tunnels with wildcard endpoints.

dingzhi (1):
      xfrm: add XFRMA_REPLAY_VAL attribute to SA messages

 include/net/netns/xfrm.h |  4 ++--
 net/ipv6/ip6_vti.c       | 17 ++++++++++++++++
 net/xfrm/xfrm_policy.c   | 52 +++++++++++++++++++++++++++---------------------
 net/xfrm/xfrm_user.c     | 12 +++++++----
 4 files changed, 56 insertions(+), 29 deletions(-)

^ permalink raw reply

* [PATCH 3/5] xfrm: Do not hash socket policies
From: Steffen Klassert @ 2014-12-03 10:04 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1417601101-18625-1-git-send-email-steffen.klassert@secunet.com>

From: Herbert Xu <herbert@gondor.apana.org.au>

Back in 2003 when I added policy expiration, I half-heartedly
did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
could be reused for all policies.  I never actually got around
to using __xfrm_policy_link for non-socket policies.

Later on hashing was added to all xfrm policies, including socket
policies.  In fact, we don't need hashing on socket policies at
all since they're always looked up via a linked list.

This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
that it's obvious we're dealing with socket policies.

This patch also removes hashing from __xfrm_policy_link as for
now it's only used by socket policies which do not need to be
hashed.  Ironically this will in fact allow us to use this helper
for non-socket policies which I shall do later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/netns/xfrm.h |  4 ++--
 net/xfrm/xfrm_policy.c   | 44 ++++++++++++++++++++++++++------------------
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 9da7982..730d82a 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -50,8 +50,8 @@ struct netns_xfrm {
 	struct list_head	policy_all;
 	struct hlist_head	*policy_byidx;
 	unsigned int		policy_idx_hmask;
-	struct hlist_head	policy_inexact[XFRM_POLICY_MAX * 2];
-	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX * 2];
+	struct hlist_head	policy_inexact[XFRM_POLICY_MAX];
+	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX];
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
 	struct xfrm_policy_hthresh policy_hthresh;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index dc65324..f4d3a12 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -561,7 +561,7 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_lock(&hash_resize_mutex);
 
 	total = 0;
-	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+	for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
 		if (xfrm_bydst_should_resize(net, dir, &total))
 			xfrm_bydst_resize(net, dir);
 	}
@@ -601,7 +601,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 	write_lock_bh(&net->xfrm.xfrm_policy_lock);
 
 	/* reset the bydst and inexact table in all directions */
-	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+	for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
 		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
 		hmask = net->xfrm.policy_bydst[dir].hmask;
 		odst = net->xfrm.policy_bydst[dir].table;
@@ -1247,17 +1247,10 @@ out:
 static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
 {
 	struct net *net = xp_net(pol);
-	struct hlist_head *chain = policy_hash_bysel(net, &pol->selector,
-						     pol->family, dir);
 
 	list_add(&pol->walk.all, &net->xfrm.policy_all);
-	hlist_add_head(&pol->bydst, chain);
-	hlist_add_head(&pol->byidx, net->xfrm.policy_byidx+idx_hash(net, pol->index));
 	net->xfrm.policy_count[dir]++;
 	xfrm_pol_hold(pol);
-
-	if (xfrm_bydst_should_resize(net, dir, NULL))
-		schedule_work(&net->xfrm.policy_hash_work);
 }
 
 static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
@@ -1265,17 +1258,31 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 {
 	struct net *net = xp_net(pol);
 
-	if (hlist_unhashed(&pol->bydst))
+	if (list_empty(&pol->walk.all))
 		return NULL;
 
-	hlist_del_init(&pol->bydst);
-	hlist_del(&pol->byidx);
-	list_del(&pol->walk.all);
+	/* Socket policies are not hashed. */
+	if (!hlist_unhashed(&pol->bydst)) {
+		hlist_del(&pol->bydst);
+		hlist_del(&pol->byidx);
+	}
+
+	list_del_init(&pol->walk.all);
 	net->xfrm.policy_count[dir]--;
 
 	return pol;
 }
 
+static void xfrm_sk_policy_link(struct xfrm_policy *pol, int dir)
+{
+	__xfrm_policy_link(pol, XFRM_POLICY_MAX + dir);
+}
+
+static void xfrm_sk_policy_unlink(struct xfrm_policy *pol, int dir)
+{
+	__xfrm_policy_unlink(pol, XFRM_POLICY_MAX + dir);
+}
+
 int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
 {
 	struct net *net = xp_net(pol);
@@ -1307,7 +1314,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
 	if (pol) {
 		pol->curlft.add_time = get_seconds();
 		pol->index = xfrm_gen_index(net, XFRM_POLICY_MAX+dir, 0);
-		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
+		xfrm_sk_policy_link(pol, dir);
 	}
 	if (old_pol) {
 		if (pol)
@@ -1316,7 +1323,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
 		/* Unlinking succeeds always. This is the only function
 		 * allowed to delete or replace socket policy.
 		 */
-		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
+		xfrm_sk_policy_unlink(old_pol, dir);
 	}
 	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
 
@@ -1349,7 +1356,7 @@ static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir)
 		memcpy(newp->xfrm_vec, old->xfrm_vec,
 		       newp->xfrm_nr*sizeof(struct xfrm_tmpl));
 		write_lock_bh(&net->xfrm.xfrm_policy_lock);
-		__xfrm_policy_link(newp, XFRM_POLICY_MAX+dir);
+		xfrm_sk_policy_link(newp, dir);
 		write_unlock_bh(&net->xfrm.xfrm_policy_lock);
 		xfrm_pol_put(newp);
 	}
@@ -2965,10 +2972,11 @@ static int __net_init xfrm_policy_init(struct net *net)
 		goto out_byidx;
 	net->xfrm.policy_idx_hmask = hmask;
 
-	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+	for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
 		struct xfrm_policy_hash *htab;
 
 		net->xfrm.policy_count[dir] = 0;
+		net->xfrm.policy_count[XFRM_POLICY_MAX + dir] = 0;
 		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
 
 		htab = &net->xfrm.policy_bydst[dir];
@@ -3020,7 +3028,7 @@ static void xfrm_policy_fini(struct net *net)
 
 	WARN_ON(!list_empty(&net->xfrm.policy_all));
 
-	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+	for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
 		struct xfrm_policy_hash *htab;
 
 		WARN_ON(!hlist_empty(&net->xfrm.policy_inexact[dir]));
-- 
1.9.1

^ permalink raw reply related

* [PATCH 4/5] xfrm: Use __xfrm_policy_link in xfrm_policy_insert
From: Steffen Klassert @ 2014-12-03 10:05 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1417601101-18625-1-git-send-email-steffen.klassert@secunet.com>

From: Herbert Xu <herbert@gondor.apana.org.au>

For a long time we couldn't actually use __xfrm_policy_link in
xfrm_policy_insert because the latter wanted to do hashing at
a specific position.

Now that __xfrm_policy_link no longer does hashing it can now
be safely used in xfrm_policy_insert to kill some duplicate code,
finally reuniting general policies with socket policies.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index f4d3a12..178fa0e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -55,6 +55,7 @@ static int stale_bundle(struct dst_entry *dst);
 static int xfrm_bundle_ok(struct xfrm_dst *xdst);
 static void xfrm_policy_queue_process(unsigned long arg);
 
+static void __xfrm_policy_link(struct xfrm_policy *pol, int dir);
 static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 						int dir);
 
@@ -779,8 +780,7 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 		hlist_add_behind(&policy->bydst, newpos);
 	else
 		hlist_add_head(&policy->bydst, chain);
-	xfrm_pol_hold(policy);
-	net->xfrm.policy_count[dir]++;
+	__xfrm_policy_link(policy, dir);
 	atomic_inc(&net->xfrm.flow_cache_genid);
 
 	/* After previous checking, family can either be AF_INET or AF_INET6 */
@@ -799,7 +799,6 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 	policy->curlft.use_time = 0;
 	if (!mod_timer(&policy->timer, jiffies + HZ))
 		xfrm_pol_hold(policy);
-	list_add(&policy->walk.all, &net->xfrm.policy_all);
 	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
 
 	if (delpol)
-- 
1.9.1

^ permalink raw reply related

* [PATCH 5/5] vti6: Add a lookup method for tunnels with wildcard endpoints.
From: Steffen Klassert @ 2014-12-03 10:05 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1417601101-18625-1-git-send-email-steffen.klassert@secunet.com>

Currently we can't lookup tunnels with wildcard endpoints.
This patch adds a method to lookup these tunnels in the
receive path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/ip6_vti.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d440bb5..6101919 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -95,6 +95,7 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	unsigned int hash = HASH(remote, local);
 	struct ip6_tnl *t;
 	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct in6_addr any;
 
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
@@ -102,6 +103,22 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
+
+	memset(&any, 0, sizeof(any));
+	hash = HASH(&any, local);
+	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
+		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    (t->dev->flags & IFF_UP))
+			return t;
+	}
+
+	hash = HASH(remote, &any);
+	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
+		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    (t->dev->flags & IFF_UP))
+			return t;
+	}
+
 	t = rcu_dereference(ip6n->tnls_wc[0]);
 	if (t && (t->dev->flags & IFF_UP))
 		return t;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/5] xfrm: fix set but not used warning in xfrm_policy_queue_process()
From: Steffen Klassert @ 2014-12-03 10:04 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1417601101-18625-1-git-send-email-steffen.klassert@secunet.com>

From: Fabian Frederick <fabf@skynet.be>

err was set but unused.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4c4e457..dc65324 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1878,7 +1878,6 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 
 static void xfrm_policy_queue_process(unsigned long arg)
 {
-	int err = 0;
 	struct sk_buff *skb;
 	struct sock *sk;
 	struct dst_entry *dst;
@@ -1941,7 +1940,7 @@ static void xfrm_policy_queue_process(unsigned long arg)
 		skb_dst_drop(skb);
 		skb_dst_set(skb, dst);
 
-		err = dst_output(skb);
+		dst_output(skb);
 	}
 
 out:
-- 
1.9.1

^ permalink raw reply related

* [patch] ipvs: uninitialized data with IP_VS_IPV6
From: Dan Carpenter @ 2014-12-03 10:12 UTC (permalink / raw)
  To: Wensong Zhang
  Cc: Simon Horman, Julian Anastasov, Pablo Neira Ayuso,
	Patrick McHardy, Jozsef Kadlecsik, David S. Miller, netdev,
	lvs-devel, netfilter-devel, coreteam, kernel-janitors

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
This bug is very old.

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 1d5341f..f93f974 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -183,6 +183,8 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 	struct nf_conn *ct;
 	struct net *net;
 
+	*diff = 0;
+
 #ifdef CONFIG_IP_VS_IPV6
 	/* This application helper doesn't work with IPv6 yet,
 	 * so turn this into a no-op for IPv6 packets
@@ -191,8 +193,6 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 		return 1;
 #endif
 
-	*diff = 0;
-
 	/* Only useful for established sessions */
 	if (cp->state != IP_VS_TCP_S_ESTABLISHED)
 		return 1;

^ permalink raw reply related

* [patch net-next 1/2] rocker: introduce be put/get variants and use it when appropriate
From: Jiri Pirko @ 2014-12-03 10:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfeldma

This kills the sparse warnings.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 79 ++++++++++++++++++++++--------------
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index fded127..4b060fb 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -648,6 +648,11 @@ static u16 rocker_tlv_get_u16(const struct rocker_tlv *tlv)
 	return *(u16 *) rocker_tlv_data(tlv);
 }
 
+static __be16 rocker_tlv_get_be16(const struct rocker_tlv *tlv)
+{
+	return *(__be16 *) rocker_tlv_data(tlv);
+}
+
 static u32 rocker_tlv_get_u32(const struct rocker_tlv *tlv)
 {
 	return *(u32 *) rocker_tlv_data(tlv);
@@ -726,12 +731,24 @@ static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
 	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
 }
 
+static int rocker_tlv_put_be16(struct rocker_desc_info *desc_info,
+			       int attrtype, __be16 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &value);
+}
+
 static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
 			      int attrtype, u32 value)
 {
 	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
 }
 
+static int rocker_tlv_put_be32(struct rocker_desc_info *desc_info,
+			       int attrtype, __be32 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &value);
+}
+
 static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
 			      int attrtype, u64 value)
 {
@@ -1343,7 +1360,7 @@ static int rocker_event_mac_vlan_seen(struct rocker *rocker,
 	port_number =
 		rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT]) - 1;
 	addr = rocker_tlv_data(attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC]);
-	vlan_id = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
+	vlan_id = rocker_tlv_get_be16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
 
 	if (port_number >= rocker->port_count)
 		return -EINVAL;
@@ -1717,18 +1734,18 @@ static int rocker_cmd_flow_tbl_add_vlan(struct rocker_desc_info *desc_info,
 	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
 			       entry->key.vlan.in_lport))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->key.vlan.vlan_id))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->key.vlan.vlan_id))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
-			       entry->key.vlan.vlan_id_mask))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+				entry->key.vlan.vlan_id_mask))
 		return -EMSGSIZE;
 	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
 			       entry->key.vlan.goto_tbl))
 		return -EMSGSIZE;
 	if (entry->key.vlan.untagged &&
-	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
-			       entry->key.vlan.new_vlan_id))
+	    rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
+				entry->key.vlan.new_vlan_id))
 		return -EMSGSIZE;
 
 	return 0;
@@ -1743,8 +1760,8 @@ static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
 	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
 			       entry->key.term_mac.in_lport_mask))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
-			       entry->key.term_mac.eth_type))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+				entry->key.term_mac.eth_type))
 		return -EMSGSIZE;
 	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
 			   ETH_ALEN, entry->key.term_mac.eth_dst))
@@ -1752,11 +1769,11 @@ static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
 	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
 			   ETH_ALEN, entry->key.term_mac.eth_dst_mask))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->key.term_mac.vlan_id))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->key.term_mac.vlan_id))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
-			       entry->key.term_mac.vlan_id_mask))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+				entry->key.term_mac.vlan_id_mask))
 		return -EMSGSIZE;
 	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
 			       entry->key.term_mac.goto_tbl))
@@ -1773,14 +1790,14 @@ static int
 rocker_cmd_flow_tbl_add_ucast_routing(struct rocker_desc_info *desc_info,
 				      struct rocker_flow_tbl_entry *entry)
 {
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
-			       entry->key.ucast_routing.eth_type))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+				entry->key.ucast_routing.eth_type))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
-			       entry->key.ucast_routing.dst4))
+	if (rocker_tlv_put_be32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
+				entry->key.ucast_routing.dst4))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
-			       entry->key.ucast_routing.dst4_mask))
+	if (rocker_tlv_put_be32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
+				entry->key.ucast_routing.dst4_mask))
 		return -EMSGSIZE;
 	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
 			       entry->key.ucast_routing.goto_tbl))
@@ -1804,8 +1821,8 @@ static int rocker_cmd_flow_tbl_add_bridge(struct rocker_desc_info *desc_info,
 			   ETH_ALEN, entry->key.bridge.eth_dst_mask))
 		return -EMSGSIZE;
 	if (entry->key.bridge.vlan_id &&
-	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->key.bridge.vlan_id))
+	    rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->key.bridge.vlan_id))
 		return -EMSGSIZE;
 	if (entry->key.bridge.tunnel_id &&
 	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_TUNNEL_ID,
@@ -1846,14 +1863,14 @@ static int rocker_cmd_flow_tbl_add_acl(struct rocker_desc_info *desc_info,
 	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
 			   ETH_ALEN, entry->key.acl.eth_dst_mask))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
-			       entry->key.acl.eth_type))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+				entry->key.acl.eth_type))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->key.acl.vlan_id))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->key.acl.vlan_id))
 		return -EMSGSIZE;
-	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
-			       entry->key.acl.vlan_id_mask))
+	if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+				entry->key.acl.vlan_id_mask))
 		return -EMSGSIZE;
 
 	switch (ntohs(entry->key.acl.eth_type)) {
@@ -2002,8 +2019,8 @@ rocker_cmd_group_tbl_add_l2_rewrite(struct rocker_desc_info *desc_info,
 			   ETH_ALEN, entry->l2_rewrite.eth_dst))
 		return -EMSGSIZE;
 	if (entry->l2_rewrite.vlan_id &&
-	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->l2_rewrite.vlan_id))
+	    rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->l2_rewrite.vlan_id))
 		return -EMSGSIZE;
 
 	return 0;
@@ -2048,8 +2065,8 @@ rocker_cmd_group_tbl_add_l3_unicast(struct rocker_desc_info *desc_info,
 			   ETH_ALEN, entry->l3_unicast.eth_dst))
 		return -EMSGSIZE;
 	if (entry->l3_unicast.vlan_id &&
-	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
-			       entry->l3_unicast.vlan_id))
+	    rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+				entry->l3_unicast.vlan_id))
 		return -EMSGSIZE;
 	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_TTL_CHECK,
 			      entry->l3_unicast.ttl_check))
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 2/2] rocker: fix eth_type tybe in struct rocker_ctrl
From: Jiri Pirko @ 2014-12-03 10:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfeldma
In-Reply-To: <1417602753-3084-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4b060fb..5536435 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -2753,7 +2753,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 static struct rocker_ctrl {
 	const u8 *eth_dst;
 	const u8 *eth_dst_mask;
-	u16 eth_type;
+	__be16 eth_type;
 	bool acl;
 	bool bridge;
 	bool term;
-- 
1.9.3

^ permalink raw reply related

* Re: [Discussion] About over-MTU-sized skb in virtualized env
From: Florian Westphal @ 2014-12-03 10:50 UTC (permalink / raw)
  To: Du Fan
  Cc: Florian Westphal, Thomas Graf, Michael S. Tsirkin, Jesse Gross,
	Flavio Leitner, davem@davemloft.net, pshelar, netdev,
	dev@openvswitch.org, Du, Fan
In-Reply-To: <547EB029.5010102@gmail.com>

Du Fan <fengyuleidian0615@gmail.com> wrote:
> Sorry for resend this mail, because my company email is rejected by netdev.
> 
> 
> Hi Florian
> 
>  214 static int ip_finish_output_gso(struct sk_buff *skb)
>  215 {
>  216     netdev_features_t features;
>  217     struct sk_buff *segs;
>  218     int ret = 0;
>  219
>  220     /* common case: locally created skb or seglen is <= mtu */
>  221     if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
>  222           skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
>  223         return ip_finish_output2(skb);
> 
> Could you please state _concrete_ reason why locally created skb
> length is _always_ fitting into MTU size? or why we needs this
> checking.

We don't "need" this checking.  Its just to avoid skb_gso_network_seglen()
computation for the common (local-out) case.

Locally generated GSO packet is not supposed to exceed dst_mtu, as that
is the PMTU discovery start point in absence of lower/learned value.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox