Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next v2 1/2] net: sched: fix skb->protocol use in case of accelerated vlan path
From: David Miller @ 2015-01-13 22:51 UTC (permalink / raw)
  To: jiri; +Cc: netdev, jhs, kaber
In-Reply-To: <1421165624-19882-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 13 Jan 2015 17:13:43 +0100

> tc code implicitly considers skb->protocol even in case of accelerated
> vlan paths and expects vlan protocol type here. However, on rx path,
> if the vlan header was already stripped, skb->protocol contains value
> of next header. Similar situation is on tx path.
> 
> So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.
> 
> Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

Applied.

^ permalink raw reply

* Re: [patch net-next 2/2] net: rename vlan_tx_* helpers since "tx" is misleading there
From: David Miller @ 2015-01-13 22:51 UTC (permalink / raw)
  To: jiri; +Cc: netdev, jhs, kaber
In-Reply-To: <1421165624-19882-2-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 13 Jan 2015 17:13:44 +0100

> The same macros are used for rx as well. So rename it.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Applied.

^ permalink raw reply

* Re: [net-next PATCH v2 02/12] net: flow_table: add flow, delete flow
From: Alexei Starovoitov @ 2015-01-13 23:00 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, simon.horman, Scott Feldman, netdev@vger.kernel.org,
	gerlitz.or@gmail.com, Jamal Hadi Salim, Andy Gospodarek,
	David S. Miller
In-Reply-To: <20150113213556.13874.41211.stgit@nitbit.x32>

On Tue, Jan 13, 2015 at 1:35 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Now that the device capabilities are exposed we can add support to
> add and delete flows from the tables.
>
> The two operations are
>
> table_set_flows :
>
>   The set flow operations is used to program a set of flows into a
>   hardware device table. The message is consumed via netlink encoded

should now netlink cmd be called table_set_rules ?
and s/flow/rule/ everywhere in commit log?

>   message which is then decoded into a null terminated  array of
>   flow entry structures. A flow entry structure is defined as
>
>      struct net_flow_flow {

commit log no longer matches implementation ;)
should be net_flow_rule ?

can you update your .html writeup?
I hope to see more real examples in there.

btw how the whole thing will work with queue splitting from
your other patch?

^ permalink raw reply

* Re: [PATCH] tcp: Fix RFC reference in comment
From: Banerjee, Debabrata @ 2015-01-13 23:17 UTC (permalink / raw)
  To: John Heffner
  Cc: Yuchung Cheng, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <CABrhC0m=aPO=OcfK_gTusHz7CU9OUeXAHoqUDaVTGLDiQnL56Q@mail.gmail.com>

On 1/13/15, 5:01 PM, "John Heffner" <johnwheffner@gmail.com> wrote:

>On Tue, Jan 13, 2015 at 4:42 PM, Banerjee, Debabrata
><dbanerje@akamai.com> wrote:
>> On 1/13/15, 4:36 PM, "Yuchung Cheng" <ycheng@google.com> wrote:
>>
>>>RFC2861 resets the cwnd like in RFC2581, but the rest of the code
>>>implements RFC2861. So I think the current comment is fine.
>>
>>
>> No RFC2861 is an experimental RFC that's implemented in
>> tcp_cwnd_application_limited(). RFC2861 Recommends reducing the cwnd by
>> averaging the current cwnd and the used cwnd as the new cwnd.
>>
>>
>> RFC2581 4.1 Says to set cwnd to initial cwnd if more than one rto has
>> passed since the last send. This is what is implemented in the function
>> above.
>
>Look at the code a little closer -- it's decaying cwnd based on number
>of timeouts as described in 2861, not resetting to IW as recommended
>in 2581.
>
>  -John


You're right it's not RFC2581 I was partially misled by the comment
(reset/restart window), but it doesn't appear to be doing what rfc2861 3.2
says either:

           For i=1  To (tcpnow - T_last)/RTO
	win =  min(cwnd, receiver's declared max window)
	cwnd =  max(win/2, MSS)


Versus:

u32 restart_cwnd = tcp_init_cwnd(tp, dst);

restart_cwnd = min(restart_cwnd, cwnd);

	while ((delta -= inet_csk(sk)->icsk_rto) > 0 && cwnd > restart_cwnd)
	cwnd >>= 1;
	tp->snd_cwnd = max(cwnd, restart_cwnd);


It's not using receiver window, it's using cwnd/init_cwnd, it should at
least be using tp->snd_wnd, no?.


I stumbled onto this because it looks like tcp_cwnd_application_limited()
doesn't execute when it should, because tp->snd_cwnd_stamp is being
touched much more often than in rfc2861 3.2. Something seems not right
here...

-Debabrata

^ permalink raw reply

* [PATCH RESEND v2] net: fec: fix MDIO bus assignement for dual fec SoC's
From: Stefan Agner @ 2015-01-13 23:20 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: shawn.guo-QSEj5FYQhm4dnm+yROfE0A,
	u.kleine-koenig-bIcnvbaLZ9MEGnE8C9+IrQ,
	fugang.duan-KZfg59tc24xl57MIdRCFDg,
	fabio.estevam-KZfg59tc24xl57MIdRCFDg, mark.rutland-5wv7dgnIgG8,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, pawel.moll-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, B38611-KZfg59tc24xl57MIdRCFDg,
	LW-bxm8fMRDkQLDiMYJYoSAnRvVK+yQ3ZXh,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	stefan-XLVq0VzYD2Y

On i.MX28, the MDIO bus is shared between the two FEC instances.
The driver makes sure that the second FEC uses the MDIO bus of the
first FEC. This is done conditionally if FEC_QUIRK_ENET_MAC is set.
However, in newer designs, such as Vybrid or i.MX6SX, each FEC MAC
has its own MDIO bus. Simply removing the quirk FEC_QUIRK_ENET_MAC
is not an option since other logic, triggered by this quirk, is
still needed.

Furthermore, there are board designs which use the same MDIO bus
for both PHY's even though the second bus would be available on the
SoC side. Such layout are popular since it saves pins on SoC side.
Due to the above quirk, those boards currently do work fine. The
boards in the mainline tree with such a layout are:
- Freescale Vybrid Tower with TWR-SER2 (vf610-twr.dts)
- Freescale i.MX6 SoloX SDB Board (imx6sx-sdb.dts)

This patch adds a new quirk FEC_QUIRK_SINGLE_MDIO for i.MX28, which
makes sure that the MDIO bus of the first FEC is used in any case.

However, the boards above do have a SoC with a MDIO bus for each FEC
instance. But the PHY's are not connected in a 1:1 configuration. A
proper device tree description is needed to allow the driver to
figure out where to find its PHY. This patch fixes that shortcoming
by adding a MDIO bus child node to the first FEC instance, along
with the two PHY's on that bus, and making use of the phy-handle
property to add a reference to the PHY's.

Acked-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Stefan Agner <stefan-XLVq0VzYD2Y@public.gmane.org>
---
Sorry for this resend, I forgot the netdev mailing list...

There are some discussion around this, see:
https://lkml.org/lkml/2015/1/9/338

I prefer to have the changes in one patch since they are inherently
connected to each other. Shawn seems to be ok with that...

 arch/arm/boot/dts/imx6sx-sdb.dts          | 15 +++++++++++++++
 arch/arm/boot/dts/vf610-twr.dts           | 15 +++++++++++++++
 drivers/net/ethernet/freescale/fec.h      |  2 ++
 drivers/net/ethernet/freescale/fec_main.c |  9 +++++----
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/imx6sx-sdb.dts b/arch/arm/boot/dts/imx6sx-sdb.dts
index 1e6e5cc..8c1febd 100644
--- a/arch/arm/boot/dts/imx6sx-sdb.dts
+++ b/arch/arm/boot/dts/imx6sx-sdb.dts
@@ -159,13 +159,28 @@
 	pinctrl-0 = <&pinctrl_enet1>;
 	phy-supply = <&reg_enet_3v3>;
 	phy-mode = "rgmii";
+	phy-handle = <&ethphy1>;
 	status = "okay";
+
+	mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		ethphy1: ethernet-phy@0 {
+			reg = <0>;
+		};
+
+		ethphy2: ethernet-phy@1 {
+			reg = <1>;
+		};
+	};
 };
 
 &fec2 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_enet2>;
 	phy-mode = "rgmii";
+	phy-handle = <&ethphy2>;
 	status = "okay";
 };
 
diff --git a/arch/arm/boot/dts/vf610-twr.dts b/arch/arm/boot/dts/vf610-twr.dts
index a0f7621..f2b64b1 100644
--- a/arch/arm/boot/dts/vf610-twr.dts
+++ b/arch/arm/boot/dts/vf610-twr.dts
@@ -129,13 +129,28 @@
 
 &fec0 {
 	phy-mode = "rmii";
+	phy-handle = <&ethphy0>;
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_fec0>;
 	status = "okay";
+
+	mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		ethphy0: ethernet-phy@0 {
+			reg = <0>;
+		};
+
+		ethphy1: ethernet-phy@1 {
+			reg = <1>;
+		};
+	};
 };
 
 &fec1 {
 	phy-mode = "rmii";
+	phy-handle = <&ethphy1>;
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_fec1>;
 	status = "okay";
diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 469691a..4013292 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -424,6 +424,8 @@ struct bufdesc_ex {
  * (40ns * 6).
  */
 #define FEC_QUIRK_BUG_CAPTURE		(1 << 10)
+/* Controller has only one MDIO bus */
+#define FEC_QUIRK_SINGLE_MDIO		(1 << 11)
 
 struct fec_enet_priv_tx_q {
 	int index;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 5ebdf8d..8dc0391 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -91,7 +91,8 @@ static struct platform_device_id fec_devtype[] = {
 		.driver_data = 0,
 	}, {
 		.name = "imx28-fec",
-		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME,
+		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME |
+				FEC_QUIRK_SINGLE_MDIO,
 	}, {
 		.name = "imx6q-fec",
 		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT |
@@ -1937,7 +1938,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	int err = -ENXIO, i;
 
 	/*
-	 * The dual fec interfaces are not equivalent with enet-mac.
+	 * The i.MX28 dual fec interfaces are not equal.
 	 * Here are the differences:
 	 *
 	 *  - fec0 supports MII & RMII modes while fec1 only supports RMII
@@ -1952,7 +1953,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	 * mdio interface in board design, and need to be configured by
 	 * fec0 mii_bus.
 	 */
-	if ((fep->quirks & FEC_QUIRK_ENET_MAC) && fep->dev_id > 0) {
+	if ((fep->quirks & FEC_QUIRK_SINGLE_MDIO) && fep->dev_id > 0) {
 		/* fec1 uses fec0 mii_bus */
 		if (mii_cnt && fec0_mii_bus) {
 			fep->mii_bus = fec0_mii_bus;
@@ -2015,7 +2016,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	mii_cnt++;
 
 	/* save fec0 mii_bus */
-	if (fep->quirks & FEC_QUIRK_ENET_MAC)
+	if (fep->quirks & FEC_QUIRK_SINGLE_MDIO)
 		fec0_mii_bus = fep->mii_bus;
 
 	return 0;
-- 
2.2.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 2/6] net: davinci_emac: Fix runtime pm calls for davinci_emac
From: Mark A. Greer @ 2015-01-13 23:42 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Felipe Balbi, David Miller, netdev, linux-omap, Brian Hutchinson
In-Reply-To: <20150113205439.GK2419@atomide.com>

On Tue, Jan 13, 2015 at 12:54:40PM -0800, Tony Lindgren wrote:
> * Felipe Balbi <balbi@ti.com> [150113 11:55]:
> > On Tue, Jan 13, 2015 at 11:29:24AM -0800, Tony Lindgren wrote:
> > > --- a/drivers/net/ethernet/ti/davinci_emac.c
> > > +++ b/drivers/net/ethernet/ti/davinci_emac.c
> > > @@ -1538,7 +1538,7 @@ static int emac_dev_open(struct net_device *ndev)
> > >  	int i = 0;
> > >  	struct emac_priv *priv = netdev_priv(ndev);
> > >  
> > > -	pm_runtime_get(&priv->pdev->dev);
> > > +	pm_runtime_get_sync(&priv->pdev->dev);
> > 
> > gotta check return value on all pm_runtime_get_sync() calls. IIRC,
> > there's a coccinelle script for checking and patching this.
> 
> Sure, here's an updated patch with error checking added.
> 
> Regards,
> 
> Tony
> 
> 8< ---------------------
> From: Tony Lindgren <tony@atomide.com>
> Date: Mon, 22 Dec 2014 08:19:06 -0800
> Subject: [PATCH] net: davinci_emac: Fix runtime pm calls for davinci_emac
> 
> Commit 3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support")
> added support for runtime PM, but it causes issues on omap3 related devices
> that actually gate the clocks:
> 
> Unhandled fault: external abort on non-linefetch (0x1008)
> ...
> [<c04160f0>] (emac_dev_getnetstats) from [<c04d6a3c>] (dev_get_stats+0x78/0xc8)
> [<c04d6a3c>] (dev_get_stats) from [<c04e9ccc>] (rtnl_fill_ifinfo+0x3b8/0x938)
> [<c04e9ccc>] (rtnl_fill_ifinfo) from [<c04eade4>] (rtmsg_ifinfo+0x68/0xd8)
> [<c04eade4>] (rtmsg_ifinfo) from [<c04dd35c>] (register_netdevice+0x3a0/0x4ec)
> [<c04dd35c>] (register_netdevice) from [<c04dd4bc>] (register_netdev+0x14/0x24)
> [<c04dd4bc>] (register_netdev) from [<c041755c>] (davinci_emac_probe+0x408/0x5c8)
> [<c041755c>] (davinci_emac_probe) from [<c0396d78>] (platform_drv_probe+0x48/0xa4)
> 
> Let's fix it by moving the pm_runtime_get() call earlier, and also add it to
> the emac_dev_getnetstats(). Also note that we want to use pm_runtime_get_sync()
> as we don't want to have deferred_resume happen. And let's also check the
> return value for pm_runtime_get_sync() as noted by Felipe Balbi <balbi@ti.com>.
> 
> Cc: Brian Hutchinson <b.hutchman@gmail.com>
> Cc: Felipe Balbi <balbi@ti.com>
> Cc: Mark A. Greer <mgreer@animalcreek.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>

Tony,

Sorry for the sloppy pm_runtime patch and thanks for fixing it.

Acked-by: Mark A. Greer <mgreer@animalcreek.com>

^ permalink raw reply

* [PATCH RFC v2 net-next 2/2] ip_tunnel: Remove struct gro_cells
From: Martin KaFai Lau @ 2015-01-13 23:42 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, kernel-team
In-Reply-To: <1421192564-437455-1-git-send-email-kafai@fb.com>

After adding percpu gro_cells, struct gro_cells can be removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/gro_cells.h  | 34 ++++++++++++++++------------------
 include/net/ip_tunnels.h |  2 +-
 net/ipv4/ip_tunnel.c     | 11 +++++------
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
index 0f712c0..b1aeea1 100644
--- a/include/net/gro_cells.h
+++ b/include/net/gro_cells.h
@@ -10,21 +10,18 @@ struct gro_cell {
 	struct napi_struct	napi;
 };
 
-struct gro_cells {
-	struct gro_cell __percpu	*cells;
-};
-
-static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
+static inline void gro_cells_receive(struct gro_cell __percpu *gcells,
+				     struct sk_buff *skb)
 {
 	struct gro_cell *cell;
 	struct net_device *dev = skb->dev;
 
-	if (!gcells->cells || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
+	if (!gcells || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
 		netif_rx(skb);
 		return;
 	}
 
-	cell = this_cpu_ptr(gcells->cells);
+	cell = this_cpu_ptr(gcells);
 
 	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
 		atomic_long_inc(&dev->rx_dropped);
@@ -66,37 +63,38 @@ static inline int gro_cell_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
-static inline int gro_cells_init(struct gro_cells *gcells, struct net_device *dev)
+static inline struct gro_cell __percpu *
+gro_cell_alloc_percpu(struct net_device *dev)
 {
+	struct gro_cell __percpu *gcells;
 	int i;
 
-	gcells->cells = alloc_percpu(struct gro_cell);
-	if (!gcells->cells)
-		return -ENOMEM;
+	gcells = alloc_percpu(struct gro_cell);
+	if (!gcells)
+		return ERR_PTR(-ENOMEM);
 
 	for_each_possible_cpu(i) {
-		struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
+		struct gro_cell *cell = per_cpu_ptr(gcells, i);
 
 		skb_queue_head_init(&cell->napi_skbs);
 		netif_napi_add(dev, &cell->napi, gro_cell_poll, 64);
 		napi_enable(&cell->napi);
 	}
-	return 0;
+	return gcells;
 }
 
-static inline void gro_cells_destroy(struct gro_cells *gcells)
+static inline void gro_cell_free_percpu(struct gro_cell __percpu *gcells)
 {
 	int i;
 
-	if (!gcells->cells)
+	if (IS_ERR_OR_NULL(gcells))
 		return;
 	for_each_possible_cpu(i) {
-		struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
+		struct gro_cell *cell = per_cpu_ptr(gcells, i);
 		netif_napi_del(&cell->napi);
 		skb_queue_purge(&cell->napi_skbs);
 	}
-	free_percpu(gcells->cells);
-	gcells->cells = NULL;
+	free_percpu(gcells);
 }
 
 #endif
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 25a59eb..e406bc3 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -83,7 +83,7 @@ struct ip_tunnel {
 	struct ip_tunnel_prl_entry __rcu *prl;	/* potential router list */
 	unsigned int		prl_count;	/* # of entries in PRL */
 	int			ip_tnl_net_id;
-	struct gro_cells	gro_cells;
+	struct gro_cell __percpu *gro_cells;
 };
 
 #define TUNNEL_CSUM		__cpu_to_be16(0x01)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d3e4479..fcc92c0 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -479,7 +479,7 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 		skb->dev = tunnel->dev;
 	}
 
-	gro_cells_receive(&tunnel->gro_cells, skb);
+	gro_cells_receive(tunnel->gro_cells, skb);
 	return 0;
 
 drop:
@@ -952,7 +952,7 @@ static void ip_tunnel_dev_free(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 
-	gro_cells_destroy(&tunnel->gro_cells);
+	gro_cell_free_percpu(tunnel->gro_cells);
 	free_percpu(tunnel->dst_cache);
 	free_percpu(dev->tstats);
 	free_netdev(dev);
@@ -1120,7 +1120,6 @@ int ip_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	struct iphdr *iph = &tunnel->parms.iph;
-	int err;
 
 	dev->destructor	= ip_tunnel_dev_free;
 	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
@@ -1133,11 +1132,11 @@ int ip_tunnel_init(struct net_device *dev)
 		return -ENOMEM;
 	}
 
-	err = gro_cells_init(&tunnel->gro_cells, dev);
-	if (err) {
+	tunnel->gro_cells = gro_cell_alloc_percpu(dev);
+	if (IS_ERR(tunnel->gro_cells)) {
 		free_percpu(tunnel->dst_cache);
 		free_percpu(dev->tstats);
-		return err;
+		return PTR_ERR(tunnel->gro_cells);
 	}
 
 	tunnel->dev = dev;
-- 
1.8.1

^ permalink raw reply related

* [PATCH RFC v2 net-next 0/2] ip_tunnel: Create percpu gro_cell
From: Martin KaFai Lau @ 2015-01-13 23:42 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, kernel-team

In the ipip tunnel, the skb->queue_mapping is lost in ipip_rcv().
All skb will be queued to the same cell->napi_skbs.  The
gro_cell_poll is pinned to one core under load. In production traffic,
we also see severe rx_dropped in the tunl iface and it is probably due to
this limit: skb_queue_len(&cell->napi_skbs) > netdev_max_backlog

This patch is trying to alloc_percpu(struct gro_cell) and schedule
gro_cell_poll to process the skb in the same core.

Changes from v1:
Eric Dumazet pointed out that ____cacheline_aligned_in_smp is no longer needed.

Setup:
VIP_PREFIX=9.9.9.9/32
REMOTE_REAL_IP=10.228.95.75

if [ "$1" = "encap" ]
then
    sudo ip tunnel add mode ipip remote ${REMOTE_REAL_IP}
    sudo ip link set dev ipip0 up
    sudo ip route add dev ipip0 ${VIP_PREFIX}
else
    # Decapsulating host

    sudo ip tunnel add mode ipip
    sudo ip link set dev tunl0 up
    sudo ip addr add dev lo ${VIP_PREFIX}
    sudo sysctl -a  | grep '\.rp_filter' | awk '{print $1;}' | \
        xargs -n1 -I{} sudo sysctl {}=0
fi

Before:
[root@DECAP ~]# netserver -p 8888
[root@ENCAP ~]# super_netperf 200 -t TCP_RR -H 9.9.9.9 -p 8888 \
-l 30 -- -d 0x6 -m 8k,64k -s 1M -S 1M
332215
[root@DECAP ~]# perf probe -a gro_cell_poll
[root@DECAP ~]# perf stat -I 1000 -a -A -e probe:gro_cell_poll
   117.258518273 CPU0                     0      probe:gro_cell_poll
   117.258518273 CPU1                     0      probe:gro_cell_poll
   117.258518273 CPU2                     0      probe:gro_cell_poll
   117.258518273 CPU3                     0      probe:gro_cell_poll
   117.258518273 CPU4                     0      probe:gro_cell_poll
   117.258518273 CPU5                     0      probe:gro_cell_poll
   117.258518273 CPU6                     0      probe:gro_cell_poll
   117.258518273 CPU7                     0      probe:gro_cell_poll
   117.258518273 CPU8                     0      probe:gro_cell_poll
   117.258518273 CPU9                     0      probe:gro_cell_poll
   117.258518273 CPU10                    0      probe:gro_cell_poll
   117.258518273 CPU11                    0      probe:gro_cell_poll
   117.258518273 CPU12                    0      probe:gro_cell_poll
   117.258518273 CPU13                    0      probe:gro_cell_poll
   117.258518273 CPU14                    0      probe:gro_cell_poll
   117.258518273 CPU15                4,882      probe:gro_cell_poll
   117.258518273 CPU16                    0      probe:gro_cell_poll
   117.258518273 CPU17                    0      probe:gro_cell_poll
   117.258518273 CPU18                    0      probe:gro_cell_poll
   117.258518273 CPU19                    0      probe:gro_cell_poll
   117.258518273 CPU20                    0      probe:gro_cell_poll
   117.258518273 CPU21                    0      probe:gro_cell_poll
   117.258518273 CPU22                    0      probe:gro_cell_poll
   117.258518273 CPU23                    0      probe:gro_cell_poll
   117.258518273 CPU24                    0      probe:gro_cell_poll
   117.258518273 CPU25                    0      probe:gro_cell_poll
   117.258518273 CPU26                    0      probe:gro_cell_poll
   117.258518273 CPU27                    0      probe:gro_cell_poll
   117.258518273 CPU28                    0      probe:gro_cell_poll
   117.258518273 CPU29                    0      probe:gro_cell_poll
   117.258518273 CPU30                    0      probe:gro_cell_poll
   117.258518273 CPU31                    0      probe:gro_cell_poll
   117.258518273 CPU32                    0      probe:gro_cell_poll
   117.258518273 CPU33                    0      probe:gro_cell_poll
   117.258518273 CPU34                    0      probe:gro_cell_poll
   117.258518273 CPU35                    0      probe:gro_cell_poll
   117.258518273 CPU36                    0      probe:gro_cell_poll
   117.258518273 CPU37                    0      probe:gro_cell_poll
   117.258518273 CPU38                    0      probe:gro_cell_poll
   117.258518273 CPU39                    0      probe:gro_cell_poll

After:
[root@DECAP ~]# netserver -p 8888
[root@ENCAP ~]# super_netperf 200 -t TCP_RR -H 9.9.9.9 -p 8888 \
-l 30 -- -d 0x6 -m 8k,64k -s 1M -S 1M
877530
[root@DECAP ~]# perf probe -a gro_cell_poll
[root@DECAP ~]# perf stat -I 1000 -a -A -e probe:gro_cell_poll
    40.085714389 CPU0                13,607      probe:gro_cell_poll
    40.085714389 CPU1                13,188      probe:gro_cell_poll
    40.085714389 CPU2                12,913      probe:gro_cell_poll
    40.085714389 CPU3                12,790      probe:gro_cell_poll
    40.085714389 CPU4                13,395      probe:gro_cell_poll
    40.085714389 CPU5                13,121      probe:gro_cell_poll
    40.085714389 CPU6                11,083      probe:gro_cell_poll
    40.085714389 CPU7                12,945      probe:gro_cell_poll
    40.085714389 CPU8                13,704      probe:gro_cell_poll
    40.085714389 CPU9                13,514      probe:gro_cell_poll
    40.085714389 CPU10                    0      probe:gro_cell_poll
    40.085714389 CPU11                    0      probe:gro_cell_poll
    40.085714389 CPU12                    0      probe:gro_cell_poll
    40.085714389 CPU13                    0      probe:gro_cell_poll
    40.085714389 CPU14                    0      probe:gro_cell_poll
    40.085714389 CPU15                    0      probe:gro_cell_poll
    40.085714389 CPU16                    0      probe:gro_cell_poll
    40.085714389 CPU17                    0      probe:gro_cell_poll
    40.085714389 CPU18                    0      probe:gro_cell_poll
    40.085714389 CPU19                    0      probe:gro_cell_poll
    40.085714389 CPU20               10,402      probe:gro_cell_poll
    40.085714389 CPU21               12,312      probe:gro_cell_poll
    40.085714389 CPU22               11,913      probe:gro_cell_poll
    40.085714389 CPU23               12,964      probe:gro_cell_poll
    40.085714389 CPU24               13,727      probe:gro_cell_poll
    40.085714389 CPU25               12,943      probe:gro_cell_poll
    40.085714389 CPU26               13,558      probe:gro_cell_poll
    40.085714389 CPU27               12,676      probe:gro_cell_poll
    40.085714389 CPU28               13,754      probe:gro_cell_poll
    40.085714389 CPU29               13,379      probe:gro_cell_poll
    40.085714389 CPU30                    0      probe:gro_cell_poll
    40.085714389 CPU31                    0      probe:gro_cell_poll
    40.085714389 CPU32                    0      probe:gro_cell_poll
    40.085714389 CPU33                    0      probe:gro_cell_poll
    40.085714389 CPU34                    0      probe:gro_cell_poll
    40.085714389 CPU35                    0      probe:gro_cell_poll
    40.085714389 CPU36                    0      probe:gro_cell_poll
    40.085714389 CPU37                    0      probe:gro_cell_poll
    40.085714389 CPU38                    0      probe:gro_cell_poll
    40.085714389 CPU39                    0      probe:gro_cell_poll

^ permalink raw reply

* [PATCH RFC v2 net-next 1/2] ip_tunnel: Create percpu gro_cell
From: Martin KaFai Lau @ 2015-01-13 23:42 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, kernel-team
In-Reply-To: <1421192564-437455-1-git-send-email-kafai@fb.com>

In the ipip tunnel, the skb->queue_mapping is lost in ipip_rcv().
All skb will be queued to the same cell->napi_skbs.  The
gro_cell_poll is pinned to one core under load.  In production traffic,
we also see severe rx_dropped in the tunl iface and it is probably due to
this limit: skb_queue_len(&cell->napi_skbs) > netdev_max_backlog.

This patch is trying to alloc_percpu(struct gro_cell) and schedule
gro_cell_poll to process the skb in the same core.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/gro_cells.h | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
index 734d9b5..0f712c0 100644
--- a/include/net/gro_cells.h
+++ b/include/net/gro_cells.h
@@ -8,25 +8,23 @@
 struct gro_cell {
 	struct sk_buff_head	napi_skbs;
 	struct napi_struct	napi;
-} ____cacheline_aligned_in_smp;
+};
 
 struct gro_cells {
-	unsigned int		gro_cells_mask;
-	struct gro_cell		*cells;
+	struct gro_cell __percpu	*cells;
 };
 
 static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
 {
-	struct gro_cell *cell = gcells->cells;
+	struct gro_cell *cell;
 	struct net_device *dev = skb->dev;
 
-	if (!cell || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
+	if (!gcells->cells || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
 		netif_rx(skb);
 		return;
 	}
 
-	if (skb_rx_queue_recorded(skb))
-		cell += skb_get_rx_queue(skb) & gcells->gro_cells_mask;
+	cell = this_cpu_ptr(gcells->cells);
 
 	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
 		atomic_long_inc(&dev->rx_dropped);
@@ -72,15 +70,12 @@ static inline int gro_cells_init(struct gro_cells *gcells, struct net_device *de
 {
 	int i;
 
-	gcells->gro_cells_mask = roundup_pow_of_two(netif_get_num_default_rss_queues()) - 1;
-	gcells->cells = kcalloc(gcells->gro_cells_mask + 1,
-				sizeof(struct gro_cell),
-				GFP_KERNEL);
+	gcells->cells = alloc_percpu(struct gro_cell);
 	if (!gcells->cells)
 		return -ENOMEM;
 
-	for (i = 0; i <= gcells->gro_cells_mask; i++) {
-		struct gro_cell *cell = gcells->cells + i;
+	for_each_possible_cpu(i) {
+		struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
 
 		skb_queue_head_init(&cell->napi_skbs);
 		netif_napi_add(dev, &cell->napi, gro_cell_poll, 64);
@@ -91,16 +86,16 @@ static inline int gro_cells_init(struct gro_cells *gcells, struct net_device *de
 
 static inline void gro_cells_destroy(struct gro_cells *gcells)
 {
-	struct gro_cell *cell = gcells->cells;
 	int i;
 
-	if (!cell)
+	if (!gcells->cells)
 		return;
-	for (i = 0; i <= gcells->gro_cells_mask; i++,cell++) {
+	for_each_possible_cpu(i) {
+		struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
 		netif_napi_del(&cell->napi);
 		skb_queue_purge(&cell->napi_skbs);
 	}
-	kfree(gcells->cells);
+	free_percpu(gcells->cells);
 	gcells->cells = NULL;
 }
 
-- 
1.8.1

^ permalink raw reply related

* Re: [patch iproute2 2/2] tc: add support for BPF based actions
From: Stephen Hemminger @ 2015-01-14  1:19 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jhs
In-Reply-To: <1420649244-9574-2-git-send-email-jiri@resnulli.us>

On Wed,  7 Jan 2015 17:47:24 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/tc_act/tc_bpf.h |  31 +++++++

The overall concept is fine, but the mechanics of file management
gets in the way.

All header files in the linux directory of iproute2 must come from sanitized
version of kernel headers that are in uapi/linux in the kernel source.

I take headers from 'make install_headers' and copy them over to keep
in sync. The headers for master branch are kept in sync with upstream
kernel (from Linus tree), and for net-next branch I derive them from Davem's
net-next tree.

If you want to use tc_bpf.h it has to be in upstream kernel first.

Sorry

^ permalink raw reply

* Re: [PATCH iproute2 0/3] ip netns: Run over all netns
From: Stephen Hemminger @ 2015-01-14  1:28 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1420628662-9930-1-git-send-email-vadim4j@gmail.com>

On Wed,  7 Jan 2015 13:04:19 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:

> From: Vadim Kochan <vadim4j@gmail.com>
> 
> Allow 'ip netns del' and 'ip netns exec' run over each network namespace names.
> 
> 'ip netns exec' executes command forcely on eacn nsname.
> 
> Vadim Kochan (3):
>   lib: Exec func on each netns
>   ip netns: Allow exec on each netns
>   ip netns: Delete all netns
> 
>  include/namespace.h |  6 ++++
>  include/utils.h     |  5 +++
>  ip/ipnetns.c        | 96 ++++++++++++++++++++++++++++++++---------------------
>  lib/namespace.c     | 22 ++++++++++++
>  lib/utils.c         | 28 ++++++++++++++++
>  man/man8/ip-netns.8 | 26 ++++++++++++---
>  6 files changed, 141 insertions(+), 42 deletions(-)
> 

It is a useful concept but as others have pointed out the idea of reserving
a keyword 'all' at this point is likely to cause somebody to break.
So sorry.

^ permalink raw reply

* Re: [PATCH iproute2] ss: Filter inet dgram sockets with established state by default
From: Stephen Hemminger @ 2015-01-14  1:31 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1420738342-3750-1-git-send-email-vadim4j@gmail.com>

On Thu,  8 Jan 2015 19:32:22 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:

> From: Vadim Kochan <vadim4j@gmail.com>
> 
> As inet dgram sockets (udp, raw) can call connect(...)  - they
> might be set in ESTABLISHED state. So keep the original behaviour of
> 'ss' which filtered them by ESTABLISHED state by default. So:
> 
>     $ ss -u
> 
>     or
> 
>     $ ss -w
> 
> Will show only ESTABLISHED UDP sockets by default.
> 
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> ---
>  misc/ss.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/misc/ss.c b/misc/ss.c
> index 08d210a..015d829 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -170,11 +170,11 @@ static const struct filter default_dbs[MAX_DB] = {
>  		.families = (1 << AF_INET) | (1 << AF_INET6),
>  	},
>  	[UDP_DB] = {
> -		.states   = (1 << SS_CLOSE),
> +		.states   = (1 << SS_ESTABLISHED),
>  		.families = (1 << AF_INET) | (1 << AF_INET6),
>  	},
>  	[RAW_DB] = {
> -		.states   = (1 << SS_CLOSE),
> +		.states   = (1 << SS_ESTABLISHED),
>  		.families = (1 << AF_INET) | (1 << AF_INET6),
>  	},
>  	[UNIX_DG_DB] = {

This is a change likely to break somebody using 'ss -u' now and the bound
sockets will disappear from the output.

^ permalink raw reply

* Re: [PATCH iproute2] ss: Usage filter state names, options alignment
From: Stephen Hemminger @ 2015-01-14  1:33 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1420684243-23921-1-git-send-email-vadim4j@gmail.com>

On Thu,  8 Jan 2015 04:30:43 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:

> From: Vadim Kochan <vadim4j@gmail.com>
> 
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>

Accepted

^ permalink raw reply

* Re: [PATCH iproute2] ss: Fix case when UDP is printed as ipproto-xxx
From: Stephen Hemminger @ 2015-01-14  1:33 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1420677774-10288-1-git-send-email-vadim4j@gmail.com>

On Thu,  8 Jan 2015 02:42:54 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:

> From: Vadim Kochan <vadim4j@gmail.com>
> 
> When 'ss' prints UDP sockets info together with RAW sockets
> e.g.:
> 
>     $ ss -a
> 
> then UDP sockets are resolved as "ipproto-xxx".
> 
> It was caused that dg_proto was set after printing UDP
> socket info from netlink. So fixed issue by moving
> setting dg_proto before printing info from Netlink.
> 
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>

Accepted

^ permalink raw reply

* Re: [PATCH iproute2 -next v3] ip: route: add congestion control metric
From: Stephen Hemminger @ 2015-01-14  1:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: fw, netdev
In-Reply-To: <1420758786-32666-1-git-send-email-dborkman@redhat.com>

On Fri,  9 Jan 2015 00:13:06 +0100
Daniel Borkmann <dborkman@redhat.com> wrote:

> This patch adds configuration and dumping of congestion control metric
> for ip route, for example:
> 
>   ip route add <dst> dev foo congctl [lock] dctcp
> 
> Reference: http://thread.gmane.org/gmane.linux.network/344733
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  v2->v3:
>   - use rta_getattr_u32 et al helpers
>   - regarding discussed mx(un)lock, I'll do a follow-up for
>     all users inside iproute_modify() if that is wished
>  v1->v2:
>   - adapted mxlock setting to kernel style
>   - use arg directly in rta_addattr_l
> 
>  include/linux/rtnetlink.h |  2 ++
>  ip/iproute.c              | 22 ++++++++++++++++++----
>  man/man8/ip-route.8.in    | 19 ++++++++++++++++++-
>  3 files changed, 38 insertions(+), 5 deletions(-)
> 

Applied to net-next branch.

^ permalink raw reply

* [PATCH net-next 0/2] cxgb4/cxgb4i : Update & use ipv6 handling api
From: Anish Bhatt @ 2015-01-14  2:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, kxie, deepak.s, Anish Bhatt

This patch series consolidates and updates the ipv6 api, as well as exports
it for use by upper level drivers dependent on cxgb4

Anish Bhatt (2):
  cxgb4: Update ipv6 address handling api
  cxgb4i : Call into recently added cxgb4 ipv6 api

 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c      | 314 +++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h      |  41 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  19 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 228 +++++----------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |   3 -
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |  23 +-
 8 files changed, 469 insertions(+), 164 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h

-- 
2.2.1

^ permalink raw reply

* [PATCH net-next 1/2] cxgb4: Update ipv6 address handling api
From: Anish Bhatt @ 2015-01-14  2:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, kxie, deepak.s, Anish Bhatt
In-Reply-To: <1421202516-13862-1-git-send-email-anish@chelsio.com>

This patch improves on previously added support for ipv6 addresses. The code
is consolidated to a single file and adds an api for use by dependent upper
level drivers such as cxgb4i/iw_cxgb4 etc.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Deepak Singh <deepak.s@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c      | 314 +++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h      |  41 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  19 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 228 +++++----------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |   3 -
 7 files changed, 447 insertions(+), 163 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index b85280775997..ae50cd72358c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -4,6 +4,6 @@
 
 obj-$(CONFIG_CHELSIO_T4) += cxgb4.o
 
-cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o
+cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o clip_tbl.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
new file mode 100644
index 000000000000..dacecba70f76
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
@@ -0,0 +1,314 @@
+/*
+ *  This file is part of the Chelsio T4 Ethernet driver for Linux.
+ *  Copyright (C) 2003-2014 Chelsio Communications.  All rights reserved.
+ *
+ *  Written by Deepak (deepak.s@chelsio.com)
+ *
+ *  This program is distributed in the hope that it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the LICENSE file included in this
+ *  release for licensing terms and conditions.
+ */
+
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/jhash.h>
+#include <linux/if_vlan.h>
+#include <net/addrconf.h>
+#include "cxgb4.h"
+#include "clip_tbl.h"
+
+static inline unsigned int ipv4_clip_hash(struct clip_tbl *c, const u32 *key)
+{
+	unsigned int clipt_size_half = c->clipt_size / 2;
+
+	return jhash_1word(*key, 0) % clipt_size_half;
+}
+
+static inline unsigned int ipv6_clip_hash(struct clip_tbl *d, const u32 *key)
+{
+	unsigned int clipt_size_half = d->clipt_size / 2;
+	u32 xor = key[0] ^ key[1] ^ key[2] ^ key[3];
+
+	return clipt_size_half +
+		(jhash_1word(xor, 0) % clipt_size_half);
+}
+
+static unsigned int clip_addr_hash(struct clip_tbl *ctbl, const u32 *addr,
+				   int addr_len)
+{
+	return addr_len == 4 ? ipv4_clip_hash(ctbl, addr) :
+				ipv6_clip_hash(ctbl, addr);
+}
+
+static int clip6_get_mbox(const struct net_device *dev,
+			  const struct in6_addr *lip)
+{
+	struct adapter *adap = netdev2adap(dev);
+	struct fw_clip_cmd c;
+
+	memset(&c, 0, sizeof(c));
+	c.op_to_write = htonl(FW_CMD_OP_V(FW_CLIP_CMD) |
+			FW_CMD_REQUEST_F | FW_CMD_WRITE_F);
+	c.alloc_to_len16 = htonl(FW_CLIP_CMD_ALLOC_F | FW_LEN16(c));
+	*(__be64 *)&c.ip_hi = *(__be64 *)(lip->s6_addr);
+	*(__be64 *)&c.ip_lo = *(__be64 *)(lip->s6_addr + 8);
+	return t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, false);
+}
+
+static int clip6_release_mbox(const struct net_device *dev,
+			      const struct in6_addr *lip)
+{
+	struct adapter *adap = netdev2adap(dev);
+	struct fw_clip_cmd c;
+
+	memset(&c, 0, sizeof(c));
+	c.op_to_write = htonl(FW_CMD_OP_V(FW_CLIP_CMD) |
+			FW_CMD_REQUEST_F | FW_CMD_READ_F);
+	c.alloc_to_len16 = htonl(FW_CLIP_CMD_FREE_F | FW_LEN16(c));
+	*(__be64 *)&c.ip_hi = *(__be64 *)(lip->s6_addr);
+	*(__be64 *)&c.ip_lo = *(__be64 *)(lip->s6_addr + 8);
+	return t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, false);
+}
+
+int cxgb4_clip_get(const struct net_device *dev, const u32 *lip, u8 v6)
+{
+	struct adapter *adap = netdev2adap(dev);
+	struct clip_tbl *ctbl = adap->clipt;
+	struct clip_entry *ce, *cte;
+	u32 *addr = (u32 *)lip;
+	int hash;
+	int addr_len;
+	int ret = 0;
+
+	if (v6)
+		addr_len = 16;
+	else
+		addr_len = 4;
+
+	hash = clip_addr_hash(ctbl, addr, addr_len);
+
+	read_lock_bh(&ctbl->lock);
+	list_for_each_entry(cte, &ctbl->hash_list[hash], list) {
+		if (addr_len == cte->addr_len &&
+		    memcmp(lip, cte->addr, cte->addr_len) == 0) {
+			ce = cte;
+			read_unlock_bh(&ctbl->lock);
+			goto found;
+		}
+	}
+	read_unlock_bh(&ctbl->lock);
+
+	write_lock_bh(&ctbl->lock);
+	if (!list_empty(&ctbl->ce_free_head)) {
+		ce = list_first_entry(&ctbl->ce_free_head,
+				      struct clip_entry, list);
+		list_del(&ce->list);
+		INIT_LIST_HEAD(&ce->list);
+		spin_lock_init(&ce->lock);
+		atomic_set(&ce->refcnt, 0);
+		atomic_dec(&ctbl->nfree);
+		ce->addr_len = addr_len;
+		memcpy(ce->addr, lip, addr_len);
+		list_add_tail(&ce->list, &ctbl->hash_list[hash]);
+		if (v6) {
+			ret = clip6_get_mbox(dev, (const struct in6_addr *)lip);
+			if (ret) {
+				write_unlock_bh(&ctbl->lock);
+				return ret;
+			}
+		}
+	} else {
+		write_unlock_bh(&ctbl->lock);
+		return -ENOMEM;
+	}
+	write_unlock_bh(&ctbl->lock);
+found:
+	atomic_inc(&ce->refcnt);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxgb4_clip_get);
+
+void cxgb4_clip_release(const struct net_device *dev, const u32 *lip, u8 v6)
+{
+	struct adapter *adap = netdev2adap(dev);
+	struct clip_tbl *ctbl = adap->clipt;
+	struct clip_entry *ce, *cte;
+	u32 *addr = (u32 *)lip;
+	int hash;
+	int addr_len;
+
+	if (v6)
+		addr_len = 16;
+	else
+		addr_len = 4;
+
+	hash = clip_addr_hash(ctbl, addr, addr_len);
+
+	read_lock_bh(&ctbl->lock);
+	list_for_each_entry(cte, &ctbl->hash_list[hash], list) {
+		if (addr_len == cte->addr_len &&
+		    memcmp(lip, cte->addr, cte->addr_len) == 0) {
+			ce = cte;
+			read_unlock_bh(&ctbl->lock);
+			goto found;
+		}
+	}
+	read_unlock_bh(&ctbl->lock);
+
+	return;
+found:
+	write_lock_bh(&ctbl->lock);
+	spin_lock_bh(&ce->lock);
+	if (atomic_dec_and_test(&ce->refcnt)) {
+		list_del(&ce->list);
+		INIT_LIST_HEAD(&ce->list);
+		list_add_tail(&ce->list, &ctbl->ce_free_head);
+		atomic_inc(&ctbl->nfree);
+		if (v6)
+			clip6_release_mbox(dev, (const struct in6_addr *)lip);
+	}
+	spin_unlock_bh(&ce->lock);
+	write_unlock_bh(&ctbl->lock);
+}
+EXPORT_SYMBOL(cxgb4_clip_release);
+
+/* Retrieves IPv6 addresses from a root device (bond, vlan) associated with
+ * a physical device.
+ * The physical device reference is needed to send the actul CLIP command.
+ */
+static int cxgb4_update_dev_clip(struct net_device *root_dev,
+				 struct net_device *dev)
+{
+	struct inet6_dev *idev = NULL;
+	struct inet6_ifaddr *ifa;
+	int ret = 0;
+
+	idev = __in6_dev_get(root_dev);
+	if (!idev)
+		return ret;
+
+	read_lock_bh(&idev->lock);
+	list_for_each_entry(ifa, &idev->addr_list, if_list) {
+		ret = cxgb4_clip_get(dev, (const u32 *)ifa->addr.s6_addr, 1);
+		if (ret < 0)
+			break;
+	}
+	read_unlock_bh(&idev->lock);
+
+	return ret;
+}
+
+int cxgb4_update_root_dev_clip(struct net_device *dev)
+{
+	struct net_device *root_dev = NULL;
+	int i, ret = 0;
+
+	/* First populate the real net device's IPv6 addresses */
+	ret = cxgb4_update_dev_clip(dev, dev);
+	if (ret)
+		return ret;
+
+	/* Parse all bond and vlan devices layered on top of the physical dev */
+	root_dev = netdev_master_upper_dev_get_rcu(dev);
+	if (root_dev) {
+		ret = cxgb4_update_dev_clip(root_dev, dev);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < VLAN_N_VID; i++) {
+		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
+		if (!root_dev)
+			continue;
+
+		ret = cxgb4_update_dev_clip(root_dev, dev);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(cxgb4_update_root_dev_clip);
+
+int clip_tbl_show(struct seq_file *seq, void *v)
+{
+	struct adapter *adapter = seq->private;
+	struct clip_tbl *ctbl = adapter->clipt;
+	struct clip_entry *ce;
+	char ip[60];
+	int i;
+
+	read_lock_bh(&ctbl->lock);
+
+	seq_puts(seq, "IP Address                  Users\n");
+	for (i = 0 ; i < ctbl->clipt_size;  ++i) {
+		list_for_each_entry(ce, &ctbl->hash_list[i], list) {
+			ip[0] = '\0';
+			if (ce->addr_len == 16)
+				sprintf(ip, "%pI6c", ce->addr);
+			else
+				sprintf(ip, "%pI4c", ce->addr);
+			seq_printf(seq, "%-25s   %u\n", ip,
+				   atomic_read(&ce->refcnt));
+		}
+	}
+	seq_printf(seq, "Free clip entries : %d\n", atomic_read(&ctbl->nfree));
+
+	read_unlock_bh(&ctbl->lock);
+
+	return 0;
+}
+
+struct clip_tbl *t4_init_clip_tbl(unsigned int clipt_start,
+				  unsigned int clipt_end)
+{
+	struct clip_entry *cl_list;
+	struct clip_tbl *ctbl;
+	unsigned int clipt_size;
+	int i;
+
+	if (clipt_start >= clipt_end)
+		return NULL;
+	clipt_size = clipt_end - clipt_start + 1;
+	if (clipt_size < CLIPT_MIN_HASH_BUCKETS)
+		return NULL;
+
+	ctbl = t4_alloc_mem(sizeof(*ctbl) +
+				clipt_size*sizeof(struct list_head));
+	if (!ctbl)
+		return NULL;
+
+	ctbl->clipt_start = clipt_start;
+	ctbl->clipt_size = clipt_size;
+	INIT_LIST_HEAD(&ctbl->ce_free_head);
+
+	atomic_set(&ctbl->nfree, clipt_size);
+	rwlock_init(&ctbl->lock);
+
+	for (i = 0; i < ctbl->clipt_size; ++i)
+		INIT_LIST_HEAD(&ctbl->hash_list[i]);
+
+	cl_list = t4_alloc_mem(clipt_size*sizeof(struct clip_entry));
+	ctbl->cl_list = (void *)cl_list;
+
+	for (i = 0; i < clipt_size; i++) {
+		INIT_LIST_HEAD(&cl_list[i].list);
+		list_add_tail(&cl_list[i].list, &ctbl->ce_free_head);
+	}
+
+	return ctbl;
+}
+
+void t4_cleanup_clip_tbl(struct adapter *adap)
+{
+	struct clip_tbl *ctbl = adap->clipt;
+
+	if (ctbl) {
+		if (ctbl->cl_list)
+			t4_free_mem(ctbl->cl_list);
+		t4_free_mem(ctbl);
+	}
+}
+EXPORT_SYMBOL(t4_cleanup_clip_tbl);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h
new file mode 100644
index 000000000000..2eaba0161cf8
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.h
@@ -0,0 +1,41 @@
+/*
+ *  This file is part of the Chelsio T4 Ethernet driver for Linux.
+ *  Copyright (C) 2003-2014 Chelsio Communications.  All rights reserved.
+ *
+ *  Written by Deepak (deepak.s@chelsio.com)
+ *
+ *  This program is distributed in the hope that it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the LICENSE file included in this
+ *  release for licensing terms and conditions.
+ */
+
+struct clip_entry {
+	spinlock_t lock;	/* Hold while modifying clip reference */
+	atomic_t refcnt;
+	struct list_head list;
+	u32 addr[4];
+	int addr_len;
+};
+
+struct clip_tbl {
+	unsigned int clipt_start;
+	unsigned int clipt_size;
+	rwlock_t lock;
+	atomic_t nfree;
+	struct list_head ce_free_head;
+	void *cl_list;
+	struct list_head hash_list[0];
+};
+
+enum {
+	CLIPT_MIN_HASH_BUCKETS = 2,
+};
+
+struct clip_tbl *t4_init_clip_tbl(unsigned int clipt_start,
+				  unsigned int clipt_end);
+int cxgb4_clip_get(const struct net_device *dev, const u32 *lip, u8 v6);
+void cxgb4_clip_release(const struct net_device *dev, const u32 *lip, u8 v6);
+int clip_tbl_show(struct seq_file *seq, void *v);
+int cxgb4_update_root_dev_clip(struct net_device *dev);
+void t4_cleanup_clip_tbl(struct adapter *adap);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 7c785b5e7757..e468f920892f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -668,6 +668,9 @@ struct adapter {
 	unsigned int l2t_start;
 	unsigned int l2t_end;
 	struct l2t_data *l2t;
+	unsigned int clipt_start;
+	unsigned int clipt_end;
+	struct clip_tbl *clipt;
 	void *uld_handle[CXGB4_ULD_MAX];
 	struct list_head list_node;
 	struct list_head rcu_node;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index e9f348942eb0..6dabfe5ba44e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -41,6 +41,7 @@
 #include "t4_regs.h"
 #include "t4fw_api.h"
 #include "cxgb4_debugfs.h"
+#include "clip_tbl.h"
 #include "l2t.h"
 
 /* generic seq_file support for showing a table of size rows x width. */
@@ -563,6 +564,21 @@ static const struct file_operations mps_tcam_debugfs_fops = {
 	.release = seq_release,
 };
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int clip_tbl_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, clip_tbl_show, PDE_DATA(inode));
+}
+
+static const struct file_operations clip_tbl_debugfs_fops = {
+	.owner   = THIS_MODULE,
+	.open    = clip_tbl_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release
+};
+#endif
+
 static ssize_t mem_read(struct file *file, char __user *buf, size_t count,
 			loff_t *ppos)
 {
@@ -646,6 +662,9 @@ int t4_setup_debugfs(struct adapter *adap)
 		{ "devlog", &devlog_fops, S_IRUSR, 0 },
 		{ "l2t", &t4_l2t_fops, S_IRUSR, 0},
 		{ "mps_tcam", &mps_tcam_debugfs_fops, S_IRUSR, 0 },
+#if IS_ENABLED(CONFIG_IPV6)
+		{ "clip_tbl", &clip_tbl_debugfs_fops, S_IRUSR, 0 },
+#endif
 	};
 
 	add_debugfs_files(adap,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 082a596a4264..1147e1e88314 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -62,6 +62,7 @@
 #include <net/netevent.h>
 #include <net/addrconf.h>
 #include <net/bonding.h>
+#include <net/addrconf.h>
 #include <asm/uaccess.h>
 
 #include "cxgb4.h"
@@ -71,6 +72,7 @@
 #include "t4fw_api.h"
 #include "cxgb4_dcb.h"
 #include "cxgb4_debugfs.h"
+#include "clip_tbl.h"
 #include "l2t.h"
 
 #ifdef DRV_VERSION
@@ -3236,40 +3238,6 @@ static int tid_init(struct tid_info *t)
 	return 0;
 }
 
-int cxgb4_clip_get(const struct net_device *dev,
-		   const struct in6_addr *lip)
-{
-	struct adapter *adap;
-	struct fw_clip_cmd c;
-
-	adap = netdev2adap(dev);
-	memset(&c, 0, sizeof(c));
-	c.op_to_write = htonl(FW_CMD_OP_V(FW_CLIP_CMD) |
-			FW_CMD_REQUEST_F | FW_CMD_WRITE_F);
-	c.alloc_to_len16 = htonl(FW_CLIP_CMD_ALLOC_F | FW_LEN16(c));
-	c.ip_hi = *(__be64 *)(lip->s6_addr);
-	c.ip_lo = *(__be64 *)(lip->s6_addr + 8);
-	return t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, false);
-}
-EXPORT_SYMBOL(cxgb4_clip_get);
-
-int cxgb4_clip_release(const struct net_device *dev,
-		       const struct in6_addr *lip)
-{
-	struct adapter *adap;
-	struct fw_clip_cmd c;
-
-	adap = netdev2adap(dev);
-	memset(&c, 0, sizeof(c));
-	c.op_to_write = htonl(FW_CMD_OP_V(FW_CLIP_CMD) |
-			FW_CMD_REQUEST_F | FW_CMD_READ_F);
-	c.alloc_to_len16 = htonl(FW_CLIP_CMD_FREE_F | FW_LEN16(c));
-	c.ip_hi = *(__be64 *)(lip->s6_addr);
-	c.ip_lo = *(__be64 *)(lip->s6_addr + 8);
-	return t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, false);
-}
-EXPORT_SYMBOL(cxgb4_clip_release);
-
 /**
  *	cxgb4_create_server - create an IP server
  *	@dev: the device
@@ -4122,148 +4090,61 @@ int cxgb4_unregister_uld(enum cxgb4_uld type)
 }
 EXPORT_SYMBOL(cxgb4_unregister_uld);
 
-/* Check if netdev on which event is occured belongs to us or not. Return
- * success (true) if it belongs otherwise failure (false).
- * Called with rcu_read_lock() held.
- */
 #if IS_ENABLED(CONFIG_IPV6)
-static bool cxgb4_netdev(const struct net_device *netdev)
+static int cxgb4_inet6addr_handler(struct notifier_block *this,
+				   unsigned long event, void *data)
 {
+	struct inet6_ifaddr *ifa = data;
+	struct net_device *event_dev = ifa->idev->dev;
+	const struct device *parent = NULL;
+#if IS_ENABLED(CONFIG_BONDING)
 	struct adapter *adap;
-	int i;
-
-	list_for_each_entry_rcu(adap, &adap_rcu_list, rcu_node)
-		for (i = 0; i < MAX_NPORTS; i++)
-			if (adap->port[i] == netdev)
-				return true;
-	return false;
-}
+#endif
+	if (event_dev->priv_flags & IFF_802_1Q_VLAN)
+		event_dev = vlan_dev_real_dev(event_dev);
+#if IS_ENABLED(CONFIG_BONDING)
+	if (event_dev->flags & IFF_MASTER) {
+		list_for_each_entry(adap, &adapter_list, list_node) {
+			switch (event) {
+			case NETDEV_UP:
+				cxgb4_clip_get(adap->port[0],
+					       (const u32 *)ifa, 1);
+				break;
+			case NETDEV_DOWN:
+				cxgb4_clip_release(adap->port[0],
+						   (const u32 *)ifa, 1);
+				break;
+			default:
+				break;
+			}
+		}
+		return NOTIFY_OK;
+	}
+#endif
 
-static int clip_add(struct net_device *event_dev, struct inet6_ifaddr *ifa,
-		    unsigned long event)
-{
-	int ret = NOTIFY_DONE;
+	if (event_dev)
+		parent = event_dev->dev.parent;
 
-	rcu_read_lock();
-	if (cxgb4_netdev(event_dev)) {
+	if (parent && parent->driver == &cxgb4_driver.driver) {
 		switch (event) {
 		case NETDEV_UP:
-			ret = cxgb4_clip_get(event_dev, &ifa->addr);
-			if (ret < 0) {
-				rcu_read_unlock();
-				return ret;
-			}
-			ret = NOTIFY_OK;
+			cxgb4_clip_get(event_dev, (const u32 *)ifa, 1);
 			break;
 		case NETDEV_DOWN:
-			cxgb4_clip_release(event_dev, &ifa->addr);
-			ret = NOTIFY_OK;
+			cxgb4_clip_release(event_dev, (const u32 *)ifa, 1);
 			break;
 		default:
 			break;
 		}
 	}
-	rcu_read_unlock();
-	return ret;
-}
-
-static int cxgb4_inet6addr_handler(struct notifier_block *this,
-		unsigned long event, void *data)
-{
-	struct inet6_ifaddr *ifa = data;
-	struct net_device *event_dev;
-	int ret = NOTIFY_DONE;
-	struct bonding *bond = netdev_priv(ifa->idev->dev);
-	struct list_head *iter;
-	struct slave *slave;
-	struct pci_dev *first_pdev = NULL;
-
-	if (ifa->idev->dev->priv_flags & IFF_802_1Q_VLAN) {
-		event_dev = vlan_dev_real_dev(ifa->idev->dev);
-		ret = clip_add(event_dev, ifa, event);
-	} else if (ifa->idev->dev->flags & IFF_MASTER) {
-		/* It is possible that two different adapters are bonded in one
-		 * bond. We need to find such different adapters and add clip
-		 * in all of them only once.
-		 */
-		bond_for_each_slave(bond, slave, iter) {
-			if (!first_pdev) {
-				ret = clip_add(slave->dev, ifa, event);
-				/* If clip_add is success then only initialize
-				 * first_pdev since it means it is our device
-				 */
-				if (ret == NOTIFY_OK)
-					first_pdev = to_pci_dev(
-							slave->dev->dev.parent);
-			} else if (first_pdev !=
-				   to_pci_dev(slave->dev->dev.parent))
-					ret = clip_add(slave->dev, ifa, event);
-		}
-	} else
-		ret = clip_add(ifa->idev->dev, ifa, event);
-
-	return ret;
+	return NOTIFY_OK;
 }
 
+static bool inet6addr_registered;
 static struct notifier_block cxgb4_inet6addr_notifier = {
 	.notifier_call = cxgb4_inet6addr_handler
 };
 
-/* Retrieves IPv6 addresses from a root device (bond, vlan) associated with
- * a physical device.
- * The physical device reference is needed to send the actul CLIP command.
- */
-static int update_dev_clip(struct net_device *root_dev, struct net_device *dev)
-{
-	struct inet6_dev *idev = NULL;
-	struct inet6_ifaddr *ifa;
-	int ret = 0;
-
-	idev = __in6_dev_get(root_dev);
-	if (!idev)
-		return ret;
-
-	read_lock_bh(&idev->lock);
-	list_for_each_entry(ifa, &idev->addr_list, if_list) {
-		ret = cxgb4_clip_get(dev, &ifa->addr);
-		if (ret < 0)
-			break;
-	}
-	read_unlock_bh(&idev->lock);
-
-	return ret;
-}
-
-static int update_root_dev_clip(struct net_device *dev)
-{
-	struct net_device *root_dev = NULL;
-	int i, ret = 0;
-
-	/* First populate the real net device's IPv6 addresses */
-	ret = update_dev_clip(dev, dev);
-	if (ret)
-		return ret;
-
-	/* Parse all bond and vlan devices layered on top of the physical dev */
-	root_dev = netdev_master_upper_dev_get_rcu(dev);
-	if (root_dev) {
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			return ret;
-	}
-
-	for (i = 0; i < VLAN_N_VID; i++) {
-		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
-		if (!root_dev)
-			continue;
-
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
 static void update_clip(const struct adapter *adap)
 {
 	int i;
@@ -4277,7 +4158,7 @@ static void update_clip(const struct adapter *adap)
 		ret = 0;
 
 		if (dev)
-			ret = update_root_dev_clip(dev);
+			ret = cxgb4_update_root_dev_clip(dev);
 
 		if (ret < 0)
 			break;
@@ -5391,6 +5272,14 @@ static int adap_init0(struct adapter *adap)
 	adap->tids.nftids = val[4] - val[3] + 1;
 	adap->sge.ingr_start = val[5];
 
+	params[0] = FW_PARAM_PFVF(CLIP_START);
+	params[1] = FW_PARAM_PFVF(CLIP_END);
+	ret = t4_query_params(adap, adap->mbox, adap->fn, 0, 2, params, val);
+	if (ret < 0)
+		goto bye;
+	adap->clipt_start = val[0];
+	adap->clipt_end = val[1];
+
 	/* query params related to active filter region */
 	params[0] = FW_PARAM_PFVF(ACTIVE_FILTER_START);
 	params[1] = FW_PARAM_PFVF(ACTIVE_FILTER_END);
@@ -6211,6 +6100,18 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		adapter->params.offload = 0;
 	}
 
+#if IS_ENABLED(CONFIG_IPV6)
+	adapter->clipt = t4_init_clip_tbl(adapter->clipt_start,
+					  adapter->clipt_end);
+	if (!adapter->clipt) {
+		/* We tolerate a lack of clip_table, giving up
+		 * some functionality
+		 */
+		dev_warn(&pdev->dev,
+			 "could not allocate Clip table, continuing\n");
+		adapter->params.offload = 0;
+	}
+#endif
 	if (is_offload(adapter) && tid_init(&adapter->tids) < 0) {
 		dev_warn(&pdev->dev, "could not allocate TID table, "
 			 "continuing\n");
@@ -6336,6 +6237,9 @@ static void remove_one(struct pci_dev *pdev)
 			cxgb_down(adapter);
 
 		free_some_resources(adapter);
+#if IS_ENABLED(CONFIG_IPV6)
+		t4_cleanup_clip_tbl(adapter);
+#endif
 		iounmap(adapter->regs);
 		if (!is_t4(adapter->params.chip))
 			iounmap(adapter->bar2);
@@ -6374,7 +6278,10 @@ static int __init cxgb4_init_module(void)
 		debugfs_remove(cxgb4_debugfs_root);
 
 #if IS_ENABLED(CONFIG_IPV6)
-	register_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+	if (!inet6addr_registered) {
+		register_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+		inet6addr_registered = true;
+	}
 #endif
 
 	return ret;
@@ -6383,7 +6290,10 @@ static int __init cxgb4_init_module(void)
 static void __exit cxgb4_cleanup_module(void)
 {
 #if IS_ENABLED(CONFIG_IPV6)
-	unregister_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+	if (inet6addr_registered && list_empty(&adapter_list)) {
+		unregister_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+		inet6addr_registered = false;
+	}
 #endif
 	pci_unregister_driver(&cxgb4_driver);
 	debugfs_remove(cxgb4_debugfs_root);  /* NULL ok */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 152b4c4c7809..78ab4d406ce2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -173,9 +173,6 @@ int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid,
 			       unsigned char port, unsigned char mask);
 int cxgb4_remove_server_filter(const struct net_device *dev, unsigned int stid,
 			       unsigned int queue, bool ipv6);
-int cxgb4_clip_get(const struct net_device *dev, const struct in6_addr *lip);
-int cxgb4_clip_release(const struct net_device *dev,
-		       const struct in6_addr *lip);
 
 static inline void set_wr_txq(struct sk_buff *skb, int prio, int queue)
 {
-- 
2.2.1

^ permalink raw reply related

* [PATCH net-next 2/2] cxgb4i : Call into recently added cxgb4 ipv6 api
From: Anish Bhatt @ 2015-01-14  2:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, kxie, deepak.s, Anish Bhatt
In-Reply-To: <1421202516-13862-1-git-send-email-anish@chelsio.com>

Get a reference on every ipv6 address we offload to hardware so that it cannot
be prematurely cleared out before cleanup or when still in use.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 37d7191a3c38..dd00e5fe4a5e 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -28,6 +28,7 @@
 #include "t4fw_api.h"
 #include "l2t.h"
 #include "cxgb4i.h"
+#include "clip_tbl.h"
 
 static unsigned int dbg_level;
 
@@ -1322,6 +1323,9 @@ static inline void l2t_put(struct cxgbi_sock *csk)
 static void release_offload_resources(struct cxgbi_sock *csk)
 {
 	struct cxgb4_lld_info *lldi;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct net_device *ndev = csk->cdev->ports[csk->port_id];
+#endif
 
 	log_debug(1 << CXGBI_DBG_TOE | 1 << CXGBI_DBG_SOCK,
 		"csk 0x%p,%u,0x%lx,%u.\n",
@@ -1334,6 +1338,12 @@ static void release_offload_resources(struct cxgbi_sock *csk)
 	}
 
 	l2t_put(csk);
+#if IS_ENABLED(CONFIG_IPV6)
+	if (csk->csk_family == AF_INET6)
+		cxgb4_clip_release(ndev,
+				   (const u32 *)&csk->saddr6.sin6_addr, 1);
+#endif
+
 	if (cxgbi_sock_flag(csk, CTPF_HAS_ATID))
 		free_atid(csk);
 	else if (cxgbi_sock_flag(csk, CTPF_HAS_TID)) {
@@ -1391,10 +1401,15 @@ static int init_act_open(struct cxgbi_sock *csk)
 	csk->l2t = cxgb4_l2t_get(lldi->l2t, n, ndev, 0);
 	if (!csk->l2t) {
 		pr_err("%s, cannot alloc l2t.\n", ndev->name);
-		goto rel_resource;
+		goto rel_resource_without_clip;
 	}
 	cxgbi_sock_get(csk);
 
+#if IS_ENABLED(CONFIG_IPV6)
+	if (csk->csk_family == AF_INET6)
+		cxgb4_clip_get(ndev, (const u32 *)&csk->saddr6.sin6_addr, 1);
+#endif
+
 	if (t4) {
 		size = sizeof(struct cpl_act_open_req);
 		size6 = sizeof(struct cpl_act_open_req6);
@@ -1451,6 +1466,12 @@ static int init_act_open(struct cxgbi_sock *csk)
 	return 0;
 
 rel_resource:
+#if IS_ENABLED(CONFIG_IPV6)
+	if (csk->csk_family == AF_INET6)
+		cxgb4_clip_release(ndev,
+				   (const u32 *)&csk->saddr6.sin6_addr, 1);
+#endif
+rel_resource_without_clip:
 	if (n)
 		neigh_release(n);
 	if (skb)
-- 
2.2.1

^ permalink raw reply related

* Why is bridge's sysfs values in 1/100 of second?
From: Ben Greear @ 2015-01-14  2:35 UTC (permalink / raw)
  To: netdev

Are the units supposed to be 1/100 of a second, or is that just some
luck depending on HZ?

root@ath9k-138:/home/lanforge# brctl setageing br0 98
root@ath9k-138:/home/lanforge# cat /sys/class/net/br0/bridge/ageing_time
9800

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH] i40e: don't enable and init FCOE by default when do PF reset
From: ethan zhao @ 2015-01-14  2:40 UTC (permalink / raw)
  To: Dev, Vasu
  Cc: e1000-devel@lists.sourceforge.net, brian.maly@oracle.com,
	Allan, Bruce W, Brandeburg, Jesse, Parikh, Neerav, Linux NICS,
	Ronciak, John, netdev@vger.kernel.org, Ethan Zhao,
	linux-kernel@vger.kernel.org
In-Reply-To: <933BEC2E04D6A5458F4B0239FB547F9A34CC4055@fmsmsx118.amr.corp.intel.com>

Vasu,

On 2015/1/14 3:38, Dev, Vasu wrote:
>> -----Original Message-----
>>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> index a5f2660..a2572cc 100644
>>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> @@ -6180,9 +6180,12 @@ static void i40e_reset_and_rebuild(struct
>>>>> i40e_pf *pf, bool reinit)
>>>>>      }
>>>>>   #endif /* CONFIG_I40E_DCB */
>>>>>   #ifdef I40E_FCOE
>>>>> -   ret = i40e_init_pf_fcoe(pf);
>>>>> -   if (ret)
>>>>> -           dev_info(&pf->pdev->dev, "init_pf_fcoe failed: %d\n", ret);
>>>>> +   if (pf->flags & I40E_FLAG_FCOE_ENABLED) {
>>>>> +           ret = i40e_init_pf_fcoe(pf);
>>> Calling i40e_init_pf_fcoe() here conflicts with its
>> I40E_FLAG_FCOE_ENABLED pre-condition since I40E_FLAG_FCOE_ENABLED is
>> set by very same i40e_init_pf_fcoe(), in turn i40e_init_pf_fcoe() will never get
>> called.
>>
>> I don't think so,  here ,i40e_reset_and_rebuild()  is not the only and the first
>> place that  i40e_init_pf_fcoe() is called, see i40e_probe(), that is the first
>> chance.
>>
>> i40e_probe()
>> -->i40e_sw_init()
>>       -->i40e_init_pf_fcoe()
>>
>> And the I40E_FLAG_FCOE_ENABLED is possible be set by
>> i40e_fcoe_enable() or i40e_fcoe_disable() interface before the reset action is
>> to be done.
>>
> It is set by i40e_init_pf_fcoe() and you are right that the modified call flow by your patch won't impact setting of I40E_FLAG_FCOE_ENABLED anyway which could have prevented calling i40e_init_pf_fcoe() as I described above, so this is not an issue with the patch.
>
>> BTW, the reason I post this patch is that we hit a bug, after setup vlan, the PF
>> is enabled to FCOE.
>>
> Then that BUG would still remain un-fixed and calling i40e_init_pf_fcoe() under I40E_FLAG_FCOE_ENABLED  flag really won't affect call flow to fix anything. I mean I40E_FLAG_FCOE_ENABLED  condition will be true with "pf->hw.func_caps.fcoe == true" and otherwise calling i40e_init_pf_fcoe() simply returns back early on after checking "pf->hw.func_caps.fcoe == false", so how that bug is fixed here by added I40E_FLAG_FCOE_ENABLED  condition ? What is the bug ?
  The func_caps.fcoe is assigned by following call path, under our test 
environment,

  i40e_probe()
   ->i40e_get_capabilities()
      ->i40e_aq_discover_capabilities()
         ->i40e_parse_discover_capabilities()

  Or

  i40e_reset_and_rebuild()
   ->i40e_get_capabilities()
     ->i40e_aq_discover_capabilities()
       ->i40e_parse_discover_capabilities()

  Under our test environment, the "pf->hw.func_caps.fcoe" is true. so if 
i40e_reset_and_rebuild() is called for VLAN setup, ethtool diagnostic test.
  And then i40e_init_pf_fcoe() is to be called,

  While if (!pf->hw.func_caps.fcoe) wouldn't return,

  So  pf->flags is set to I40E_FLAG_FCOE_ENABLED.

  With my patch,  i40e_init_pf_fcoe() is only called after 
I40E_FLAG_FCOE_ENABLED is set before reset.

Enable FCOE in i40e_probe() or not is another issue.


Thanks,
Ethan


>
>>> Jeff Kirsher should be getting out a patch queued by me which adds
>> I40E_FCoE Kbuild option, in that FCoE is disabled by default and  user could
>> enable FCoE only if needed, that patch would do same of skipping
>> i40e_init_pf_fcoe() whether FCoE capability in device enabled or not in
>> default config.
>> The following patch will not fix the above issue -- configuration of PF will be
>> changed via reset.
>> How about the FCOE is configured and disabled by  i40e_fcoe_disable() ,
>> then reset happens ?
>>
> May be but if the BUG is due to FCoE being enabled then having it disabled in config will avoid the bug for non FCoE config option and once bug is understood then that has to be fixed for FCoE enabled config also as I asked above.
>
> Thanks Ethan for detailed response.
> Vasu
>
>>>  From patchwork Wed Oct  2 23:26:08 2013
>>> Content-Type: text/plain; charset="utf-8"
>>> MIME-Version: 1.0
>>> Content-Transfer-Encoding: 7bit
>>> Subject: [net] i40e: adds FCoE configure option
>>> Date: Thu, 03 Oct 2013 07:26:08 -0000
>>> From: Vasu Dev <vasu.dev@intel.com>
>>> X-Patchwork-Id: 11797
>>>
>>> Adds FCoE config option I40E_FCOE, so that FCoE can be enabled as
>>> needed but otherwise have it disabled by default.
>>>
>>> This also eliminate multiple FCoE config checks, instead now just one
>>> config check for CONFIG_I40E_FCOE.
>>>
>>> The I40E FCoE was added with 3.17 kernel and therefore this patch
>>> shall be applied to stable 3.17 kernel also.
>>>
>>> CC: <stable@vger.kernel.org>
>>> Signed-off-by: Vasu Dev <vasu.dev@intel.com>
>>> Tested-by: Jim Young <jamesx.m.young@intel.com>
>>>
>>> ---
>>> drivers/net/ethernet/intel/Kconfig           |   11 +++++++++++
>>>   drivers/net/ethernet/intel/i40e/Makefile     |    2 +-
>>>   drivers/net/ethernet/intel/i40e/i40e_osdep.h |    4 ++--
>>>   3 files changed, 14 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/Kconfig
>>> b/drivers/net/ethernet/intel/Kconfig
>>> index 5b8300a..4d61ef5 100644
>>> --- a/drivers/net/ethernet/intel/Kconfig
>>> +++ b/drivers/net/ethernet/intel/Kconfig
>>> @@ -281,6 +281,17 @@ config I40E_DCB
>>>
>>>            If unsure, say N.
>>>
>>> +config I40E_FCOE
>>> +       bool "Fibre Channel over Ethernet (FCoE)"
>>> +       default n
>>> +       depends on I40E && DCB && FCOE
>>> +       ---help---
>>> +         Say Y here if you want to use Fibre Channel over Ethernet (FCoE)
>>> +         in the driver. This will create new netdev for exclusive FCoE
>>> +         use with XL710 FCoE offloads enabled.
>>> +
>>> +         If unsure, say N.
>>> +
>>>   config I40EVF
>>>          tristate "Intel(R) XL710 X710 Virtual Function Ethernet support"
>>>          depends on PCI_MSI
>>> diff --git a/drivers/net/ethernet/intel/i40e/Makefile
>>> b/drivers/net/ethernet/intel/i40e/Makefile
>>> index 4b94ddb..c405819 100644
>>> --- a/drivers/net/ethernet/intel/i40e/Makefile
>>> +++ b/drivers/net/ethernet/intel/i40e/Makefile
>>> @@ -44,4 +44,4 @@ i40e-objs := i40e_main.o \
>>>          i40e_virtchnl_pf.o
>>>
>>>   i40e-$(CONFIG_I40E_DCB) += i40e_dcb.o i40e_dcb_nl.o
>>> -i40e-$(CONFIG_FCOE:m=y) += i40e_fcoe.o
>>> +i40e-$(CONFIG_I40E_FCOE) += i40e_fcoe.o
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
>>> b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
>>> index 045b5c4..ad802dd 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
>>> @@ -78,7 +78,7 @@ do {                                                            \
>>>   } while (0)
>>>
>>>   typedef enum i40e_status_code i40e_status; -#if defined(CONFIG_FCOE)
>>> || defined(CONFIG_FCOE_MODULE)
>>> +#ifdef CONFIG_I40E_FCOE
>>>   #define I40E_FCOE
>>> -#endif /* CONFIG_FCOE or CONFIG_FCOE_MODULE */
>>> +#endif
>>>   #endif /* _I40E_OSDEP_H_ */
>>>
>>>>> +           if (ret)
>>>>> +                   dev_info(&pf->pdev->dev,
>>>>> +                            "init_pf_fcoe failed: %d\n", ret);
>>>>> +   }
>>>>>
>>>>>   #endif
>>>>>      /* do basic switch setup */
>>>>> --
>>>>> 1.8.3.1
>> Thanks,
>> Ethan


------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH] bridge: only provide proxy ARP when CONFIG_INET is enabled
From: David Ahern @ 2015-01-14  2:56 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: David Miller, cwang, netdev, kyeyoonp, bridge, stephen
In-Reply-To: <4681500.x6CYopasp1@wuerfel>

On 1/13/15 2:33 PM, Arnd Bergmann wrote:

> The effect is very similar to my patch (probably same object code), the
> only difference should be that it would add an ugly #ifdef instead of
> the preferred IS_ENABLED() check, so you don't get any compile-time
> coverage of the function.

Indeed. As long as br_do_proxy_arp does not get exported that works the 
same.

David

^ permalink raw reply

* Re: Why is bridge's sysfs values in 1/100 of second?
From: David Miller @ 2015-01-14  2:53 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <54B5D5DA.5080008@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Tue, 13 Jan 2015 18:35:06 -0800

> Are the units supposed to be 1/100 of a second, or is that just some
> luck depending on HZ?

More specifically, it's "USER_HZ" which unlike HZ is unchanging.

^ permalink raw reply

* Re: [PATCH v2 1/1] atm: remove deprecated use of pci api
From: David Miller @ 2015-01-14  2:59 UTC (permalink / raw)
  To: lambert.quentin; +Cc: chas, linux-atm-general, netdev, linux-kernel
In-Reply-To: <20150112161042.GA11374@sloth>

From: Quentin Lambert <lambert.quentin@gmail.com>
Date: Mon, 12 Jan 2015 17:10:42 +0100

> @@ -2246,7 +2246,8 @@ static int eni_init_one(struct pci_dev *pci_dev,
>  		goto err_disable;
>  
>  	zero = &eni_dev->zero;
> -	zero->addr = pci_alloc_consistent(pci_dev, ENI_ZEROES_SIZE, &zero->dma);
> +	zero->addr = dma_alloc_coherent(&pci_dev->dev, ENI_ZEROES_SIZE,
> +					&zero->dma, GFP_ATOMIC);
>  	if (!zero->addr)
>  		goto err_kfree;
>  

I really would like you to look at these locations and see if
GFP_KERNEL can be used instead of GFP_ATOMIC.  I bet that nearly
all of these can, and it is preferred.

Thanks.

^ permalink raw reply

* Re: [PATCH] neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
From: David Miller @ 2015-01-14  3:04 UTC (permalink / raw)
  To: jeff; +Cc: netdev
In-Reply-To: <1421135479-15952-1-git-send-email-jeff@melix.org>

From: Jean-Francois Remy <jeff@melix.org>
Date: Tue, 13 Jan 2015 08:51:19 +0100

> When setting base_reachable_time or base_reachable_time_ms through
> sysctl or netlink, the reachable_time value is not updated.
> This means that neighbour entries will continue to be updated using the
> old value until it is recomputed in neigh_period_work (which
>     recomputes the value every 300*HZ).
> On systems with HZ equal to 1000 for instance, it means 5mins before
> the change is effective.
> 
> This patch changes this behavior by recomputing reachable_time after
> each set on base_reachable_time or base_reachable_time_ms.
> The new value will become effective the next time the neighbour's timer
> is triggered.
> 
> Changes are made in two places: the netlink code for set and the sysctl
> handling code. For sysctl, I use a proc_handler. The ipv6 network
> code does provide its own handler but it already refreshes
> reachable_time correctly so it's not an issue.
> Any other user of neighbour which provide its own handlers must
> refresh reachable_time.
> 
> Signed-off-by: Jean-Francois Remy <jeff@melix.org>

Your change is correct but there are some coding style issues to
deal with:

> +				/*
> +				 * update reachable_time as well, otherwise, the change will
> +				 * only be effective after the next time neigh_periodic_work
> +				 * decides to recompute it (can be multiple minutes)
> +				 */

Comments in the networking should be:

	/* Of this
	 * form.
	 */


> +	if (write && ret == 0 ) {

No space between '0' and the closing parenthesis please.

> +		/*
> +		 * update reachable_time as well, otherwise, the change will
> +		 * only be effective after the next time neigh_periodic_work
> +		 * decides to recompute it
> +		 */

Please fix this comment's layout as per above.

> +		/*
> +		 * Those handlers will update p->reachable_time after
> +		 * base_reachable_time(_ms) is set to ensure the new timer starts being
> +		 * applied after the next neighbour update instead of waiting for
> +		 * neigh_periodic_work to update its value (can be multiple minutes)
> +		 * So any handler that replaces them should do this as well
> +		 */

Likewise.

^ permalink raw reply

* [PATCH] neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
From: Jean-Francois Remy @ 2015-01-14  3:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, Jean-Francois Remy
In-Reply-To: <20150113.220438.1470978767602235254.davem@davemloft.net>

When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
    recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
---
 net/core/neighbour.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8e38f17..8d614c9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2043,6 +2043,12 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 			case NDTPA_BASE_REACHABLE_TIME:
 				NEIGH_VAR_SET(p, BASE_REACHABLE_TIME,
 					      nla_get_msecs(tbp[i]));
+				/* update reachable_time as well, otherwise, the change will
+				 * only be effective after the next time neigh_periodic_work
+				 * decides to recompute it (can be multiple minutes)
+				 */
+				p->reachable_time =
+					neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
 				break;
 			case NDTPA_GC_STALETIME:
 				NEIGH_VAR_SET(p, GC_STALETIME,
@@ -2921,6 +2927,31 @@ static int neigh_proc_dointvec_unres_qlen(struct ctl_table *ctl, int write,
 	return ret;
 }
 
+static int neigh_proc_base_reachable_time(struct ctl_table *ctl, int write,
+					  void __user *buffer,
+					  size_t *lenp, loff_t *ppos)
+{
+	struct neigh_parms *p = ctl->extra2;
+	int ret;
+
+	if (strcmp(ctl->procname, "base_reachable_time") == 0)
+		ret = neigh_proc_dointvec_jiffies(ctl, write, buffer, lenp, ppos);
+	else if (strcmp(ctl->procname, "base_reachable_time_ms") == 0)
+		ret = neigh_proc_dointvec_ms_jiffies(ctl, write, buffer, lenp, ppos);
+	else
+		ret = -1;
+
+	if (write && ret == 0) {
+		/* update reachable_time as well, otherwise, the change will
+		 * only be effective after the next time neigh_periodic_work
+		 * decides to recompute it
+		 */
+		p->reachable_time =
+			neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
+	}
+	return ret;
+}
+
 #define NEIGH_PARMS_DATA_OFFSET(index)	\
 	(&((struct neigh_parms *) 0)->data[index])
 
@@ -3047,6 +3078,19 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 		t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].proc_handler = handler;
 		/* ReachableTime (in milliseconds) */
 		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler = handler;
+	} else {
+		/* Those handlers will update p->reachable_time after
+		 * base_reachable_time(_ms) is set to ensure the new timer starts being
+		 * applied after the next neighbour update instead of waiting for
+		 * neigh_periodic_work to update its value (can be multiple minutes)
+		 * So any handler that replaces them should do this as well
+		 */
+		/* ReachableTime */
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].proc_handler =
+			neigh_proc_base_reachable_time;
+		/* ReachableTime (in milliseconds) */
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler =
+			neigh_proc_base_reachable_time;
 	}
 
 	/* Don't export sysctls to unprivileged users */
-- 
2.1.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox