Netdev List
* Re: [PATCH v2 net-next 00/13] net: hns: add support of debug dsaf device
From: David Miller @ 2016-04-26  5:11 UTC (permalink / raw)
  To: Yisen.Zhuang
  Cc: devicetree, netdev, linux-arm-kernel, robh+dt, pawel.moll,
	mark.rutland, ijc+devicetree, galak, will.deacon, catalin.marinas,
	yankejian, huangdaode, salil.mehta, lipeng321, liguozhu,
	xieqianqian, xuwei5, linuxarm
In-Reply-To: <1461402317-136499-1-git-send-email-Yisen.Zhuang@huawei.com>

From: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Date: Sat, 23 Apr 2016 17:05:04 +0800

> There are two kinds of dsaf device in hns. One is for service ports,
> contains a crossbar, and can work in different modes. The other is for
> the debug port and can only work in single-port mode. The current code
> declares a single dsaf device for both service ports and debug ports, which
> is hard to read. This patch separates it into three platform devices to make
> the code simpler and more readable.
 ...
> We took compatibility into consideration, and it works well with the
> old dts file (tested on d02 board).
> 
> For more details, please see individual patches.

Series applied, thanks.


* [PATCH 5/5] batman-adv: Fix broadcast/ogm queue limit on a removed interface
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Linus Lüssing, Sven Eckelmann,
	Marek Lindner, Antonio Quartulli
In-Reply-To: <1461641239-7097-1-git-send-email-a@unstable.cc>

From: Linus Lüssing <linus.luessing@c0d3.blue>

When a single interface is removed while a broadcast or OGM packet is
still pending, the forward packet is freed without releasing its queue
slot.

This patch increments the matching queue counter before the pending
packet is freed.

Fixes: 6d5808d4ae1b ("batman-adv: Add missing hardif_free_ref in forw_packet_free")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/send.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index 3ce06e0a91b1..76417850d3fc 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -675,6 +675,9 @@ batadv_purge_outstanding_packets(struct batadv_priv *bat_priv,
 
 		if (pending) {
 			hlist_del(&forw_packet->list);
+			if (!forw_packet->own)
+				atomic_inc(&bat_priv->bcast_queue_left);
+
 			batadv_forw_packet_free(forw_packet);
 		}
 	}
@@ -702,6 +705,9 @@ batadv_purge_outstanding_packets(struct batadv_priv *bat_priv,
 
 		if (pending) {
 			hlist_del(&forw_packet->list);
+			if (!forw_packet->own)
+				atomic_inc(&bat_priv->batman_queue_left);
+
 			batadv_forw_packet_free(forw_packet);
 		}
 	}
-- 
2.8.1
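
The accounting rule this patch restores can be sketched in user space,
assuming C11 atomics in place of the kernel's atomic_t. The names
fwd_packet, queue_state, queue_reserve and purge_packet are hypothetical
and are not batman-adv symbols; the sketch only shows that a purged packet
which reserved a slot has to return it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the batman-adv structures, not the kernel types. */
struct fwd_packet {
	bool own;      /* locally generated packets do not take a slot */
	bool pending;  /* true while the delayed send has not run yet */
};

struct queue_state {
	atomic_int slots_left; /* plays the role of bcast_queue_left */
};

/* Take one slot before queueing a forwarded packet; fails when none is left. */
static bool queue_reserve(struct queue_state *q)
{
	int left = atomic_load(&q->slots_left);

	while (left > 0) {
		/* On failure 'left' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&q->slots_left, &left, left - 1))
			return true;
	}
	return false;
}

/* Purge path: a pending packet that reserved a slot gives it back. */
static void purge_packet(struct queue_state *q, struct fwd_packet *pkt)
{
	assert(q != NULL && pkt != NULL);

	if (!pkt->pending)
		return;
	if (!pkt->own)
		atomic_fetch_add(&q->slots_left, 1);
	pkt->pending = false;
}
```

Without the atomic_fetch_add() in the purge path, every purged packet would
permanently shrink the queue in this model.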


* [PATCH 4/5] batman-adv: Reduce refcnt of removed router when updating route
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1461641239-7097-1-git-send-email-a@unstable.cc>

From: Sven Eckelmann <sven@narfation.org>

_batadv_update_route rcu_dereferences orig_ifinfo->router outside of a
spinlock protected region to print some information messages to the debug
log. But this pointer is not checked again when the new pointer is assigned
in the spinlock protected region. Thus it can happen that the value of
orig_ifinfo->router changed in the meantime and thus the reference counter
of the wrong router gets reduced after the spinlock protected region.

Just rcu_dereferencing the value of orig_ifinfo->router inside the spinlock
protected region (which also sets the new pointer) is enough to get the
correct old router object.

Fixes: e1a5382f978b ("batman-adv: Make orig_node->router an rcu protected pointer")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/routing.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 4dd646a52f1a..b781bf753250 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -105,6 +105,15 @@ static void _batadv_update_route(struct batadv_priv *bat_priv,
 		neigh_node = NULL;
 
 	spin_lock_bh(&orig_node->neigh_list_lock);
+	/* curr_router used earlier may not be the current orig_ifinfo->router
+	 * anymore because it was dereferenced outside of the neigh_list_lock
+	 * protected region. After the new best neighbor has replaced the current
+	 * best neighbor the reference counter needs to decrease. Consequently,
+	 * the code needs to ensure the curr_router variable contains a pointer
+	 * to the replaced best neighbor.
+	 */
+	curr_router = rcu_dereference_protected(orig_ifinfo->router, true);
+
 	rcu_assign_pointer(orig_ifinfo->router, neigh_node);
 	spin_unlock_bh(&orig_node->neigh_list_lock);
 	batadv_orig_ifinfo_put(orig_ifinfo);
-- 
2.8.1
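
The idea of re-reading the old pointer inside the locked region can be
sketched in user space, assuming a pthread mutex in place of the spinlock
and a plain integer in place of the kernel's kref. The types neigh and
route_slot and the functions neigh_put and route_update are hypothetical
names for this sketch only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reference-counted neighbour, not struct batadv_neigh_node. */
struct neigh {
	int refcount; /* non-atomic for brevity; a real counter would be atomic */
};

struct route_slot {
	pthread_mutex_t lock;
	struct neigh *router; /* current best neighbour, holds one reference */
};

static void neigh_put(struct neigh *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/*
 * Install new_router (the caller hands over one reference) and drop the
 * reference held on whichever router was actually replaced.
 */
static void route_update(struct route_slot *slot, struct neigh *new_router)
{
	struct neigh *replaced;

	pthread_mutex_lock(&slot->lock);
	replaced = slot->router;    /* read under the lock, not from a stale copy */
	slot->router = new_router;
	pthread_mutex_unlock(&slot->lock);

	neigh_put(replaced);
}
```

A copy of slot->router taken before the lock may name a router that another
updater has already replaced, which is why the sketch never uses one for
the final put.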


* [PATCH 3/5] batman-adv: Deactivate TO_BE_ACTIVATED hardif on shutdown
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1461641239-7097-1-git-send-email-a@unstable.cc>

From: Sven Eckelmann <sven@narfation.org>

The shutdown of a batman-adv interface can happen with one of its slave
interfaces still being in the BATADV_IF_TO_BE_ACTIVATED state. A possible
reason for it is that the routing algorithm BATMAN_V was selected and
batadv_schedule_bat_ogm was not yet called for this interface. This slave
interface still has to be set to BATADV_IF_INACTIVE or the batman-adv
interface will never reduce its usage counter and thus never gets shut down.

This problem can be simulated via:

    $ modprobe dummy
    $ modprobe batman-adv routing_algo=BATMAN_V
    $ ip link add bat0 type batadv
    $ ip link set dummy0 master bat0
    $ ip link set dummy0 up
    $ ip link del bat0
    unregister_netdevice: waiting for bat0 to become free. Usage count = 3

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/hard-interface.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index b22b2775a0a5..c61d5b0b24d2 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -572,8 +572,7 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface,
 	struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface);
 	struct batadv_hard_iface *primary_if = NULL;
 
-	if (hard_iface->if_status == BATADV_IF_ACTIVE)
-		batadv_hardif_deactivate_interface(hard_iface);
+	batadv_hardif_deactivate_interface(hard_iface);
 
 	if (hard_iface->if_status != BATADV_IF_INACTIVE)
 		goto out;
-- 
2.8.1
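
The state handling can be sketched as a small state machine. This is an
illustration under assumptions: the enum values and the functions
deactivate and disable_interface are hypothetical, and the sketch assumes
the deactivation helper itself ignores states it should not touch, which is
why it can be called unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the interface states named in the commit message. */
enum if_status {
	IF_NOT_IN_USE,
	IF_INACTIVE,
	IF_TO_BE_ACTIVATED,
	IF_ACTIVE,
};

struct hard_if {
	enum if_status status;
};

/* Assumed behaviour: only ACTIVE and TO_BE_ACTIVATED are moved to INACTIVE. */
static void deactivate(struct hard_if *hif)
{
	if (hif->status != IF_ACTIVE && hif->status != IF_TO_BE_ACTIVATED)
		return;
	hif->status = IF_INACTIVE;
}

/* Returns true when teardown proceeded, so the usage count can be dropped. */
static bool disable_interface(struct hard_if *hif)
{
	assert(hif != NULL);

	deactivate(hif); /* unconditional, as in the patched code */
	if (hif->status != IF_INACTIVE)
		return false;

	hif->status = IF_NOT_IN_USE;
	return true;
}
```

With the old guard (deactivate only when the state is IF_ACTIVE), an
interface in IF_TO_BE_ACTIVATED would take the early-return path in this
model and never be released.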


* [PATCH 2/5] batman-adv: init neigh node last seen field
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Sven Eckelmann,
	Antonio Quartulli
In-Reply-To: <1461641239-7097-1-git-send-email-a@unstable.cc>

From: Marek Lindner <mareklindner@neomailbox.ch>

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/originator.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index e4cbb0753e37..d52f67a0c057 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -663,6 +663,7 @@ batadv_neigh_node_new(struct batadv_orig_node *orig_node,
 	ether_addr_copy(neigh_node->addr, neigh_addr);
 	neigh_node->if_incoming = hard_iface;
 	neigh_node->orig_node = orig_node;
+	neigh_node->last_seen = jiffies;
 
 	/* extra reference for return */
 	kref_init(&neigh_node->refcount);
-- 
2.8.1


* [PATCH 1/5] batman-adv: Check skb size before using encapsulated ETH+VLAN header
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1461641239-7097-1-git-send-email-a@unstable.cc>

From: Sven Eckelmann <sven@narfation.org>

The encapsulated ethernet and VLAN header may be outside the received
ethernet frame. Thus the skb buffer size has to be checked before it can be
parsed to find out if it encapsulates another batman-adv packet.

Fixes: 420193573f11 ("batman-adv: softif bridge loop avoidance")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/soft-interface.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 0710379491bf..8a136b6a1ff0 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -408,11 +408,17 @@ void batadv_interface_rx(struct net_device *soft_iface,
 	 */
 	nf_reset(skb);
 
+	if (unlikely(!pskb_may_pull(skb, ETH_HLEN)))
+		goto dropped;
+
 	vid = batadv_get_vid(skb, 0);
 	ethhdr = eth_hdr(skb);
 
 	switch (ntohs(ethhdr->h_proto)) {
 	case ETH_P_8021Q:
+		if (!pskb_may_pull(skb, VLAN_ETH_HLEN))
+			goto dropped;
+
 		vhdr = (struct vlan_ethhdr *)skb->data;
 
 		if (vhdr->h_vlan_encapsulated_proto != ethertype)
@@ -424,8 +430,6 @@ void batadv_interface_rx(struct net_device *soft_iface,
 	}
 
 	/* skb->dev & skb->pkt_type are set here */
-	if (unlikely(!pskb_may_pull(skb, ETH_HLEN)))
-		goto dropped;
 	skb->protocol = eth_type_trans(skb, soft_iface);
 
 	/* should not be necessary anymore as we use skb_pull_rcsum()
-- 
2.8.1
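
The ordering of the length checks can be sketched on a flat byte buffer.
This is a user-space illustration, not the skb API: pskb_may_pull() also
makes the requested bytes linear, which a flat buffer does not need. The
SK_* constants and the functions read_be16 and frame_proto are hypothetical
names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_HLEN      14      /* dst MAC + src MAC + EtherType */
#define SK_VLAN_ETH_HLEN 18      /* Ethernet header plus one 802.1Q tag */
#define SK_ETH_P_8021Q   0x8100

/* Read a big-endian 16-bit field without alignment assumptions. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Store the protocol carried by the frame in *proto. Returns false when
 * the frame is too short to contain the header that is about to be read.
 */
static bool frame_proto(const uint8_t *frame, size_t len, uint16_t *proto)
{
	assert(frame != NULL && proto != NULL);

	if (len < SK_ETH_HLEN)
		return false;            /* EtherType is not in the buffer */

	*proto = read_be16(frame + 12);
	if (*proto != SK_ETH_P_8021Q)
		return true;

	if (len < SK_VLAN_ETH_HLEN)
		return false;            /* tag announced but truncated */

	*proto = read_be16(frame + 16);  /* encapsulated EtherType */
	return true;
}
```

Each header is checked immediately before the first read that depends on
it, which mirrors moving the ETH_HLEN check ahead of the parsing code.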


* pull request [net]: batman-adv-0160426
From: Antonio Quartulli @ 2016-04-26  3:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n

Hi David,

this is a batch intended for net. Changes are quite small, therefore I
hope it is not a big deal to include them at this point of the release cycle.

In this patchset you can find the following fixes:

1) check skb size to avoid reading beyond its border when delivering
   payloads, by Sven Eckelmann
2) initialize last_seen time in neigh_node object to prevent the cleanup
   routine from accidentally purging it, by Marek Lindner
3) release "recently added" slave interfaces upon virtual/batman
   interface shutdown, by Sven Eckelmann
4) properly decrease router object reference counter upon routing table
   update, by Sven Eckelmann
5) release queue slots when purging OGM packets of deactivating slave
   interface, by Linus Lüssing

Patches 2 and 3 have no "Fixes:" tag because the offending commits date
back to when batman-adv was not yet officially in the net tree.

Note that all these changes are fixing very old commits and therefore
it would be nice if you could queue them for *stable*.

Please pull or let me know of any issue!

Thanks a lot,
	Antonio




The following changes since commit 5f44abd041c5f3be76d57579ab254d78e601315b:

  Merge tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux (2016-04-21 15:41:13 -0700)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem

for you to fetch changes up to c4fdb6cff2aa0ae740c5f19b6f745cbbe786d42f:

  batman-adv: Fix broadcast/ogm queue limit on a removed interface (2016-04-24 15:41:56 +0800)

----------------------------------------------------------------
In this patchset you can find the following fixes:

1) check skb size to avoid reading beyond its border when delivering
   payloads, by Sven Eckelmann
2) initialize last_seen time in neigh_node object to prevent the cleanup
   routine from accidentally purging it, by Marek Lindner
3) release "recently added" slave interfaces upon virtual/batman
   interface shutdown, by Sven Eckelmann
4) properly decrease router object reference counter upon routing table
   update, by Sven Eckelmann
5) release queue slots when purging OGM packets of deactivating slave
   interface, by Linus Lüssing

Patches 2 and 3 have no "Fixes:" tag because the offending commits date
back to when batman-adv was not yet officially in the net tree.

----------------------------------------------------------------
Linus Lüssing (1):
      batman-adv: Fix broadcast/ogm queue limit on a removed interface

Marek Lindner (1):
      batman-adv: init neigh node last seen field

Sven Eckelmann (3):
      batman-adv: Check skb size before using encapsulated ETH+VLAN header
      batman-adv: Deactivate TO_BE_ACTIVATED hardif on shutdown
      batman-adv: Reduce refcnt of removed router when updating route

 net/batman-adv/hard-interface.c | 3 +--
 net/batman-adv/originator.c     | 1 +
 net/batman-adv/routing.c        | 9 +++++++++
 net/batman-adv/send.c           | 6 ++++++
 net/batman-adv/soft-interface.c | 8 ++++++--
 5 files changed, 23 insertions(+), 4 deletions(-)


* [PATCH net-next V2] tuntap: calculate rps hash only when needed
From: Jason Wang @ 2016-04-26  3:13 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: Jason Wang, Michael S. Tsirkin

There's no need to calculate the rps hash if rps is not enabled. So this
patch exports rps_needed and checks it before trying to get the rps
hash. Tests (using pktgen to inject packets into the guest) show this can
improve pps by about 13% (when rps is disabled).

Before:
~1150000 pps
After:
~1300000 pps

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
----
Changes from V1:
- Fix build when CONFIG_RPS is not set
---
 drivers/net/tun.c | 4 +++-
 net/core/dev.c    | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950..8df9e23 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -819,7 +819,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (txq >= numqueues)
 		goto drop;
 
-	if (numqueues == 1) {
+#ifdef CONFIG_RPS
+	if (numqueues == 1 && static_key_false(&rps_needed)) {
 		/* Select queue was not called for the skbuff, so we extract the
 		 * RPS hash and save it into the flow_table here.
 		 */
@@ -834,6 +835,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 				tun_flow_save_rps_rxhash(e, rxhash);
 		}
 	}
+#endif
 
 	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe7..d4ba936 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3428,6 +3428,7 @@ u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
 
 struct static_key rps_needed __read_mostly;
+EXPORT_SYMBOL(rps_needed);
 
 static struct rps_dev_flow *
 set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
-- 
1.8.3.1
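
The gating can be sketched in user space with an ordinary flag. This is an
assumption-laden illustration: the kernel uses a static key, which patches
the branch in place, while the sketch uses a plain bool. The names
steering_enabled, hash_calls, flow_hash and xmit are hypothetical, and the
hash function is a placeholder rather than the one tun uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rps_needed static key: a plain flag. */
static bool steering_enabled;
static unsigned int hash_calls; /* instrumentation for the sketch only */

/* Placeholder for the per-packet flow hash; counts how often it runs. */
static uint32_t flow_hash(uint32_t flow_id)
{
	hash_calls++;
	return flow_id * 2654435761u; /* illustrative multiplicative hash */
}

/* Transmit path: hash only for a single queue and only if steering is on. */
static uint32_t xmit(unsigned int numqueues, uint32_t flow_id)
{
	uint32_t rxhash = 0;

	assert(numqueues > 0);

	if (numqueues == 1 && steering_enabled)
		rxhash = flow_hash(flow_id);
	return rxhash;
}
```

In the kernel patch the whole block is additionally wrapped in
#ifdef CONFIG_RPS, since the V1 build failed with rps_needed undeclared when
CONFIG_RPS was not set.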


* Re: [PATCH net-next] tuntap: calculate rps hash only when needed
From: Jason Wang @ 2016-04-26  3:12 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: Michael S. Tsirkin
In-Reply-To: <1461635741-18857-1-git-send-email-jasowang@redhat.com>



On 04/26/2016 09:55 AM, Jason Wang wrote:
> There's no need to calculate the rps hash if rps is not enabled. So this
> patch exports rps_needed and checks it before trying to get the rps
> hash. Tests (using pktgen to inject packets into the guest) show this can
> improve pps by about 13% (when rps is disabled).
>
> Before:
> ~1150000 pps
> After:
> ~1300000 pps
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c | 2 +-
>  net/core/dev.c    | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afdf950..746877f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -819,7 +819,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (txq >= numqueues)
>  		goto drop;
>  
> -	if (numqueues == 1) {
> +	if (numqueues == 1 && static_key_false(&rps_needed)) {
>  		/* Select queue was not called for the skbuff, so we extract the
>  		 * RPS hash and save it into the flow_table here.
>  		 */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b9bcbe7..d4ba936 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3428,6 +3428,7 @@ u32 rps_cpu_mask __read_mostly;
>  EXPORT_SYMBOL(rps_cpu_mask);
>  
>  struct static_key rps_needed __read_mostly;
> +EXPORT_SYMBOL(rps_needed);
>  
>  static struct rps_dev_flow *
>  set_rps_cpu(struct net_device *dev, struct sk_buff *skb,

Kbuild bot reports an error when !CONFIG_RPS. Will send V2 to fix this.


* Re: [PATCH] net: ipv6: Delete host routes on an ifdown
From: David Miller @ 2016-04-26  2:53 UTC (permalink / raw)
  To: mmanning; +Cc: dsa, netdev
In-Reply-To: <571EBCF0.8030609@brocade.com>

From: Mike Manning <mmanning@brocade.com>
Date: Tue, 26 Apr 2016 01:57:20 +0100

> It would be great if this could be reconsidered

If you guys start ganging up on me, I will set you all to ignore.
This is my last warning.

Please respect my decision and try to shore up this change properly
for 4.7.0

Thank you.


* Re: [PATCH] net: ipv6: Delete host routes on an ifdown
From: David Miller @ 2016-04-26  2:50 UTC (permalink / raw)
  To: dsa; +Cc: netdev, mmanning
In-Reply-To: <571E943C.9010504@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Mon, 25 Apr 2016 16:03:40 -0600

> Rather than focusing on my mistakes, why not see the commitment on
> following through with this change?

I do not question the amount of time and effort invested.

I question whether the change was truly ready yet, and my
conclusion right now is that it is not ready for 4.6.0-final

So I reverted instead of waiting for the other shoe to drop.

Thanks.


* RE: [v8, 1/7] Documentation: DT: update Freescale DCFG compatible
From: Yangbo Lu @ 2016-04-26  2:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: ulf.hansson-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org, Zhao Qiang,
	Xiaobo Xie, linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-clk-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Russell King,
	Bhupesh Sharma, Jochen Friedrich, Scott Wood, Claudiu Manoil,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Rob Herring,
	Santosh Shilimkar,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Yang-Leo Li,
	"iommu-cunTk1MwBs9QetFLy7KEmw@public.gmane.org
In-Reply-To: <20160422131151.GJ10606@leverpostej>

Hi Mark,


> -----Original Message-----
> From: Mark Rutland [mailto:mark.rutland-5wv7dgnIgG8@public.gmane.org]
> Sent: Friday, April 22, 2016 9:12 PM
> To: Yangbo Lu
> Cc: linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org;
> devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org; linux-
> kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-clk-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-
> i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org;
> netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; ulf.hansson-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; Scott Wood; Rob Herring;
> Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> Sharma; Zhao Qiang; Kumar Gala; Santosh Shilimkar; Yang-Leo Li; Xiaobo
> Xie
> Subject: Re: [v8, 1/7] Documentation: DT: update Freescale DCFG
> compatible
> 
> On Fri, Apr 22, 2016 at 02:27:38PM +0800, Yangbo Lu wrote:
> > Update Freescale DCFG compatible with 'fsl,<chip>-dcfg' instead of
> > 'fsl,ls1021a-dcfg' to include more chips.
> >
> > Signed-off-by: Yangbo Lu <yangbo.lu-3arQi8VN3Tc@public.gmane.org>
> > ---
> > Changes for v8:
> > 	- Added this patch
> > ---
> >  Documentation/devicetree/bindings/arm/fsl.txt | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/arm/fsl.txt
> > b/Documentation/devicetree/bindings/arm/fsl.txt
> > index 752a685..1d5f512 100644
> > --- a/Documentation/devicetree/bindings/arm/fsl.txt
> > +++ b/Documentation/devicetree/bindings/arm/fsl.txt
> > @@ -119,7 +119,7 @@ Freescale DCFG
> >  configuration and status for the device. Such as setting the
> > secondary  core start address and release the secondary core from
> holdoff and startup.
> >    Required properties:
> > -  - compatible: should be "fsl,ls1021a-dcfg"
> > +  - compatible: should be "fsl,<chip>-dcfg"
> 
> Please list the specific values expected for <chip>. While just saying <chip>
> may be more generic, it makes it practically impossible to search for the
> correct binding given a compatible string, and it's vague as to exactly
> what <chip> should be.

[Lu Yangbo-B47093] Thanks for your comment. I will list the possible chips.

> 
> Thanks,
> Mark.
> 
> 
> 
> >    - reg : should contain base address and length of DCFG
> > memory-mapped registers
> >
> >  Example:
> > --
> > 2.1.0.27.g96db324
> >


* Re: myri10ge: fix sleeping with bh disabled
From: Hyong-Youb Kim @ 2016-04-26  2:22 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: netdev
In-Reply-To: <20160425085918.GB2608@redhat.com>

On Mon, Apr 25, 2016 at 10:59:19AM +0200, Stanislaw Gruszka wrote:
> napi_disable() cannot be called with bh disabled, so move the locking to
> just around myri10ge_ss_lock_napi().

Acked-by: Hyong-Youb Kim <hykim@myri.com>

Thanks.

> 
> The patch fixes the following bug:
> 
> [  114.278378] BUG: sleeping function called from invalid context at net/core/dev.c:4383 
> <snip>
> [  114.313712] Call Trace: 
> [  114.314943]  [<ffffffff817010ce>] dump_stack+0x19/0x1b 
> [  114.317673]  [<ffffffff810ce7f3>] __might_sleep+0x173/0x230 
> [  114.320566]  [<ffffffff815b3117>] napi_disable+0x27/0x90 
> [  114.323254]  [<ffffffffa01e437f>] myri10ge_close+0xbf/0x3f0 [myri10ge] 
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
> diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> index 270c9ee..6d1a956 100644
> --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> @@ -2668,9 +2668,9 @@ static int myri10ge_close(struct net_device *dev)
>  
>  	del_timer_sync(&mgp->watchdog_timer);
>  	mgp->running = MYRI10GE_ETH_STOPPING;
> -	local_bh_disable(); /* myri10ge_ss_lock_napi needs bh disabled */
>  	for (i = 0; i < mgp->num_slices; i++) {
>  		napi_disable(&mgp->ss[i].napi);
> +		local_bh_disable(); /* myri10ge_ss_lock_napi needs this */
>  		/* Lock the slice to prevent the busy_poll handler from
>  		 * accessing it.  Later when we bring the NIC up, myri10ge_open
>  		 * resets the slice including this lock.
> @@ -2679,8 +2679,8 @@ static int myri10ge_close(struct net_device *dev)
>  			pr_info("Slice %d locked\n", i);
>  			mdelay(1);
>  		}
> +		local_bh_enable();
>  	}
> -	local_bh_enable();
>  	netif_carrier_off(dev);
>  
>  	netif_tx_stop_all_queues(dev);
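
The ordering constraint in the quoted patch can be modelled in user space.
This is only a toy model under stated assumptions: bh_disable_depth stands
in for the bottom-half disable count, and may_sleep_call, lock_slice and
close_slices are hypothetical names, not myri10ge functions.

```c
#include <assert.h>
#include <stdbool.h>

static int bh_disable_depth;   /* models nested local_bh_disable() calls */
static bool slept_in_atomic;   /* set if a sleeping call ran with BH off */

static void bh_disable(void)
{
	bh_disable_depth++;
}

static void bh_enable(void)
{
	assert(bh_disable_depth > 0);
	bh_disable_depth--;
}

/* Stand-in for a call that may sleep, such as napi_disable(). */
static void may_sleep_call(void)
{
	if (bh_disable_depth > 0)
		slept_in_atomic = true;
}

/* Stand-in for taking the slice lock, which needs bottom halves disabled. */
static void lock_slice(void)
{
	assert(bh_disable_depth > 0);
}

/* Per-slice teardown with the BH-disabled region kept as small as possible. */
static void close_slices(int num_slices)
{
	for (int i = 0; i < num_slices; i++) {
		may_sleep_call();  /* runs with bottom halves enabled */
		bh_disable();
		lock_slice();
		bh_enable();
	}
}
```

Moving bh_disable() back in front of the loop would set slept_in_atomic on
the first iteration in this model, which corresponds to the warning in the
quoted trace.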


* [PATCH] ps3_gelic: fix memcpy parameter
From: Christophe JAILLET @ 2016-04-26  2:33 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, kernel-janitors, Christophe JAILLET

The size allocated for target->hwinfo and the number of bytes copied in it
should be consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Untested

 drivers/net/ethernet/toshiba/ps3_gelic_wireless.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
index 13214a6..743b182 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
@@ -1622,7 +1622,7 @@ static void gelic_wl_scan_complete_event(struct gelic_wl_info *wl)
 			continue;
 
 		/* copy hw scan info */
-		memcpy(target->hwinfo, scan_info, scan_info->size);
+		memcpy(target->hwinfo, scan_info, be16_to_cpu(scan_info->size));
 		target->essid_len = strnlen(scan_info->essid,
 					    sizeof(scan_info->essid));
 		target->rate_len = 0;
-- 
2.7.4
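
Why the byte order of a length field matters for a copy can be sketched in
user space. The struct layout and the names scan_record, be16_decode and
copy_record are hypothetical and do not describe the gelic firmware format;
the sketch only shows a length being decoded from big-endian before it is
used as a copy size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record that starts with its own length in big-endian. */
struct scan_record {
	uint8_t size_be[2];  /* total record length, big-endian */
	uint8_t payload[30];
};

static uint16_t be16_decode(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/*
 * Copy one record into dst. Returns the number of bytes copied, or 0 when
 * the decoded length is implausible or does not fit into dst.
 */
static size_t copy_record(void *dst, size_t dst_len,
			  const struct scan_record *rec)
{
	size_t n;

	assert(dst != NULL && rec != NULL);

	n = be16_decode(rec->size_be);
	if (n < sizeof(rec->size_be) || n > sizeof(*rec) || n > dst_len)
		return 0;

	memcpy(dst, rec, n);
	return n;
}
```

A 16-byte record is stored as the bytes 0x00 0x10. Read in the wrong byte
order that value becomes 0x1000, so the allocation size and the copy size
have to be derived from the same decoded value.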


* Re: [PATCH net-next] tuntap: calculate rps hash only when needed
From: kbuild test robot @ 2016-04-26  2:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, davem, netdev, linux-kernel, Jason Wang,
	Michael S. Tsirkin
In-Reply-To: <1461635741-18857-1-git-send-email-jasowang@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

Hi,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/tuntap-calculate-rps-hash-only-when-needed/20160426-095825
config: xtensa-allyesconfig (attached as .config)
compiler: 
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   drivers/net/tun.c: In function 'tun_net_xmit':
>> drivers/net/tun.c:836:42: error: 'rps_needed' undeclared (first use in this function)
     if (numqueues == 1 && static_key_false(&rps_needed)) {
                                             ^
   drivers/net/tun.c:836:42: note: each undeclared identifier is reported only once for each function it appears in

vim +/rps_needed +836 drivers/net/tun.c

   830		numqueues = ACCESS_ONCE(tun->numqueues);
   831	
   832		/* Drop packet if interface is not attached */
   833		if (txq >= numqueues)
   834			goto drop;
   835	
 > 836		if (numqueues == 1 && static_key_false(&rps_needed)) {
   837			/* Select queue was not called for the skbuff, so we extract the
   838			 * RPS hash and save it into the flow_table here.
   839			 */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 44887 bytes --]


* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2016-04-26  2:18 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Konstantin Khlebnikov

Hi all,

Today's linux-next merge of the net-next tree got conflicts in:

  include/linux/ipv6.h
  net/ipv6/addrconf.c

between commit:

  841645b5f2df ("ipv6: Revert optional address flusing on ifdown.")

from the net tree and commits:

  607ea7cda631 ("net/ipv6/addrconf: simplify sysctl registration")
  5df1f77f65e1 ("net/ipv6/addrconf: fix sysctl table indentation")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/ipv6.h
index 4b2267e1b7c3,58d6e158755f..000000000000
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@@ -62,7 -62,9 +62,8 @@@ struct ipv6_devconf 
  		struct in6_addr secret;
  	} stable_secret;
  	__s32		use_oif_addrs_only;
- 	void		*sysctl;
 -	__s32		keep_addr_on_down;
+ 
+ 	struct ctl_table_header *sysctl_header;
  };
  
  struct ipv6_params {
diff --cc net/ipv6/addrconf.c
index d77ba395d593,f5a77a9dd34e..000000000000
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@@ -5506,323 -5637,322 +5507,314 @@@ int addrconf_sysctl_ignore_routes_with_
  	return ret;
  }
  
- static struct addrconf_sysctl_table
- {
- 	struct ctl_table_header *sysctl_header;
- 	struct ctl_table addrconf_vars[DEVCONF_MAX+1];
- } addrconf_sysctl __read_mostly = {
- 	.sysctl_header = NULL,
- 	.addrconf_vars = {
- 		{
- 			.procname	= "forwarding",
- 			.data		= &ipv6_devconf.forwarding,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_forward,
- 		},
- 		{
- 			.procname	= "hop_limit",
- 			.data		= &ipv6_devconf.hop_limit,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_hop_limit,
- 		},
- 		{
- 			.procname	= "mtu",
- 			.data		= &ipv6_devconf.mtu6,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_mtu,
- 		},
- 		{
- 			.procname	= "accept_ra",
- 			.data		= &ipv6_devconf.accept_ra,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "accept_redirects",
- 			.data		= &ipv6_devconf.accept_redirects,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "autoconf",
- 			.data		= &ipv6_devconf.autoconf,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "dad_transmits",
- 			.data		= &ipv6_devconf.dad_transmits,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "router_solicitations",
- 			.data		= &ipv6_devconf.rtr_solicits,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "router_solicitation_interval",
- 			.data		= &ipv6_devconf.rtr_solicit_interval,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec_jiffies,
- 		},
- 		{
- 			.procname	= "router_solicitation_delay",
- 			.data		= &ipv6_devconf.rtr_solicit_delay,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec_jiffies,
- 		},
- 		{
- 			.procname	= "force_mld_version",
- 			.data		= &ipv6_devconf.force_mld_version,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "mldv1_unsolicited_report_interval",
- 			.data		=
- 				&ipv6_devconf.mldv1_unsolicited_report_interval,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec_ms_jiffies,
- 		},
- 		{
- 			.procname	= "mldv2_unsolicited_report_interval",
- 			.data		=
- 				&ipv6_devconf.mldv2_unsolicited_report_interval,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec_ms_jiffies,
- 		},
- 		{
- 			.procname	= "use_tempaddr",
- 			.data		= &ipv6_devconf.use_tempaddr,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "temp_valid_lft",
- 			.data		= &ipv6_devconf.temp_valid_lft,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "temp_prefered_lft",
- 			.data		= &ipv6_devconf.temp_prefered_lft,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "regen_max_retry",
- 			.data		= &ipv6_devconf.regen_max_retry,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "max_desync_factor",
- 			.data		= &ipv6_devconf.max_desync_factor,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "max_addresses",
- 			.data		= &ipv6_devconf.max_addresses,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "accept_ra_defrtr",
- 			.data		= &ipv6_devconf.accept_ra_defrtr,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "accept_ra_min_hop_limit",
- 			.data		= &ipv6_devconf.accept_ra_min_hop_limit,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "accept_ra_pinfo",
- 			.data		= &ipv6_devconf.accept_ra_pinfo,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
+ static const struct ctl_table addrconf_sysctl[] = {
+ 	{
+ 		.procname	= "forwarding",
+ 		.data		= &ipv6_devconf.forwarding,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_forward,
+ 	},
+ 	{
+ 		.procname	= "hop_limit",
+ 		.data		= &ipv6_devconf.hop_limit,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_hop_limit,
+ 	},
+ 	{
+ 		.procname	= "mtu",
+ 		.data		= &ipv6_devconf.mtu6,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_mtu,
+ 	},
+ 	{
+ 		.procname	= "accept_ra",
+ 		.data		= &ipv6_devconf.accept_ra,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "accept_redirects",
+ 		.data		= &ipv6_devconf.accept_redirects,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "autoconf",
+ 		.data		= &ipv6_devconf.autoconf,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "dad_transmits",
+ 		.data		= &ipv6_devconf.dad_transmits,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "router_solicitations",
+ 		.data		= &ipv6_devconf.rtr_solicits,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "router_solicitation_interval",
+ 		.data		= &ipv6_devconf.rtr_solicit_interval,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec_jiffies,
+ 	},
+ 	{
+ 		.procname	= "router_solicitation_delay",
+ 		.data		= &ipv6_devconf.rtr_solicit_delay,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec_jiffies,
+ 	},
+ 	{
+ 		.procname	= "force_mld_version",
+ 		.data		= &ipv6_devconf.force_mld_version,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "mldv1_unsolicited_report_interval",
+ 		.data		=
+ 			&ipv6_devconf.mldv1_unsolicited_report_interval,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec_ms_jiffies,
+ 	},
+ 	{
+ 		.procname	= "mldv2_unsolicited_report_interval",
+ 		.data		=
+ 			&ipv6_devconf.mldv2_unsolicited_report_interval,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec_ms_jiffies,
+ 	},
+ 	{
+ 		.procname	= "use_tempaddr",
+ 		.data		= &ipv6_devconf.use_tempaddr,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "temp_valid_lft",
+ 		.data		= &ipv6_devconf.temp_valid_lft,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "temp_prefered_lft",
+ 		.data		= &ipv6_devconf.temp_prefered_lft,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "regen_max_retry",
+ 		.data		= &ipv6_devconf.regen_max_retry,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "max_desync_factor",
+ 		.data		= &ipv6_devconf.max_desync_factor,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "max_addresses",
+ 		.data		= &ipv6_devconf.max_addresses,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "accept_ra_defrtr",
+ 		.data		= &ipv6_devconf.accept_ra_defrtr,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "accept_ra_min_hop_limit",
+ 		.data		= &ipv6_devconf.accept_ra_min_hop_limit,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "accept_ra_pinfo",
+ 		.data		= &ipv6_devconf.accept_ra_pinfo,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
  #ifdef CONFIG_IPV6_ROUTER_PREF
- 		{
- 			.procname	= "accept_ra_rtr_pref",
- 			.data		= &ipv6_devconf.accept_ra_rtr_pref,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "router_probe_interval",
- 			.data		= &ipv6_devconf.rtr_probe_interval,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec_jiffies,
- 		},
+ 	{
+ 		.procname	= "accept_ra_rtr_pref",
+ 		.data		= &ipv6_devconf.accept_ra_rtr_pref,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "router_probe_interval",
+ 		.data		= &ipv6_devconf.rtr_probe_interval,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec_jiffies,
+ 	},
  #ifdef CONFIG_IPV6_ROUTE_INFO
- 		{
- 			.procname	= "accept_ra_rt_info_max_plen",
- 			.data		= &ipv6_devconf.accept_ra_rt_info_max_plen,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
+ 	{
+ 		.procname	= "accept_ra_rt_info_max_plen",
+ 		.data		= &ipv6_devconf.accept_ra_rt_info_max_plen,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
  #endif
  #endif
- 		{
- 			.procname	= "proxy_ndp",
- 			.data		= &ipv6_devconf.proxy_ndp,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_proxy_ndp,
- 		},
- 		{
- 			.procname	= "accept_source_route",
- 			.data		= &ipv6_devconf.accept_source_route,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
+ 	{
+ 		.procname	= "proxy_ndp",
+ 		.data		= &ipv6_devconf.proxy_ndp,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_proxy_ndp,
+ 	},
+ 	{
+ 		.procname	= "accept_source_route",
+ 		.data		= &ipv6_devconf.accept_source_route,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
  #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
- 		{
- 			.procname       = "optimistic_dad",
- 			.data           = &ipv6_devconf.optimistic_dad,
- 			.maxlen         = sizeof(int),
- 			.mode           = 0644,
- 			.proc_handler   = proc_dointvec,
- 
- 		},
- 		{
- 			.procname       = "use_optimistic",
- 			.data           = &ipv6_devconf.use_optimistic,
- 			.maxlen         = sizeof(int),
- 			.mode           = 0644,
- 			.proc_handler   = proc_dointvec,
- 
- 		},
+ 	{
+ 		.procname	= "optimistic_dad",
+ 		.data		= &ipv6_devconf.optimistic_dad,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler   = proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "use_optimistic",
+ 		.data		= &ipv6_devconf.use_optimistic,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
  #endif
  #ifdef CONFIG_IPV6_MROUTE
- 		{
- 			.procname	= "mc_forwarding",
- 			.data		= &ipv6_devconf.mc_forwarding,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0444,
- 			.proc_handler	= proc_dointvec,
- 		},
+ 	{
+ 		.procname	= "mc_forwarding",
+ 		.data		= &ipv6_devconf.mc_forwarding,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0444,
+ 		.proc_handler	= proc_dointvec,
+ 	},
  #endif
- 		{
- 			.procname	= "disable_ipv6",
- 			.data		= &ipv6_devconf.disable_ipv6,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_disable,
- 		},
- 		{
- 			.procname	= "accept_dad",
- 			.data		= &ipv6_devconf.accept_dad,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname       = "force_tllao",
- 			.data           = &ipv6_devconf.force_tllao,
- 			.maxlen         = sizeof(int),
- 			.mode           = 0644,
- 			.proc_handler   = proc_dointvec
- 		},
- 		{
- 			.procname       = "ndisc_notify",
- 			.data           = &ipv6_devconf.ndisc_notify,
- 			.maxlen         = sizeof(int),
- 			.mode           = 0644,
- 			.proc_handler   = proc_dointvec
- 		},
- 		{
- 			.procname	= "suppress_frag_ndisc",
- 			.data		= &ipv6_devconf.suppress_frag_ndisc,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec
- 		},
- 		{
- 			.procname	= "accept_ra_from_local",
- 			.data		= &ipv6_devconf.accept_ra_from_local,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "accept_ra_mtu",
- 			.data		= &ipv6_devconf.accept_ra_mtu,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "stable_secret",
- 			.data		= &ipv6_devconf.stable_secret,
- 			.maxlen		= IPV6_MAX_STRLEN,
- 			.mode		= 0600,
- 			.proc_handler	= addrconf_sysctl_stable_secret,
- 		},
- 		{
- 			.procname       = "use_oif_addrs_only",
- 			.data           = &ipv6_devconf.use_oif_addrs_only,
- 			.maxlen         = sizeof(int),
- 			.mode           = 0644,
- 			.proc_handler   = proc_dointvec,
- 		},
- 		{
- 			.procname	= "ignore_routes_with_linkdown",
- 			.data		= &ipv6_devconf.ignore_routes_with_linkdown,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= addrconf_sysctl_ignore_routes_with_linkdown,
- 		},
- 		{
- 			.procname	= "drop_unicast_in_l2_multicast",
- 			.data		= &ipv6_devconf.drop_unicast_in_l2_multicast,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			.procname	= "drop_unsolicited_na",
- 			.data		= &ipv6_devconf.drop_unsolicited_na,
- 			.maxlen		= sizeof(int),
- 			.mode		= 0644,
- 			.proc_handler	= proc_dointvec,
- 		},
- 		{
- 			/* sentinel */
- 		}
+ 	{
+ 		.procname	= "disable_ipv6",
+ 		.data		= &ipv6_devconf.disable_ipv6,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_disable,
+ 	},
+ 	{
+ 		.procname	= "accept_dad",
+ 		.data		= &ipv6_devconf.accept_dad,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "force_tllao",
+ 		.data		= &ipv6_devconf.force_tllao,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec
  	},
+ 	{
+ 		.procname	= "ndisc_notify",
+ 		.data		= &ipv6_devconf.ndisc_notify,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec
+ 	},
+ 	{
+ 		.procname	= "suppress_frag_ndisc",
+ 		.data		= &ipv6_devconf.suppress_frag_ndisc,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec
+ 	},
+ 	{
+ 		.procname	= "accept_ra_from_local",
+ 		.data		= &ipv6_devconf.accept_ra_from_local,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "accept_ra_mtu",
+ 		.data		= &ipv6_devconf.accept_ra_mtu,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "stable_secret",
+ 		.data		= &ipv6_devconf.stable_secret,
+ 		.maxlen		= IPV6_MAX_STRLEN,
+ 		.mode		= 0600,
+ 		.proc_handler	= addrconf_sysctl_stable_secret,
+ 	},
+ 	{
+ 		.procname	= "use_oif_addrs_only",
+ 		.data		= &ipv6_devconf.use_oif_addrs_only,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "ignore_routes_with_linkdown",
+ 		.data		= &ipv6_devconf.ignore_routes_with_linkdown,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= addrconf_sysctl_ignore_routes_with_linkdown,
+ 	},
+ 	{
+ 		.procname	= "drop_unicast_in_l2_multicast",
+ 		.data		= &ipv6_devconf.drop_unicast_in_l2_multicast,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
+ 		.procname	= "drop_unsolicited_na",
+ 		.data		= &ipv6_devconf.drop_unsolicited_na,
+ 		.maxlen		= sizeof(int),
+ 		.mode		= 0644,
+ 		.proc_handler	= proc_dointvec,
+ 	},
+ 	{
 -		.procname	= "keep_addr_on_down",
 -		.data		= &ipv6_devconf.keep_addr_on_down,
 -		.maxlen		= sizeof(int),
 -		.mode		= 0644,
 -		.proc_handler	= proc_dointvec,
 -
 -	},
 -	{
+ 		/* sentinel */
+ 	}
  };
  
  static int __addrconf_sysctl_register(struct net *net, char *dev_name,

^ permalink raw reply

* [PATCH 2/2] vhost: lockless enqueuing
From: Jason Wang @ 2016-04-26  2:14 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, Jason Wang
In-Reply-To: <1461636873-45335-1-git-send-email-jasowang@redhat.com>

We currently use a spinlock to synchronize the work list, which may cause
unnecessary contention. This patch switches to llist to remove this
contention. Pktgen tests show about a 5% improvement:

Before:
~1300000 pps
After:
~1370000 pps
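
The enqueue scheme described above (a "queued" bit that guards against
double insertion, plus a lock-free singly linked list that the worker
drains in one shot) can be sketched in userspace C11 roughly as follows.
This is only an illustration under stated assumptions: struct work,
struct work_list, work_queue() and work_take_all() are hypothetical
stand-ins, not the vhost code, and the default sequentially consistent
atomics are used here in place of the kernel's explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_work. */
struct work {
	struct work *next;
	atomic_bool queued;	/* plays the role of VHOST_WORK_QUEUED */
	int id;
};

/* Hypothetical stand-in for the llist_head in struct vhost_dev. */
struct work_list {
	_Atomic(struct work *) first;
};

/* Returns true if the work was newly queued, false if already pending. */
static bool work_queue(struct work_list *list, struct work *w)
{
	struct work *old;

	assert(list && w);

	/* Only the caller that flips the flag 0 -> 1 may insert. */
	if (atomic_exchange(&w->queued, true))
		return false;

	/* Lock-free push onto the head of the list. */
	old = atomic_load(&list->first);
	do {
		w->next = old;
	} while (!atomic_compare_exchange_weak(&list->first, &old, w));
	return true;
}

/* Detach the whole list at once and return it in submission order. */
static struct work *work_take_all(struct work_list *list)
{
	struct work *node = atomic_exchange(&list->first, NULL);
	struct work *fifo = NULL;

	/* Pushes are LIFO, so reverse to recover FIFO order. */
	while (node) {
		struct work *next = node->next;

		node->next = fifo;
		fifo = node;
		node = next;
	}
	return fifo;
}
```

A consumer in this sketch would clear the queued flag of each entry
before calling its function, so that the work can be queued again while
it runs.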

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 52 +++++++++++++++++++++++++--------------------------
 drivers/vhost/vhost.h |  7 ++++---
 2 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 73dd16d..0061a7b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -168,7 +168,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 {
-	INIT_LIST_HEAD(&work->node);
+	clear_bit(VHOST_WORK_QUEUED, &work->flags);
 	work->fn = fn;
 	init_waitqueue_head(&work->done);
 }
@@ -246,15 +246,16 @@ EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-	unsigned long flags;
+	if (!dev->worker)
+		return;
 
-	spin_lock_irqsave(&dev->work_lock, flags);
-	if (list_empty(&work->node)) {
-		list_add_tail(&work->node, &dev->work_list);
-		spin_unlock_irqrestore(&dev->work_lock, flags);
+	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
+		/* We can only add the work to the list after we're
+		 * sure it was not in the list.
+		 */
+		smp_mb();
+		llist_add(&work->node, &dev->work_list);
 		wake_up_process(dev->worker);
-	} else {
-		spin_unlock_irqrestore(&dev->work_lock, flags);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -262,7 +263,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-	return !list_empty(&dev->work_list);
+	return !llist_empty(&dev->work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -305,7 +306,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_dev *dev = data;
-	struct vhost_work *work = NULL;
+	struct vhost_work *work, *work_next;
+	struct llist_node *node;
 	mm_segment_t oldfs = get_fs();
 
 	set_fs(USER_DS);
@@ -315,29 +317,25 @@ static int vhost_worker(void *data)
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		spin_lock_irq(&dev->work_lock);
-
 		if (kthread_should_stop()) {
-			spin_unlock_irq(&dev->work_lock);
 			__set_current_state(TASK_RUNNING);
 			break;
 		}
-		if (!list_empty(&dev->work_list)) {
-			work = list_first_entry(&dev->work_list,
-						struct vhost_work, node);
-			list_del_init(&work->node);
-		} else
-			work = NULL;
-		spin_unlock_irq(&dev->work_lock);
 
-		if (work) {
+		node = llist_del_all(&dev->work_list);
+		if (!node)
+			schedule();
+
+		node = llist_reverse_order(node);
+		/* make sure flag is seen after deletion */
+		smp_wmb();
+		llist_for_each_entry_safe(work, work_next, node, node) {
+			clear_bit(VHOST_WORK_QUEUED, &work->flags);
 			__set_current_state(TASK_RUNNING);
 			work->fn(work);
 			if (need_resched())
 				schedule();
-		} else
-			schedule();
-
+		}
 	}
 	unuse_mm(dev->mm);
 	set_fs(oldfs);
@@ -398,9 +396,9 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->log_file = NULL;
 	dev->memory = NULL;
 	dev->mm = NULL;
-	spin_lock_init(&dev->work_lock);
-	INIT_LIST_HEAD(&dev->work_list);
 	dev->worker = NULL;
+	init_llist_head(&dev->work_list);
+
 
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
@@ -566,7 +564,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
 	/* No one will access memory at this point */
 	kvfree(dev->memory);
 	dev->memory = NULL;
-	WARN_ON(!list_empty(&dev->work_list));
+	WARN_ON(!llist_empty(&dev->work_list));
 	if (dev->worker) {
 		kthread_stop(dev->worker);
 		dev->worker = NULL;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d36d8be..6690e64 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -15,13 +15,15 @@
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
 
+#define VHOST_WORK_QUEUED 1
 struct vhost_work {
-	struct list_head	  node;
+	struct llist_node	  node;
 	vhost_work_fn_t		  fn;
 	wait_queue_head_t	  done;
 	int			  flushing;
 	unsigned		  queue_seq;
 	unsigned		  done_seq;
+	unsigned long		  flags;
 };
 
 /* Poll a file (eventfd or socket) */
@@ -126,8 +128,7 @@ struct vhost_dev {
 	int nvqs;
 	struct file *log_file;
 	struct eventfd_ctx *log_ctx;
-	spinlock_t work_lock;
-	struct list_head work_list;
+	struct llist_head work_list;
 	struct task_struct *worker;
 };
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 1/2] vhost: simplify work flushing
From: Jason Wang @ 2016-04-26  2:14 UTC (permalink / raw)
  To: mst; +Cc: netdev, linux-kernel, kvm, virtualization

We used to implement work flushing by tracking the queued seq, the
done seq, and the number of flushers. This patch simplifies this by
implementing work flushing through another kind of vhost work with a
completion. This will be used by the lockless enqueuing patch.
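
The idea of flushing by queuing a marker work item can be sketched in
plain C as below. This is a single-threaded illustration, not the vhost
implementation: struct queue, queue_add(), queue_run() and the bool
"done" field are hypothetical, with "done" standing in for a struct
completion. Because the queue is processed in order, the marker running
implies that everything queued before it has already run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work;
typedef void (*work_fn_t)(struct work *w);

struct work {
	struct work *next;
	work_fn_t fn;
};

/* Hypothetical FIFO work queue. */
struct queue {
	struct work *head;
	struct work **tail;
};

/* Marker work: embeds a work item plus a completion stand-in. */
struct flush_work {
	struct work work;
	bool done;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int ran;	/* number of ordinary work items executed */

static void count_fn(struct work *w)
{
	(void)w;
	ran++;
}

/* Runs only after all previously queued work has been executed. */
static void flush_fn(struct work *w)
{
	struct flush_work *f = container_of(w, struct flush_work, work);

	f->done = true;
}

static void queue_init(struct queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

static void queue_add(struct queue *q, struct work *w, work_fn_t fn)
{
	w->fn = fn;
	w->next = NULL;
	*q->tail = w;
	q->tail = &w->next;
}

/* Stand-in for the worker thread: drain the queue in FIFO order. */
static void queue_run(struct queue *q)
{
	while (q->head) {
		struct work *w = q->head;

		q->head = w->next;
		if (!q->head)
			q->tail = &q->head;
		assert(w->fn);
		w->fn(w);
	}
}
```

In the real patch the flusher sleeps in wait_for_completion() while the
worker thread runs; here the caller would simply invoke queue_run() and
then check the done flag.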

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 53 ++++++++++++++++++++-------------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 669fef1..73dd16d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -131,6 +131,19 @@ static void vhost_reset_is_le(struct vhost_virtqueue *vq)
 	vq->is_le = virtio_legacy_is_little_endian();
 }
 
+struct vhost_flush_struct {
+	struct vhost_work work;
+	struct completion wait_event;
+};
+
+static void vhost_flush_work(struct vhost_work *work)
+{
+	struct vhost_flush_struct *s;
+
+	s = container_of(work, struct vhost_flush_struct, work);
+	complete(&s->wait_event);
+}
+
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
 			    poll_table *pt)
 {
@@ -158,8 +171,6 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 	INIT_LIST_HEAD(&work->node);
 	work->fn = fn;
 	init_waitqueue_head(&work->done);
-	work->flushing = 0;
-	work->queue_seq = work->done_seq = 0;
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);
 
@@ -211,31 +222,17 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
-				unsigned seq)
-{
-	int left;
-
-	spin_lock_irq(&dev->work_lock);
-	left = seq - work->done_seq;
-	spin_unlock_irq(&dev->work_lock);
-	return left <= 0;
-}
-
 void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
 {
-	unsigned seq;
-	int flushing;
+	struct vhost_flush_struct flush;
+
+	if (dev->worker) {
+		init_completion(&flush.wait_event);
+		vhost_work_init(&flush.work, vhost_flush_work);
 
-	spin_lock_irq(&dev->work_lock);
-	seq = work->queue_seq;
-	work->flushing++;
-	spin_unlock_irq(&dev->work_lock);
-	wait_event(work->done, vhost_work_seq_done(dev, work, seq));
-	spin_lock_irq(&dev->work_lock);
-	flushing = --work->flushing;
-	spin_unlock_irq(&dev->work_lock);
-	BUG_ON(flushing < 0);
+		vhost_work_queue(dev, &flush.work);
+		wait_for_completion(&flush.wait_event);
+	}
 }
 EXPORT_SYMBOL_GPL(vhost_work_flush);
 
@@ -254,7 +251,6 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 	spin_lock_irqsave(&dev->work_lock, flags);
 	if (list_empty(&work->node)) {
 		list_add_tail(&work->node, &dev->work_list);
-		work->queue_seq++;
 		spin_unlock_irqrestore(&dev->work_lock, flags);
 		wake_up_process(dev->worker);
 	} else {
@@ -310,7 +306,6 @@ static int vhost_worker(void *data)
 {
 	struct vhost_dev *dev = data;
 	struct vhost_work *work = NULL;
-	unsigned uninitialized_var(seq);
 	mm_segment_t oldfs = get_fs();
 
 	set_fs(USER_DS);
@@ -321,11 +316,6 @@ static int vhost_worker(void *data)
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		spin_lock_irq(&dev->work_lock);
-		if (work) {
-			work->done_seq = seq;
-			if (work->flushing)
-				wake_up_all(&work->done);
-		}
 
 		if (kthread_should_stop()) {
 			spin_unlock_irq(&dev->work_lock);
@@ -336,7 +326,6 @@ static int vhost_worker(void *data)
 			work = list_first_entry(&dev->work_list,
 						struct vhost_work, node);
 			list_del_init(&work->node);
-			seq = work->queue_seq;
 		} else
 			work = NULL;
 		spin_unlock_irq(&dev->work_lock);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next] tuntap: calculate rps hash only when needed
From: Jason Wang @ 2016-04-26  1:55 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: Jason Wang, Michael S. Tsirkin

There's no need to calculate the rps hash if rps is not enabled. So this
patch exports rps_needed and checks it before trying to get the rps
hash. Tests (using pktgen to inject packets into the guest) show this can
improve pps by about 13% (when rps is disabled).

Before:
~1150000 pps
After:
~1300000 pps
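
The gating described above amounts to skipping an expensive per-packet
computation unless the feature that consumes its result is switched on.
A minimal userspace sketch, assuming a plain bool (rps_enabled) in place
of the rps_needed static key and a made-up flow_hash() in place of the
real hash; hash_calls exists only so the effect can be observed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool rps_enabled;	/* hypothetical stand-in for rps_needed */
static unsigned int hash_calls;	/* counts how often the hash was computed */

/* Hypothetical placeholder for the per-packet flow hash. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr)
{
	hash_calls++;
	return (saddr ^ daddr) * 2654435761u;
}

/* Mirrors the shape of the check in the transmit path. */
static void xmit_one(unsigned int numqueues, uint32_t saddr, uint32_t daddr)
{
	assert(numqueues > 0);

	/* Only pay for the hash when single-queue and RPS is in use. */
	if (numqueues == 1 && rps_enabled)
		(void)flow_hash(saddr, daddr);
}
```

With rps_enabled left false, xmit_one() never calls flow_hash(), which is
the case the pps numbers above were measured in.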

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 2 +-
 net/core/dev.c    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950..746877f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -819,7 +819,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (txq >= numqueues)
 		goto drop;
 
-	if (numqueues == 1) {
+	if (numqueues == 1 && static_key_false(&rps_needed)) {
 		/* Select queue was not called for the skbuff, so we extract the
 		 * RPS hash and save it into the flow_table here.
 		 */
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe7..d4ba936 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3428,6 +3428,7 @@ u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
 
 struct static_key rps_needed __read_mostly;
+EXPORT_SYMBOL(rps_needed);
 
 static struct rps_dev_flow *
 set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] net: dsa: mv88e6xxx: fix uninitialized error return
From: Vivien Didelot @ 2016-04-26  1:24 UTC (permalink / raw)
  To: Colin King, David S . Miller, Andrew Lunn, netdev
  Cc: linux-kernel, Geert Uytterhoeven
In-Reply-To: <1461622282-30463-1-git-send-email-colin.king@canonical.com>

Hi Colin,

Colin King <colin.king@canonical.com> writes:

> From: Colin Ian King <colin.king@canonical.com>
>
> The error return err is not initialized and there is a possibility
> that err is not assigned causing mv88e6xxx_port_bridge_join to
> return a garbage error return status. Fix this by initializing err
> to 0.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Even though that cannot happen, the fix doesn't hurt.
Adding Geert in the loop who submitted an RFC for this first:

    https://lkml.org/lkml/2016/4/25/95

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH] net: ipv6: Delete host routes on an ifdown
From: Mike Manning @ 2016-04-26  0:57 UTC (permalink / raw)
  To: David Ahern, David Miller; +Cc: netdev
In-Reply-To: <571E943C.9010504@cumulusnetworks.com>

On 04/25/2016 11:03 PM, David Ahern wrote:
> On 4/25/16 2:42 PM, David Miller wrote:
>> From: David Ahern <dsa@cumulusnetworks.com>
>> Date: Mon, 25 Apr 2016 13:40:26 -0600
>>
>>> It's unfortunate you want to take that action. Last week I came across
>>> a prior attempt by Stephen to do this same thing -- keep IPv6
>>> addresses. That prior attempt was reverted by commit
>>> 73a8bd74e261. Cumulus, Brocade, and others clearly want this
>>> capability.
>>
>> But nobody has implemented it correctly, it doesn't matter who wants
>> the feature.  That's why it keeps getting reverted.
>>
>> Also, this testing you are talking about should have happened long
>> before you submitted that first patch that introduced all of these
>> regressions.  My observations tell me that the bulk of the testing
>> happened afterwards and that's why all the regressions are popping up
>> now.
>>
> 
> My testing when submitting the patch was host level: Add an address, while(1) (link up, link down), delete an address, etc.
> 
> Once it was committed to our kernel it started getting hit with a range of L3 deployment scenarios with many nodes and networking config files are uploaded and jumped between on real switch hardware - no reboot but 'networking reload' on the fly. Jumping between different deployments with different sets addresses, routes, vrf devices, bridges, bonds, etc.
> 
> Your objection seems to be 'all these regressions' but beyond the ref count from Andrey all of the bug reports have come from me with 1 from Mike, another invested party wanting this to happen. I am the one who spent the hours dealing with the kernel panics. My patch, my bug, my time wasted coming up with the delta patch. Rather than focusing on my mistakes, why not see the commitment on following through with this change?

It would be great if this could be reconsidered, also bearing in mind that any potential regressions do not have any impact with the default setting of keep_addr_on_down disabled. Or if not, to at least identify what the shortcomings of this solution are for future reference.

I confirm we have been using David's original patch for not flushing IPv6 addresses since it was submitted last year, as for routers it is unacceptable to have IPv6 addresses disappear on link down (although we can work around this to some extent).

When the revised patch and the immediate follow-up fix by David were recently merged for the 4.6 kernel, the only regression I found for ethernet interfaces by changing to the new fix was that local addresses were being retained on link down. This bug was only introduced as a result of a review comment, and David's subsequent fix avoided keeping local addrs (I suggested a complementary fix to avoid fixing them up, as a crash was observed without this in some cases).

Now with David's fix for a vulnerability with loopback interfaces in place and testing looking fine, it seems a shame to give up.

^ permalink raw reply

* Re: [PATCH v4 net-next 0/3] tcp: Make use of MSG_EOR in tcp_sendmsg
From: Soheil Hassas Yeganeh @ 2016-04-26  0:50 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <1461620690-1081063-1-git-send-email-kafai@fb.com>

On Mon, Apr 25, 2016 at 5:44 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> v4:
> ~ Do not set eor bit in do_tcp_sendpages() since there is
>   no way to pass MSG_EOR from the userland now.
> ~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg().
> ~ Move TCP_SKB_CB(skb)->eor test to a new helper
>   tcp_skb_can_collapse_to() (suggested by Soheil).
> ~ Add some packetdrill tests.

Thanks for the nice patches and the tests!

> v3:
> ~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
> ~ Move the eor bit test back to the loop in tcp_sendmsg and
>   tcp_sendpage because there could be >1 threads doing
>   sendmsg.
> ~ Thanks to Eric Dumazet's suggestions on v2.
> ~ The TCP timestamp bug fixes are separated into other threads.
>
> v2:
> ~ Rework based on the recent work
>   "add TX timestamping via cmsg" by
>   Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
> ~ This version takes the MSG_EOR bit as a signal of
>   end-of-response-message and leave the selective
>   timestamping job to the cmsg
> ~ Changes based on the v1 feedback (like avoid
>   unlikely check in a loop and adding tcp_sendpage
>   support)
> ~ The first 3 patches are bug fixes.  The fixes in this
>   series depend on the newly introduced txstamp_ack in
>   net-next.  I will make relevant patches against net after
>   getting some feedback.
> ~ The test results are based on the recently posted net fix:
>   "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"
>
> One potential use case is to use MSG_EOR with
> SOF_TIMESTAMPING_TX_ACK to get a more accurate
> TCP ack timestamping on application protocol with
> multiple outgoing response messages (e.g. HTTP2).
>
> One of our use case is at the webserver.  The webserver tracks
> the HTTP2 response latency by measuring when the webserver sends
> the first byte to the socket till the TCP ACK of the last byte
> is received.  In the cases where we don't have client side
> measurement, measuring from the server side is the only option.
> In the cases we have the client side measurement, the server side
> data can also be used to justify/cross-check-with the client
> side data.
>

^ permalink raw reply

* Re: [PATCH v4 net-next 3/3] tcp: Handle eor bit when fragmenting a skb
From: Soheil Hassas Yeganeh @ 2016-04-26  0:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <1461620690-1081063-4-git-send-email-kafai@fb.com>

On Mon, Apr 25, 2016 at 5:44 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> When fragmenting a skb, the next_skb should carry
> the eor from prev_skb.  The eor of prev_skb should
> also be reset.
>
> Packetdrill script for testing:
> ~~~~~~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> 0.200 sendto(4, ..., 15330, MSG_EOR, ..., ...) = 15330
> 0.200 sendto(4, ..., 730, 0, ..., ...) = 730
>
> 0.200 > .  1:7301(7300) ack 1
> 0.200 > . 7301:14601(7300) ack 1
>
> 0.300 < . 1:1(0) ack 14601 win 257
> 0.300 > P. 14601:15331(730) ack 1
> 0.300 > P. 15331:16061(730) ack 1
>
> 0.400 < . 1:1(0) ack 16061 win 257
> 0.400 close(4) = 0
> 0.400 > F. 16061:16061(0) ack 1
> 0.400 < F. 1:1(0) ack 16062 win 257
> 0.400 > . 16062:16062(0) ack 2
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/ipv4/tcp_output.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index fa4d17f..55a926b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1128,6 +1128,12 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, struct sk_buff *skb2)
>         }
>  }
>
> +static void tcp_skb_fragment_eor(struct sk_buff *skb, struct sk_buff *skb2)
> +{
> +       TCP_SKB_CB(skb2)->eor = TCP_SKB_CB(skb)->eor;
> +       TCP_SKB_CB(skb)->eor = 0;
> +}
> +
>  /* Function to create two new TCP segments.  Shrinks the given segment
>   * to the specified size and appends a new segment with the rest of the
>   * packet to the list.  This won't be called frequently, I hope.
> @@ -1173,6 +1179,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
>         TCP_SKB_CB(skb)->tcp_flags = flags & ~(TCPHDR_FIN | TCPHDR_PSH);
>         TCP_SKB_CB(buff)->tcp_flags = flags;
>         TCP_SKB_CB(buff)->sacked = TCP_SKB_CB(skb)->sacked;
> +       tcp_skb_fragment_eor(skb, buff);
>
>         if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_PARTIAL) {
>                 /* Copy and checksum data tail into the new buffer. */
> @@ -1733,6 +1740,8 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
>         /* This packet was never sent out yet, so no SACK bits. */
>         TCP_SKB_CB(buff)->sacked = 0;
>
> +       tcp_skb_fragment_eor(skb, buff);
> +
>         buff->ip_summed = skb->ip_summed = CHECKSUM_PARTIAL;
>         skb_split(skb, buff, len);
>         tcp_fragment_tstamp(skb, buff);
> --
> 2.5.1
>
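
For readers following the thread, the eor hand-off on fragmentation can be modelled in user space. This is only a sketch under stated assumptions: struct toy_skb and the toy_* helpers are hypothetical stand-ins, not kernel code, and only the two assignments in toy_fragment_eor() are taken from tcp_skb_fragment_eor() in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the fields needed here. */
struct toy_skb {
	size_t len;	/* payload bytes held by this segment */
	bool eor;	/* models TCP_SKB_CB(skb)->eor */
};

/* Same two assignments as tcp_skb_fragment_eor() in the patch:
 * the tail inherits the mark and the head loses it.
 */
static void toy_fragment_eor(struct toy_skb *skb, struct toy_skb *skb2)
{
	skb2->eor = skb->eor;
	skb->eor = false;
}

/* Split skb at len: the head keeps the first len bytes and
 * tail receives the remainder, including the eor mark if any.
 */
static void toy_fragment(struct toy_skb *skb, struct toy_skb *tail, size_t len)
{
	assert(len > 0 && len < skb->len);
	tail->len = skb->len - len;
	skb->len = len;
	toy_fragment_eor(skb, tail);
}
```

Splitting a 15330-byte eor-marked toy_skb at 14600 leaves a 14600-byte head without eor and a 730-byte tail with eor. That is consistent with the packetdrill script above, where the later 730-byte sendto() goes out as its own segment instead of being appended to the tail.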

^ permalink raw reply

* Re: [PATCH v4 net-next 2/3] tcp: Handle eor bit when coalescing skb
From: Soheil Hassas Yeganeh @ 2016-04-26  0:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <1461620690-1081063-3-git-send-email-kafai@fb.com>

On Mon, Apr 25, 2016 at 5:44 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> This patch:
> 1. Prevent next_skb from coalescing to the prev_skb if
>    TCP_SKB_CB(prev_skb)->eor is set
> 2. Update the TCP_SKB_CB(prev_skb)->eor if coalescing is
>    allowed
>
> Packetdrill script for testing:
> ~~~~~~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
> 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
> 0.200 write(4, ..., 11680) = 11680
>
> 0.200 > P. 1:731(730) ack 1
> 0.200 > P. 731:1461(730) ack 1
> 0.200 > . 1461:8761(7300) ack 1
> 0.200 > P. 8761:13141(4380) ack 1
>
> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:13141,nop,nop>
> 0.300 > P. 1:731(730) ack 1
> 0.300 > P. 731:1461(730) ack 1
> 0.400 < . 1:1(0) ack 13141 win 257
>
> 0.400 close(4) = 0
> 0.400 > F. 13141:13141(0) ack 1
> 0.500 < F. 1:1(0) ack 13142 win 257
> 0.500 > . 13142:13142(0) ack 2
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/ipv4/tcp_input.c  | 4 ++++
>  net/ipv4/tcp_output.c | 4 ++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index dcad8f9..65fb708 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1303,6 +1303,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
>         }
>
>         TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
> +       TCP_SKB_CB(prev)->eor = TCP_SKB_CB(skb)->eor;
>         if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
>                 TCP_SKB_CB(prev)->end_seq++;
>
> @@ -1368,6 +1369,9 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
>         if ((TCP_SKB_CB(prev)->sacked & TCPCB_TAGBITS) != TCPCB_SACKED_ACKED)
>                 goto fallback;
>
> +       if (!tcp_skb_can_collapse_to(prev))
> +               goto fallback;
> +
>         in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
>                   !before(end_seq, TCP_SKB_CB(skb)->end_seq);
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 9d3b4b3..fa4d17f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2494,6 +2494,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>          * packet counting does not break.
>          */
>         TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS;
> +       TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
>
>         /* changed transmit queue under us so clear hints */
>         tcp_clear_retrans_hints_partial(tp);
> @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>                 if (!tcp_can_collapse(sk, skb))
>                         break;
>
> +               if (!tcp_skb_can_collapse_to(to))
> +                       break;
> +
>                 space -= skb->len;
>
>                 if (first) {
> --
> 2.5.1
>
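
The two rules in the commit message (do not coalesce into an eor-marked prev_skb; otherwise let the merged skb take over the eor of the skb being absorbed) can be sketched the same way. Assumptions: struct toy_skb and the toy_* helpers are hypothetical user-space stand-ins. Only the predicate and the eor assignment follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb. */
struct toy_skb {
	size_t len;	/* payload bytes */
	bool eor;	/* models TCP_SKB_CB(skb)->eor */
};

/* Mirrors tcp_skb_can_collapse_to(): nothing may be merged
 * into an skb that ends a MSG_EOR message.
 */
static bool toy_can_collapse_to(const struct toy_skb *skb)
{
	return !skb->eor;
}

/* Try to merge next into prev. Returns true if the merge happened. */
static bool toy_try_collapse(struct toy_skb *prev, struct toy_skb *next)
{
	assert(prev != next);

	if (!toy_can_collapse_to(prev))
		return false;	/* rule 1: prev ends a message */

	prev->len += next->len;
	prev->eor = next->eor;	/* rule 2: merged skb takes next's eor */
	next->len = 0;
	next->eor = false;
	return true;
}
```

In this model the two 730-byte MSG_EOR messages from the script stay separate, while the two segments produced by the plain write() remain eligible for merging.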

^ permalink raw reply

* Re: [PATCH v4 net-next 1/3] tcp: Make use of MSG_EOR in tcp_sendmsg
From: Soheil Hassas Yeganeh @ 2016-04-26  0:48 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <1461620690-1081063-2-git-send-email-kafai@fb.com>

On Mon, Apr 25, 2016 at 5:44 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> This patch adds an eor bit to the TCP_SKB_CB.  When MSG_EOR
> is passed to tcp_sendmsg, the eor bit will be set at the skb
> containing the last byte of the userland's msg.  The eor bit
> will prevent data from appending to that skb in the future.
>
> The change in do_tcp_sendpages is to honor the eor set
> during the previous tcp_sendmsg(MSG_EOR) call.
>
> This patch handles the tcp_sendmsg case.  The followup patches
> will handle other skb coalescing and fragment cases.
>
> One potential use case is to use MSG_EOR with
> SOF_TIMESTAMPING_TX_ACK to get a more accurate
> TCP ack timestamping on application protocol with
> multiple outgoing response messages (e.g. HTTP2).
>
> Packetdrill script for testing:
> ~~~~~~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> 0.200 write(4, ..., 14600) = 14600
> 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
> 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
>
> 0.200 > .  1:7301(7300) ack 1
> 0.200 > P. 7301:14601(7300) ack 1
>
> 0.300 < . 1:1(0) ack 14601 win 257
> 0.300 > P. 14601:15331(730) ack 1
> 0.300 > P. 15331:16061(730) ack 1
>
> 0.400 < . 1:1(0) ack 16061 win 257
> 0.400 close(4) = 0
> 0.400 > F. 16061:16061(0) ack 1
> 0.400 < F. 1:1(0) ack 16062 win 257
> 0.400 > . 16062:16062(0) ack 2
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  include/net/tcp.h | 8 +++++++-
>  net/ipv4/tcp.c    | 7 +++++--
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 7f2553d..ce08038 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -762,7 +762,8 @@ struct tcp_skb_cb {
>
>         __u8            ip_dsfield;     /* IPv4 tos or IPv6 dsfield     */
>         __u8            txstamp_ack:1,  /* Record TX timestamp for ack? */
> -                       unused:7;
> +                       eor:1,          /* Is skb MSG_EOR marked? */
> +                       unused:6;
>         __u32           ack_seq;        /* Sequence number ACK'd        */
>         union {
>                 struct inet_skb_parm    h4;
> @@ -809,6 +810,11 @@ static inline int tcp_skb_mss(const struct sk_buff *skb)
>         return TCP_SKB_CB(skb)->tcp_gso_size;
>  }
>
> +static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb)
> +{
> +       return likely(!TCP_SKB_CB(skb)->eor);
> +}
> +
>  /* Events passed to congestion control interface */
>  enum tcp_ca_event {
>         CA_EVENT_TX_START,      /* first transmit when no packets in flight */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 4d73858..ea5364b 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -908,7 +908,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                 int copy, i;
>                 bool can_coalesce;
>
> -               if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
> +               if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0 ||
> +                   !tcp_skb_can_collapse_to(skb)) {
>  new_segment:
>                         if (!sk_stream_memory_free(sk))
>                                 goto wait_for_sndbuf;
> @@ -1156,7 +1157,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>                         copy = max - skb->len;
>                 }
>
> -               if (copy <= 0) {
> +               if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
>  new_segment:
>                         /* Allocate new segment. If the interface is SG,
>                          * allocate skb fitting to single page.
> @@ -1250,6 +1251,8 @@ new_segment:
>                 copied += copy;
>                 if (!msg_data_left(msg)) {
>                         tcp_tx_timestamp(sk, sockc.tsflags, skb);
> +                       if (unlikely(flags & MSG_EOR))
> +                               TCP_SKB_CB(skb)->eor = 1;
>                         goto out;
>                 }
>
> --
> 2.5.1
>
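
As a rough illustration of the tcp_sendmsg change, here is a user-space model of the append decision. It is a sketch, not the kernel logic: struct toy_queue, toy_sendmsg() and the fixed size_goal capacity are hypothetical, and the real code computes its limits quite differently. What it keeps from the patch is the condition for starting a new segment (no room left, or the tail is eor-marked) and marking the skb that holds the last byte of a MSG_EOR message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SKBS 16

/* Hypothetical stand-in for an skb. */
struct toy_skb {
	size_t len;	/* payload bytes */
	bool eor;	/* models TCP_SKB_CB(skb)->eor */
};

/* Hypothetical write queue with a fixed per-skb capacity. */
struct toy_queue {
	struct toy_skb skbs[TOY_MAX_SKBS];
	size_t count;
	size_t size_goal;	/* assumed constant for this sketch */
};

/* Queue one message of size bytes. Returns the number of skbs
 * in the queue afterwards.
 */
static size_t toy_sendmsg(struct toy_queue *q, size_t size, bool msg_eor)
{
	struct toy_skb *skb = q->count ? &q->skbs[q->count - 1] : NULL;

	assert(size > 0);

	while (size > 0) {
		size_t room = skb ? q->size_goal - skb->len : 0;
		size_t copy;

		/* Start a new segment if there is no tail, the tail is
		 * full, or the tail ends an earlier MSG_EOR message.
		 */
		if (!skb || room == 0 || skb->eor) {
			assert(q->count < TOY_MAX_SKBS);
			skb = &q->skbs[q->count++];
			skb->len = 0;
			skb->eor = false;
			room = q->size_goal;
		}

		copy = size < room ? size : room;
		skb->len += copy;
		size -= copy;
	}

	/* The skb holding the last byte of the message gets the mark. */
	if (msg_eor)
		skb->eor = true;

	return q->count;
}
```

With size_goal set to 14600, a 14600-byte write followed by two 730-byte MSG_EOR sends yields three skbs in this model. Without MSG_EOR the second 730-byte send would have been appended to the first.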

^ permalink raw reply

