Netdev List


* Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically
From: AceLan Kao @ 2017-09-26  7:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: QCA ath9k Development, Kalle Valo, linux-wireless, netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170926064456.GA28611@infradead.org>

Ath9k is an old driver for old chips, and they work fine with legacy INTx.
But some new platforms are using it, so I think we should list those
new platforms that block INTx in the driver.

BTW, new chips use the ath10k driver.

2017-09-26 14:44 GMT+08:00 Christoph Hellwig <hch@infradead.org>:
> On Tue, Sep 26, 2017 at 02:41:35PM +0800, AceLan Kao wrote:
>> Some platforms (BIOS) block legacy interrupts (INTx) and only allow MSI
>> for the WLAN device, so add a quirk to list those machines and set
>> use_msi automatically.
>> Add the Dell Inspiron 24-3460 to the quirk.
>
> Huh?  Using MSI should be the default, and skipping MSI should be
> a quirk if needed at all (usually it should be autodetected)
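
The ordering Christoph argues for (MSI first, INTx only as a fallback) can be sketched in plain C. This is a userspace model, not ath9k or PCI core code: struct irq_caps, enum irq_mode and pick_irq_mode() are hypothetical names made up for the illustration.

```c
#include <stdbool.h>

/* Hypothetical model of what a platform/device pair supports. */
struct irq_caps {
	bool msi_available;	/* device and platform accept MSI */
	bool intx_available;	/* legacy INTx is routed and not blocked */
};

enum irq_mode { IRQ_MODE_NONE, IRQ_MODE_INTX, IRQ_MODE_MSI };

/* Prefer MSI; fall back to INTx only when MSI cannot be used. */
static enum irq_mode pick_irq_mode(const struct irq_caps *caps)
{
	if (caps->msi_available)
		return IRQ_MODE_MSI;
	if (caps->intx_available)
		return IRQ_MODE_INTX;
	return IRQ_MODE_NONE;
}
```

With this ordering, a machine whose BIOS blocks INTx needs no per-machine entry as long as MSI works; a quirk list would only be needed for machines where MSI itself is broken.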


* Re: [PATCH v2 net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
From: Yunsheng Lin @ 2017-09-26  7:29 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: huangdaode@hisilicon.com, xuwei5@hisilicon.com,
	liguozhu@hisilicon.com, Yisen.Zhuang@huawei.com,
	gabriele.paoloni@huawei.com, john.garry@huawei.com,
	linuxarm@huawei.com, salil.mehta@huawei.com, lipeng321@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <AM0PR0502MB3683C922A7D87D3E1F64B93EBF7B0@AM0PR0502MB3683.eurprd05.prod.outlook.com>

Hi, Yuval

On 2017/9/26 14:43, Yuval Mintz wrote:
>> When using tc qdisc to configure DCB parameter, dcb_ops->setup_tc
>> is used to tell hclge_dcb module to do the setup.
> 
> While this might be a step in the right direction, this causes an inconsistency
> in user experience - Some [well, most] vendors didn't allow the mqprio
> priority mapping to affect DCB, instead relying on the dcbnl functionality
> to control that configuration.
> 
> A couple of options to consider:
>   - Perhaps said logic shouldn't be contained inside the driver but rather
>      in mqprio logic itself. I.e., rely on DCBNL functionality [if available] from
>      within mqprio and try changing the configuration. 
>   - Add a new TC_MQPRIO_HW_OFFLOAD_ value to explicitly reflect user
>      request to allow this configuration to affect DCB.
> 
>> When using lldptool to configure DCB parameters, the hclge_dcb module
>> calls client_ops->setup_tc to tell the network stack which queue
>> and priority are used for a specific tc.
> 
> You're basically bypassing the mqprio logic.
> Since you're configuring the prio->queue mapping from DCB flow,
> you'll get an mqprio-like behavior [meaning a transmitted packet
> would reach a transmission queue associated with its priority] even
> if device wasn't grafted with an mqprio qdisc.
> Why should your user even use mqprio? What benefit does he get from it?


When adding mqprio and lldptool support, I was thinking the user could use
either tc qdisc or lldptool to do the configuration, giving the user two
options to set up DCB.

If the user uses only tc qdisc or only lldptool, I think there is no problem here.

When the user uses both tc qdisc and lldptool, then, as you explained above,
when tc qdisc changes the configuration there should be a way to notify dcbnl,
so that dcbnl can respond correctly (for example, tell the peer that its
configuration has changed).

I will try to find out whether there is a way to notify dcbnl when using tc qdisc
to set up the configuration.
If there is no way to do it now, I will drop mqprio from this patch and
address this problem later if there is a need for tc qdisc.

Please let me know if I have misunderstood.
And thanks for your time reviewing.

> 
> ...
> 
>> +static int hns3_nic_set_real_num_queue(struct net_device *netdev)
>> +{
>> +	struct hns3_nic_priv *priv = netdev_priv(netdev);
>> +	struct hnae3_handle *h = priv->ae_handle;
>> +	struct hnae3_knic_private_info *kinfo = &h->kinfo;
>> +	unsigned int queue_size = kinfo->rss_size * kinfo->num_tc;
>> +	int ret;
>> +
>> +	ret = netif_set_real_num_tx_queues(netdev, queue_size);
>> +	if (ret) {
>> +		netdev_err(netdev,
>> +			   "netif_set_real_num_tx_queues fail, ret=%d!\n",
>> +			   ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = netif_set_real_num_rx_queues(netdev, queue_size);
> 
> I don't think you're changing the driver behavior, but why are you setting
> the real number of rx queues based on the number of TCs?
> Do you actually open (TC x RSS) Rx queues?
> 
> .
> 


* Re: [PATCH v2 net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
From: Yunsheng Lin @ 2017-09-26  7:33 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: huangdaode@hisilicon.com, xuwei5@hisilicon.com,
	liguozhu@hisilicon.com, Yisen.Zhuang@huawei.com,
	gabriele.paoloni@huawei.com, john.garry@huawei.com,
	linuxarm@huawei.com, salil.mehta@huawei.com, lipeng321@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <AM0PR0502MB3683C922A7D87D3E1F64B93EBF7B0@AM0PR0502MB3683.eurprd05.prod.outlook.com>

Hi, Yuval

On 2017/9/26 14:43, Yuval Mintz wrote:
>> When using tc qdisc to configure DCB parameter, dcb_ops->setup_tc
>> is used to tell hclge_dcb module to do the setup.
> 
> While this might be a step in the right direction, this causes an inconsistency
> in user experience - Some [well, most] vendors didn't allow the mqprio
> priority mapping to affect DCB, instead relying on the dcbnl functionality
> to control that configuration.
> 
> A couple of options to consider:
>   - Perhaps said logic shouldn't be contained inside the driver but rather
>      in mqprio logic itself. I.e., rely on DCBNL functionality [if available] from
>      within mqprio and try changing the configuration. 
>   - Add a new TC_MQPRIO_HW_OFFLOAD_ value to explicitly reflect user
>      request to allow this configuration to affect DCB.
> 
>> When using lldptool to configure DCB parameters, the hclge_dcb module
>> calls client_ops->setup_tc to tell the network stack which queue
>> and priority are used for a specific tc.
> 
> You're basically bypassing the mqprio logic.
> Since you're configuring the prio->queue mapping from DCB flow,
> you'll get an mqprio-like behavior [meaning a transmitted packet
> would reach a transmission queue associated with its priority] even
> if device wasn't grafted with an mqprio qdisc.
> Why should your user even use mqprio? What benefit does he get from it?
> 
> ...
> 
>> +static int hns3_nic_set_real_num_queue(struct net_device *netdev)
>> +{
>> +	struct hns3_nic_priv *priv = netdev_priv(netdev);
>> +	struct hnae3_handle *h = priv->ae_handle;
>> +	struct hnae3_knic_private_info *kinfo = &h->kinfo;
>> +	unsigned int queue_size = kinfo->rss_size * kinfo->num_tc;
>> +	int ret;
>> +
>> +	ret = netif_set_real_num_tx_queues(netdev, queue_size);
>> +	if (ret) {
>> +		netdev_err(netdev,
>> +			   "netif_set_real_num_tx_queues fail, ret=%d!\n",
>> +			   ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = netif_set_real_num_rx_queues(netdev, queue_size);
> 
> I don't think you're changing the driver behavior, but why are you setting
> the real number of rx queues based on the number of TCs?
> Do you actually open (TC x RSS) Rx queues?

Yes, our hardware can do RSS based on TC.

Sorry, I almost forgot to answer this question.
Thanks again for your time reviewing.
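
The (TC x RSS) sizing discussed above can be sketched as follows. The total matches the queue_size computation in the quoted patch; the per-TC layout in queue_index() is only an assumption (contiguous blocks of rss_size queues per TC), and both function names are hypothetical.

```c
/* Total queues exposed to the stack: one RSS group per traffic class. */
static unsigned int total_queues(unsigned int rss_size, unsigned int num_tc)
{
	return rss_size * num_tc;
}

/*
 * Assumed layout, not confirmed by the patch: TC n owns the queues
 * [n * rss_size, (n + 1) * rss_size).
 */
static unsigned int queue_index(unsigned int tc, unsigned int rss_idx,
				unsigned int rss_size)
{
	return tc * rss_size + rss_idx;
}
```

For example, with rss_size = 16 and num_tc = 4 the device would expose 64 queues, and under the assumed layout RSS index 3 of TC 2 would land on queue 35.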

> 
> .
> 


* Re: [PATCH] r8152: add Linksys USB3GIGV1 id
From: Greg KH @ 2017-09-26  7:35 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Oliver Neukum, Hayes Wang, David S . Miller, LKML, linux-usb,
	netdev
In-Reply-To: <CANEJEGtM+gFC9Ofmp=UmGn5pKys9NRbsC6+ks_VqaLKWkEBS8A@mail.gmail.com>

On Mon, Sep 25, 2017 at 01:17:32PM -0700, Grant Grundler wrote:
> Correct. r8152 happens to claim the device before cdc_ether does - I
> thought because cdc_ether is a class driver and only gets picked up
> after vendor specific drivers are probed.  Is that correct?

Nope, there is no "priority" scheme of binding some drivers to devices
instead of others at all in Linux.  The whole scheme is "first in the
list", and has always been that way.

And yes, people have talked about changing this for decades now, but no
one has come up with any working patch, for the obvious reasons[1].

thanks,

greg k-h

[1] exercise is left for the reader :)
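
The "first in the list" rule can be modelled in a few lines of userspace C. This is a sketch of the idea only, not the driver core: struct device_desc, struct driver, bind_first() and the vendor/product IDs are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description; the IDs used with it are placeholders. */
struct device_desc {
	uint16_t vendor;
	uint16_t product;
	int is_cdc_ether;	/* advertises the generic CDC Ethernet class */
};

struct driver {
	const char *name;
	int (*match)(const struct device_desc *dev);
};

/* Vendor-specific driver: matches one exact vendor/product pair. */
static int vendor_match(const struct device_desc *dev)
{
	return dev->vendor == 0x1234 && dev->product == 0x0001;
}

/* Class driver: matches any device that advertises the class. */
static int class_match(const struct device_desc *dev)
{
	return dev->is_cdc_ether;
}

/* Walk the list in registration order; the first match wins. */
static const struct driver *bind_first(const struct driver *drivers, size_t n,
				       const struct device_desc *dev)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (drivers[i].match(dev))
			return &drivers[i];
	return NULL;
}
```

When a device satisfies both match functions, the winner depends only on the order of the array, which is why a vendor driver claiming the device first is an accident of ordering rather than a priority rule.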


* [PATCH net-next] net: ena: Remove redundant unlikely()
From: Tobias Klauser @ 2017-09-26  9:04 UTC (permalink / raw)
  To: netdev; +Cc: Netanel Belgazal, Saeed Bishara, Zorik Machulsky, David S. Miller

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 52beba8c7a39..ded29af648c9 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -315,7 +315,7 @@ static struct ena_comp_ctx *ena_com_submit_admin_cmd(struct ena_com_admin_queue
 					      cmd_size_in_bytes,
 					      comp,
 					      comp_size_in_bytes);
-	if (unlikely(IS_ERR(comp_ctx)))
+	if (IS_ERR(comp_ctx))
 		admin_queue->running_state = false;
 	spin_unlock_irqrestore(&admin_queue->q_lock, flags);
 
@@ -1130,7 +1130,7 @@ int ena_com_execute_admin_command(struct ena_com_admin_queue *admin_queue,
 
 	comp_ctx = ena_com_submit_admin_cmd(admin_queue, cmd, cmd_size,
 					    comp, comp_size);
-	if (unlikely(IS_ERR(comp_ctx))) {
+	if (IS_ERR(comp_ctx)) {
 		if (comp_ctx == ERR_PTR(-ENODEV))
 			pr_debug("Failed to submit command [%ld]\n",
 				 PTR_ERR(comp_ctx));
-- 
2.13.0
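
The reason the outer unlikely() is redundant can be seen from a userspace re-creation of the helpers in include/linux/err.h. The sketch below mirrors their shape but is not the kernel header itself; it assumes GCC or Clang for __builtin_expect and a platform where long is as wide as a pointer.

```c
#define MAX_ERRNO	4095

/* Branch hint: tells the compiler the condition is expected to be false. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
#define IS_ERR_VALUE(x) \
	unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* The hint is already applied here, so callers need not add their own. */
static inline int IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}
```

Since the hint sits inside IS_ERR_VALUE(), writing unlikely(IS_ERR(p)) only wraps an already-hinted expression a second time.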


* [PATCH net-next] datagram: Remove redundant unlikely()
From: Tobias Klauser @ 2017-09-26  9:21 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 net/core/datagram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index f7fb7e3f2acf..0b7b4c22719e 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -188,7 +188,7 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
 			}
 			if (!skb->len) {
 				skb = skb_set_peeked(skb);
-				if (unlikely(IS_ERR(skb))) {
+				if (IS_ERR(skb)) {
 					*err = PTR_ERR(skb);
 					return NULL;
 				}
-- 
2.13.0


* [PATCH net-next] ipv6: Remove redundant unlikely()
From: Tobias Klauser @ 2017-09-26  9:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 96861c702c06..13c3b697f8c0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3297,7 +3297,7 @@ static int fixup_permanent_addr(struct inet6_dev *idev,
 		struct rt6_info *rt, *prev;
 
 		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
-		if (unlikely(IS_ERR(rt)))
+		if (IS_ERR(rt))
 			return PTR_ERR(rt);
 
 		/* ifp->rt can be accessed outside of rtnl */
-- 
2.13.0


* [PATCH net-next] kcm: Remove redundant unlikely()
From: Tobias Klauser @ 2017-09-26  9:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 net/kcm/kcmsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index af4e76ac88ff..0b750a22c4b9 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1650,7 +1650,7 @@ static int kcm_clone(struct socket *osock, struct kcm_clone *info,
 	}
 
 	newfile = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name);
-	if (unlikely(IS_ERR(newfile))) {
+	if (IS_ERR(newfile)) {
 		err = PTR_ERR(newfile);
 		goto out_sock_alloc_fail;
 	}
-- 
2.13.0


* [PATCH v2 1/2] mpls: expose stack entry function
From: Amine Kherbouche @ 2017-09-26  9:22 UTC (permalink / raw)
  To: netdev, xeb, roopa; +Cc: amine.kherbouche, equinox
In-Reply-To: <cover.1506416988.git.amine.kherbouche@6wind.com>

Export the mpls_forward() function so that it can be called from elsewhere,
such as the MPLS over GRE code added in the next commit.

Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
---
 include/linux/mpls.h | 3 +++
 net/mpls/af_mpls.c   | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/mpls.h b/include/linux/mpls.h
index 384fb22..d5c7599 100644
--- a/include/linux/mpls.h
+++ b/include/linux/mpls.h
@@ -2,10 +2,13 @@
 #define _LINUX_MPLS_H
 
 #include <uapi/linux/mpls.h>
+#include <linux/netdevice.h>
 
 #define MPLS_TTL_MASK		(MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
 #define MPLS_BOS_MASK		(MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
 #define MPLS_TC_MASK		(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
 #define MPLS_LABEL_MASK		(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
 
+int mpls_forward(struct sk_buff *skb, struct net_device *dev,
+		 struct packet_type *pt, struct net_device *orig_dev);
 #endif  /* _LINUX_MPLS_H */
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c5b9ce4..36ea2ad 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -307,8 +307,8 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
 	return success;
 }
 
-static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
-			struct packet_type *pt, struct net_device *orig_dev)
+int mpls_forward(struct sk_buff *skb, struct net_device *dev,
+		 struct packet_type *pt, struct net_device *orig_dev)
 {
 	struct net *net = dev_net(dev);
 	struct mpls_shim_hdr *hdr;
@@ -442,6 +442,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
+EXPORT_SYMBOL(mpls_forward);
 
 static struct packet_type mpls_packet_type __read_mostly = {
 	.type = cpu_to_be16(ETH_P_MPLS_UC),
-- 
2.1.4


* [PATCH v2 0/2] Introduce MPLS over GRE
From: Amine Kherbouche @ 2017-09-26  9:22 UTC (permalink / raw)
  To: netdev, xeb, roopa; +Cc: amine.kherbouche, equinox

This series introduces the MPLS over GRE encapsulation (RFC 4023).

Various applications of MPLS make use of label stacks with multiple
entries.  In some cases, it is possible to replace the top label of
the stack with an IP-based encapsulation, so that two LSRs that are
adjacent on an LSP can be separated by an IP network, even if that
IP network does not provide MPLS.

Changes in v2:
  - wrap ip tunnel functions under ifdef in mpls file.
  - fix indentation.
  - check return code.

An example of configuration:


         node1                LER1                       LER2                node2
        +-----+             +------+                   +------+             +-----+
        |     |             |      |                   |      |             |     |
        |     |             |      |p3  GRE tunnel   p4|      |             |     |
        |     |p1         p2|      +-------------------+      |p5         p6|     |
        |     +-------------+      +-------------------+      +-------------+     |
        |     |10.100.0.0/24|      |                   |      |10.200.0.0/24|     |
        |     |fd00:100::/64|      |  10.125.0.0/24    |      |fd00:200::/64|     |
        |     |             |      |  fd00:125::/64    |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        +-----+             +------+                   +------+             +-----+


		###	node1	###

ip link set p1 up
ip addr add 10.100.0.1/24 dev p1

		###	LER1	###

ip link set p2 up
ip addr add 10.100.0.2/24 dev p2

ip link set p3 up
ip addr add 10.125.0.1/24 dev p3

modprobe mpls_router
sysctl -w net.mpls.conf.p2.input=1
sysctl -w net.mpls.conf.p3.input=1
sysctl -w net.mpls.platform_labels=1000

ip link add gre1 type gre ttl 64 local 10.125.0.1 remote 10.125.0.2 dev p3
ip link set dev gre1 up

ip -M route add 111 as 222 dev gre1
ip -M route add 555 as 666 via inet 10.100.0.1 dev p2

		###	LER2	###

ip link set p5 up
ip addr add 10.200.0.2/24 dev p5

ip link set p4 up
ip addr add 10.125.0.2/24 dev p4

modprobe mpls_router
sysctl -w net.mpls.conf.p4.input=1
sysctl -w net.mpls.conf.p5.input=1
sysctl -w net.mpls.platform_labels=1000

ip link add gre1 type gre ttl 64 local 10.125.0.2 remote 10.125.0.1 dev p4
ip link set dev gre1 up

ip -M route add 444 as 555 dev gre1
ip -M route add 222 as 333 via inet 10.200.0.1 dev p5

		###	node2	###

ip link set p6 up
ip addr add 10.200.0.1/24 dev p6


Now use this scapy script to forge and send packets from port p1 of node1:

from scapy.all import Ether, Raw, sendp
from scapy.contrib.mpls import MPLS

p = Ether(src='de:ed:01:0c:41:09', dst='de:ed:01:2f:3b:ba')
p /= MPLS(s=1, ttl=64, label=111)/Raw(load='\xde')
sendp(p, iface="p1", count=20, inter=0.1)

Amine Kherbouche (2):
  mpls: expose stack entry function
  ip_tunnel: add mpls over gre encapsulation

 include/linux/mpls.h           |  3 +++
 include/net/gre.h              |  3 +++
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/gre_demux.c           | 22 +++++++++++++++++++++
 net/ipv4/ip_gre.c              |  9 +++++++++
 net/ipv6/ip6_gre.c             |  7 +++++++
 net/mpls/af_mpls.c             | 45 ++++++++++++++++++++++++++++++++++++++++--
 7 files changed, 88 insertions(+), 2 deletions(-)

-- 
2.1.4
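
For reference, the MPLS header forged by the scapy packet above (label=111, s=1, ttl=64) is a single 32-bit label stack entry. The sketch below packs and unpacks one entry in host byte order using the standard layout (20-bit label, 3-bit TC, 1-bit bottom-of-stack, 8-bit TTL); the helper names are hypothetical, and on the wire the value is sent big-endian.

```c
#include <stdint.h>

#define LS_LABEL_SHIFT	12	/* 20-bit label */
#define LS_TC_SHIFT	9	/* 3-bit traffic class */
#define LS_S_SHIFT	8	/* 1-bit bottom-of-stack flag */
#define LS_TTL_SHIFT	0	/* 8-bit TTL */

/* Pack the four fields into one label stack entry (host byte order). */
static uint32_t mpls_entry_encode(uint32_t label, uint32_t tc,
				  uint32_t bos, uint32_t ttl)
{
	return ((label & 0xFFFFF) << LS_LABEL_SHIFT) |
	       ((tc & 0x7) << LS_TC_SHIFT) |
	       ((bos & 0x1) << LS_S_SHIFT) |
	       ((ttl & 0xFF) << LS_TTL_SHIFT);
}

static uint32_t mpls_entry_label(uint32_t entry)
{
	return entry >> LS_LABEL_SHIFT;
}

static uint32_t mpls_entry_bos(uint32_t entry)
{
	return (entry >> LS_S_SHIFT) & 0x1;
}

static uint32_t mpls_entry_ttl(uint32_t entry)
{
	return (entry >> LS_TTL_SHIFT) & 0xFF;
}
```

Encoding label 111 with the bottom-of-stack bit set and TTL 64 gives 0x0006F140, which is the entry LER1 would look up against the "111 as 222" route.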


* [PATCH v2 2/2] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-26  9:22 UTC (permalink / raw)
  To: netdev, xeb, roopa; +Cc: amine.kherbouche, equinox
In-Reply-To: <cover.1506416988.git.amine.kherbouche@6wind.com>

This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
API.

Encap:
  - Add a new iptunnel type mpls.
  - Share tx path: gre type mpls loaded from skb->protocol.

Decap:
  - pull gre hdr and call mpls_forward().

Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
---
 include/net/gre.h              |  3 +++
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/gre_demux.c           | 22 ++++++++++++++++++++++
 net/ipv4/ip_gre.c              |  9 +++++++++
 net/ipv6/ip6_gre.c             |  7 +++++++
 net/mpls/af_mpls.c             | 40 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 82 insertions(+)

diff --git a/include/net/gre.h b/include/net/gre.h
index d25d836..88a8343 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -35,6 +35,9 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
 				       u8 name_assign_type);
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		     bool *csum_err, __be16 proto, int nhs);
+#if IS_ENABLED(CONFIG_MPLS)
+int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
+#endif
 
 static inline int gre_calc_hlen(__be16 o_flags)
 {
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 2e52088..a2f48c0 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -84,6 +84,7 @@ enum tunnel_encap_types {
 	TUNNEL_ENCAP_NONE,
 	TUNNEL_ENCAP_FOU,
 	TUNNEL_ENCAP_GUE,
+	TUNNEL_ENCAP_MPLS,
 };
 
 #define TUNNEL_ENCAP_FLAG_CSUM		(1<<0)
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index b798862..a6a937e 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -23,6 +23,9 @@
 #include <linux/netdevice.h>
 #include <linux/if_tunnel.h>
 #include <linux/spinlock.h>
+#if IS_ENABLED(CONFIG_MPLS)
+#include <linux/mpls.h>
+#endif
 #include <net/protocol.h>
 #include <net/gre.h>
 
@@ -122,6 +125,25 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 }
 EXPORT_SYMBOL(gre_parse_header);
 
+#if IS_ENABLED(CONFIG_MPLS)
+int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len)
+{
+	if (unlikely(!pskb_may_pull(skb, gre_hdr_len)))
+		goto drop;
+
+	/* Pop GRE hdr and reset the skb */
+	skb_pull(skb, gre_hdr_len);
+	skb_reset_network_header(skb);
+
+	mpls_forward(skb, skb->dev, NULL, NULL);
+
+	return 0;
+drop:
+	return NET_RX_DROP;
+}
+EXPORT_SYMBOL(mpls_gre_rcv);
+#endif
+
 static int gre_rcv(struct sk_buff *skb)
 {
 	const struct gre_protocol *proto;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9cee986..dd4431c 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -412,10 +412,19 @@ static int gre_rcv(struct sk_buff *skb)
 			return 0;
 	}
 
+#if IS_ENABLED(CONFIG_MPLS)
+	if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
+		if (mpls_gre_rcv(skb, hdr_len))
+			goto drop;
+		return 0;
+	}
+#endif
+
 	if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
 		return 0;
 
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+
 drop:
 	kfree_skb(skb);
 	return 0;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index c82d41e..e52396d 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -476,6 +476,13 @@ static int gre_rcv(struct sk_buff *skb)
 	if (hdr_len < 0)
 		goto drop;
 
+#if IS_ENABLED(CONFIG_MPLS)
+	if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
+		if (mpls_gre_rcv(skb, hdr_len))
+			goto drop;
+		return 0;
+	}
+#endif
 	if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
 		goto drop;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 36ea2ad..5505074 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -16,6 +16,7 @@
 #include <net/arp.h>
 #include <net/ip_fib.h>
 #include <net/netevent.h>
+#include <net/ip_tunnels.h>
 #include <net/netns/generic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -39,6 +40,40 @@ static int one = 1;
 static int label_limit = (1 << 20) - 1;
 static int ttl_max = 255;
 
+size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
+{
+	return sizeof(struct mpls_shim_hdr);
+}
+
+int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+			    u8 *protocol, struct flowi4 *fl4)
+{
+	return 0;
+}
+
+static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
+	.encap_hlen	= ipgre_mpls_encap_hlen,
+	.build_header	= ipgre_mpls_build_header,
+};
+
+static int ipgre_tunnel_encap_add_mpls_ops(void)
+{
+	int ret = -1;
+
+#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
+	ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
+#endif
+
+	return ret;
+}
+
+static void ipgre_tunnel_encap_del_mpls_ops(void)
+{
+#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
+	ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
+#endif
+}
+
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
 		       struct nlmsghdr *nlh, struct net *net, u32 portid,
 		       unsigned int nlm_flags);
@@ -2486,6 +2521,10 @@ static int __init mpls_init(void)
 		      0);
 	rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
 		      mpls_netconf_dump_devconf, 0);
+	err = ipgre_tunnel_encap_add_mpls_ops();
+	if (err)
+		pr_err("Can't add mpls over gre tunnel ops\n");
+
 	err = 0;
 out:
 	return err;
@@ -2503,6 +2542,7 @@ static void __exit mpls_exit(void)
 	dev_remove_pack(&mpls_packet_type);
 	unregister_netdevice_notifier(&mpls_dev_notifier);
 	unregister_pernet_subsys(&mpls_net_ops);
+	ipgre_tunnel_encap_del_mpls_ops();
 }
 module_exit(mpls_exit);
 
-- 
2.1.4
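
The decap step (check the GRE protocol field, then skip the GRE header to reach the MPLS label stack) can be sketched on a raw byte buffer. This is a userspace illustration, not the skb-based code in the patch: gre_mpls_payload_offset() is a hypothetical helper, and it only handles the base header plus the optional checksum, key and sequence words.

```c
#include <stddef.h>
#include <stdint.h>

#define GRE_FLAG_CSUM		0x8000	/* checksum word present */
#define GRE_FLAG_KEY		0x2000	/* key word present */
#define GRE_FLAG_SEQ		0x1000	/* sequence number word present */
#define ETHERTYPE_MPLS_UC	0x8847	/* MPLS unicast */

/*
 * Return the offset of the MPLS payload inside a GRE packet, or -1 if
 * the buffer is too short or the payload is not MPLS unicast.
 */
static int gre_mpls_payload_offset(const uint8_t *buf, size_t len)
{
	uint16_t flags, proto;
	size_t hlen = 4;	/* flags/version + protocol type */

	if (len < 4)
		return -1;

	flags = (uint16_t)((buf[0] << 8) | buf[1]);
	proto = (uint16_t)((buf[2] << 8) | buf[3]);

	if (flags & GRE_FLAG_CSUM)
		hlen += 4;
	if (flags & GRE_FLAG_KEY)
		hlen += 4;
	if (flags & GRE_FLAG_SEQ)
		hlen += 4;

	if (proto != ETHERTYPE_MPLS_UC || len < hlen)
		return -1;

	return (int)hlen;
}
```

The length check plays the role that pskb_may_pull() plays in mpls_gre_rcv(): the header must be fully present before it is skipped.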


* Re: [PATCH net-next 7/7] nfp: flower vxlan neighbour keep-alive
From: John Hurley @ 2017-09-26  9:37 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAK+XE=mVKbAqYwSYvLb0y48O9D-Oq+B_bks7c9iwjsm0j7oYvw@mail.gmail.com>

[Reposting in plain text only]



On Mon, Sep 25, 2017 at 7:32 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > From: John Hurley <john.hurley@netronome.com>
> >
> > Periodically receive messages containing the destination IPs of tunnels
> > that have recently forwarded traffic. Update the neighbour entries 'used'
> > value for these IPs next hop.
>
> Are you proactively sending keep alive messages from the driver or the
> fw? what's wrong with the probes sent by the kernel NUD subsystem?
>

Hi Or,

The messages are sent from the FW to the driver. They indicate which
offloaded tunnels are currently active.

>
> In our driver we also update the used value for neighs of offloaded
> tunnels, we do it based on flow counters for the offloaded tunnels
> which is evidence of activity. Any reason for you not to apply a
> similar practice?


Yes, this would provide the same outcome. Because our firmware already
offered these messages, we chose to support this approach.

>
>
> Or.


* Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload
From: Jiri Benc @ 2017-09-26 10:15 UTC (permalink / raw)
  To: Simon Horman
  Cc: Or Gerlitz, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers, John Hurley, Paolo Abeni, Eli Cohen, Paul Blakey
In-Reply-To: <20170925170451.GD18763@vergenet.net>

On Mon, 25 Sep 2017 19:04:53 +0200, Simon Horman wrote:
> The MAC addresses are extracted from the netdevs already loaded in the
> kernel and are monitored for any changes. The IP addresses are slightly
> different in that they are extracted from the rules themselves. We make the
> assumption that, if a packet is decapsulated at the end point and a match
> is attempted on the IP address,

You lost me here, I'm afraid. What do you mean by "match"?

> that this IP address should be recognised
> in the kernel. That being the case, the same traffic pattern should be
> witnessed if the skip_hw flag is applied.

Just to be really sure that this works correctly, can you confirm that
this will match the packet:

ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
ip link set dev vxlan0 up
tc qdisc add dev vxlan0 ingress
ethtool -K eth0 hw-tc-offload on
tc filter add dev vxlan0 protocol ip parent ffff: flower enc_key_id 102 \
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

while this one will NOT match:

ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
ip link set dev vxlan0 up
tc qdisc add dev eth0 ingress
ethtool -K eth0 hw-tc-offload on
tc filter add dev eth0 protocol ip parent ffff: flower enc_key_id 102 \
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

We found that with mlx5, the second one actually matches, too. Which is
a very serious bug. (Adding Paolo who found this. And adding a few more
Mellanox guys to be aware of the bug.)

 Jiri


* [PATCH] net: stmmac: dwc-qos: Add suspend / resume support
From: Ed Blake @ 2017-09-26 10:43 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, Ed Blake

Add hook to stmmac_pltfr_pm_ops for suspend / resume handling.

Signed-off-by: Ed Blake <ed.blake@sondrel.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index dd6a2f9..5efef80 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -511,6 +511,7 @@ static int dwc_eth_dwmac_remove(struct platform_device *pdev)
 	.remove = dwc_eth_dwmac_remove,
 	.driver = {
 		.name           = "dwc-eth-dwmac",
+		.pm             = &stmmac_pltfr_pm_ops,
 		.of_match_table = dwc_eth_dwmac_match,
 	},
 };
-- 
1.9.1


* [PATCH] net: stmmac: dwmac4: Re-enable MAC Rx before powering down
From: Ed Blake @ 2017-09-26 10:44 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, Ed Blake

Re-enable the MAC receiver by setting CONFIG_RE before powering down,
as instructed in section 6.3.5.1 of [1].  Without this the MAC fails
to receive WoL packets and never wakes up.

[1] DWC Ethernet QoS Databook 4.10a October 2014

Signed-off-by: Ed Blake <ed.blake@sondrel.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index c4407e8..2f7d7ec 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -296,6 +296,7 @@ static void dwmac4_pmt(struct mac_device_info *hw, unsigned long mode)
 {
 	void __iomem *ioaddr = hw->pcsr;
 	unsigned int pmt = 0;
+	u32 config;
 
 	if (mode & WAKE_MAGIC) {
 		pr_debug("GMAC: WOL Magic frame\n");
@@ -306,6 +307,12 @@ static void dwmac4_pmt(struct mac_device_info *hw, unsigned long mode)
 		pmt |= power_down | global_unicast | wake_up_frame_en;
 	}
 
+	if (pmt) {
+		/* The receiver must be enabled for WOL before powering down */
+		config = readl(ioaddr + GMAC_CONFIG);
+		config |= GMAC_CONFIG_RE;
+		writel(config, ioaddr + GMAC_CONFIG);
+	}
 	writel(pmt, ioaddr + GMAC_PMT);
 }
 
-- 
1.9.1
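
The ordering the patch enforces (set the receiver-enable bit with a read-modify-write, then write the PMT register) can be modelled with a plain array standing in for the MMIO registers. The register indices, the CONFIG_RE bit position and pmt_program() are placeholders for this sketch, not the driver's definitions.

```c
#include <stdint.h>

/* Placeholder register file standing in for the MMIO window. */
enum { REG_CONFIG, REG_PMT, REG_COUNT };

#define CONFIG_RE	(1u << 0)	/* receiver enable; bit position assumed */

/*
 * Program the wake-up mode. When any wake-up source is requested, the
 * receiver is enabled first so that wake-up frames can still be seen.
 */
static void pmt_program(uint32_t regs[REG_COUNT], uint32_t pmt)
{
	if (pmt)
		regs[REG_CONFIG] |= CONFIG_RE;	/* read-modify-write */

	regs[REG_PMT] = pmt;
}
```

Other bits in the configuration register are preserved, and a pmt value of zero leaves the receiver state untouched, matching the if (pmt) guard in the patch.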


* Re: [PATCH v2 net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
From: Yunsheng Lin @ 2017-09-26 10:49 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: huangdaode@hisilicon.com, xuwei5@hisilicon.com,
	liguozhu@hisilicon.com, Yisen.Zhuang@huawei.com,
	gabriele.paoloni@huawei.com, john.garry@huawei.com,
	linuxarm@huawei.com, salil.mehta@huawei.com, lipeng321@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <AM0PR0502MB3683C922A7D87D3E1F64B93EBF7B0@AM0PR0502MB3683.eurprd05.prod.outlook.com>

Hi, Yuval

On 2017/9/26 14:43, Yuval Mintz wrote:
>> When using tc qdisc to configure DCB parameter, dcb_ops->setup_tc
>> is used to tell hclge_dcb module to do the setup.
> 
> While this might be a step in the right direction, this causes an inconsistency
> in user experience - Some [well, most] vendors didn't allow the mqprio
> priority mapping to affect DCB, instead relying on the dcbnl functionality
> to control that configuration.
> 
> A couple of options to consider:
>   - Perhaps said logic shouldn't be contained inside the driver but rather
>      in mqprio logic itself. I.e., rely on DCBNL functionality [if available] from
>      within mqprio and try changing the configuration. 

In net/dcb/dcbnl.c, dcbnl_ieee_set already calls dcbnl_ieee_notify to
notify user space that the configuration has changed. Does
dcbnl_ieee_notify do the job for us? I am not sure whether lldpad has
registered for this notification.

As you suggested below, can we add a new TC_MQPRIO_HW_OFFLOAD_ value to
indicate that the configuration needs to be changed by dcbnl_ieee_set
(or perhaps some other function) in dcbnl?
Do you think that is feasible?


>   - Add a new TC_MQPRIO_HW_OFFLOAD_ value to explicitly reflect user
>      request to allow this configuration to affect DCB.
> 
>> When using lldptool to configure DCB parameters, the hclge_dcb module
>> calls client_ops->setup_tc to tell the network stack which queue
>> and priority are used for a specific tc.
> 
> You're basically bypassing the mqprio logic.
> Since you're configuring the prio->queue mapping from DCB flow,
> you'll get an mqprio-like behavior [meaning a transmitted packet
> would reach a transmission queue associated with its priority] even
> if device wasn't grafted with an mqprio qdisc.
> Why should your user even use mqprio? What benefit does he get from it?
> 
> ...
> 
>> +static int hns3_nic_set_real_num_queue(struct net_device *netdev)
>> +{
>> +	struct hns3_nic_priv *priv = netdev_priv(netdev);
>> +	struct hnae3_handle *h = priv->ae_handle;
>> +	struct hnae3_knic_private_info *kinfo = &h->kinfo;
>> +	unsigned int queue_size = kinfo->rss_size * kinfo->num_tc;
>> +	int ret;
>> +
>> +	ret = netif_set_real_num_tx_queues(netdev, queue_size);
>> +	if (ret) {
>> +		netdev_err(netdev,
>> +			   "netif_set_real_num_tx_queues fail, ret=%d!\n",
>> +			   ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = netif_set_real_num_rx_queues(netdev, queue_size);
> 
> I don't think you're changing the driver behavior, but why are you setting
> the real number of rx queues based on the number of TCs?
> Do you actually open (TC x RSS) Rx queues?
> 
> .
> 

^ permalink raw reply

* Re: [PATCH net-next v9] openvswitch: enable NSH support
From: Jiri Benc @ 2017-09-26 10:49 UTC (permalink / raw)
  To: Yang, Yi
  Cc: netdev@vger.kernel.org, dev@openvswitch.org, e@erig.me,
	davem@davemloft.net, Pravin Shelar
In-Reply-To: <20170926045538.GA5896@localhost.localdomain>

On Tue, 26 Sep 2017 12:55:39 +0800, Yang, Yi wrote:
> After push_nsh, the packet won't be recirculated to flow pipeline, so
> key->eth.type must be set explicitly here, but for pop_nsh, the packet
> will be recirculated to flow pipeline, it will be reparsed, so
> key->eth.type will be set in packet parse function, we needn't handle it
> in pop_nsh.

This seems to be a very different approach than what we currently have.
Looking at the code, the requirement after "destructive" actions such
as pushing or popping headers is to recirculate.

Setting key->eth.type to satisfy conditions in the output path without
updating the rest of the key looks very hacky and fragile to me. There
might be other conditions and dependencies that are not obvious.
I don't think the code was written with such code path in mind.

I'd like to hear what Pravin thinks about this.

 Jiri

^ permalink raw reply

* Re: [PATCH net-next v10] openvswitch: enable NSH support
From: Jiri Benc @ 2017-09-26 10:50 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, e, davem
In-Reply-To: <1506401236-5716-1-git-send-email-yi.y.yang@intel.com>

On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote:
> v9->v10
>  - Change struct ovs_key_nsh to
>        struct ovs_nsh_key_base base;
>        __be32 context[NSH_MD1_CONTEXT_SIZE];
>  - Fix new comments for v9

NAK, we haven't finished the discussion for v9 yet. It's not
appropriate to send a new version until there's a conclusion (or at
least until the discussion dies).

 Jiri

^ permalink raw reply

* Re: [PATCH net-next v10] openvswitch: enable NSH support
From: Jiri Benc @ 2017-09-26 11:05 UTC (permalink / raw)
  To: Yi Yang
  Cc: dev, netdev, e, davem
In-Reply-To: <1506401236-5716-1-git-send-email-yi.y.yang@intel.com>

On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote:
> +	return ((ret != 0) ? false : true);

I'm not going to review this version but this caught my eye - I pointed
out this silly construct in my review of v9. I can understand that
working late in the night and rewriting the code back and forth, one
could end up with such a construct and overlook it at the final
self-review before submission. Happens to everyone.
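
For reference, the ternary quoted above is equivalent to a plain comparison.
A minimal standalone sketch follows; the function names is_ok_verbose and
is_ok_plain are hypothetical and do not come from the patch:

```c
#include <stdbool.h>

/* The construct quoted above: maps 0 to true and anything else to false. */
static bool is_ok_verbose(int ret)
{
	return ((ret != 0) ? false : true);
}

/* Equivalent, shorter form; `return !ret;` would behave the same way. */
static bool is_ok_plain(int ret)
{
	return ret == 0;
}
```

Both functions return the same value for every input, so the shorter form
can replace the ternary without changing behavior.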

But leaving this after a review pointed it out means you're not paying
enough attention to your work. Even the fact that you sent v10 so
shortly after v9 means you did not spend the needed time on reflecting
on the review and that you did not properly test the new version. And
I told you exactly this before.

Honestly, I'm starting to be tired of reviewing your patch again and
again and pointing out silly mistakes like this one and repeating basic
things to you again and again.

 Jiri

^ permalink raw reply

* [PATCH] lib: fix multiple strlcpy definition
From: Baruch Siach @ 2017-09-26 11:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach, Phil Sutter

Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.
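
As background, here is a minimal sketch of the BSD-style semantics these
functions are expected to have. It is an illustration only, not the code in
lib/utils.c. The names sketch_strlcpy and sketch_strlcat are hypothetical,
chosen so that the sketch cannot clash with a C library that already
provides strlcpy and strlcat:

```c
#include <stddef.h>
#include <string.h>

/* Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * return strlen(src) so callers can detect truncation (result >= size). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size > 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

/* Append src to dst within a buffer of size bytes; the return value is
 * the length the combined string would have had without truncation. */
static size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = strlen(dst);

	if (dlen >= size)
		return dlen + strlen(src);

	return dlen + sketch_strlcpy(dst + dlen, src, size - dlen);
}
```

A return value greater than or equal to the buffer size signals that the
result was truncated.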

This fixes the following static link error with uClibc-ng:

.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status

Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 configure    | 24 ++++++++++++++++++++++++
 lib/Makefile |  4 ++++
 lib/utils.c  |  2 ++
 3 files changed, 30 insertions(+)

diff --git a/configure b/configure
index 7be8fb113cc9..787b2e061af9 100755
--- a/configure
+++ b/configure
@@ -326,6 +326,27 @@ EOF
     rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
 }
 
+check_strlcpy()
+{
+    cat >$TMPDIR/strtest.c <<EOF
+#include <string.h>
+int main(int argc, char **argv) {
+	char dst[10];
+	strlcpy(dst, "test", sizeof(dst));
+	return 0;
+}
+EOF
+    $CC -I$INCLUDE -o $TMPDIR/strtest $TMPDIR/strtest.c >/dev/null 2>&1
+    if [ $? -eq 0 ]
+    then
+	echo "no"
+    else
+	echo "NEED_STRLCPY:=y" >>$CONFIG
+	echo "yes"
+    fi
+    rm -f $TMPDIR/strtest.c $TMPDIR/strtest
+}
+
 quiet_config()
 {
 	cat <<EOF
@@ -397,6 +418,9 @@ check_mnl
 echo -n "Berkeley DB: "
 check_berkeley_db
 
+echo -n "need for strlcpy: "
+check_strlcpy
+
 echo
 echo -n "docs:"
 check_docs
diff --git a/lib/Makefile b/lib/Makefile
index 0fbdf4c31f50..132ad00c3335 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -1,5 +1,9 @@
 include ../config.mk
 
+ifeq ($(NEED_STRLCPY),y)
+	CFLAGS += -DNEED_STRLCPY
+endif
+
 CFLAGS += -fPIC
 
 UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
diff --git a/lib/utils.c b/lib/utils.c
index bbd3cbc46a0e..240e7426a810 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -1231,6 +1231,7 @@ int get_real_family(int rtm_type, int rtm_family)
 	return rtm_family;
 }
 
+#ifdef NEED_STRLCPY
 size_t strlcpy(char *dst, const char *src, size_t size)
 {
 	size_t srclen = strlen(src);
@@ -1253,3 +1254,4 @@ size_t strlcat(char *dst, const char *src, size_t size)
 
 	return dlen + strlcpy(dst + dlen, src, size - dlen);
 }
+#endif
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Tariq Toukan @ 2017-09-26 11:21 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric W . Biederman, Eric Dumazet, Majd Dibbiny,
	Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <20170919232709.14690-1-edumazet@google.com>


On 20/09/2017 2:27 AM, Eric Dumazet wrote:
> When rate of netns creation/deletion is high enough,
> we observe softlockups in cleanup_net() caused by huge list
> of netns and way too many rcu_barrier() calls.
> 
> This patch series does some optimizations in kobject,
> and add batching to tunnels so that netns dismantles are
> less costly.
> 
> IPv6 addrlabels also get a per netns list, and tcp_metrics
> also benefit from batch flushing.
> 
> This gives me one order of magnitude gain.
> (~50 ms -> ~5 ms for one netns create/delete pair)
> 
...
> 
> Eric Dumazet (7):
>    kobject: add kobject_uevent_net_broadcast()
>    kobject: copy env blob in one go
>    kobject: factorize skb setup in kobject_uevent_net_broadcast()
>    ipv6: addrlabel: per netns list
>    tcp: batch tcp_net_metrics_exit
>    ipv6: speedup ipv6 tunnels dismantle
>    ipv4: speedup ipv6 tunnels dismantle
> 
>   include/net/ip_tunnels.h |  3 +-
>   include/net/netns/ipv6.h |  5 +++
>   lib/kobject_uevent.c     | 94 ++++++++++++++++++++++++++----------------------
>   net/ipv4/ip_gre.c        | 22 +++++-------
>   net/ipv4/ip_tunnel.c     | 12 +++++--
>   net/ipv4/ip_vti.c        |  7 ++--
>   net/ipv4/ipip.c          |  7 ++--
>   net/ipv4/tcp_metrics.c   | 14 +++++---
>   net/ipv6/addrlabel.c     | 81 ++++++++++++++++-------------------------
>   net/ipv6/ip6_gre.c       |  8 +++--
>   net/ipv6/ip6_tunnel.c    | 20 ++++++-----
>   net/ipv6/ip6_vti.c       | 23 +++++++-----
>   net/ipv6/sit.c           |  9 +++--
>   13 files changed, 157 insertions(+), 148 deletions(-)
> 

Hi Eric,

We see a regression introduced in this series, specifically in the 
patches touching lib/kobject_uevent.c.
We tried to figure out what is wrong there, but couldn't pinpoint it.

Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
According to module dependencies, both mlx4_en and mlx4_ib should have 
been unloaded at this point.
Please see log below.

This looks to be some kind of a race, as the repro is not deterministic.
Probably the en/ib modules are now mistakenly reloaded.

Any idea what this could be?

Regards,
Tariq


[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
Unloading HCA driver:                                      [  OK  ]
[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
Loading HCA driver and Access Layer:                       [  OK  ]
[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
Unloading mlx4_core                                        [FAILED]
rmmod: ERROR: Module mlx4_core is in use

^ permalink raw reply

* [PATCH net-next v4 0/4] rtnetlink: preparation patches for further rtnl lock pushdown/removal
From: Florian Westphal @ 2017-09-26 11:58 UTC (permalink / raw)
  To: netdev

Patches split large rtnl_fill_ifinfo into smaller chunks
to better see which parts

1. require rtnl
2. do not require it at all
3. rely on rtnl locking now but could be converted

Changes since v3:

I dropped the 'ifalias' patch. I have a change to decouple ifalias and the
rtnl mutex; I will send it once this series has been merged.

^ permalink raw reply

* [PATCH net-next v4 1/4] rtnetlink: add helper to put master and link ifindexes
From: Florian Westphal @ 2017-09-26 11:58 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal
In-Reply-To: <20170926115843.12013-1-fw@strlen.de>

rtnl_fill_ifinfo currently requires caller to hold the rtnl mutex.
Unfortunately the function is quite large which makes it harder to see
which spots require the lock, which spots assume it and which ones could
do without.

Add helpers to factor out the ifindex dumping, one can use rcu to avoid
rtnl dependency.
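
The helper shape used here can be sketched in userspace. A helper returns 0
both on success and when there is nothing to emit, so it can sit directly in
an || chain of attribute writes. Everything named mock_* below is a
hypothetical stand-in, not kernel API, and -1 stands in for the kernel's
-EMSGSIZE:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mock_dev {
	int ifindex;	/* this device's index */
	int iflink;	/* index of the underlying link */
};

struct mock_msg {
	int attrs[8];
	size_t count;
};

/* Stand-in for nla_put_u32(): 0 on success, -1 if the message is full. */
static int mock_put_u32(struct mock_msg *msg, int value)
{
	if (msg->count >= sizeof(msg->attrs) / sizeof(msg->attrs[0]))
		return -1;
	msg->attrs[msg->count++] = value;
	return 0;
}

/* Same shape as nla_put_iflink(): nothing to emit is not an error. */
static int mock_put_iflink(struct mock_msg *msg, const struct mock_dev *dev)
{
	if (dev->ifindex == dev->iflink)
		return 0;
	return mock_put_u32(msg, dev->iflink);
}

/* Caller chains helpers with ||; the first non-zero result aborts the fill. */
static int mock_fill(struct mock_msg *msg, const struct mock_dev *dev)
{
	if (mock_put_u32(msg, dev->ifindex) ||
	    mock_put_iflink(msg, dev))
		return -1;
	return 0;
}
```

Because each helper hides its own condition, the caller's chain stays flat
and any locking a helper needs can be taken inside the helper itself.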

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes in v4.

 net/core/rtnetlink.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a78fd61da0ec..c801212ee40e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1307,6 +1307,31 @@ static u32 rtnl_get_event(unsigned long event)
 	return rtnl_event_type;
 }
 
+static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct net_device *upper_dev;
+	int ret = 0;
+
+	rcu_read_lock();
+
+	upper_dev = netdev_master_upper_dev_get_rcu(dev);
+	if (upper_dev)
+		ret = nla_put_u32(skb, IFLA_MASTER, upper_dev->ifindex);
+
+	rcu_read_unlock();
+	return ret;
+}
+
+static int nla_put_iflink(struct sk_buff *skb, const struct net_device *dev)
+{
+	int ifindex = dev_get_iflink(dev);
+
+	if (dev->ifindex == ifindex)
+		return 0;
+
+	return nla_put_u32(skb, IFLA_LINK, ifindex);
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags, u32 ext_filter_mask,
@@ -1316,7 +1341,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	struct nlmsghdr *nlh;
 	struct nlattr *af_spec;
 	struct rtnl_af_ops *af_ops;
-	struct net_device *upper_dev = netdev_master_upper_dev_get(dev);
 
 	ASSERT_RTNL();
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
@@ -1345,10 +1369,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
-	    (dev->ifindex != dev_get_iflink(dev) &&
-	     nla_put_u32(skb, IFLA_LINK, dev_get_iflink(dev))) ||
-	    (upper_dev &&
-	     nla_put_u32(skb, IFLA_MASTER, upper_dev->ifindex)) ||
+	    nla_put_iflink(skb, dev) ||
+	    put_master_ifindex(skb, dev) ||
 	    nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
 	    (dev->qdisc &&
 	     nla_put_string(skb, IFLA_QDISC, dev->qdisc->ops->id)) ||
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v4 2/4] rtnetlink: add helpers to dump vf information
From: Florian Westphal @ 2017-09-26 11:58 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal
In-Reply-To: <20170926115843.12013-1-fw@strlen.de>

Similar to earlier patches, split out more parts of this function to
better see what is happening and where we assume rtnl is locked.

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes in v4.

 net/core/rtnetlink.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c801212ee40e..d504e78cd01f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1211,6 +1211,36 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	return -EMSGSIZE;
 }
 
+static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
+					   struct net_device *dev,
+					   u32 ext_filter_mask)
+{
+	struct nlattr *vfinfo;
+	int i, num_vfs;
+
+	if (!dev->dev.parent || ((ext_filter_mask & RTEXT_FILTER_VF) == 0))
+		return 0;
+
+	num_vfs = dev_num_vf(dev->dev.parent);
+	if (nla_put_u32(skb, IFLA_NUM_VF, num_vfs))
+		return -EMSGSIZE;
+
+	if (!dev->netdev_ops->ndo_get_vf_config)
+		return 0;
+
+	vfinfo = nla_nest_start(skb, IFLA_VFINFO_LIST);
+	if (!vfinfo)
+		return -EMSGSIZE;
+
+	for (i = 0; i < num_vfs; i++) {
+		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo))
+			return -EMSGSIZE;
+	}
+
+	nla_nest_end(skb, vfinfo);
+	return 0;
+}
+
 static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 {
 	struct rtnl_link_ifmap map;
@@ -1407,27 +1437,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (rtnl_fill_stats(skb, dev))
 		goto nla_put_failure;
 
-	if (dev->dev.parent && (ext_filter_mask & RTEXT_FILTER_VF) &&
-	    nla_put_u32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent)))
+	if (rtnl_fill_vf(skb, dev, ext_filter_mask))
 		goto nla_put_failure;
 
-	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent &&
-	    ext_filter_mask & RTEXT_FILTER_VF) {
-		int i;
-		struct nlattr *vfinfo;
-		int num_vfs = dev_num_vf(dev->dev.parent);
-
-		vfinfo = nla_nest_start(skb, IFLA_VFINFO_LIST);
-		if (!vfinfo)
-			goto nla_put_failure;
-		for (i = 0; i < num_vfs; i++) {
-			if (rtnl_fill_vfinfo(skb, dev, i, vfinfo))
-				goto nla_put_failure;
-		}
-
-		nla_nest_end(skb, vfinfo);
-	}
-
 	if (rtnl_port_fill(skb, dev, ext_filter_mask))
 		goto nla_put_failure;
 
-- 
2.13.5

^ permalink raw reply related
