Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH net-netx] net: lan78xx: add LAN7801 MAC-only support
From: Woojung.Huh @ 2016-11-29 23:55 UTC (permalink / raw)
  To: f.fainelli, davem, andrew; +Cc: netdev, UNGLinuxDriver
In-Reply-To: <dcabc07f-ab5f-ad25-5beb-09f717c06e4e@gmail.com>

> There are two ways to get these settings propagated to the PHY driver:
> 
> - using a board fixup which is going to be invoked during
> drv->config_init() time
> 
> - specifying a phydev->dev_flags and reading it from the PHY driver to
> act upon and configure the PHY based on that value, there are only
> 32-bits available though, and you need to make sure they are not
> conflicting with other potential users in tree
> 
> My preference would go with 1, since you could just register it in your
> PHY driver and re-use the code you are proposing to include here.

Florian,

It seems phy_unregister_fixup() will be needed for module network driver.
phy_fixup_list keeps the list even after unloading module.
Do you know any update is waiting for submission? If not, I'll make patch.

Thanks.
Woojung

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.
From: Jarno Rajahalme @ 2016-11-29 23:32 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Jiri Benc
In-Reply-To: <CAOrHB_B15uBYL-nVhsCoRKzZ1Akmn0ZDcee57VUXGAjMNEAAWQ@mail.gmail.com>


> On Nov 28, 2016, at 11:21 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> 
> On Mon, Nov 28, 2016 at 6:41 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
>> Do not set skb->protocol to be the ethertype of the L3 header, unless
>> the packet only has the L3 header.  For a non-hardware offloaded VLAN
>> frame skb->protocol needs to be one of the VLAN ethertypes.
>> 
>> Any VLAN offloading is undone on the OVS netlink interface.  Also any
>> VLAN tags added by userspace are non-offloaded.
>> 
>> Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
>> causes packet drop due to failing MTU check, as the VLAN header should
>> not be counted in when considering MTU in ovs_vport_send().
>> 
> I think we should move to is_skb_forwardable() type of packet length
> check in vport-send and get rid of skb-protocol checks altogether.
> 

I added a new patch to do this, thanks for the suggestion.

>> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
>> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
>> ---
>> v2: Set skb->protocol when an ETH_P_TEB frame is received via ARPHRD_NONE
>>    interface.
>> 
>> net/openvswitch/datapath.c |  1 -
>> net/openvswitch/flow.c     | 30 ++++++++++++++++++++++--------
>> 2 files changed, 22 insertions(+), 9 deletions(-)
> ...
> ...
>> @@ -531,15 +538,22 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>>                if (unlikely(parse_vlan(skb, key)))
>>                        return -ENOMEM;
>> 
>> -               skb->protocol = parse_ethertype(skb);
>> -               if (unlikely(skb->protocol == htons(0)))
>> +               key->eth.type = parse_ethertype(skb);
>> +               if (unlikely(key->eth.type == htons(0)))
>>                        return -ENOMEM;
>> 
>> +               if (skb->protocol == htons(ETH_P_TEB)) {
>> +                       if (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)
>> +                           && !skb_vlan_tag_present(skb))
>> +                               skb->protocol = key->eth.vlan.tpid;
>> +                       else
>> +                               skb->protocol = key->eth.type;
>> +               }
>> +
> 
> I am not sure if this work in case of nested vlans.
> Can we move skb-protocol assignment to parse_vlan() to avoid checking
> for non-accelerated vlan case again here?

I did this for the v3 that I sent out a moment ago.

Thanks for the review,

  Jarno

^ permalink raw reply

* Re: [PATCH net-next 1/7] net/mlx5e: Implement Fragmented Work Queue (WQ)
From: Eric Dumazet @ 2016-11-29 23:31 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Tariq Toukan, Or Gerlitz, Roi Dayan,
	Sebastian Ott
In-Reply-To: <1480457972-31286-2-git-send-email-saeedm@mellanox.com>

On Wed, 2016-11-30 at 00:19 +0200, Saeed Mahameed wrote:
> From: Tariq Toukan <tariqt@mellanox.com>
> 
> Add new type of struct mlx5_frag_buf which is used to allocate fragmented
> buffers rather than contiguous, and make the Completion Queues (CQs) use
> it as they are big (default of 2MB per CQ in Striding RQ).
> 
> This fixes the failures of type:
> "mlx5e_open_locked: mlx5e_open_channels failed, -12"
> due to dma_zalloc_coherent insufficient contiguous coherent memory to
> satisfy the driver's request when the user tries to setup more or larger
> rings.
> 
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/alloc.c   | 66 +++++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/wq.c      | 26 ++++++---
>  drivers/net/ethernet/mellanox/mlx5/core/wq.h      | 18 +++++--
>  include/linux/mlx5/driver.h                       | 11 ++++
>  6 files changed, 116 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
> index 2c6e3c7..bc8357d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
> @@ -106,6 +106,63 @@ void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_buf *buf)
>  }
>  EXPORT_SYMBOL_GPL(mlx5_buf_free);
>  
> +int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size,
> +			     struct mlx5_frag_buf *buf, int node)
> +{
> +	int i;
> +
> +	buf->size = size;
> +	buf->npages = 1 << get_order(size);
> +	buf->page_shift = PAGE_SHIFT;
> +	buf->frags = kcalloc(buf->npages, sizeof(struct mlx5_buf_list),
> +			     GFP_KERNEL);
> +	if (!buf->frags)
> +		goto err_out;
> +
> +	for (i = 0; i < buf->npages; i++) {
> +		struct mlx5_buf_list *frag = &buf->frags[i];
> +		int frag_sz = min_t(int, size, PAGE_SIZE);
> +
> +		frag->buf = mlx5_dma_zalloc_coherent_node(dev, frag_sz,
> +							  &frag->map, node);
> +		if (!frag->buf)
> +			goto err_free_buf;
> +		if (frag->map & ((1 << buf->page_shift) - 1)) {
> +			dma_free_coherent(&dev->pdev->dev, frag_sz,
> +					  buf->frags[i].buf, buf->frags[i].map);

There is a bug if this happens with i = 0

> +			mlx5_core_warn(dev, "unexpected map alignment: 0x%p, page_shift=%d\n",
> +				       (void *)frag->map, buf->page_shift);
> +			goto err_free_buf;
> +		}
> +		size -= frag_sz;
> +	}
> +
> +	return 0;
> +
> +err_free_buf:
> +	while (--i)

Because this loop will be done about 2^32 times.

> +		dma_free_coherent(&dev->pdev->dev, PAGE_SIZE, buf->frags[i].buf,
> +				  buf->frags[i].map);
> +	kfree(buf->frags);
> +err_out:
> +	return -ENOMEM;
> +}

^ permalink raw reply

* [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Jarno Rajahalme @ 2016-11-29 23:30 UTC (permalink / raw)
  To: netdev; +Cc: jarno, jbenc, pshelar, e
In-Reply-To: <1480462253-114713-1-git-send-email-jarno@ovn.org>

Do not always set skb->protocol to be the ethertype of the L3 header.
For a packet with non-accelerated VLAN tags skb->protocol needs to be
the ethertype of the outermost non-accelerated VLAN ethertype.

Any VLAN offloading is undone on the OVS netlink interface, and any
VLAN tags added by userspace are non-accelerated, as are double tagged
VLAN packets.

Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
v3: Set skb->protocol properly also for double tagged packets, as suggested
    by Pravin.  This patch is no longer needed for the MTU test failure, as
    the new patch 2/3 addresses that.

 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c     | 62 +++++++++++++++++++++++-----------------------
 2 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
-	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..e2fe2e5 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -354,6 +354,7 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		res = parse_vlan_tag(skb, &key->eth.vlan);
 		if (res <= 0)
 			return res;
+		skb->protocol = key->eth.vlan.tpid;
 	}
 
 	/* Parse inner vlan tag. */
@@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 	if (res <= 0)
 		return res;
 
+	/* If the outer vlan tag was accelerated, skb->protocol should
+	 * refelect the inner vlan type. */
+	if (!eth_type_vlan(skb->protocol))
+		skb->protocol = key->eth.cvlan.tpid;
+
 	return 0;
 }
 
@@ -477,12 +483,18 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
 }
 
 /**
- * key_extract - extracts a flow key from an Ethernet frame.
+ * key_extract - extracts a flow key from a packet with or without an
+ * Ethernet header.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
- * Ethernet header
+ * beginning of the packet.
  * @key: output flow key
  *
- * The caller must ensure that skb->len >= ETH_HLEN.
+ * 'key->mac_proto' must be initialized to indicate the frame type.  For an L3
+ * frame 'key->mac_proto' must equal 'MAC_PROTO_NONE', and the caller must
+ * ensure that 'skb->protocol' is set to the ethertype of the L3 header.
+ * Otherwise the presence of an Ethernet header is assumed and the caller must
+ * ensure that skb->len >= ETH_HLEN and that 'skb->protocol' is initialized to
+ * zero.
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
@@ -498,8 +510,9 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
  *      of a correct length, otherwise the same as skb->network_header.
  *      For other key->eth.type values it is left untouched.
  *
- *    - skb->protocol: the type of the data starting at skb->network_header.
- *      Equals to key->eth.type.
+ *    - skb->protocol: For non-accelerated VLAN, one of the VLAN ether types,
+ *      otherwise the same as key->eth.type, the ether type of the payload
+ *      starting at skb->network_header.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -531,15 +544,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		if (unlikely(parse_vlan(skb, key)))
 			return -ENOMEM;
 
-		skb->protocol = parse_ethertype(skb);
-		if (unlikely(skb->protocol == htons(0)))
+		key->eth.type = parse_ethertype(skb);
+		if (unlikely(key->eth.type == htons(0)))
 			return -ENOMEM;
 
 		skb_reset_network_header(skb);
 		__skb_push(skb, skb->data - skb_mac_header(skb));
 	}
 	skb_reset_mac_len(skb);
-	key->eth.type = skb->protocol;
+	if (!eth_type_vlan(skb->protocol))
+		skb->protocol = key->eth.type;
 
 	/* Network layer. */
 	if (key->eth.type == htons(ETH_P_IP)) {
@@ -800,29 +814,15 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
-	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
-		/* key_extract assumes that skb->protocol is set-up for
-		 * layer 3 packets which is the case for other callers,
-		 * in particular packets recieved from the network stack.
-		 * Here the correct value can be set from the metadata
-		 * extracted above.
-		 */
-		skb->protocol = key->eth.type;
-	} else {
-		struct ethhdr *eth;
-
-		skb_reset_mac_header(skb);
-		eth = eth_hdr(skb);
-
-		/* Normally, setting the skb 'protocol' field would be
-		 * handled by a call to eth_type_trans(), but it assumes
-		 * there's a sending device, which we may not have.
-		 */
-		if (eth_proto_is_802_3(eth->h_proto))
-			skb->protocol = eth->h_proto;
-		else
-			skb->protocol = htons(ETH_P_802_2);
-	}
+	/* key_extract assumes that skb->protocol is set-up for
+	 * layer 3 packets which is the case for other callers,
+	 * in particular packets recieved from the network stack.
+	 * Here the correct value can be set from the metadata
+	 * extracted above.  For layer 2 packets we initialize
+         * skb->protocol to zero and set it in key_extract() while
+         * parsing the L2 headers.
+	 */
+	skb->protocol = key->eth.type;
 
 	return key_extract(skb, key);
 }
-- 
2.1.4

^ permalink raw reply related

* [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Jarno Rajahalme @ 2016-11-29 23:30 UTC (permalink / raw)
  To: netdev; +Cc: jarno, jbenc, pshelar, e
In-Reply-To: <1480462253-114713-1-git-send-email-jarno@ovn.org>

Use is_skb_forwardable() instead of an explicit length check.  This
gets around the apparent MTU check failure in OVS test cases when
skb->protocol is not properly set in case of non-accelerated VLAN
skbs.

Suggested-by: Pravin Shelar <pshelar@ovn.org>
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
v3: New patch suggested by Pravin.

net/openvswitch/vport.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index b6c8524..076b39f 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -464,27 +464,8 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 	return 0;
 }
 
-static unsigned int packet_length(const struct sk_buff *skb,
-				  struct net_device *dev)
-{
-	unsigned int length = skb->len - dev->hard_header_len;
-
-	if (!skb_vlan_tag_present(skb) &&
-	    eth_type_vlan(skb->protocol))
-		length -= VLAN_HLEN;
-
-	/* Don't subtract for multiple VLAN tags. Most (all?) drivers allow
-	 * (ETH_LEN + VLAN_HLEN) in addition to the mtu value, but almost none
-	 * account for 802.1ad. e.g. is_skb_forwardable().
-	 */
-
-	return length;
-}
-
 void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 {
-	int mtu = vport->dev->mtu;
-
 	switch (vport->dev->type) {
 	case ARPHRD_NONE:
 		if (mac_proto == MAC_PROTO_ETHERNET) {
@@ -504,11 +485,20 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 		goto drop;
 	}
 
-	if (unlikely(packet_length(skb, vport->dev) > mtu &&
-		     !skb_is_gso(skb))) {
-		net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
-				     vport->dev->name,
-				     packet_length(skb, vport->dev), mtu);
+	if (unlikely(!is_skb_forwardable(vport->dev, skb))) {
+		/* Log only if the device is up. */
+		if (vport->dev->flags & IFF_UP) {
+			unsigned int length = skb->len
+				- vport->dev->hard_header_len;
+
+			if (!skb_vlan_tag_present(skb)
+			    && eth_type_vlan(skb->protocol))
+				length -= VLAN_HLEN;
+
+			net_warn_ratelimited("%s: dropped over-mtu packet %d > %d\n",
+					     vport->dev->name, length,
+					     vport->dev->mtu);
+		}
 		vport->dev->stats.tx_errors++;
 		goto drop;
 	}
-- 
2.1.4

^ permalink raw reply related

* [PATCH v3 net-next 1/3] openvswitch: Add a missing break statement.
From: Jarno Rajahalme @ 2016-11-29 23:30 UTC (permalink / raw)
  To: netdev; +Cc: jarno, jbenc, pshelar, e

Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
---
v3: No change.

 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..c87d359 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2195,6 +2195,7 @@ static int validate_set(const struct nlattr *a,
 	case OVS_KEY_ATTR_ETHERNET:
 		if (mac_proto != MAC_PROTO_ETHERNET)
 			return -EINVAL;
+		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
 		if (masked)
-- 
2.1.4

^ permalink raw reply related

* Re: bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Gavin Shan @ 2016-11-29 23:28 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Linux Network Development list, Baoquan He
In-Reply-To: <583D26EF.60207@inria.fr>

On Tue, Nov 29, 2016 at 07:57:51AM +0100, Brice Goglin wrote:
>Hello
>
>My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
>kernel, the BMC doesn't even ping anymore. Its Ethernet devices are 4 of
>those:
>
>01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>	DeviceName: Embedded NIC 1                          
>	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
>	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>	Latency: 0, Cache Line Size: 64 bytes
>	Interrupt: pin A routed to IRQ 42
>	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
>	Capabilities: <access denied>
>	Kernel driver in use: bnx2
>	Kernel modules: bnx2
>
>The only change in bnx2 between 4.7 and 4.8 appears to be this one:
>
>commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
>Author: Baoquan He <bhe@redhat.com>
>Date:   Fri Sep 9 22:43:12 2016 +0800
>
>    bnx2: Reset device during driver initialization
>
>Could you patch actually break the BMC? What do I need to further debug
>this issue?
>

Brice, could you please confirm NCSI is enabled on BMC? It seems NCSI
AEN pakets aren't handled by NCSI stack on BMC side when resetting NIC
on host side. Those AEN packets usually help to reconfigure the (active)
NCSI channel and bring it back to service after the reset.

Thanks,
Gavin

^ permalink raw reply

* RE: [PATCH net-netx] net: lan78xx: add LAN7801 MAC-only support
From: Woojung.Huh @ 2016-11-29 23:20 UTC (permalink / raw)
  To: f.fainelli, davem, andrew; +Cc: netdev, UNGLinuxDriver
In-Reply-To: <dcabc07f-ab5f-ad25-5beb-09f717c06e4e@gmail.com>

> There are two ways to get these settings propagated to the PHY driver:
> 
> - using a board fixup which is going to be invoked during
> drv->config_init() time
> 
> - specifying a phydev->dev_flags and reading it from the PHY driver to
> act upon and configure the PHY based on that value, there are only
> 32-bits available though, and you need to make sure they are not
> conflicting with other potential users in tree
> 
> My preference would go with 1, since you could just register it in your
> PHY driver and re-use the code you are proposing to include here.

fixup looks good to me too. Will submit revision soon.

Thanks.
- Woojung

^ permalink raw reply

* Re: [PATCH net] fib_trie: Avoid expensive update of suffix length when not required
From: Alexander Duyck @ 2016-11-29 23:14 UTC (permalink / raw)
  To: Robert Shearman; +Cc: David Miller, Netdev, Alexander Duyck
In-Reply-To: <1480441605-27855-1-git-send-email-rshearma@brocade.com>

On Tue, Nov 29, 2016 at 9:46 AM, Robert Shearman <rshearma@brocade.com> wrote:
> With certain distributions of routes it can take a long time to add
> and delete further routes at scale. For example, with a route for
> 10.37.96.0/20 present it takes 47s to add ~200k contiguous /24 routes
> from 8.0.0.0/24 through to 11.138.207.0/24. Perf shows the bottleneck
> is update_suffix:
>
>       40.39%  [kernel]                      [k] update_suffix
>        8.02%  libnl-3.so.200.19.0           [.] nl_hash_table_lookup

Well yeah, it will be expensive when you have over 512K entries in a
single node.  I'm assuming that is the case based on the fact that
200K routes will double up in the trie node due to broadcast and the
route ending up in adjacent key vectors.

> With these changes, the time is reduced to 4s for the same scale and
> distribution of routes.
>
> The issue is that update_suffix does an O(n) walk on the children of a
> node and the with a dense distribtion of routes the number of children
> in a node tends towards the number of nodes in the tree.

Other than the performance I was just wondering if you did any other
validation to confirm that nothing else differs.  Basically it would
just be a matter of verifying that /proc/net/fib_trie is the same
before and after your patch.

> In the add case it isn't necessary to walk all the other children to
> update the largest suffix length of the node (which is what
> update_suffix is doing) since we already know what the largest suffix
> length of all the other children is and we only need to update it if
> the new node/leaf has a larger suffix. However, it is currently called
> from resize which is called on any update to rebalance the trie.
>
> Therefore improve the scaling by moving the responsibility of
> recalculating the node's largest suffix length out of the resize
> function into its callers. For fib_insert_node this can be taken care
> of by extending put_child to not only update the largest suffix length
> in the parent node, but to propagate it all the way up the trie as
> required, using a new function, node_push_suffix, based on
> leaf_push_suffix, but renamed to reflect its purpose and made safe if
> the node has no parent.
>
> No changes are required to inflate/halve since the maximum suffix
> length of the sub-trie starting from the node's parent isn't changed,
> and the suffix lengths of the halved/inflated nodes are updated
> through the creation of the new nodes and put_child called to add the
> children to them.
>
> In both fib_table_flush and fib_table_flush_external, given that a
> walk of the entire trie is being done there's a choice of optimising
> either for the case of a small number of leafs being flushed by
> updating the suffix on a change and propagating it up, or optimising
> for large numbers of leafs being flushed by deferring the updating of
> the largest suffix length to the walk up to the parent node. I opted
> for the latter to keep the algorithm linear in complexity in the case
> of flushing all leafs and because it is close to the status quo.
>
> Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
> Signed-off-by: Robert Shearman <rshearma@brocade.com>

So I am not entirely convinced this is a regression, I was wondering
what your numbers looked like before the patch you are saying this
fixes?  I recall I fixed a number of issues leading up to this so I am
just curious.

My thought is this seems more like a performance optimization than a
fix.  If that is the case this probably belongs in net-next.

It seems to me we might be able to simplify update_suffix rather than
going through all this change.  That might be something that is more
acceptable for net.  Have you looked at simply adding code so that
there is a break inside update_suffix if (slen == tn->slen)?  We don't
have to call it for if a leaf is added so it might make sense to add
that check.

> ---
>  net/ipv4/fib_trie.c | 86 ++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 62 insertions(+), 24 deletions(-)
>
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 026f309c51e9..701cae8af44a 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -421,8 +421,22 @@ static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
>         return n && ((n->pos + n->bits) == tn->pos) && IS_TNODE(n);
>  }
>
> +static void node_push_suffix(struct key_vector *tn, struct key_vector *l)
> +{
> +       while (tn->slen < l->slen) {
> +               tn->slen = l->slen;
> +               tn = node_parent(tn);
> +               if (!tn)
> +                       break;

I don't think the NULL check is necessary, at least it wasn't with the old code.

> +       }
> +}
> +
>  /* Add a child at position i overwriting the old value.
>   * Update the value of full_children and empty_children.
> + *
> + * The suffix length of the parent node and the rest of the tree is
> + * updated (if required) when adding/replacing a node, but is caller's
> + * responsibility on removal.
>   */
>  static void put_child(struct key_vector *tn, unsigned long i,
>                       struct key_vector *n)
> @@ -447,8 +461,8 @@ static void put_child(struct key_vector *tn, unsigned long i,
>         else if (!wasfull && isfull)
>                 tn_info(tn)->full_children++;
>
> -       if (n && (tn->slen < n->slen))
> -               tn->slen = n->slen;
> +       if (n)
> +               node_push_suffix(tn, n);

This is just creating redundant work if we have to perform a resize.

>         rcu_assign_pointer(tn->tnode[i], n);
>  }
> @@ -919,34 +933,35 @@ static struct key_vector *resize(struct trie *t, struct key_vector *tn)
>         if (max_work != MAX_WORK)
>                 return tp;
>
> -       /* push the suffix length to the parent node */
> -       if (tn->slen > tn->pos) {
> -               unsigned char slen = update_suffix(tn);
> -
> -               if (slen > tp->slen)
> -                       tp->slen = slen;
> -       }
> -
>         return tp;
>  }
>

So it seems like the goal here is to move update_suffix out of the
resize code.  I'm mostly okay with that.  I don't recall why I was
doing it as a part of the resize instead of just doing it whenever a
longer suffix was added.

> -static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
> +static void node_set_suffix(struct key_vector *tp, unsigned char slen)
>  {
> -       while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
> -               if (update_suffix(tp) > l->slen)
> +       if (slen > tp->slen)
> +               tp->slen = slen;
> +}
> +
> +static void node_pull_suffix(struct key_vector *tn)
> +{
> +       struct key_vector *tp;
> +       unsigned char slen;
> +
> +       slen = update_suffix(tn);
> +       tp = node_parent(tn);
> +       while ((tp->slen > tp->pos) && (tp->slen > slen)) {
> +               if (update_suffix(tp) > slen)
>                         break;
>                 tp = node_parent(tp);
>         }
>  }
>

I really hate all the renaming.  Can't you just leave these as
leaf_pull_suffix and leaf_push _suffix.  I'm fine with us dropping the
leaf of the suffix but the renaming just makes adds a bunch of noise.
Really it might work better if you broke the replacement of the
key_vector as a leaf with the suffix length into a separate patch,
then you could do the rename as a part of that.

> -static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
> +static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
>  {
> -       /* if this is a new leaf then tn will be NULL and we can sort
> -        * out parent suffix lengths as a part of trie_rebalance
> -        */
> -       while (tn->slen < l->slen) {
> -               tn->slen = l->slen;
> -               tn = node_parent(tn);
> +       while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
> +               if (update_suffix(tp) > l->slen)
> +                       break;
> +               tp = node_parent(tp);
>         }
>  }

If possible it would work better if you could avoid moving functions
around as a result of your changes.

> @@ -1107,7 +1122,7 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
>         /* if we added to the tail node then we need to update slen */
>         if (l->slen < new->fa_slen) {
>                 l->slen = new->fa_slen;
> -               leaf_push_suffix(tp, l);
> +               node_push_suffix(tp, l);
>         }
>
>         return 0;
> @@ -1495,12 +1510,15 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
>         /* remove the fib_alias from the list */
>         hlist_del_rcu(&old->fa_list);
>
> -       /* if we emptied the list this leaf will be freed and we can sort
> -        * out parent suffix lengths as a part of trie_rebalance
> -        */
> +       /* if we emptied the list this leaf will be freed */
>         if (hlist_empty(&l->leaf)) {
>                 put_child_root(tp, l->key, NULL);
>                 node_free(l);
> +               /* only need to update suffixes if this alias was
> +                * possibly the one with the largest suffix in the parent
> +                */
> +               if (tp->slen == old->fa_slen)
> +                       node_pull_suffix(tp);
>                 trie_rebalance(t, tp);
>                 return;
>         }
> @@ -1783,6 +1801,16 @@ void fib_table_flush_external(struct fib_table *tb)
>                         if (IS_TRIE(pn))
>                                 break;
>
> +                       /* push the suffix length to the parent node,
> +                        * since a previous leaf removal may have
> +                        * caused it to change
> +                        */
> +                       if (pn->slen > pn->pos) {
> +                               unsigned char slen = update_suffix(pn);
> +
> +                               node_set_suffix(node_parent(pn), slen);
> +                       }
> +

Why bother with the local variable?  You could just pass
update_suffix(pn) as the parameter to your node_set_suffix function.

>                         /* resize completed node */
>                         pn = resize(t, pn);
>                         cindex = get_index(pkey, pn);
> @@ -1849,6 +1877,16 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
>                         if (IS_TRIE(pn))
>                                 break;
>
> +                       /* push the suffix length to the parent node,
> +                        * since a previous leaf removal may have
> +                        * caused it to change
> +                        */
> +                       if (pn->slen > pn->pos) {
> +                               unsigned char slen = update_suffix(pn);
> +
> +                               node_set_suffix(node_parent(pn), slen);
> +                       }
> +

Same here.

>                         /* resize completed node */
>                         pn = resize(t, pn);
>                         cindex = get_index(pkey, pn);
> --
> 2.1.4
>

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Cong Wang @ 2016-11-29 23:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Dmitry Vyukov, David Miller, Johannes Berg, Florian Westphal,
	Eric Dumazet, Herbert Xu, netdev, LKML, syzkaller
In-Reply-To: <20161129164859.GD26673@madcap2.tricolour.ca>

On Tue, Nov 29, 2016 at 8:48 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2016-11-26 17:11, Cong Wang wrote:
>> It is racy on audit_sock, especially on the netns exit path.
>
> I think that is the only place it is racy.  The other places audit_sock
> is set is when the socket failure has just triggered a reset.
>
> Is there a notifier callback for failed or reaped sockets?

Is NETLINK_URELEASE event what you are looking for?

^ permalink raw reply

* [PATCH for-next 5/6] IB/hns: Fix the bug when free cq
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford
  Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
	lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161129231030.1105600-1-salil.mehta@huawei.com>

From: Shaobo Xu <xushaobo2@huawei.com>

If the resources of cq are freed while executing the user case, hardware
can not been notified in hip06 SoC. Then hardware will hold on when it
writes the cq buffer which has been released.

In order to slove this problem, RoCE driver checks the CQE counter, and
ensure that the outstanding CQE have been written. Then the cq buffer
can be released.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_common.h |    2 +
 drivers/infiniband/hw/hns/hns_roce_cq.c     |   27 ++++++++------
 drivers/infiniband/hw/hns/hns_roce_device.h |    8 ++++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |   53 +++++++++++++++++++++++++++
 4 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index a055632..4af403e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -354,6 +354,8 @@
 
 #define ROCEE_SDB_ISSUE_PTR_REG			0x758
 #define ROCEE_SDB_SEND_PTR_REG			0x75C
+#define ROCEE_CAEP_CQE_WCMD_EMPTY		0x850
+#define ROCEE_SCAEP_WR_CQE_CNT			0x8D0
 #define ROCEE_SDB_INV_CNT_REG			0x9A4
 #define ROCEE_SDB_RETRY_CNT_REG			0x9AC
 #define ROCEE_TSP_BP_ST_REG			0x9EC
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index c9f6c3d..ff9a6a3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -179,8 +179,7 @@ static int hns_roce_hw2sw_cq(struct hns_roce_dev *dev,
 				 HNS_ROCE_CMD_TIMEOUT_MSECS);
 }
 
-static void hns_roce_free_cq(struct hns_roce_dev *hr_dev,
-			     struct hns_roce_cq *hr_cq)
+void hns_roce_free_cq(struct hns_roce_dev *hr_dev, struct hns_roce_cq *hr_cq)
 {
 	struct hns_roce_cq_table *cq_table = &hr_dev->cq_table;
 	struct device *dev = &hr_dev->pdev->dev;
@@ -392,19 +391,25 @@ int hns_roce_ib_destroy_cq(struct ib_cq *ib_cq)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
 	struct hns_roce_cq *hr_cq = to_hr_cq(ib_cq);
+	int ret = 0;
 
-	hns_roce_free_cq(hr_dev, hr_cq);
-	hns_roce_mtt_cleanup(hr_dev, &hr_cq->hr_buf.hr_mtt);
+	if (hr_dev->hw->destroy_cq) {
+		ret = hr_dev->hw->destroy_cq(ib_cq);
+	} else {
+		hns_roce_free_cq(hr_dev, hr_cq);
+		hns_roce_mtt_cleanup(hr_dev, &hr_cq->hr_buf.hr_mtt);
 
-	if (ib_cq->uobject)
-		ib_umem_release(hr_cq->umem);
-	else
-		/* Free the buff of stored cq */
-		hns_roce_ib_free_cq_buf(hr_dev, &hr_cq->hr_buf, ib_cq->cqe);
+		if (ib_cq->uobject)
+			ib_umem_release(hr_cq->umem);
+		else
+			/* Free the buff of stored cq */
+			hns_roce_ib_free_cq_buf(hr_dev, &hr_cq->hr_buf,
+						ib_cq->cqe);
 
-	kfree(hr_cq);
+		kfree(hr_cq);
+	}
 
-	return 0;
+	return ret;
 }
 
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 1050829..d4f0fce 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -56,6 +56,12 @@
 #define HNS_ROCE_MAX_INNER_MTPT_NUM		0x7
 #define HNS_ROCE_MAX_MTPT_PBL_NUM		0x100000
 
+#define HNS_ROCE_EACH_FREE_CQ_WAIT_MSECS	20
+#define HNS_ROCE_MAX_FREE_CQ_WAIT_CNT	\
+	(5000 / HNS_ROCE_EACH_FREE_CQ_WAIT_MSECS)
+#define HNS_ROCE_CQE_WCMD_EMPTY_BIT		0x2
+#define HNS_ROCE_MIN_CQE_CNT			16
+
 #define HNS_ROCE_MAX_IRQ_NUM			34
 
 #define HNS_ROCE_COMP_VEC_NUM			32
@@ -528,6 +534,7 @@ struct hns_roce_hw {
 	int (*req_notify_cq)(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 	int (*poll_cq)(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 	int (*dereg_mr)(struct hns_roce_dev *hr_dev, struct hns_roce_mr *mr);
+	int (*destroy_cq)(struct ib_cq *ibcq);
 	void	*priv;
 };
 
@@ -734,6 +741,7 @@ struct ib_cq *hns_roce_ib_create_cq(struct ib_device *ib_dev,
 				    struct ib_udata *udata);
 
 int hns_roce_ib_destroy_cq(struct ib_cq *ib_cq);
+void hns_roce_free_cq(struct hns_roce_dev *hr_dev, struct hns_roce_cq *hr_cq);
 
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn);
 void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index f67a3bf..b8111b0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -3763,6 +3763,58 @@ int hns_roce_v1_destroy_qp(struct ib_qp *ibqp)
 	return 0;
 }
 
+int hns_roce_v1_destroy_cq(struct ib_cq *ibcq)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(ibcq->device);
+	struct hns_roce_cq *hr_cq = to_hr_cq(ibcq);
+	struct device *dev = &hr_dev->pdev->dev;
+	u32 cqe_cnt_ori;
+	u32 cqe_cnt_cur;
+	u32 cq_buf_size;
+	int wait_time = 0;
+	int ret = 0;
+
+	hns_roce_free_cq(hr_dev, hr_cq);
+
+	/*
+	 * Before freeing cq buffer, we need to ensure that the outstanding CQE
+	 * have been written by checking the CQE counter.
+	 */
+	cqe_cnt_ori = roce_read(hr_dev, ROCEE_SCAEP_WR_CQE_CNT);
+	while (1) {
+		if (roce_read(hr_dev, ROCEE_CAEP_CQE_WCMD_EMPTY) &
+		    HNS_ROCE_CQE_WCMD_EMPTY_BIT)
+			break;
+
+		cqe_cnt_cur = roce_read(hr_dev, ROCEE_SCAEP_WR_CQE_CNT);
+		if ((cqe_cnt_cur - cqe_cnt_ori) >= HNS_ROCE_MIN_CQE_CNT)
+			break;
+
+		msleep(HNS_ROCE_EACH_FREE_CQ_WAIT_MSECS);
+		if (wait_time > HNS_ROCE_MAX_FREE_CQ_WAIT_CNT) {
+			dev_warn(dev, "Destroy cq 0x%lx timeout!\n",
+				hr_cq->cqn);
+			ret = -ETIMEDOUT;
+			break;
+		}
+		wait_time++;
+	}
+
+	hns_roce_mtt_cleanup(hr_dev, &hr_cq->hr_buf.hr_mtt);
+
+	if (ibcq->uobject)
+		ib_umem_release(hr_cq->umem);
+	else {
+		/* Free the buff of stored cq */
+		cq_buf_size = (ibcq->cqe + 1) * hr_dev->caps.cq_entry_sz;
+		hns_roce_buf_free(hr_dev, cq_buf_size, &hr_cq->hr_buf.hr_buf);
+	}
+
+	kfree(hr_cq);
+
+	return ret;
+}
+
 struct hns_roce_v1_priv hr_v1_priv;
 
 struct hns_roce_hw hns_roce_hw_v1 = {
@@ -3784,5 +3836,6 @@ struct hns_roce_hw hns_roce_hw_v1 = {
 	.req_notify_cq = hns_roce_v1_req_notify_cq,
 	.poll_cq = hns_roce_v1_poll_cq,
 	.dereg_mr = hns_roce_v1_dereg_mr,
+	.destroy_cq = hns_roce_v1_destroy_cq,
 	.priv = &hr_v1_priv,
 };
-- 
1.7.9.5

^ permalink raw reply related

* Re: qed, qedi patchset submission
From: Arun Easi @ 2016-11-29 23:11 UTC (permalink / raw)
  To: David Miller, Martin K. Petersen; +Cc: linux-scsi, netdev
In-Reply-To: <yq11sydfwts.fsf@sermon.lab.mkp.net>

Hi David,

As we are preparing to post V3 of the qed (drivers/net/) and qedi 
(drivers/scsi/), I would like to get your opinion on the "qedi" part. As 
qedi (the iSCSI driver) is dependent on the accompanying "qed" changes, 
qedi changes have to follow or be together with qed changes. Martin is ok 
in having you taking the qedi part as well (see below discussion) or wait 
until qed is in, and then take qedi through SCSI tree.

Hi David, Martin,

So far, we have been posting qedi changes split into functional blocks, 
for review, but was not bisectable. With Martin ok to our request to 
squash all patches while committing to tree, we were wondering if we 
should post the qedi patches squashed, with all the Reviewed-by added, or 
continue to post as before?

Regards,
-Arun

On Mon, 14 Nov 2016, 3:47pm, Martin K. Petersen wrote:

> >>>>> "Arun" == Arun Easi <arun.easi@cavium.com> writes:
> 
> Arun,
> 
> Arun> Do you have any preference or thoughts on how the "qed" patches be
> Arun> approached? Just as a reference, our rdma driver "qedr" went
> Arun> through something similar[1], and eventually "qed" patches were
> Arun> taken by David in the net tree and "qedr", in the rdma tree
> Arun> (obviously) by Doug L.
> 
> DaveM can queue the whole series or I can hold the SCSI pieces for
> rc1. Either way works for me.
> 
> However, I would like to do a review of the driver first. I will try to
> get it done this week.
> 
> 

^ permalink raw reply

* [PATCH for-next 6/6] IB/hns: Fix the IB device name
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
	xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161129231030.1105600-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

From: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly fix the name for IB device in order
to match with libhns.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/hns/hns_roce_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 28a8f24..eddb053 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -433,7 +433,7 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 	spin_lock_init(&iboe->lock);
 
 	ib_dev = &hr_dev->ib_dev;
-	strlcpy(ib_dev->name, "hisi_%d", IB_DEVICE_NAME_MAX);
+	strlcpy(ib_dev->name, "hns_%d", IB_DEVICE_NAME_MAX);
 
 	ib_dev->owner			= THIS_MODULE;
 	ib_dev->node_type		= RDMA_NODE_IB_CA;
-- 
1.7.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH for-next 4/6] IB/hns: Delete the redundant memset operation
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
	xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161129231030.1105600-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

It deleted the redundant memset operation because the memory allocated
by ib_alloc_device has been set zero.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/hns/hns_roce_main.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 5e620f9..28a8f24 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -843,9 +843,6 @@ static int hns_roce_probe(struct platform_device *pdev)
 	if (!hr_dev)
 		return -ENOMEM;
 
-	memset((u8 *)hr_dev + sizeof(struct ib_device), 0,
-		sizeof(struct hns_roce_dev) - sizeof(struct ib_device));
-
 	hr_dev->pdev = pdev;
 	platform_set_drvdata(pdev, hr_dev);
 
-- 
1.7.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH for-next 3/6] IB/hns: Fix the bug of setting port mtu
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
	xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161129231030.1105600-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

In hns_roce driver, we need not call iboe_get_mtu to reduce
IB headers from effective IBoE MTU because hr_dev->caps.max_mtu
has already been reduced.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/hns/hns_roce_main.c |   16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 0cedec0..5e620f9 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -72,18 +72,6 @@ static void hns_roce_set_mac(struct hns_roce_dev *hr_dev, u8 port, u8 *addr)
 	hr_dev->hw->set_mac(hr_dev, phy_port, addr);
 }
 
-static void hns_roce_set_mtu(struct hns_roce_dev *hr_dev, u8 port, int mtu)
-{
-	u8 phy_port = hr_dev->iboe.phy_port[port];
-	enum ib_mtu tmp;
-
-	tmp = iboe_get_mtu(mtu);
-	if (!tmp)
-		tmp = IB_MTU_256;
-
-	hr_dev->hw->set_mtu(hr_dev, phy_port, tmp);
-}
-
 static int hns_roce_add_gid(struct ib_device *device, u8 port_num,
 			    unsigned int index, const union ib_gid *gid,
 			    const struct ib_gid_attr *attr, void **context)
@@ -188,8 +176,8 @@ static int hns_roce_setup_mtu_mac(struct hns_roce_dev *hr_dev)
 	u8 i;
 
 	for (i = 0; i < hr_dev->caps.num_ports; i++) {
-		hns_roce_set_mtu(hr_dev, i,
-				 ib_mtu_enum_to_int(hr_dev->caps.max_mtu));
+		hr_dev->hw->set_mtu(hr_dev, hr_dev->iboe.phy_port[i],
+				    hr_dev->caps.max_mtu);
 		hns_roce_set_mac(hr_dev, i, hr_dev->iboe.netdevs[i]->dev_addr);
 	}
 
-- 
1.7.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH for-next 2/6] IB/hns: Fix the bug when free mr
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford
  Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
	lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161129231030.1105600-1-salil.mehta@huawei.com>

From: Shaobo Xu <xushaobo2@huawei.com>

If the resources of mr are freed while executing the user case, hardware
can not been notified in hip06 SoC. Then hardware will hold on when it
reads the payload by the PA which has been released.

In order to slove this problem, RoCE driver creates 8 reserved loopback
QPs to ensure zero wqe when free mr. When the mac address is reset, in
order to avoid loopback failure, we need to release the reserved loopback
QPs and recreate them.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.h    |    5 -
 drivers/infiniband/hw/hns/hns_roce_device.h |   10 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  485 +++++++++++++++++++++++++++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   34 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   |    5 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
 6 files changed, 545 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h
index ed14ad3..f5a9ee2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.h
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -58,11 +58,6 @@ enum {
 	HNS_ROCE_CMD_QUERY_QP		= 0x22,
 };
 
-struct hns_roce_cmd_mailbox {
-	void		       *buf;
-	dma_addr_t		dma;
-};
-
 int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param,
 		      unsigned long in_modifier, u8 op_modifier, u16 op,
 		      unsigned long timeout);
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index e48464d..1050829 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -388,6 +388,11 @@ struct hns_roce_cmdq {
 	u8			toggle;
 };
 
+struct hns_roce_cmd_mailbox {
+	void		       *buf;
+	dma_addr_t		dma;
+};
+
 struct hns_roce_dev;
 
 struct hns_roce_qp {
@@ -522,6 +527,7 @@ struct hns_roce_hw {
 			 struct ib_recv_wr **bad_recv_wr);
 	int (*req_notify_cq)(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 	int (*poll_cq)(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
+	int (*dereg_mr)(struct hns_roce_dev *hr_dev, struct hns_roce_mr *mr);
 	void	*priv;
 };
 
@@ -688,6 +694,10 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				   u64 virt_addr, int access_flags,
 				   struct ib_udata *udata);
 int hns_roce_dereg_mr(struct ib_mr *ibmr);
+int hns_roce_hw2sw_mpt(struct hns_roce_dev *hr_dev,
+		       struct hns_roce_cmd_mailbox *mailbox,
+		       unsigned long mpt_index);
+unsigned long key_to_hw_index(u32 key);
 
 void hns_roce_buf_free(struct hns_roce_dev *hr_dev, u32 size,
 		       struct hns_roce_buf *buf);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index aee1d01..f67a3bf 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -295,6 +295,8 @@ int hns_roce_v1_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		roce_set_field(sq_db.u32_4, SQ_DOORBELL_U32_4_SQ_HEAD_M,
 			       SQ_DOORBELL_U32_4_SQ_HEAD_S,
 			      (qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1)));
+		roce_set_field(sq_db.u32_4, SQ_DOORBELL_U32_4_SL_M,
+			       SQ_DOORBELL_U32_4_SL_S, qp->sl);
 		roce_set_field(sq_db.u32_4, SQ_DOORBELL_U32_4_PORT_M,
 			       SQ_DOORBELL_U32_4_PORT_S, qp->phy_port);
 		roce_set_field(sq_db.u32_8, SQ_DOORBELL_U32_8_QPN_M,
@@ -622,6 +624,213 @@ static int hns_roce_db_ext_init(struct hns_roce_dev *hr_dev, u32 sdb_ext_mod,
 	return ret;
 }
 
+static struct hns_roce_qp *hns_roce_v1_create_lp_qp(struct hns_roce_dev *hr_dev,
+						    struct ib_pd *pd)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct ib_qp_init_attr init_attr;
+	struct ib_qp *qp;
+
+	memset(&init_attr, 0, sizeof(struct ib_qp_init_attr));
+	init_attr.qp_type		= IB_QPT_RC;
+	init_attr.sq_sig_type		= IB_SIGNAL_ALL_WR;
+	init_attr.cap.max_recv_wr	= HNS_ROCE_MIN_WQE_NUM;
+	init_attr.cap.max_send_wr	= HNS_ROCE_MIN_WQE_NUM;
+
+	qp = hns_roce_create_qp(pd, &init_attr, NULL);
+	if (IS_ERR(qp)) {
+		dev_err(dev, "Create loop qp for mr free failed!");
+		return NULL;
+	}
+
+	return to_hr_qp(qp);
+}
+
+static int hns_roce_v1_rsv_lp_qp(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_caps *caps = &hr_dev->caps;
+	struct device *dev = &hr_dev->pdev->dev;
+	struct ib_cq_init_attr cq_init_attr;
+	struct hns_roce_free_mr *free_mr;
+	struct ib_qp_attr attr = { 0 };
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_qp *hr_qp;
+	struct ib_cq *cq;
+	struct ib_pd *pd;
+	u64 subnet_prefix;
+	int attr_mask = 0;
+	int i;
+	int ret;
+	u8 phy_port;
+	u8 sl;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	/* Reserved cq for loop qp */
+	cq_init_attr.cqe		= HNS_ROCE_MIN_WQE_NUM * 2;
+	cq_init_attr.comp_vector	= 0;
+	cq = hns_roce_ib_create_cq(&hr_dev->ib_dev, &cq_init_attr, NULL, NULL);
+	if (IS_ERR(cq)) {
+		dev_err(dev, "Create cq for reseved loop qp failed!");
+		return -ENOMEM;
+	}
+	free_mr->mr_free_cq = to_hr_cq(cq);
+	free_mr->mr_free_cq->ib_cq.device		= &hr_dev->ib_dev;
+	free_mr->mr_free_cq->ib_cq.uobject		= NULL;
+	free_mr->mr_free_cq->ib_cq.comp_handler		= NULL;
+	free_mr->mr_free_cq->ib_cq.event_handler	= NULL;
+	free_mr->mr_free_cq->ib_cq.cq_context		= NULL;
+	atomic_set(&free_mr->mr_free_cq->ib_cq.usecnt, 0);
+
+	pd = hns_roce_alloc_pd(&hr_dev->ib_dev, NULL, NULL);
+	if (IS_ERR(pd)) {
+		dev_err(dev, "Create pd for reseved loop qp failed!");
+		ret = -ENOMEM;
+		goto alloc_pd_failed;
+	}
+	free_mr->mr_free_pd = to_hr_pd(pd);
+	free_mr->mr_free_pd->ibpd.device  = &hr_dev->ib_dev;
+	free_mr->mr_free_pd->ibpd.uobject = NULL;
+	atomic_set(&free_mr->mr_free_pd->ibpd.usecnt, 0);
+
+	attr.qp_access_flags	= IB_ACCESS_REMOTE_WRITE;
+	attr.pkey_index		= 0;
+	attr.min_rnr_timer	= 0;
+	/* Disable read ability */
+	attr.max_dest_rd_atomic = 0;
+	attr.max_rd_atomic	= 0;
+	/* Use arbitrary values as rq_psn and sq_psn */
+	attr.rq_psn		= 0x0808;
+	attr.sq_psn		= 0x0808;
+	attr.retry_cnt		= 7;
+	attr.rnr_retry		= 7;
+	attr.timeout		= 0x12;
+	attr.path_mtu		= IB_MTU_256;
+	attr.ah_attr.ah_flags		= 1;
+	attr.ah_attr.static_rate	= 3;
+	attr.ah_attr.grh.sgid_index	= 0;
+	attr.ah_attr.grh.hop_limit	= 1;
+	attr.ah_attr.grh.flow_label	= 0;
+	attr.ah_attr.grh.traffic_class	= 0;
+
+	subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
+	for (i = 0; i < HNS_ROCE_V1_RESV_QP; i++) {
+		free_mr->mr_free_qp[i] = hns_roce_v1_create_lp_qp(hr_dev, pd);
+		if (IS_ERR(free_mr->mr_free_qp[i])) {
+			dev_err(dev, "Create loop qp failed!\n");
+			goto create_lp_qp_failed;
+		}
+		hr_qp = free_mr->mr_free_qp[i];
+
+		sl = i / caps->num_ports;
+
+		if (caps->num_ports == HNS_ROCE_MAX_PORTS)
+			phy_port = (i >= HNS_ROCE_MAX_PORTS) ? (i - 2) :
+				(i % caps->num_ports);
+		else
+			phy_port = i % caps->num_ports;
+
+		hr_qp->port		= phy_port + 1;
+		hr_qp->phy_port		= phy_port;
+		hr_qp->ibqp.qp_type	= IB_QPT_RC;
+		hr_qp->ibqp.device	= &hr_dev->ib_dev;
+		hr_qp->ibqp.uobject	= NULL;
+		atomic_set(&hr_qp->ibqp.usecnt, 0);
+		hr_qp->ibqp.pd		= pd;
+		hr_qp->ibqp.recv_cq	= cq;
+		hr_qp->ibqp.send_cq	= cq;
+
+		attr.ah_attr.port_num	= phy_port + 1;
+		attr.ah_attr.sl		= sl;
+		attr.port_num		= phy_port + 1;
+
+		attr.dest_qp_num	= hr_qp->qpn;
+		memcpy(attr.ah_attr.dmac, hr_dev->dev_addr[phy_port],
+		       MAC_ADDR_OCTET_NUM);
+
+		memcpy(attr.ah_attr.grh.dgid.raw,
+			&subnet_prefix, sizeof(u64));
+		memcpy(&attr.ah_attr.grh.dgid.raw[8],
+		       hr_dev->dev_addr[phy_port], 3);
+		memcpy(&attr.ah_attr.grh.dgid.raw[13],
+		       hr_dev->dev_addr[phy_port] + 3, 3);
+		attr.ah_attr.grh.dgid.raw[11] = 0xff;
+		attr.ah_attr.grh.dgid.raw[12] = 0xfe;
+		attr.ah_attr.grh.dgid.raw[8] ^= 2;
+
+		attr_mask |= IB_QP_PORT;
+
+		ret = hr_dev->hw->modify_qp(&hr_qp->ibqp, &attr, attr_mask,
+					    IB_QPS_RESET, IB_QPS_INIT);
+		if (ret) {
+			dev_err(dev, "modify qp failed(%d)!\n", ret);
+			goto create_lp_qp_failed;
+		}
+
+		ret = hr_dev->hw->modify_qp(&hr_qp->ibqp, &attr, attr_mask,
+					    IB_QPS_INIT, IB_QPS_RTR);
+		if (ret) {
+			dev_err(dev, "modify qp failed(%d)!\n", ret);
+			goto create_lp_qp_failed;
+		}
+
+		ret = hr_dev->hw->modify_qp(&hr_qp->ibqp, &attr, attr_mask,
+					    IB_QPS_RTR, IB_QPS_RTS);
+		if (ret) {
+			dev_err(dev, "modify qp failed(%d)!\n", ret);
+			goto create_lp_qp_failed;
+		}
+	}
+
+	return 0;
+
+create_lp_qp_failed:
+	for (i -= 1; i >= 0; i--) {
+		hr_qp = free_mr->mr_free_qp[i];
+		if (hns_roce_v1_destroy_qp(&hr_qp->ibqp))
+			dev_err(dev, "Destroy qp %d for mr free failed!\n", i);
+	}
+
+	if (hns_roce_dealloc_pd(pd))
+		dev_err(dev, "Destroy pd for create_lp_qp failed!\n");
+
+alloc_pd_failed:
+	if (hns_roce_ib_destroy_cq(cq))
+		dev_err(dev, "Destroy cq for create_lp_qp failed!\n");
+
+	return -EINVAL;
+}
+
+static void hns_roce_v1_release_lp_qp(struct hns_roce_dev *hr_dev)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_qp *hr_qp;
+	int ret;
+	int i;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	for (i = 0; i < HNS_ROCE_V1_RESV_QP; i++) {
+		hr_qp = free_mr->mr_free_qp[i];
+		ret = hns_roce_v1_destroy_qp(&hr_qp->ibqp);
+		if (ret)
+			dev_err(dev, "Destroy qp %d for mr free failed(%d)!\n",
+				i, ret);
+	}
+
+	ret = hns_roce_ib_destroy_cq(&free_mr->mr_free_cq->ib_cq);
+	if (ret)
+		dev_err(dev, "Destroy cq for mr_free failed(%d)!\n", ret);
+
+	ret = hns_roce_dealloc_pd(&free_mr->mr_free_pd->ibpd);
+	if (ret)
+		dev_err(dev, "Destroy pd for mr_free failed(%d)!\n", ret);
+}
+
 static int hns_roce_db_init(struct hns_roce_dev *hr_dev)
 {
 	struct device *dev = &hr_dev->pdev->dev;
@@ -659,6 +868,223 @@ static int hns_roce_db_init(struct hns_roce_dev *hr_dev)
 	return 0;
 }
 
+void hns_roce_v1_recreate_lp_qp_work_fn(struct work_struct *work)
+{
+	struct hns_roce_recreate_lp_qp_work *lp_qp_work;
+	struct hns_roce_dev *hr_dev;
+
+	lp_qp_work = container_of(work, struct hns_roce_recreate_lp_qp_work,
+				  work);
+	hr_dev = to_hr_dev(lp_qp_work->ib_dev);
+
+	hns_roce_v1_release_lp_qp(hr_dev);
+
+	if (hns_roce_v1_rsv_lp_qp(hr_dev))
+		dev_err(&hr_dev->pdev->dev, "create reserver qp failed\n");
+
+	if (lp_qp_work->comp_flag)
+		complete(lp_qp_work->comp);
+
+	kfree(lp_qp_work);
+}
+
+static int hns_roce_v1_recreate_lp_qp(struct hns_roce_dev *hr_dev)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_recreate_lp_qp_work *lp_qp_work;
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_v1_priv *priv;
+	struct completion comp;
+	unsigned long end =
+	  msecs_to_jiffies(HNS_ROCE_V1_RECREATE_LP_QP_TIMEOUT_MSECS) + jiffies;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	lp_qp_work = kzalloc(sizeof(struct hns_roce_recreate_lp_qp_work),
+			     GFP_KERNEL);
+
+	INIT_WORK(&(lp_qp_work->work), hns_roce_v1_recreate_lp_qp_work_fn);
+
+	lp_qp_work->ib_dev = &(hr_dev->ib_dev);
+	lp_qp_work->comp = &comp;
+	lp_qp_work->comp_flag = 1;
+
+	init_completion(lp_qp_work->comp);
+
+	queue_work(free_mr->free_mr_wq, &(lp_qp_work->work));
+
+	while (time_before_eq(jiffies, end)) {
+		if (try_wait_for_completion(&comp))
+			return 0;
+		msleep(HNS_ROCE_V1_RECREATE_LP_QP_WAIT_VALUE);
+	}
+
+	lp_qp_work->comp_flag = 0;
+	if (try_wait_for_completion(&comp))
+		return 0;
+
+	dev_warn(dev, "recreate lp qp failed 20s timeout and return failed!\n");
+	return -ETIMEDOUT;
+}
+
+static int hns_roce_v1_send_lp_wqe(struct hns_roce_qp *hr_qp)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(hr_qp->ibqp.device);
+	struct device *dev = &hr_dev->pdev->dev;
+	struct ib_send_wr send_wr, *bad_wr;
+	int ret;
+
+	memset(&send_wr, 0, sizeof(send_wr));
+	send_wr.next	= NULL;
+	send_wr.num_sge	= 0;
+	send_wr.send_flags = 0;
+	send_wr.sg_list	= NULL;
+	send_wr.wr_id	= (unsigned long long)&send_wr;
+	send_wr.opcode	= IB_WR_RDMA_WRITE;
+
+	ret = hns_roce_v1_post_send(&hr_qp->ibqp, &send_wr, &bad_wr);
+	if (ret) {
+		dev_err(dev, "Post write wqe for mr free failed(%d)!", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void hns_roce_v1_mr_free_work_fn(struct work_struct *work)
+{
+	struct hns_roce_mr_free_work *mr_work;
+	struct ib_wc wc[HNS_ROCE_V1_RESV_QP];
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_cq *mr_free_cq;
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_dev *hr_dev;
+	struct hns_roce_mr *hr_mr;
+	struct hns_roce_qp *hr_qp;
+	struct device *dev;
+	unsigned long end =
+		msecs_to_jiffies(HNS_ROCE_V1_FREE_MR_TIMEOUT_MSECS) + jiffies;
+	int i;
+	int ret;
+	int ne;
+
+	mr_work = container_of(work, struct hns_roce_mr_free_work, work);
+	hr_mr = (struct hns_roce_mr *)mr_work->mr;
+	hr_dev = to_hr_dev(mr_work->ib_dev);
+	dev = &hr_dev->pdev->dev;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+	mr_free_cq = free_mr->mr_free_cq;
+
+	for (i = 0; i < HNS_ROCE_V1_RESV_QP; i++) {
+		hr_qp = free_mr->mr_free_qp[i];
+		ret = hns_roce_v1_send_lp_wqe(hr_qp);
+		if (ret) {
+			dev_err(dev,
+			     "Send wqe (qp:0x%lx) for mr free failed(%d)!\n",
+			     hr_qp->qpn, ret);
+			goto free_work;
+		}
+	}
+
+	ne = HNS_ROCE_V1_RESV_QP;
+	do {
+		ret = hns_roce_v1_poll_cq(&mr_free_cq->ib_cq, ne, wc);
+		if (ret < 0) {
+			dev_err(dev,
+			   "(qp:0x%lx) starts, Poll cqe failed(%d) for mr 0x%x free! Remain %d cqe\n",
+			   hr_qp->qpn, ret, hr_mr->key, ne);
+			goto free_work;
+		}
+		ne -= ret;
+		msleep(HNS_ROCE_V1_FREE_MR_WAIT_VALUE);
+	} while (ne && time_before_eq(jiffies, end));
+
+	if (ne != 0)
+		dev_err(dev,
+			"Poll cqe for mr 0x%x free timeout! Remain %d cqe\n",
+			hr_mr->key, ne);
+
+free_work:
+	if (mr_work->comp_flag)
+		complete(mr_work->comp);
+	kfree(mr_work);
+}
+
+int hns_roce_v1_dereg_mr(struct hns_roce_dev *hr_dev, struct hns_roce_mr *mr)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_mr_free_work *mr_work;
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_v1_priv *priv;
+	struct completion comp;
+	unsigned long end =
+		msecs_to_jiffies(HNS_ROCE_V1_FREE_MR_TIMEOUT_MSECS) + jiffies;
+	unsigned long start = jiffies;
+	int npages;
+	int ret = 0;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	if (mr->enabled) {
+		if (hns_roce_hw2sw_mpt(hr_dev, NULL, key_to_hw_index(mr->key)
+				       & (hr_dev->caps.num_mtpts - 1)))
+			dev_warn(dev, "HW2SW_MPT failed!\n");
+	}
+
+	mr_work = kzalloc(sizeof(*mr_work), GFP_KERNEL);
+	if (!mr_work) {
+		ret = -ENOMEM;
+		goto free_mr;
+	}
+
+	INIT_WORK(&(mr_work->work), hns_roce_v1_mr_free_work_fn);
+
+	mr_work->ib_dev = &(hr_dev->ib_dev);
+	mr_work->comp = &comp;
+	mr_work->comp_flag = 1;
+	mr_work->mr = (void *)mr;
+	init_completion(mr_work->comp);
+
+	queue_work(free_mr->free_mr_wq, &(mr_work->work));
+
+	while (time_before_eq(jiffies, end)) {
+		if (try_wait_for_completion(&comp))
+			goto free_mr;
+		msleep(HNS_ROCE_V1_FREE_MR_WAIT_VALUE);
+	}
+
+	mr_work->comp_flag = 0;
+	if (try_wait_for_completion(&comp))
+		goto free_mr;
+
+	dev_warn(dev, "Free mr work 0x%x over 50s and failed!\n", mr->key);
+	ret = -ETIMEDOUT;
+
+free_mr:
+	dev_dbg(dev, "Free mr 0x%x use 0x%x us.\n",
+		mr->key, jiffies_to_usecs(jiffies) - jiffies_to_usecs(start));
+
+	if (mr->size != ~0ULL) {
+		npages = ib_umem_page_count(mr->umem);
+		dma_free_coherent(dev, npages * 8, mr->pbl_buf,
+				  mr->pbl_dma_addr);
+	}
+
+	hns_roce_bitmap_free(&hr_dev->mr_table.mtpt_bitmap,
+			     key_to_hw_index(mr->key), 0);
+
+	if (mr->umem)
+		ib_umem_release(mr->umem);
+
+	kfree(mr);
+
+	return ret;
+}
+
 static void hns_roce_db_free(struct hns_roce_dev *hr_dev)
 {
 	struct device *dev = &hr_dev->pdev->dev;
@@ -899,6 +1325,46 @@ static void hns_roce_tptr_free(struct hns_roce_dev *hr_dev)
 			  tptr_buf->buf, tptr_buf->map);
 }
 
+static int hns_roce_free_mr_init(struct hns_roce_dev *hr_dev)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_v1_priv *priv;
+	int ret = 0;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	free_mr->free_mr_wq = create_singlethread_workqueue("hns_roce_free_mr");
+	if (!free_mr->free_mr_wq) {
+		dev_err(dev, "Create free mr workqueue failed!\n");
+		return -ENOMEM;
+	}
+
+	ret = hns_roce_v1_rsv_lp_qp(hr_dev);
+	if (ret) {
+		dev_err(dev, "Reserved loop qp failed(%d)!\n", ret);
+		flush_workqueue(free_mr->free_mr_wq);
+		destroy_workqueue(free_mr->free_mr_wq);
+	}
+
+	return ret;
+}
+
+static void hns_roce_free_mr_free(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_free_mr *free_mr;
+	struct hns_roce_v1_priv *priv;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	free_mr = &priv->free_mr;
+
+	flush_workqueue(free_mr->free_mr_wq);
+	destroy_workqueue(free_mr->free_mr_wq);
+
+	hns_roce_v1_release_lp_qp(hr_dev);
+}
+
 /**
  * hns_roce_v1_reset - reset RoCE
  * @hr_dev: RoCE device struct pointer
@@ -1100,10 +1566,19 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
 		goto error_failed_des_qp_init;
 	}
 
+	ret = hns_roce_free_mr_init(hr_dev);
+	if (ret) {
+		dev_err(dev, "free mr init failed!\n");
+		goto error_failed_free_mr_init;
+	}
+
 	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_UP);
 
 	return 0;
 
+error_failed_free_mr_init:
+	hns_roce_des_qp_free(hr_dev);
+
 error_failed_des_qp_init:
 	hns_roce_tptr_free(hr_dev);
 
@@ -1121,6 +1596,7 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
 void hns_roce_v1_exit(struct hns_roce_dev *hr_dev)
 {
 	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
+	hns_roce_free_mr_free(hr_dev);
 	hns_roce_des_qp_free(hr_dev);
 	hns_roce_tptr_free(hr_dev);
 	hns_roce_bt_free(hr_dev);
@@ -1161,6 +1637,14 @@ void hns_roce_v1_set_mac(struct hns_roce_dev *hr_dev, u8 phy_port, u8 *addr)
 	u32 *p;
 	u32 val;
 
+	/*
+	 * When mac changed, loopback may fail
+	 * because of smac not equal to dmac.
+	 * We Need to release and create reserved qp again.
+	 */
+	if (hr_dev->hw->dereg_mr && hns_roce_v1_recreate_lp_qp(hr_dev))
+		dev_warn(&hr_dev->pdev->dev, "recreate lp qp timeout!\n");
+
 	p = (u32 *)(&addr[0]);
 	reg_smac_l = *p;
 	roce_raw_write(reg_smac_l, hr_dev->reg_base + ROCEE_SMAC_L_0_REG +
@@ -3299,5 +3783,6 @@ struct hns_roce_hw hns_roce_hw_v1 = {
 	.post_recv = hns_roce_v1_post_recv,
 	.req_notify_cq = hns_roce_v1_req_notify_cq,
 	.poll_cq = hns_roce_v1_poll_cq,
+	.dereg_mr = hns_roce_v1_dereg_mr,
 	.priv = &hr_v1_priv,
 };
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index 1d250c0..b213b5e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -58,6 +58,7 @@
 #define HNS_ROCE_V1_PHY_UAR_NUM				8
 
 #define HNS_ROCE_V1_GID_NUM				16
+#define HNS_ROCE_V1_RESV_QP				8
 
 #define HNS_ROCE_V1_NUM_COMP_EQE			0x8000
 #define HNS_ROCE_V1_NUM_ASYNC_EQE			0x400
@@ -107,6 +108,10 @@
 #define HNS_ROCE_V1_DB_STAGE2				2
 #define HNS_ROCE_V1_CHECK_DB_TIMEOUT_MSECS		10000
 #define HNS_ROCE_V1_CHECK_DB_SLEEP_MSECS		20
+#define HNS_ROCE_V1_FREE_MR_TIMEOUT_MSECS		50000
+#define HNS_ROCE_V1_RECREATE_LP_QP_TIMEOUT_MSECS	10000
+#define HNS_ROCE_V1_FREE_MR_WAIT_VALUE			5
+#define HNS_ROCE_V1_RECREATE_LP_QP_WAIT_VALUE		20
 
 #define HNS_ROCE_BT_RSV_BUF_SIZE			(1 << 17)
 
@@ -969,6 +974,10 @@ struct hns_roce_sq_db {
 #define SQ_DOORBELL_U32_4_SQ_HEAD_M   \
 	(((1UL << 15) - 1) << SQ_DOORBELL_U32_4_SQ_HEAD_S)
 
+#define SQ_DOORBELL_U32_4_SL_S 16
+#define SQ_DOORBELL_U32_4_SL_M   \
+	(((1UL << 2) - 1) << SQ_DOORBELL_U32_4_SL_S)
+
 #define SQ_DOORBELL_U32_4_PORT_S 18
 #define SQ_DOORBELL_U32_4_PORT_M  (((1UL << 3) - 1) << SQ_DOORBELL_U32_4_PORT_S)
 
@@ -1015,14 +1024,39 @@ struct hns_roce_des_qp {
 	int	requeue_flag;
 };
 
+struct hns_roce_mr_free_work {
+	struct	work_struct work;
+	struct	ib_device *ib_dev;
+	struct	completion *comp;
+	int	comp_flag;
+	void	*mr;
+};
+
+struct hns_roce_recreate_lp_qp_work {
+	struct	work_struct work;
+	struct	ib_device *ib_dev;
+	struct	completion *comp;
+	int	comp_flag;
+};
+
+struct hns_roce_free_mr {
+	struct workqueue_struct *free_mr_wq;
+	struct hns_roce_qp *mr_free_qp[HNS_ROCE_V1_RESV_QP];
+	struct hns_roce_cq *mr_free_cq;
+	struct hns_roce_pd *mr_free_pd;
+};
+
 struct hns_roce_v1_priv {
 	struct hns_roce_db_table  db_table;
 	struct hns_roce_raq_table raq_table;
 	struct hns_roce_bt_table  bt_table;
 	struct hns_roce_tptr_table tptr_table;
 	struct hns_roce_des_qp des_qp;
+	struct hns_roce_free_mr free_mr;
 };
 
 int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset);
+int hns_roce_v1_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
+int hns_roce_v1_destroy_qp(struct ib_qp *ibqp);
 
 #endif
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 914d0ac..0cedec0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -129,7 +129,6 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
 {
 	struct device *dev = &hr_dev->pdev->dev;
 	struct net_device *netdev;
-	unsigned long flags;
 
 	netdev = hr_dev->iboe.netdevs[port];
 	if (!netdev) {
@@ -137,7 +136,7 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
 		return -ENODEV;
 	}
 
-	spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+	spin_lock_bh(&hr_dev->iboe.lock);
 
 	switch (event) {
 	case NETDEV_UP:
@@ -156,7 +155,7 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
 		break;
 	}
 
-	spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+	spin_unlock_bh(&hr_dev->iboe.lock);
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 9b8a1ad..4139abe 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -42,7 +42,7 @@ static u32 hw_index_to_key(unsigned long ind)
 	return (u32)(ind >> 24) | (ind << 8);
 }
 
-static unsigned long key_to_hw_index(u32 key)
+unsigned long key_to_hw_index(u32 key)
 {
 	return (key << 24) | (key >> 8);
 }
@@ -56,7 +56,7 @@ static int hns_roce_sw2hw_mpt(struct hns_roce_dev *hr_dev,
 				 HNS_ROCE_CMD_TIMEOUT_MSECS);
 }
 
-static int hns_roce_hw2sw_mpt(struct hns_roce_dev *hr_dev,
+int hns_roce_hw2sw_mpt(struct hns_roce_dev *hr_dev,
 			      struct hns_roce_cmd_mailbox *mailbox,
 			      unsigned long mpt_index)
 {
@@ -607,13 +607,20 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 int hns_roce_dereg_mr(struct ib_mr *ibmr)
 {
+	struct hns_roce_dev *hr_dev = to_hr_dev(ibmr->device);
 	struct hns_roce_mr *mr = to_hr_mr(ibmr);
+	int ret = 0;
 
-	hns_roce_mr_free(to_hr_dev(ibmr->device), mr);
-	if (mr->umem)
-		ib_umem_release(mr->umem);
+	if (hr_dev->hw->dereg_mr) {
+		ret = hr_dev->hw->dereg_mr(hr_dev, mr);
+	} else {
+		hns_roce_mr_free(hr_dev, mr);
 
-	kfree(mr);
+		if (mr->umem)
+			ib_umem_release(mr->umem);
 
-	return 0;
+		kfree(mr);
+	}
+
+	return ret;
 }
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH for-next 1/6] IB/hns: Fix the bug when destroy qp
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
	xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, Dongdong Huang
In-Reply-To: <20161129231030.1105600-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

If send queue is still working when qp is in reset state by modify qp
in destroy qp function, hardware will hold on and don't work in hip06
SoC. In current codes, RoCE driver check hardware pointer of sending and
hardware pointer of processing to ensure that hardware has processed all
the dbs of this qp. But while the environment of wire becomes not good,
The checking time maybe too long.

In order to solve this problem, RoCE driver created a workqueue at probe
function. If there is a timeout when checking the status of qp, driver
initialize work entry and push it into the workqueue, Work function will
finish checking and release the related resources later.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Dongdong Huang(Donald) <hdd.huang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/hns/hns_roce_common.h |   40 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  435 +++++++++++++++++++++------
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   23 ++
 3 files changed, 402 insertions(+), 96 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 0dcb620..a055632 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -57,6 +57,32 @@
 #define roce_set_bit(origin, shift, val) \
 	roce_set_field((origin), (1ul << (shift)), (shift), (val))
 
+/*
+ * roce_hw_index_cmp_lt - Compare two hardware index values in hisilicon
+ *                        SOC, check if a is less than b.
+ * @a: hardware index value
+ * @b: hardware index value
+ * @bits: the number of bits of a and b, range: 0~31.
+ *
+ * Hardware index increases continuously till max value, and then restart
+ * from zero, again and again. Because the bits of reg field is often
+ * limited, the reg field can only hold the low bits of the hardware index
+ * in hisilicon SOC.
+ * In some scenes we need to compare two values(a,b) getted from two reg
+ * fields in this driver, for example:
+ * If a equals 0xfffe, b equals 0x1 and bits equals 16, we think b has
+ * incresed from 0xffff to 0x1 and a is less than b.
+ * If a equals 0xfffe, b equals 0x0xf001 and bits equals 16, we think a
+ * is bigger than b.
+ *
+ * Return true on a less than b, otherwise false.
+ */
+#define roce_hw_index_mask(bits)	((1ul << (bits)) - 1)
+#define roce_hw_index_shift(bits)	(32 - (bits))
+#define roce_hw_index_cmp_lt(a, b, bits) \
+	((int)((((a) - (b)) & roce_hw_index_mask(bits)) << \
+		roce_hw_index_shift(bits)) < 0)
+
 #define ROCEE_GLB_CFG_ROCEE_DB_SQ_MODE_S 3
 #define ROCEE_GLB_CFG_ROCEE_DB_OTH_MODE_S 4
 
@@ -245,10 +271,22 @@
 #define ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M   \
 	(((1UL << 28) - 1) << ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S)
 
+#define ROCEE_SDB_PTR_CMP_BITS 28
+
 #define ROCEE_SDB_INV_CNT_SDB_INV_CNT_S 0
 #define ROCEE_SDB_INV_CNT_SDB_INV_CNT_M   \
 	(((1UL << 16) - 1) << ROCEE_SDB_INV_CNT_SDB_INV_CNT_S)
 
+#define ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_S	0
+#define ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_M	\
+	(((1UL << 16) - 1) << ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_S)
+
+#define ROCEE_SDB_CNT_CMP_BITS 16
+
+#define ROCEE_TSP_BP_ST_QH_FIFO_ENTRY_S	20
+
+#define ROCEE_CNT_CLR_CE_CNT_CLR_CE_S 0
+
 /*************ROCEE_REG DEFINITION****************/
 #define ROCEE_VENDOR_ID_REG			0x0
 #define ROCEE_VENDOR_PART_ID_REG		0x4
@@ -317,6 +355,8 @@
 #define ROCEE_SDB_ISSUE_PTR_REG			0x758
 #define ROCEE_SDB_SEND_PTR_REG			0x75C
 #define ROCEE_SDB_INV_CNT_REG			0x9A4
+#define ROCEE_SDB_RETRY_CNT_REG			0x9AC
+#define ROCEE_TSP_BP_ST_REG			0x9EC
 #define ROCEE_ECC_UCERR_ALM0_REG		0xB34
 #define ROCEE_ECC_CERR_ALM0_REG			0xB40
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 125ab90..aee1d01 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -948,6 +948,38 @@ int hns_roce_v1_reset(struct hns_roce_dev *hr_dev, bool dereset)
 	return ret;
 }
 
+static int hns_roce_des_qp_init(struct hns_roce_dev *hr_dev)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_des_qp *des_qp;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	des_qp = &priv->des_qp;
+
+	des_qp->requeue_flag = 1;
+	des_qp->qp_wq = create_singlethread_workqueue("hns_roce_destroy_qp");
+	if (!des_qp->qp_wq) {
+		dev_err(dev, "Create destroy qp workqueue failed!\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hns_roce_des_qp_free(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_des_qp *des_qp;
+
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	des_qp = &priv->des_qp;
+
+	des_qp->requeue_flag = 0;
+	flush_workqueue(des_qp->qp_wq);
+	destroy_workqueue(des_qp->qp_wq);
+}
+
 void hns_roce_v1_profile(struct hns_roce_dev *hr_dev)
 {
 	int i = 0;
@@ -1050,8 +1082,6 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
 		goto error_failed_raq_init;
 	}
 
-	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_UP);
-
 	ret = hns_roce_bt_init(hr_dev);
 	if (ret) {
 		dev_err(dev, "bt init failed!\n");
@@ -1064,13 +1094,23 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
 		goto error_failed_tptr_init;
 	}
 
+	ret = hns_roce_des_qp_init(hr_dev);
+	if (ret) {
+		dev_err(dev, "des qp init failed!\n");
+		goto error_failed_des_qp_init;
+	}
+
+	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_UP);
+
 	return 0;
 
+error_failed_des_qp_init:
+	hns_roce_tptr_free(hr_dev);
+
 error_failed_tptr_init:
 	hns_roce_bt_free(hr_dev);
 
 error_failed_bt_init:
-	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
 	hns_roce_raq_free(hr_dev);
 
 error_failed_raq_init:
@@ -1080,9 +1120,10 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
 
 void hns_roce_v1_exit(struct hns_roce_dev *hr_dev)
 {
+	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
+	hns_roce_des_qp_free(hr_dev);
 	hns_roce_tptr_free(hr_dev);
 	hns_roce_bt_free(hr_dev);
-	hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
 	hns_roce_raq_free(hr_dev);
 	hns_roce_db_free(hr_dev);
 }
@@ -2906,132 +2947,334 @@ int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
 		hns_roce_v1_q_sqp(ibqp, qp_attr, qp_attr_mask, qp_init_attr) :
 		hns_roce_v1_q_qp(ibqp, qp_attr, qp_attr_mask, qp_init_attr);
 }
-static void hns_roce_v1_destroy_qp_common(struct hns_roce_dev *hr_dev,
-					  struct hns_roce_qp *hr_qp,
-					  int is_user)
+
+static int check_qp_db_process_status(struct hns_roce_dev *hr_dev,
+				      struct hns_roce_qp *hr_qp,
+				      u32 sdb_issue_ptr,
+				      u32 *sdb_inv_cnt,
+				      u32 *wait_stage)
 {
-	u32 sdbinvcnt;
-	unsigned long end = 0;
-	u32 sdbinvcnt_val;
-	u32 sdbsendptr_val;
-	u32 sdbisusepr_val;
-	struct hns_roce_cq *send_cq, *recv_cq;
 	struct device *dev = &hr_dev->pdev->dev;
+	u32 sdb_retry_cnt, old_retry;
+	u32 sdb_send_ptr, old_send;
+	u32 success_flags = 0;
+	u32 cur_cnt, old_cnt;
+	unsigned long end;
+	u32 send_ptr;
+	u32 inv_cnt;
+	u32 tsp_st;
+
+	if (*wait_stage > HNS_ROCE_V1_DB_STAGE2 ||
+	    *wait_stage < HNS_ROCE_V1_DB_STAGE1) {
+		dev_err(dev, "QP(0x%lx) db status wait stage(%d) error!\n",
+			hr_qp->qpn, *wait_stage);
+		return -EINVAL;
+	}
 
-	if (hr_qp->ibqp.qp_type == IB_QPT_RC) {
-		if (hr_qp->state != IB_QPS_RESET) {
-			/*
-			 * Set qp to ERR,
-			 * waiting for hw complete processing all dbs
-			 */
-			if (hns_roce_v1_qp_modify(hr_dev, NULL,
-					to_hns_roce_state(
-						(enum ib_qp_state)hr_qp->state),
-						HNS_ROCE_QP_STATE_ERR, NULL,
-						hr_qp))
-				dev_err(dev, "modify QP %06lx to ERR failed.\n",
-					hr_qp->qpn);
-
-			/* Record issued doorbell */
-			sdbisusepr_val = roce_read(hr_dev,
-					 ROCEE_SDB_ISSUE_PTR_REG);
-			/*
-			 * Query db process status,
-			 * until hw process completely
-			 */
-			end = msecs_to_jiffies(
-			      HNS_ROCE_QP_DESTROY_TIMEOUT_MSECS) + jiffies;
-			do {
-				sdbsendptr_val = roce_read(hr_dev,
+	/* Calculate the total timeout for the entire verification process */
+	end = msecs_to_jiffies(HNS_ROCE_V1_CHECK_DB_TIMEOUT_MSECS) + jiffies;
+
+	if (*wait_stage == HNS_ROCE_V1_DB_STAGE1) {
+		/* Query db process status, until hw process completely */
+		sdb_send_ptr = roce_read(hr_dev, ROCEE_SDB_SEND_PTR_REG);
+		while (roce_hw_index_cmp_lt(sdb_send_ptr, sdb_issue_ptr,
+					    ROCEE_SDB_PTR_CMP_BITS)) {
+			if (!time_before(jiffies, end)) {
+				dev_dbg(dev, "QP(0x%lx) db process stage1 timeout. issue 0x%x send 0x%x.\n",
+					hr_qp->qpn, sdb_issue_ptr,
+					sdb_send_ptr);
+				return 0;
+			}
+
+			msleep(HNS_ROCE_V1_CHECK_DB_SLEEP_MSECS);
+			sdb_send_ptr = roce_read(hr_dev,
 						 ROCEE_SDB_SEND_PTR_REG);
-				if (!time_before(jiffies, end)) {
-					dev_err(dev, "destroy qp(0x%lx) timeout!!!",
-						hr_qp->qpn);
-					break;
-				}
-			} while ((short)(roce_get_field(sdbsendptr_val,
-					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
-					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S) -
-				roce_get_field(sdbisusepr_val,
-					ROCEE_SDB_ISSUE_PTR_SDB_ISSUE_PTR_M,
-					ROCEE_SDB_ISSUE_PTR_SDB_ISSUE_PTR_S)
-				) < 0);
+		}
 
-			/* Get list pointer */
-			sdbinvcnt = roce_read(hr_dev, ROCEE_SDB_INV_CNT_REG);
+		if (roce_get_field(sdb_issue_ptr,
+				   ROCEE_SDB_ISSUE_PTR_SDB_ISSUE_PTR_M,
+				   ROCEE_SDB_ISSUE_PTR_SDB_ISSUE_PTR_S) ==
+		    roce_get_field(sdb_send_ptr,
+				   ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+				   ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S)) {
+			old_send = roce_read(hr_dev, ROCEE_SDB_SEND_PTR_REG);
+			old_retry = roce_read(hr_dev, ROCEE_SDB_RETRY_CNT_REG);
 
-			/* Query db's list status, until hw reversal */
 			do {
-				sdbinvcnt_val = roce_read(hr_dev,
-						ROCEE_SDB_INV_CNT_REG);
+				tsp_st = roce_read(hr_dev, ROCEE_TSP_BP_ST_REG);
+				if (roce_get_bit(tsp_st,
+					ROCEE_TSP_BP_ST_QH_FIFO_ENTRY_S) == 1) {
+					*wait_stage = HNS_ROCE_V1_DB_WAIT_OK;
+					return 0;
+				}
+
 				if (!time_before(jiffies, end)) {
-					dev_err(dev, "destroy qp(0x%lx) timeout!!!",
-						hr_qp->qpn);
-					dev_err(dev, "SdbInvCnt = 0x%x\n",
-						sdbinvcnt_val);
-					break;
+					dev_dbg(dev, "QP(0x%lx) db process stage1 timeout when send ptr equals issue ptr.\n"
+						     "issue 0x%x send 0x%x.\n",
+						hr_qp->qpn, sdb_issue_ptr,
+						sdb_send_ptr);
+					return 0;
 				}
-			} while ((short)(roce_get_field(sdbinvcnt_val,
-				  ROCEE_SDB_INV_CNT_SDB_INV_CNT_M,
-				  ROCEE_SDB_INV_CNT_SDB_INV_CNT_S) -
-				  (sdbinvcnt + SDB_INV_CNT_OFFSET)) < 0);
-
-			/* Modify qp to reset before destroying qp */
-			if (hns_roce_v1_qp_modify(hr_dev, NULL,
-					to_hns_roce_state(
-					(enum ib_qp_state)hr_qp->state),
-					HNS_ROCE_QP_STATE_RST, NULL, hr_qp))
-				dev_err(dev, "modify QP %06lx to RESET failed.\n",
-					hr_qp->qpn);
+
+				msleep(HNS_ROCE_V1_CHECK_DB_SLEEP_MSECS);
+
+				sdb_send_ptr = roce_read(hr_dev,
+							ROCEE_SDB_SEND_PTR_REG);
+				sdb_retry_cnt =	roce_read(hr_dev,
+						       ROCEE_SDB_RETRY_CNT_REG);
+				cur_cnt = roce_get_field(sdb_send_ptr,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S) +
+					roce_get_field(sdb_retry_cnt,
+					ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_M,
+					ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_S);
+				if (!roce_get_bit(tsp_st,
+					ROCEE_CNT_CLR_CE_CNT_CLR_CE_S)) {
+					old_cnt = roce_get_field(old_send,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S) +
+					roce_get_field(old_retry,
+					ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_M,
+					ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_S);
+					if (cur_cnt - old_cnt > SDB_ST_CMP_VAL)
+						success_flags = 1;
+				} else {
+					old_cnt = roce_get_field(old_send,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+					ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S);
+					if (cur_cnt - old_cnt > SDB_ST_CMP_VAL)
+						success_flags = 1;
+					else {
+					    send_ptr = roce_get_field(old_send,
+					    ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+					    ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S) +
+					    roce_get_field(sdb_retry_cnt,
+					    ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_M,
+					    ROCEE_SDB_RETRY_CNT_SDB_RETRY_CT_S);
+					    roce_set_field(old_send,
+					    ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_M,
+					    ROCEE_SDB_SEND_PTR_SDB_SEND_PTR_S,
+						send_ptr);
+					}
+				}
+			} while (!success_flags);
+		}
+
+		*wait_stage = HNS_ROCE_V1_DB_STAGE2;
+
+		/* Get list pointer */
+		*sdb_inv_cnt = roce_read(hr_dev, ROCEE_SDB_INV_CNT_REG);
+		dev_dbg(dev, "QP(0x%lx) db process stage2. inv cnt = 0x%x.\n",
+			hr_qp->qpn, *sdb_inv_cnt);
+	}
+
+	if (*wait_stage == HNS_ROCE_V1_DB_STAGE2) {
+		/* Query db's list status, until hw reversal */
+		inv_cnt = roce_read(hr_dev, ROCEE_SDB_INV_CNT_REG);
+		while (roce_hw_index_cmp_lt(inv_cnt,
+					    *sdb_inv_cnt + SDB_INV_CNT_OFFSET,
+					    ROCEE_SDB_CNT_CMP_BITS)) {
+			if (!time_before(jiffies, end)) {
+				dev_dbg(dev, "QP(0x%lx) db process stage2 timeout. inv cnt 0x%x.\n",
+					hr_qp->qpn, inv_cnt);
+				return 0;
+			}
+
+			msleep(HNS_ROCE_V1_CHECK_DB_SLEEP_MSECS);
+			inv_cnt = roce_read(hr_dev, ROCEE_SDB_INV_CNT_REG);
 		}
+
+		*wait_stage = HNS_ROCE_V1_DB_WAIT_OK;
+	}
+
+	return 0;
+}
+
+static int check_qp_reset_state(struct hns_roce_dev *hr_dev,
+				struct hns_roce_qp *hr_qp,
+				struct hns_roce_qp_work *qp_work_entry,
+				int *is_timeout)
+{
+	struct device *dev = &hr_dev->pdev->dev;
+	u32 sdb_issue_ptr;
+	int ret;
+
+	if (hr_qp->state != IB_QPS_RESET) {
+		/* Set qp to ERR, waiting for hw complete processing all dbs */
+		ret = hns_roce_v1_modify_qp(&hr_qp->ibqp, NULL, 0, hr_qp->state,
+					    IB_QPS_ERR);
+		if (ret) {
+			dev_err(dev, "Modify QP(0x%lx) to ERR failed!\n",
+				hr_qp->qpn);
+			return ret;
+		}
+
+		/* Record issued doorbell */
+		sdb_issue_ptr = roce_read(hr_dev, ROCEE_SDB_ISSUE_PTR_REG);
+		qp_work_entry->sdb_issue_ptr = sdb_issue_ptr;
+		qp_work_entry->db_wait_stage = HNS_ROCE_V1_DB_STAGE1;
+
+		/* Query db process status, until hw process completely */
+		ret = check_qp_db_process_status(hr_dev, hr_qp, sdb_issue_ptr,
+						 &qp_work_entry->sdb_inv_cnt,
+						 &qp_work_entry->db_wait_stage);
+		if (ret) {
+			dev_err(dev, "Check QP(0x%lx) db process status failed!\n",
+				hr_qp->qpn);
+			return ret;
+		}
+
+		if (qp_work_entry->db_wait_stage != HNS_ROCE_V1_DB_WAIT_OK) {
+			qp_work_entry->sche_cnt = 0;
+			*is_timeout = 1;
+			return 0;
+		}
+
+		/* Modify qp to reset before destroying qp */
+		ret = hns_roce_v1_modify_qp(&hr_qp->ibqp, NULL, 0, hr_qp->state,
+					    IB_QPS_RESET);
+		if (ret) {
+			dev_err(dev, "Modify QP(0x%lx) to RST failed!\n",
+				hr_qp->qpn);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void hns_roce_v1_destroy_qp_work_fn(struct work_struct *work)
+{
+	struct hns_roce_qp_work *qp_work_entry;
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_dev *hr_dev;
+	struct hns_roce_qp *hr_qp;
+	struct device *dev;
+	int ret;
+
+	qp_work_entry = container_of(work, struct hns_roce_qp_work, work);
+	hr_dev = to_hr_dev(qp_work_entry->ib_dev);
+	dev = &hr_dev->pdev->dev;
+	priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+	hr_qp = qp_work_entry->qp;
+
+	dev_dbg(dev, "Schedule destroy QP(0x%lx) work.\n", hr_qp->qpn);
+
+	qp_work_entry->sche_cnt++;
+
+	/* Query db process status, until hw process completely */
+	ret = check_qp_db_process_status(hr_dev, hr_qp,
+					 qp_work_entry->sdb_issue_ptr,
+					 &qp_work_entry->sdb_inv_cnt,
+					 &qp_work_entry->db_wait_stage);
+	if (ret) {
+		dev_err(dev, "Check QP(0x%lx) db process status failed!\n",
+			hr_qp->qpn);
+		return;
+	}
+
+	if (qp_work_entry->db_wait_stage != HNS_ROCE_V1_DB_WAIT_OK &&
+	    priv->des_qp.requeue_flag) {
+		queue_work(priv->des_qp.qp_wq, work);
+		return;
+	}
+
+	/* Modify qp to reset before destroying qp */
+	ret = hns_roce_v1_modify_qp(&hr_qp->ibqp, NULL, 0, hr_qp->state,
+				    IB_QPS_RESET);
+	if (ret) {
+		dev_err(dev, "Modify QP(0x%lx) to RST failed!\n", hr_qp->qpn);
+		return;
+	}
+
+	hns_roce_qp_remove(hr_dev, hr_qp);
+	hns_roce_qp_free(hr_dev, hr_qp);
+
+	if (hr_qp->ibqp.qp_type == IB_QPT_RC) {
+		/* RC QP, release QPN */
+		hns_roce_release_range_qp(hr_dev, hr_qp->qpn, 1);
+		kfree(hr_qp);
+	} else
+		kfree(hr_to_hr_sqp(hr_qp));
+
+	kfree(qp_work_entry);
+
+	dev_dbg(dev, "Accomplished destroy QP(0x%lx) work.\n", hr_qp->qpn);
+}
+
+int hns_roce_v1_destroy_qp(struct ib_qp *ibqp)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
+	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+	struct device *dev = &hr_dev->pdev->dev;
+	struct hns_roce_qp_work qp_work_entry;
+	struct hns_roce_qp_work *qp_work;
+	struct hns_roce_v1_priv *priv;
+	struct hns_roce_cq *send_cq, *recv_cq;
+	int is_user = !!ibqp->pd->uobject;
+	int is_timeout = 0;
+	int ret;
+
+	ret = check_qp_reset_state(hr_dev, hr_qp, &qp_work_entry, &is_timeout);
+	if (ret) {
+		dev_err(dev, "QP reset state check failed(%d)!\n", ret);
+		return ret;
 	}
 
 	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
 	recv_cq = to_hr_cq(hr_qp->ibqp.recv_cq);
 
 	hns_roce_lock_cqs(send_cq, recv_cq);
-
 	if (!is_user) {
 		__hns_roce_v1_cq_clean(recv_cq, hr_qp->qpn, hr_qp->ibqp.srq ?
 				       to_hr_srq(hr_qp->ibqp.srq) : NULL);
 		if (send_cq != recv_cq)
 			__hns_roce_v1_cq_clean(send_cq, hr_qp->qpn, NULL);
 	}
-
-	hns_roce_qp_remove(hr_dev, hr_qp);
-
 	hns_roce_unlock_cqs(send_cq, recv_cq);
 
-	hns_roce_qp_free(hr_dev, hr_qp);
+	if (!is_timeout) {
+		hns_roce_qp_remove(hr_dev, hr_qp);
+		hns_roce_qp_free(hr_dev, hr_qp);
 
-	/* Not special_QP, free their QPN */
-	if ((hr_qp->ibqp.qp_type == IB_QPT_RC) ||
-	    (hr_qp->ibqp.qp_type == IB_QPT_UC) ||
-	    (hr_qp->ibqp.qp_type == IB_QPT_UD))
-		hns_roce_release_range_qp(hr_dev, hr_qp->qpn, 1);
+		/* RC QP, release QPN */
+		if (hr_qp->ibqp.qp_type == IB_QPT_RC)
+			hns_roce_release_range_qp(hr_dev, hr_qp->qpn, 1);
+	}
 
 	hns_roce_mtt_cleanup(hr_dev, &hr_qp->mtt);
 
-	if (is_user) {
+	if (is_user)
 		ib_umem_release(hr_qp->umem);
-	} else {
+	else {
 		kfree(hr_qp->sq.wrid);
 		kfree(hr_qp->rq.wrid);
+
 		hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
 	}
-}
 
-int hns_roce_v1_destroy_qp(struct ib_qp *ibqp)
-{
-	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
-	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
-
-	hns_roce_v1_destroy_qp_common(hr_dev, hr_qp, !!ibqp->pd->uobject);
-
-	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
-		kfree(hr_to_hr_sqp(hr_qp));
-	else
-		kfree(hr_qp);
+	if (!is_timeout) {
+		if (hr_qp->ibqp.qp_type == IB_QPT_RC)
+			kfree(hr_qp);
+		else
+			kfree(hr_to_hr_sqp(hr_qp));
+	} else {
+		qp_work = kzalloc(sizeof(*qp_work), GFP_KERNEL);
+		if (!qp_work)
+			return -ENOMEM;
+
+		INIT_WORK(&qp_work->work, hns_roce_v1_destroy_qp_work_fn);
+		qp_work->ib_dev	= &hr_dev->ib_dev;
+		qp_work->qp		= hr_qp;
+		qp_work->db_wait_stage	= qp_work_entry.db_wait_stage;
+		qp_work->sdb_issue_ptr	= qp_work_entry.sdb_issue_ptr;
+		qp_work->sdb_inv_cnt	= qp_work_entry.sdb_inv_cnt;
+		qp_work->sche_cnt	= qp_work_entry.sche_cnt;
+
+		priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+		queue_work(priv->des_qp.qp_wq, &qp_work->work);
+		dev_dbg(dev, "Begin destroy QP(0x%lx) work.\n", hr_qp->qpn);
+	}
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index cf28f1b..1d250c0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -102,6 +102,12 @@
 #define HNS_ROCE_V1_EXT_ODB_ALFUL	\
 	(HNS_ROCE_V1_EXT_ODB_DEPTH - HNS_ROCE_V1_DB_RSVD)
 
+#define HNS_ROCE_V1_DB_WAIT_OK				0
+#define HNS_ROCE_V1_DB_STAGE1				1
+#define HNS_ROCE_V1_DB_STAGE2				2
+#define HNS_ROCE_V1_CHECK_DB_TIMEOUT_MSECS		10000
+#define HNS_ROCE_V1_CHECK_DB_SLEEP_MSECS		20
+
 #define HNS_ROCE_BT_RSV_BUF_SIZE			(1 << 17)
 
 #define HNS_ROCE_V1_TPTR_ENTRY_SIZE			2
@@ -144,6 +150,7 @@
 #define SQ_PSN_SHIFT					8
 #define QKEY_VAL					0x80010000
 #define SDB_INV_CNT_OFFSET				8
+#define SDB_ST_CMP_VAL					8
 
 struct hns_roce_cq_context {
 	u32 cqc_byte_4;
@@ -993,11 +1000,27 @@ struct hns_roce_tptr_table {
 	struct hns_roce_buf_list tptr_buf;
 };
 
+struct hns_roce_qp_work {
+	struct	work_struct work;
+	struct	ib_device *ib_dev;
+	struct	hns_roce_qp *qp;
+	u32	db_wait_stage;
+	u32	sdb_issue_ptr;
+	u32	sdb_inv_cnt;
+	u32	sche_cnt;
+};
+
+struct hns_roce_des_qp {
+	struct workqueue_struct	*qp_wq;
+	int	requeue_flag;
+};
+
 struct hns_roce_v1_priv {
 	struct hns_roce_db_table  db_table;
 	struct hns_roce_raq_table raq_table;
 	struct hns_roce_bt_table  bt_table;
 	struct hns_roce_tptr_table tptr_table;
+	struct hns_roce_des_qp des_qp;
 };
 
 int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset);
-- 
1.7.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH for-next 0/6] IB/hns: Bug Fixes for HNS RoCE Driver
From: Salil Mehta @ 2016-11-29 23:10 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
	xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA

This patch-set contains bug fixes for the HNS RoCE driver.

Lijun Ou (1):
  IB/hns: Fix the IB device name

Shaobo Xu (2):
  IB/hns: Fix the bug when free mr
  IB/hns: Fix the bug when free cq

Wei Hu (Xavier) (3):
  IB/hns: Fix the bug when destroy qp
  IB/hns: Fix the bug of setting port mtu
  IB/hns: Delete the redundant memset operation

 drivers/infiniband/hw/hns/hns_roce_cmd.h    |    5 -
 drivers/infiniband/hw/hns/hns_roce_common.h |   42 ++
 drivers/infiniband/hw/hns/hns_roce_cq.c     |   27 +-
 drivers/infiniband/hw/hns/hns_roce_device.h |   18 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  967 ++++++++++++++++++++++++---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   57 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   |   26 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
 8 files changed, 1026 insertions(+), 137 deletions(-)

-- 
1.7.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] ethtool: mark mask values as ULL values
From: Jacob Keller @ 2016-11-29 23:08 UTC (permalink / raw)
  To: netdev, Intel Wired LAN; +Cc: David Miller, Jakub Kicinski, Jacob Keller

If compiling with signed checks enabled, there will be warnings
generated by the ETHTOOL_RX_FLOW_SPEC_RING(_VF) masks. These are
currently marked as signed constants, which will generate warnings when
masking with unsigned values. Avoid this by marking them explicitly as
unsigned values.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 include/uapi/linux/ethtool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index ed650eef9e54..a716112b2b01 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -891,8 +891,8 @@ struct ethtool_rx_flow_spec {
  * space for this at this time. If a future patch consumes the next
  * byte it should be aware of this possiblity.
  */
-#define ETHTOOL_RX_FLOW_SPEC_RING	0x00000000FFFFFFFFLL
-#define ETHTOOL_RX_FLOW_SPEC_RING_VF	0x000000FF00000000LL
+#define ETHTOOL_RX_FLOW_SPEC_RING	0x00000000FFFFFFFFULL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF	0x000000FF00000000ULL
 #define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32
 static inline __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie)
 {
-- 
2.11.0.rc2.152.g4d04e67

^ permalink raw reply related

* Re: 4.9-rc7: (forcedeth?) BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
From: Eric Dumazet @ 2016-11-29 23:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Network Development, Meelis Roos
In-Reply-To: <CA+55aFwUFPLbcqpoccKXHsGj_80ANaMwCSgwTUqgxMFmNRHkaQ@mail.gmail.com>

On Tue, 2016-11-29 at 12:06 -0800, Linus Torvalds wrote:
> On Nov 29, 2016 11:58 AM, "Eric Dumazet" <eric.dumazet@gmail.com>
> wrote:
> >
> > nv_do_nic_poll() is simply buggy and needs a fix.
> >
> > synchronize_irq() can sleep.
> 
> Yes, but why did it start showing up now? None of this has changed as
> far as I can see?
> 
> Is it just timing and the transmit queue being busy? Perhaps due to
> the sheer size of messages? Meelis does seem to have a ton of
> debugging enabled..
> 
>     Linus
> 
> 
Looks like bad luck really or forced_irqthreads ? 


These commits contain what is wanted.

02cea3958664723a5d2236f0f0058de97c7e4693 
genirq: Provide disable_hardirq()



18258f7239a61d8929b8e0c7b6d46c446459074c
genirq: Provide synchronize_hardirq()

^ permalink raw reply

* [PATCH] net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during resume
From: Grygorii Strashko @ 2016-11-29 22:27 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Ivan Khoronzhuk,
	Grygorii Strashko, Dave Gerlach

netif_set_real_num_tx/rx_queues() are required to be called with rtnl_lock
taken, otherwise ASSERT_RTNL() warning will be triggered - which happens
now during System resume from suspend:
cpsw_resume()
|- cpsw_ndo_open()
  |- netif_set_real_num_tx/rx_queues()
     |- ASSERT_RTNL();

Hence, fix it by surrounding cpsw_ndo_open() by rtnl_lock/unlock() calls.

Cc: Dave Gerlach <d-gerlach@ti.com>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Fixes: commit e05107e6b747 ("net: ethernet: ti: cpsw: add multi queue support")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ae1ec6a..fd6c03b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2944,6 +2944,8 @@ static int cpsw_resume(struct device *dev)
 	/* Select default pin state */
 	pinctrl_pm_select_default_state(dev);
 
+	/* shut up ASSERT_RTNL() warning in netif_set_real_num_tx/rx_queues */
+	rtnl_lock();
 	if (cpsw->data.dual_emac) {
 		int i;
 
@@ -2955,6 +2957,8 @@ static int cpsw_resume(struct device *dev)
 		if (netif_running(ndev))
 			cpsw_ndo_open(ndev);
 	}
+	rtnl_unlock();
+
 	return 0;
 }
 #endif
-- 
2.10.1

^ permalink raw reply related

* [PATCH net-next 6/7] net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance
From: Saeed Mahameed @ 2016-11-29 22:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Or Gerlitz, Roi Dayan, Sebastian Ott,
	Saeed Mahameed
In-Reply-To: <1480457972-31286-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4d71445..3875c1c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -143,18 +143,17 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 }
 
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
-			      struct mlx5_flow_handle *rule,
-			      struct mlx5_esw_flow_attr *attr)
+			      struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_fc *counter = NULL;
 
-	counter = mlx5_flow_rule_counter(rule);
+	counter = mlx5_flow_rule_counter(flow->rule);
 
-	mlx5_del_flow_rules(rule);
+	mlx5_del_flow_rules(flow->rule);
 
 	if (esw && esw->mode == SRIOV_OFFLOADS)
-		mlx5_eswitch_del_vlan_action(esw, attr);
+		mlx5_eswitch_del_vlan_action(esw, flow->attr);
 
 	mlx5_fc_destroy(priv->mdev, counter);
 
@@ -1005,7 +1004,7 @@ int mlx5e_delete_flower(struct mlx5e_priv *priv,
 
 	rhashtable_remove_fast(&tc->ht, &flow->node, tc->ht_params);
 
-	mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
+	mlx5e_tc_del_flow(priv, flow);
 
 	if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
 		mlx5e_detach_encap(priv, flow);
@@ -1065,7 +1064,7 @@ static void _mlx5e_tc_del_flow(void *ptr, void *arg)
 	struct mlx5e_tc_flow *flow = ptr;
 	struct mlx5e_priv *priv = arg;
 
-	mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
+	mlx5e_tc_del_flow(priv, flow);
 	kfree(flow);
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/7] net/mlx5e: Move function mlx5e_create_umr_mkey
From: Saeed Mahameed @ 2016-11-29 22:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Or Gerlitz, Roi Dayan, Sebastian Ott,
	Saeed Mahameed
In-Reply-To: <1480457972-31286-1-git-send-email-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

In next patch we are going to create a UMR MKey per RQ, we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 74 +++++++++++------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ba25cd3..49ca30b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -471,6 +471,43 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
 	kfree(rq->mpwqe.info);
 }
 
+static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u64 npages = MLX5E_REQUIRED_MTTS(priv->profile->max_nch(mdev),
+					 BIT(MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW));
+	int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
+	void *mkc;
+	u32 *in;
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+
+	npages = min_t(u32, ALIGN(U16_MAX, 4) * 2, npages);
+
+	MLX5_SET(mkc, mkc, free, 1);
+	MLX5_SET(mkc, mkc, umr_en, 1);
+	MLX5_SET(mkc, mkc, lw, 1);
+	MLX5_SET(mkc, mkc, lr, 1);
+	MLX5_SET(mkc, mkc, access_mode, MLX5_MKC_ACCESS_MODE_MTT);
+
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn);
+	MLX5_SET64(mkc, mkc, len, npages << PAGE_SHIFT);
+	MLX5_SET(mkc, mkc, translations_octword_size,
+		 MLX5_MTT_OCTW(npages));
+	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
+
+	err = mlx5_core_create_mkey(mdev, &priv->umr_mkey, in, inlen);
+
+	kvfree(in);
+	return err;
+}
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -3625,43 +3662,6 @@ static void mlx5e_destroy_q_counter(struct mlx5e_priv *priv)
 	mlx5_core_dealloc_q_counter(priv->mdev, priv->q_counter);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
-{
-	struct mlx5_core_dev *mdev = priv->mdev;
-	u64 npages = MLX5E_REQUIRED_MTTS(priv->profile->max_nch(mdev),
-					 BIT(MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW));
-	int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
-	void *mkc;
-	u32 *in;
-	int err;
-
-	in = mlx5_vzalloc(inlen);
-	if (!in)
-		return -ENOMEM;
-
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-
-	npages = min_t(u32, ALIGN(U16_MAX, 4) * 2, npages);
-
-	MLX5_SET(mkc, mkc, free, 1);
-	MLX5_SET(mkc, mkc, umr_en, 1);
-	MLX5_SET(mkc, mkc, lw, 1);
-	MLX5_SET(mkc, mkc, lr, 1);
-	MLX5_SET(mkc, mkc, access_mode, MLX5_MKC_ACCESS_MODE_MTT);
-
-	MLX5_SET(mkc, mkc, qpn, 0xffffff);
-	MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn);
-	MLX5_SET64(mkc, mkc, len, npages << PAGE_SHIFT);
-	MLX5_SET(mkc, mkc, translations_octword_size,
-		 MLX5_MTT_OCTW(npages));
-	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
-
-	err = mlx5_core_create_mkey(mdev, &priv->umr_mkey, in, inlen);
-
-	kvfree(in);
-	return err;
-}
-
 static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
 			   struct net_device *netdev,
 			   const struct mlx5e_profile *profile,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/7] net/mlx5e: Create UMR MKey per RQ
From: Saeed Mahameed @ 2016-11-29 22:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Or Gerlitz, Roi Dayan, Sebastian Ott,
	Saeed Mahameed
In-Reply-To: <1480457972-31286-1-git-send-email-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

In Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of RQs number*size gets high, we hit a
limitation of u16 field size in FW.

Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       | 12 ++---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 12 +----
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 53 ++++++++++++----------
 3 files changed, 35 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f16f7fb..63dd639 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -77,9 +77,9 @@
 						 MLX5_MPWRQ_WQE_PAGE_ORDER)
 
 #define MLX5_MTT_OCTW(npages) (ALIGN(npages, 8) / 2)
-#define MLX5E_REQUIRED_MTTS(rqs, wqes)\
-	(rqs * wqes * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8))
-#define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) <= U16_MAX)
+#define MLX5E_REQUIRED_MTTS(wqes)		\
+	(wqes * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8))
+#define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) - 1 <= U16_MAX)
 
 #define MLX5_UMR_ALIGN				(2048)
 #define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD	(128)
@@ -347,7 +347,6 @@ struct mlx5e_rq {
 		struct {
 			struct mlx5e_mpw_info *info;
 			void                  *mtt_no_align;
-			u32                    mtt_offset;
 		} mpwqe;
 	};
 	struct {
@@ -382,6 +381,7 @@ struct mlx5e_rq {
 	u32                    rqn;
 	struct mlx5e_channel  *channel;
 	struct mlx5e_priv     *priv;
+	struct mlx5_core_mkey  umr_mkey;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_umr_dma_info {
@@ -689,7 +689,6 @@ struct mlx5e_priv {
 
 	unsigned long              state;
 	struct mutex               state_lock; /* Protects Interface state */
-	struct mlx5_core_mkey      umr_mkey;
 	struct mlx5e_rq            drop_rq;
 
 	struct mlx5e_channel     **channel;
@@ -838,8 +837,7 @@ static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
 
 static inline u32 mlx5e_get_wqe_mtt_offset(struct mlx5e_rq *rq, u16 wqe_ix)
 {
-	return rq->mpwqe.mtt_offset +
-		wqe_ix * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8);
+	return wqe_ix * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8);
 }
 
 static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index aa963d7..352462a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -499,8 +499,7 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 		return -EINVAL;
 	}
 
-	num_mtts = MLX5E_REQUIRED_MTTS(priv->params.num_channels,
-				       rx_pending_wqes);
+	num_mtts = MLX5E_REQUIRED_MTTS(rx_pending_wqes);
 	if (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ &&
 	    !MLX5E_VALID_NUM_MTTS(num_mtts)) {
 		netdev_info(dev, "%s: rx_pending (%d) request can't be satisfied, try to reduce.\n",
@@ -565,7 +564,6 @@ static int mlx5e_set_channels(struct net_device *dev,
 	unsigned int count = ch->combined_count;
 	bool arfs_enabled;
 	bool was_opened;
-	u32 num_mtts;
 	int err = 0;
 
 	if (!count) {
@@ -584,14 +582,6 @@ static int mlx5e_set_channels(struct net_device *dev,
 		return -EINVAL;
 	}
 
-	num_mtts = MLX5E_REQUIRED_MTTS(count, BIT(priv->params.log_rq_size));
-	if (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ &&
-	    !MLX5E_VALID_NUM_MTTS(num_mtts)) {
-		netdev_info(dev, "%s: rx count (%d) request can't be satisfied, try to reduce.\n",
-			    __func__, count);
-		return -EINVAL;
-	}
-
 	if (priv->params.num_channels == count)
 		return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 49ca30b..84a4adb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -471,24 +471,25 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
 	kfree(rq->mpwqe.info);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
+static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv,
+				 u64 npages, u8 page_shift,
+				 struct mlx5_core_mkey *umr_mkey)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
-	u64 npages = MLX5E_REQUIRED_MTTS(priv->profile->max_nch(mdev),
-					 BIT(MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW));
 	int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
 	void *mkc;
 	u32 *in;
 	int err;
 
+	if (!MLX5E_VALID_NUM_MTTS(npages))
+		return -EINVAL;
+
 	in = mlx5_vzalloc(inlen);
 	if (!in)
 		return -ENOMEM;
 
 	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
 
-	npages = min_t(u32, ALIGN(U16_MAX, 4) * 2, npages);
-
 	MLX5_SET(mkc, mkc, free, 1);
 	MLX5_SET(mkc, mkc, umr_en, 1);
 	MLX5_SET(mkc, mkc, lw, 1);
@@ -497,17 +498,25 @@ static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
 
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn);
-	MLX5_SET64(mkc, mkc, len, npages << PAGE_SHIFT);
+	MLX5_SET64(mkc, mkc, len, npages << page_shift);
 	MLX5_SET(mkc, mkc, translations_octword_size,
 		 MLX5_MTT_OCTW(npages));
-	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
+	MLX5_SET(mkc, mkc, log_page_size, page_shift);
 
-	err = mlx5_core_create_mkey(mdev, &priv->umr_mkey, in, inlen);
+	err = mlx5_core_create_mkey(mdev, umr_mkey, in, inlen);
 
 	kvfree(in);
 	return err;
 }
 
+static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
+{
+	struct mlx5e_priv *priv = rq->priv;
+	u64 num_mtts = MLX5E_REQUIRED_MTTS(BIT(priv->params.log_rq_size));
+
+	return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
+}
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -564,18 +573,20 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 		rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe;
 
-		rq->mpwqe.mtt_offset = c->ix *
-			MLX5E_REQUIRED_MTTS(1, BIT(priv->params.log_rq_size));
-
 		rq->mpwqe_stride_sz = BIT(priv->params.mpwqe_log_stride_sz);
 		rq->mpwqe_num_strides = BIT(priv->params.mpwqe_log_num_strides);
 
 		rq->buff.wqe_sz = rq->mpwqe_stride_sz * rq->mpwqe_num_strides;
 		byte_count = rq->buff.wqe_sz;
-		rq->mkey_be = cpu_to_be32(c->priv->umr_mkey.key);
-		err = mlx5e_rq_alloc_mpwqe_info(rq, c);
+
+		err = mlx5e_create_rq_umr_mkey(rq);
 		if (err)
 			goto err_rq_wq_destroy;
+		rq->mkey_be = cpu_to_be32(rq->umr_mkey.key);
+
+		err = mlx5e_rq_alloc_mpwqe_info(rq, c);
+		if (err)
+			goto err_destroy_umr_mkey;
 		break;
 	default: /* MLX5_WQ_TYPE_LINKED_LIST */
 		rq->dma_info = kzalloc_node(wq_sz * sizeof(*rq->dma_info),
@@ -626,6 +637,9 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 
 	return 0;
 
+err_destroy_umr_mkey:
+	mlx5_core_destroy_mkey(mdev, &rq->umr_mkey);
+
 err_rq_wq_destroy:
 	if (rq->xdp_prog)
 		bpf_prog_put(rq->xdp_prog);
@@ -644,6 +658,7 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
 	switch (rq->wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		mlx5e_rq_free_mpwqe_info(rq);
+		mlx5_core_destroy_mkey(rq->priv->mdev, &rq->umr_mkey);
 		break;
 	default: /* MLX5_WQ_TYPE_LINKED_LIST */
 		kfree(rq->dma_info);
@@ -3868,15 +3883,9 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 	profile = priv->profile;
 	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
-	err = mlx5e_create_umr_mkey(priv);
-	if (err) {
-		mlx5_core_err(mdev, "create umr mkey failed, %d\n", err);
-		goto out;
-	}
-
 	err = profile->init_tx(priv);
 	if (err)
-		goto err_destroy_umr_mkey;
+		goto out;
 
 	err = mlx5e_open_drop_rq(priv);
 	if (err) {
@@ -3916,9 +3925,6 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 err_cleanup_tx:
 	profile->cleanup_tx(priv);
 
-err_destroy_umr_mkey:
-	mlx5_core_destroy_mkey(mdev, &priv->umr_mkey);
-
 out:
 	return err;
 }
@@ -3967,7 +3973,6 @@ void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 	profile->cleanup_rx(priv);
 	mlx5e_close_drop_rq(priv);
 	profile->cleanup_tx(priv);
-	mlx5_core_destroy_mkey(priv->mdev, &priv->umr_mkey);
 	cancel_delayed_work_sync(&priv->update_stats_work);
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 4/7] net/mlx5e: Remove redundant hashtable lookup in configure flower
From: Saeed Mahameed @ 2016-11-29 22:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tariq Toukan, Or Gerlitz, Roi Dayan, Sebastian Ott,
	Saeed Mahameed
In-Reply-To: <1480457972-31286-1-git-send-email-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 26 +++++++------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4d06fab..dd6d954 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -915,25 +915,17 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 	u32 flow_tag, action;
 	struct mlx5e_tc_flow *flow;
 	struct mlx5_flow_spec *spec;
-	struct mlx5_flow_handle *old = NULL;
-	struct mlx5_esw_flow_attr *old_attr = NULL;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
 	if (esw && esw->mode == SRIOV_OFFLOADS)
 		fdb_flow = true;
 
-	flow = rhashtable_lookup_fast(&tc->ht, &f->cookie,
-				      tc->ht_params);
-	if (flow) {
-		old = flow->rule;
-		old_attr = flow->attr;
-	} else {
-		if (fdb_flow)
-			flow = kzalloc(sizeof(*flow) + sizeof(struct mlx5_esw_flow_attr),
-				       GFP_KERNEL);
-		else
-			flow = kzalloc(sizeof(*flow), GFP_KERNEL);
-	}
+	if (fdb_flow)
+		flow = kzalloc(sizeof(*flow) +
+			       sizeof(struct mlx5_esw_flow_attr),
+			       GFP_KERNEL);
+	else
+		flow = kzalloc(sizeof(*flow), GFP_KERNEL);
 
 	spec = mlx5_vzalloc(sizeof(*spec));
 	if (!spec || !flow) {
@@ -970,17 +962,13 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 	if (err)
 		goto err_del_rule;
 
-	if (old)
-		mlx5e_tc_del_flow(priv, old, old_attr);
-
 	goto out;
 
 err_del_rule:
 	mlx5_del_flow_rules(flow->rule);
 
 err_free:
-	if (!old)
-		kfree(flow);
+	kfree(flow);
 out:
 	kvfree(spec);
 	return err;
-- 
2.7.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox