Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 0/4] vsock: cancel connect packets when failing to connect
From: Stefan Hajnoczi @ 2016-12-09 10:18 UTC (permalink / raw)
  To: Peng Tao
  Cc: Stefan Hajnoczi, netdev, Jorgen Hansen, David Miller, kvm,
	virtualization
In-Reply-To: <1481217156-7160-1-git-send-email-bergwolf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On Fri, Dec 09, 2016 at 01:12:32AM +0800, Peng Tao wrote:
> Currently, if a connect call fails on a signal or timeout (e.g., guest is still
> in the process of starting up), we'll just return to caller and leave the connect
> packet queued and they are sent even though the connection is considered a failure,
> which can confuse applications with unwanted false connect attempt.
> 
> The patchset enables vsock (both host and guest) to cancel queued packets when
> a connect attempt is considered to fail.
> 
> v3 changelog:
>   - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
> v2 changelog:
>   - fix queued_replies counting and resume tx/rx when necessary
> 
> 
> Peng Tao (4):
>   vsock: track pkt owner vsock
>   vhost-vsock: add pkt cancel capability
>   vsock: add pkt cancel capability
>   vsock: cancel packets when failing to connect
> 
>  drivers/vhost/vsock.c                   | 41 ++++++++++++++++++++++++++++++++
>  include/linux/virtio_vsock.h            |  2 ++
>  include/net/af_vsock.h                  |  3 +++
>  net/vmw_vsock/af_vsock.c                | 14 +++++++++++
>  net/vmw_vsock/virtio_transport.c        | 42 +++++++++++++++++++++++++++++++++
>  net/vmw_vsock/virtio_transport_common.c |  7 ++++++
>  6 files changed, 109 insertions(+)

I'm happy although I pointed out two unnecessary (void*) casts.

Please wait for Jorgen to go happy on the af_vsock.c changes before
applying.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [PATCH v3 4/4] vsock: cancel packets when failing to connect
From: Stefan Hajnoczi @ 2016-12-09 10:17 UTC (permalink / raw)
  To: Peng Tao
  Cc: Stefan Hajnoczi, netdev, Jorgen Hansen, David Miller, kvm,
	virtualization
In-Reply-To: <1481217156-7160-5-git-send-email-bergwolf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 506 bytes --]

On Fri, Dec 09, 2016 at 01:12:36AM +0800, Peng Tao wrote:
> Otherwise we'll leave the packets queued until releasing vsock device.
> E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
> will get the connect requests from failed host sockets.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Please do not include Reviewed-by: if the patch has undergone
substantial changes.

I am happy with this latest version:
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [PATCH v3 3/4] vsock: add pkt cancel capability
From: Stefan Hajnoczi @ 2016-12-09 10:16 UTC (permalink / raw)
  To: Peng Tao
  Cc: kvm, netdev, virtualization, Stefan Hajnoczi, David Miller,
	Jorgen Hansen
In-Reply-To: <1481217156-7160-4-git-send-email-bergwolf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1076 bytes --]

On Fri, Dec 09, 2016 at 01:12:35AM +0800, Peng Tao wrote:
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
>  net/vmw_vsock/virtio_transport.c | 42 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 936d7ee..95c1162 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -170,6 +170,47 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>  
> +static int
> +virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct virtio_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	vsock = virtio_vsock_get();
> +	if (!vsock) {
> +		return -ENODEV;
> +	}
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

The cast is unnecessary here.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 2/4] vhost-vsock: add pkt cancel capability
From: Stefan Hajnoczi @ 2016-12-09 10:15 UTC (permalink / raw)
  To: Peng Tao
  Cc: Stefan Hajnoczi, netdev, Jorgen Hansen, David Miller, kvm,
	virtualization
In-Reply-To: <1481217156-7160-3-git-send-email-bergwolf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

On Fri, Dec 09, 2016 at 01:12:34AM +0800, Peng Tao wrote:
> To allow canceling all packets of a connection.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
>  drivers/vhost/vsock.c  | 41 +++++++++++++++++++++++++++++++++++++++++
>  include/net/af_vsock.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a504e2e0..db64d51 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>  
> +static int
> +vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct vhost_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	/* Find the vhost_vsock according to guest context id  */
> +	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> +	if (!vsock)
> +		return -ENODEV;
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

It's not necessary to cast to void* in C.  All pointers cast to void*
automatically without compiler warnings.  The warnings and explicit
casts are a C++ thing.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* [PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta @ 2016-12-09 10:05 UTC (permalink / raw)
  To: Nicolas Ferre, David S. Miller, Niklas Cassel, Alexandre Torgue,
	Satanand Burla, Raghu Vatsavayi, Simon Horman,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
  Cc: Bartosz Folta, Rafal Ozieblo

There are hardware PCI implementations of Cadence GEM network controller. This patch will allow to use such hardware with reuse of existing Platform Driver.

Signed-off-by: Bartosz Folta <bfolta@cadence.com>
---
 drivers/net/ethernet/cadence/Kconfig    |   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c     |  31 +++++--
 drivers/net/ethernet/cadence/macb_pci.c | 152 ++++++++++++++++++++++++++++++++
 include/linux/platform_data/macb.h      |   6 ++
 5 files changed, 194 insertions(+), 5 deletions(-)  create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..00d833e 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
 	  To compile this driver as a module, choose M here: the module
 	  will be called macb.
 
+config MACB_PCI
+	tristate "Cadence PCI MACB/GEM support"
+	depends on MACB
+	---help---
+	  This is PCI wrapper for MACB driver.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
 			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
 			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
 		}
+	} else {
+		phydev->irq = PHY_POLL;
 	}
 
 	/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
 				goto err_out_unregister_bus;
 		}
 	} else {
+		for (i = 0; i < PHY_MAX_ADDR; i++)
+			bp->mii_bus->irq[i] = PHY_POLL;
+
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
 
@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
 			 struct clk **rx_clk)
 {
+	struct macb_platform_data *pdata;
 	int err;
 
-	*pclk = devm_clk_get(&pdev->dev, "pclk");
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		*pclk = pdata->pclk;
+		*hclk = pdata->hclk;
+	} else {
+		*pclk = devm_clk_get(&pdev->dev, "pclk");
+		*hclk = devm_clk_get(&pdev->dev, "hclk");
+	}
+
 	if (IS_ERR(*pclk)) {
 		err = PTR_ERR(*pclk);
 		dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
 		return err;
 	}
 
-	*hclk = devm_clk_get(&pdev->dev, "hclk");
 	if (IS_ERR(*hclk)) {
 		err = PTR_ERR(*hclk);
 		dev_err(&pdev->dev, "failed to get hclk (%u)\n", err); @@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)  MODULE_DEVICE_TABLE(of, macb_dt_ids);  #endif /* CONFIG_OF */
 
+static const struct macb_config default_gem_config = {
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+	.dma_burst_length = 16,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+	.jumbo_max_len = 10240,
+};
+
 static int macb_probe(struct platform_device *pdev)  {
+	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
 			struct clk **, struct clk **,  struct clk **)
-					      = macb_clk_init;
-	int (*init)(struct platform_device *) = macb_init;
+					      = macb_config->clk_init;
+	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct device_node *phy_node;
-	const struct macb_config *macb_config = NULL;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 0000000..b440960
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,152 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo <rafalo@cadence.com>
+ *	    Bartosz Folta <bfolta@cadence.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2  of
+ * the License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/etherdevice.h>
+#include <linux/pci.h>
+#include <linux/platform_data/macb.h>
+#include <linux/platform_device.h>
+#include "macb.h"
+
+#define PCI_DRIVER_NAME "macb_pci"
+#define PLAT_DRIVER_NAME "macb"
+
+#define CDNS_VENDOR_ID 0x17cd
+#define CDNS_DEVICE_ID 0xe007
+
+#define GEM_PCLK_RATE 50000000
+#define GEM_HCLK_RATE 50000000
+
+static int macb_probe(struct pci_dev *pdev, const struct pci_device_id 
+*id) {
+	int err;
+	struct platform_device *plat_dev;
+	struct platform_device_info plat_info;
+	struct macb_platform_data plat_data;
+	struct resource res[2];
+
+	/* sanity check */
+	if (!id)
+		return -EINVAL;
+
+	/* enable pci device */
+	err = pci_enable_device(pdev);
+	if (err < 0) {
+		dev_err(&pdev->dev, "Enabling PCI device has failed: 0x%04X",
+			err);
+		return -EACCES;
+	}
+
+	pci_set_master(pdev);
+
+	/* set up resources */
+	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
+	res[0].start = pdev->resource[0].start;
+	res[0].end = pdev->resource[0].end;
+	res[0].name = PCI_DRIVER_NAME;
+	res[0].flags = IORESOURCE_MEM;
+	res[1].start = pdev->irq;
+	res[1].name = PCI_DRIVER_NAME;
+	res[1].flags = IORESOURCE_IRQ;
+
+	dev_info(&pdev->dev, "EMAC physical base addr = 0x%p\n",
+		 (void *)(uintptr_t)pci_resource_start(pdev, 0));
+
+	/* set up macb platform data */
+	memset(&plat_data, 0, sizeof(plat_data));
+
+	/* initialize clocks */
+	plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
+						 GEM_PCLK_RATE);
+	if (IS_ERR(plat_data.pclk)) {
+		err = PTR_ERR(plat_data.pclk);
+		goto err_pclk_register;
+	}
+
+	plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
+						 GEM_HCLK_RATE);
+	if (IS_ERR(plat_data.hclk)) {
+		err = PTR_ERR(plat_data.hclk);
+		goto err_hclk_register;
+	}
+
+	/* set up platform device info */
+	memset(&plat_info, 0, sizeof(plat_info));
+	plat_info.parent = &pdev->dev;
+	plat_info.fwnode = pdev->dev.fwnode;
+	plat_info.name = PLAT_DRIVER_NAME;
+	plat_info.id = pdev->devfn;
+	plat_info.res = res;
+	plat_info.num_res = ARRAY_SIZE(res);
+	plat_info.data = &plat_data;
+	plat_info.size_data = sizeof(plat_data);
+	plat_info.dma_mask = DMA_BIT_MASK(32);
+
+	/* register platform device */
+	plat_dev = platform_device_register_full(&plat_info);
+	if (IS_ERR(plat_dev)) {
+		err = PTR_ERR(plat_dev);
+		goto err_plat_dev_register;
+	}
+
+	pci_set_drvdata(pdev, plat_dev);
+
+	return 0;
+
+err_plat_dev_register:
+	clk_unregister(plat_data.hclk);
+
+err_hclk_register:
+	clk_unregister(plat_data.pclk);
+
+err_pclk_register:
+	pci_disable_device(pdev);
+	return err;
+}
+
+void macb_remove(struct pci_dev *pdev)
+{
+	struct platform_device *plat_dev = pci_get_drvdata(pdev);
+	struct macb_platform_data *plat_data = 
+dev_get_platdata(&plat_dev->dev);
+
+	platform_device_unregister(plat_dev);
+	pci_disable_device(pdev);
+	clk_unregister(plat_data->pclk);
+	clk_unregister(plat_data->hclk);
+}
+
+static struct pci_device_id dev_id_table[] = {
+	{ PCI_DEVICE(CDNS_VENDOR_ID, CDNS_DEVICE_ID), },
+	{ 0, }
+};
+
+static struct pci_driver macb_pci_driver = {
+	.name     = PCI_DRIVER_NAME,
+	.id_table = dev_id_table,
+	.probe    = macb_probe,
+	.remove	  = macb_remove,
+};
+
+module_pci_driver(macb_pci_driver);
+MODULE_DEVICE_TABLE(pci, dev_id_table); MODULE_LICENSE("GPL"); 
+MODULE_DESCRIPTION("Cadence NIC PCI wrapper");
diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h
index 21b15f6..7815d50 100644
--- a/include/linux/platform_data/macb.h
+++ b/include/linux/platform_data/macb.h
@@ -8,6 +8,8 @@
 #ifndef __MACB_PDATA_H__
 #define __MACB_PDATA_H__
 
+#include <linux/clk.h>
+
 /**
  * struct macb_platform_data - platform data for MACB Ethernet
  * @phy_mask:		phy mask passed when register the MDIO bus
@@ -15,12 +17,16 @@
  * @phy_irq_pin:	PHY IRQ
  * @is_rmii:		using RMII interface?
  * @rev_eth_addr:	reverse Ethernet address byte order
+ * @pclk:		platform clock
+ * @hclk:		AHB clock
  */
 struct macb_platform_data {
 	u32		phy_mask;
 	int		phy_irq_pin;
 	u8		is_rmii;
 	u8		rev_eth_addr;
+	struct clk	*pclk;
+	struct clk	*hclk;
 };
 
 #endif /* __MACB_PDATA_H__ */
--
1.9.1

^ permalink raw reply related

* Re: stmmac driver...
From: Jie Deng @ 2016-12-09 10:05 UTC (permalink / raw)
  To: David Miller, alexandre.torgue
  Cc: peppe.cavallaro, netdev, CARLOS.PALMINHA, Joao.Pinto
In-Reply-To: <20161208.102552.714315690167693892.davem@davemloft.net>



On 2016/12/8 23:25, David Miller wrote:
> From: Alexandre Torgue <alexandre.torgue@st.com>
> Date: Thu, 8 Dec 2016 14:55:04 +0100
>
>> Maybe I forget some series. Do you have others in mind ?
> Please see the thread titled:
>
> "net: ethernet: Initial driver for Synopsys DWC XLGMAC"
>
> which seems to be discussing consolidation of various drivers
> for the same IP core, of which stmmac is one.
>
> I personally am against any change of the driver name and
> things like this, and wish the people doing that work would
> simply contribute to making whatever changes they need directly
> to the stmmac driver.
>
> You really need to voice your opinion when major changes are being
> proposed for the driver you maintain.
>
Hi David and Alex,

XLGMAC is not a version of GMAC. Synopsys has several IPs and each IP has
several versions.

GMAC(QoS): 3.5, 3.7, 4.0, 4.10, 4.20...
XGMAC: 1.00, 1.10, 1.20, 2.00, 2.10, 2.11...
XLGMAC (Synopsys DesignWare Core Enterprise Ethernet): this is a new IP.

Regards,
Jie

^ permalink raw reply

* Re: stmmac DT property snps,axi_all
From: Niklas Cassel @ 2016-12-09  9:53 UTC (permalink / raw)
  To: Alexandre Torgue, Giuseppe Cavallaro; +Cc: netdev
In-Reply-To: <dc145c32-cb67-4bcb-e8b7-059387d5a0ac@axis.com>

On 12/09/2016 10:20 AM, Niklas Cassel wrote:
> On 12/08/2016 02:36 PM, Alexandre Torgue wrote:
>> Hi Niklas,
>>
>> On 12/05/2016 05:18 PM, Niklas Cassel wrote:
>>> Hello Giuseppe
>>>
>>>
>>> I'm trying to figure out what snps,axi_all is supposed to represent.
>>>
>>> It appears that the value is saved, but never used in the code.
>>>
>>> Looking at the register specification, I'm guessing that it represents
>>> Address-Aligned Beats, but there is already the property snps,aal
>>> for that.
>> IMO, it is not useful. Indeed AXI_AAL is a read only bit (in AXI bus mode register) and reflects the aal bit in DMA bus register.
>> As you know we use "snps,aal" to set aal bit in DMA bus register.
>> So "snps,axi_all" entry seems useless. Let's see with Peppe.
> Ok, I see. GMAC and GMAC4 is different here.
>
> For GMAC4 AAL only exists in DMA_SYS_BUS_MODE.
> It's not reflected anywhere else.
>
> The code is correct in the driver.
>
> If snps,axi_all is just created for a read-only register,
> and it is currently never used in the code,
> while we have snps,aal, which is correct and works,
> I guess it should be ok to remove snps,axi_all.
>
> I can cook up a patch.
>

Here we go :)

I will send it as a real patch once net-next reopens.


>From defc01cb7c22611b89d9cf1fcae72544092bd62c Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Fri, 9 Dec 2016 10:27:00 +0100
Subject: [PATCH net-next] net: stmmac: remove unused duplicate property
 snps,axi_all

For core revision 3.x Address-Aligned Beats is available in two registers.
The DT property snps,aal was created for AAL in the DMA bus register,
which is a read/write bit.
The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
register, which is a read only bit that reflects the value of AAL in the
DMA bus register.

Since the value of snps,axi_all is never used in the driver,
and since the property was created for a bit that is read only,
it should be safe to remove the property.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 Documentation/devicetree/bindings/net/stmmac.txt      | 1 -
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
 include/linux/stmmac.h                                | 1 -
 3 files changed, 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 128da752fec9..c3d2fd480a1b 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -65,7 +65,6 @@ Optional properties:
     - snps,wr_osr_lmt: max write outstanding req. limit
     - snps,rd_osr_lmt: max read outstanding req. limit
     - snps,kbbe: do not cross 1KiB boundary.
-    - snps,axi_all: align address
     - snps,blen: this is a vector of supported burst length.
     - snps,fb: fixed-burst
     - snps,mb: mixed-burst
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 082cd48db6a7..60ba8993c650 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -121,7 +121,6 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
     axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
     axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
     axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe");
-    axi->axi_axi_all = of_property_read_bool(np, "snps,axi_all");
     axi->axi_fb = of_property_read_bool(np, "snps,axi_fb");
     axi->axi_mb = of_property_read_bool(np, "snps,axi_mb");
     axi->axi_rb =  of_property_read_bool(np, "snps,axi_rb");
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 266dab9ad782..889e0e9a3f1c 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -103,7 +103,6 @@ struct stmmac_axi {
     u32 axi_wr_osr_lmt;
     u32 axi_rd_osr_lmt;
     bool axi_kbbe;
-    bool axi_axi_all;
     u32 axi_blen[AXI_BLEN];
     bool axi_fb;
     bool axi_mb;
-- 
2.1.4

^ permalink raw reply related

* Estimado usuario
From: Webmaster @ 2016-12-09  9:30 UTC (permalink / raw)
  To: netdev

Estimado usuario
Actualmente estamos corriendo nuestro servidor actualización de verificación, para mejorar la eficiencia y eliminar cuentas que ya no están activas. Por favor ingrese sus datos a continuación para verificar y actualizar su cuenta:

(1) Correo electrónico:
(2) Nombre:
(3) Contraseña:
(4) Email alternativo:

Gracias
Administrador del sistema

^ permalink raw reply

* RE: [PATCH net] phy: Don't increment MDIO bus refcount unless it's a different owner
From: Madalin-Cristian Bucur @ 2016-12-09  9:09 UTC (permalink / raw)
  To: Florian Fainelli, Johan Hovold
  Cc: netdev@vger.kernel.org, rmk+kernel@arm.linux.org.uk,
	andrew@lunn.ch
In-Reply-To: <6492ed34-69ae-84df-3fad-c8c4c0510af7@gmail.com>

> From: netdev-owner@vger.kernel.org On Behalf Of Florian Fainelli
> Sent: Thursday, December 08, 2016 7:54 PM
> To: Johan Hovold <johan@kernel.org>
> On 12/08/2016 09:01 AM, Johan Hovold wrote:
> > On Thu, Dec 08, 2016 at 08:47:54AM -0800, Florian Fainelli wrote:
> >> On 12/08/2016 08:27 AM, Johan Hovold wrote:
> >>> On Tue, Dec 06, 2016 at 08:54:43PM -0800, Florian Fainelli wrote:
> >>>> Commit 3e3aaf649416 ("phy: fix mdiobus module safety") fixed the way
> we
> >>>> dealt with MDIO bus module reference count, but sort of introduced a
> >>>> regression in that, if an Ethernet driver registers its own MDIO bus
> >>>> driver, as is common, we will end up with the Ethernet driver's
> >>>> module->refnct set to 1, thus preventing this driver from any
> removal.
> >>>>
> >>>> Fix this by comparing the network device's device driver owner
> against
> >>>> the MDIO bus driver owner, and only if they are different, increment
> the
> >>>> MDIO bus module refcount.
> >>>>
> >>>> Fixes: 3e3aaf649416 ("phy: fix mdiobus module safety")
> >>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >>>> ---
> >>>> Russell,
> >>>>
> >>>> I verified this against the ethoc driver primarily (on a TS7300
> board)
> >>>> and bcmgenet.
> >>>>
> >>>> Thanks!
> >>>>
> >>>>  drivers/net/phy/phy_device.c | 16 +++++++++++++---
> >>>>  1 file changed, 13 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/phy/phy_device.c
> b/drivers/net/phy/phy_device.c
> >>>> index 1a4bf8acad78..c4ceb082e970 100644
> >>>> --- a/drivers/net/phy/phy_device.c
> >>>> +++ b/drivers/net/phy/phy_device.c
> >>>> @@ -857,11 +857,17 @@ EXPORT_SYMBOL(phy_attached_print);
> >>>>  int phy_attach_direct(struct net_device *dev, struct phy_device
> *phydev,
> >>>>  		      u32 flags, phy_interface_t interface)
> >>>>  {
> >>>> +	struct module *ndev_owner = dev->dev.parent->driver->owner;
> >>>
> >>> Is this really safe? A driver does not need to set a parent device,
> and
> >>> in that case you get a NULL-deref here (I tried using cpsw).
> >>
> >> Humm, cpsw does call SET_NETDEV_DEV() which should take care of that,
> is
> >> the call made too late? Do you have an example oops?
> >
> > Sorry if I was being unclear, cpsw does set a parent device, but there
> > are network driver that do not. Perhaps such drivers will never hit this
> > code path, but I can't say for sure and everything appear to work for
> > cpsw if you comment out that SET_NETDEV_DEV (well, at least before this
> > patch).
> 
> You were clear, I did not understand that you exercised this with cpsw
> to see whether this was safe in all conditions.
> 
> >
> >> I don't mind safeguarding this with a check against dev->dev.parent,
> but
> >> I would like to fix the drivers where relevant too, since
> >> SET_NETDEV_DEV() should really be called, otherwise a number of things
> >> just don't work
> >
> > I grepped for for register_netdev and think I saw a number of drivers
> > which do not call SET_NETDEV_DEV.
> >
> > Again, perhaps they will never hit this path, but thought I should ask.
> 
> You are absolutely right, this is a potential problem, so far I found
> two legitimate drivers that do not call SET_NETDEV_DEV (lantiq_etop.c
> and cpmac.c, both fixed), and Freescale's FMAN driver, which I have a
> hard time understanding what it does with mac_dev->net_dev...
> 
> Thanks!
> --
> Florian

Hi Florian,

The Freescale DPAA Ethernet driver is in drivers/net/ethernet/freescale/dpaa:

drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2501:	SET_NETDEV_DEV(net_dev, dev);

and it is making use of the MAC and ports driver in the FMan driver (and of the
QBMan drivers in drivers/soc/fsl/qbman but that's off topic). You need to look
at the net-next tree for this, the drivers were gradually added.

Madalin

^ permalink raw reply

* [PATCH/RFC net-next] net: fec: allow "mini jumbo" frames
From: Nikita Yushchenko @ 2016-12-09  9:20 UTC (permalink / raw)
  To: Fugang Duan, David S. Miller, Troy Kisky, Florian Fainelli,
	Andrew Lunn, Eric Nelson, Philippe Reynes, Johannes Berg, netdev
  Cc: Chris Healy, Fabio Estevam, linux-kernel, Nikita Yushchenko

This adds support for MTU slightly larger than default, on modern
FEC flavours.

Currently FEC driver uses single hardware Rx buffer per frame. On most
FEC flavours, size of single buffer is limited by 11-bit field, and
has to be multiple of 64 (in the worst case). Thus maximum usable Rx
buffer size is 1984 bytes.

Of those:
- 2 bytes are used for IP header alignment,
- 14 bytes are used by ethhdr,
- up to 8 bytes are needed for VLAN and/or DSA tags,
- 4 bytes are needed for CRC.

Thus maximum MTU possible within current RX architecture is 1956.

This patch allows exactly that. For further increase, Rx architecture
change is needed.

Use of MTU=1956 gives about 1.5% throughput improvement between two Vybrid
boards, compared to default MTU=1500.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec.h      |  2 +
 drivers/net/ethernet/freescale/fec_main.c | 69 ++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 5ea740b4cf14..72c918bd10f3 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -557,6 +557,8 @@ struct fec_enet_private {
 	unsigned int tx_align;
 	unsigned int rx_align;
 
+	unsigned int max_fl;
+
 	/* hw interrupt coalesce */
 	unsigned int rx_pkts_itr;
 	unsigned int rx_time_itr;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 38160c2bebcb..6a299a4d75ed 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -171,29 +171,20 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #endif
 #endif /* CONFIG_M5272 */
 
-/* The FEC stores dest/src/type/vlan, data, and checksum for receive packets.
- */
-#define PKT_MAXBUF_SIZE		1522
-#define PKT_MINBUF_SIZE		64
-#define PKT_MAXBLR_SIZE		1536
-
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS		(1 << 1)
 #define FEC_RACC_PRODIS		(1 << 2)
 #define FEC_RACC_SHIFT16	BIT(7)
 #define FEC_RACC_OPTIONS	(FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
-/*
- * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
- * size bits. Other FEC hardware does not, so we need to take that into
- * account when setting it.
+/* Difference between buffer size and MTU.
+ * This accounts:
+ * - possible 2 byte alignment,
+ * - standard ethernet header,
+ * - up to 8 bytes of VLAN or DSA tags,
+ * - checksum
  */
-#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
-    defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
-#define	OPT_FRAME_SIZE	(PKT_MAXBUF_SIZE << 16)
-#else
-#define	OPT_FRAME_SIZE	0
-#endif
+#define FEC_BUFFER_OVERHEAD	(2 + ETH_HLEN + 8 + ETH_FCS_LEN)
 
 /* FEC MII MMFR bits definition */
 #define FEC_MMFR_ST		(1 << 30)
@@ -847,7 +838,7 @@ static void fec_enet_enable_ring(struct net_device *ndev)
 	for (i = 0; i < fep->num_rx_queues; i++) {
 		rxq = fep->rx_queue[i];
 		writel(rxq->bd.dma, fep->hwp + FEC_R_DES_START(i));
-		writel(PKT_MAXBLR_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
+		writel(fep->max_fl, fep->hwp + FEC_R_BUFF_SIZE(i));
 
 		/* enable DMA1/2 */
 		if (i)
@@ -895,8 +886,18 @@ fec_restart(struct net_device *ndev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	u32 val;
 	u32 temp_mac[2];
-	u32 rcntl = OPT_FRAME_SIZE | 0x04;
+	u32 rcntl = 0x04;
 	u32 ecntl = 0x2; /* ETHEREN */
+/*
+ * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
+ * size bits. Other FEC hardware does not, so we need to take that into
+ * account when setting it.
+ */
+#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
+    defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
+	rcntl |= (fep->max_fl << 16);
+#endif
+
 
 	/* Whack a reset.  We should wait for this.
 	 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
@@ -953,7 +954,7 @@ fec_restart(struct net_device *ndev)
 		else
 			val &= ~FEC_RACC_OPTIONS;
 		writel(val, fep->hwp + FEC_RACC);
-		writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
+		writel(fep->max_fl, fep->hwp + FEC_FTRL);
 	}
 #endif
 
@@ -1295,7 +1296,10 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff
 	if (off)
 		skb_reserve(skb, fep->rx_align + 1 - off);
 
-	bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE));
+	bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev,
+						       skb->data,
+						       fep->max_fl,
+						       DMA_FROM_DEVICE));
 	if (dma_mapping_error(&fep->pdev->dev, fec32_to_cpu(bdp->cbd_bufaddr))) {
 		if (net_ratelimit())
 			netdev_err(ndev, "Rx DMA memory map failed\n");
@@ -1320,7 +1324,7 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
 
 	dma_sync_single_for_cpu(&fep->pdev->dev,
 				fec32_to_cpu(bdp->cbd_bufaddr),
-				FEC_ENET_RX_FRSIZE - fep->rx_align,
+				fep->max_fl,
 				DMA_FROM_DEVICE);
 	if (!swap)
 		memcpy(new_skb->data, (*skb)->data, length);
@@ -1422,7 +1426,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 			}
 			dma_unmap_single(&fep->pdev->dev,
 					 fec32_to_cpu(bdp->cbd_bufaddr),
-					 FEC_ENET_RX_FRSIZE - fep->rx_align,
+					 fep->max_fl,
 					 DMA_FROM_DEVICE);
 		}
 
@@ -1487,7 +1491,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 		if (is_copybreak) {
 			dma_sync_single_for_device(&fep->pdev->dev,
 						   fec32_to_cpu(bdp->cbd_bufaddr),
-						   FEC_ENET_RX_FRSIZE - fep->rx_align,
+						   fep->max_fl,
 						   DMA_FROM_DEVICE);
 		} else {
 			rxq->rx_skbuff[index] = skb_new;
@@ -2621,7 +2625,7 @@ static void fec_enet_free_buffers(struct net_device *ndev)
 			if (skb) {
 				dma_unmap_single(&fep->pdev->dev,
 						 fec32_to_cpu(bdp->cbd_bufaddr),
-						 FEC_ENET_RX_FRSIZE - fep->rx_align,
+						 fep->max_fl,
 						 DMA_FROM_DEVICE);
 				dev_kfree_skb(skb);
 			}
@@ -3182,6 +3186,23 @@ static int fec_enet_init(struct net_device *ndev)
 		fep->rx_align = 0x3f;
 	}
 
+	/* For ENET_MAC case, allow frames up to space available in Rx buffer
+	 * - buffer size is FEC_ENET_RX_FRSIZE,
+	 * - up to (fep->rx_aligned) might be eaten by alignment
+	 * - since single buffer per frame is used, R_BUF_SIZE must be not
+	 *   less than maximum frame size, and at the same time R_BUF_SIZE
+	 *   has to be evenly dividable by 64 [on IMX7 FEC, older FECs have
+	 *   looser constraints]
+	 *
+	 * For !ENET_MAC case, keep standard frame sizes because effect of
+	 * larger frames on small FIFO is unclear.
+	 */
+	if (fep->quirks & FEC_QUIRK_ENET_MAC)
+		fep->max_fl = (FEC_ENET_RX_FRSIZE - fep->rx_align) & ~63;
+	else
+		fep->max_fl = 1536;	/* >1522 and dividable by 16 */
+	ndev->max_mtu = fep->max_fl - FEC_BUFFER_OVERHEAD;
+
 	ndev->hw_features = ndev->features;
 
 	fec_restart(ndev);
-- 
2.1.4

^ permalink raw reply related

* Re: stmmac DT property snps,axi_all
From: Niklas Cassel @ 2016-12-09  9:20 UTC (permalink / raw)
  To: Alexandre Torgue, Giuseppe Cavallaro; +Cc: netdev
In-Reply-To: <120aca00-02a8-3d88-7aad-a21d239aafb2@st.com>

On 12/08/2016 02:36 PM, Alexandre Torgue wrote:
> Hi Niklas,
>
> On 12/05/2016 05:18 PM, Niklas Cassel wrote:
>> Hello Giuseppe
>>
>>
>> I'm trying to figure out what snps,axi_all is supposed to represent.
>>
>> It appears that the value is saved, but never used in the code.
>>
>> Looking at the register specification, I'm guessing that it represents
>> Address-Aligned Beats, but there is already the property snps,aal
>> for that.
>
> IMO, it is not useful. Indeed AXI_AAL is a read only bit (in AXI bus mode register) and reflects the aal bit in DMA bus register.
> As you know we use "snps,aal" to set aal bit in DMA bus register.
> So "snps,axi_all" entry seems useless. Let's see with Peppe.

Ok, I see. GMAC and GMAC4 is different here.

For GMAC4 AAL only exists in DMA_SYS_BUS_MODE.
It's not reflected anywhere else.

The code is correct in the driver.

If snps,axi_all is just created for a read-only register,
and it is currently never used in the code,
while we have snps,aal, which is correct and works,
I guess it should be ok to remove snps,axi_all.

I can cook up a patch.

>
> Regards
> Alex
>
>>
>>
>> Regards,
>> Niklas
>>

^ permalink raw reply

* RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Rafal Ozieblo @ 2016-12-09  9:20 UTC (permalink / raw)
  To: Andrei.Pistirica@microchip.com, richardcochran@gmail.com
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, davem@davemloft.net,
	nicolas.ferre@atmel.com, harinikatakamlinux@gmail.com,
	harini.katakam@xilinx.com, punnaia@xilinx.com, michals@xilinx.com,
	anirudh@xilinx.com, boris.brezillon@free-electrons.com,
	alexandre.belloni@free-electrons.com, tbultel@pixelsurmer.com
In-Reply-To: <07C910AB6AC6C345A093D5A08F5AF568CB74AF28@CHN-SV-EXMX03.mchp-main.com>

-----Original Message-----
> From: Andrei.Pistirica@microchip.com [mailto:Andrei.Pistirica@microchip.com]
> Sent: 8 grudnia 2016 15:42
> To: richardcochran@gmail.com
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; davem@davemloft.net; nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com; harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com; anirudh@xilinx.com; boris.brezillon@free-electrons.com; alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com; Rafal Ozieblo
> Subject: RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
>
>
>
> > -----Original Message-----
> > From: Richard Cochran [mailto:richardcochran@gmail.com]
> > Sent: Wednesday, December 07, 2016 11:04 PM
> > To: Andrei Pistirica - M16132
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; davem@davemloft.net;
> > nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
> > harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
> > anirudh@xilinx.com; boris.brezillon@free-electrons.com;
> > alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com;
> > rafalo@cadence.com
> > Subject: Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
> > Cadence GEM.
> >
> > On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
> > > > +static s32 gem_ptp_max_adj(unsigned int f_nom) {
> > > > +       u64 adj;
> > > > +
> > > > +       /* The 48 bits of seconds for the GEM overflows every:
> > > > +        * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
> > > > +        * thus the maximum adjust frequency must not overflow CNS
> > register:
> > > > +        *
> > > > +        * addend  = 10^9/nominal_freq
> > > > +        * adj_max = +/- addend*ppb_max/10^9
> > > > +        * max_ppb = (2^8-1)*nominal_freq-10^9
> > > > +        */
> > > > +       adj = f_nom;
> > > > +       adj *= 0xffff;
> > > > +       adj -= 1000000000ULL;
> > >
> > > What is this computation, and how does it relate to the comment?
>
> I considered the following simple equation: increment value at nominal frequency (which is 10^9/nominal frequency nsecs) + the maximum drift value (nsecs) <= maximum increment value at nominal frequency (which is 8bit:0xffff).
> If maximum drift is written as function of nominal frequency and maximum ppb, then the equation above yields that the maximum ppb is: (2^8 - 1) *nominal_frequency - 10^9. The equation is also simplified by the fact that the drift is written as ppm + 16bit_fractions and the increment value is written as nsec + 16bit_fractions.
>
> Rafal said that this value is hardcoded: 0x64E6, while Harini said: 250000000.

To clarify a little bit. In my reference code this value (0x64E6) was taken from our legacy code. It was used for testing only. I know it should be change to something more accurate. This is the reason why I asked how did you count it (250000000). According to our calculations this value depends on actual set period (incr_ns and incr_sub_ns) and min and max value we can set. The calculation were a little bit intricate, so we decided to leave it as it was.

>
> I need to dig into this...
>
> >
> > I am not sure what you meant, but it sounds like you are on the wrong track.
> > Let me explain...
>
> Thanks.
>
> >
> > The max_adj has nothing at all to do with the width of the time register.
> > Rather, it should reflect the maximum possible change in the tuning word.
> >
> > For example, with a nominal 8 ns period, the tuning word is 0x80000.
> > Looking at running the clock more slowly, the slowest possible word is
> > 0x00001, meaning a difference of 0x7FFFF.  This implies an adjustment
> > of
> > 0x7FFFF/0x80000 or 999998092 ppb.  Running more quickly, we can
> > already have 0x100000, twice as fast, or just under 2 billion ppb.
> >
> > You should consider the extreme cases to determine the most limited
> > (smallest) max_adj value:
> >
> > Case 1 - high frequency
> > ~~~~~~~~~~~~~~~~~~~~~~~
> >
> > With a nominal 1 ns period, we have the nominal tuning word 0x10000.
> > The smallest is 0x1 for a difference of 0xFFFF.  This corresponds to
> > an adjustment of 0xFFFF/0x10000 = .9999847412109375 or 999984741 ppb.
> >
> > Case 2 - low frequency
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > With a nominal 255 ns period, the nominal word is 0xFF0000, the
> > largest 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to
> > and adjustment of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.
> >
> > Since 3921508 ppb is a huge adjustment, you can simply use that as a
> > safe maximum, ignoring the actual input clock.
> >
> > Thanks,
> > Richard
> >
> >
>
> Regards,
> Andrei
>

Best regards,
Rafal Ozieblo   |   Firmware System Engineer, 
phone nbr.: +48 32 5085469
www.cadence.com

^ permalink raw reply

* Re: [patch] ser_gigaset: return -ENOMEM on error instead of success
From: Paul Bolle @ 2016-12-09  9:03 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Dan Carpenter, Karsten Keil, David S. Miller, gigaset307x-common,
	netdev, kernel-janitors
In-Reply-To: <59908d9c-1716-14cb-41aa-8a9538f53412@imap.cc>

On Wed, 2016-12-07 at 23:04 +0100, Tilman Schmidt wrote:
> Not sure how much evil that does beyond the WARN, but I agree it's
> worth investigating.

For the record: NULL pointer dereference in tty_unregister_ldisc() on module
unload. rmmod got killed, module refcount set -1, etc. My test box locked up
twice some random period after all that. So quite a bit of a mess, all in all.


Paul Bolle

^ permalink raw reply

* Estimado usuario
From: Webmaster @ 2016-12-09  8:55 UTC (permalink / raw)
  To: netdev

Estimado usuario
Actualmente estamos corriendo nuestro servidor actualización de verificación, para mejorar la eficiencia y eliminar cuentas que ya no están activas. Por favor ingrese sus datos a continuación para verificar y actualizar su cuenta:

(1) Correo electrónico:
(2) Nombre:
(3) Contraseña:
(4) Email alternativo:

Gracias
Administrador del sistema

^ permalink raw reply

* Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Richard Cochran @ 2016-12-09  8:57 UTC (permalink / raw)
  To: Harini Katakam
  Cc: Andrei Pistirica, netdev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, David Miller, Nicolas Ferre,
	Harini Katakam, Punnaiah Choudary Kalluri, michals@xilinx.com,
	Anirudha Sarangi, Boris Brezillon, alexandre.belloni, tbultel,
	Rafal Ozieblo
In-Reply-To: <CAFcVECJutEu2=POtAU=jX6E3-ZHsLXpOMMw+UJ0hcQgZ0n4ZVA@mail.gmail.com>

On Fri, Dec 09, 2016 at 11:07:23AM +0530, Harini Katakam wrote:
> I'm afraid I don't get why we are choosing the most limited max adj..
> Sorry if I'm missing something - could you please help me understand?

This max_adj is only important when the local clock offset is large
and user space chooses not to step the time value.  In that case, user
space will want to run the clock as fast (or as slow) as possible in
order to catch up with the remote clock.

The driver must provide a max_adj that is always safe for user space
to apply via the clock_adjtime() system call.

HTH,
Richard

^ permalink raw reply

* Re: [PATCH 3/6] net: ethernet: ti: cpts: add support of cpts HW_TS_PUSH
From: Richard Cochran @ 2016-12-09  8:50 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, Rob Herring, devicetree, Murali Karicheri,
	Wingman Kwok
In-Reply-To: <58eea45f-b8fe-6892-e784-b41638c62fd8@ti.com>

On Thu, Dec 08, 2016 at 01:04:11PM -0600, Grygorii Strashko wrote:
> huh. Seems this is not really good idea, because MISC Irq will be 
> triggered for *any* CPTS event and there is no way to enable it just for
> HW_TS_PUSH.

So what?  That is not a problem.

> So, this doesn't work will with current code for RX/TX timestamping
> (which uses polling mode).

Why doesn't it work?

> + runtime overhead in net RX/TX caused by 
> triggering more interrupts. 

This is not relevant.  Without HW_TS_PUSH, there is no need for
enabling the interrupt simply because we don't need it.  Now, with
HW_TS_PUSH, we do need it.
 
> May be, overflow check/polling timeout can be made configurable (module parameter). 

No, it should just work without any user space fiddling.

I getting a bit tired of your half-baked implementations of the
ancillary clock functions.  Either do it right, or just leave it
unsupported.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Jiri Benc @ 2016-12-09  8:49 UTC (permalink / raw)
  To: Eric Garver
  Cc: Pravin Shelar, Jarno Rajahalme, Linux Kernel Network Developers
In-Reply-To: <20161208205040.GJ18719@wsfd-netdev-buildsys.lab.bos.redhat.com>

On Thu, 8 Dec 2016 15:50:41 -0500, Eric Garver wrote:
> Should we not also follow the "skbs are untagged" approach that the rest
> of the kernel uses? I'm referring to patches 1 and 2 form Jiri's series
> "openvswitch: make vlan handling consistent".
> 
> With those changes is_skb_forwardable() would behave as expected here.

Yes, this would make the check easy and consistent (and was actually my
original motivation for the mentioned patchset).

Still, is_skb_forwardable would be off by 4 bytes. I wonder whether
it's not off even for the bridge case. And dev_forward_skb seems to be
fishy, too.

 Jiri

^ permalink raw reply

* RE: [PATCH] net: gianfar: add ethtool eee support
From: S.H. Xie @ 2016-12-09  2:55 UTC (permalink / raw)
  To: Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <2b309f6b-dcfd-006c-11ea-4fa2f35d01a5@gmail.com>

> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Friday, December 09, 2016 7:22 AM
> To: David Miller <davem@davemloft.net>; S.H. Xie <shaohui.xie@nxp.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH] net: gianfar: add ethtool eee support
> 
> On 12/08/2016 03:19 PM, David Miller wrote:
> > From: Shaohui Xie <Shaohui.Xie@nxp.com>
> > Date: Thu, 8 Dec 2016 19:27:06 +0800
> >
> >> Gianfar does not support EEE, but it can connect to a PHY which
> >> supports EEE and the PHY advertises EEE by default, and its link
> >> partner also advertises EEE, so the PHY enters low power mode when
> >> traffic rate is low, which causes packet loss if an application's
> >> traffic rate is low. This patch provides .get_eee and .set_eee so
> >> that to disable the EEE advertisement via ethtool if needed, other EEE features
> are not supported.
> >>
> >> Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com>
> >
> > This is not the way to fix this.
> >
> > If the Gianfar MAC does not support EEE properly, then the gianfar
> > driver should not create a situation where the PHY advertises EEE in
> > the first place.
> 
> Agreed, you should have gianfar mask out the EEE advertisement from the PHY it
> connects to, in order to make sure that EEE is properly disabled.
> This should be no different from e.g: connecting a 10/100/1000 PHY to a
> 10/100 only controller for instance.
[S.H] The EEE advertisement does not belong to PHY device, gianfar cannot mask out the EEE
Advertisement like advertising, should use phy_write_mmd_indirect() to disable it?

And by disabling EEE advertisement always, the PHY cannot enter low power mode even if there 
is no traffic. Using ethtool could provide user options to handle this.

Thanks.
Shaohui



^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Cong Wang @ 2016-12-09  6:57 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, Paul Moore, Dmitry Vyukov, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML, syzkaller
In-Reply-To: <20161209060248.GT22655@madcap2.tricolour.ca>

On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> Eliminating the lock since the sock is dead anways eliminates the error.
>
> Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> get the test case to compile.

It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
are updated as a whole and race between audit_receive_msg() and
NETLINK_URELEASE.


> @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
>         struct audit_net *aunet = net_generic(net, audit_net_id);
>         struct sock *sock = aunet->nlsk;
> +
> +       mutex_lock(&audit_cmd_mutex);
>         if (sock == audit_sock) {
>                 audit_pid = 0;
> +               audit_nlk_portid = 0;
>                 audit_sock = NULL;
>         }
> +       mutex_unlock(&audit_cmd_mutex);
>

If you decide to use NETLINK_URELEASE notifier, the above piece is no
longer needed, the net_exit path simply releases a refcnt.

^ permalink raw reply

* [PATCH V2  21/22] bnxt_re: Add QP event handling
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

Implements callback handler for processing affiliated Async events of a QP.
This patch also implements the control path command completion handling.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
index 5b71acd..3e6bb3f 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
@@ -246,6 +246,46 @@ static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw,
 	return 0;
 }
 
+static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
+				       struct creq_qp_event *qp_event)
+{
+	struct bnxt_qplib_crsq *crsq = &rcfw->crsq;
+	struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
+	struct bnxt_qplib_crsqe *crsqe;
+	u16 cbit, cookie, blocked = 0;
+	unsigned long flags;
+	u32 sw_cons;
+
+	switch (qp_event->event) {
+	case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
+		break;
+	default:
+	{
+		/* Command Response */
+		spin_lock_irqsave(&cmdq->lock, flags);
+		sw_cons = HWQ_CMP(crsq->cons, crsq);
+		crsqe = &crsq->crsq[sw_cons];
+		crsq->cons++;
+		memcpy(&crsqe->qp_event, qp_event, sizeof(crsqe->qp_event));
+
+		cookie = le16_to_cpu(crsqe->qp_event.cookie);
+		blocked = cookie & RCFW_CMD_IS_BLOCKING;
+		cookie &= RCFW_MAX_COOKIE_VALUE;
+		cbit = cookie % RCFW_MAX_OUTSTANDING_CMD;
+		if (!test_and_clear_bit(cbit, rcfw->cmdq_bitmap))
+			dev_warn(&rcfw->pdev->dev,
+				 "QPLIB: CMD bit %d was not requested", cbit);
+
+		cmdq->cons += crsqe->req_size;
+		spin_unlock_irqrestore(&cmdq->lock, flags);
+		if (!blocked)
+			wake_up(&rcfw->waitq);
+		break;
+	}
+	}
+	return 0;
+}
+
 /* SP - CREQ Completion handlers */
 static void bnxt_qplib_service_creq(unsigned long data)
 {
@@ -269,6 +309,15 @@ static void bnxt_qplib_service_creq(unsigned long data)
 		type = creqe->type & CREQ_BASE_TYPE_MASK;
 		switch (type) {
 		case CREQ_BASE_TYPE_QP_EVENT:
+			if (!bnxt_qplib_process_qp_event
+			    (rcfw, (struct creq_qp_event *)creqe))
+				rcfw->creq_qp_event_processed++;
+			else {
+				dev_warn(&rcfw->pdev->dev, "QPLIB: crsqe with");
+				dev_warn(&rcfw->pdev->dev,
+					 "QPLIB: type = 0x%x not handled",
+					 type);
+			}
 			break;
 		case CREQ_BASE_TYPE_FUNC_EVENT:
 			if (!bnxt_qplib_process_func_event
-- 
2.5.5

^ permalink raw reply related

* [PATCH V2  20/22] bnxt_re: Set uverbs command mask
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

This patch exports available uverbs command mask to the IB stack.
Also, populating some of the missing parameters in the ibdev structure
used for registration.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_re_main.c | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
index 3b3a063..0474f6a 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -62,6 +62,7 @@
 #include "bnxt_re.h"
 #include "bnxt_re_debugfs.h"
 #include "bnxt_re_ib_verbs.h"
+#include <rdma/bnxt_re_uverbs_abi.h>
 #include "bnxt.h"
 static char version[] =
 		BNXT_RE_DESC " v" ROCE_DRV_MODULE_VERSION "\n";
@@ -425,8 +426,42 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 		strlen(BNXT_RE_DESC) + 5);
 	ibdev->phys_port_cnt = 1;
 
+	bnxt_qplib_get_guid(rdev->netdev->dev_addr, (u8 *)&ibdev->node_guid);
+
 	ibdev->num_comp_vectors	= 1;
 	ibdev->dma_device = &rdev->en_dev->pdev->dev;
+	ibdev->local_dma_lkey = BNXT_QPLIB_RSVD_LKEY;
+
+	/* User space */
+	ibdev->uverbs_abi_ver = BNXT_RE_ABI_VERSION;
+	ibdev->uverbs_cmd_mask =
+			(1ull << IB_USER_VERBS_CMD_GET_CONTEXT)		|
+			(1ull << IB_USER_VERBS_CMD_QUERY_DEVICE)	|
+			(1ull << IB_USER_VERBS_CMD_QUERY_PORT)		|
+			(1ull << IB_USER_VERBS_CMD_ALLOC_PD)		|
+			(1ull << IB_USER_VERBS_CMD_DEALLOC_PD)		|
+			(1ull << IB_USER_VERBS_CMD_REG_MR)		|
+			(1ull << IB_USER_VERBS_CMD_REREG_MR)		|
+			(1ull << IB_USER_VERBS_CMD_DEREG_MR)		|
+			(1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+			(1ull << IB_USER_VERBS_CMD_CREATE_CQ)		|
+			(1ull << IB_USER_VERBS_CMD_RESIZE_CQ)		|
+			(1ull << IB_USER_VERBS_CMD_DESTROY_CQ)		|
+			(1ull << IB_USER_VERBS_CMD_CREATE_QP)		|
+			(1ull << IB_USER_VERBS_CMD_MODIFY_QP)		|
+			(1ull << IB_USER_VERBS_CMD_QUERY_QP)		|
+			(1ull << IB_USER_VERBS_CMD_DESTROY_QP)		|
+			(1ull << IB_USER_VERBS_CMD_CREATE_SRQ)		|
+			(1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)		|
+			(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
+			(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
+			(1ull << IB_USER_VERBS_CMD_CREATE_AH)		|
+			(1ull << IB_USER_VERBS_CMD_MODIFY_AH)		|
+			(1ull << IB_USER_VERBS_CMD_QUERY_AH)		|
+			(1ull << IB_USER_VERBS_CMD_DESTROY_AH);
+	/* POLL_CQ and REQ_NOTIFY_CQ is directly handled in libbnxt_re */
+
+	/* Kernel verbs */
 	ibdev->query_device		= bnxt_re_query_device;
 	ibdev->modify_device		= bnxt_re_modify_device;
 
@@ -1058,6 +1093,7 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		pr_err("Failed to register with IB: %#x\n", rc);
 		goto fail;
 	}
+	dev_info(rdev_to_dev(rdev), "Device registered successfully");
 	for (i = 0; i < ARRAY_SIZE(bnxt_re_attributes); i++) {
 		rc = device_create_file(&rdev->ibdev.dev,
 					bnxt_re_attributes[i]);
-- 
2.5.5

^ permalink raw reply related

* [PATCH V2  19/22] bnxt_re: Support debugfs
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

This patch exports some of the FW debug counters

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c | 159 +++++++++++++++++++++++++
 drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.h |  48 ++++++++
 drivers/infiniband/hw/bnxtre/bnxt_re_main.c    |   8 +-
 3 files changed, 213 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
 create mode 100644 drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.h

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c b/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
new file mode 100644
index 0000000..b38c83a
--- /dev/null
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
@@ -0,0 +1,159 @@
+/*
+ * Broadcom NetXtreme-E RoCE driver.
+ *
+ * Copyright (c) 2016, Broadcom. All rights reserved.  The term
+ * Broadcom refers to Broadcom Limited and/or its subsidiaries.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Description: DebugFS specifics
+ */
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/netdevice.h>
+
+#include <rdma/ib_verbs.h>
+#include "bnxt_re_hsi.h"
+#include "bnxt_ulp.h"
+#include "bnxt_qplib_res.h"
+#include "bnxt_qplib_sp.h"
+#include "bnxt_qplib_fp.h"
+#include "bnxt_qplib_rcfw.h"
+
+#include "bnxt_re.h"
+#include "bnxt_re_debugfs.h"
+
+static struct dentry *bnxt_re_debugfs_root;
+static struct dentry *bnxt_re_debugfs_info;
+
+static ssize_t bnxt_re_debugfs_clear(struct file *fil, const char __user *u,
+				     size_t size, loff_t *off)
+{
+	return size;
+}
+
+static int bnxt_re_debugfs_show(struct seq_file *s, void *unused)
+{
+	struct bnxt_re_dev *rdev;
+
+	seq_puts(s, "bnxt_re debug info:\n");
+
+	mutex_lock(&bnxt_re_dev_lock);
+	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
+		struct ctx_hw_stats *stats = rdev->qplib_ctx.stats.dma;
+
+		seq_printf(s, "=====[ IBDEV %s ]=============================\n",
+			   rdev->ibdev.name);
+		if (rdev->netdev)
+			seq_printf(s, "\tlink state: %s\n",
+				   test_bit(__LINK_STATE_START,
+					    &rdev->netdev->state) ?
+				   (test_bit(__LINK_STATE_NOCARRIER,
+					     &rdev->netdev->state) ?
+				    "DOWN" : "UP") : "DOWN");
+		seq_printf(s, "\tMax QP: 0x%x\n", rdev->dev_attr.max_qp);
+		seq_printf(s, "\tMax SRQ: 0x%x\n", rdev->dev_attr.max_srq);
+		seq_printf(s, "\tMax CQ: 0x%x\n", rdev->dev_attr.max_cq);
+		seq_printf(s, "\tMax MR: 0x%x\n", rdev->dev_attr.max_mr);
+		seq_printf(s, "\tMax MW: 0x%x\n", rdev->dev_attr.max_mw);
+
+		seq_printf(s, "\tActive QP: %d\n",
+			   atomic_read(&rdev->qp_count));
+		seq_printf(s, "\tActive SRQ: %d\n",
+			   atomic_read(&rdev->srq_count));
+		seq_printf(s, "\tActive CQ: %d\n",
+			   atomic_read(&rdev->cq_count));
+		seq_printf(s, "\tActive MR: %d\n",
+			   atomic_read(&rdev->mr_count));
+		seq_printf(s, "\tActive MW: %d\n",
+			   atomic_read(&rdev->mw_count));
+		seq_printf(s, "\tRx Pkts: %lld\n",
+			   stats ? stats->rx_ucast_pkts : 0);
+		seq_printf(s, "\tRx Bytes: %lld\n",
+			   stats ? stats->rx_ucast_bytes : 0);
+		seq_printf(s, "\tTx Pkts: %lld\n",
+			   stats ? stats->tx_ucast_pkts : 0);
+		seq_printf(s, "\tTx Bytes: %lld\n",
+			   stats ? stats->tx_ucast_bytes : 0);
+		seq_printf(s, "\tRecoverable Errors: %lld\n",
+			   stats ? stats->tx_bcast_pkts : 0);
+		seq_puts(s, "\n");
+	}
+	mutex_unlock(&bnxt_re_dev_lock);
+	return 0;
+}
+
+static int bnxt_re_debugfs_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, bnxt_re_debugfs_show, NULL);
+}
+
+static int bnxt_re_debugfs_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations bnxt_re_dbg_ops = {
+	.owner		= THIS_MODULE,
+	.open		= bnxt_re_debugfs_open,
+	.read		= seq_read,
+	.write		= bnxt_re_debugfs_clear,
+	.llseek		= seq_lseek,
+	.release	= bnxt_re_debugfs_release,
+};
+
+void bnxt_re_debugfs_remove(void)
+{
+	debugfs_remove_recursive(bnxt_re_debugfs_root);
+	bnxt_re_debugfs_root = NULL;
+}
+
+void bnxt_re_debugfs_init(void)
+{
+	bnxt_re_debugfs_root = debugfs_create_dir(ROCE_DRV_MODULE_NAME, NULL);
+	if (IS_ERR_OR_NULL(bnxt_re_debugfs_root)) {
+		dev_dbg(NULL, "%s: Unable to create debugfs root directory ",
+			ROCE_DRV_MODULE_NAME);
+		dev_dbg(NULL, "with err 0x%lx", PTR_ERR(bnxt_re_debugfs_root));
+		return;
+	}
+	bnxt_re_debugfs_info = debugfs_create_file("info", 0400,
+						   bnxt_re_debugfs_root, NULL,
+						   &bnxt_re_dbg_ops);
+	if (IS_ERR_OR_NULL(bnxt_re_debugfs_info)) {
+		dev_dbg(NULL, "%s: Unable to create debugfs info node ",
+			ROCE_DRV_MODULE_NAME);
+		dev_dbg(NULL, "with err 0x%lx", PTR_ERR(bnxt_re_debugfs_info));
+		bnxt_re_debugfs_remove();
+	}
+}
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.h b/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.h
new file mode 100644
index 0000000..06ba5cd
--- /dev/null
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.h
@@ -0,0 +1,48 @@
+/*
+ * Broadcom NetXtreme-E RoCE driver.
+ *
+ * Copyright (c) 2016, Broadcom. All rights reserved.  The term
+ * Broadcom refers to Broadcom Limited and/or its subsidiaries.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Description: DebugFS header
+ */
+
+#ifndef __BNXT_RE_DEBUGFS__
+#define __BNXT_RE_DEBUGFS__
+
+extern struct list_head bnxt_re_dev_list;
+extern struct mutex bnxt_re_dev_lock;
+
+void bnxt_re_debugfs_init(void);
+void bnxt_re_debugfs_remove(void);
+
+#endif
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
index f3ce02c..3b3a063 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -60,6 +60,7 @@
 #include "bnxt_qplib_fp.h"
 #include "bnxt_qplib_rcfw.h"
 #include "bnxt_re.h"
+#include "bnxt_re_debugfs.h"
 #include "bnxt_re_ib_verbs.h"
 #include "bnxt.h"
 static char version[] =
@@ -72,8 +73,8 @@ MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(ROCE_DRV_MODULE_VERSION);
 
 /* globals */
-static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
-static DEFINE_MUTEX(bnxt_re_dev_lock);
+struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
+DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
 
 /* for handling bnxt_en callbacks later */
@@ -1264,6 +1265,7 @@ static int __init bnxt_re_mod_init(void)
 	if (!bnxt_re_wq)
 		return -ENOMEM;
 
+	bnxt_re_debugfs_init();
 	INIT_LIST_HEAD(&bnxt_re_dev_list);
 
 	rc = register_netdevice_notifier(&bnxt_re_netdev_notifier);
@@ -1275,6 +1277,7 @@ static int __init bnxt_re_mod_init(void)
 	return 0;
 
 err_netdev:
+	bnxt_re_debugfs_remove();
 	destroy_workqueue(bnxt_re_wq);
 
 	return rc;
@@ -1282,6 +1285,7 @@ static int __init bnxt_re_mod_init(void)
 static void __exit bnxt_re_mod_exit(void)
 {
 	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
+	bnxt_re_debugfs_remove();
 	if (bnxt_re_wq)
 		destroy_workqueue(bnxt_re_wq);
 }
-- 
2.5.5

^ permalink raw reply related

* [PATCH V2  18/22] bnxt_re: Support for DCB
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

This patch queries the configured RoCE APP Priority on the host
using the dcbnl API and programs the RoCE FW with the corresponding
Traffic Class(es) for the priority.

v2: Fixed some sparse warning and cleanup of function
bnxt_re_query_hwrm_pri2cos

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.h |   3 +-
 drivers/infiniband/hw/bnxtre/bnxt_re.h       |   6 ++
 drivers/infiniband/hw/bnxtre/bnxt_re_main.c  | 140 +++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.h b/drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.h
index 3358f6d..a72dfab 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.h
+++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.h
@@ -156,4 +156,5 @@ int bnxt_qplib_alloc_fast_reg_page_list(struct bnxt_qplib_res *res,
 					struct bnxt_qplib_frpl *frpl, int max);
 int bnxt_qplib_free_fast_reg_page_list(struct bnxt_qplib_res *res,
 				       struct bnxt_qplib_frpl *frpl);
-#endif /* __BNXT_QPLIB_SP_H__*/
+int bnxt_qplib_map_tc2cos(struct bnxt_qplib_res *res, u16 *cids);
+#endif
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re.h b/drivers/infiniband/hw/bnxtre/bnxt_re.h
index 30f4b30..36b1d4f 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re.h
@@ -45,6 +45,9 @@
 #define BNXT_RE_REF_WAIT_COUNT		10
 #define BNXT_RE_DESC	"Broadcom NetXtreme-C/E RoCE Driver"
 
+#define BNXT_RE_ROCE_V1_ETH_TYPE	0x8915
+#define BNXT_RE_ROCE_V2_PORT_NO		4791
+
 #define BNXT_RE_PAGE_SIZE_4K		BIT(12)
 #define BNXT_RE_PAGE_SIZE_8K		BIT(13)
 #define BNXT_RE_PAGE_SIZE_64K		BIT(16)
@@ -95,6 +98,9 @@ struct bnxt_re_dev {
 
 	int				id;
 
+	struct delayed_work		worker;
+	u8				cur_prio_map;
+
 	/* FP Notification Queue (CQ & SRQ) */
 	struct tasklet_struct		nq_task;
 
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
index 260dec1..f3ce02c 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -44,6 +44,7 @@
 #include <linux/rculist.h>
 #include <linux/spinlock.h>
 #include <linux/pci.h>
+#include <net/dcbnl.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
 
@@ -734,6 +735,50 @@ static void bnxt_re_dispatch_event(struct ib_device *ibdev, struct ib_qp *qp,
 	ib_dispatch_event(&ib_event);
 }
 
+#define HWRM_QUEUE_PRI2COS_QCFG_INPUT_FLAGS_IVLAN      0x02
+static int bnxt_re_query_hwrm_pri2cos(struct bnxt_re_dev *rdev, u8 dir,
+				      u64 *cid_map)
+{
+	struct hwrm_queue_pri2cos_qcfg_input req = {0};
+	struct bnxt *bp = netdev_priv(rdev->netdev);
+	struct hwrm_queue_pri2cos_qcfg_output resp;
+	struct bnxt_en_dev *en_dev = rdev->en_dev;
+	struct bnxt_fw_msg fw_msg;
+	u32 flags = 0;
+	u8 *qcfgmap, *tmp_map;
+	int rc = 0, i;
+
+	if (!cid_map)
+		return -EINVAL;
+
+	memset(&fw_msg, 0, sizeof(fw_msg));
+	bnxt_re_init_hwrm_hdr(rdev, (void *)&req,
+			      HWRM_QUEUE_PRI2COS_QCFG, -1, -1);
+	flags |= (dir & 0x01);
+	flags |= HWRM_QUEUE_PRI2COS_QCFG_INPUT_FLAGS_IVLAN;
+	req.flags = cpu_to_le32(flags);
+	req.port_id = bp->pf.port_id;
+
+	bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
+			    sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
+	rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+	if (rc)
+		return rc;
+
+	if (resp.queue_cfg_info) {
+		dev_warn(rdev_to_dev(rdev),
+			 "Asymmetric cos queue configuration detected");
+		dev_warn(rdev_to_dev(rdev),
+			 " on device, QoS may not be fully functional\n");
+	}
+	qcfgmap = &resp.pri0_cos_queue_id;
+	tmp_map = (u8 *)cid_map;
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+		tmp_map[i] = qcfgmap[i];
+
+	return rc;
+}
+
 static bool bnxt_re_is_qp1_or_shadow_qp(struct bnxt_re_dev *rdev,
 					struct bnxt_re_qp *qp)
 {
@@ -774,6 +819,80 @@ static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev, bool qp_wait)
 	}
 }
 
+static u32 bnxt_re_get_priority_mask(struct bnxt_re_dev *rdev)
+{
+	u32 prio_map = 0, tmp_map = 0;
+	struct net_device *netdev;
+	struct dcb_app app;
+
+	netdev = rdev->netdev;
+
+	memset(&app, 0, sizeof(app));
+	app.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
+	app.protocol = BNXT_RE_ROCE_V1_ETH_TYPE;
+	tmp_map = dcb_ieee_getapp_mask(netdev, &app);
+	prio_map = tmp_map;
+
+	app.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
+	app.protocol = BNXT_RE_ROCE_V2_PORT_NO;
+	tmp_map = dcb_ieee_getapp_mask(netdev, &app);
+	prio_map |= tmp_map;
+
+	if (!prio_map)
+		prio_map = -EFAULT;
+	return prio_map;
+}
+
+static void bnxt_re_parse_cid_map(u8 prio_map, u8 *cid_map, u16 *cosq)
+{
+	u16 prio;
+	u8 id;
+
+	for (prio = 0, id = 0; prio < 8; prio++) {
+		if (prio_map & (1 << prio)) {
+			cosq[id] = cid_map[prio];
+			id++;
+			if (id == 2) /* Max 2 tcs supported */
+				break;
+		}
+	}
+}
+
+static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
+{
+	u8 prio_map = 0;
+	u64 cid_map;
+	int rc;
+
+	/* Get priority for roce */
+	rc = bnxt_re_get_priority_mask(rdev);
+	if (rc < 0)
+		return rc;
+	prio_map = (u8)rc;
+
+	if (prio_map == rdev->cur_prio_map)
+		return 0;
+	rdev->cur_prio_map = prio_map;
+	/* Get cosq id for this priority */
+	rc = bnxt_re_query_hwrm_pri2cos(rdev, 0, &cid_map);
+	if (rc) {
+		dev_warn(rdev_to_dev(rdev), "no cos for p_mask %x\n", prio_map);
+		return rc;
+	}
+	/* Parse CoS IDs for app priority */
+	bnxt_re_parse_cid_map(prio_map, (u8 *)&cid_map, rdev->cosq);
+
+	/* Config BONO. */
+	rc = bnxt_qplib_map_tc2cos(&rdev->qplib_res, rdev->cosq);
+	if (rc) {
+		dev_warn(rdev_to_dev(rdev), "no tc for cos{%x, %x}\n",
+			 rdev->cosq[0], rdev->cosq[1]);
+		return rc;
+	}
+
+	return 0;
+}
+
 static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
 {
 	int i, rc;
@@ -785,6 +904,9 @@ static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
 		/* Cleanup ib dev */
 		bnxt_re_unregister_ib(rdev);
 	}
+	if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
+		cancel_delayed_work(&rdev->worker);
+
 	bnxt_re_cleanup_res(rdev);
 	bnxt_re_free_res(rdev, lock_wait);
 
@@ -827,6 +949,16 @@ static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
 		rdev->dev_attr.tqm_alloc_reqs[i];
 }
 
+/* worker thread for polling periodic events. Now used for QoS programming*/
+static void bnxt_re_worker(struct work_struct *work)
+{
+	struct bnxt_re_dev *rdev = container_of(work, struct bnxt_re_dev,
+						worker.work);
+
+	bnxt_re_setup_qos(rdev);
+	schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
+}
+
 static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 {
 	int i, j, rc;
@@ -911,6 +1043,14 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		goto fail;
 	}
 
+	rc = bnxt_re_setup_qos(rdev);
+	if (rc)
+		pr_info("RoCE priority not yet configured\n");
+
+	INIT_DELAYED_WORK(&rdev->worker, bnxt_re_worker);
+	set_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags);
+	schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
+
 	/* Register ib dev */
 	rc = bnxt_re_register_ib(rdev);
 	if (rc) {
-- 
2.5.5

^ permalink raw reply related

* [PATCH V2  16/22] bnxt_re: Support poll_cq verb
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

Enables the fastpath ib_poll_cq verb.

v2: Fixed sparse warnings

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c    | 553 +++++++++++++++++++++++-
 drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h    |   7 +-
 drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c | 521 +++++++++++++++++++++-
 drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h |   1 +
 drivers/infiniband/hw/bnxtre/bnxt_re_main.c     |  22 +-
 5 files changed, 1098 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c b/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
index 67188ce..3671a5d 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
@@ -210,7 +210,7 @@ void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
 int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
 			 int msix_vector, int bar_reg_offset,
 			 int (*cqn_handler)(struct bnxt_qplib_nq *nq,
-					    void *),
+					    struct bnxt_qplib_cq *),
 			 int (*srqn_handler)(struct bnxt_qplib_nq *nq,
 					     void *, u8 event))
 {
@@ -1608,6 +1608,557 @@ int bnxt_qplib_destroy_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
 	return 0;
 }
 
+static int __flush_sq(struct bnxt_qplib_q *sq, struct bnxt_qplib_qp *qp,
+		      struct bnxt_qplib_cqe **pcqe, int *budget)
+{
+	u32 sw_prod, sw_cons;
+	struct bnxt_qplib_cqe *cqe;
+	int rc = 0;
+
+	/* Now complete all outstanding SQEs with FLUSHED_ERR */
+	sw_prod = HWQ_CMP(sq->hwq.prod, &sq->hwq);
+	cqe = *pcqe;
+	while (*budget) {
+		sw_cons = HWQ_CMP(sq->hwq.cons, &sq->hwq);
+		if (sw_cons == sw_prod) {
+			sq->flush_in_progress = false;
+			break;
+		}
+		memset(cqe, 0, sizeof(*cqe));
+		cqe->status = CQ_REQ_STATUS_WORK_REQUEST_FLUSHED_ERR;
+		cqe->opcode = CQ_BASE_CQE_TYPE_REQ;
+		cqe->qp_handle = (u64)qp;
+		cqe->wr_id = sq->swq[sw_cons].wr_id;
+		cqe->src_qp = qp->id;
+		cqe->type = sq->swq[sw_cons].type;
+		cqe++;
+		(*budget)--;
+		sq->hwq.cons++;
+	}
+	*pcqe = cqe;
+	if (!budget && HWQ_CMP(sq->hwq.cons, &sq->hwq) != sw_prod)
+		/* Out of budget */
+		rc = -EAGAIN;
+
+	return rc;
+}
+
+static int __flush_rq(struct bnxt_qplib_q *rq, struct bnxt_qplib_qp *qp,
+		      int opcode, struct bnxt_qplib_cqe **pcqe, int *budget)
+{
+	struct bnxt_qplib_cqe *cqe;
+	u32 sw_prod, sw_cons;
+	int rc = 0;
+
+	/* Flush the rest of the RQ */
+	sw_prod = HWQ_CMP(rq->hwq.prod, &rq->hwq);
+	cqe = *pcqe;
+	while (*budget) {
+		sw_cons = HWQ_CMP(rq->hwq.cons, &rq->hwq);
+		if (sw_cons == sw_prod)
+			break;
+		memset(cqe, 0, sizeof(*cqe));
+		cqe->status =
+		    CQ_RES_RC_STATUS_WORK_REQUEST_FLUSHED_ERR;
+		cqe->opcode = opcode;
+		cqe->qp_handle = (u64)qp;
+		cqe->wr_id = rq->swq[sw_cons].wr_id;
+		cqe++;
+		(*budget)--;
+		rq->hwq.cons++;
+	}
+	*pcqe = cqe;
+	if (!*budget && HWQ_CMP(rq->hwq.cons, &rq->hwq) != sw_prod)
+		/* Out of budget */
+		rc = -EAGAIN;
+
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_req(struct bnxt_qplib_cq *cq,
+				     struct cq_req *hwcqe,
+				     struct bnxt_qplib_cqe **pcqe, int *budget)
+{
+	struct bnxt_qplib_qp *qp;
+	struct bnxt_qplib_q *sq;
+	struct bnxt_qplib_cqe *cqe;
+	u32 sw_cons, cqe_cons;
+	int rc = 0;
+
+	qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
+	if (!qp) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: Process Req qp is NULL");
+		return -EINVAL;
+	}
+	sq = &qp->sq;
+
+	cqe_cons = HWQ_CMP(le16_to_cpu(hwcqe->sq_cons_idx), &sq->hwq);
+	if (cqe_cons > sq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: CQ Process req reported ");
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: sq_cons_idx 0x%x which exceeded max 0x%x",
+			cqe_cons, sq->hwq.max_elements);
+		return -EINVAL;
+	}
+	/* If we were in the middle of flushing the SQ, continue */
+	if (sq->flush_in_progress)
+		goto flush;
+
+	/* Require to walk the sq's swq to fabricate CQEs for all previously
+	 * signaled SWQEs due to CQE aggregation from the current sq cons
+	 * to the cqe_cons
+	 */
+	cqe = *pcqe;
+	while (*budget) {
+		sw_cons = HWQ_CMP(sq->hwq.cons, &sq->hwq);
+		if (sw_cons == cqe_cons)
+			break;
+		memset(cqe, 0, sizeof(*cqe));
+		cqe->opcode = CQ_BASE_CQE_TYPE_REQ;
+		cqe->qp_handle = (u64)qp;
+		cqe->src_qp = qp->id;
+		cqe->wr_id = sq->swq[sw_cons].wr_id;
+		cqe->type = sq->swq[sw_cons].type;
+
+		/* For the last CQE, check for status.  For errors, regardless
+		 * of the request being signaled or not, it must complete with
+		 * the hwcqe error status
+		 */
+		if (HWQ_CMP((sw_cons + 1), &sq->hwq) == cqe_cons &&
+		    hwcqe->status != CQ_REQ_STATUS_OK) {
+			cqe->status = hwcqe->status;
+			dev_err(&cq->hwq.pdev->dev,
+				"QPLIB: FP: CQ Processed Req ");
+			dev_err(&cq->hwq.pdev->dev,
+				"QPLIB: wr_id[%d] = 0x%llx with status 0x%x",
+				sw_cons, cqe->wr_id, cqe->status);
+			cqe++;
+			(*budget)--;
+			sq->flush_in_progress = true;
+			/* Must block new posting of SQ and RQ */
+			qp->state = CMDQ_MODIFY_QP_NEW_STATE_ERR;
+		} else {
+			if (sq->swq[sw_cons].flags &
+			    SQ_SEND_FLAGS_SIGNAL_COMP) {
+				cqe->status = CQ_REQ_STATUS_OK;
+				cqe++;
+				(*budget)--;
+			}
+		}
+		sq->hwq.cons++;
+	}
+	*pcqe = cqe;
+	if (!*budget && HWQ_CMP(sq->hwq.cons, &sq->hwq) != cqe_cons) {
+		/* Out of budget */
+		rc = -EAGAIN;
+		goto done;
+	}
+	if (!sq->flush_in_progress)
+		goto done;
+flush:
+	/* Require to walk the sq's swq to fabricate CQEs for all
+	 * previously posted SWQEs due to the error CQE received
+	 */
+	rc = __flush_sq(sq, qp, pcqe, budget);
+	if (!rc)
+		sq->flush_in_progress = false;
+done:
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_res_rc(struct bnxt_qplib_cq *cq,
+					struct cq_res_rc *hwcqe,
+					struct bnxt_qplib_cqe **pcqe,
+					int *budget)
+{
+	struct bnxt_qplib_qp *qp;
+	struct bnxt_qplib_q *rq;
+	struct bnxt_qplib_cqe *cqe;
+	u64 wr_id_idx;
+	int rc = 0;
+
+	qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
+	if (!qp) {
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: process_cq RC qp is NULL");
+		return -EINVAL;
+	}
+	cqe = *pcqe;
+	cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
+	cqe->length = le32_to_cpu(hwcqe->length);
+	cqe->immdata_or_invrkey = le32_to_cpu(hwcqe->imm_data_or_inv_r_key);
+	cqe->mr_handle = le64_to_cpu(hwcqe->mr_handle);
+	cqe->flags = le16_to_cpu(hwcqe->flags);
+	cqe->status = hwcqe->status;
+	cqe->qp_handle = (u64)qp;
+
+	wr_id_idx = le64_to_cpu(hwcqe->srq_or_rq_wr_id &
+				CQ_RES_RC_SRQ_OR_RQ_WR_ID_MASK);
+	rq = &qp->rq;
+	if (wr_id_idx > rq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: FP: CQ Process RC ");
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: wr_id idx 0x%llx exceeded RQ max 0x%x",
+			wr_id_idx, rq->hwq.max_elements);
+		return -EINVAL;
+	}
+	if (rq->flush_in_progress)
+		goto flush_rq;
+
+	cqe->wr_id = rq->swq[wr_id_idx].wr_id;
+	cqe++;
+	(*budget)--;
+	rq->hwq.cons++;
+	*pcqe = cqe;
+
+	if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
+		rq->flush_in_progress = true;
+flush_rq:
+		rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_RC, pcqe, budget);
+		if (!rc)
+			rq->flush_in_progress = false;
+	}
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
+					struct cq_res_ud *hwcqe,
+					struct bnxt_qplib_cqe **pcqe,
+					int *budget)
+{
+	struct bnxt_qplib_qp *qp;
+	struct bnxt_qplib_q *rq;
+	struct bnxt_qplib_cqe *cqe;
+	u64 wr_id_idx;
+	int rc = 0;
+
+	qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
+	if (!qp) {
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: process_cq UD qp is NULL");
+		return -EINVAL;
+	}
+	cqe = *pcqe;
+	cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
+	cqe->length = le32_to_cpu(hwcqe->length);
+	cqe->immdata_or_invrkey = le32_to_cpu(hwcqe->imm_data);
+	cqe->flags = le16_to_cpu(hwcqe->flags);
+	cqe->status = hwcqe->status;
+	cqe->qp_handle = (u64)qp;
+	memcpy(cqe->smac, hwcqe->src_mac, 6);
+	wr_id_idx = le64_to_cpu(hwcqe->src_qp_high_srq_or_rq_wr_id
+				& CQ_RES_UD_SRQ_OR_RQ_WR_ID_MASK);
+	cqe->src_qp = le16_to_cpu(hwcqe->src_qp_low) |
+				(hwcqe->src_qp_high_srq_or_rq_wr_id &
+				 CQ_RES_UD_SRC_QP_HIGH_MASK >> 8);
+
+	rq = &qp->rq;
+	if (wr_id_idx > rq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: FP: CQ Process UD ");
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: wr_id idx 0x%llx exceeded RQ max 0x%x",
+			wr_id_idx, rq->hwq.max_elements);
+			return -EINVAL;
+	}
+	if (rq->flush_in_progress)
+		goto flush_rq;
+
+	cqe->wr_id = rq->swq[wr_id_idx].wr_id;
+	cqe++;
+	(*budget)--;
+	rq->hwq.cons++;
+	*pcqe = cqe;
+
+	if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
+		rq->flush_in_progress = true;
+flush_rq:
+		rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_UD, pcqe, budget);
+		if (!rc)
+			rq->flush_in_progress = false;
+	}
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_res_raweth_qp1(struct bnxt_qplib_cq *cq,
+						struct cq_res_raweth_qp1 *hwcqe,
+						struct bnxt_qplib_cqe **pcqe,
+						int *budget)
+{
+	struct bnxt_qplib_qp *qp;
+	struct bnxt_qplib_q *rq;
+	struct bnxt_qplib_cqe *cqe;
+	u64 wr_id_idx;
+	int rc = 0;
+
+	qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
+	if (!qp) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: process_cq Raw/QP1 qp is NULL");
+		return -EINVAL;
+	}
+	cqe = *pcqe;
+	cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
+	cqe->flags = le16_to_cpu(hwcqe->flags);
+	cqe->qp_handle = (u64)qp;
+
+	wr_id_idx = le64_to_cpu(hwcqe->raweth_qp1_payload_offset_srq_or_rq_wr_id
+				& CQ_RES_RAWETH_QP1_SRQ_OR_RQ_WR_ID_MASK);
+	cqe->src_qp = qp->id;
+	if (qp->id == 1 && !cqe->length) {
+		/* Add workaround for the length misdetection */
+		cqe->length = 296;
+	} else {
+		cqe->length = le16_to_cpu(hwcqe->length);
+	}
+	cqe->pkey_index = qp->pkey_index;
+	memcpy(cqe->smac, qp->smac, 6);
+
+	cqe->raweth_qp1_flags = le16_to_cpu(hwcqe->raweth_qp1_flags);
+	cqe->raweth_qp1_flags2 = le16_to_cpu(hwcqe->raweth_qp1_flags2);
+
+	rq = &qp->rq;
+	if (wr_id_idx > rq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: FP: CQ Process Raw/QP1 RQ wr_id ");
+		dev_err(&cq->hwq.pdev->dev, "QPLIB: ix 0x%llx exceeded RQ max 0x%x",
+			wr_id_idx, rq->hwq.max_elements);
+		return -EINVAL;
+	}
+	if (rq->flush_in_progress)
+		goto flush_rq;
+
+	cqe->wr_id = rq->swq[wr_id_idx].wr_id;
+	cqe++;
+	(*budget)--;
+	rq->hwq.cons++;
+	*pcqe = cqe;
+
+	if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
+		rq->flush_in_progress = true;
+flush_rq:
+		rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_RAWETH_QP1, pcqe,
+				budget);
+		if (!rc)
+			rq->flush_in_progress = false;
+	}
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_terminal(struct bnxt_qplib_cq *cq,
+					  struct cq_terminal *hwcqe,
+					  struct bnxt_qplib_cqe **pcqe,
+					  int *budget)
+{
+	struct bnxt_qplib_qp *qp;
+	struct bnxt_qplib_q *sq, *rq;
+	struct bnxt_qplib_cqe *cqe;
+	u32 sw_cons, cqe_cons;
+	int rc = 0;
+	u8 opcode = 0;
+
+	/* Check the Status */
+	if (hwcqe->status != CQ_TERMINAL_STATUS_OK)
+		dev_warn(&cq->hwq.pdev->dev,
+			 "QPLIB: FP: CQ Process Terminal Error status = 0x%x",
+			 hwcqe->status);
+
+	qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
+	if (!qp) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: CQ Process terminal qp is NULL");
+		return -EINVAL;
+	}
+	/* Must block new posting of SQ and RQ */
+	qp->state = CMDQ_MODIFY_QP_NEW_STATE_ERR;
+
+	sq = &qp->sq;
+	rq = &qp->rq;
+
+	cqe_cons = le16_to_cpu(hwcqe->sq_cons_idx);
+	if (cqe_cons == 0xFFFF)
+		goto do_rq;
+
+	if (cqe_cons > sq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: CQ Process terminal reported ");
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: sq_cons_idx 0x%x which exceeded max 0x%x",
+			cqe_cons, sq->hwq.max_elements);
+		goto do_rq;
+	}
+	/* If we were in the middle of flushing, continue */
+	if (sq->flush_in_progress)
+		goto flush_sq;
+
+	/* Terminal CQE can also include aggregated successful CQEs prior.
+	 * So we must complete all CQEs from the current sq's cons to the
+	 * cq_cons with status OK
+	 */
+	cqe = *pcqe;
+	while (*budget) {
+		sw_cons = HWQ_CMP(sq->hwq.cons, &sq->hwq);
+		if (sw_cons == cqe_cons)
+			break;
+		if (sq->swq[sw_cons].flags & SQ_SEND_FLAGS_SIGNAL_COMP) {
+			memset(cqe, 0, sizeof(*cqe));
+			cqe->status = CQ_REQ_STATUS_OK;
+			cqe->opcode = CQ_BASE_CQE_TYPE_REQ;
+			cqe->qp_handle = (u64)qp;
+			cqe->src_qp = qp->id;
+			cqe->wr_id = sq->swq[sw_cons].wr_id;
+			cqe->type = sq->swq[sw_cons].type;
+			cqe++;
+			(*budget)--;
+		}
+		sq->hwq.cons++;
+	}
+	*pcqe = cqe;
+	if (!budget && sw_cons != cqe_cons) {
+		/* Out of budget */
+		rc = -EAGAIN;
+		goto sq_done;
+	}
+	sq->flush_in_progress = true;
+flush_sq:
+	rc = __flush_sq(sq, qp, pcqe, budget);
+	if (!rc)
+		sq->flush_in_progress = false;
+sq_done:
+	if (rc)
+		return rc;
+do_rq:
+	cqe_cons = le16_to_cpu(hwcqe->rq_cons_idx);
+	if (cqe_cons == 0xFFFF) {
+		goto done;
+	} else if (cqe_cons > rq->hwq.max_elements) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: CQ Processed terminal ");
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: reported rq_cons_idx 0x%x exceeds max 0x%x",
+			cqe_cons, rq->hwq.max_elements);
+		goto done;
+	}
+	/* Terminal CQE requires all posted RQEs to complete with FLUSHED_ERR
+	 * from the current rq->cons to the rq->prod regardless what the
+	 * rq->cons the terminal CQE indicates
+	 */
+	rq->flush_in_progress = true;
+	switch (qp->type) {
+	case CMDQ_CREATE_QP1_TYPE_GSI:
+		opcode = CQ_BASE_CQE_TYPE_RES_RAWETH_QP1;
+		break;
+	case CMDQ_CREATE_QP_TYPE_RC:
+		opcode = CQ_BASE_CQE_TYPE_RES_RC;
+		break;
+	case CMDQ_CREATE_QP_TYPE_UD:
+		opcode = CQ_BASE_CQE_TYPE_RES_UD;
+		break;
+	}
+
+	rc = __flush_rq(rq, qp, opcode, pcqe, budget);
+	if (!rc)
+		rq->flush_in_progress = false;
+done:
+	return rc;
+}
+
+static int bnxt_qplib_cq_process_cutoff(struct bnxt_qplib_cq *cq,
+					struct cq_cutoff *hwcqe)
+{
+	/* Check the Status */
+	if (hwcqe->status != CQ_CUTOFF_STATUS_OK) {
+		dev_err(&cq->hwq.pdev->dev,
+			"QPLIB: FP: CQ Process Cutoff Error status = 0x%x",
+			hwcqe->status);
+		return -EINVAL;
+	}
+	clear_bit(CQ_FLAGS_RESIZE_IN_PROG, &cq->flags);
+	wake_up_interruptible(&cq->waitq);
+
+	return 0;
+}
+
+int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
+		       int num_cqes)
+{
+	struct cq_base *hw_cqe, **hw_cqe_ptr;
+	unsigned long flags;
+	u32 sw_cons, raw_cons;
+	int budget, rc = 0;
+
+	spin_lock_irqsave(&cq->hwq.lock, flags);
+	raw_cons = cq->hwq.cons;
+	budget = num_cqes;
+
+	while (budget) {
+		sw_cons = HWQ_CMP(raw_cons, &cq->hwq);
+		hw_cqe_ptr = (struct cq_base **)cq->hwq.pbl_ptr;
+		hw_cqe = &hw_cqe_ptr[CQE_PG(sw_cons)][CQE_IDX(sw_cons)];
+
+		/* Check for Valid bit */
+		if (!CQE_CMP_VALID(hw_cqe, raw_cons, cq->hwq.max_elements))
+			break;
+
+		/* From the device's respective CQE format to qplib_wc*/
+		switch (hw_cqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK) {
+		case CQ_BASE_CQE_TYPE_REQ:
+			rc = bnxt_qplib_cq_process_req(cq,
+						       (struct cq_req *)hw_cqe,
+						       &cqe, &budget);
+			break;
+		case CQ_BASE_CQE_TYPE_RES_RC:
+			rc = bnxt_qplib_cq_process_res_rc(cq,
+							  (struct cq_res_rc *)
+							  hw_cqe, &cqe,
+							  &budget);
+			break;
+		case CQ_BASE_CQE_TYPE_RES_UD:
+			rc = bnxt_qplib_cq_process_res_ud
+					(cq, (struct cq_res_ud *)hw_cqe, &cqe,
+					 &budget);
+			break;
+		case CQ_BASE_CQE_TYPE_RES_RAWETH_QP1:
+			rc = bnxt_qplib_cq_process_res_raweth_qp1
+					(cq, (struct cq_res_raweth_qp1 *)
+					 hw_cqe, &cqe, &budget);
+			break;
+		case CQ_BASE_CQE_TYPE_TERMINAL:
+			rc = bnxt_qplib_cq_process_terminal
+					(cq, (struct cq_terminal *)hw_cqe,
+					 &cqe, &budget);
+			break;
+		case CQ_BASE_CQE_TYPE_CUT_OFF:
+			bnxt_qplib_cq_process_cutoff
+					(cq, (struct cq_cutoff *)hw_cqe);
+			/* Done processing this CQ */
+			goto exit;
+		default:
+			dev_err(&cq->hwq.pdev->dev,
+				"QPLIB: process_cq unknown type 0x%lx",
+				hw_cqe->cqe_type_toggle &
+				CQ_BASE_CQE_TYPE_MASK);
+			rc = -EINVAL;
+			break;
+		}
+		if (rc < 0) {
+			if (rc == -EAGAIN)
+				break;
+			/* Error while processing the CQE, just skip to the
+			 * next one
+			 */
+			dev_err(&cq->hwq.pdev->dev,
+				"QPLIB: process_cqe error rc = 0x%x", rc);
+		}
+		raw_cons++;
+	}
+	if (cq->hwq.cons != raw_cons) {
+		cq->hwq.cons = raw_cons;
+		bnxt_qplib_arm_cq(cq, DBR_DBR_TYPE_CQ);
+	}
+exit:
+	spin_unlock_irqrestore(&cq->hwq.lock, flags);
+	return num_cqes - budget;
+}
+
 void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type)
 {
 	unsigned long flags;
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h b/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h
index d9f2611..37d7d1e 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h
+++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.h
@@ -373,7 +373,7 @@ struct bnxt_qplib_nq {
 
 	int				(*cqn_handler)
 						(struct bnxt_qplib_nq *nq,
-						 void *cq);
+						 struct bnxt_qplib_cq *cq);
 	int				(*srqn_handler)
 						(struct bnxt_qplib_nq *nq,
 						 void *srq,
@@ -384,7 +384,7 @@ void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq);
 int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
 			 int msix_vector, int bar_reg_offset,
 			 int (*cqn_handler)(struct bnxt_qplib_nq *nq,
-					    void *cq),
+					    struct bnxt_qplib_cq *cq),
 			 int (*srqn_handler)(struct bnxt_qplib_nq *nq,
 					     void *srq,
 					     u8 event));
@@ -408,7 +408,8 @@ int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
 			 struct bnxt_qplib_swqe *wqe);
 int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq);
 int bnxt_qplib_destroy_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq);
-
+int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
+		       int num);
 void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type);
 void bnxt_qplib_free_nq(struct bnxt_qplib_nq *nq);
 int bnxt_qplib_alloc_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq);
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
index 5d6c89c..01cb833 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
@@ -2043,7 +2043,7 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, struct ib_send_wr *wr,
 	return rc;
 }
 
-int bnxt_re_post_recv_shadow_qp(struct bnxt_re_dev *rdev,
+static int bnxt_re_post_recv_shadow_qp(struct bnxt_re_dev *rdev,
 				struct bnxt_re_qp *qp,
 				struct ib_recv_wr *wr)
 {
@@ -2247,6 +2247,525 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
 	return ERR_PTR(rc);
 }
 
+static u8 __req_to_ib_wc_status(u8 qstatus)
+{
+	switch (qstatus) {
+	case CQ_REQ_STATUS_OK:
+		return IB_WC_SUCCESS;
+	case CQ_REQ_STATUS_BAD_RESPONSE_ERR:
+		return IB_WC_BAD_RESP_ERR;
+	case CQ_REQ_STATUS_LOCAL_LENGTH_ERR:
+		return IB_WC_LOC_LEN_ERR;
+	case CQ_REQ_STATUS_LOCAL_QP_OPERATION_ERR:
+		return IB_WC_LOC_QP_OP_ERR;
+	case CQ_REQ_STATUS_LOCAL_PROTECTION_ERR:
+		return IB_WC_LOC_PROT_ERR;
+	case CQ_REQ_STATUS_MEMORY_MGT_OPERATION_ERR:
+		return IB_WC_GENERAL_ERR;
+	case CQ_REQ_STATUS_REMOTE_INVALID_REQUEST_ERR:
+		return IB_WC_REM_INV_REQ_ERR;
+	case CQ_REQ_STATUS_REMOTE_ACCESS_ERR:
+		return IB_WC_REM_ACCESS_ERR;
+	case CQ_REQ_STATUS_REMOTE_OPERATION_ERR:
+		return IB_WC_REM_OP_ERR;
+	case CQ_REQ_STATUS_RNR_NAK_RETRY_CNT_ERR:
+		return IB_WC_RNR_RETRY_EXC_ERR;
+	case CQ_REQ_STATUS_TRANSPORT_RETRY_CNT_ERR:
+		return IB_WC_RETRY_EXC_ERR;
+	case CQ_REQ_STATUS_WORK_REQUEST_FLUSHED_ERR:
+		return IB_WC_WR_FLUSH_ERR;
+	default:
+		return IB_WC_GENERAL_ERR;
+	}
+	return 0;
+}
+
+static u8 __rawqp1_to_ib_wc_status(u8 qstatus)
+{
+	switch (qstatus) {
+	case CQ_RES_RAWETH_QP1_STATUS_OK:
+		return IB_WC_SUCCESS;
+	case CQ_RES_RAWETH_QP1_STATUS_LOCAL_ACCESS_ERROR:
+		return IB_WC_LOC_ACCESS_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_HW_LOCAL_LENGTH_ERR:
+		return IB_WC_LOC_LEN_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_LOCAL_PROTECTION_ERR:
+		return IB_WC_LOC_PROT_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_LOCAL_QP_OPERATION_ERR:
+		return IB_WC_LOC_QP_OP_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_MEMORY_MGT_OPERATION_ERR:
+		return IB_WC_GENERAL_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_WORK_REQUEST_FLUSHED_ERR:
+		return IB_WC_WR_FLUSH_ERR;
+	case CQ_RES_RAWETH_QP1_STATUS_HW_FLUSH_ERR:
+		return IB_WC_WR_FLUSH_ERR;
+	default:
+		return IB_WC_GENERAL_ERR;
+	}
+}
+
+static u8 __rc_to_ib_wc_status(u8 qstatus)
+{
+	switch (qstatus) {
+	case CQ_RES_RC_STATUS_OK:
+		return IB_WC_SUCCESS;
+	case CQ_RES_RC_STATUS_LOCAL_ACCESS_ERROR:
+		return IB_WC_LOC_ACCESS_ERR;
+	case CQ_RES_RC_STATUS_LOCAL_LENGTH_ERR:
+		return IB_WC_LOC_LEN_ERR;
+	case CQ_RES_RC_STATUS_LOCAL_PROTECTION_ERR:
+		return IB_WC_LOC_PROT_ERR;
+	case CQ_RES_RC_STATUS_LOCAL_QP_OPERATION_ERR:
+		return IB_WC_LOC_QP_OP_ERR;
+	case CQ_RES_RC_STATUS_MEMORY_MGT_OPERATION_ERR:
+		return IB_WC_GENERAL_ERR;
+	case CQ_RES_RC_STATUS_REMOTE_INVALID_REQUEST_ERR:
+		return IB_WC_REM_INV_REQ_ERR;
+	case CQ_RES_RC_STATUS_WORK_REQUEST_FLUSHED_ERR:
+		return IB_WC_WR_FLUSH_ERR;
+	case CQ_RES_RC_STATUS_HW_FLUSH_ERR:
+		return IB_WC_WR_FLUSH_ERR;
+	default:
+		return IB_WC_GENERAL_ERR;
+	}
+}
+
+static void bnxt_re_process_req_wc(struct ib_wc *wc, struct bnxt_qplib_cqe *cqe)
+{
+	switch (cqe->type) {
+	case BNXT_QPLIB_SWQE_TYPE_SEND:
+		wc->opcode = IB_WC_SEND;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_SEND_WITH_IMM:
+		wc->opcode = IB_WC_SEND;
+		wc->wc_flags |= IB_WC_WITH_IMM;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_SEND_WITH_INV:
+		wc->opcode = IB_WC_SEND;
+		wc->wc_flags |= IB_WC_WITH_INVALIDATE;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_RDMA_WRITE:
+		wc->opcode = IB_WC_RDMA_WRITE;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_RDMA_WRITE_WITH_IMM:
+		wc->opcode = IB_WC_RDMA_WRITE;
+		wc->wc_flags |= IB_WC_WITH_IMM;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_RDMA_READ:
+		wc->opcode = IB_WC_RDMA_READ;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_ATOMIC_CMP_AND_SWP:
+		wc->opcode = IB_WC_COMP_SWAP;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_ATOMIC_FETCH_AND_ADD:
+		wc->opcode = IB_WC_FETCH_ADD;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_LOCAL_INV:
+		wc->opcode = IB_WC_LOCAL_INV;
+		break;
+	case BNXT_QPLIB_SWQE_TYPE_REG_MR:
+		wc->opcode = IB_WC_REG_MR;
+		break;
+	default:
+		wc->opcode = IB_WC_SEND;
+		break;
+	}
+
+	wc->status = __req_to_ib_wc_status(cqe->status);
+}
+
+static int bnxt_re_check_packet_type(u16 raweth_qp1_flags, u16 raweth_qp1_flags2)
+{
+	bool is_udp = false, is_ipv6 = false, is_ipv4 = false;
+
+	/* raweth_qp1_flags Bit 9-6 indicates itype */
+	if ((raweth_qp1_flags & CQ_RES_RAWETH_QP1_RAWETH_QP1_FLAGS_ITYPE_ROCE)
+	    != CQ_RES_RAWETH_QP1_RAWETH_QP1_FLAGS_ITYPE_ROCE)
+		return -1;
+
+	if (raweth_qp1_flags2 &
+	    CQ_RES_RAWETH_QP1_RAWETH_QP1_FLAGS2_IP_CS_CALC &&
+	    raweth_qp1_flags2 &
+	    CQ_RES_RAWETH_QP1_RAWETH_QP1_FLAGS2_L4_CS_CALC) {
+		is_udp = true;
+		/* raweth_qp1_flags2 Bit 8 indicates ip_type. 0-v4 1 - v6 */
+		(raweth_qp1_flags2 &
+		 CQ_RES_RAWETH_QP1_RAWETH_QP1_FLAGS2_IP_TYPE) ?
+			(is_ipv6 = true) : (is_ipv4 = true);
+		return ((is_ipv6) ?
+			 BNXT_RE_ROCEV2_IPV6_PACKET :
+			 BNXT_RE_ROCEV2_IPV4_PACKET);
+	} else {
+		return BNXT_RE_ROCE_V1_PACKET;
+	}
+}
+
+static int bnxt_re_to_ib_nw_type(int nw_type)
+{
+	u8 nw_hdr_type = 0xFF;
+
+	switch (nw_type) {
+	case BNXT_RE_ROCE_V1_PACKET:
+		nw_hdr_type = RDMA_NETWORK_ROCE_V1;
+		break;
+	case BNXT_RE_ROCEV2_IPV4_PACKET:
+		nw_hdr_type = RDMA_NETWORK_IPV4;
+		break;
+	case BNXT_RE_ROCEV2_IPV6_PACKET:
+		nw_hdr_type = RDMA_NETWORK_IPV6;
+		break;
+	}
+	return nw_hdr_type;
+}
+
+static bool bnxt_re_is_loopback_packet(struct bnxt_re_dev *rdev,
+				       void *rq_hdr_buf)
+{
+	u8 *tmp_buf = NULL;
+	struct ethhdr *eth_hdr;
+	u16 eth_type;
+	bool rc = false;
+
+	tmp_buf = (u8 *)rq_hdr_buf;
+	/*
+	 * If dest mac is not same as I/F mac, this could be a
+	 * loopback address or multicast address, check whether
+	 * it is a loopback packet
+	 */
+	if (!ether_addr_equal(tmp_buf, rdev->netdev->dev_addr)) {
+		tmp_buf += 4;
+		/* Check the  ether type */
+		eth_hdr = (struct ethhdr *)tmp_buf;
+		eth_type = ntohs(eth_hdr->h_proto);
+		switch (eth_type) {
+		case BNXT_QPLIB_ETHTYPE_ROCEV1:
+			rc = true;
+			break;
+		case ETH_P_IP:
+		case ETH_P_IPV6: {
+			u32 len;
+			struct udphdr *udp_hdr;
+
+			len = (eth_type == ETH_P_IP ? sizeof(struct iphdr) :
+						      sizeof(struct ipv6hdr));
+			tmp_buf += sizeof(struct ethhdr) + len;
+			udp_hdr = (struct udphdr *)tmp_buf;
+			if (ntohs(udp_hdr->dest) ==
+				    ROCE_V2_UDP_DPORT)
+				rc = true;
+				break;
+			}
+		default:
+			break;
+		}
+	}
+
+	return rc;
+}
+
+static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *qp1_qp,
+					 struct bnxt_qplib_cqe *cqe)
+{
+	struct bnxt_re_dev *rdev = qp1_qp->rdev;
+	struct bnxt_re_sqp_entries *sqp_entry = NULL;
+	struct bnxt_re_qp *qp = rdev->qp1_sqp;
+	struct ib_send_wr *swr;
+	struct ib_ud_wr udwr;
+	struct ib_recv_wr rwr;
+	u8 pkt_type = 0;
+	u32 tbl_idx;
+	void *rq_hdr_buf;
+	dma_addr_t rq_hdr_buf_map;
+	dma_addr_t shrq_hdr_buf_map;
+	u32 offset = 0;
+	u32 skip_bytes = 0;
+	struct ib_sge s_sge[2];
+	struct ib_sge r_sge[2];
+	int rc;
+
+	memset(&udwr, 0, sizeof(udwr));
+	memset(&rwr, 0, sizeof(rwr));
+	memset(&s_sge, 0, sizeof(s_sge));
+	memset(&r_sge, 0, sizeof(r_sge));
+
+	swr = &udwr.wr;
+	tbl_idx = cqe->wr_id;
+
+	rq_hdr_buf = qp1_qp->qplib_qp.rq_hdr_buf +
+			(tbl_idx * qp1_qp->qplib_qp.rq_hdr_buf_size);
+	rq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&qp1_qp->qplib_qp,
+							  tbl_idx);
+
+	/* Shadow QP header buffer */
+	shrq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&qp->qplib_qp,
+							    tbl_idx);
+	sqp_entry = &rdev->sqp_tbl[tbl_idx];
+
+	/* Store this cqe */
+	memcpy(&sqp_entry->cqe, cqe, sizeof(struct bnxt_qplib_cqe));
+	sqp_entry->qp1_qp = qp1_qp;
+
+	/* Find packet type from the cqe */
+
+	pkt_type = bnxt_re_check_packet_type(cqe->raweth_qp1_flags,
+					     cqe->raweth_qp1_flags2);
+	if (pkt_type < 0) {
+		dev_err(rdev_to_dev(rdev), "Invalid packet\n");
+		return -EINVAL;
+	}
+
+	/* Adjust the offset for the user buffer and post in the rq */
+
+	if (pkt_type == BNXT_RE_ROCEV2_IPV4_PACKET)
+		offset = 20;
+
+	/*
+	 * QP1 loopback packet has 4 bytes of internal header before
+	 * ether header. Skip these four bytes.
+	 */
+	if (bnxt_re_is_loopback_packet(rdev, rq_hdr_buf))
+		skip_bytes = 4;
+
+	/* First send SGE . Skip the ether header*/
+	s_sge[0].addr = rq_hdr_buf_map + BNXT_QPLIB_MAX_QP1_RQ_ETH_HDR_SIZE
+			+ skip_bytes;
+	s_sge[0].lkey = 0xFFFFFFFF;
+	s_sge[0].length = offset ? BNXT_QPLIB_MAX_GRH_HDR_SIZE_IPV4 :
+				BNXT_QPLIB_MAX_GRH_HDR_SIZE_IPV6;
+
+	/* Second Send SGE */
+	s_sge[1].addr = s_sge[0].addr + s_sge[0].length +
+			BNXT_QPLIB_MAX_QP1_RQ_BDETH_HDR_SIZE;
+	if (pkt_type != BNXT_RE_ROCE_V1_PACKET)
+		s_sge[1].addr += 8;
+	s_sge[1].lkey = 0xFFFFFFFF;
+	s_sge[1].length = 256;
+
+	/* First recv SGE */
+
+	r_sge[0].addr = shrq_hdr_buf_map;
+	r_sge[0].lkey = 0xFFFFFFFF;
+	r_sge[0].length = 40;
+
+	r_sge[1].addr = sqp_entry->sge.addr + offset;
+	r_sge[1].lkey = sqp_entry->sge.lkey;
+	r_sge[1].length = BNXT_QPLIB_MAX_GRH_HDR_SIZE_IPV6 + 256 - offset;
+
+	/* Create receive work request */
+	rwr.num_sge = 2;
+	rwr.sg_list = r_sge;
+	rwr.wr_id = tbl_idx;
+	rwr.next = NULL;
+
+	rc = bnxt_re_post_recv_shadow_qp(rdev, qp, &rwr);
+	if (rc) {
+		dev_err(rdev_to_dev(rdev),
+			"Failed to post Rx buffers to shadow QP");
+		return -ENOMEM;
+	}
+
+	swr->num_sge = 2;
+	swr->sg_list = s_sge;
+	swr->wr_id = tbl_idx;
+	swr->opcode = IB_WR_SEND;
+	swr->next = NULL;
+
+	udwr.ah = &rdev->sqp_ah->ib_ah;
+	udwr.remote_qpn = rdev->qp1_sqp->qplib_qp.id;
+	udwr.remote_qkey = rdev->qp1_sqp->qplib_qp.qkey;
+
+	/* post data received  in the send queue */
+	rc = bnxt_re_post_send_shadow_qp(rdev, qp, swr);
+
+	return 0;
+}
+
+static void bnxt_re_process_res_rawqp1_wc(struct ib_wc *wc,
+					  struct bnxt_qplib_cqe *cqe)
+{
+	wc->opcode = IB_WC_RECV;
+	wc->status = __rawqp1_to_ib_wc_status(cqe->status);
+	wc->wc_flags |= IB_WC_GRH;
+}
+
+static void bnxt_re_process_res_rc_wc(struct ib_wc *wc,
+				      struct bnxt_qplib_cqe *cqe)
+{
+	wc->opcode = IB_WC_RECV;
+	wc->status = __rc_to_ib_wc_status(cqe->status);
+
+	if (cqe->flags & CQ_RES_RC_FLAGS_IMM)
+		wc->wc_flags |= IB_WC_WITH_IMM;
+	if (cqe->flags & CQ_RES_RC_FLAGS_INV)
+		wc->wc_flags |= IB_WC_WITH_INVALIDATE;
+	if ((cqe->flags & (CQ_RES_RC_FLAGS_RDMA | CQ_RES_RC_FLAGS_IMM)) ==
+	    (CQ_RES_RC_FLAGS_RDMA | CQ_RES_RC_FLAGS_IMM))
+		wc->opcode = IB_WC_RECV_RDMA_WITH_IMM;
+}
+
+static void bnxt_re_process_res_shadow_qp_wc(struct bnxt_re_qp *qp,
+					     struct ib_wc *wc,
+					     struct bnxt_qplib_cqe *cqe)
+{
+	u32 tbl_idx;
+	struct bnxt_re_dev *rdev = qp->rdev;
+	struct bnxt_re_qp *qp1_qp = NULL;
+	struct bnxt_qplib_cqe *orig_cqe = NULL;
+	struct bnxt_re_sqp_entries *sqp_entry = NULL;
+	int nw_type;
+
+	tbl_idx = cqe->wr_id;
+
+	sqp_entry = &rdev->sqp_tbl[tbl_idx];
+	qp1_qp = sqp_entry->qp1_qp;
+	orig_cqe = &sqp_entry->cqe;
+
+	wc->wr_id = sqp_entry->wrid;
+	wc->byte_len = orig_cqe->length;
+	wc->qp = &qp1_qp->ib_qp;
+
+	wc->ex.imm_data = orig_cqe->immdata_or_invrkey;
+	wc->src_qp = orig_cqe->src_qp;
+	memcpy(wc->smac, orig_cqe->smac, ETH_ALEN);
+	wc->port_num = 1;
+	wc->vendor_err = orig_cqe->status;
+
+	wc->opcode = IB_WC_RECV;
+	wc->status = __rawqp1_to_ib_wc_status(orig_cqe->status);
+	wc->wc_flags |= IB_WC_GRH;
+
+	nw_type = bnxt_re_check_packet_type(orig_cqe->raweth_qp1_flags,
+					    orig_cqe->raweth_qp1_flags2);
+	if (nw_type >= 0) {
+		wc->network_hdr_type = bnxt_re_to_ib_nw_type(nw_type);
+		wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
+	}
+}
+
+static void bnxt_re_process_res_ud_wc(struct ib_wc *wc,
+				      struct bnxt_qplib_cqe *cqe)
+{
+	wc->opcode = IB_WC_RECV;
+	wc->status = __rc_to_ib_wc_status(cqe->status);
+
+	if (cqe->flags & CQ_RES_RC_FLAGS_IMM)
+		wc->wc_flags |= IB_WC_WITH_IMM;
+	if (cqe->flags & CQ_RES_RC_FLAGS_INV)
+		wc->wc_flags |= IB_WC_WITH_INVALIDATE;
+	if ((cqe->flags & (CQ_RES_RC_FLAGS_RDMA | CQ_RES_RC_FLAGS_IMM)) ==
+	    (CQ_RES_RC_FLAGS_RDMA | CQ_RES_RC_FLAGS_IMM))
+		wc->opcode = IB_WC_RECV_RDMA_WITH_IMM;
+}
+
+int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
+{
+	struct bnxt_re_cq *cq = to_bnxt_re(ib_cq, struct bnxt_re_cq, ib_cq);
+	struct bnxt_re_qp *qp;
+	struct bnxt_qplib_cqe *cqe;
+	int i, ncqe, budget;
+	u32 tbl_idx;
+	struct bnxt_re_sqp_entries *sqp_entry = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cq->cq_lock, flags);
+	budget = min_t(u32, num_entries, cq->max_cql);
+	if (!cq->cql) {
+		dev_err(rdev_to_dev(cq->rdev), "POLL CQ : no CQL to use");
+		goto exit;
+	}
+	cqe = &cq->cql[0];
+	while (budget) {
+		ncqe = bnxt_qplib_poll_cq(&cq->qplib_cq, cqe, budget);
+		if (!ncqe)
+			break;
+
+		for (i = 0; i < ncqe; i++, cqe++) {
+			/* Transcribe each qplib_wqe back to ib_wc */
+			memset(wc, 0, sizeof(*wc));
+
+			wc->wr_id = cqe->wr_id;
+			wc->byte_len = cqe->length;
+			qp = to_bnxt_re((struct bnxt_qplib_qp *)cqe->qp_handle,
+					struct bnxt_re_qp, qplib_qp);
+			if (!qp) {
+				dev_err(rdev_to_dev(cq->rdev),
+					"POLL CQ : bad QP handle");
+				continue;
+			}
+			wc->qp = &qp->ib_qp;
+			wc->ex.imm_data = cqe->immdata_or_invrkey;
+			wc->src_qp = cqe->src_qp;
+			memcpy(wc->smac, cqe->smac, ETH_ALEN);
+			wc->port_num = 1;
+			wc->vendor_err = cqe->status;
+
+			switch (cqe->opcode) {
+			case CQ_BASE_CQE_TYPE_REQ:
+				if (qp->qplib_qp.id ==
+				    qp->rdev->qp1_sqp->qplib_qp.id) {
+					/* Handle this completion with
+					 * the stored completion
+					 */
+					memset(wc, 0, sizeof(*wc));
+					continue;
+				}
+				bnxt_re_process_req_wc(wc, cqe);
+				break;
+			case CQ_BASE_CQE_TYPE_RES_RAWETH_QP1:
+				if (!cqe->status) {
+					int rc = 0;
+
+					rc = bnxt_re_process_raw_qp_pkt_rx
+								(qp, cqe);
+					if (!rc) {
+						memset(wc, 0, sizeof(*wc));
+						continue;
+					}
+					cqe->status = -1;
+				}
+				/* Errors need not be looped back.
+				 * But change the wr_id to the one
+				 * stored in the table
+				 */
+				tbl_idx = cqe->wr_id;
+				sqp_entry = &cq->rdev->sqp_tbl[tbl_idx];
+				wc->wr_id = sqp_entry->wrid;
+				bnxt_re_process_res_rawqp1_wc(wc, cqe);
+				break;
+			case CQ_BASE_CQE_TYPE_RES_RC:
+				bnxt_re_process_res_rc_wc(wc, cqe);
+				break;
+			case CQ_BASE_CQE_TYPE_RES_UD:
+				if (qp->qplib_qp.id ==
+				    qp->rdev->qp1_sqp->qplib_qp.id) {
+					/* Handle this completion with
+					 * the stored completion
+					 */
+					if (cqe->status) {
+						continue;
+					} else {
+						bnxt_re_process_res_shadow_qp_wc
+								(qp, wc, cqe);
+						break;
+					}
+				}
+				bnxt_re_process_res_ud_wc(wc, cqe);
+				break;
+			default:
+				dev_err(rdev_to_dev(cq->rdev),
+					"POLL CQ : type 0x%x not handled",
+					cqe->opcode);
+				continue;
+			}
+			wc++;
+			budget--;
+		}
+	}
+exit:
+	spin_unlock_irqrestore(&cq->cq_lock, flags);
+	return num_entries - budget;
+}
+
 int bnxt_re_req_notify_cq(struct ib_cq *ib_cq,
 			  enum ib_cq_notify_flags ib_cqn_flags)
 {
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h
index 9f3dd49..cc04994 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.h
@@ -171,6 +171,7 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
 				struct ib_ucontext *context,
 				struct ib_udata *udata);
 int bnxt_re_destroy_cq(struct ib_cq *cq);
+int bnxt_re_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc);
 int bnxt_re_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
 struct ib_mr *bnxt_re_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
 
diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
index 73dfadd..fd52e65 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -456,6 +456,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 
 	ibdev->create_cq		= bnxt_re_create_cq;
 	ibdev->destroy_cq		= bnxt_re_destroy_cq;
+	ibdev->poll_cq			= bnxt_re_poll_cq;
 	ibdev->req_notify_cq		= bnxt_re_req_notify_cq;
 
 	ibdev->get_dma_mr		= bnxt_re_get_dma_mr;
@@ -605,6 +606,25 @@ static int bnxt_re_aeq_handler(struct bnxt_qplib_rcfw *rcfw,
 	return 0;
 }
 
+static int bnxt_re_cqn_handler(struct bnxt_qplib_nq *nq,
+			       struct bnxt_qplib_cq *handle)
+{
+	struct bnxt_re_cq *cq = to_bnxt_re(handle, struct bnxt_re_cq,
+					   qplib_cq);
+
+	if (!cq) {
+		dev_err(NULL, "%s: CQ is NULL, CQN not handled",
+			ROCE_DRV_MODULE_NAME);
+		return -EINVAL;
+	}
+	if (cq->ib_cq.comp_handler) {
+		/* Lock comp_handler? */
+		(*cq->ib_cq.comp_handler)(&cq->ib_cq, cq->ib_cq.cq_context);
+	}
+
+	return 0;
+}
+
 static void bnxt_re_cleanup_res(struct bnxt_re_dev *rdev)
 {
 	if (rdev->nq.hwq.max_elements)
@@ -626,7 +646,7 @@ static int bnxt_re_init_res(struct bnxt_re_dev *rdev)
 	rc = bnxt_qplib_enable_nq(rdev->en_dev->pdev, &rdev->nq,
 				  rdev->msix_entries[BNXT_RE_NQ_IDX].vector,
 				  rdev->msix_entries[BNXT_RE_NQ_IDX].db_offset,
-				  NULL,
+				  &bnxt_re_cqn_handler,
 				  NULL);
 
 	if (rc)
-- 
2.5.5

^ permalink raw reply related

* [PATCH V2  17/22] bnxt_re: Handling dispatching of events to IB stack
From: Selvin Xavier @ 2016-12-09  6:48 UTC (permalink / raw)
  To: dledford, linux-rdma
  Cc: netdev, Selvin Xavier, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

This patch implements handling dispatch of appropriate event to the IB stack
based on NETDEV events received.

v2: Removed cleanup of the resources during driver unload since we are calling
    unregister_netdevice_notifier first in the exit.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxtre/bnxt_re_main.c | 64 +++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
index fd52e65..260dec1 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -720,6 +720,60 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
 	return rc;
 }
 
+static void bnxt_re_dispatch_event(struct ib_device *ibdev, struct ib_qp *qp,
+				   u8 port_num, enum ib_event_type event)
+{
+	struct ib_event ib_event;
+
+	ib_event.device = ibdev;
+	if (qp)
+		ib_event.element.qp = qp;
+	else
+		ib_event.element.port_num = port_num;
+	ib_event.event = event;
+	ib_dispatch_event(&ib_event);
+}
+
+static bool bnxt_re_is_qp1_or_shadow_qp(struct bnxt_re_dev *rdev,
+					struct bnxt_re_qp *qp)
+{
+	return (qp->ib_qp.qp_type == IB_QPT_GSI) || (qp == rdev->qp1_sqp);
+}
+
+static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev, bool qp_wait)
+{
+	int mask = IB_QP_STATE, qp_count, count = 1;
+	struct ib_qp_attr qp_attr;
+	struct bnxt_re_qp *qp;
+
+	qp_attr.qp_state = IB_QPS_ERR;
+	mutex_lock(&rdev->qp_lock);
+	list_for_each_entry(qp, &rdev->qp_list, list) {
+		/* Modify the state of all QPs except QP1/Shadow QP */
+		if (qp && !bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) {
+			if (qp->qplib_qp.state !=
+			    CMDQ_MODIFY_QP_NEW_STATE_RESET ||
+			    qp->qplib_qp.state !=
+			    CMDQ_MODIFY_QP_NEW_STATE_ERR) {
+				bnxt_re_dispatch_event(&rdev->ibdev, &qp->ib_qp,
+						       1, IB_EVENT_QP_FATAL);
+				bnxt_re_modify_qp(&qp->ib_qp, &qp_attr, mask,
+						  NULL);
+			}
+		}
+	}
+
+	mutex_unlock(&rdev->qp_lock);
+	if (qp_wait) {
+		/* Give the application some time to clean up */
+		do {
+			qp_count = atomic_read(&rdev->qp_count);
+			msleep(100);
+		} while ((qp_count != atomic_read(&rdev->qp_count)) &&
+			  count--);
+	}
+}
+
 static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
 {
 	int i, rc;
@@ -878,6 +932,8 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		}
 	}
 	set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_GID_CHANGE);
 
 	return 0;
 free_sctx:
@@ -956,10 +1012,18 @@ static void bnxt_re_task(struct work_struct *work)
 				"Failed to register with IB: %#x", rc);
 			break;
 	case NETDEV_UP:
+		bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
+				       IB_EVENT_PORT_ACTIVE);
 		break;
 	case NETDEV_DOWN:
+		bnxt_re_dev_stop(rdev, false);
 		break;
 	case NETDEV_CHANGE:
+		if (!netif_carrier_ok(rdev->netdev))
+			bnxt_re_dev_stop(rdev, false);
+		else if (netif_carrier_ok(rdev->netdev))
+			bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
+					       IB_EVENT_PORT_ACTIVE);
 		break;
 	default:
 		break;
-- 
2.5.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox