netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory
@ 2018-01-14  9:32 Rahul Lakkireddy
  2018-01-14  9:32 ` [PATCH net-next 1/2] cxgb4: rework on-chip memory read Rahul Lakkireddy
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2018-01-14  9:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, ganeshgr, nirranjan, indranil, Rahul Lakkireddy

This patch series speeds up reading on-chip memory (EDC and MC)
by using AVX intrinsic instructions when available.

Patch 1 exports a callback to register supported intrinsic instructions
when available.  It also reworks the logic to read EDC and MC.

Patch 2 adds AVX CPU intrinsic instructions to read EDC and MC
256 bits at a time.  It falls back to regular 32-bit reads if AVX is
not available.

Thanks,
Rahul

Rahul Lakkireddy (2):
  cxgb4: rework on-chip memory read
  cxgb4: speed up on-chip memory read

 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   3 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h  |   2 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h      |   6 +
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c   |  43 +++++
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h   |  33 ++++
 .../ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c   |  78 +++++++++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c     |  70 +++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   5 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   |   7 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |   2 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         | 193 +++++++++++++--------
 11 files changed, 367 insertions(+), 75 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c

-- 
2.14.1


* [PATCH net-next 1/2] cxgb4: rework on-chip memory read
  2018-01-14  9:32 [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory Rahul Lakkireddy
@ 2018-01-14  9:32 ` Rahul Lakkireddy
  2018-01-16  1:59   ` kbuild test robot
  2018-01-14  9:32 ` [PATCH net-next 2/2] cxgb4: speed up " Rahul Lakkireddy
  2018-01-14 17:17 ` [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Rahul Lakkireddy @ 2018-01-14  9:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, ganeshgr, nirranjan, indranil, Rahul Lakkireddy

Export a callback to register supported intrinsic instructions to speed
up reading EDC and MC.  By default, use 32-bit reads.  Also rework the
logic to read EDC and MC.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h      |   6 +
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c   |  38 ++++
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h   |  25 +++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c     |  70 +++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   5 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   |   2 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         | 193 +++++++++++++--------
 8 files changed, 267 insertions(+), 74 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index 8c9c6b0d2e5d..0dbaf1b18bac 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_CHELSIO_T4) += cxgb4.o
 cxgb4-objs := cxgb4_main.o l2t.o smt.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o \
 	      cxgb4_uld.o sched.o cxgb4_filter.o cxgb4_tc_u32.o \
 	      cxgb4_ptp.o cxgb4_tc_flower.o cxgb4_cudbg.o \
-	      cudbg_common.o cudbg_lib.o
+	      cudbg_common.o cudbg_lib.o cudbg_intrinsic.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 88e740082a02..456d61eacb27 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -83,10 +83,16 @@ enum cudbg_dbg_entity_type {
 	CUDBG_MAX_ENTITY = 70,
 };
 
+struct cudbg_init;
+typedef unsigned int (*cudbg_intrinsic_t)(struct cudbg_init *pdbg_init,
+					  u32 start, u32 offset, u32 size,
+					  u32 max_size, u8 *buf);
+
 struct cudbg_init {
 	struct adapter *adap; /* Pointer to adapter structure */
 	void *outbuf; /* Output buffer */
 	u32 outbuf_size;  /* Output buffer size */
+	cudbg_intrinsic_t intrinsic_cb; /* CPU intrinsic callback */
 };
 
 static inline unsigned int cudbg_mbytes_to_bytes(unsigned int size)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
new file mode 100644
index 000000000000..0b80512e5c0c
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
@@ -0,0 +1,38 @@
+/*
+ *  Copyright (C) 2018 Chelsio Communications.  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *  more details.
+ *
+ *  The full GNU General Public License is included in this distribution in
+ *  the file called "COPYING".
+ *
+ */
+
+#include "cxgb4.h"
+#include "cudbg_if.h"
+#include "cudbg_intrinsic.h"
+
+unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init,
+				u32 start, u32 offset, u32 size,
+				u32 mem_aperture, u8 *outbuf)
+{
+	struct adapter *adap = pdbg_init->adap;
+	__be32 *buf = (__be32 *)outbuf;
+
+	*buf = le32_to_cpu((__force __le32)
+			   t4_read_reg(adap, start + offset));
+
+	return sizeof(__be32);
+}
+
+void cudbg_set_intrinsic_callback(struct cudbg_init *pdbg_init)
+{
+	pdbg_init->intrinsic_cb = cudbg_mem_read_def;
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
new file mode 100644
index 000000000000..3af0f07311ec
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
@@ -0,0 +1,25 @@
+/*
+ *  Copyright (C) 2018 Chelsio Communications.  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *  more details.
+ *
+ *  The full GNU General Public License is included in this distribution in
+ *  the file called "COPYING".
+ *
+ */
+
+#ifndef __CUDBG_INTRINSIC_H__
+#define __CUDBG_INTRINSIC_H__
+
+unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init,
+				u32 start, u32 offset, u32 size,
+				u32 mem_aperture, u8 *outbuf);
+void cudbg_set_intrinsic_callback(struct cudbg_init *pdbg_init);
+#endif /* __CUDBG_INTRINSIC_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 0a3871f10787..4921460aa787 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -18,6 +18,7 @@
 #include <linux/sort.h>
 
 #include "t4_regs.h"
+#include "t4_values.h"
 #include "cxgb4.h"
 #include "cudbg_if.h"
 #include "cudbg_lib_common.h"
@@ -840,6 +841,69 @@ static int cudbg_get_payload_range(struct adapter *padap, u8 mem_type,
 				      &payload->start, &payload->end);
 }
 
+static int cudbg_memory_read(struct cudbg_init *pdbg_init, int win,
+			     int mtype, u32 addr, u32 len, void *hbuf)
+{
+	u32 win_pf, memoffset, mem_aperture, mem_base;
+	struct adapter *adap = pdbg_init->adap;
+	u32 pos, offset, resid, read_len;
+	u8 *buf;
+	int ret;
+
+	/* Argument sanity checks ...
+	 */
+	if (addr & 0x3 || (uintptr_t)hbuf & 0x3)
+		return -EINVAL;
+
+	buf = (u8 *)hbuf;
+
+	/* Try to do 32-bit reads.  Residual will be handled later. */
+	resid = len & 0x3;
+	len -= resid;
+
+	ret = t4_memory_rw_init(adap, win, mtype, &memoffset, &mem_base,
+				&mem_aperture);
+	if (ret)
+		return ret;
+
+	addr = addr + memoffset;
+	win_pf = is_t4(adap->params.chip) ? 0 : PFNUM_V(adap->pf);
+
+	pos = addr & ~(mem_aperture - 1);
+	offset = addr - pos;
+
+	/* Set up initial PCI-E Memory Window to cover the start of our
+	 * transfer.
+	 */
+	t4_memory_update_win(adap, win, pos | win_pf);
+
+	/* Transfer data from the adapter */
+	while (len > 0) {
+		read_len = pdbg_init->intrinsic_cb(pdbg_init, mem_base,
+						   offset, len, mem_aperture,
+						   buf);
+		buf += read_len;
+		offset += read_len;
+		len -= read_len;
+
+		/* If we've reached the end of our current window aperture,
+		 * move the PCI-E Memory Window on to the next.
+		 */
+		if (offset == mem_aperture) {
+			pos += mem_aperture;
+			offset = 0;
+			t4_memory_update_win(adap, win, pos | win_pf);
+		}
+	}
+
+	/* Transfer residual */
+	if (resid)
+		t4_memory_rw_residual(adap, resid, mem_base + offset, buf,
+				      T4_MEMORY_READ);
+
+	return 0;
+}
+
 #define CUDBG_YIELD_ITERATION 256
 
 static int cudbg_read_fw_mem(struct cudbg_init *pdbg_init,
@@ -899,10 +963,8 @@ static int cudbg_read_fw_mem(struct cudbg_init *pdbg_init,
 				goto skip_read;
 
 		spin_lock(&padap->win0_lock);
-		rc = t4_memory_rw(padap, MEMWIN_NIC, mem_type,
-				  bytes_read, bytes,
-				  (__be32 *)temp_buff.data,
-				  1);
+		rc = cudbg_memory_read(pdbg_init, MEMWIN_NIC, mem_type,
+				       bytes_read, bytes, temp_buff.data);
 		spin_unlock(&padap->win0_lock);
 		if (rc) {
 			cudbg_err->sys_err = rc;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index baa67d362051..92f1d87adf9f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1471,6 +1471,11 @@ u32 t4_read_pcie_cfg4(struct adapter *adap, int reg);
 u32 t4_get_util_window(struct adapter *adap);
 void t4_setup_memwin(struct adapter *adap, u32 memwin_base, u32 window);
 
+int t4_memory_rw_init(struct adapter *adap, int win, int mtype, u32 *mem_off,
+		      u32 *mem_base, u32 *mem_aperture);
+void t4_memory_update_win(struct adapter *adap, int win, u32 addr);
+void t4_memory_rw_residual(struct adapter *adap, u32 off, u32 addr, u8 *buf,
+			   int dir);
 #define T4_MEMORY_WRITE	0
 #define T4_MEMORY_READ	1
 int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr, u32 len,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index a2d6c8a69c52..db1b57a09887 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -18,6 +18,7 @@
 #include "t4_regs.h"
 #include "cxgb4.h"
 #include "cxgb4_cudbg.h"
+#include "cudbg_intrinsic.h"
 
 static const struct cxgb4_collect_entity cxgb4_collect_mem_dump[] = {
 	{ CUDBG_EDC0, cudbg_collect_edc0_meminfo },
@@ -395,6 +396,7 @@ int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
 	cudbg_init.adap = adap;
 	cudbg_init.outbuf = buf;
 	cudbg_init.outbuf_size = size;
+	cudbg_set_intrinsic_callback(&cudbg_init);
 
 	dbg_buff.data = buf;
 	dbg_buff.size = size;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 6d76851a4da9..e4d7da15c3b6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -481,6 +481,117 @@ static int t4_edc_err_read(struct adapter *adap, int idx)
 	return 0;
 }
 
+/**
+ * t4_memory_rw_init - Get memory window relative offset, base, and size.
+ * @adap: the adapter
+ * @win: PCI-E Memory Window to use
+ * @mtype: memory type: MEM_EDC0, MEM_EDC1 or MEM_MC
+ * @mem_off: memory relative offset with respect to @mtype.
+ * @mem_base: configured memory base address.
+ * @mem_aperture: configured memory window aperture.
+ *
+ * Get the configured memory window's relative offset, base, and size.
+ */
+int t4_memory_rw_init(struct adapter *adap, int win, int mtype, u32 *mem_off,
+		      u32 *mem_base, u32 *mem_aperture)
+{
+	u32 edc_size, mc_size, mem_reg;
+
+	/* Offset into the region of memory which is being accessed
+	 * MEM_EDC0 = 0
+	 * MEM_EDC1 = 1
+	 * MEM_MC   = 2 -- MEM_MC for chips with only 1 memory controller
+	 * MEM_MC1  = 3 -- for chips with 2 memory controllers (e.g. T5)
+	 * MEM_HMA  = 4
+	 */
+	edc_size  = EDRAM0_SIZE_G(t4_read_reg(adap, MA_EDRAM0_BAR_A));
+	if (mtype == MEM_HMA) {
+		*mem_off = 2 * (edc_size * 1024 * 1024);
+	} else if (mtype != MEM_MC1) {
+		*mem_off = (mtype * (edc_size * 1024 * 1024));
+	} else {
+		mc_size = EXT_MEM0_SIZE_G(t4_read_reg(adap,
+						      MA_EXT_MEMORY0_BAR_A));
+		*mem_off = (MEM_MC0 * edc_size + mc_size) * 1024 * 1024;
+	}
+
+	/* Each PCI-E Memory Window is programmed with a window size -- or
+	 * "aperture" -- which controls the granularity of its mapping onto
+	 * adapter memory.  We need to grab that aperture in order to know
+	 * how to use the specified window.  The window is also programmed
+	 * with the base address of the Memory Window in BAR0's address
+	 * space.  For T4 this is an absolute PCI-E Bus Address.  For T5
+	 * the address is relative to BAR0.
+	 */
+	mem_reg = t4_read_reg(adap,
+			      PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_BASE_WIN_A,
+						  win));
+	/* a dead adapter will return 0xffffffff for PIO reads */
+	if (mem_reg == 0xffffffff)
+		return -ENXIO;
+
+	*mem_aperture = 1 << (WINDOW_G(mem_reg) + WINDOW_SHIFT_X);
+	*mem_base = PCIEOFST_G(mem_reg) << PCIEOFST_SHIFT_X;
+	if (is_t4(adap->params.chip))
+		*mem_base -= adap->t4_bar0;
+
+	return 0;
+}
+
+/**
+ * t4_memory_update_win - Move memory window to specified address.
+ * @adap: the adapter
+ * @win: PCI-E Memory Window to use
+ * @addr: location to move.
+ *
+ * Move memory window to specified address.
+ */
+void t4_memory_update_win(struct adapter *adap, int win, u32 addr)
+{
+	t4_write_reg(adap,
+		     PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A, win),
+		     addr);
+	/* Read it back to ensure that changes propagate before we
+	 * attempt to use the new value.
+	 */
+	t4_read_reg(adap,
+		    PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A, win));
+}
+
+/**
+ * t4_memory_rw_residual - Read/Write residual data.
+ * @adap: the adapter
+ * @off: relative offset within residual to start read/write.
+ * @addr: address within indicated memory type.
+ * @buf: host memory buffer
+ * @dir: direction of transfer T4_MEMORY_READ (1) or T4_MEMORY_WRITE (0)
+ *
+ * Read/Write residual data less than 32-bits.
+ */
+void t4_memory_rw_residual(struct adapter *adap, u32 off, u32 addr, u8 *buf,
+			   int dir)
+{
+	union {
+		u32 word;
+		char byte[4];
+	} last;
+	unsigned char *bp;
+	int i;
+
+	if (dir == T4_MEMORY_READ) {
+		last.word = le32_to_cpu((__force __le32)
+					t4_read_reg(adap, addr));
+		for (bp = (unsigned char *)buf, i = off; i < 4; i++)
+			bp[i] = last.byte[i];
+	} else {
+		last.word = *buf;
+		for (i = off; i < 4; i++)
+			last.byte[i] = 0;
+		t4_write_reg(adap, addr,
+			     (__force u32)cpu_to_le32(last.word));
+	}
+}
+
 /**
  *	t4_memory_rw - read/write EDC 0, EDC 1 or MC via PCIE memory window
  *	@adap: the adapter
@@ -502,8 +613,9 @@ int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr,
 		 u32 len, void *hbuf, int dir)
 {
 	u32 pos, offset, resid, memoffset;
-	u32 edc_size, mc_size, win_pf, mem_reg, mem_aperture, mem_base;
+	u32 win_pf, mem_aperture, mem_base;
 	u32 *buf;
+	int ret;
 
 	/* Argument sanity checks ...
 	 */
@@ -519,59 +631,26 @@ int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr,
 	resid = len & 0x3;
 	len -= resid;
 
-	/* Offset into the region of memory which is being accessed
-	 * MEM_EDC0 = 0
-	 * MEM_EDC1 = 1
-	 * MEM_MC   = 2 -- MEM_MC for chips with only 1 memory controller
-	 * MEM_MC1  = 3 -- for chips with 2 memory controllers (e.g. T5)
-	 * MEM_HMA  = 4
-	 */
-	edc_size  = EDRAM0_SIZE_G(t4_read_reg(adap, MA_EDRAM0_BAR_A));
-	if (mtype == MEM_HMA) {
-		memoffset = 2 * (edc_size * 1024 * 1024);
-	} else if (mtype != MEM_MC1) {
-		memoffset = (mtype * (edc_size * 1024 * 1024));
-	} else {
-		mc_size = EXT_MEM0_SIZE_G(t4_read_reg(adap,
-						      MA_EXT_MEMORY0_BAR_A));
-		memoffset = (MEM_MC0 * edc_size + mc_size) * 1024 * 1024;
-	}
+	ret = t4_memory_rw_init(adap, win, mtype, &memoffset, &mem_base,
+				&mem_aperture);
+	if (ret)
+		return ret;
 
 	/* Determine the PCIE_MEM_ACCESS_OFFSET */
 	addr = addr + memoffset;
 
-	/* Each PCI-E Memory Window is programmed with a window size -- or
-	 * "aperture" -- which controls the granularity of its mapping onto
-	 * adapter memory.  We need to grab that aperture in order to know
-	 * how to use the specified window.  The window is also programmed
-	 * with the base address of the Memory Window in BAR0's address
-	 * space.  For T4 this is an absolute PCI-E Bus Address.  For T5
-	 * the address is relative to BAR0.
-	 */
-	mem_reg = t4_read_reg(adap,
-			      PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_BASE_WIN_A,
-						  win));
-	mem_aperture = 1 << (WINDOW_G(mem_reg) + WINDOW_SHIFT_X);
-	mem_base = PCIEOFST_G(mem_reg) << PCIEOFST_SHIFT_X;
-	if (is_t4(adap->params.chip))
-		mem_base -= adap->t4_bar0;
 	win_pf = is_t4(adap->params.chip) ? 0 : PFNUM_V(adap->pf);
 
 	/* Calculate our initial PCI-E Memory Window Position and Offset into
 	 * that Window.
 	 */
-	pos = addr & ~(mem_aperture-1);
+	pos = addr & ~(mem_aperture - 1);
 	offset = addr - pos;
 
 	/* Set up initial PCI-E Memory Window to cover the start of our
-	 * transfer.  (Read it back to ensure that changes propagate before we
-	 * attempt to use the new value.)
+	 * transfer.
 	 */
-	t4_write_reg(adap,
-		     PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A, win),
-		     pos | win_pf);
-	t4_read_reg(adap,
-		    PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A, win));
+	t4_memory_update_win(adap, win, pos | win_pf);
 
 	/* Transfer data to/from the adapter as long as there's an integral
 	 * number of 32-bit transfers to complete.
@@ -626,12 +705,7 @@ int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr,
 		if (offset == mem_aperture) {
 			pos += mem_aperture;
 			offset = 0;
-			t4_write_reg(adap,
-				PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A,
-						    win), pos | win_pf);
-			t4_read_reg(adap,
-				PCIE_MEM_ACCESS_REG(PCIE_MEM_ACCESS_OFFSET_A,
-						    win));
+			t4_memory_update_win(adap, win, pos | win_pf);
 		}
 	}
 
@@ -640,28 +714,9 @@ int t4_memory_rw(struct adapter *adap, int win, int mtype, u32 addr,
 	 * residual amount.  The PCI-E Memory Window has already been moved
 	 * above (if necessary) to cover this final transfer.
 	 */
-	if (resid) {
-		union {
-			u32 word;
-			char byte[4];
-		} last;
-		unsigned char *bp;
-		int i;
-
-		if (dir == T4_MEMORY_READ) {
-			last.word = le32_to_cpu(
-					(__force __le32)t4_read_reg(adap,
-						mem_base + offset));
-			for (bp = (unsigned char *)buf, i = resid; i < 4; i++)
-				bp[i] = last.byte[i];
-		} else {
-			last.word = *buf;
-			for (i = resid; i < 4; i++)
-				last.byte[i] = 0;
-			t4_write_reg(adap, mem_base + offset,
-				     (__force u32)cpu_to_le32(last.word));
-		}
-	}
+	if (resid)
+		t4_memory_rw_residual(adap, resid, mem_base + offset,
+				      (u8 *)buf, dir);
 
 	return 0;
 }
-- 
2.14.1
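
For readers tracing the window arithmetic in cudbg_memory_read() above,
the following is a minimal user-space sketch, not driver code.  It models
adapter memory as a plain array and shows how an address is split into a
window position and an offset, and how the window is moved on once the
offset reaches the aperture.  All sim_* names and the array size are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for adapter memory; not part of the driver. */
#define SIM_MEM_SIZE 4096u
static uint8_t sim_mem[SIM_MEM_SIZE];

struct sim_window {
	uint32_t pos;       /* adapter address the window currently maps */
	uint32_t aperture;  /* window size in bytes, must be a power of two */
	unsigned int moves; /* number of times the window was repositioned */
};

/* Callback type modelled on cudbg_intrinsic_t: returns bytes consumed. */
typedef uint32_t (*sim_read_cb)(const struct sim_window *win,
				uint32_t offset, uint32_t len, uint8_t *buf);

/* Default callback: copy one 32-bit word per call. */
static uint32_t sim_read_word(const struct sim_window *win,
			      uint32_t offset, uint32_t len, uint8_t *buf)
{
	assert(len >= 4 && win->pos + offset + 4 <= SIM_MEM_SIZE);
	memcpy(buf, &sim_mem[win->pos + offset], 4);
	return 4;
}

/* Reposition the window; the driver does this with a register write. */
static void sim_move_window(struct sim_window *win, uint32_t pos)
{
	win->pos = pos;
	win->moves++;
}

/* Read len bytes (a multiple of 4) starting at a 4-byte-aligned addr. */
static int sim_memory_read(struct sim_window *win, sim_read_cb cb,
			   uint32_t addr, uint32_t len, uint8_t *buf)
{
	uint32_t offset, n;

	if ((addr & 0x3) || (len & 0x3) ||
	    addr > SIM_MEM_SIZE || len > SIM_MEM_SIZE - addr)
		return -1;

	/* Split addr into a window position and an offset inside it. */
	sim_move_window(win, addr & ~(win->aperture - 1));
	offset = addr - win->pos;

	while (len > 0) {
		n = cb(win, offset, len, buf);
		buf += n;
		offset += n;
		len -= n;

		/* End of the current aperture: move on to the next one. */
		if (offset == win->aperture) {
			sim_move_window(win, win->pos + win->aperture);
			offset = 0;
		}
	}

	return 0;
}
```

With a 64-byte aperture, a 16-byte read starting at address 56 places the
window at 0, reads two words, moves the window to 64, and then reads the
remaining two words.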


* [PATCH net-next 2/2] cxgb4: speed up on-chip memory read
  2018-01-14  9:32 [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory Rahul Lakkireddy
  2018-01-14  9:32 ` [PATCH net-next 1/2] cxgb4: rework on-chip memory read Rahul Lakkireddy
@ 2018-01-14  9:32 ` Rahul Lakkireddy
  2018-01-16  2:56   ` kbuild test robot
  2018-01-14 17:17 ` [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Rahul Lakkireddy @ 2018-01-14  9:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, ganeshgr, nirranjan, indranil, Rahul Lakkireddy

Register and use AVX CPU intrinsic instructions when available to do
256-bit reads to speed up reading EDC and MC.  Otherwise, fall back to
32-bit reads.  Also align the destination buffer on a 32-byte boundary.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile        |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h  |  2 +
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c   |  7 +-
 .../net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h   |  8 +++
 .../ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c   | 78 ++++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   |  5 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |  2 +
 7 files changed, 101 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index 0dbaf1b18bac..a0f5239b19d4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -12,3 +12,4 @@ cxgb4-objs := cxgb4_main.o l2t.o smt.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
+cxgb4-$(CONFIG_X86) += cudbg_intrinsic_avx.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index b57acb8dc35b..4269d1621e9a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -25,6 +25,8 @@
 #define MC1_FLAG 4
 #define HMA_FLAG 5
 
+#define CUDBG_MEM_ALIGN 32
+
 #define CUDBG_ENTITY_SIGNATURE 0xCCEDB001
 
 struct cudbg_mbox_log {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
index 0b80512e5c0c..6ed418d90507 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c
@@ -34,5 +34,10 @@ unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init,
 
 void cudbg_set_intrinsic_callback(struct cudbg_init *pdbg_init)
 {
-	pdbg_init->intrinsic_cb = cudbg_mem_read_def;
+#ifdef CONFIG_X86
+	if (cudbg_intrinsic_avx_supported())
+		pdbg_init->intrinsic_cb = cudbg_mem_read_avx;
+	else
+#endif
+		pdbg_init->intrinsic_cb = cudbg_mem_read_def;
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
index 3af0f07311ec..d878c71ef65d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.h
@@ -21,5 +21,13 @@
 unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init,
 				u32 start, u32 offset, u32 size,
 				u32 mem_aperture, u8 *outbuf);
+
+#ifdef CONFIG_X86
+int cudbg_intrinsic_avx_supported(void);
+unsigned int cudbg_mem_read_avx(struct cudbg_init *pdbg_init, u32 start,
+				u32 offset, u32 size, u32 mem_aperture,
+				u8 *outbuf);
+#endif
+
 void cudbg_set_intrinsic_callback(struct cudbg_init *pdbg_init);
 #endif /* __CUDBG_INTRINSIC_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c
new file mode 100644
index 000000000000..d5bd4dfef428
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c
@@ -0,0 +1,78 @@
+/*
+ *  Copyright (C) 2018 Chelsio Communications.  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *  more details.
+ *
+ *  The full GNU General Public License is included in this distribution in
+ *  the file called "COPYING".
+ *
+ */
+
+#include <linux/cpufeature.h>
+#include <asm/fpu/api.h>
+
+#include "cxgb4.h"
+#include "cudbg_if.h"
+#include "cudbg_lib_common.h"
+#include "cudbg_intrinsic.h"
+
+int cudbg_intrinsic_avx_supported(void)
+{
+#ifdef CONFIG_AS_AVX
+	return boot_cpu_has(X86_FEATURE_AVX);
+#else
+	return 0;
+#endif /* CONFIG_AS_AVX */
+}
+
+/* Alignment in bytes for AVX aligned instructions */
+#define CUDBG_MEM_ALIGN_AVX 32
+
+unsigned int cudbg_mem_read_avx(struct cudbg_init *pdbg_init, u32 start,
+				u32 offset, u32 size, u32 mem_aperture,
+				u8 *outbuf)
+{
+#ifdef CONFIG_AS_AVX
+	u32 max_read_len = CUDBG_MEM_ALIGN_AVX;
+	struct adapter *adap = pdbg_init->adap;
+	u8 *reg_addr, *src_addr, *dst_addr;
+	u32 bytes_read, read_len;
+
+	reg_addr = (u8 *)adap->regs + start + offset;
+	src_addr = PTR_ALIGN(reg_addr, max_read_len);
+	dst_addr = PTR_ALIGN(outbuf, max_read_len);
+	read_len = min(size, max_read_len);
+
+	/* Don't use intrinsic for following cases:
+	 * 1. If reading current offset + 256-bits would
+	 *    exceed current window aperture.
+	 * 2. Source or Destination address is not aligned
+	 *    to 256-bits.
+	 * 3. There are less than 256-bits left to read.
+	 */
+	if (offset + max_read_len > mem_aperture ||
+	    src_addr != reg_addr || dst_addr != outbuf ||
+	    read_len < max_read_len) {
+		return cudbg_mem_read_def(pdbg_init, start, offset, size,
+					  mem_aperture, outbuf);
+	} else {
+		kernel_fpu_begin();
+		asm volatile("vmovdqa %0, %%ymm0" : : "m" (*reg_addr));
+		asm volatile("vmovdqa %%ymm0, %0" : "=m" (*outbuf));
+		kernel_fpu_end();
+		bytes_read = read_len;
+	}
+
+	return bytes_read;
+#else
+	return cudbg_mem_read_def(pdbg_init, start, offset, size, mem_aperture,
+				  outbuf);
+#endif /* CONFIG_AS_AVX */
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index db1b57a09887..220ba2f60cf7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -428,12 +428,15 @@ int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
 					   buf,
 					   &total_size);
 
-	if (flag & CXGB4_ETH_DUMP_MEM)
+	if (flag & CXGB4_ETH_DUMP_MEM) {
+		dbg_buff.offset = roundup(dbg_buff.offset, CUDBG_MEM_ALIGN);
+		total_size = roundup(total_size, CUDBG_MEM_ALIGN);
 		cxgb4_cudbg_collect_entity(&cudbg_init, &dbg_buff,
 					   cxgb4_collect_mem_dump,
 					   ARRAY_SIZE(cxgb4_collect_mem_dump),
 					   buf,
 					   &total_size);
+	}
 
 	cudbg_hdr->data_len = total_size;
 	*buf_size = total_size;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
index 7852d98bad75..d437e46f6af6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
@@ -1362,6 +1362,7 @@ static int set_dump(struct net_device *dev, struct ethtool_dump *eth_dump)
 	len = sizeof(struct cudbg_hdr) +
 	      sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
 	len += cxgb4_get_dump_length(adapter, eth_dump->flag);
+	len = roundup(len, CUDBG_MEM_ALIGN);
 
 	adapter->eth_dump.flag = eth_dump->flag;
 	adapter->eth_dump.len = len;
@@ -1391,6 +1392,7 @@ static int get_dump_data(struct net_device *dev, struct ethtool_dump *eth_dump,
 	len = sizeof(struct cudbg_hdr) +
 	      sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
 	len += cxgb4_get_dump_length(adapter, adapter->eth_dump.flag);
+	len = roundup(len, CUDBG_MEM_ALIGN);
 	if (eth_dump->len < len)
 		return -ENOMEM;
 
-- 
2.14.1
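
The alignment rules this patch relies on can be sketched in user space as
follows.  This is an illustration only: the sketch_* names are
hypothetical, and sketch_roundup() assumes a power-of-two alignment,
which is narrower than the kernel's roundup() macro.  The last function
mirrors the three conditions under which cudbg_mem_read_avx() above
declines the 256-bit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors CUDBG_MEM_ALIGN (32 bytes); the name itself is hypothetical. */
#define SKETCH_ALIGN 32u

/* Round x up to the next multiple of align (power of two only). */
static size_t sketch_roundup(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Non-zero if p sits on an align-byte boundary (power of two only). */
static int sketch_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* Decide whether a single 32-byte read may be used for this step. */
static int sketch_can_use_wide_read(const void *src, const void *dst,
				    uint32_t offset, uint32_t remaining,
				    uint32_t aperture)
{
	/* 1. The read must not run past the current window aperture. */
	if (offset + SKETCH_ALIGN > aperture)
		return 0;

	/* 2. Source and destination must both be 32-byte aligned. */
	if (!sketch_is_aligned(src, SKETCH_ALIGN) ||
	    !sketch_is_aligned(dst, SKETCH_ALIGN))
		return 0;

	/* 3. At least 32 bytes must remain to be read. */
	if (remaining < SKETCH_ALIGN)
		return 0;

	return 1;
}
```

Rounding the dump length and buffer offset up to 32 bytes, as the
cxgb4_cudbg.c and cxgb4_ethtool.c hunks do, is what allows condition 2 to
hold for the destination buffer.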


* Re: [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory
  2018-01-14  9:32 [PATCH net-next 0/2] cxgb4: speed up reading on-chip memory Rahul Lakkireddy
  2018-01-14  9:32 ` [PATCH net-next 1/2] cxgb4: rework on-chip memory read Rahul Lakkireddy
  2018-01-14  9:32 ` [PATCH net-next 2/2] cxgb4: speed up " Rahul Lakkireddy
@ 2018-01-14 17:17 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2018-01-14 17:17 UTC (permalink / raw)
  To: rahul.lakkireddy; +Cc: netdev, ganeshgr, nirranjan, indranil

From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Date: Sun, 14 Jan 2018 15:02:03 +0530

> This patch series speeds up reading on-chip memory (EDC and MC)
> by using AVX intrinsic instructions when available.
> 
> Patch 1 exports a callback to register supported intrinsic instructions
> when available.  It also reworks the logic to read EDC and MC.
> 
> Patch 2 adds AVX CPU intrinsic instructions to read EDC and MC
> 256 bits at a time.  It falls back to regular 32-bit reads if AVX is
> not available.

This violates things on several levels.

IO mappings are a special __iomem type because you _CANNOT_
dereference them directly.

This means you cannot feed them into normal C dereferences
or normal loads or stores.

The whole point is that if the layout and format of the
__iomem pointer changes, or if some special kind of access
is necessary, no driver code needs to change.

But if you start adding direct AVX instruction loads and
stores of these pointers, things are going to break in the
future.

Sorry, there is no way I am applying this patch set.

Thanks.
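
To illustrate the objection above: I/O memory is meant to be reached only
through accessor functions, never by dereferencing the pointer.  In the
kernel, accessors such as readl() and memcpy_fromio() play this role.
The following is a minimal user-space sketch of that pattern under stated
assumptions: the sketch_iomem handle and both sketch_* functions are
hypothetical stand-ins, and the device is assumed to present
little-endian 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque handle standing in for a __iomem pointer (hypothetical). */
struct sketch_iomem {
	const uint8_t *backing; /* simulated device bytes */
	size_t size;            /* size of the simulated region */
};

/* Hypothetical accessor in the role of readl(): returns one
 * little-endian device word converted to CPU byte order.
 */
static uint32_t sketch_readl(const struct sketch_iomem *io, size_t off)
{
	const uint8_t *p;

	assert(off % 4 == 0 && off + 4 <= io->size);
	p = io->backing + off;
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy len bytes (a multiple of 4) out of the device using only the
 * accessor, so the caller never touches the mapping directly.
 */
static void sketch_copy_fromio(uint32_t *dst, const struct sketch_iomem *io,
			       size_t off, size_t len)
{
	size_t i;

	for (i = 0; i < len; i += 4)
		dst[i / 4] = sketch_readl(io, off + i);
}
```

Because the copy loop depends only on the accessor, a change in how the
mapping must be accessed is confined to sketch_readl(); a raw load from
the pointer, as in the AVX path, bypasses that layer.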


* Re: [PATCH net-next 1/2] cxgb4: rework on-chip memory read
  2018-01-14  9:32 ` [PATCH net-next 1/2] cxgb4: rework on-chip memory read Rahul Lakkireddy
@ 2018-01-16  1:59   ` kbuild test robot
  0 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2018-01-16  1:59 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: kbuild-all, netdev, davem, ganeshgr, nirranjan, indranil,
	Rahul Lakkireddy

Hi Rahul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/cxgb4-rework-on-chip-memory-read/20180116-050826
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 <noident> @@ got unsigned int <noident> @@
   drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: expected restricted __be32 <noident>
   drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c:29:14: got unsigned int <noident>

vim +29 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic.c

    21	
    22	unsigned int cudbg_mem_read_def(struct cudbg_init *pdbg_init,
    23					u32 start, u32 offset, u32 size,
    24					u32 mem_aperture, u8 *outbuf)
    25	{
    26		struct adapter *adap = pdbg_init->adap;
    27		__be32 *buf = (__be32 *)outbuf;
    28	
  > 29		*buf = le32_to_cpu((__force __le32)
    30				   t4_read_reg(adap, start + offset));
    31	
    32		return sizeof(__be32);
    33	}
    34	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
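
The warning above arises because le32_to_cpu() yields a plain CPU-order
integer while the destination pointer is annotated __be32.  As a rough
user-space analogy, and not a fix for the driver, the sketch below wraps
each byte order in its own struct type so that mixing them becomes a
compile-time error, which is roughly what sparse's annotations enforce.
All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types: a value of one cannot be assigned to the
 * other without going through an explicit conversion function.
 */
typedef struct { uint8_t b[4]; } sketch_le32;
typedef struct { uint8_t b[4]; } sketch_be32;

/* Little-endian storage to CPU byte order. */
static uint32_t sketch_le32_to_cpu(sketch_le32 x)
{
	return (uint32_t)x.b[0] | ((uint32_t)x.b[1] << 8) |
	       ((uint32_t)x.b[2] << 16) | ((uint32_t)x.b[3] << 24);
}

/* CPU byte order to big-endian storage. */
static sketch_be32 sketch_cpu_to_be32(uint32_t v)
{
	sketch_be32 out;

	out.b[0] = (uint8_t)(v >> 24);
	out.b[1] = (uint8_t)(v >> 16);
	out.b[2] = (uint8_t)(v >> 8);
	out.b[3] = (uint8_t)v;
	return out;
}
```

In this scheme, storing the result of sketch_le32_to_cpu() into a
sketch_be32 variable does not compile; the value must either stay a plain
uint32_t or pass through sketch_cpu_to_be32() first.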


* Re: [PATCH net-next 2/2] cxgb4: speed up on-chip memory read
  2018-01-14  9:32 ` [PATCH net-next 2/2] cxgb4: speed up " Rahul Lakkireddy
@ 2018-01-16  2:56   ` kbuild test robot
  0 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2018-01-16  2:56 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: kbuild-all, netdev, davem, ganeshgr, nirranjan, indranil,
	Rahul Lakkireddy

Hi Rahul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/cxgb4-rework-on-chip-memory-read/20180116-050826
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c:48:21: sparse: cast removes address space of expression

vim +48 drivers/net/ethernet/chelsio/cxgb4/cudbg_intrinsic_avx.c

    37	
    38	unsigned int cudbg_mem_read_avx(struct cudbg_init *pdbg_init, u32 start,
    39					u32 offset, u32 size, u32 mem_aperture,
    40					u8 *outbuf)
    41	{
    42	#ifdef CONFIG_AS_AVX
    43		u32 max_read_len = CUDBG_MEM_ALIGN_AVX;
    44		struct adapter *adap = pdbg_init->adap;
    45		u8 *reg_addr, *src_addr, *dst_addr;
    46		u32 bytes_read, read_len;
    47	
  > 48		reg_addr = (u8 *)adap->regs + start + offset;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
