netdev.vger.kernel.org archive mirror
* [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue.
@ 2009-06-11  1:49 Ron Mercer
  2009-06-11  1:49 ` [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes Ron Mercer
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ron Mercer @ 2009-06-11  1:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer

The alignment was on the queue size boundary, but the hardware
only requires 4-byte alignment.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h      |    2 ++
 drivers/net/qlge/qlge_main.c |    2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 5eb52ca..b1ddfd1 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -27,6 +27,8 @@
 			   "%s: " fmt, __func__, ##args);  \
        } while (0)
 
+#define WQ_ADDR_ALIGN	0x3	/* 4 byte alignment */
+
 #define QLGE_VENDOR_ID    0x1077
 #define QLGE_DEVICE_ID_8012	0x8012
 #define QLGE_DEVICE_ID_8000	0x8000
diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index 024c734..17d512c 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -2237,7 +2237,7 @@ static int ql_alloc_tx_resources(struct ql_adapter *qdev,
 				 &tx_ring->wq_base_dma);
 
 	if ((tx_ring->wq_base == NULL)
-	    || tx_ring->wq_base_dma & (tx_ring->wq_size - 1)) {
+		|| tx_ring->wq_base_dma & WQ_ADDR_ALIGN) {
 		QPRINTK(qdev, IFUP, ERR, "tx_ring alloc failed.\n");
 		return -ENOMEM;
 	}
-- 
1.6.0.2
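
To make the change concrete, here is a minimal userspace sketch (not
part of the patch; the queue size and DMA address below are made up,
and WQ_ADDR_ALIGN is taken from the qlge.h hunk above) comparing the
old size-alignment test with the new 4-byte test:

/*
 * Sketch only: the old check rejects any base address that is not
 * aligned to the full queue size; the new check only requires the
 * 4-byte alignment the hardware actually needs.
 */
#include <stdint.h>
#include <stdio.h>

#define WQ_ADDR_ALIGN	0x3	/* 4 byte alignment */

int main(void)
{
	uint64_t wq_size = 16384;		/* hypothetical TX queue size */
	uint64_t wq_base_dma = 0x100004;	/* hypothetical DMA address */

	int old_fail = wq_base_dma & (wq_size - 1);	/* old test */
	int new_fail = wq_base_dma & WQ_ADDR_ALIGN;	/* new test */

	printf("old check: %s\n", old_fail ? "reject" : "accept");
	printf("new check: %s\n", new_fail ? "reject" : "accept");
	return 0;
}

With these values the old check would fail the allocation with -ENOMEM
while the new one accepts it.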



* [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes.
  2009-06-11  1:49 [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue Ron Mercer
@ 2009-06-11  1:49 ` Ron Mercer
  2009-06-11  9:37   ` David Miller
  2009-06-11  1:49 ` [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024 Ron Mercer
  2009-06-11  9:37 ` [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue David Miller
  2 siblings, 1 reply; 9+ messages in thread
From: Ron Mercer @ 2009-06-11  1:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer

RX buffer rings can be composed of non-contiguous, fixed-size
chunks of memory. The ring is given to the hardware as a pointer
to a location that stores the address of the queue.  If the queue
is larger than 4096 bytes, the hardware instead gets a list of such
pointers.
This patch adds the logic needed to generate that list when the
queue size exceeds 4096 bytes.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h      |   13 +++++++++++--
 drivers/net/qlge/qlge_main.c |   28 ++++++++++++++++++++++------
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index b1ddfd1..156e02e 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -41,7 +41,18 @@
 
 #define NUM_SMALL_BUFFERS   512
 #define NUM_LARGE_BUFFERS   512
+#define DB_PAGE_SIZE 4096
+
+/* Calculate the number of (4k) pages required to
+ * contain a buffer queue of the given length.
+ */
+#define MAX_DB_PAGES_PER_BQ(x) \
+		(((x * sizeof(u64)) / DB_PAGE_SIZE) + \
+		(((x * sizeof(u64)) % DB_PAGE_SIZE) ? 1 : 0))
 
+#define RX_RING_SHADOW_SPACE	(sizeof(u64) + \
+		MAX_DB_PAGES_PER_BQ(NUM_SMALL_BUFFERS) * sizeof(u64) + \
+		MAX_DB_PAGES_PER_BQ(NUM_LARGE_BUFFERS) * sizeof(u64))
 #define SMALL_BUFFER_SIZE 256
 #define LARGE_BUFFER_SIZE	PAGE_SIZE
 #define MAX_SPLIT_SIZE 1023
@@ -65,8 +76,6 @@
 #define TX_DESC_PER_OAL 0
 #endif
 
-#define DB_PAGE_SIZE 4096
-
 /* MPI test register definitions. This register
  * is used for determining alternate NIC function's
  * PCI->func number.
diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index 17d512c..b9a5f59 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -2552,14 +2552,16 @@ static int ql_start_rx_ring(struct ql_adapter *qdev, struct rx_ring *rx_ring)
 {
 	struct cqicb *cqicb = &rx_ring->cqicb;
 	void *shadow_reg = qdev->rx_ring_shadow_reg_area +
-	    (rx_ring->cq_id * sizeof(u64) * 4);
+		(rx_ring->cq_id * RX_RING_SHADOW_SPACE);
 	u64 shadow_reg_dma = qdev->rx_ring_shadow_reg_dma +
-	    (rx_ring->cq_id * sizeof(u64) * 4);
+		(rx_ring->cq_id * RX_RING_SHADOW_SPACE);
 	void __iomem *doorbell_area =
 	    qdev->doorbell_area + (DB_PAGE_SIZE * (128 + rx_ring->cq_id));
 	int err = 0;
 	u16 bq_len;
 	u64 tmp;
+	__le64 *base_indirect_ptr;
+	int page_entries;
 
 	/* Set up the shadow registers for this ring. */
 	rx_ring->prod_idx_sh_reg = shadow_reg;
@@ -2568,8 +2570,8 @@ static int ql_start_rx_ring(struct ql_adapter *qdev, struct rx_ring *rx_ring)
 	shadow_reg_dma += sizeof(u64);
 	rx_ring->lbq_base_indirect = shadow_reg;
 	rx_ring->lbq_base_indirect_dma = shadow_reg_dma;
-	shadow_reg += sizeof(u64);
-	shadow_reg_dma += sizeof(u64);
+	shadow_reg += (sizeof(u64) * MAX_DB_PAGES_PER_BQ(rx_ring->lbq_len));
+	shadow_reg_dma += (sizeof(u64) * MAX_DB_PAGES_PER_BQ(rx_ring->lbq_len));
 	rx_ring->sbq_base_indirect = shadow_reg;
 	rx_ring->sbq_base_indirect_dma = shadow_reg_dma;
 
@@ -2606,7 +2608,14 @@ static int ql_start_rx_ring(struct ql_adapter *qdev, struct rx_ring *rx_ring)
 	if (rx_ring->lbq_len) {
 		cqicb->flags |= FLAGS_LL;	/* Load lbq values */
 		tmp = (u64)rx_ring->lbq_base_dma;
-		*((__le64 *) rx_ring->lbq_base_indirect) = cpu_to_le64(tmp);
+		base_indirect_ptr = (__le64 *) rx_ring->lbq_base_indirect;
+		page_entries = 0;
+		do {
+			*base_indirect_ptr = cpu_to_le64(tmp);
+			tmp += DB_PAGE_SIZE;
+			base_indirect_ptr++;
+			page_entries++;
+		} while (page_entries < MAX_DB_PAGES_PER_BQ(rx_ring->lbq_len));
 		cqicb->lbq_addr =
 		    cpu_to_le64(rx_ring->lbq_base_indirect_dma);
 		bq_len = (rx_ring->lbq_buf_size == 65536) ? 0 :
@@ -2623,7 +2632,14 @@ static int ql_start_rx_ring(struct ql_adapter *qdev, struct rx_ring *rx_ring)
 	if (rx_ring->sbq_len) {
 		cqicb->flags |= FLAGS_LS;	/* Load sbq values */
 		tmp = (u64)rx_ring->sbq_base_dma;
-		*((__le64 *) rx_ring->sbq_base_indirect) = cpu_to_le64(tmp);
+		base_indirect_ptr = (__le64 *) rx_ring->sbq_base_indirect;
+		page_entries = 0;
+		do {
+			*base_indirect_ptr = cpu_to_le64(tmp);
+			tmp += DB_PAGE_SIZE;
+			base_indirect_ptr++;
+			page_entries++;
+		} while (page_entries < MAX_DB_PAGES_PER_BQ(rx_ring->sbq_len));
 		cqicb->sbq_addr =
 		    cpu_to_le64(rx_ring->sbq_base_indirect_dma);
 		cqicb->sbq_buf_size =
-- 
1.6.0.2
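
As a quick sanity check on the page arithmetic (a standalone sketch,
not part of the patch; the macro is copied from the qlge.h hunk above
and u64 is stood in by uint64_t): each buffer queue entry is an 8-byte
u64, so 512 entries fit in exactly one 4096-byte doorbell page and need
a single indirection pointer, while 1024 entries span two pages and
need two:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;	/* userspace stand-in for the kernel type */

#define DB_PAGE_SIZE 4096

/* Copied from the qlge.h hunk above. */
#define MAX_DB_PAGES_PER_BQ(x) \
		(((x * sizeof(u64)) / DB_PAGE_SIZE) + \
		(((x * sizeof(u64)) % DB_PAGE_SIZE) ? 1 : 0))

int main(void)
{
	printf("512 entries  -> %zu page(s)\n", MAX_DB_PAGES_PER_BQ(512));
	printf("1024 entries -> %zu page(s)\n", MAX_DB_PAGES_PER_BQ(1024));
	return 0;
}

This prints 1 page for 512 entries and 2 pages for 1024, which is why
the indirection setup in ql_start_rx_ring() walks page_entries instead
of assuming a single pointer.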



* [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024.
  2009-06-11  1:49 [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue Ron Mercer
  2009-06-11  1:49 ` [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes Ron Mercer
@ 2009-06-11  1:49 ` Ron Mercer
  2009-06-11  9:27   ` David Miller
  2009-06-11  9:37 ` [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue David Miller
  2 siblings, 1 reply; 9+ messages in thread
From: Ron Mercer @ 2009-06-11  1:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer


Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 156e02e..0c47bbe 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -36,11 +36,11 @@
 #define MAX_TX_RINGS MAX_CPUS
 #define MAX_RX_RINGS ((MAX_CPUS * 2) + 1)
 
-#define NUM_TX_RING_ENTRIES	256
-#define NUM_RX_RING_ENTRIES	256
+#define NUM_TX_RING_ENTRIES	1024
+#define NUM_RX_RING_ENTRIES	1024
 
-#define NUM_SMALL_BUFFERS   512
-#define NUM_LARGE_BUFFERS   512
+#define NUM_SMALL_BUFFERS   1024
+#define NUM_LARGE_BUFFERS   1024
 #define DB_PAGE_SIZE 4096
 
 /* Calculate the number of (4k) pages required to
-- 
1.6.0.2



* Re: [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024.
  2009-06-11  1:49 ` [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024 Ron Mercer
@ 2009-06-11  9:27   ` David Miller
  2009-06-11 15:34     ` Stephen Hemminger
  2009-06-11 23:21     ` Ron Mercer
  0 siblings, 2 replies; 9+ messages in thread
From: David Miller @ 2009-06-11  9:27 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev

From: Ron Mercer <ron.mercer@qlogic.com>
Date: Wed, 10 Jun 2009 18:49:35 -0700

> 
> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>

This is huge.  Even other aggressive NICs such as BNX2X only use 256
ring entries per TX queue.

There is a point where increasing definitely hurts, because you're
increasing the resident set size of the cpu, as more and more SKBs are
effectively "in flight" at a given time and only due to the amount
you're allowing to get queued up into the chip.

And with multiqueue, per-queue TX queue sizes should matter less at
least to some extent.

Are you sure that jacking the value up this high has no negative side
effects for various workloads?


* Re: [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue.
  2009-06-11  1:49 [net-next PATCH 1/3] qlge: Relax alignment on TX hardware queue Ron Mercer
  2009-06-11  1:49 ` [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes Ron Mercer
  2009-06-11  1:49 ` [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024 Ron Mercer
@ 2009-06-11  9:37 ` David Miller
  2 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-06-11  9:37 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev

From: Ron Mercer <ron.mercer@qlogic.com>
Date: Wed, 10 Jun 2009 18:49:33 -0700

> The alignment was on the queue size boundary, but the hardware
> only requires 4-byte alignment.
> 
> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>

Applied.


* Re: [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes.
  2009-06-11  1:49 ` [net-next PATCH 2/3] qlge: Allow RX buf rings to be > than 4096 bytes Ron Mercer
@ 2009-06-11  9:37   ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-06-11  9:37 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev

From: Ron Mercer <ron.mercer@qlogic.com>
Date: Wed, 10 Jun 2009 18:49:34 -0700

> RX buffer rings can be composed of non-contiguous, fixed-size
> chunks of memory. The ring is given to the hardware as a pointer
> to a location that stores the address of the queue.  If the queue
> is larger than 4096 bytes, the hardware instead gets a list of such
> pointers.
> This patch adds the logic needed to generate that list when the
> queue size exceeds 4096 bytes.
> 
> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>

Applied.


* Re: [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024.
  2009-06-11  9:27   ` David Miller
@ 2009-06-11 15:34     ` Stephen Hemminger
  2009-06-11 23:21     ` Ron Mercer
  1 sibling, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2009-06-11 15:34 UTC (permalink / raw)
  To: David Miller; +Cc: ron.mercer, netdev

On Thu, 11 Jun 2009 02:27:13 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Ron Mercer <ron.mercer@qlogic.com>
> Date: Wed, 10 Jun 2009 18:49:35 -0700
> 
> > 
> > Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
> 
> This is huge.  Even other aggressive NICs such as BNX2X only use 256
> ring entries per TX queue.
> 
> There is a point where increasing definitely hurts, because you're
> increasing the resident set size of the cpu, as more and more SKBs are
> effectively "in flight" at a given time and only due to the amount
> you're allowing to get queued up into the chip.
> 
> And with multiqueue, per-queue TX queue sizes should matter less at
> least to some extent.
> 
> Are you sure that jacking the value up this high has no negative side
> effects for various workloads?

I am investigating reducing the sky2 default TX ring size down to
128 after user complaints about the latency. At 10G, 1024 ring
entries is about 7 ms of queued traffic with jumbo frames.
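
That figure roughly checks out (a back-of-the-envelope sketch, not from
the original mail; the 9000-byte jumbo frame size is an assumption):

/* Sketch only: worst-case TX queue drain time for the numbers above. */
#include <stdio.h>

int main(void)
{
	double entries = 1024;
	double frame_bytes = 9000.0;		/* assumed jumbo frame size */
	double bytes_per_sec = 10e9 / 8.0;	/* 10 Gbit/s */

	double drain_ms = entries * frame_bytes / bytes_per_sec * 1000.0;
	printf("queue drain time: %.1f ms\n", drain_ms);	/* ~7.4 ms */
	return 0;
}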


* Re: [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024.
  2009-06-11  9:27   ` David Miller
  2009-06-11 15:34     ` Stephen Hemminger
@ 2009-06-11 23:21     ` Ron Mercer
  2009-06-11 23:50       ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Ron Mercer @ 2009-06-11 23:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

> This is huge.  Even other aggressive NICs such as BNX2X only use 256
> ring entries per TX queue.
> 
> There is a point where increasing definitely hurts, because you're
> increasing the resident set size of the cpu, as more and more SKBs are
> effectively "in flight" at a given time and only due to the amount
> you're allowing to get queued up into the chip.
> 
> And with multiqueue, per-queue TX queue sizes should matter less at
> least to some extent.
> 
> Are you sure that jacking the value up this high has no negative side
> effects for various workloads?

Just drop this patch for now.  I looked at our spreadsheet and ran the
tests again, and we see a (marginal) throughput increase that leveled off
at a TXQ length of 1024.  In light of your and Stephen's comments I prefer
to revisit this issue.



* Re: [net-next PATCH 3/3] qlge: Increase default TX/RX ring size to 1024.
  2009-06-11 23:21     ` Ron Mercer
@ 2009-06-11 23:50       ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-06-11 23:50 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev

From: Ron Mercer <ron.mercer@qlogic.com>
Date: Thu, 11 Jun 2009 16:21:59 -0700

>> This is huge.  Even other aggressive NICs such as BNX2X only use 256
>> ring entries per TX queue.
>> 
>> There is a point where increasing definitely hurts, because you're
>> increasing the resident set size of the cpu, as more and more SKBs are
>> effectively "in flight" at a given time and only due to the amount
>> you're allowing to get queued up into the chip.
>> 
>> And with multiqueue, per-queue TX queue sizes should matter less at
>> least to some extent.
>> 
>> Are you sure that jacking the value up this high has no negative side
>> effects for various workloads?
> 
> Just drop this patch for now.  I looked at our spreadsheet and ran the
> tests again, and we see a (marginal) throughput increase that leveled off
> at a TXQ length of 1024.  In light of your and Stephen's comments I prefer
> to revisit this issue.

Ok, thanks.

