netdev.vger.kernel.org archive mirror
* [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast.
@ 2010-07-20  0:15 Michael Chan
  2010-07-20  0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
  2010-07-20  3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20  0:15 UTC
  To: davem; +Cc: netdev

We were using the tx multicast counter (stat_IfHCOutMulticastPkts)
instead of the rx multicast counter (stat_IfHCInMulticastPkts).

Reported-by: Peter Snellman <peter.snellman@cinnober.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index ce3217b..deb7f83 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6631,7 +6631,7 @@ bnx2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *net_stats)
 		GET_64BIT_NET_STATS(stat_IfHCOutOctets);
 
 	net_stats->multicast =
-		GET_64BIT_NET_STATS(stat_IfHCOutMulticastPkts);
+		GET_64BIT_NET_STATS(stat_IfHCInMulticastPkts);
 
 	net_stats->collisions =
 		GET_32BIT_NET_STATS(stat_EtherStatsCollisions);
-- 
1.6.4.GIT

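The fix in patch 1/4 comes down to reading the right pair of hardware counter words. The C sketch below is illustrative only: the struct and function names (hw_stats, link_stats, combine64(), fill_link_stats()) are hypothetical, and splitting each 64-bit counter into _hi/_lo 32-bit words is an assumption modeled loosely on how GET_64BIT_NET_STATS() is used in the diff, not the driver's actual definitions.

```c
#include <stdint.h>

/* Hypothetical hardware statistics block: each 64-bit counter is
 * exported as a pair of 32-bit _hi/_lo words (an assumption). */
struct hw_stats {
	uint32_t in_mcast_hi, in_mcast_lo;   /* multicast frames received */
	uint32_t out_mcast_hi, out_mcast_lo; /* multicast frames transmitted */
};

/* Hypothetical subset of struct rtnl_link_stats64. */
struct link_stats {
	uint64_t multicast; /* multicast packets received */
};

/* Join the two 32-bit halves into one 64-bit value. */
static uint64_t combine64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

void fill_link_stats(const struct hw_stats *hw, struct link_stats *out)
{
	/* Read the In (rx) counter; using the Out (tx) one was the bug. */
	out->multicast = combine64(hw->in_mcast_hi, hw->in_mcast_lo);
}
```

For instance, with in_mcast_hi = 1 and in_mcast_lo = 5, fill_link_stats() reports 4294967301 (2^32 + 5), whatever the tx counter holds.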



* [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors.
  2010-07-20  0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
@ 2010-07-20  0:15 ` Michael Chan
  2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
  2010-07-20  3:31   ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
  2010-07-20  3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller
  1 sibling, 2 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20  0:15 UTC
  To: davem; +Cc: netdev, Breno Leitão

Based on original patch by Breno Leitão <leitao@linux.vnet.ibm.com>.

Allocate the actual number of vectors, and retry with fewer vectors
if pci_enable_msix() returns > 0 (the number of vectors that can be
allocated instead).  One additional vector must be allocated for the
cnic driver when BCM_CNIC is defined.

Cc: Breno Leitão <leitao@linux.vnet.ibm.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   24 ++++++++++++++++++++----
 drivers/net/bnx2.h |    9 ++++++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index deb7f83..d44ecc3 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -864,7 +864,7 @@ bnx2_alloc_mem(struct bnx2 *bp)
 	bnapi->hw_rx_cons_ptr =
 		&bnapi->status_blk.msi->status_rx_quick_consumer_index0;
 	if (bp->flags & BNX2_FLAG_MSIX_CAP) {
-		for (i = 1; i < BNX2_MAX_MSIX_VEC; i++) {
+		for (i = 1; i < bp->irq_nvecs; i++) {
 			struct status_block_msix *sblk;
 
 			bnapi = &bp->bnx2_napi[i];
@@ -6152,7 +6152,7 @@ bnx2_free_irq(struct bnx2 *bp)
 static void
 bnx2_enable_msix(struct bnx2 *bp, int msix_vecs)
 {
-	int i, rc;
+	int i, total_vecs, rc;
 	struct msix_entry msix_ent[BNX2_MAX_MSIX_VEC];
 	struct net_device *dev = bp->dev;
 	const int len = sizeof(bp->irq_tbl[0].name);
@@ -6171,13 +6171,29 @@ bnx2_enable_msix(struct bnx2 *bp, int msix_vecs)
 		msix_ent[i].vector = 0;
 	}
 
-	rc = pci_enable_msix(bp->pdev, msix_ent, BNX2_MAX_MSIX_VEC);
+	total_vecs = msix_vecs;
+#ifdef BCM_CNIC
+	total_vecs++;
+#endif
+	rc = -ENOSPC;
+	while (total_vecs >= BNX2_MIN_MSIX_VEC) {
+		rc = pci_enable_msix(bp->pdev, msix_ent, total_vecs);
+		if (rc <= 0)
+			break;
+		if (rc > 0)
+			total_vecs = rc;
+	}
+
 	if (rc != 0)
 		return;
 
+	msix_vecs = total_vecs;
+#ifdef BCM_CNIC
+	msix_vecs--;
+#endif
 	bp->irq_nvecs = msix_vecs;
 	bp->flags |= BNX2_FLAG_USING_MSIX | BNX2_FLAG_ONE_SHOT_MSI;
-	for (i = 0; i < BNX2_MAX_MSIX_VEC; i++) {
+	for (i = 0; i < total_vecs; i++) {
 		bp->irq_tbl[i].vector = msix_ent[i].vector;
 		snprintf(bp->irq_tbl[i].name, len, "%s-%d", dev->name, i);
 		bp->irq_tbl[i].handler = bnx2_msi_1shot;
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index b9af6bc..2104c10 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6637,9 +6637,12 @@ struct flash_spec {
 
 #define BNX2_MAX_MSIX_HW_VEC	9
 #define BNX2_MAX_MSIX_VEC	9
-#define BNX2_BASE_VEC		0
-#define BNX2_TX_VEC		1
-#define BNX2_TX_INT_NUM	(BNX2_TX_VEC << BNX2_PCICFG_INT_ACK_CMD_INT_NUM_SHIFT)
+#ifdef BCM_CNIC
+#define BNX2_MIN_MSIX_VEC	2
+#else
+#define BNX2_MIN_MSIX_VEC	1
+#endif
+
 
 struct bnx2_irq {
 	irq_handler_t	handler;
-- 
1.6.4.GIT

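The retry loop in patch 2/4 can be exercised outside the kernel. The following C sketch is a hypothetical user-space model, not driver code: mock_enable_msix() stands in for pci_enable_msix() and assumes its contract of returning 0 on success, a negative errno on failure, or a positive count of vectors that could be allocated instead; the runtime cnic flag replaces the BCM_CNIC #ifdef so both configurations can be tested in one build.

```c
#include <errno.h>

/* Hypothetical stand-in for pci_enable_msix(): 0 if `requested` vectors
 * fit, otherwise the positive number still available, or -ENOSPC when
 * none are left. */
static int mock_enable_msix(int available, int requested)
{
	if (requested <= available)
		return 0;
	return available > 0 ? available : -ENOSPC;
}

/* Mirrors the loop in bnx2_enable_msix(); `cnic` replaces #ifdef BCM_CNIC.
 * Returns the vector count left for the network queues (bp->irq_nvecs in
 * the patch), or 0 if MSI-X could not be enabled. */
static int negotiate_msix(int available, int msix_vecs, int cnic)
{
	int total_vecs = msix_vecs + (cnic ? 1 : 0);
	int min_vecs = cnic ? 2 : 1;        /* like BNX2_MIN_MSIX_VEC */
	int rc = -ENOSPC;

	while (total_vecs >= min_vecs) {
		rc = mock_enable_msix(available, total_vecs);
		if (rc <= 0)
			break;              /* 0: success, < 0: hard failure */
		total_vecs = rc;            /* retry with what is available */
	}
	if (rc != 0)
		return 0;
	/* The extra vector, if any, belongs to cnic. */
	return cnic ? total_vecs - 1 : total_vecs;
}
```

For example, with 5 vectors available, 8 wanted and cnic enabled, the first request for 9 returns 5, the retry with 5 succeeds, and 4 vectors remain for the network queues. The sketch drops the patch's second `if (rc > 0)` test, since that line is only reached when rc is already positive.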



* [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
  2010-07-20  0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
@ 2010-07-20  0:15   ` Michael Chan
  2010-07-20  0:15     ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
                       ` (2 more replies)
  2010-07-20  3:31   ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
  1 sibling, 3 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20  0:15 UTC
  To: davem; +Cc: netdev

smp_mb() inside bnx2_tx_avail() is executed twice in the normal
bnx2_start_xmit() path (see illustration below).  The full memory
barrier is only needed to handle the race with tx completion.
We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
with a compiler barrier, which forces the compiler to re-read
tx_prod and tx_cons from memory.

In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
we have the following situation:

bnx2_start_xmit()                       bnx2_tx_int()
    if (!bnx2_tx_avail())
            BUG();

    ...

    if (!bnx2_tx_avail())
            netif_tx_stop_queue();          update_tx_index();
            smp_mb();                       smp_mb();
            if (bnx2_tx_avail())            if (netif_tx_queue_stopped() &&
                    netif_tx_wake_queue();      bnx2_tx_avail())

With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
bnx2_start_xmit() as shown above, so that netif_tx_stop_queue() is
ordered before bnx2_tx_avail() re-reads the ring index.  Without this
ordering, bnx2_start_xmit() could see a stale tx_cons (ring still full)
while bnx2_tx_int() sees the queue not yet stopped; neither side would
wake the queue, and it would stay stopped forever.

This improves performance by about 5% with 2 ports running bidirectional
64-byte packets.

Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index d44ecc3..2af570d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -253,7 +253,8 @@ static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct bnx2_tx_ring_info *txr)
 {
 	u32 diff;
 
-	smp_mb();
+	/* Tell compiler to fetch tx_prod and tx_cons from memory. */
+	barrier();
 
 	/* The ring uses 256 indices for 255 entries, one of them
 	 * needs to be skipped.
@@ -6534,6 +6535,13 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
 		netif_tx_stop_queue(txq);
+
+		/* netif_tx_stop_queue() must be done before checking
+		 * tx index in bnx2_tx_avail() below, because in
+		 * bnx2_tx_int(), we update tx index before checking for
+		 * netif_tx_queue_stopped().
+		 */
+		smp_mb();
 		if (bnx2_tx_avail(bp, txr) > bp->tx_wake_thresh)
 			netif_tx_wake_queue(txq);
 	}
-- 
1.6.4.GIT

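The stop/wake protocol in patch 3/4 is an instance of the store-buffering pattern: each side stores to one variable and then loads the other, and a full barrier on both sides guarantees that at least one of them observes the other's store. The C11 sketch below models it in user space under stated assumptions: two pthreads stand in for bnx2_start_xmit() and bnx2_tx_int(), atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(), atomic_signal_fence() plays the role of barrier(), and a mutex stands in for the tx queue lock that the completion path is assumed to take before waking the queue (the patch does not show that code). The ring size, thresholds and all names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical parameters: RING_SIZE ~ tx ring size, STOP_AT ~
 * MAX_SKB_FRAGS, WAKE_ABOVE ~ bp->tx_wake_thresh. */
#define RING_SIZE  8u
#define STOP_AT    1u
#define WAKE_ABOVE 4u
#define LOAD(x)     atomic_load_explicit(&(x), memory_order_relaxed)
#define STORE(x, v) atomic_store_explicit(&(x), (v), memory_order_relaxed)

struct txq {
	atomic_uint prod;     /* advanced only by the xmit thread */
	atomic_uint cons;     /* advanced only by the completion thread */
	atomic_bool stopped;  /* models netif_tx_stop_queue()/wake_queue() */
	pthread_mutex_t lock; /* models the tx queue lock */
	unsigned total;       /* packets to send in this run */
};

/* Like bnx2_tx_avail() after the patch: a compiler-only fence, no CPU fence. */
static unsigned tx_avail(struct txq *q)
{
	atomic_signal_fence(memory_order_seq_cst);
	unsigned prod = LOAD(q->prod), cons = LOAD(q->cons);
	return RING_SIZE - (prod - cons);
}

static void *xmit_thread(void *arg)         /* models bnx2_start_xmit() */
{
	struct txq *q = arg;
	unsigned sent = 0;

	while (sent < q->total) {
		pthread_mutex_lock(&q->lock);
		bool stopped = LOAD(q->stopped);
		if (!stopped) {
			atomic_fetch_add_explicit(&q->prod, 1, memory_order_relaxed);
			sent++;
			if (tx_avail(q) <= STOP_AT) {
				STORE(q->stopped, true);
				/* smp_mb(): order the stop before re-reading cons. */
				atomic_thread_fence(memory_order_seq_cst);
				if (tx_avail(q) > WAKE_ABOVE)
					STORE(q->stopped, false);
			}
		}
		pthread_mutex_unlock(&q->lock);
		if (stopped)
			sched_yield();      /* queue stopped: wait for a wakeup */
	}
	return NULL;
}

static void *completion_thread(void *arg)   /* models bnx2_tx_int() */
{
	struct txq *q = arg;
	unsigned done = 0;

	while (done < q->total) {
		if (LOAD(q->prod) == done) {
			sched_yield();      /* nothing in flight yet */
			continue;
		}
		STORE(q->cons, ++done);     /* update_tx_index() */
		/* smp_mb(): order the index update before testing stopped. */
		atomic_thread_fence(memory_order_seq_cst);
		if (LOAD(q->stopped) && tx_avail(q) > WAKE_ABOVE) {
			pthread_mutex_lock(&q->lock);   /* recheck under the lock */
			if (LOAD(q->stopped) && tx_avail(q) > WAKE_ABOVE)
				STORE(q->stopped, false);
			pthread_mutex_unlock(&q->lock);
		}
	}
	return NULL;
}

/* Runs both threads to completion; returns the final consumer index. */
unsigned run_stop_wake_demo(unsigned total)
{
	struct txq q = { .total = total };
	pthread_t tx, rx;

	atomic_init(&q.prod, 0u);
	atomic_init(&q.cons, 0u);
	atomic_init(&q.stopped, false);
	pthread_mutex_init(&q.lock, NULL);
	pthread_create(&tx, NULL, xmit_thread, &q);
	pthread_create(&rx, NULL, completion_thread, &q);
	pthread_join(tx, NULL);
	pthread_join(rx, NULL);
	pthread_mutex_destroy(&q.lock);
	return atomic_load(&q.cons);
}
```

If either seq_cst fence is removed, the model can in principle lose a wakeup on hardware that lets a store be reordered after a later load (x86 included) and hang with the queue stopped and the ring empty, which is the failure described in the commit message. The compiler-only fence in tx_avail() cannot prevent that on its own.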



* [PATCH net-next 4/4] bnx2: Update version to 2.0.17.
  2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
@ 2010-07-20  0:15     ` Michael Chan
  2010-07-20  3:31       ` David Miller
  2010-07-20  3:31     ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
  2010-07-20  5:38     ` Eric Dumazet
  2 siblings, 1 reply; 10+ messages in thread
From: Michael Chan @ 2010-07-20  0:15 UTC
  To: davem; +Cc: netdev

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 2af570d..e6a803f 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.16"
-#define DRV_MODULE_RELDATE	"July 2, 2010"
+#define DRV_MODULE_VERSION	"2.0.17"
+#define DRV_MODULE_RELDATE	"July 18, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j15.fw"
-- 
1.6.4.GIT




* Re: [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast.
  2010-07-20  0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
  2010-07-20  0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
@ 2010-07-20  3:31 ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20  3:31 UTC
  To: mchan; +Cc: netdev

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:02 -0700

> We were using the wrong tx multicast counter instead of the rx multicast
> counter.
> 
> Reported-by: Peter Snellman <peter.snellman@cinnober.com>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors.
  2010-07-20  0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
  2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
@ 2010-07-20  3:31   ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20  3:31 UTC
  To: mchan; +Cc: netdev, leitao

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:03 -0700

> Based on original patch by Breno Leitão <leitao@linux.vnet.ibm.com>.
> 
> Allocate the actual number of vectors and make use of fewer vectors
> if pci_enable_msix() returns > 0.  We must allocate one additional
> vector for the cnic driver.
> 
> Cc: Breno Leitão <leitao@linux.vnet.ibm.com>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
  2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
  2010-07-20  0:15     ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
@ 2010-07-20  3:31     ` David Miller
  2010-07-20  5:38     ` Eric Dumazet
  2 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20  3:31 UTC
  To: mchan; +Cc: netdev

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:04 -0700

> smp_mb() inside bnx2_tx_avail() is used twice in the normal
> bnx2_start_xmit() path (see illustration below).  The full memory
> barrier is only necessary during race conditions with tx completion.
> We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
> with a compiler barrier.  The compiler barrier is to force the
> compiler to fetch the tx_prod and tx_cons from memory.
> 
> In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
> we have the following situation:
> 
> bnx2_start_xmit()                       bnx2_tx_int()
>     if (!bnx2_tx_avail())
>             BUG();
> 
>     ...
> 
>     if (!bnx2_tx_avail())
>             netif_tx_stop_queue();          update_tx_index();
>             smp_mb();                       smp_mb();
>             if (bnx2_tx_avail())            if (netif_tx_queue_stopped() &&
>                     netif_tx_wake_queue();      bnx2_tx_avail())
> 
> With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
> bnx2_start_xmit() as shown above to properly order netif_tx_stop_queue()
> and bnx2_tx_avail() to check the ring index.  If it is not strictly
> ordered, the tx queue can be stopped forever.
> 
> This improves performance by about 5% with 2 ports running bi-directional
> 64-byte packets.
> 
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 4/4] bnx2: Update version to 2.0.17.
  2010-07-20  0:15     ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
@ 2010-07-20  3:31       ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20  3:31 UTC
  To: mchan; +Cc: netdev

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:05 -0700

> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
  2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
  2010-07-20  0:15     ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
  2010-07-20  3:31     ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
@ 2010-07-20  5:38     ` Eric Dumazet
  2010-07-20  5:48       ` Michael Chan
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2010-07-20  5:38 UTC
  To: Michael Chan; +Cc: davem, netdev

On Monday 19 July 2010 at 17:15 -0700, Michael Chan wrote:
> smp_mb() inside bnx2_tx_avail() is used twice in the normal
> bnx2_start_xmit() path (see illustration below).  The full memory
> barrier is only necessary during race conditions with tx completion.
> We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
> with a compiler barrier.  The compiler barrier is to force the
> compiler to fetch the tx_prod and tx_cons from memory.
> 
> In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
> we have the following situation:
> 
> bnx2_start_xmit()                       bnx2_tx_int()
>     if (!bnx2_tx_avail())
>             BUG();
> 
>     ...
> 
>     if (!bnx2_tx_avail())
>             netif_tx_stop_queue();          update_tx_index();
>             smp_mb();                       smp_mb();
>             if (bnx2_tx_avail())            if (netif_tx_queue_stopped() &&
>                     netif_tx_wake_queue();      bnx2_tx_avail())
> 
> With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
> bnx2_start_xmit() as shown above to properly order netif_tx_stop_queue()
> and bnx2_tx_avail() to check the ring index.  If it is not strictly
> ordered, the tx queue can be stopped forever.
> 
> This improves performance by about 5% with 2 ports running bi-directional
> 64-byte packets.
> 
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> ---
>  drivers/net/bnx2.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index d44ecc3..2af570d 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -253,7 +253,8 @@ static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct bnx2_tx_ring_info *txr)
>  {
>  	u32 diff;
>  
> -	smp_mb();
> +	/* Tell compiler to fetch tx_prod and tx_cons from memory. */
> +	barrier();
>  
>  	/* The ring uses 256 indices for 255 entries, one of them
>  	 * needs to be skipped.
> @@ -6534,6 +6535,13 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
>  		netif_tx_stop_queue(txq);
> +
> +		/* netif_tx_stop_queue() must be done before checking
> +		 * tx index in bnx2_tx_avail() below, because in
> +		 * bnx2_tx_int(), we update tx index before checking for
> +		 * netif_tx_queue_stopped().
> +		 */
> +		smp_mb();
>  		if (bnx2_tx_avail(bp, txr) > bp->tx_wake_thresh)
>  			netif_tx_wake_queue(txq);
>  	}

Excellent,

Is similar patch for tg3 planned ?

Thanks





* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
  2010-07-20  5:38     ` Eric Dumazet
@ 2010-07-20  5:48       ` Michael Chan
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20  5:48 UTC
  To: 'Eric Dumazet'; +Cc: davem@davemloft.net, netdev@vger.kernel.org

Eric Dumazet wrote:

> Excellent,
>
> Is similar patch for tg3 planned ?
>

Yes, this performance issue was actually first discovered when
profiling a new tg3 device.  So Matt should be sending out a
very similar patch as part of his next patch-set.



end of thread

Thread overview: 10+ messages
2010-07-20  0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
2010-07-20  0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
2010-07-20  0:15   ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
2010-07-20  0:15     ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
2010-07-20  3:31       ` David Miller
2010-07-20  3:31     ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
2010-07-20  5:38     ` Eric Dumazet
2010-07-20  5:48       ` Michael Chan
2010-07-20  3:31   ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
2010-07-20  3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller
