* [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast.
@ 2010-07-20 0:15 Michael Chan
2010-07-20 0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
2010-07-20 3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller
0 siblings, 2 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20 0:15 UTC
To: davem; +Cc: netdev
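net_stats->multicast was being filled from the tx multicast counter
(stat_IfHCOutMulticastPkts). It should report received multicast
packets, so use the rx counter (stat_IfHCInMulticastPkts) instead.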
Reported-by: Peter Snellman <peter.snellman@cinnober.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index ce3217b..deb7f83 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6631,7 +6631,7 @@ bnx2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *net_stats)
GET_64BIT_NET_STATS(stat_IfHCOutOctets);
net_stats->multicast =
- GET_64BIT_NET_STATS(stat_IfHCOutMulticastPkts);
+ GET_64BIT_NET_STATS(stat_IfHCInMulticastPkts);
net_stats->collisions =
GET_32BIT_NET_STATS(stat_EtherStatsCollisions);
--
1.6.4.GIT
* [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors.
2010-07-20 0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
@ 2010-07-20 0:15 ` Michael Chan
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
2010-07-20 3:31 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
2010-07-20 3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller
1 sibling, 2 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20 0:15 UTC
To: davem; +Cc: netdev, Breno Leitão
Based on original patch by Breno Leitão <leitao@linux.vnet.ibm.com>.
Allocate the actual number of vectors and make use of fewer vectors
if pci_enable_msix() returns > 0. We must allocate one additional
vector for the cnic driver.
Cc: Breno Leitão <leitao@linux.vnet.ibm.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 24 ++++++++++++++++++++----
drivers/net/bnx2.h | 9 ++++++---
2 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index deb7f83..d44ecc3 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -864,7 +864,7 @@ bnx2_alloc_mem(struct bnx2 *bp)
bnapi->hw_rx_cons_ptr =
&bnapi->status_blk.msi->status_rx_quick_consumer_index0;
if (bp->flags & BNX2_FLAG_MSIX_CAP) {
- for (i = 1; i < BNX2_MAX_MSIX_VEC; i++) {
+ for (i = 1; i < bp->irq_nvecs; i++) {
struct status_block_msix *sblk;
bnapi = &bp->bnx2_napi[i];
@@ -6152,7 +6152,7 @@ bnx2_free_irq(struct bnx2 *bp)
static void
bnx2_enable_msix(struct bnx2 *bp, int msix_vecs)
{
- int i, rc;
+ int i, total_vecs, rc;
struct msix_entry msix_ent[BNX2_MAX_MSIX_VEC];
struct net_device *dev = bp->dev;
const int len = sizeof(bp->irq_tbl[0].name);
@@ -6171,13 +6171,29 @@ bnx2_enable_msix(struct bnx2 *bp, int msix_vecs)
msix_ent[i].vector = 0;
}
- rc = pci_enable_msix(bp->pdev, msix_ent, BNX2_MAX_MSIX_VEC);
+ total_vecs = msix_vecs;
+#ifdef BCM_CNIC
+ total_vecs++;
+#endif
+ rc = -ENOSPC;
+ while (total_vecs >= BNX2_MIN_MSIX_VEC) {
+ rc = pci_enable_msix(bp->pdev, msix_ent, total_vecs);
+ if (rc <= 0)
+ break;
+ if (rc > 0)
+ total_vecs = rc;
+ }
+
if (rc != 0)
return;
+ msix_vecs = total_vecs;
+#ifdef BCM_CNIC
+ msix_vecs--;
+#endif
bp->irq_nvecs = msix_vecs;
bp->flags |= BNX2_FLAG_USING_MSIX | BNX2_FLAG_ONE_SHOT_MSI;
- for (i = 0; i < BNX2_MAX_MSIX_VEC; i++) {
+ for (i = 0; i < total_vecs; i++) {
bp->irq_tbl[i].vector = msix_ent[i].vector;
snprintf(bp->irq_tbl[i].name, len, "%s-%d", dev->name, i);
bp->irq_tbl[i].handler = bnx2_msi_1shot;
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index b9af6bc..2104c10 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6637,9 +6637,12 @@ struct flash_spec {
#define BNX2_MAX_MSIX_HW_VEC 9
#define BNX2_MAX_MSIX_VEC 9
-#define BNX2_BASE_VEC 0
-#define BNX2_TX_VEC 1
-#define BNX2_TX_INT_NUM (BNX2_TX_VEC << BNX2_PCICFG_INT_ACK_CMD_INT_NUM_SHIFT)
+#ifdef BCM_CNIC
+#define BNX2_MIN_MSIX_VEC 2
+#else
+#define BNX2_MIN_MSIX_VEC 1
+#endif
+
struct bnx2_irq {
irq_handler_t handler;
--
1.6.4.GIT
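A note on the retry loop in this patch: it relies on the return convention
pci_enable_msix() had at the time (0 on success, a negative errno on failure,
and a positive count when fewer vectors than requested could be allocated).
The following is a minimal userspace sketch of that negotiation, not driver
code. fake_enable_msix() and negotiate_vectors() are hypothetical names
introduced here for illustration, and the "available" parameter only
simulates what the platform is able to grant.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in that mimics the return convention described above:
 *   0   -> all requested vectors were allocated
 *   > 0 -> allocation failed, but this many vectors could be allocated
 *   < 0 -> hard failure
 */
static int fake_enable_msix(int available, int requested)
{
	if (available <= 0)
		return -ENOSPC;
	if (requested <= available)
		return 0;
	return available;
}

/*
 * Ask for "wanted" vectors and retry with the smaller count whenever the
 * stand-in reports a positive value. Give up once the count drops below
 * "min_vecs". Returns the number of vectors granted, or 0 if MSI-X could
 * not be enabled at all.
 */
static int negotiate_vectors(int available, int wanted, int min_vecs)
{
	int total = wanted;
	int rc = -ENOSPC;

	assert(min_vecs >= 1);

	while (total >= min_vecs) {
		rc = fake_enable_msix(available, total);
		if (rc <= 0)
			break;		/* success or hard failure */
		total = rc;		/* retry with what is available */
	}

	return rc == 0 ? total : 0;
}
```

With 4 vectors available, 9 requested and a minimum of 2, the first call
reports 4 and the retry succeeds, so the sketch settles on 4. With only 1
vector available and a minimum of 2, the loop gives up and returns 0, which
corresponds to the driver returning without enabling MSI-X.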
* [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
2010-07-20 0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
@ 2010-07-20 0:15 ` Michael Chan
2010-07-20 0:15 ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
` (2 more replies)
2010-07-20 3:31 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
1 sibling, 3 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20 0:15 UTC
To: davem; +Cc: netdev
smp_mb() inside bnx2_tx_avail() is used twice in the normal
bnx2_start_xmit() path (see illustration below). The full memory
barrier is only necessary during race conditions with tx completion.
We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
with a compiler barrier. The compiler barrier is to force the
compiler to fetch the tx_prod and tx_cons from memory.
In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
we have the following situation:
    bnx2_start_xmit()                   bnx2_tx_int()
        if (!bnx2_tx_avail())
            BUG();
        ...
        if (!bnx2_tx_avail())
            netif_tx_stop_queue();      update_tx_index();
            smp_mb();                   smp_mb();
            if (bnx2_tx_avail())        if (netif_tx_queue_stopped() &&
                netif_tx_wake_queue();      bnx2_tx_avail())
With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
bnx2_start_xmit() as shown above, so that netif_tx_stop_queue() is
ordered before the ring index check in bnx2_tx_avail(). Without this
strict ordering, the tx queue can be stopped forever.
This improves performance by about 5% with 2 ports running bi-directional
64-byte packets.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index d44ecc3..2af570d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -253,7 +253,8 @@ static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct bnx2_tx_ring_info *txr)
{
u32 diff;
- smp_mb();
+ /* Tell compiler to fetch tx_prod and tx_cons from memory. */
+ barrier();
/* The ring uses 256 indices for 255 entries, one of them
* needs to be skipped.
@@ -6534,6 +6535,13 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
netif_tx_stop_queue(txq);
+
+ /* netif_tx_stop_queue() must be done before checking
+ * tx index in bnx2_tx_avail() below, because in
+ * bnx2_tx_int(), we update tx index before checking for
+ * netif_tx_queue_stopped().
+ */
+ smp_mb();
if (bnx2_tx_avail(bp, txr) > bp->tx_wake_thresh)
netif_tx_wake_queue(txq);
}
--
1.6.4.GIT
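A note on the ordering argument in this patch: each side writes one location
and then reads the other (the queue-stopped flag and the ring index), so a
full barrier between the write and the read on both sides guarantees that at
least one side observes the other's update. The following is a minimal
userspace sketch of that pattern using C11 atomics, not driver code. The
toy_* names and the ring constants are hypothetical,
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), and the
sketch leaves out locking and everything else the driver does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_RING_SIZE	8u	/* hypothetical ring size */
#define TOY_MAX_FRAGS	2u	/* stop when this few slots remain */
#define TOY_WAKE_THRESH	4u	/* wake when more than this many are free */

struct toy_ring {
	atomic_uint prod;	/* advanced by the transmit path */
	atomic_uint cons;	/* advanced by the completion path */
	atomic_bool stopped;	/* models the queue-stopped state */
};

static void toy_ring_init(struct toy_ring *r, unsigned int prod,
			  unsigned int cons)
{
	atomic_init(&r->prod, prod);
	atomic_init(&r->cons, cons);
	atomic_init(&r->stopped, false);
}

static bool toy_stopped(struct toy_ring *r)
{
	return atomic_load_explicit(&r->stopped, memory_order_relaxed);
}

/* Free slots. No barrier here; callers that race add their own fence. */
static unsigned int toy_avail(struct toy_ring *r)
{
	unsigned int prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	unsigned int cons = atomic_load_explicit(&r->cons, memory_order_relaxed);

	return TOY_RING_SIZE - (prod - cons);
}

/* Tail of the transmit path: stop the queue, fence, then re-check. */
static void toy_xmit_tail(struct toy_ring *r)
{
	if (toy_avail(r) <= TOY_MAX_FRAGS) {
		atomic_store_explicit(&r->stopped, true, memory_order_relaxed);
		/* Order the "stopped" store before the index re-read. */
		atomic_thread_fence(memory_order_seq_cst);
		if (toy_avail(r) > TOY_WAKE_THRESH)
			atomic_store_explicit(&r->stopped, false,
					      memory_order_relaxed);
	}
}

/* Completion path: publish the new index, fence, then check "stopped". */
static void toy_tx_int(struct toy_ring *r, unsigned int completed)
{
	atomic_fetch_add_explicit(&r->cons, completed, memory_order_relaxed);
	/* Order the index update before the "stopped" read. */
	atomic_thread_fence(memory_order_seq_cst);
	if (toy_stopped(r) && toy_avail(r) > TOY_WAKE_THRESH)
		atomic_store_explicit(&r->stopped, false, memory_order_relaxed);
}
```

If either fence is dropped, the transmit side can read a stale index while
the completion side reads a stale "not stopped" flag, and then neither side
wakes the queue. That is the stall the commit message describes.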
* [PATCH net-next 4/4] bnx2: Update version to 2.0.17.
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
@ 2010-07-20 0:15 ` Michael Chan
2010-07-20 3:31 ` David Miller
2010-07-20 3:31 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
2010-07-20 5:38 ` Eric Dumazet
2 siblings, 1 reply; 10+ messages in thread
From: Michael Chan @ 2010-07-20 0:15 UTC
To: davem; +Cc: netdev
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 2af570d..e6a803f 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
#include "bnx2_fw.h"
#define DRV_MODULE_NAME "bnx2"
-#define DRV_MODULE_VERSION "2.0.16"
-#define DRV_MODULE_RELDATE "July 2, 2010"
+#define DRV_MODULE_VERSION "2.0.17"
+#define DRV_MODULE_RELDATE "July 18, 2010"
#define FW_MIPS_FILE_06 "bnx2/bnx2-mips-06-5.0.0.j6.fw"
#define FW_RV2P_FILE_06 "bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
#define FW_MIPS_FILE_09 "bnx2/bnx2-mips-09-5.0.0.j15.fw"
--
1.6.4.GIT
* Re: [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast.
2010-07-20 0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
2010-07-20 0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
@ 2010-07-20 3:31 ` David Miller
1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20 3:31 UTC
To: mchan; +Cc: netdev
From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:02 -0700
> We were using the wrong tx multicast counter instead of the rx multicast
> counter.
>
> Reported-by: Peter Snellman <peter.snellman@cinnober.com>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied.
* Re: [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors.
2010-07-20 0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
@ 2010-07-20 3:31 ` David Miller
1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20 3:31 UTC
To: mchan; +Cc: netdev, leitao
From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:03 -0700
> Based on original patch by Breno Leitão <leitao@linux.vnet.ibm.com>.
>
> Allocate the actual number of vectors and make use of fewer vectors
> if pci_enable_msix() returns > 0. We must allocate one additional
> vector for the cnic driver.
>
> Cc: Breno Leitão <leitao@linux.vnet.ibm.com>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied.
* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
2010-07-20 0:15 ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
@ 2010-07-20 3:31 ` David Miller
2010-07-20 5:38 ` Eric Dumazet
2 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20 3:31 UTC
To: mchan; +Cc: netdev
From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:04 -0700
> smp_mb() inside bnx2_tx_avail() is used twice in the normal
> bnx2_start_xmit() path (see illustration below). The full memory
> barrier is only necessary during race conditions with tx completion.
> We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
> with a compiler barrier. The compiler barrier is to force the
> compiler to fetch the tx_prod and tx_cons from memory.
>
> In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
> we have the following situation:
>
>     bnx2_start_xmit()                   bnx2_tx_int()
>         if (!bnx2_tx_avail())
>             BUG();
>
>         ...
>
>         if (!bnx2_tx_avail())
>             netif_tx_stop_queue();      update_tx_index();
>             smp_mb();                   smp_mb();
>             if (bnx2_tx_avail())        if (netif_tx_queue_stopped() &&
>                 netif_tx_wake_queue();      bnx2_tx_avail())
>
> With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
> bnx2_start_xmit() as shown above to properly order netif_tx_stop_queue()
> and bnx2_tx_avail() to check the ring index. If it is not strictly
> ordered, the tx queue can be stopped forever.
>
> This improves performance by about 5% with 2 ports running bi-directional
> 64-byte packets.
>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied.
* Re: [PATCH net-next 4/4] bnx2: Update version to 2.0.17.
2010-07-20 0:15 ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
@ 2010-07-20 3:31 ` David Miller
0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2010-07-20 3:31 UTC
To: mchan; +Cc: netdev
From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 19 Jul 2010 17:15:05 -0700
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied.
* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
2010-07-20 0:15 ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
2010-07-20 3:31 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
@ 2010-07-20 5:38 ` Eric Dumazet
2010-07-20 5:48 ` Michael Chan
2 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2010-07-20 5:38 UTC
To: Michael Chan; +Cc: davem, netdev
On Monday, 19 July 2010 at 17:15 -0700, Michael Chan wrote:
> smp_mb() inside bnx2_tx_avail() is used twice in the normal
> bnx2_start_xmit() path (see illustration below). The full memory
> barrier is only necessary during race conditions with tx completion.
> We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
> with a compiler barrier. The compiler barrier is to force the
> compiler to fetch the tx_prod and tx_cons from memory.
>
> In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
> we have the following situation:
>
>     bnx2_start_xmit()                   bnx2_tx_int()
>         if (!bnx2_tx_avail())
>             BUG();
>
>         ...
>
>         if (!bnx2_tx_avail())
>             netif_tx_stop_queue();      update_tx_index();
>             smp_mb();                   smp_mb();
>             if (bnx2_tx_avail())        if (netif_tx_queue_stopped() &&
>                 netif_tx_wake_queue();      bnx2_tx_avail())
>
> With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
> bnx2_start_xmit() as shown above to properly order netif_tx_stop_queue()
> and bnx2_tx_avail() to check the ring index. If it is not strictly
> ordered, the tx queue can be stopped forever.
>
> This improves performance by about 5% with 2 ports running bi-directional
> 64-byte packets.
>
> Reviewed-by: Benjamin Li <benli@broadcom.com>
> Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> ---
> drivers/net/bnx2.c | 10 +++++++++-
> 1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index d44ecc3..2af570d 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -253,7 +253,8 @@ static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct bnx2_tx_ring_info *txr)
> {
> u32 diff;
>
> - smp_mb();
> + /* Tell compiler to fetch tx_prod and tx_cons from memory. */
> + barrier();
>
> /* The ring uses 256 indices for 255 entries, one of them
> * needs to be skipped.
> @@ -6534,6 +6535,13 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
>
> if (unlikely(bnx2_tx_avail(bp, txr) <= MAX_SKB_FRAGS)) {
> netif_tx_stop_queue(txq);
> +
> + /* netif_tx_stop_queue() must be done before checking
> + * tx index in bnx2_tx_avail() below, because in
> + * bnx2_tx_int(), we update tx index before checking for
> + * netif_tx_queue_stopped().
> + */
> + smp_mb();
> if (bnx2_tx_avail(bp, txr) > bp->tx_wake_thresh)
> netif_tx_wake_queue(txq);
> }
Excellent,
Is a similar patch planned for tg3?
Thanks
* Re: [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path.
2010-07-20 5:38 ` Eric Dumazet
@ 2010-07-20 5:48 ` Michael Chan
0 siblings, 0 replies; 10+ messages in thread
From: Michael Chan @ 2010-07-20 5:48 UTC
To: 'Eric Dumazet'; +Cc: davem@davemloft.net, netdev@vger.kernel.org
Eric Dumazet wrote:
> Excellent,
>
> Is a similar patch planned for tg3?
>
Yes, this performance issue was actually first discovered when
profiling a new tg3 device. So Matt should be sending out a
very similar patch as part of his next patch-set.
Thread overview: 10+ messages
2010-07-20 0:15 [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast Michael Chan
2010-07-20 0:15 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors Michael Chan
2010-07-20 0:15 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path Michael Chan
2010-07-20 0:15 ` [PATCH net-next 4/4] bnx2: Update version to 2.0.17 Michael Chan
2010-07-20 3:31 ` David Miller
2010-07-20 3:31 ` [PATCH net-next 3/4] bnx2: Remove some unnecessary smp_mb() in tx fast path David Miller
2010-07-20 5:38 ` Eric Dumazet
2010-07-20 5:48 ` Michael Chan
2010-07-20 3:31 ` [PATCH net-next 2/4] bnx2: Call pci_enable_msix() with actual number of vectors David Miller
2010-07-20 3:31 ` [PATCH net-next 1/4] bnx2: Use proper counter for net_device_stats->multicast David Miller