Netdev List


* Re: [PATCH] sky2: use GFP_KERNEL allocations at device setup
From: David Miller @ 2011-07-07 13:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, shemminger
In-Reply-To: <1310044288.2127.12.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 07 Jul 2011 15:11:28 +0200

> In process and sleep allowed context, favor GFP_KERNEL allocations over
> GFP_ATOMIC ones.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH v2 net-next af-packet 1/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: David Miller @ 2011-07-07 13:11 UTC (permalink / raw)
  To: loke.chetan
  Cc: netdev, eric.dumazet, joe, bhutchings, shemminger, linux-kernel
In-Reply-To: <CAAsGZS4Y=-GSmP+2iYtd+V3RcE2+PpaStBzW7R7TDip7U-rdvA@mail.gmail.com>

From: chetan loke <loke.chetan@gmail.com>
Date: Thu, 7 Jul 2011 09:04:58 -0400

> On Thu, Jul 7, 2011 at 3:13 AM, David Miller <davem@davemloft.net> wrote:
> 
>> Get rid of __packed__, it's going to kill performance on RISC
>> platforms.  If you use __packed__, regardless of the actual alignment,
> 
> The performance boost has been achieved by amortizing the cost of
> static spin-wait/poll and not by shrinking the data-set.

Chetan, if you're implementing something for performance reasons,
getting rid of packed is non-negotiable.

We pass data structures between userspace and the kernel all the
time, and without __packed__.  We have mechanisms to ensure the
size of the individual data types, and we have mechanisms to make
sure 64-bit datums get aligned even on x86 (see "aligned_u64" and
friends).

Again, I can't seriously consider your patch if you keep the packed
attribute crap in there.
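
To illustrate the point about fixed-width types and aligned 64-bit
members, here is a minimal userspace sketch. It assumes GCC or Clang;
the struct and field names are hypothetical and are not taken from the
af-packet patch, and demo_aligned_u64 is only a stand-in for the
kernel's aligned_u64.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's aligned_u64 (assumes GCC/Clang). */
typedef uint64_t __attribute__((aligned(8))) demo_aligned_u64;

/* Hypothetical header: fixed-width fields, explicit padding, no packed. */
struct demo_hdr {
	uint32_t version;
	uint32_t offset_to_first_pkt;
	uint32_t num_pkts;
	uint32_t pad;               /* spelled out rather than left implicit */
	demo_aligned_u64 seq_num;   /* 8-byte aligned even on 32-bit x86 */
};

/* Same fields, packed: every member may now be unaligned. */
struct demo_hdr_packed {
	uint32_t version;
	uint32_t offset_to_first_pkt;
	uint32_t num_pkts;
	uint32_t pad;
	uint64_t seq_num;
} __attribute__((packed));

static_assert(sizeof(struct demo_hdr) == 24, "same size on every ABI");
static_assert(offsetof(struct demo_hdr, seq_num) == 16, "64-bit field aligned");
static_assert(sizeof(struct demo_hdr_packed) == sizeof(struct demo_hdr),
	      "packing saves no space for this layout");
static_assert(_Alignof(struct demo_hdr_packed) == 1,
	      "packed struct only guarantees byte alignment");

/* Aligned load: a single 64-bit access where the hardware allows it. */
static uint64_t demo_read_seq(const struct demo_hdr *h)
{
	return h->seq_num;
}

/* Same read through the packed type: on strict-alignment CPUs the
 * compiler may have to assemble the value from smaller loads. */
static uint64_t demo_read_seq_packed(const struct demo_hdr_packed *h)
{
	return h->seq_num;
}
```

Both layouts have the same size and field offsets here, so the packed
attribute buys nothing; it only lowers the alignment the compiler may
assume when generating the accesses.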

^ permalink raw reply

* [PATCH] sky2: use GFP_KERNEL allocations at device setup
From: Eric Dumazet @ 2011-07-07 13:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger

In process and sleep allowed context, favor GFP_KERNEL allocations over
GFP_ATOMIC ones.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
 drivers/net/sky2.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index e14b86e..c91513e 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1362,13 +1362,14 @@ static inline unsigned sky2_rx_pad(const struct sky2_hw *hw)
  * Allocate an skb for receiving. If the MTU is large enough
  * make the skb non-linear with a fragment list of pages.
  */
-static struct sk_buff *sky2_rx_alloc(struct sky2_port *sky2)
+static struct sk_buff *sky2_rx_alloc(struct sky2_port *sky2, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	int i;
 
-	skb = netdev_alloc_skb(sky2->netdev,
-			       sky2->rx_data_size + sky2_rx_pad(sky2->hw));
+	skb = __netdev_alloc_skb(sky2->netdev,
+				 sky2->rx_data_size + sky2_rx_pad(sky2->hw),
+				 gfp);
 	if (!skb)
 		goto nomem;
 
@@ -1386,7 +1387,7 @@ static struct sk_buff *sky2_rx_alloc(struct sky2_port *sky2)
 		skb_reserve(skb, NET_IP_ALIGN);
 
 	for (i = 0; i < sky2->rx_nfrags; i++) {
-		struct page *page = alloc_page(GFP_ATOMIC);
+		struct page *page = alloc_page(gfp);
 
 		if (!page)
 			goto free_partial;
@@ -1416,7 +1417,7 @@ static int sky2_alloc_rx_skbs(struct sky2_port *sky2)
 	for (i = 0; i < sky2->rx_pending; i++) {
 		struct rx_ring_info *re = sky2->rx_ring + i;
 
-		re->skb = sky2_rx_alloc(sky2);
+		re->skb = sky2_rx_alloc(sky2, GFP_KERNEL);
 		if (!re->skb)
 			return -ENOMEM;
 
@@ -2384,7 +2385,7 @@ static struct sk_buff *receive_new(struct sky2_port *sky2,
 	struct rx_ring_info nre;
 	unsigned hdr_space = sky2->rx_data_size;
 
-	nre.skb = sky2_rx_alloc(sky2);
+	nre.skb = sky2_rx_alloc(sky2, GFP_ATOMIC);
 	if (unlikely(!nre.skb))
 		goto nobuf;
 



^ permalink raw reply related

* Re: [PATCH v2 net-next af-packet 1/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: chetan loke @ 2011-07-07 13:04 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, eric.dumazet, joe, bhutchings, shemminger, linux-kernel
In-Reply-To: <20110707.001301.1054777374178479078.davem@davemloft.net>

On Thu, Jul 7, 2011 at 3:13 AM, David Miller <davem@davemloft.net> wrote:

> Get rid of __packed__, it's going to kill performance on RISC
> platforms.  If you use __packed__, regardless of the actual alignment,

The performance boost has been achieved by amortizing the cost of
static spin-wait/poll and not by shrinking the data-set.


> the compiler must assume that each part of the struct "might" be
> unaligned.  So on architectures such as sparc where alignment matters,
> a word is going to be accessed by a sequence of byte loads/stores.
>
Haven't worked with sparc so I didn't know. Thanks for the insight.
One also needs to analyze both the user and kernel components. The app
reads the header (hdr_size <<< blk_size) just once and then walks the
entire block. Apps operate on a local copy of the variable and not on
the header.

kernel components - almost everything is cached in kbdq_core. block is
updated while closing.

> Do not use packed unless absolutely enforced by a protocol or hardware
> data structure, it's evil.
>
Depends. Why not evaluate on a case-by-case basis? All I need to do is
pass this definition of the header around and only mandate how wide
the fields should be.
Once packed, I don't need to worry about padding on different
OSes/arches. All I care about is the offset to the first pkt and other
details. The block says: you provide me the offset to the first packet
and I will start walking the packets.

Another way to look at it: you pack something and then no padding is
needed (not the right example, because every pkt-header will be
byte-sequenced if packed, but you get the idea):
http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=13fcb7bd322164c67926ffe272846d4860196dc6
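
The "offset to the first packet, then walk the packets" idea described
above might look roughly like the following userspace sketch. The two
header layouts and the function name are hypothetical and only serve
the illustration; they are not the structures from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, for illustration only. */
struct demo_blk_hdr {
	uint32_t num_pkts;
	uint32_t offset_to_first_pkt;   /* from the start of the block */
};

struct demo_pkt_hdr {
	uint32_t len;                   /* payload length in bytes */
	uint32_t next_offset;           /* from this header to the next; 0 = last */
};

/*
 * Walk every packet in a block and return the total payload length.
 * Headers are copied out with memcpy, so the walk does not depend on
 * the alignment of the block buffer.
 */
static uint64_t demo_walk_block(const uint8_t *blk, size_t blk_size)
{
	struct demo_blk_hdr bh;
	struct demo_pkt_hdr ph;
	uint64_t total = 0;
	size_t off;
	uint32_t i;

	assert(blk_size >= sizeof(bh));
	memcpy(&bh, blk, sizeof(bh));   /* the block header is read just once */

	off = bh.offset_to_first_pkt;
	for (i = 0; i < bh.num_pkts; i++) {
		if (off > blk_size || blk_size - off < sizeof(ph))
			break;          /* malformed block: stop walking */
		memcpy(&ph, blk + off, sizeof(ph));
		total += ph.len;
		if (ph.next_offset == 0)
			break;          /* last packet in the block */
		off += ph.next_offset;
	}
	return total;
}
```

Only field widths and offsets matter to the walker, which is the
property being argued for; the sketch gets it from fixed-width types
rather than from the packed attribute.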


Chetan Loke

^ permalink raw reply

* Re: [PATCH 00/14] Swap-over-NBD without deadlocking v5
From: Christoph Hellwig @ 2011-07-07 12:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Neil Brown, Peter Zijlstra
In-Reply-To: <20110707094737.GG15285@suse.de>

On Thu, Jul 07, 2011 at 10:47:37AM +0100, Mel Gorman wrote:
> Additional complexity is required for swap-over-NFS but affects the
> core kernel far less than this series. I do not have a series prepared
> but from what's in a distro kernel, supporting NFS requires extending
> address_space_operations for swapfile activation/deactivation with
> some minor helpers and the bulk of the remaining complexity within
> NFS itself.

The biggest addition for swap over NFS is to add proper support for
a filesystem interface to do I/O on random kernel pages instead of
the current nasty bmap hack the swapfile code is using.  Splitting
that work from all the required VM infrastructure should make life
easier for everyone involved and allows merging it independently as
both bits have other use cases as well.


^ permalink raw reply

* Re: [PATCH 2/2] vhost: set log when updating used flags or avail event
From: Michael S. Tsirkin @ 2011-07-07 12:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm
In-Reply-To: <20110621100438.6777.20695.stgit@dhcp-91-7.nay.redhat.com.englab.nay.redhat.com>

Subject: vhost: used ring logging cleanup

remove extra log bit setting for used ring updates: it's no longer
necessary.  Also, use vhost_avail_event instead of duplicating offset
math.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

> We need set log when updating used flags and avail event. Otherwise guest may
> see stale values after migration and then do not exit or exit unexpectedly.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

OK but this means we set the log twice now.
Also, hardcoding the offset is not as nice as using
vhost_avail_event. So I think the below is needed
on top. Comments?

 drivers/vhost/vhost.c |   29 +++++++++--------------------
 1 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 540591b..c5f96ba 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -946,14 +946,16 @@ int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 
 static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 {
+	void __user *used;
 	if (put_user(vq->used_flags, &vq->used->flags) < 0)
 		return -EFAULT;
 	if (unlikely(vq->log_used)) {
 		/* Make sure the flag is seen before log. */
 		smp_wmb();
 		/* Log used flag write. */
-		log_write(vq->log_base,
-			  vq->log_addr + offsetof(struct vring_used, flags),
+		used = &vq->used->flags;
+		log_write(vq->log_base, vq->log_addr +
+			  (used - (void __user *)vq->used),
 			  sizeof vq->used->flags);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
@@ -966,13 +968,14 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 	if (put_user(vq->avail_idx, vhost_avail_event(vq)))
 		return -EFAULT;
 	if (unlikely(vq->log_used)) {
+		void __user *used;
 		/* Make sure the event is seen before log. */
 		smp_wmb();
 		/* Log avail event write */
-		log_write(vq->log_base,
-			  vq->log_addr + offsetof(struct vring_used,
-						  ring[vq->num]),
-			  sizeof avail_event);
+		used = vhost_avail_event(vq);
+		log_write(vq->log_base, vq->log_addr +
+			  (used - (void __user *)vq->used),
+			  sizeof *vhost_avail_event(vq));
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -1474,20 +1477,6 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 			return false;
 		}
 	}
-	if (unlikely(vq->log_used)) {
-		void __user *used;
-		/* Make sure data is seen before log. */
-		smp_wmb();
-		used = vhost_has_feature(dev, VIRTIO_RING_F_EVENT_IDX) ?
-			&vq->used->flags : vhost_avail_event(vq);
-		/* Log used flags or event index entry write. Both are 16 bit
-		 * fields. */
-		log_write(vq->log_base, vq->log_addr +
-			   (used - (void __user *)vq->used),
-			  sizeof(u16));
-		if (vq->log_ctx)
-			eventfd_signal(vq->log_ctx, 1);
-	}
 	/* They could have slipped one in as we were doing that: make
 	 * sure it's written, then check again. */
 	smp_mb();
-- 
1.7.5.53.gc233e
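
The offset arithmetic used in the patch above can be checked in
userspace: subtracting the base address from a member's address gives
the same byte offset as offsetof(). This is a sketch under
assumptions; the demo_* types are simplified stand-ins for the vring
structures, not the kernel definitions, and char pointers replace the
GNU void-pointer arithmetic used in the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the used-ring layout (assumption). */
struct demo_used_elem {
	uint32_t id;
	uint32_t len;
};

struct demo_vring_used {
	uint16_t flags;
	uint16_t idx;
	struct demo_used_elem ring[];   /* avail event sits at ring[num] */
};

/* Byte offset of any address inside the used ring, relative to its start. */
static size_t demo_used_offset(const struct demo_vring_used *used,
			       const void *field)
{
	return (size_t)((const char *)field - (const char *)used);
}

/* Address of the avail event for a ring of 'num' entries. */
static const uint16_t *demo_avail_event(const struct demo_vring_used *used,
					unsigned int num)
{
	return (const uint16_t *)&used->ring[num];
}
```

With this shape a single helper yields the log offset for both the
flags field and the avail event, so the offset math is not duplicated
at each call site.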

^ permalink raw reply related

* Re: Getting the correct asix AX88178 usb gige driver in mainline?
From: Arnd Bergmann @ 2011-07-07 12:55 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: netdev, greg
In-Reply-To: <20110706210857.GH22090@merlins.org>

On Wednesday 06 July 2011, Marc MERLIN wrote:
> > The patch I mentioned was merged back in 2006, for 2.6.19. Either that
> > patch was never complete and is missing support for your hardware, or
> > it broke since then. You should probably try an old kernel to see if it's
> > actually a regression.
> 
> Thanks for the details Arnd, I'll see if I can boot 2.6.19 on that laptop
> and report back.

I would perhaps try something less ancient first, 2.6.27-longterm would be
a good candidate. If it was working at some point but broken later, chances
are that it wasn't broken until a few years ago, rather than shortly after
it was merged.

	Arnd

^ permalink raw reply

* Re: [PATCH] ATM: Fix wrong usage of INIT_WORK
From: chas williams - CONTRACTOR @ 2011-07-07 12:29 UTC (permalink / raw)
  To: Wang Shaoyan; +Cc: David Miller, netdev, wangshaoyan.pt, linux-atm-general
In-Reply-To: <CANxBZFo_2P+hrjNDmvU0wai2CA7Ry3LJ9vk+pZ4ML1RR12LKTQ@mail.gmail.com>

On Thu, 7 Jul 2011 20:04:25 +0800
Wang Shaoyan <stufever@gmail.com> wrote:

> I just don't know whether the macro FILL_RX_POOLS_IN_BH is useful?

The macro doesn't seem to be useful in any way.  It can simply be
eliminated.

^ permalink raw reply

* Re: [PATCH] ATM: Fix wrong usage of INIT_WORK
From: David Miller @ 2011-07-07 12:10 UTC (permalink / raw)
  To: stufever; +Cc: netdev, wangshaoyan.pt, chas, linux-atm-general
In-Reply-To: <CANxBZFo_2P+hrjNDmvU0wai2CA7Ry3LJ9vk+pZ4ML1RR12LKTQ@mail.gmail.com>

From: Wang Shaoyan <stufever@gmail.com>
Date: Thu, 7 Jul 2011 20:04:25 +0800

> I just don't know whether the macro FILL_RX_POOLS_IN_BH is useful?

Yes, another option is to delete all of the code protected by
that macro altogether.

It obviously hasn't been build tested in a long time, if at all.

^ permalink raw reply

* Re: [PATCH net next] bnx2x: Add dcbnl notification
From: David Miller @ 2011-07-07 12:11 UTC (permalink / raw)
  To: shmulikr; +Cc: eilong, netdev
In-Reply-To: <1310046941.22577.20.camel@lb-tlvb-shmulik.il.broadcom.com>

From: "Shmulik Ravid" <shmulikr@broadcom.com>
Date: Thu, 7 Jul 2011 16:55:41 +0300

> This patch adds a dcbnl notification to the bnx2x. The notification is
> sent to user mode clients following a change in the dcb negotiated
> parameters as resolved by the embedded DCBX stack. 
> 
> Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* [PATCH net next] bnx2x: Add dcbnl notification
From: Shmulik Ravid @ 2011-07-07 13:55 UTC (permalink / raw)
  To: davem; +Cc: Eilon Greenstein, netdev

This patch adds a dcbnl notification to the bnx2x. The notification is
sent to user mode clients following a change in the dcb negotiated
parameters as resolved by the embedded DCBX stack. 

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_dcb.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_dcb.c b/drivers/net/bnx2x/bnx2x_dcb.c
index b51a759..45cf3ce 100644
--- a/drivers/net/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/bnx2x/bnx2x_dcb.c
@@ -19,14 +19,14 @@
 #include <linux/netdevice.h>
 #include <linux/types.h>
 #include <linux/errno.h>
-#ifdef BCM_DCBNL
-#include <linux/dcbnl.h>
-#endif
 
 #include "bnx2x.h"
 #include "bnx2x_cmn.h"
 #include "bnx2x_dcb.h"
 
+#ifdef BCM_DCBNL
+#include <linux/rtnetlink.h>
+#endif
 
 /* forward declarations of dcbx related functions */
 static void bnx2x_dcbx_stop_hw_tx(struct bnx2x *bp);
@@ -702,6 +702,12 @@ void bnx2x_dcbx_set_params(struct bnx2x *bp, u32 state)
 	case BNX2X_DCBX_STATE_TX_RELEASED:
 		DP(NETIF_MSG_LINK, "BNX2X_DCBX_STATE_TX_RELEASED\n");
 		bnx2x_fw_command(bp, DRV_MSG_CODE_DCBX_PMF_DRV_OK, 0);
+#ifdef BCM_DCBNL
+		/**
+		 * Send a notification for the new negotiated parameters
+		 */
+		dcbnl_cee_notify(bp->dev, RTM_GETDCB, DCB_CMD_CEE_GET, 0, 0);
+#endif
 		return;
 	default:
 		BNX2X_ERR("Unknown DCBX_STATE\n");
-- 
1.7.3.5





^ permalink raw reply related

* Re: [PATCH] ATM: Fix wrong usage of INIT_WORK
From: Wang Shaoyan @ 2011-07-07 12:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, wangshaoyan.pt, chas, linux-atm-general
In-Reply-To: <20110707.045159.735589070219101177.davem@davemloft.net>

I just don't know whether the macro FILL_RX_POOLS_IN_BH is useful?

2011/7/7 David Miller <davem@davemloft.net>:

>
> This just makes the driver a bigger CPP mess.
>
> Unconditionally provide the dev->bh member, and unconditionally pass
> it into the worker function.
>
>



-- 
Wang Shaoyan

^ permalink raw reply

* Re: [PATCH] ATM: Fix wrong usage of INIT_WORK
From: David Miller @ 2011-07-07 11:51 UTC (permalink / raw)
  To: stufever; +Cc: netdev, wangshaoyan.pt, chas, linux-atm-general
In-Reply-To: <1310039512-17579-1-git-send-email-wangshaoyan.pt@taobao.com>

From: stufever@gmail.com
Date: Thu,  7 Jul 2011 19:51:52 +0800

> From: Wang Shaoyan <wangshaoyan.pt@taobao.com>
> 
> If we define FILL_RX_POOLS_IN_BH, the compiler reports an error such as
>   drivers/atm/ambassador.c:2159:64: error: macro "INIT_WORK" passed 3 arguments, but takes just 2
> because the INIT_WORK() macro no longer accepts a "data" argument; it
> takes only two arguments, so initialise the work queue the right way.
> 
> Cc: Chas Williams <chas@cmf.nrl.navy.mil> (maintainer:ATM)
> Cc: linux-atm-general@lists.sourceforge.net (open list:ATM)
> Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>

This just makes the driver a bigger CPP mess.

Unconditionally provide the dev->bh member, and unconditionally pass
it into the worker function.
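
A rough userspace sketch of that suggestion: the work item is always
embedded in the device, and the worker recovers the device from the
work pointer. The demo_* names are hypothetical stand-ins for
struct work_struct, amb_dev, container_of() and INIT_WORK(); this is
not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the idea behind the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for struct work_struct: just the callback pointer. */
struct demo_work {
	void (*func)(struct demo_work *work);
};

/* Hypothetical device; 'bh' is always present, with no #ifdef around it. */
struct demo_dev {
	int rx_pool_fills;
	struct demo_work bh;
};

/* Worker: takes the work item and recovers the enclosing device. */
static void demo_fill_rx_pools(struct demo_work *work)
{
	struct demo_dev *dev = demo_container_of(work, struct demo_dev, bh);

	dev->rx_pool_fills++;
}

/* Two-argument initialiser, mirroring the shape of INIT_WORK(work, func). */
static void demo_init_work(struct demo_work *work,
			   void (*func)(struct demo_work *work))
{
	assert(func != NULL);
	work->func = func;
}
```

Because the worker always takes the work pointer, direct callers pass
&dev->bh as well and the function needs only one prototype.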


^ permalink raw reply

* Re: [PATCH] lib/checksum.c: optimize do_csum a bit
From: David Miller @ 2011-07-07 11:52 UTC (permalink / raw)
  To: abbotti; +Cc: ian.abbott, netdev, arnd, linux-kernel
In-Reply-To: <20110707.043625.856190521365086318.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 07 Jul 2011 04:36:25 -0700 (PDT)

> From: Ian Abbott <abbotti@mev.co.uk>
> Date: Thu, 7 Jul 2011 12:32:45 +0100
> 
>> On 07/07/11 12:29, David Miller wrote:
>>> From: Ian Abbott <abbotti@mev.co.uk>
>>> Date: Thu, 7 Jul 2011 12:18:49 +0100
>>> 
>>>> Reduce the number of variables modified by the loop in do_csum() by 1,
>>>> which seems like a good idea.  On Nios II (a RISC CPU with 3-operand
>>>> instruction set) it reduces the loop from 7 to 6 instructions, including
>>>> the conditional branch.
>>>>
>>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>>> 
>>> I think you'll overshoot past the end of the buffer when there are
>>> trailing bytes to handle.
>>> 
>>> The whole reason we need the count variable is to handle those
>>> kinds of cases.
>> 
>> I don't think it does.  That's what the & ~3 was for.
> 
> Aha, yes that indeed makes it work.

I've applied this to net-next-2.6, thanks.

^ permalink raw reply

* Re: [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: David Miller @ 2011-07-07 11:49 UTC (permalink / raw)
  To: mst; +Cc: mashirle, netdev, kvm, linux-kernel
In-Reply-To: <20110707113715.GA32632@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 7 Jul 2011 14:37:15 +0300

> Apply patches 1-3 for now?

Done, and I fixed the use-after-free in patch #2.


^ permalink raw reply

* [PATCH] ATM: Fix wrong usage of INIT_WORK
From: stufever @ 2011-07-07 11:51 UTC (permalink / raw)
  To: netdev; +Cc: Wang Shaoyan, Chas Williams, open list:ATM

From: Wang Shaoyan <wangshaoyan.pt@taobao.com>

If we define FILL_RX_POOLS_IN_BH, the compiler reports an error such as
  drivers/atm/ambassador.c:2159:64: error: macro "INIT_WORK" passed 3 arguments, but takes just 2
because the INIT_WORK() macro no longer accepts a "data" argument; it
takes only two arguments, so initialise the work queue the right way.

Cc: Chas Williams <chas@cmf.nrl.navy.mil> (maintainer:ATM)
Cc: linux-atm-general@lists.sourceforge.net (open list:ATM)
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 drivers/atm/ambassador.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c
index a5fcb1e..3618c5c 100644
--- a/drivers/atm/ambassador.c
+++ b/drivers/atm/ambassador.c
@@ -814,7 +814,12 @@ static void fill_rx_pool (amb_dev * dev, unsigned char pool,
 }
 
 // top up all RX pools (can also be called as a bottom half)
+#ifdef FILL_RX_POOLS_IN_BH
+static void fill_rx_pools (struct work_struct * work) {
+  amb_dev * dev = container_of(work, amb_dev, bh);
+#else
 static void fill_rx_pools (amb_dev * dev) {
+#endif
   unsigned char pool;
   
   PRINTD (DBG_FLOW|DBG_POOL, "fill_rx_pools %p", dev);
@@ -1503,7 +1508,11 @@ static void do_housekeeping (unsigned long arg) {
   // could collect device-specific (not driver/atm-linux) stats here
       
   // last resort refill once every ten seconds
+#ifdef FILL_RX_POOLS_IN_BH
+  fill_rx_pools (&dev->bh);
+#else
   fill_rx_pools (dev);
+#endif
   mod_timer(&dev->housekeeping, jiffies + 10*HZ);
   
   return;
@@ -2156,7 +2165,7 @@ static void setup_dev(amb_dev *dev, struct pci_dev *pci_dev)
       
 #ifdef FILL_RX_POOLS_IN_BH
       // initialise bottom half
-      INIT_WORK(&dev->bh, (void (*)(void *)) fill_rx_pools, dev);
+      INIT_WORK(&dev->bh, fill_rx_pools);
 #endif
       
       // semaphore for txer/rxer modifications - we cannot use a
-- 
1.7.4.1


^ permalink raw reply related

* Re: [PATCH V8 0/4 net-next] macvtap/vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-07-07 11:37 UTC (permalink / raw)
  To: David Miller; +Cc: mashirle, netdev, kvm, linux-kernel
In-Reply-To: <20110707.040840.809661052978541081.davem@davemloft.net>

On Thu, Jul 07, 2011 at 04:08:40AM -0700, David Miller wrote:
> From: Shirley Ma <mashirle@us.ibm.com>
> Date: Wed, 06 Jul 2011 15:15:25 -0700
> 
> > This patchset adds support for TX zero-copy between guest and host
> > kernel through vhost. It significantly reduces CPU utilization on the
> > local host on which the guest is located (It reduced about 50% CPU usage
> > for single stream test on the host, while 4K message size BW has
> > increased about 50%). The patchset is based on previous submission and
> > comments from the community regarding when/how to handle guest kernel
> > buffers to be released. This is the simplest approach I can think of
> > after comparing with several other solutions.
> > 
> > This patchset has integrated V3 review comments from community: 
> 
> I'm personally fine with this patch set.  Unless there are others
> who object, please fix the use-after-free bug I reported, respin
> the patch set, and I'll apply it.
> 
> Thanks.

There's the FIXME in patch 4 where it spins in vhost waiting for
the pages to get freed. I'm fixing that up as Shirley's on vacation.

Apply patches 1-3 for now?

-- 
MST

^ permalink raw reply

* Re: [PATCH] lib/checksum.c: optimize do_csum a bit
From: David Miller @ 2011-07-07 11:36 UTC (permalink / raw)
  To: abbotti; +Cc: ian.abbott, netdev, arnd, linux-kernel
In-Reply-To: <4E15995D.6040405@mev.co.uk>

From: Ian Abbott <abbotti@mev.co.uk>
Date: Thu, 7 Jul 2011 12:32:45 +0100

> On 07/07/11 12:29, David Miller wrote:
>> From: Ian Abbott <abbotti@mev.co.uk>
>> Date: Thu, 7 Jul 2011 12:18:49 +0100
>> 
>>> Reduce the number of variables modified by the loop in do_csum() by 1,
>>> which seems like a good idea.  On Nios II (a RISC CPU with 3-operand
>>> instruction set) it reduces the loop from 7 to 6 instructions, including
>>> the conditional branch.
>>>
>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>> 
>> I think you'll overshoot past the end of the buffer when there are
>> trailing bytes to handle.
>> 
>> The whole reason we need the count variable is to handle those
>> kinds of cases.
> 
> I don't think it does.  That's what the & ~3 was for.

Aha, yes that indeed makes it work.

^ permalink raw reply

* Re: [PATCH] lib/checksum.c: optimize do_csum a bit
From: Ian Abbott @ 2011-07-07 11:32 UTC (permalink / raw)
  To: David Miller
  Cc: Ian Abbott, netdev@vger.kernel.org, arnd@arndb.de,
	linux-kernel@vger.kernel.org
In-Reply-To: <20110707.042925.1609726410655229616.davem@davemloft.net>

On 07/07/11 12:29, David Miller wrote:
> From: Ian Abbott <abbotti@mev.co.uk>
> Date: Thu, 7 Jul 2011 12:18:49 +0100
> 
>> Reduce the number of variables modified by the loop in do_csum() by 1,
>> which seems like a good idea.  On Nios II (a RISC CPU with 3-operand
>> instruction set) it reduces the loop from 7 to 6 instructions, including
>> the conditional branch.
>>
>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
> 
> I think you'll overshoot past the end of the buffer when there are
> trailing bytes to handle.
> 
> The whole reason we need the count variable is to handle those
> kinds of cases.

I don't think it does.  That's what the & ~3 was for.
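
For readers following along, the role of "& ~3" can be sketched in
userspace as below. This is a simplified illustration and not the
kernel's do_csum(): it adds up whole 32-bit words against an end
pointer and then handles the trailing bytes separately. demo_sum is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified sketch: add up a buffer as 32-bit words, then add any
 * trailing bytes one at a time.
 */
static uint64_t demo_sum(const unsigned char *buff, size_t len)
{
	const unsigned char *end, *tail_end;
	uint64_t sum = 0;

	assert(buff != NULL);

	/* len & ~3 rounds down to a multiple of 4, so 'end' never passes
	 * the last whole word and the word loop cannot overshoot. */
	end = buff + (len & ~(size_t)3);
	tail_end = buff + len;

	while (buff < end) {            /* only 'buff' changes per iteration */
		uint32_t w;

		memcpy(&w, buff, sizeof(w));
		sum += w;
		buff += 4;
	}
	while (buff < tail_end)         /* 0 to 3 trailing bytes */
		sum += *buff++;

	return sum;
}
```

The word loop compares the moving pointer against a precomputed end
instead of decrementing a separate count, which is what removes one
modified variable from the loop.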

-- 
-=( Ian Abbott @ MEV Ltd.    E-mail: <abbotti@mev.co.uk>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2011-07-07 11:32 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) iwl_tx_queue_reset() doesn't clear out the right number of slots,
   fix from Emmanuel Grumbach.

2) DMA buffer leak in iwlagn, fix from Johannes Berg.

3) iwlwifi command buffers need to be DMA mapped as "bidirectional" as
   the chip can write back to them sometimes, fix from Johannes Berg.

4) Fix OOPS in mac80211 on MIC failure, from Arik Nemtsov.

5) Fix lost power up during resume in ath9k, from Rajkumar Manoharan.

6) Memory leak in ath5k, fix from Bob Copeland.

7) 6pack/mkiss need to use BH locking, fix from Arnd Bergmann.

8) Get rid of artificial ipv6 routing table size limits, this is
   hitting real people now.

9) Don't send ICMP errors on local socket created MTU events.  Fix
   from Steffen Klassert.

10) Fix mailbox execution crash in qlge, from Jitendra Kalsaria.

11) Fix mixup between "spin_lock_irqsave()" flags and flags
    passed to request_irq in bna driver, fix from Shyam Iyer.

12) vlan_features not set properly in 8139too, from Shan Wei.

13) Fix DMA unmap length in natsemi, from FUJITA Tomonori.

14) Use a more POSIX'ly correct error return when the wrong address
    family is used during ipv4 bind().  From Marcus Meissner.

15) Fix MAC address setting in greth driver, from Kristoffer Glembo.

16) Regression fix, have to always flood broadcasts on bridge.  Fix
    from Herbert Xu.

17) Global memory limits of TCP/UDP/SCTP are miscalculated when lots
    of hugepages are reserved.  Fix from Eric Dumazet.

18) SCTP_SENDER_DRY_EVENT are not always sent when they should be, fix
    from Wei Yongjun.

Please pull, thanks a lot!

The following changes since commit 4dd1b49c6d215dc41ce50c80b4868388b93f31a3:

  Merge branch 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6 (2011-07-06 18:36:53 -0700)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Arik Nemtsov (1):
      mac80211: fix rx->key NULL dereference during mic failure

Arnd Bergmann (1):
      6pack,mkiss: fix lock inconsistency

David S. Miller (3):
      Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless-2.6
      ipv6: Don't change dst->flags using assignments.
      ipv6: Don't put artificial limit on routing table size.

Emmanuel Grumbach (1):
      iwlagn: Fix a bug introduced by the HUGE command removal

Eric Dumazet (1):
      net: refine {udp|tcp|sctp}_mem limits

Eugene A. Shatokhin (1):
      ath5k: fix memory leak when fewer than N_PD_CURVES are in use

Evgeni Golov (1):
      iwlagn: fix *_UCODE_API_MAX output in the firmware field

FUJITA Tomonori (1):
      natsemi: silence dma-debug warnings

Herbert Xu (1):
      bridge: Always flood broadcast packets

Jitendra Kalsaria (3):
      qlge:Fix crash caused by mailbox execution on wedged chip.
      qlge: Fix printk priority so chip fatal errors are always reported.
      qlge:Version change to v1.00.00.29

Johannes Berg (3):
      iwlagn: fix change_interface for P2P types
      iwlagn: fix cmd queue unmap
      iwlagn: map command buffers BIDI

John W. Linville (3):
      Merge branch 'wireless-2.6' of git://git.kernel.org/.../iwlwifi/iwlwifi-2.6
      iwlagn: use PCI_DMA_* for pci_* operations
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6 into for-davem

Kristoffer Glembo (1):
      greth: greth_set_mac_add would corrupt the MAC address.

Marcus Meissner (1):
      net: bind() fix error return on wrong address family

Rajkumar Manoharan (1):
      ath9k: Fix suspend/resume when no interface is UP

Shan Wei (2):
      net: 8139too: Initial necessary vlan_features to support vlan
      net: vlan: enable soft features regardless of underlying device

Shreyas Bhatewara (2):
      vmxnet3: fix starving rx ring whenoc_skb kb fails
      vmxnet3: round down # of queues to power of two

Shyam Iyer (1):
      Fix call trace when interrupts are disabled while sleeping function kzalloc is called

Steffen Klassert (3):
      xfrm: Remove family arg from xfrm_bundle_ok
      ipv4: Don't use ufo handling on later transformed packets
      xfrm4: Don't call icmp_send on local error

Wei Yongjun (1):
      sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe it

Yoshihiro Shimoda (2):
      net: sh_eth: fix cannot work half-duplex mode
      net: sh_eth: fix the parameter for the ETHER of SH7757

 drivers/net/8139too.c                   |    1 +
 drivers/net/bna/bnad.c                  |    7 +-
 drivers/net/greth.c                     |    7 +-
 drivers/net/hamradio/6pack.c            |    4 +-
 drivers/net/hamradio/mkiss.c            |    4 +-
 drivers/net/natsemi.c                   |    3 +-
 drivers/net/qlge/qlge.h                 |    3 +-
 drivers/net/qlge/qlge_main.c            |   42 ++++++----
 drivers/net/sh_eth.c                    |    6 +-
 drivers/net/vmxnet3/vmxnet3_drv.c       |  138 ++++++++++++++++++++++---------
 drivers/net/vmxnet3/vmxnet3_int.h       |    5 +-
 drivers/net/wireless/ath/ath5k/eeprom.c |    8 +-
 drivers/net/wireless/ath/ath9k/pci.c    |    6 ++
 drivers/net/wireless/iwlwifi/iwl-1000.c |    5 +-
 drivers/net/wireless/iwlwifi/iwl-2000.c |    7 +-
 drivers/net/wireless/iwlwifi/iwl-5000.c |    5 +-
 drivers/net/wireless/iwlwifi/iwl-6000.c |    9 +-
 drivers/net/wireless/iwlwifi/iwl-core.c |    3 +-
 drivers/net/wireless/iwlwifi/iwl-tx.c   |   25 ++----
 include/net/cfg80211.h                  |    2 +-
 include/net/dst.h                       |    1 +
 net/8021q/vlan_dev.c                    |    5 +
 net/bridge/br_device.c                  |    4 +-
 net/bridge/br_input.c                   |    6 +-
 net/core/dst.c                          |    6 +-
 net/ipv4/af_inet.c                      |    4 +-
 net/ipv4/ip_output.c                    |    2 +-
 net/ipv4/tcp.c                          |   10 +--
 net/ipv4/udp.c                          |   10 +--
 net/ipv4/xfrm4_output.c                 |    7 ++-
 net/ipv6/af_inet6.c                     |    2 +-
 net/ipv6/route.c                        |   25 ++----
 net/mac80211/wpa.c                      |    8 ++-
 net/sctp/protocol.c                     |   11 +--
 net/sctp/socket.c                       |   23 +++++
 net/wireless/nl80211.c                  |    3 +-
 net/xfrm/xfrm_policy.c                  |    6 +-
 37 files changed, 257 insertions(+), 166 deletions(-)

^ permalink raw reply

* Re: [PATCH] gianfar v2: code cleanups
From: David Miller @ 2011-07-07 11:30 UTC (permalink / raw)
  To: sebastian.belden; +Cc: netdev, sebastian.poehn
In-Reply-To: <1310037749.3410.5.camel@DENEC1DT0191>

From: "Sebastian Pöhn" <sebastian.belden@googlemail.com>
Date: Thu, 07 Jul 2011 13:22:29 +0200

> This patch:
> # improves readability of some gianfar nfc code
> 
>> Your email client has corrupted this patch, breaking up long
>> lines, etc.
> Sorry for that!
> 
> Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] lib/checksum.c: optimize do_csum a bit
From: David Miller @ 2011-07-07 11:29 UTC (permalink / raw)
  To: abbotti; +Cc: netdev, arnd, linux-kernel
In-Reply-To: <1310037529-30854-1-git-send-email-abbotti@mev.co.uk>

From: Ian Abbott <abbotti@mev.co.uk>
Date: Thu, 7 Jul 2011 12:18:49 +0100

> Reduce the number of variables modified by the loop in do_csum() by 1,
> which seems like a good idea.  On Nios II (a RISC CPU with 3-operand
> instruction set) it reduces the loop from 7 to 6 instructions, including
> the conditional branch.
> 
> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>

I think you'll overshoot past the end of the buffer when there are
trailing bytes to handle.

The whole reason we need the count variable is to handle those
kinds of cases.

^ permalink raw reply

* [PATCH] gianfar v2: code cleanups
From: Sebastian Pöhn @ 2011-07-07 11:22 UTC (permalink / raw)
  To: Linux Netdev; +Cc: Sebastian Pöhn

This patch:
# improves readability of some gianfar nfc code

> Your email client has corrupted this patch, breaking up long
> lines, etc.
Sorry for that!

Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
---

 drivers/net/gianfar.h         |    1 +
 drivers/net/gianfar_ethtool.c |   52 ++++++++++++++++++++++-------------------
 2 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 76f14d0..27499c6 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -409,6 +409,7 @@ extern const char gfar_driver_version[];
 #define RQFCR_HASHTBL_2		0x00060000
 #define RQFCR_HASHTBL_3		0x00080000
 #define RQFCR_HASH		0x00010000
+#define RQFCR_QUEUE		0x0000FC00
 #define RQFCR_CLE		0x00000200
 #define RQFCR_RJE		0x00000100
 #define RQFCR_AND		0x00000080
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 2ecdc9a..203369c 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -40,6 +40,7 @@
 #include <linux/mii.h>
 #include <linux/phy.h>
 #include <linux/sort.h>
+#include <linux/if_vlan.h>
 
 #include "gianfar.h"
 
@@ -883,7 +884,7 @@ static void gfar_set_attribute(u32 value, u32 mask, u32 flag,
 		struct filer_table *tab)
 {
 	switch (flag) {
-	/* 3bit */
+		/* 3bit */
 	case RQFCR_PID_PRI:
 		if (!(value | mask))
 			return;
@@ -1051,17 +1052,17 @@ static int gfar_convert_to_filer(struct ethtool_rx_flow_spec *rule,
 		vlan_mask = RQFPR_VLN;
 
 		/* Separate the fields */
-		id = rule->h_ext.vlan_tci & 0xFFF;
-		id_mask = rule->m_ext.vlan_tci & 0xFFF;
-		cfi = (rule->h_ext.vlan_tci >> 12) & 1;
-		cfi_mask = (rule->m_ext.vlan_tci >> 12) & 1;
-		prio = (rule->h_ext.vlan_tci >> 13) & 0x7;
-		prio_mask = (rule->m_ext.vlan_tci >> 13) & 0x7;
-
-		if (cfi == 1 && cfi_mask == 1) {
+		id = rule->h_ext.vlan_tci & VLAN_VID_MASK;
+		id_mask = rule->m_ext.vlan_tci & VLAN_VID_MASK;
+		cfi = rule->h_ext.vlan_tci & VLAN_CFI_MASK;
+		cfi_mask = rule->m_ext.vlan_tci & VLAN_CFI_MASK;
+		prio = (rule->h_ext.vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
+		prio_mask = (rule->m_ext.vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
+
+		if (cfi == VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
 			vlan |= RQFPR_CFI;
 			vlan_mask |= RQFPR_CFI;
-		} else if (cfi == 0 && cfi_mask == 1) {
+		} else if (cfi != VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
 			vlan_mask |= RQFPR_CFI;
 		}
 	}
@@ -1262,21 +1263,21 @@ static void gfar_cluster_filer(struct filer_table *tab)
 	}
 }
 
-/* Swaps the 0xFF80 masked bits of a1<>a2 and b1<>b2 */
-static void gfar_swap_ff80_bits(struct gfar_filer_entry *a1,
+/* Swaps the masked bits of a1<>a2 and b1<>b2 */
+static void gfar_swap_bits(struct gfar_filer_entry *a1,
 		struct gfar_filer_entry *a2, struct gfar_filer_entry *b1,
-		struct gfar_filer_entry *b2)
+		struct gfar_filer_entry *b2, u32 mask)
 {
 	u32 temp[4];
-	temp[0] = a1->ctrl & 0xFF80;
-	temp[1] = a2->ctrl & 0xFF80;
-	temp[2] = b1->ctrl & 0xFF80;
-	temp[3] = b2->ctrl & 0xFF80;
+	temp[0] = a1->ctrl & mask;
+	temp[1] = a2->ctrl & mask;
+	temp[2] = b1->ctrl & mask;
+	temp[3] = b2->ctrl & mask;
 
-	a1->ctrl &= ~0xFF80;
-	a2->ctrl &= ~0xFF80;
-	b1->ctrl &= ~0xFF80;
-	b2->ctrl &= ~0xFF80;
+	a1->ctrl &= ~mask;
+	a2->ctrl &= ~mask;
+	b1->ctrl &= ~mask;
+	b2->ctrl &= ~mask;
 
 	a1->ctrl |= temp[1];
 	a2->ctrl |= temp[0];
@@ -1305,7 +1306,7 @@ static u32 gfar_generate_mask_table(struct gfar_mask_entry *mask_table,
 				mask_table[and_index - 1].end = i - 1;
 			and_index++;
 		}
-		/* cluster starts will be separated because they should
+		/* cluster starts and ends will be separated because they should
 		 * hold their position */
 		if (tab->fe[i].ctrl & RQFCR_CLE)
 			block_index++;
@@ -1356,10 +1357,13 @@ static void gfar_sort_mask_table(struct gfar_mask_entry *mask_table,
 			new_first = mask_table[start].start + 1;
 			new_last = mask_table[i - 1].end;
 
-			gfar_swap_ff80_bits(&temp_table->fe[new_first],
+			gfar_swap_bits(&temp_table->fe[new_first],
 					&temp_table->fe[old_first],
 					&temp_table->fe[new_last],
-					&temp_table->fe[old_last]);
+					&temp_table->fe[old_last],
+					RQFCR_QUEUE | RQFCR_CLE |
+						RQFCR_RJE | RQFCR_AND
+					);
 
 			start = i;
 			size = 0;
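
The masked swap that gfar_swap_bits() performs on each pair can be sketched in isolation as follows. This is a standalone model under stated assumptions: it works on plain 32-bit values instead of struct gfar_filer_entry, swap_masked_bits() and GFAR_SWAP_MASK are hypothetical names, and the RQFCR_* values are copied from the gianfar.h hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the gianfar.h hunk of this patch. */
#define RQFCR_QUEUE	0x0000FC00u
#define RQFCR_CLE	0x00000200u
#define RQFCR_RJE	0x00000100u
#define RQFCR_AND	0x00000080u

/* Hypothetical name for the mask passed at the call site. */
#define GFAR_SWAP_MASK	(RQFCR_QUEUE | RQFCR_CLE | RQFCR_RJE | RQFCR_AND)

/* The named flags add up to the literal the old helper hard-coded. */
static_assert(GFAR_SWAP_MASK == 0xFF80u, "mask must match the old 0xFF80");

/* Exchange only the bits selected by mask between *a and *b;
 * all other bits keep their original values. */
static void swap_masked_bits(uint32_t *a, uint32_t *b, uint32_t mask)
{
	uint32_t from_a = *a & mask;
	uint32_t from_b = *b & mask;

	*a = (*a & ~mask) | from_b;
	*b = (*b & ~mask) | from_a;
}
```

Passing the mask as a parameter keeps the behaviour of the old 0xFF80 helper while making it visible at the call site which control fields are exchanged.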



^ permalink raw reply related

* RE: bnx2: FTQ dump on heavy workload(bnx2-2.0.23b + kernel 2.6.32.36)
From: MaoXiaoyun @ 2011-07-07 11:21 UTC (permalink / raw)
  To: mchan, netdev; +Cc: davidch
In-Reply-To: <C27F8246C663564A84BB7AB343977242667C64FA19@IRVEXCHCCR01.corp.ad.broadcom.com>


Hi:
 
I've added more registers to the dump.
I hope this helps. Thanks.
 
 
---[ end trace cdb56f702ac69a53 ]---
bnx2: <--- start FTQ dump on peth0 --->
bnx2: peth0: BNX2_RV2P_PFTQ_CTL 10000
bnx2: peth0: BNX2_RV2P_TFTQ_CTL 20000
bnx2: peth0: BNX2_RV2P_MFTQ_CTL 4000
bnx2: peth0: BNX2_TBDR_FTQ_CTL 1004002
bnx2: peth0: BNX2_TDMA_FTQ_CTL 4010002
bnx2: peth0: BNX2_TXP_FTQ_CTL 4010002
bnx2: peth0: BNX2_TPAT_FTQ_CTL 10002
bnx2: peth0: BNX2_RXP_CFTQ_CTL 8000
bnx2: peth0: BNX2_RXP_FTQ_CTL 100000
bnx2: peth0: BNX2_COM_COMXQ_FTQ_CTL 10000
bnx2: peth0: BNX2_COM_COMTQ_FTQ_CTL 20000
bnx2: peth0: BNX2_COM_COMQ_FTQ_CTL 10000
bnx2: peth0: BNX2_CP_CPQ_FTQ_CTL 4000
bnx2: peth0: TXP mode b84c state 80005000 evt_mask 500 pc 8000758 pc 8000758 instr 8fa90030
bnx2: peth0: TPAT mode b84c state 80001000 evt_mask 500 pc 8000a4c pc 8000a5c instr 10400016
bnx2: peth0: RXP mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c10 instr 32050003
bnx2: peth0: COM mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000a94 instr 8821
bnx2: peth0: CP mode b8cc state 80004000 evt_mask 500 pc 8000ca8 pc 8000920 instr 8f890014
bnx2: <--- end FTQ dump on peth0 --->
 

bnx2: <--- start extra FTQ dump on peth0 --->
bnx2: peth0: BNX2_TBDR_STATUS 42
bnx2: peth0: BNX2_TBDR_CONFIG 4000040
bnx2: peth0: BNX2_TBDR_DEBUG_VECT_PEEK 70007
bnx2: peth0: BNX2_TBDR_CKSUM_ERROR_STATUS 0
bnx2: peth0: BNX2_TBDR_TBDRQ 0
bnx2: peth0: BNX2_TDMA_STATUS 420000
bnx2: peth0: BNX2_TDMA_CONFIG 310088c
bnx2: peth0: BNX2_TDMA_PAYLOAD_PROD 9480
bnx2: peth0: BNX2_TDMA_DMAD_STATUS 0
bnx2: peth0: BNX2_TDMA_DR_INTF_FSM 0
bnx2: peth0: BNX2_TDMA_DR_INTF_STATUS 0
bnx2: peth0: BNX2_TDMA_TDMAQ 0
bnx2: peth0: BNX2_TXP_CPU_MODE b84c
bnx2: peth0: BNX2_TXP_CPU_STATE 80005000
bnx2: peth0: BNX2_TXP_CPU_EVENT_MASK 500
bnx2: peth0: BNX2_TXP_SCRATCH a00002a
bnx2: peth0: BNX2_TXP_CPU_REG_FILE 8007f6c
bnx2: peth0: BNX2_TXP_TXPQ 0
bnx2: peth0: BNX2_TPAT_CPU_MODE b84c
bnx2: peth0: BNX2_TPAT_CPU_STATE 80001000
bnx2: peth0: BNX2_TPAT_CPU_EVENT_MASK 500
bnx2: peth0: BNX2_TPAT_CPU_PROGRAM_COUNTER 8000a54
bnx2: peth0: BNX2_TPAT_TPATQ 0
bnx2: peth0: BNX2_TPAT_SCRATCH0
bnx2: <--- end extra FTQ dump on peth0 --->

bnx2: peth0 DEBUG: intr_sem[0]
bnx2: peth0 DEBUG: intr_sem[0] PCI_CMD[20100406]
bnx2: peth0 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
bnx2: peth0 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
bnx2: peth0 RPM_MGMT_PKT_CTRL[40000088]
bnx2: peth0 DEBUG: MCP_STATE_P0[0007e10e] MCP_STATE_P1[0003e00e]
bnx2: peth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01ff0000]
bnx2: peth0 DEBUG: PBA[00000000]
BNX2_PCICFG_INT_ACK_CMD[0401f8d4]
bnx2: peth0: prevent chip reset during tx timeout
last status idx 34554 
hw_tx_cons 60308, txr->hw_tx_conds 60308 txr->tx_prod 60429 txr->tx_cons 60301
hw_rx_cons 19366, txr->hw_rx_conds 19366
sblk->status_attn_bits 1
sblk->status_attn_bits_ack 1
bnx2_tx_avail 127 
sblk->status_tx_quick_consumer_index0 60308
sblk->status_tx_quick_consumer_index1 0
sblk->status_tx_quick_consumer_index2 0
sblk->status_tx_quick_consumer_index3 0
sblk->status_rx_quick_consumer_index0 19366
sblk->status_rx_quick_consumer_index1 0
sblk->status_rx_quick_consumer_index2 0
sblk->status_rx_quick_consumer_index3 0
sblk->status_rx_quick_consumer_index4 0
sblk->status_rx_quick_consumer_index5 0
sblk->status_rx_quick_consumer_index6 0
sblk->status_rx_quick_consumer_index7 0
sblk->status_rx_quick_consumer_index8 0
sblk->status_rx_quick_consumer_index9 0
sblk->status_rx_quick_consumer_index10 0
sblk->status_rx_quick_consumer_index11 0
sblk->status_rx_quick_consumer_index12 0
sblk->status_rx_quick_consumer_index13 0
sblk->status_rx_quick_consumer_index14 0
sblk->status_rx_quick_consumer_index15 0
sblk->status_completion_producer_index 0
sblk->status_cmd_consumer_index 0
sblk->status_idx 34554
sblk->status_unused 0
sblk->status_blk_num 0
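
The relation between tx_prod, tx_cons and bnx2_tx_avail in the dump above can be checked with a small model. This is a simplified sketch and not the driver's code: tx_avail_model() is a hypothetical helper, the indices are assumed to be free-running 16-bit counters, and the ring size of 255 is an assumption that happens to be consistent with the numbers printed in both dumps in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Free TX descriptors in a ring whose producer and consumer indices are
 * free-running 16-bit counters. ring_size is an assumed parameter. */
static uint32_t tx_avail_model(uint16_t prod, uint16_t cons, uint32_t ring_size)
{
	uint16_t in_flight;

	assert(ring_size <= 0xffffu);
	in_flight = (uint16_t)(prod - cons);	/* wraps modulo 2^16 */
	if (in_flight > ring_size)
		return 0;			/* inconsistent indices */
	return ring_size - in_flight;
}
```

With tx_prod 60429 and tx_cons 60301 the model gives 255 - 128 = 127, and with the earlier dump's tx_prod 32641 and tx_cons 32474 it gives 255 - 167 = 88, matching the printed bnx2_tx_avail values.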



> From: mchan@broadcom.com
> To: tinnycloud@hotmail.com; netdev@vger.kernel.org
> CC: davidch@broadcom.com
> Date: Mon, 4 Jul 2011 10:04:25 -0700
> Subject: Re: bnx2: FTQ dump on heavy workload(bnx2-2.0.23b + kernel 2.6.32.36)
> 
> MaoXiaoyun wrote:
> 
> > Could it be caused by a timeout similar to the one in
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c441b8d2cb2194b05550a558d6d95d8944e56a84 ?
> 
> Based on the register dump below, it is not caused by the MSI-X issue.
> 
> > 
> > Maybe the timeout still happens in my test scenario.
> > 
> > Well, from the patch, BNX2_MISC_ECO_HW_CTL is defined as 0x000008cc, but I
> > cannot find the definition in the programmer's reference guide
> > (NetXtremeII-PG203-R.pdf). Could someone help point it out for me, or is
> > the doc out of date?
> 
> I will request the document to be updated to describe that register. We
> are increasing the register read and write timeout value to work around
> the problem of the MSI-X table being updated while there is a pending
> MSI-X. Without the patch, the write to unmask the MSI-X table entry can
> be dropped by the chip.
> 
> > 
> > Also, is there a way to confirm whether the timeout really happens?
> > (Which register shall I read?) Or is there a bigger timeout I can set?
> 
> Again, the register dump shows that it is not caused by this issue. I'll
> send you an additional debug patch to try to debug the problem.
> 
> Thanks.
> > 
> > thanks.
> > 
> > ----------------------------------------
> > > From: tinnycloud@hotmail.com
> > > To: netdev@vger.kernel.org
> > > Subject: bnx2: FTQ dump on heavy workload(bnx2-2.0.23b + kernel 2.6.32.36)
> > > Date: Mon, 4 Jul 2011 15:40:01 +0800
> > >
> > >
> > > Hi:
> > >
> > > I hit the bnx2 FTQ dump over and over again while testing Xen live
> > > migration, which generates a heavy network workload.
> > >
> > > I have two physical machines, both with Xen 4.0.1, kernel 2.6.32.36,
> > > and bnx2 2.0.23b installed. I start 15 virtual machines in total and
> > > migrate them between the hosts over and over again. After about
> > > 16 hours the network stops working; sometimes it resets successfully,
> > > and sometimes it causes a kernel crash.
> > >
> > > I've tried some debugging by adding code to the driver. Below is the
> > > code that runs when the FTQ dump happens. It looks like the NIC stops
> > > transmitting packets, which causes the timeout.
> > >
> > > BTW, cpu max_cstate=1 in my grub.
> > >
> > > Thanks.
> > >
> > > --------------
> > > static void
> > > bnx2_tx_timeout(struct net_device *dev)
> > > {
> > > 	struct bnx2 *bp = netdev_priv(dev);
> > > 	struct bnx2_napi *bnapi = &bp->bnx2_napi[0];
> > > 	struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
> > > 	struct bnx2_rx_ring_info *rxr = &bnapi->rx_ring;
> > > 	int i;
> > >
> > > 	bnx2_dump_ftq(bp);
> > > 	bnx2_dump_state(bp);
> > > 	if (stop_on_tx_timeout) {
> > > 		printk(KERN_WARNING PFX
> > > 		       "%s: prevent chip reset during tx timeout\n",
> > > 		       bp->dev->name);
> > > 		smp_rmb();
> > > 		printk("last status idx %d \n", bnapi->last_status_idx);
> > > 		printk("hw_tx_cons %d, txr->hw_tx_conds %d txr->tx_prod %d txr->tx_cons %d\n",
> > > 		       bnx2_get_hw_tx_cons(bnapi), txr->hw_tx_cons, txr->tx_prod, txr->tx_cons);
> > > 		printk("hw_rx_cons %d, txr->hw_rx_conds %d\n",
> > > 		       bnx2_get_hw_rx_cons(bnapi), rxr->rx_cons);
> > > 		printk("sblk->status_attn_bits %d\n", bnapi->status_blk.msi->status_attn_bits);
> > > 		printk("sblk->status_attn_bits_ack %d\n", bnapi->status_blk.msi->status_attn_bits_ack);
> > > 		printk("bnx2_tx_avail %d \n", (bnx2_tx_avail(bp, txr)));
> > > 		printk("sblk->status_tx_quick_consumer_index0 %d\n", bnapi->status_blk.msi->status_tx_quick_consumer_index0);
> > > 		printk("sblk->status_tx_quick_consumer_index1 %d\n", bnapi->status_blk.msi->status_tx_quick_consumer_index1);
> > > 		printk("sblk->status_tx_quick_consumer_index2 %d\n", bnapi->status_blk.msi->status_tx_quick_consumer_index2);
> > > 		printk("sblk->status_tx_quick_consumer_index3 %d\n", bnapi->status_blk.msi->status_tx_quick_consumer_index3);
> > > 		printk("sblk->status_rx_quick_consumer_index0 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index0);
> > > 		printk("sblk->status_rx_quick_consumer_index1 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index1);
> > > 		printk("sblk->status_rx_quick_consumer_index2 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index2);
> > > 		printk("sblk->status_rx_quick_consumer_index3 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index3);
> > > 		printk("sblk->status_rx_quick_consumer_index4 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index4);
> > > 		printk("sblk->status_rx_quick_consumer_index5 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index5);
> > > 		printk("sblk->status_rx_quick_consumer_index6 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index6);
> > > 		printk("sblk->status_rx_quick_consumer_index7 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index7);
> > > 		printk("sblk->status_rx_quick_consumer_index8 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index8);
> > > 		printk("sblk->status_rx_quick_consumer_index9 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index9);
> > > 		printk("sblk->status_rx_quick_consumer_index10 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index10);
> > > 		printk("sblk->status_rx_quick_consumer_index11 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index11);
> > > 		printk("sblk->status_rx_quick_consumer_index12 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index12);
> > > 		printk("sblk->status_rx_quick_consumer_index13 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index13);
> > > 		printk("sblk->status_rx_quick_consumer_index14 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index14);
> > > 		printk("sblk->status_rx_quick_consumer_index15 %d\n", bnapi->status_blk.msi->status_rx_quick_consumer_index15);
> > > 		printk("sblk->status_completion_producer_index %d\n", bnapi->status_blk.msi->status_completion_producer_index);
> > > 		printk("sblk->status_cmd_consumer_index %d\n", bnapi->status_blk.msi->status_cmd_consumer_index);
> > > 		printk("sblk->status_idx %d\n", bnapi->status_blk.msi->status_idx);
> > > 		printk("sblk->status_unused %d\n", bnapi->status_blk.msi->status_unused);
> > > 		printk("sblk->status_blk_num %d\n", bnapi->status_blk.msi->status_blk_num);
> > > 		is_timedout = 1;
> > > 		for (i = 0; i < bp->irq_nvecs; i++) {
> > > 			bnapi = &bp->bnx2_napi[i];
> > > 			bnx2_tx_int(bp, bnapi, 0);
> > > 		}
> > > 		return;
> > > 	}
> > > -----------------
> > >
> > > -------------FTQ log in /var/log/message
> > > ------------[ cut here ]------------
> > > WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x105/0x16a()
> > > Hardware name: Tecal RH2285
> > > Modules linked in: iptable_filter ip_tables nfs fscache nfs_acl
> > auth_rpcgss bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler
> > lockd sunrpc ipv6 xenfs dm_multipath fuse xen_netback xen_blkback
> > blktap blkback_pagemap loop nbd video output sbs sbshc parport_pc lp
> > parport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> > snd_seq_device snd_pcm_oss snd_mixer_oss bnx2 serio_raw snd_pcm
> > snd_timer snd soundcore snd_page_alloc i2c_i801 iTCO_wdt
> > iTCO_vendor_support i2c_core pata_acpi ata_generic pcspkr ata_piix
> > shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
> > > Pid: 0, comm: swapper Not tainted 2.6.32.36xen #1
> > > Call Trace:
> > > <IRQ> [<ffffffff813ba154>] ? dev_watchdog+0x105/0x16a
> > > [<ffffffff81056666>] warn_slowpath_common+0x7c/0x94
> > > [<ffffffff81056738>] warn_slowpath_fmt+0xa4/0xa6
> > > [<ffffffff81080bfa>] ? clockevents_program_event+0x78/0x81
> > > [<ffffffff81081fce>] ? tick_program_event+0x2a/0x2c
> > > [<ffffffff813b951d>] ? __netif_tx_lock+0x1b/0x24
> > > [<ffffffff813b95a8>] ? netif_tx_lock+0x46/0x6e
> > > [<ffffffff813a3ed1>] ? netdev_drivername+0x48/0x4f
> > > [<ffffffff813ba154>] dev_watchdog+0x105/0x16a
> > > [<ffffffff81063d98>] run_timer_softirq+0x156/0x1f8
> > > [<ffffffff813ba04f>] ? dev_watchdog+0x0/0x16a
> > > [<ffffffff8105d6f0>] __do_softirq+0xd7/0x19e
> > > [<ffffffff81013eac>] call_softirq+0x1c/0x30
> > > [<ffffffff8101564b>] do_softirq+0x46/0x87
> > > [<ffffffff8105d575>] irq_exit+0x3b/0x7a
> > > [<ffffffff8128dcfe>] xen_evtchn_do_upcall+0x38/0x46
> > > [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
> > > <EOI> [<ffffffff8103f642>] ? pick_next_task_idle+0x18/0x22
> > > [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> > > [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> > > [<ffffffff8100f1bb>] ? xen_safe_halt+0x10/0x1a
> > > [<ffffffff81019e14>] ? default_idle+0x39/0x56
> > > [<ffffffff81011cd0>] ? cpu_idle+0x5d/0x8c
> > > [<ffffffff8143375d>] ? cpu_bringup_and_idle+0x13/0x15
> > > ---[ end trace 867bb8f6cd959b03 ]---
> > > bnx2: <--- start FTQ dump on peth0 --->
> > > bnx2: peth0: BNX2_RV2P_PFTQ_CTL 10000
> > > bnx2: peth0: BNX2_RV2P_TFTQ_CTL 20000
> > > bnx2: peth0: BNX2_RV2P_MFTQ_CTL 4000
> > > bnx2: peth0: BNX2_TBDR_FTQ_CTL 1004002
> > > bnx2: peth0: BNX2_TDMA_FTQ_CTL 4010002
> > > bnx2: peth0: BNX2_TXP_FTQ_CTL 2410002
> > > bnx2: peth0: BNX2_TPAT_FTQ_CTL 10002
> > > bnx2: peth0: BNX2_RXP_CFTQ_CTL 8000
> > > bnx2: peth0: BNX2_RXP_FTQ_CTL 100000
> > > bnx2: peth0: BNX2_COM_COMXQ_FTQ_CTL 10000
> > > bnx2: peth0: BNX2_COM_COMTQ_FTQ_CTL 20000
> > > bnx2: peth0: BNX2_COM_COMQ_FTQ_CTL 10000
> > > bnx2: peth0: BNX2_CP_CPQ_FTQ_CTL 4000
> > > bnx2: peth0: TXP mode b84c state 80005000 evt_mask 500 pc 8000d60 pc 8000d60 instr 8f860000
> > > bnx2: peth0: TPAT mode b84c state 80009000 evt_mask 500 pc 8000a5c pc 8000a5c instr 10400016
> > > bnx2: peth0: RXP mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c14 instr 10e00088
> > > bnx2: peth0: COM mode b8cc state 80000000 evt_mask 500 pc 8000b28 pc 8000a9c instr 8c530000
> > > bnx2: peth0: CP mode b8cc state 80000000 evt_mask 500 pc 8000c50 pc 8000c58 instr 8ca50020
> > > bnx2: <--- end FTQ dump on peth0 --->
> > > bnx2: peth0 DEBUG: intr_sem[0]
> > > bnx2: peth0 DEBUG: intr_sem[0] PCI_CMD[20100406]
> > > bnx2: peth0 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
> > > bnx2: peth0 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
> > > bnx2: peth0 RPM_MGMT_PKT_CTRL[40000088]
> > > bnx2: peth0 DEBUG: MCP_STATE_P0[0007e10e] MCP_STATE_P1[0003e00e]
> > > bnx2: peth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01ff0000]
> > > bnx2: peth0 DEBUG: PBA[00000000]
> > > BNX2_PCICFG_INT_ACK_CMD[00013ce1]
> > > bnx2: peth0: prevent chip reset during tx timeout
> > > last status idx 2426
> > > hw_tx_cons 32474, txr->hw_tx_conds 32474 txr->tx_prod 32641 txr->tx_cons 32474
> > > hw_rx_cons 19665, txr->hw_rx_conds 19665
> > > sblk->status_attn_bits 1
> > > sblk->status_attn_bits_ack 1
> > > bnx2_tx_avail 88
> > > sblk->status_tx_quick_consumer_index0 32474
> > > sblk->status_tx_quick_consumer_index1 0
> > > sblk->status_tx_quick_consumer_index2 0
> > > sblk->status_tx_quick_consumer_index3 0
> > > sblk->status_rx_quick_consumer_index0 19665
> > > sblk->status_rx_quick_consumer_index1 0
> > > sblk->status_rx_quick_consumer_index2 0
> > > sblk->status_rx_quick_consumer_index3 0
> > > sblk->status_rx_quick_consumer_index4 0
> > > sblk->status_rx_quick_consumer_index5 0
> > > sblk->status_rx_quick_consumer_index6 0
> > > sblk->status_rx_quick_consumer_index7 0
> > > sblk->status_rx_quick_consumer_index8 0
> > > sblk->status_rx_quick_consumer_index9 0
> > > sblk->status_rx_quick_consumer_index10 0
> > > sblk->status_rx_quick_consumer_index11 0
> > > sblk->status_rx_quick_consumer_index12 0
> > > sblk->status_rx_quick_consumer_index13 0
> > > sblk->status_rx_quick_consumer_index14 0
> > > sblk->status_rx_quick_consumer_index15 0
> > > sblk->status_completion_producer_index 0
> > > sblk->status_cmd_consumer_index 0
> > > sblk->status_idx 2426
> > > sblk->status_unused 0
> > > sblk->status_blk_num 0
> > > hw_cons 32474 sw_cons 32474 ffff8801d27f85c0 bnapi
> > > return hw_cons 32474 sw_cons 32474 ffff8801d27f85c0 bnapi
> > > hw_cons 3628 sw_cons 3625 ffff8801d27f8bc0 bnapi
> > > return hw_cons 3628 sw_cons 3625 ffff8801d27f8bc0 bnapi
> > > hw_cons 62094 sw_cons 62090 ffff8801d27f91c0 bnapi
> > > return hw_cons 62094 sw_cons 62090 ffff8801d27f91c0 bnapi
> > > hw_cons 3184 sw_cons 3173 ffff8801d27f97c0 bnapi
> > > return hw_cons 3184 sw_cons 3173 ffff8801d27f97c0 bnapi
> > > hw_cons 0 sw_cons 0 ffff8801d27f9dc0 bnapi
> > > return hw_cons 0 sw_cons 0 ffff8801d27f9dc0 bnapi
> > 
> 
> 

^ permalink raw reply

* [PATCH] lib/checksum.c: optimize do_csum a bit
From: Ian Abbott @ 2011-07-07 11:18 UTC (permalink / raw)
  To: netdev; +Cc: Ian Abbott, David S. Miller, Arnd Bergmann, linux-kernel

Reduce the number of variables modified by the loop in do_csum() by 1,
which seems like a good idea.  On Nios II (a RISC CPU with 3-operand
instruction set) it reduces the loop from 7 to 6 instructions, including
the conditional branch.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
---
 lib/checksum.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/lib/checksum.c b/lib/checksum.c
index 0975087..8df2f91 100644
--- a/lib/checksum.c
+++ b/lib/checksum.c
@@ -49,7 +49,7 @@ static inline unsigned short from32to16(unsigned int x)
 
 static unsigned int do_csum(const unsigned char *buff, int len)
 {
-	int odd, count;
+	int odd;
 	unsigned int result = 0;
 
 	if (len <= 0)
@@ -64,25 +64,22 @@ static unsigned int do_csum(const unsigned char *buff, int len)
 		len--;
 		buff++;
 	}
-	count = len >> 1;		/* nr of 16-bit words.. */
-	if (count) {
+	if (len >= 2) {
 		if (2 & (unsigned long) buff) {
 			result += *(unsigned short *) buff;
-			count--;
 			len -= 2;
 			buff += 2;
 		}
-		count >>= 1;		/* nr of 32-bit words.. */
-		if (count) {
+		if (len >= 4) {
+			const unsigned char *end = buff + ((unsigned)len & ~3);
 			unsigned int carry = 0;
 			do {
 				unsigned int w = *(unsigned int *) buff;
-				count--;
 				buff += 4;
 				result += carry;
 				result += w;
 				carry = (w > result);
-			} while (count);
+			} while (buff < end);
 			result += carry;
 			result = (result & 0xffff) + (result >> 16);
 		}
-- 
1.7.6
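
The inner loop's deferred-carry accumulation, which this patch leaves unchanged, can be sketched on its own. This is a standalone model under stated assumptions: it sums an array of 32-bit words rather than walking a byte buffer, and sum_words_deferred_carry() and fold32to16() are hypothetical names, not functions from lib/checksum.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of n 32-bit words. The carry out of each addition
 * is remembered and added back in on the next iteration, as in the loop
 * body of do_csum(). */
static uint32_t sum_words_deferred_carry(const uint32_t *w, size_t n)
{
	uint32_t result = 0, carry = 0;

	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		result += carry;
		result += w[i];
		carry = (w[i] > result);	/* 1 if the addition wrapped */
	}
	result += carry;			/* fold in the last carry */
	return result;
}

/* Fold a 32-bit partial sum down to 16 bits, adding carries back in. */
static uint16_t fold32to16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);	/* may carry into bit 16 */
	x = (x & 0xffff) + (x >> 16);	/* add that carry back */
	return (uint16_t)x;
}
```

For the words 0xffffffff, 0x00000001 and 0x12345678, the second addition wraps to 0 and sets the carry, which is added back before the third word, giving 0x12345679 and a folded value of 0x68ad.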

^ permalink raw reply related
