Netdev List
* Re: [BUG] crashes with kvm/nat networking and net-next
From: Patrick McHardy @ 2010-05-12 11:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Bart De Schuymer, netdev
In-Reply-To: <1273649526.2621.3.camel@edumazet-laptop>

Eric Dumazet wrote:
> On Tuesday, 11 May 2010 at 20:25 -0700, Stephen Hemminger wrote:
>> This is a regression that is showing up now in net-next, not sure what
>> changed recently in bridge netfilter that could be causing it?
>>
>> [ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> [ 4593.956219] IP: [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
>> [ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0 
>> [ 4593.956241] Oops: 0000 [#1] SMP 
>> [ 4593.956248] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
>> [ 4593.956253] CPU 3 
>> [ 4593.956256] Modules linked in: netconsole configfs hid_belkin tun ntfs vfat msdos fat autofs4 binfmt_misc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm radeon ttm drm_kms_helper drm i2c_algo_bit snd_hda_codec_analog ipv6 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device psmouse asus_atk0110 snd serio_raw soundcore snd_page_alloc usbhid mvsas libsas scsi_transport_sas floppy sky2 e1000e [last unloaded: netconsole]
>> [ 4593.956375] 
>> [ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
>> [ 4593.956384] RIP: 0010:[<ffffffffa03357a4>]  [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
>> [ 4593.956395] RSP: 0018:ffff880001e63b78  EFLAGS: 00010246
>> [ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
>> [ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
>> [ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
>> [ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
>> [ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
>> [ 4593.956418] FS:  00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
>> [ 4593.956422] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
>> [ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
>> [ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
>> [ 4593.956441] Stack:
>> [ 4593.956443]  0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
>> [ 4593.956451] <0> ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
>> [ 4593.956462] <0> ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
>> [ 4593.956474] Call Trace:
>> [ 4593.956478]  <IRQ> 
>> [ 4593.956488]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
>> [ 4593.956496]  [<ffffffffa03350e6>] NF_HOOK_THRESH+0x56/0x60 [bridge]
>> [ 4593.956504]  [<ffffffffa0335282>] br_nf_forward_arp+0x112/0x120 [bridge]
>> [ 4593.956511]  [<ffffffff813f7184>] nf_iterate+0x64/0xa0
>> [ 4593.956519]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
>> [ 4593.956524]  [<ffffffff813f722c>] nf_hook_slow+0x6c/0x100
>> [ 4593.956531]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
>> [ 4593.956538]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
>> [ 4593.956545]  [<ffffffffa032f86d>] __br_forward+0x6d/0xc0 [bridge]
>> [ 4593.956550]  [<ffffffff813c5d8e>] ? skb_clone+0x3e/0x70
> 
> Not sure, but br_nf_forward_ip() has the following check:
> 
> if (!skb->nf_bridge)
> 	return NF_ACCEPT;
> 
> while br_nf_forward_arp() lacks this check ...
> 
> So we can dereference a NULL pointer later.

That looks correct to me, offset 0x18 would be nf_bridge_info->mask.
Bart, please review, thanks.

> 
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index 93f80fe..cd2e5f5 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -723,6 +723,9 @@ static unsigned int br_nf_forward_arp(unsigned int hook, struct sk_buff *skb,
>  		return NF_ACCEPT;
>  #endif
>  
> +	if (!skb->nf_bridge)
> +		return NF_ACCEPT;
> +
>  	if (skb->protocol != htons(ETH_P_ARP)) {
>  		if (!IS_VLAN_ARP(skb))
>  			return NF_ACCEPT;
> 
> 
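
A small userspace sketch of the offset arithmetic and the guard discussed
above; it is not kernel code. The layout of struct demo_nf_bridge_info is an
assumption meant to resemble the kernel structure on a 64-bit build, and
demo_skb, demo_forward_arp() and the DEMO_* verdicts are hypothetical names
used only for illustration. Reading ->mask through a NULL pointer touches the
address equal to the member's offset, which would explain CR2 = 0x18 in the
oops.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, modelled on a 64-bit build; not copied from the kernel. */
struct demo_nf_bridge_info {
	int use;		/* stands in for the reference counter */
	void *physindev;
	void *physoutdev;
	unsigned int mask;
};

struct demo_skb {
	struct demo_nf_bridge_info *nf_bridge;	/* may be NULL */
};

enum { DEMO_ACCEPT = 1, DEMO_CONTINUE = 2 };	/* hypothetical verdicts */

/* Address a load of ->mask would hit if nf_bridge were NULL. */
static size_t demo_mask_offset(void)
{
	return offsetof(struct demo_nf_bridge_info, mask);
}

/* Mirrors the shape of the proposed fix: bail out before any dereference. */
static int demo_forward_arp(const struct demo_skb *skb, unsigned int *mask_out)
{
	assert(skb != NULL && mask_out != NULL);

	if (!skb->nf_bridge)
		return DEMO_ACCEPT;	/* nothing to inspect, let it pass */

	*mask_out = skb->nf_bridge->mask;	/* safe: pointer was checked */
	return DEMO_CONTINUE;
}
```

On a target with 8-byte pointers, demo_mask_offset() evaluates to 0x18 for
this assumed layout.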



* [RFC PATCH] tg3: use netif_carrier_off to prevent tx timeout
From: Stanislaw Gruszka @ 2010-05-12 11:16 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Eilon Greenstein, Vladislav Zolotarov,
	Dmitry Kravkov, Michael Chan, Breno Leitao, Matt Carlson
In-Reply-To: <20100512130628.69bc3890@dhcp-lab-109.englab.brq.redhat.com>

Touching ->trans_start only makes netdev watchdog timeouts less probable.
Use netif_carrier_off() to prevent the timeout; afterwards we take care of
turning the carrier back on.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
Patch was not tested!

 drivers/net/tg3.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 573054a..d745038 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -753,7 +753,7 @@ static void tg3_napi_enable(struct tg3 *tp)
 
 static inline void tg3_netif_stop(struct tg3 *tp)
 {
-	tp->dev->trans_start = jiffies;	/* prevent tx timeout */
+	netif_carrier_off(tp->dev);	/* prevent tx timeout */
 	tg3_napi_disable(tp);
 	netif_tx_disable(tp->dev);
 }
@@ -10964,12 +10964,14 @@ static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static void tg3_vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
 {
 	struct tg3 *tp = netdev_priv(dev);
+	int link_up;
 
 	if (!netif_running(dev)) {
 		tp->vlgrp = grp;
 		return;
 	}
 
+	link_up = netif_carrier_ok(dev);
 	tg3_netif_stop(tp);
 
 	tg3_full_lock(tp, 0);
@@ -10979,6 +10981,8 @@ static void tg3_vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
 	/* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. */
 	__tg3_set_rx_mode(dev);
 
+	if (link_up)
+		netif_carrier_on(dev);
 	tg3_netif_start(tp);
 
 	tg3_full_unlock(tp);
-- 
1.5.5.6
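
The save-and-restore pattern used in the patch above can be sketched in
userspace as follows. This is only an illustration under stated assumptions:
struct demo_netdev and the demo_* helpers are hypothetical stand-ins, not the
tg3 or netdev API. The point is that the link state has to be sampled before
the stop helper clears the carrier, otherwise there is nothing left to restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a network device. */
struct demo_netdev {
	bool carrier_ok;
	bool tx_enabled;
};

/* Models a stop helper that now drops the carrier as a side effect. */
static void demo_netif_stop(struct demo_netdev *dev)
{
	dev->carrier_ok = false;
	dev->tx_enabled = false;
}

static void demo_netif_start(struct demo_netdev *dev)
{
	dev->tx_enabled = true;
}

/* Reconfigure the device while preserving the link state seen on entry. */
static void demo_reconfigure(struct demo_netdev *dev)
{
	bool link_up;

	assert(dev != NULL);

	link_up = dev->carrier_ok;	/* sample before the stop clears it */
	demo_netif_stop(dev);

	/* ... hardware would be reprogrammed here ... */

	if (link_up)
		dev->carrier_ok = true;	/* restore only if it was up before */
	demo_netif_start(dev);
}
```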



* [RFC PATCH] bnx2: use netif_carrier_off to prevent tx timeout
From: Stanislaw Gruszka @ 2010-05-12 11:06 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Eilon Greenstein, Vladislav Zolotarov,
	Dmitry Kravkov, Michael Chan, Breno Leitao, Matt Carlson
In-Reply-To: <20100512125815.0dad8ad0@dhcp-lab-109.englab.brq.redhat.com>

Touching ->trans_start only makes netdev watchdog timeouts less probable.
Use netif_carrier_off() to prevent the timeout; afterwards we take care of
turning the carrier back on.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
Patch was not tested!

 drivers/net/bnx2.c |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 667f419..44fc392 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -656,17 +656,9 @@ bnx2_netif_stop(struct bnx2 *bp, bool stop_cnic)
 	if (stop_cnic)
 		bnx2_cnic_stop(bp);
 	if (netif_running(bp->dev)) {
-		int i;
-
 		bnx2_napi_disable(bp);
 		netif_tx_disable(bp->dev);
-		/* prevent tx timeout */
-		for (i = 0; i <  bp->dev->num_tx_queues; i++) {
-			struct netdev_queue *txq;
-
-			txq = netdev_get_tx_queue(bp->dev, i);
-			txq->trans_start = jiffies;
-		}
+		netif_carrier_off(bp->dev);
 	}
 	bnx2_disable_int_sync(bp);
 }
@@ -6346,6 +6338,8 @@ bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
 	if (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN)
 		bnx2_fw_sync(bp, BNX2_DRV_MSG_CODE_KEEP_VLAN_UPDATE, 0, 1);
 
+	if (bp->link_up)
+		netif_carrier_on(bp->dev);
 	bnx2_netif_start(bp, false);
 }
 #endif
-- 
1.5.5.6




* Re: [PATCH net-next] bnx2x: avoid TX timeout when stopping device
From: Stanislaw Gruszka @ 2010-05-12 10:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Eilon Greenstein, Vladislav Zolotarov, Dmitry Kravkov,
	Michael Chan, Breno Leitao, Matt Carlson
In-Reply-To: <1273656458.2621.22.camel@edumazet-laptop>

On Wed, 12 May 2010 11:27:38 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wednesday, 12 May 2010 at 11:09 +0200, Stanislaw Gruszka wrote:
> > When stopping the device, call netif_carrier_off() just after disabling the
> > TX queue to avoid a possible netdev watchdog warning and ->ndo_tx_timeout()
> > invocation.
> > 
> > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> > ---
> 
> This reminds me I saw some strange things in bnx2.c for a similar
> symptom.
> 
> Commit e6bf95ffa8d6f8f4b7ee33ea01490d95b0bbeb6e
> 
> Would you take a look at this too ?

I can send RFC patches for bnx2 and tg3, as I think they need a similar fix.
 
> Or if this kind of trans_start refresh on all queues is really needed,
> it should be a core network provided function, not implemented on every
> driver...

I think netif_carrier_off() should be used, since touching trans_start only
makes the timeout less probable but does not prevent it.

Stanislaw
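
To illustrate the argument above, here is a simplified userspace model of the
TX watchdog decision. It is a sketch under assumptions, not the kernel's
implementation: struct demo_netdev and demo_watchdog_fires() are hypothetical,
and the model assumes the watchdog only complains when the carrier is up, the
queue is stopped, and the last transmit is older than the timeout. Under that
model, refreshing trans_start merely postpones the deadline, while dropping the
carrier disables the check altogether.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a network device. */
struct demo_netdev {
	bool carrier_ok;
	bool tx_stopped;
	unsigned long trans_start;	/* tick of the last transmit */
	unsigned long watchdog_timeo;	/* timeout in ticks */
};

/* Returns true when the modelled watchdog would report a TX timeout. */
static bool demo_watchdog_fires(const struct demo_netdev *dev,
				unsigned long now)
{
	unsigned long deadline;

	assert(dev != NULL);

	if (!dev->carrier_ok)
		return false;	/* no carrier: the check is skipped */

	deadline = dev->trans_start + dev->watchdog_timeo;

	/* wrap-safe "now is after deadline" comparison */
	return dev->tx_stopped && (long)(now - deadline) > 0;
}
```

With trans_start refreshed at stop time, a stop that lasts longer than
watchdog_timeo still trips the check; with the carrier off it cannot.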


* RE: does the broadcom bnx2x support RSS/multi queue
From: Eric Dumazet @ 2010-05-12  9:59 UTC (permalink / raw)
  To: Jon Zhou; +Cc: eilong@broadcom.com, netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497D85DCD@MILEXCH2.ds.jdsu.net>

On Wednesday, 12 May 2010 at 02:34 -0700, Jon Zhou wrote:
> hi eilon:
> 
> do you think I need to update the kernel also?
> 
> thanks!
> jon

I believe both of us (Eilon and I) said your kernel version was too
old.

To play with multiqueue, you should use a very recent kernel;
otherwise you will hit various bottlenecks and bugs.





* RE: does the broadcom bnx2x support RSS/multi queue
From: Eilon Greenstein @ 2010-05-12  9:58 UTC (permalink / raw)
  To: Jon Zhou; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497D85DCD@MILEXCH2.ds.jdsu.net>

On Wed, 2010-05-12 at 02:34 -0700, Jon Zhou wrote:
> hi eilon:
> 
> do you think I need to update the kernel also?

Kernel 2.6.27 supports Tx multi queue (Rx multi-queue was added even
before that), so theoretically you can update only the bnx2x. However,
you cannot use the bnx2x from the current kernel due to other changes
between 2.6.27 and the current kernel. So you need to download the bnx2x
from the Broadcom site.

Enjoy,
Eilon




* [PATCH NEXT 2/4] netxen: remove unnecessary size checks
From: Amit Kumar Salecha @ 2010-05-12  9:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273657985-13405-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

NX3031 has 64-bit on-card memory. Fix the limit check to
64MB and remove the unnecessary 128-bit read/write check.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/netxen/netxen_nic_hdr.h |    6 ----
 drivers/net/netxen/netxen_nic_hw.c  |   52 +++++-----------------------------
 2 files changed, 8 insertions(+), 50 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index d930068..d8bd73d 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -681,14 +681,8 @@ enum {
 #define MIU_TEST_AGT_ADDR_HI		(0x08)
 #define MIU_TEST_AGT_WRDATA_LO		(0x10)
 #define MIU_TEST_AGT_WRDATA_HI		(0x14)
-#define MIU_TEST_AGT_WRDATA_UPPER_LO	(0x20)
-#define MIU_TEST_AGT_WRDATA_UPPER_HI	(0x24)
-#define MIU_TEST_AGT_WRDATA(i)		(0x10+(0x10*((i)>>1))+(4*((i)&1)))
 #define MIU_TEST_AGT_RDDATA_LO		(0x18)
 #define MIU_TEST_AGT_RDDATA_HI		(0x1c)
-#define MIU_TEST_AGT_RDDATA_UPPER_LO	(0x28)
-#define MIU_TEST_AGT_RDDATA_UPPER_HI	(0x2c)
-#define MIU_TEST_AGT_RDDATA(i)		(0x18+(0x10*((i)>>1))+(4*((i)&1)))
 
 #define MIU_TEST_AGT_ADDR_MASK		0xfffffff8
 #define MIU_TEST_AGT_UPPER_ADDR(off)	(0)
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 5e5fe2f..87bc910 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1621,9 +1621,8 @@ static int
 netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 		u64 off, u64 data)
 {
-	int i, j, ret;
+	int j, ret;
 	u32 temp, off8;
-	u64 stride;
 	void __iomem *mem_crb;
 
 	/* Only 64-bit aligned access */
@@ -1650,44 +1649,17 @@ netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 	return -EIO;
 
 correct:
-	stride = NX_IS_REVISION_P3P(adapter->ahw.revision_id) ? 16 : 8;
-
-	off8 = off & ~(stride-1);
+	off8 = off & 0xfffffff8;
 
 	spin_lock(&adapter->ahw.mem_lock);
 
 	writel(off8, (mem_crb + MIU_TEST_AGT_ADDR_LO));
 	writel(0, (mem_crb + MIU_TEST_AGT_ADDR_HI));
 
-	i = 0;
-	if (stride == 16) {
-		writel(TA_CTL_ENABLE, (mem_crb + TEST_AGT_CTRL));
-		writel((TA_CTL_START | TA_CTL_ENABLE),
-				(mem_crb + TEST_AGT_CTRL));
-
-		for (j = 0; j < MAX_CTL_CHECK; j++) {
-			temp = readl(mem_crb + TEST_AGT_CTRL);
-			if ((temp & TA_CTL_BUSY) == 0)
-				break;
-		}
-
-		if (j >= MAX_CTL_CHECK) {
-			ret = -EIO;
-			goto done;
-		}
-
-		i = (off & 0xf) ? 0 : 2;
-		writel(readl(mem_crb + MIU_TEST_AGT_RDDATA(i)),
-				mem_crb + MIU_TEST_AGT_WRDATA(i));
-		writel(readl(mem_crb + MIU_TEST_AGT_RDDATA(i+1)),
-				mem_crb + MIU_TEST_AGT_WRDATA(i+1));
-		i = (off & 0xf) ? 2 : 0;
-	}
-
 	writel(data & 0xffffffff,
-			mem_crb + MIU_TEST_AGT_WRDATA(i));
+			mem_crb + MIU_TEST_AGT_WRDATA_LO);
 	writel((data >> 32) & 0xffffffff,
-			mem_crb + MIU_TEST_AGT_WRDATA(i+1));
+			mem_crb + MIU_TEST_AGT_WRDATA_HI);
 
 	writel((TA_CTL_ENABLE | TA_CTL_WRITE), (mem_crb + TEST_AGT_CTRL));
 	writel((TA_CTL_START | TA_CTL_ENABLE | TA_CTL_WRITE),
@@ -1707,7 +1679,6 @@ correct:
 	} else
 		ret = 0;
 
-done:
 	spin_unlock(&adapter->ahw.mem_lock);
 
 	return ret;
@@ -1719,7 +1690,7 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 {
 	int j, ret;
 	u32 temp, off8;
-	u64 val, stride;
+	u64 val;
 	void __iomem *mem_crb;
 
 	/* Only 64-bit aligned access */
@@ -1748,9 +1719,7 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 	return -EIO;
 
 correct:
-	stride = NX_IS_REVISION_P3P(adapter->ahw.revision_id) ? 16 : 8;
-
-	off8 = off & ~(stride-1);
+	off8 = off & 0xfffffff8;
 
 	spin_lock(&adapter->ahw.mem_lock);
 
@@ -1771,13 +1740,8 @@ correct:
 					"failed to read through agent\n");
 		ret = -EIO;
 	} else {
-		off8 = MIU_TEST_AGT_RDDATA_LO;
-		if ((stride == 16) && (off & 0xf))
-			off8 = MIU_TEST_AGT_RDDATA_UPPER_LO;
-
-		temp = readl(mem_crb + off8 + 4);
-		val = (u64)temp << 32;
-		val |= readl(mem_crb + off8);
+		val = (u64)(readl(mem_crb + MIU_TEST_AGT_RDDATA_HI)) << 32;
+		val |= readl(mem_crb + MIU_TEST_AGT_RDDATA_LO);
 		*data = val;
 		ret = 0;
 	}
-- 
1.6.0.2
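
The arithmetic the simplified read/write paths rely on can be sketched in
userspace like this. It only illustrates the masking and the LO/HI register
split; the demo_* helper names are hypothetical and no hardware access is
modelled. The offset is rounded down to an 8-byte boundary, and a 64-bit value
is moved as two 32-bit halves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an offset down to the 8-byte boundary, as "off & 0xfffffff8" does. */
static uint32_t demo_align_down8(uint64_t off)
{
	return (uint32_t)(off & 0xfffffff8u);
}

/* Split a 64-bit value into the halves written to the LO and HI registers. */
static void demo_split64(uint64_t data, uint32_t *lo, uint32_t *hi)
{
	assert(lo != NULL && hi != NULL);

	*lo = (uint32_t)(data & 0xffffffffu);
	*hi = (uint32_t)(data >> 32);
}

/* Rebuild the 64-bit value from the halves read back from LO and HI. */
static uint64_t demo_join64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```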



* [PATCH NEXT 0/4]netxen: bug fixes
From: Amit Kumar Salecha @ 2010-05-12  9:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman

Hi
  Series of 4 patches to fix diagnostic tool access and register usage
  for NX3031.
  Please apply them to the net-next branch.

-Amit


* [PATCH NEXT 4/4] netxen: handle queue manager access
From: Amit Kumar Salecha @ 2010-05-12  9:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273657985-13405-1-git-send-email-amit.salecha@qlogic.com>

Check for tool accesses to the hardware queue engine and handle them
separately from other block registers; otherwise incorrect data
is returned.

Supported only on NX3031-based cards.

Acked-by: Dhananjay Phadke <dhananjay.phadke@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/netxen/netxen_nic.h      |    5 ++++
 drivers/net/netxen/netxen_nic_hw.c   |   25 +++++++++++++++++--
 drivers/net/netxen/netxen_nic_main.c |   44 +++++++++++++++++++++++++++------
 3 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 174ac8e..ffa1b9c 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -95,6 +95,9 @@
 #define ADDR_IN_WINDOW1(off)	\
 	((off > NETXEN_CRB_PCIX_HOST2) && (off < NETXEN_CRB_MAX)) ? 1 : 0
 
+#define ADDR_IN_RANGE(addr, low, high)	\
+	(((addr) < (high)) && ((addr) >= (low)))
+
 /*
  * normalize a 64MB crb address to 32MB PCI window
  * To use NETXEN_CRB_NORMALIZE, window _must_ be set to 1
@@ -1352,6 +1355,8 @@ int netxen_config_rss(struct netxen_adapter *adapter, int enable);
 int netxen_config_ipaddr(struct netxen_adapter *adapter, u32 ip, int cmd);
 int netxen_linkevent_request(struct netxen_adapter *adapter, int enable);
 void netxen_advert_link_change(struct netxen_adapter *adapter, int linkup);
+void netxen_pci_camqm_read_2M(struct netxen_adapter *, u64, u64 *);
+void netxen_pci_camqm_write_2M(struct netxen_adapter *, u64, u64);
 
 int nx_fw_cmd_set_mtu(struct netxen_adapter *adapter, int mtu);
 int netxen_nic_change_mtu(struct net_device *netdev, int new_mtu);
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index be63988..5c496f8 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -62,9 +62,6 @@ static inline void writeq(u64 val, void __iomem *addr)
 }
 #endif
 
-#define ADDR_IN_RANGE(addr, low, high)	\
-	(((addr) < (high)) && ((addr) >= (low)))
-
 #define PCI_OFFSET_FIRST_RANGE(adapter, off)    \
 	((adapter)->ahw.pci_base0 + (off))
 #define PCI_OFFSET_SECOND_RANGE(adapter, off)   \
@@ -1448,6 +1445,28 @@ unlock:
 	return ret;
 }
 
+void
+netxen_pci_camqm_read_2M(struct netxen_adapter *adapter, u64 off, u64 *data)
+{
+	void __iomem *addr = adapter->ahw.pci_base0 +
+		NETXEN_PCI_CAMQM_2M_BASE + (off - NETXEN_PCI_CAMQM);
+
+	spin_lock(&adapter->ahw.mem_lock);
+	*data = readq(addr);
+	spin_unlock(&adapter->ahw.mem_lock);
+}
+
+void
+netxen_pci_camqm_write_2M(struct netxen_adapter *adapter, u64 off, u64 data)
+{
+	void __iomem *addr = adapter->ahw.pci_base0 +
+		NETXEN_PCI_CAMQM_2M_BASE + (off - NETXEN_PCI_CAMQM);
+
+	spin_lock(&adapter->ahw.mem_lock);
+	writeq(data, addr);
+	spin_unlock(&adapter->ahw.mem_lock);
+}
+
 #define MAX_CTL_CHECK   1000
 
 static int
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index b665b42..692e672 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -2537,14 +2537,24 @@ static int
 netxen_sysfs_validate_crb(struct netxen_adapter *adapter,
 		loff_t offset, size_t size)
 {
+	size_t crb_size = 4;
+
 	if (!(adapter->flags & NETXEN_NIC_DIAG_ENABLED))
 		return -EIO;
 
-	if ((size != 4) || (offset & 0x3))
-		return  -EINVAL;
+	if (offset < NETXEN_PCI_CRBSPACE) {
+		if (NX_IS_REVISION_P2(adapter->ahw.revision_id))
+			return -EINVAL;
 
-	if (offset < NETXEN_PCI_CRBSPACE)
-		return -EINVAL;
+		if (ADDR_IN_RANGE(offset, NETXEN_PCI_CAMQM,
+						NETXEN_PCI_CAMQM_2M_END))
+			crb_size = 8;
+		else
+			return -EINVAL;
+	}
+
+	if ((size != crb_size) || (offset & (crb_size-1)))
+		return  -EINVAL;
 
 	return 0;
 }
@@ -2556,14 +2566,23 @@ netxen_sysfs_read_crb(struct kobject *kobj, struct bin_attribute *attr,
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct netxen_adapter *adapter = dev_get_drvdata(dev);
 	u32 data;
+	u64 qmdata;
 	int ret;
 
 	ret = netxen_sysfs_validate_crb(adapter, offset, size);
 	if (ret != 0)
 		return ret;
 
-	data = NXRD32(adapter, offset);
-	memcpy(buf, &data, size);
+	if (NX_IS_REVISION_P3(adapter->ahw.revision_id) &&
+		ADDR_IN_RANGE(offset, NETXEN_PCI_CAMQM,
+					NETXEN_PCI_CAMQM_2M_END)) {
+		netxen_pci_camqm_read_2M(adapter, offset, &qmdata);
+		memcpy(buf, &qmdata, size);
+	} else {
+		data = NXRD32(adapter, offset);
+		memcpy(buf, &data, size);
+	}
+
 	return size;
 }
 
@@ -2574,14 +2593,23 @@ netxen_sysfs_write_crb(struct kobject *kobj, struct bin_attribute *attr,
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct netxen_adapter *adapter = dev_get_drvdata(dev);
 	u32 data;
+	u64 qmdata;
 	int ret;
 
 	ret = netxen_sysfs_validate_crb(adapter, offset, size);
 	if (ret != 0)
 		return ret;
 
-	memcpy(&data, buf, size);
-	NXWR32(adapter, offset, data);
+	if (NX_IS_REVISION_P3(adapter->ahw.revision_id) &&
+		ADDR_IN_RANGE(offset, NETXEN_PCI_CAMQM,
+					NETXEN_PCI_CAMQM_2M_END)) {
+		memcpy(&qmdata, buf, size);
+		netxen_pci_camqm_write_2M(adapter, offset, qmdata);
+	} else {
+		memcpy(&data, buf, size);
+		NXWR32(adapter, offset, data);
+	}
+
 	return size;
 }
 
-- 
1.6.0.2
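
The reworked validation in the patch above can be modelled in userspace as
shown below. This is a sketch under assumptions: the DEMO_* address constants
are placeholder values chosen for illustration (they are not taken from the
driver headers), the is_p2 parameter stands in for the chip revision test, and
the diagnostic-enabled flag check is left out. What it shows is the logic:
queue manager offsets take 8-byte accesses, everything else in CRB space takes
4-byte accesses, and the offset must be aligned to the access size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_IN_RANGE(addr, low, high)	\
	(((addr) < (high)) && ((addr) >= (low)))

/* Placeholder values for illustration only. */
#define DEMO_CRBSPACE	0x06000000ULL
#define DEMO_CAMQM	0x04800000ULL
#define DEMO_CAMQM_END	0x04800800ULL

/* Returns 0 when the access is acceptable, -EINVAL otherwise. */
static int demo_validate_crb(int is_p2, uint64_t offset, size_t size)
{
	size_t crb_size = 4;	/* default: 32-bit register access */

	if (offset < DEMO_CRBSPACE) {
		if (is_p2)
			return -EINVAL;	/* older chips: nothing below CRB */

		if (ADDR_IN_RANGE(offset, DEMO_CAMQM, DEMO_CAMQM_END))
			crb_size = 8;	/* queue manager: 64-bit access */
		else
			return -EINVAL;
	}

	/* size must match and offset must be aligned to that size */
	if ((size != crb_size) || (offset & (crb_size - 1)))
		return -EINVAL;

	return 0;
}
```

Note that the upper bound of the range is exclusive, so an offset equal to
DEMO_CAMQM_END is rejected.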



* [PATCH NEXT 1/4] netxen: fix register usage
From: Amit Kumar Salecha @ 2010-05-12  9:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273657985-13405-1-git-send-email-amit.salecha@qlogic.com>

o For NX3031, the MSI_MODE, CAPABILITIES_FW and SCRATCHPAD registers
  are obsolete. These register addresses can be used for a different
  purpose.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/netxen/netxen_nic_ethtool.c |    3 +++
 drivers/net/netxen/netxen_nic_hdr.h     |    2 --
 drivers/net/netxen/netxen_nic_init.c    |    4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
index aecba78..20f7c58 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -632,6 +632,9 @@ static int netxen_nic_reg_test(struct net_device *dev)
 	if ((data_read & 0xffff) != adapter->pdev->vendor)
 		return 1;
 
+	if (NX_IS_REVISION_P3(adapter->ahw.revision_id))
+		return 0;
+
 	data_written = (u32)0xa5a5a5a5;
 
 	NXWR32(adapter, CRB_SCRATCHPAD_TEST, data_written);
diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index 622e4c8..d930068 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -789,9 +789,7 @@ enum {
  * for backward compability
  */
 #define CRB_NIC_CAPABILITIES_HOST	NETXEN_NIC_REG(0x1a8)
-#define CRB_NIC_CAPABILITIES_FW	  	NETXEN_NIC_REG(0x1dc)
 #define CRB_NIC_MSI_MODE_HOST		NETXEN_NIC_REG(0x270)
-#define CRB_NIC_MSI_MODE_FW	  	NETXEN_NIC_REG(0x274)
 
 #define INTR_SCHEME_PERPORT	      	0x1
 #define MSI_MODE_MULTIFUNC	      	0x1
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 388feaf..4a2bbeb 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -1361,10 +1361,12 @@ int netxen_init_firmware(struct netxen_adapter *adapter)
 		return err;
 
 	NXWR32(adapter, CRB_NIC_CAPABILITIES_HOST, INTR_SCHEME_PERPORT);
-	NXWR32(adapter, CRB_NIC_MSI_MODE_HOST, MSI_MODE_MULTIFUNC);
 	NXWR32(adapter, CRB_MPORT_MODE, MPORT_MULTI_FUNCTION_MODE);
 	NXWR32(adapter, CRB_CMDPEG_STATE, PHAN_INITIALIZE_ACK);
 
+	if (NX_IS_REVISION_P2(adapter->ahw.revision_id))
+		NXWR32(adapter, CRB_NIC_MSI_MODE_HOST, MSI_MODE_MULTIFUNC);
+
 	return err;
 }
 
-- 
1.6.0.2



* [PATCH NEXT 3/4] netxen: to fix onchip memory access.
From: Amit Kumar Salecha @ 2010-05-12  9:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273657985-13405-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

Remove the unnecessary remap of the region in BAR 0 to access on-chip
memory for NX3031.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/netxen/netxen_nic_hw.c |   42 ++++++++++++++---------------------
 1 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 87bc910..be63988 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -32,7 +32,6 @@
 #define MASK(n) ((1ULL<<(n))-1)
 #define MN_WIN(addr) (((addr & 0x1fc0000) >> 1) | ((addr >> 25) & 0x3ff))
 #define OCM_WIN(addr) (((addr & 0x1ff0000) >> 1) | ((addr >> 25) & 0x3ff))
-#define OCM_WIN_P3P(addr) (addr & 0xffc0000)
 #define MS_WIN(addr) (addr & 0x0ffc0000)
 
 #define GET_MEM_OFFS_2M(addr) (addr & MASK(18))
@@ -1391,18 +1390,8 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 		u64 addr, u32 *start)
 {
 	u32 window;
-	struct pci_dev *pdev = adapter->pdev;
 
-	if ((addr & 0x00ff800) == 0xff800) {
-		if (printk_ratelimit())
-			dev_warn(&pdev->dev, "QM access not handled\n");
-		return -EIO;
-	}
-
-	if (NX_IS_REVISION_P3P(adapter->ahw.revision_id))
-		window = OCM_WIN_P3P(addr);
-	else
-		window = OCM_WIN(addr);
+	window = OCM_WIN(addr);
 
 	writel(window, adapter->ahw.ocm_win_crb);
 	/* read back to flush */
@@ -1419,7 +1408,7 @@ netxen_nic_pci_mem_access_direct(struct netxen_adapter *adapter, u64 off,
 {
 	void __iomem *addr, *mem_ptr = NULL;
 	resource_size_t mem_base;
-	int ret = -EIO;
+	int ret;
 	u32 start;
 
 	spin_lock(&adapter->ahw.mem_lock);
@@ -1428,20 +1417,23 @@ netxen_nic_pci_mem_access_direct(struct netxen_adapter *adapter, u64 off,
 	if (ret != 0)
 		goto unlock;
 
-	addr = pci_base_offset(adapter, start);
-	if (addr)
-		goto noremap;
-
-	mem_base = pci_resource_start(adapter->pdev, 0) + (start & PAGE_MASK);
+	if (NX_IS_REVISION_P3(adapter->ahw.revision_id)) {
+		addr = adapter->ahw.pci_base0 + start;
+	} else {
+		addr = pci_base_offset(adapter, start);
+		if (addr)
+			goto noremap;
+
+		mem_base = pci_resource_start(adapter->pdev, 0) +
+					(start & PAGE_MASK);
+		mem_ptr = ioremap(mem_base, PAGE_SIZE);
+		if (mem_ptr == NULL) {
+			ret = -EIO;
+			goto unlock;
+		}
 
-	mem_ptr = ioremap(mem_base, PAGE_SIZE);
-	if (mem_ptr == NULL) {
-		ret = -EIO;
-		goto unlock;
+		addr = mem_ptr + (start & (PAGE_SIZE-1));
 	}
-
-	addr = mem_ptr + (start & (PAGE_SIZE - 1));
-
 noremap:
 	if (op == 0)	/* read */
 		*data = readq(addr);
-- 
1.6.0.2
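
For the remap path that the patch above keeps for older chips, the address
arithmetic can be sketched in userspace as follows. This is illustrative only:
DEMO_PAGE_SIZE is an assumed 4 KiB page, and the demo_* helpers are
hypothetical names that model just the split of a window offset into a
page-aligned base (the part that gets remapped) and the offset inside that
page.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE	4096u	/* assumed page size */
#define DEMO_PAGE_MASK	(~(uint64_t)(DEMO_PAGE_SIZE - 1))

/* Physical address of the page containing "start" within the BAR. */
static uint64_t demo_page_base(uint64_t bar_start, uint32_t start)
{
	return bar_start + (start & DEMO_PAGE_MASK);
}

/* Offset of "start" inside its page. */
static uint32_t demo_page_offset(uint32_t start)
{
	uint32_t off = start & (DEMO_PAGE_SIZE - 1);

	assert(off < DEMO_PAGE_SIZE);
	return off;
}
```

Adding the two results gives back bar_start + start, which is the address the
direct-mapped path for NX3031 computes without any remap.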



* RE: does the broadcom bnx2x support RSS/multi queue
From: Jon Zhou @ 2010-05-12  9:34 UTC (permalink / raw)
  To: eilong@broadcom.com, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1273655947.4491.5.camel@lb-tlvb-eilong.il.broadcom.com>

hi eilon:

do you think I need to update the kernel also?

thanks!
jon

-----Original Message-----
From: Eilon Greenstein [mailto:eilong@broadcom.com] 
Sent: Wednesday, May 12, 2010 5:19 PM
To: Eric Dumazet
Cc: Jon Zhou; netdev@vger.kernel.org
Subject: Re: does the broadcom bnx2x support RSS/multi queue

On Wed, 2010-05-12 at 00:41 -0700, Eric Dumazet wrote:
> On Wednesday, 12 May 2010 at 00:31 -0700, Jon Zhou wrote:
> > hi there
> > 
> > I am not sure if my Broadcom 10G nic driver(bnx2x) support RSS/multi queue
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # uname -a
> > Linux ibm-bc-53 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # ethtool -S eth5
> > NIC statistics:
> >      rx_bytes: 68100170
> >      rx_error_bytes: 0
> >      tx_bytes: 0
> >      tx_error_bytes: 0
> >      rx_ucast_packets: 201654
> >      rx_mcast_packets: 0
> >      rx_bcast_packets: 0
> >      tx_packets: 0
> >      tx_mac_errors: 0
> >      tx_carrier_errors: 0
> >      rx_crc_errors: 0
> >      rx_align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      tx_deferred: 0
> >      tx_excess_collisions: 0
> >      tx_late_collisions: 0
> >      tx_total_collisions: 0
> >      rx_fragments: 0
> >      rx_jabbers: 0
> >      rx_undersize_packets: 0
> >      rx_oversize_packets: 0
> >      tx_64_byte_packets: 0
> >      tx_65_to_127_byte_packets: 0
> >      tx_128_to_255_byte_packets: 0
> >      tx_256_to_511_byte_packets: 0
> >      tx_512_to_1023_byte_packets: 0
> >      tx_1024_to_1522_byte_packets: 0
> >      tx_1523_to_9022_byte_packets: 0
> >      rx_xon_frames: 0
> >      rx_xoff_frames: 0
> >      tx_xon_frames: 0
> >      tx_xoff_frames: 0
> >      rx_mac_ctrl_frames: 0
> >      rx_filtered_packets: 0
> >      rx_discards: 0
> >      rx_fw_discards: 0
> >      brb_discard: 0
> >      brb_truncate: 0
> >      rx_phy_ip_err_discards: 0
> >      rx_skb_alloc_discard: 0
> >      rx_csum_offload_errors: 6
> > 
> > the driver ver is:
> > bnx2x_main.c
> > #define DRV_MODULE_VERSION      "1.45.26"
> > 
> > looks not support?
> > 
> > thanks
> > jon
> 
> Per queue stats were added last year only (Thu Feb 12 08:36:33 2009)
> 
> You might check "grep eth5 /proc/interrupts"
> 
> Or upgrade to 2.6.33.x kernel :)
> 
The HW and current driver support multi-queue. However, you are using a version which is too old.







* Re: [PATCH net-next] bnx2x: avoid TX timeout when stopping device
From: Eric Dumazet @ 2010-05-12  9:27 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: netdev, Eilon Greenstein, Vladislav Zolotarov, Dmitry Kravkov,
	Michael Chan, Breno Leitao
In-Reply-To: <20100512110921.0e3f45fc@dhcp-lab-109.englab.brq.redhat.com>

On Wednesday, 12 May 2010 at 11:09 +0200, Stanislaw Gruszka wrote:
> When stopping the device, call netif_carrier_off() just after disabling the
> TX queue to avoid a possible netdev watchdog warning and ->ndo_tx_timeout()
> invocation.
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---

This reminds me I saw some strange things in bnx2.c for a similar
symptom.

Commit e6bf95ffa8d6f8f4b7ee33ea01490d95b0bbeb6e

Would you take a look at this too ?

Or if this kind of trans_start refresh on all queues is really needed,
it should be a core network provided function, not implemented on every
driver...

Thanks :)

>  drivers/net/bnx2x_main.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
> index 2bc35c7..57ff5b3 100644
> --- a/drivers/net/bnx2x_main.c
> +++ b/drivers/net/bnx2x_main.c
> @@ -8499,6 +8499,7 @@ static int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
>  
>  	/* Disable HW interrupts, NAPI and Tx */
>  	bnx2x_netif_stop(bp, 1);
> +	netif_carrier_off(bp->dev);
>  
>  	del_timer_sync(&bp->timer);
>  	SHMEM_WR(bp, func_mb[BP_FUNC(bp)].drv_pulse_mb,
> @@ -8524,8 +8525,6 @@ static int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
>  
>  	bp->state = BNX2X_STATE_CLOSED;
>  
> -	netif_carrier_off(bp->dev);
> -
>  	/* The last driver must disable a "close the gate" if there is no
>  	 * parity attention or "process kill" pending.
>  	 */
> @@ -13431,6 +13430,7 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
>  	bp->rx_mode = BNX2X_RX_MODE_NONE;
>  
>  	bnx2x_netif_stop(bp, 0);
> +	netif_carrier_off(bp->dev);
>  
>  	del_timer_sync(&bp->timer);
>  	bp->stats_state = STATS_STATE_DISABLED;
> @@ -13457,8 +13457,6 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
>  
>  	bp->state = BNX2X_STATE_CLOSED;
>  
> -	netif_carrier_off(bp->dev);
> -
>  	return 0;
>  }
>  




* Re: [PATCH RFC] vhost: fix barrier pairing
From: Juan Quintela @ 2010-05-12  9:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, David S. Miller, Paul E. McKenney, kvm,
	virtualization, netdev, linux-kernel
In-Reply-To: <20100511172633.GA9091@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> According to memory-barriers.txt, an smp memory barrier
> should always be paired with another smp memory barrier,
> and I quote "a lack of appropriate pairing is almost certainly an
> error".
>
> In case of vhost, failure to flush out used index
> update before looking at the interrupt disable flag
> could result in missed interrupts, resulting in
> networking hang under stress.
>
> This might happen when flags read bypasses used index write.
> So we see interrupts disabled and do not interrupt, at the
> same time guest writes flags value to enable interrupt,
> reads an old used index value, thinks that
> used ring is empty and waits for interrupt.
>
> Note: the barrier we pair with here is in
> drivers/virtio/virtio_ring.c, function
> vring_enable_cb.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Dave, I think this is needed in 2.6.34, I'll send a pull
> request after doing some more testing.
>
> Rusty, Juan, could you take a look as well please?
> Thanks!

I would have preferred to put it here:

void vhost_add_used_and_signal(struct vhost_dev *dev,
			       struct vhost_virtqueue *vq,
			       unsigned int head, int len)
{
	vhost_add_used(vq, head, len);
>>>>    smp_mb();
	vhost_signal(dev, vq);
}

Because it looks strange to have a barrier as the first instruction of a
function.  And this way it is clearer (at least to me) what we are
protecting.

But on the other hand, we would have to add a comment explaining that all
users of vhost_signal() have to issue that smp_mb() themselves, so .....

Perhaps just improve the comment, stating that the corresponding barrier
is there?

> Note: the barrier we pair with here is in
> drivers/virtio/virtio_ring.c, function
> vring_enable_cb.

Good catch.
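
To make the pairing concrete, here is a minimal user-space sketch using
C11 atomics instead of the kernel's smp_mb(). The names (toy_ring,
host_add_used_and_check, guest_enable_cb) are made up for illustration
and are not the vhost or virtio_ring code. The only point is that each
side does store, full barrier, load, so at least one of the two sides
should observe the other's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the shared ring state; all names are illustrative. */
struct toy_ring {
	atomic_uint used_idx;		/* written by host, read by guest */
	atomic_bool irq_disabled;	/* written by guest, read by host */
};

/* Host side: publish one used entry, then decide whether to interrupt.
 * Returns true if the guest should be signalled. */
static inline bool host_add_used_and_check(struct toy_ring *r)
{
	assert(r);
	atomic_fetch_add_explicit(&r->used_idx, 1, memory_order_relaxed);
	/* Full barrier: order the used_idx store before the flag load. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&r->irq_disabled, memory_order_relaxed);
}

/* Guest side: re-enable interrupts, then re-check the ring.
 * Returns true if new entries appeared, i.e. poll again instead of sleeping. */
static inline bool guest_enable_cb(struct toy_ring *r, unsigned int last_seen)
{
	assert(r);
	atomic_store_explicit(&r->irq_disabled, false, memory_order_relaxed);
	/* Paired full barrier: order the flag store before the index load. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&r->used_idx, memory_order_relaxed) != last_seen;
}
```

With both fences in place, the outcome where the host still sees
interrupts disabled while the guest still sees the old used index
should not be possible; drop either fence and that guarantee goes away,
which is the missed-interrupt case from the patch description.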

Later, Juan.


* Re: [PATCH V4 1/4] net: add a noref bit on skb dst
From: Eric Dumazet @ 2010-05-12  9:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1273527162.10889.10.camel@edumazet-laptop>

On Monday 10 May 2010 at 23:32 +0200, Eric Dumazet wrote:

> skb_dst_force() helper is used to force a refcount on dst, when skb
> is queued and not anymore RCU protected.
> 
> Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if 
> !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
> sock_queue_rcv_skb(), in __nf_queue().

While running benchmarks, I noticed one spot had been missed: when an skb
is requeued from a work-conserving queue, dev_requeue_skb() must call
skb_dst_force().

Here is updated first patch.


[PATCH V4 1/4] net: add a noref bit on skb dst

Use low order bit of skb->_skb_dst to tell dst is not refcounted.

Change _skb_dst to _skb_refdst to make sure all uses are caught.

skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.

New skb_dst_set_noref() helper to set a non-refcounted dst on a skb.
(with lockdep check)

skb_dst_drop() drops a reference only if skb dst was refcounted.

skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.

Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if 
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().

Use skb_dst_force() in dev_requeue_skb().

Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h   |   58 ++++++++++++++++++++++++++++++++++---
 include/net/dst.h        |   48 ++++++++++++++++++++++++++++--
 include/net/sock.h       |   13 +++++---
 net/core/dev.c           |    3 +
 net/core/skbuff.c        |    2 -
 net/core/sock.c          |    6 +++
 net/ipv4/icmp.c          |    6 +--
 net/ipv4/ip_options.c    |    9 +++--
 net/ipv4/netfilter.c     |    6 +--
 net/ipv4/route.c         |    2 -
 net/netfilter/nf_queue.c |    2 +
 net/sched/sch_generic.c  |    4 +-
 12 files changed, 134 insertions(+), 25 deletions(-)
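
Outside the kernel, the low-order-bit scheme from the changelog above
can be tried out with a toy model. Everything named toy_* is
hypothetical, the refcount is a plain int rather than an atomic, and
the RCU/lockdep checks are left out, so this only sketches the
bookkeeping: set without a reference, force a reference before
queueing, and drop only what was actually taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NOREF	((uintptr_t)1)
#define TOY_PTRMASK	(~TOY_NOREF)

struct toy_dst { int refcnt; };		/* int alignment keeps bit 0 of the address clear */
struct toy_skb { uintptr_t refdst; };	/* pointer value plus the noref bit */

/* Return the dst pointer with the tag bit masked off. */
static inline struct toy_dst *toy_dst_get(const struct toy_skb *skb)
{
	return (struct toy_dst *)(skb->refdst & TOY_PTRMASK);
}

/* Attach dst without taking a reference; bit 0 records that fact. */
static inline void toy_set_noref(struct toy_skb *skb, struct toy_dst *dst)
{
	assert(((uintptr_t)dst & TOY_NOREF) == 0);
	skb->refdst = (uintptr_t)dst | TOY_NOREF;
}

static inline bool toy_is_noref(const struct toy_skb *skb)
{
	return (skb->refdst & TOY_NOREF) && toy_dst_get(skb);
}

/* Promote a borrowed dst to an owned one before the skb is queued. */
static inline void toy_force(struct toy_skb *skb)
{
	if (toy_is_noref(skb)) {
		skb->refdst &= ~TOY_NOREF;
		toy_dst_get(skb)->refcnt++;
	}
}

/* Release the reference only if one was taken, then clear the field. */
static inline void toy_drop(struct toy_skb *skb)
{
	if (skb->refdst) {
		if (!(skb->refdst & TOY_NOREF))
			toy_dst_get(skb)->refcnt--;
		skb->refdst = 0;
	}
}
```

The design choice this illustrates is that the fast path pays nothing
for the refcount while it stays inside the protected section, and only
the paths that queue the skb pay for the increment.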

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c9525bc..7cdfb4d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -264,7 +264,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@transport_header: Transport layer header
  *	@network_header: Network layer header
  *	@mac_header: Link layer header
- *	@_skb_dst: destination entry
+ *	@_skb_refdst: destination entry (with norefcount bit)
  *	@sp: the security path, used for xfrm
  *	@cb: Control buffer. Free for use by every layer. Put private vars here
  *	@len: Length of actual data
@@ -328,7 +328,7 @@ struct sk_buff {
 	 */
 	char			cb[48] __aligned(8);
 
-	unsigned long		_skb_dst;
+	unsigned long		_skb_refdst;
 #ifdef CONFIG_XFRM
 	struct	sec_path	*sp;
 #endif
@@ -419,14 +419,64 @@ struct sk_buff {
 
 #include <asm/system.h>
 
+/*
+ * skb might have a dst pointer attached, refcounted or not.
+ * _skb_refdst low order bit is set if refcount was _not_ taken
+ */
+#define SKB_DST_NOREF	1UL
+#define SKB_DST_PTRMASK	~(SKB_DST_NOREF)
+
+/**
+ * skb_dst - returns skb dst_entry
+ * @skb: buffer
+ *
+ * Returns skb dst_entry, regardless of reference taken or not.
+ */
 static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
 {
-	return (struct dst_entry *)skb->_skb_dst;
+	/* If refdst was not refcounted, check we still are in a 
+	 * rcu_read_lock section
+	 */
+	WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
+		!rcu_read_lock_held() &&
+		!rcu_read_lock_bh_held());
+	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
 }
 
+/**
+ * skb_dst_set - sets skb dst
+ * @skb: buffer
+ * @dst: dst entry
+ *
+ * Sets skb dst, assuming a reference was taken on dst and should
+ * be released by skb_dst_drop()
+ */
 static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
 {
-	skb->_skb_dst = (unsigned long)dst;
+	skb->_skb_refdst = (unsigned long)dst;
+}
+
+/**
+ * skb_dst_set_noref - sets skb dst, without a reference
+ * @skb: buffer
+ * @dst: dst entry
+ *
+ * Sets skb dst, assuming a reference was not taken on dst
+ * skb_dst_drop() should not dst_release() this dst
+ */
+static inline void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)
+{
+	WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
+}
+
+/**
+ * skb_dst_is_noref - Test if skb dst isnt refcounted
+ * @skb: buffer
+ */
+static inline bool skb_dst_is_noref(const struct sk_buff *skb)
+{
+	return (skb->_skb_refdst & SKB_DST_NOREF) && skb_dst(skb);
 }
 
 static inline struct rtable *skb_rtable(const struct sk_buff *skb)
diff --git a/include/net/dst.h b/include/net/dst.h
index aac5a5f..27207a1 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -168,6 +168,12 @@ static inline void dst_use(struct dst_entry *dst, unsigned long time)
 	dst->lastuse = time;
 }
 
+static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
+{
+	dst->__use++;
+	dst->lastuse = time;
+}
+
 static inline
 struct dst_entry * dst_clone(struct dst_entry * dst)
 {
@@ -177,11 +183,47 @@ struct dst_entry * dst_clone(struct dst_entry * dst)
 }
 
 extern void dst_release(struct dst_entry *dst);
+
+static inline void refdst_drop(unsigned long refdst)
+{
+	if (!(refdst & SKB_DST_NOREF))
+		dst_release((struct dst_entry *)(refdst & SKB_DST_PTRMASK));
+}
+
+/**
+ * skb_dst_drop - drops skb dst
+ * @skb: buffer
+ *
+ * Drops dst reference count if a reference was taken.
+ */
 static inline void skb_dst_drop(struct sk_buff *skb)
 {
-	if (skb->_skb_dst)
-		dst_release(skb_dst(skb));
-	skb->_skb_dst = 0UL;
+	if (skb->_skb_refdst) {
+		refdst_drop(skb->_skb_refdst);
+		skb->_skb_refdst = 0UL;
+	}
+}
+
+static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+{
+	nskb->_skb_refdst = oskb->_skb_refdst;
+	if (!(nskb->_skb_refdst & SKB_DST_NOREF))
+		dst_clone(skb_dst(nskb));
+}
+
+/**
+ * skb_dst_force - makes sure skb dst is refcounted
+ * @skb: buffer
+ *
+ * If dst is not yet refcounted, let's do it
+ */
+static inline void skb_dst_force(struct sk_buff *skb)
+{
+	if (skb_dst_is_noref(skb)) {
+		WARN_ON(!rcu_read_lock_held());
+		skb->_skb_refdst &= ~SKB_DST_NOREF;
+		dst_clone(skb_dst(skb));
+	}
 }
 
 /* Children define the path of the packet through the
diff --git a/include/net/sock.h b/include/net/sock.h
index 328e03f..307affa 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -598,12 +598,15 @@ static inline int sk_stream_memory_free(struct sock *sk)
 /* OOB backlog add */
 static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
-	if (!sk->sk_backlog.tail) {
-		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
-	} else {
+	/* dont let skb dst not refcounted, we are going to leave rcu lock */
+	skb_dst_force(skb);
+
+	if (!sk->sk_backlog.tail)
+		sk->sk_backlog.head = skb;
+	else
 		sk->sk_backlog.tail->next = skb;
-		sk->sk_backlog.tail = skb;
-	}
+
+	sk->sk_backlog.tail = skb;
 	skb->next = NULL;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 32611c8..dfe6ba6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2047,6 +2047,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		 * waiting to be sent out; and the qdisc is not running -
 		 * xmit the skb directly.
 		 */
+		if (!(dev->priv_flags & IFF_XMIT_DST_RELEASE))
+			skb_dst_force(skb);
 		__qdisc_update_bstats(q, skb->len);
 		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
 			__qdisc_run(q);
@@ -2055,6 +2057,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 
 		rc = NET_XMIT_SUCCESS;
 	} else {
+		skb_dst_force(skb);
 		rc = qdisc_enqueue_root(skb, q);
 		qdisc_run(q);
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9b0e1f..c543dd2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -520,7 +520,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->transport_header	= old->transport_header;
 	new->network_header	= old->network_header;
 	new->mac_header		= old->mac_header;
-	skb_dst_set(new, dst_clone(skb_dst(old)));
+	skb_dst_copy(new, old);
 	new->rxhash		= old->rxhash;
 #ifdef CONFIG_XFRM
 	new->sp			= secpath_get(old->sp);
diff --git a/net/core/sock.c b/net/core/sock.c
index 94c4aff..d24b5c1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -307,6 +307,11 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	skb_len = skb->len;
 
+	/* we escape from rcu protected region, make sure we dont leak
+	 * a norefcounted dst
+	 */
+	skb_dst_force(skb);
+
 	spin_lock_irqsave(&list->lock, flags);
 	skb->dropcount = atomic_read(&sk->sk_drops);
 	__skb_queue_tail(list, skb);
@@ -1535,6 +1540,7 @@ static void __release_sock(struct sock *sk)
 		do {
 			struct sk_buff *next = skb->next;
 
+			WARN_ON_ONCE(skb_dst_is_noref(skb));
 			skb->next = NULL;
 			sk_backlog_rcv(sk, skb);
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f3d339f..d65e921 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -587,20 +587,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 			err = __ip_route_output_key(net, &rt2, &fl);
 		else {
 			struct flowi fl2 = {};
-			struct dst_entry *odst;
+			unsigned long orefdst;
 
 			fl2.fl4_dst = fl.fl4_src;
 			if (ip_route_output_key(net, &rt2, &fl2))
 				goto relookup_failed;
 
 			/* Ugh! */
-			odst = skb_dst(skb_in);
+			orefdst = skb_in->_skb_refdst; /* save old refdst */
 			err = ip_route_input(skb_in, fl.fl4_dst, fl.fl4_src,
 					     RT_TOS(tos), rt2->u.dst.dev);
 
 			dst_release(&rt2->u.dst);
 			rt2 = skb_rtable(skb_in);
-			skb_dst_set(skb_in, odst);
+			skb_in->_skb_refdst = orefdst; /* restore old refdst */
 		}
 
 		if (err)
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4c09a31..3244133 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -601,6 +601,7 @@ int ip_options_rcv_srr(struct sk_buff *skb)
 	unsigned char *optptr = skb_network_header(skb) + opt->srr;
 	struct rtable *rt = skb_rtable(skb);
 	struct rtable *rt2;
+	unsigned long orefdst;
 	int err;
 
 	if (!opt->srr)
@@ -624,16 +625,16 @@ int ip_options_rcv_srr(struct sk_buff *skb)
 		}
 		memcpy(&nexthop, &optptr[srrptr-1], 4);
 
-		rt = skb_rtable(skb);
+		orefdst = skb->_skb_refdst;
 		skb_dst_set(skb, NULL);
 		err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, skb->dev);
 		rt2 = skb_rtable(skb);
 		if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) {
-			ip_rt_put(rt2);
-			skb_dst_set(skb, &rt->u.dst);
+			skb_dst_drop(skb);
+			skb->_skb_refdst = orefdst;
 			return -EINVAL;
 		}
-		ip_rt_put(rt);
+		refdst_drop(orefdst);
 		if (rt2->rt_type != RTN_LOCAL)
 			break;
 		/* Superfast 8) loopback forward */
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 82fb43c..07de855 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -17,7 +17,7 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct rtable *rt;
 	struct flowi fl = {};
-	struct dst_entry *odst;
+	unsigned long orefdst;
 	unsigned int hh_len;
 	unsigned int type;
 
@@ -51,14 +51,14 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type)
 		if (ip_route_output_key(net, &rt, &fl) != 0)
 			return -1;
 
-		odst = skb_dst(skb);
+		orefdst = skb->_skb_refdst;
 		if (ip_route_input(skb, iph->daddr, iph->saddr,
 				   RT_TOS(iph->tos), rt->u.dst.dev) != 0) {
 			dst_release(&rt->u.dst);
 			return -1;
 		}
 		dst_release(&rt->u.dst);
-		dst_release(odst);
+		refdst_drop(orefdst);
 	}
 
 	if (skb_dst(skb)->error)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dea3f92..705eccf 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3033,7 +3033,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 				continue;
 			if (rt_is_expired(rt))
 				continue;
-			skb_dst_set(skb, dst_clone(&rt->u.dst));
+			skb_dst_set_noref(skb, &rt->u.dst);
 			if (rt_fill_info(net, skb, NETLINK_CB(cb->skb).pid,
 					 cb->nlh->nlmsg_seq, RTM_NEWROUTE,
 					 1, NLM_F_MULTI) <= 0) {
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index c49ef21..cb3cde4 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -9,6 +9,7 @@
 #include <linux/rcupdate.h>
 #include <net/protocol.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/dst.h>
 
 #include "nf_internals.h"
 
@@ -170,6 +171,7 @@ static int __nf_queue(struct sk_buff *skb,
 			dev_hold(physoutdev);
 	}
 #endif
+	skb_dst_force(skb);
 	afinfo->saveroute(skb, entry);
 	status = qh->outfn(entry, queuenum);
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a969b11..a63029e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -26,6 +26,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <net/pkt_sched.h>
+#include <net/dst.h>
 
 /* Main transmission queue. */
 
@@ -40,6 +41,7 @@
 
 static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
+	skb_dst_force(skb);
 	q->gso_skb = skb;
 	q->qstats.requeues++;
 	q->q.qlen++;	/* it's still part of the queue */
@@ -179,7 +181,7 @@ static inline int qdisc_restart(struct Qdisc *q)
 	skb = dequeue_skb(q);
 	if (unlikely(!skb))
 		return 0;
-
+	WARN_ON_ONCE(skb_dst_is_noref(skb));
 	root_lock = qdisc_lock(q);
 	dev = qdisc_dev(q);
 	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));




* Re: does the broadcom bnx2x support RSS/multi queue
From: Eilon Greenstein @ 2010-05-12  9:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jon Zhou, netdev@vger.kernel.org
In-Reply-To: <1273650119.2621.5.camel@edumazet-laptop>

On Wed, 2010-05-12 at 00:41 -0700, Eric Dumazet wrote:
> On Wednesday 12 May 2010 at 00:31 -0700, Jon Zhou wrote:
> > hi there
> > 
> > I am not sure if my Broadcom 10G nic driver(bnx2x) support RSS/multi queue
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # uname -a
> > Linux ibm-bc-53 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # ethtool -S eth5
> > NIC statistics:
> >      rx_bytes: 68100170
> >      rx_error_bytes: 0
> >      tx_bytes: 0
> >      tx_error_bytes: 0
> >      rx_ucast_packets: 201654
> >      rx_mcast_packets: 0
> >      rx_bcast_packets: 0
> >      tx_packets: 0
> >      tx_mac_errors: 0
> >      tx_carrier_errors: 0
> >      rx_crc_errors: 0
> >      rx_align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      tx_deferred: 0
> >      tx_excess_collisions: 0
> >      tx_late_collisions: 0
> >      tx_total_collisions: 0
> >      rx_fragments: 0
> >      rx_jabbers: 0
> >      rx_undersize_packets: 0
> >      rx_oversize_packets: 0
> >      tx_64_byte_packets: 0
> >      tx_65_to_127_byte_packets: 0
> >      tx_128_to_255_byte_packets: 0
> >      tx_256_to_511_byte_packets: 0
> >      tx_512_to_1023_byte_packets: 0
> >      tx_1024_to_1522_byte_packets: 0
> >      tx_1523_to_9022_byte_packets: 0
> >      rx_xon_frames: 0
> >      rx_xoff_frames: 0
> >      tx_xon_frames: 0
> >      tx_xoff_frames: 0
> >      rx_mac_ctrl_frames: 0
> >      rx_filtered_packets: 0
> >      rx_discards: 0
> >      rx_fw_discards: 0
> >      brb_discard: 0
> >      brb_truncate: 0
> >      rx_phy_ip_err_discards: 0
> >      rx_skb_alloc_discard: 0
> >      rx_csum_offload_errors: 6
> > 
> > the driver ver is:
> > bnx2x_main.c
> > #define DRV_MODULE_VERSION      "1.45.26"
> > 
> > looks not support?
> > 
> > thanks
> > jon
> 
> Per queue stats were added last year only (Thu Feb 12 08:36:33 2009)
> 
> You might check "grep eth5 /proc/interrupts"
> 
> Or upgrade to 2.6.33.x kernel :)
> 
The HW and current driver support multi-queue. However, you are using a version which is too old.







* [PATCH net-next] bnx2x: avoid TX timeout when stopping device
From: Stanislaw Gruszka @ 2010-05-12  9:09 UTC (permalink / raw)
  To: netdev; +Cc: Eilon Greenstein, Vladislav Zolotarov, Dmitry Kravkov

When stopping the device, call netif_carrier_off() right after disabling
the TX queues, to avoid a possible netdev watchdog warning and
->ndo_tx_timeout() invocation.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/bnx2x_main.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 2bc35c7..57ff5b3 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -8499,6 +8499,7 @@ static int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
 
 	/* Disable HW interrupts, NAPI and Tx */
 	bnx2x_netif_stop(bp, 1);
+	netif_carrier_off(bp->dev);
 
 	del_timer_sync(&bp->timer);
 	SHMEM_WR(bp, func_mb[BP_FUNC(bp)].drv_pulse_mb,
@@ -8524,8 +8525,6 @@ static int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
 
 	bp->state = BNX2X_STATE_CLOSED;
 
-	netif_carrier_off(bp->dev);
-
 	/* The last driver must disable a "close the gate" if there is no
 	 * parity attention or "process kill" pending.
 	 */
@@ -13431,6 +13430,7 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
 	bp->rx_mode = BNX2X_RX_MODE_NONE;
 
 	bnx2x_netif_stop(bp, 0);
+	netif_carrier_off(bp->dev);
 
 	del_timer_sync(&bp->timer);
 	bp->stats_state = STATS_STATE_DISABLED;
@@ -13457,8 +13457,6 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
 
 	bp->state = BNX2X_STATE_CLOSED;
 
-	netif_carrier_off(bp->dev);
-
 	return 0;
 }
 
-- 
1.5.5.6



* Re: [PATCH 2/2] ioat2,3: convert to producer/consumer locking
From: David Howells @ 2010-05-12  8:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: dhowells, linux-kernel, linux-raid, netdev, Paul E. McKenney,
	Maciej Sosnowski
In-Reply-To: <20100511185141.6139.98842.stgit@localhost.localdomain>


Out of interest, does it make the code smaller if you mark
ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?

I'm not sure whether it'll affect how long gcc is willing to cache these, but
once computed, I would guess they won't change within the calling function.
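
A sketch of what the attribute is meant to allow, under the assumption
that the helper can be written on plain values. toy_ring_mask() and
toy_sum_slots() are hypothetical names, not the ioat2 helpers. Note
that GCC documents `const` as "result depends only on the argument
values", so a helper that dereferences its ioat argument may fit
`pure` more closely; that is my reading of the GCC manual and has not
been checked against the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Value-only helper: same input always yields the same mask, so the
 * compiler is allowed to compute it once per distinct argument. */
static inline __attribute__((const)) uint16_t toy_ring_mask(unsigned int alloc_order)
{
	return (uint16_t)((1u << alloc_order) - 1);
}

/* Sum the wrapped slot numbers for n entries starting at idx. The mask
 * call sits inside the loop on purpose; with the const attribute the
 * compiler may hoist it out. */
static inline unsigned int toy_sum_slots(unsigned int alloc_order,
					 uint16_t idx, unsigned int n)
{
	unsigned int sum = 0;

	assert(alloc_order < 16);
	for (unsigned int i = 0; i < n; i++)
		sum += (idx + i) & toy_ring_mask(alloc_order);
	return sum;
}
```

Comparing the object size with and without the attribute would answer
the code-size question directly.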

Also, is the device you're driving watching the ring and its indices?  If so,
does it modify the indices?  If that is the case, you might need to use
read_barrier_depends() rather than smp_read_barrier_depends().

> +		prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
> +		desc = ioat2_get_ring_ent(ioat, idx + i);
>  		dump_desc_dbg(ioat, desc);
>  		tx = &desc->txd;
>  		if (tx->cookie) {

Is this right, I wonder?  You're prefetching [i+1] before reading [i]?  Doesn't
this mean that you might have to wait for [i+1] to be retrieved from RAM before
[i] can be read?  Should you instead read tx->cookie before issuing the
prefetch?  Admittedly, this is only likely to affect the reading of the head of
the queue - subsequent reads in the same loop will, of course, have been
prefetched.
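
To illustrate the ordering I have in mind, here is a small stand-alone
sketch. It assumes a power-of-two ring and uses the GCC/Clang builtin
__builtin_prefetch() in place of the kernel's prefetch(); toy_desc and
toy_count_completed() are invented names. The demand load of entry [i]
is issued before the hint for [i+1]. Whether this makes a measurable
difference would need testing on real hardware.

```c
#include <assert.h>
#include <stddef.h>

struct toy_desc { unsigned long cookie; };

/* Walk n entries of a ring with (1 << order) slots starting at idx and
 * count those with a non-zero cookie. */
static inline size_t toy_count_completed(const struct toy_desc *ring,
					 size_t order, size_t idx, size_t n)
{
	size_t mask = ((size_t)1 << order) - 1;
	size_t done = 0;

	assert(ring);
	for (size_t i = 0; i < n; i++) {
		const struct toy_desc *desc = &ring[(idx + i) & mask];
		/* Read the current entry first ... */
		unsigned long cookie = desc->cookie;

		/* ... then hint the next one, so [i] is not queued behind it. */
		__builtin_prefetch(&ring[(idx + i + 1) & mask]);
		if (cookie)
			done++;
	}
	return done;
}
```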

David


* Re: does the broadcom bnx2x support RSS/multi queue
From: Eric Dumazet @ 2010-05-12  7:41 UTC (permalink / raw)
  To: Jon Zhou; +Cc: netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497D85D43@MILEXCH2.ds.jdsu.net>

On Wednesday 12 May 2010 at 00:31 -0700, Jon Zhou wrote:
> hi there
> 
> I am not sure if my Broadcom 10G nic driver(bnx2x) support RSS/multi queue
> 
> ibm-bc-53:/home/ruizhou/nprobe # uname -a
> Linux ibm-bc-53 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> 
> ibm-bc-53:/home/ruizhou/nprobe # ethtool -S eth5
> NIC statistics:
>      rx_bytes: 68100170
>      rx_error_bytes: 0
>      tx_bytes: 0
>      tx_error_bytes: 0
>      rx_ucast_packets: 201654
>      rx_mcast_packets: 0
>      rx_bcast_packets: 0
>      tx_packets: 0
>      tx_mac_errors: 0
>      tx_carrier_errors: 0
>      rx_crc_errors: 0
>      rx_align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      tx_deferred: 0
>      tx_excess_collisions: 0
>      tx_late_collisions: 0
>      tx_total_collisions: 0
>      rx_fragments: 0
>      rx_jabbers: 0
>      rx_undersize_packets: 0
>      rx_oversize_packets: 0
>      tx_64_byte_packets: 0
>      tx_65_to_127_byte_packets: 0
>      tx_128_to_255_byte_packets: 0
>      tx_256_to_511_byte_packets: 0
>      tx_512_to_1023_byte_packets: 0
>      tx_1024_to_1522_byte_packets: 0
>      tx_1523_to_9022_byte_packets: 0
>      rx_xon_frames: 0
>      rx_xoff_frames: 0
>      tx_xon_frames: 0
>      tx_xoff_frames: 0
>      rx_mac_ctrl_frames: 0
>      rx_filtered_packets: 0
>      rx_discards: 0
>      rx_fw_discards: 0
>      brb_discard: 0
>      brb_truncate: 0
>      rx_phy_ip_err_discards: 0
>      rx_skb_alloc_discard: 0
>      rx_csum_offload_errors: 6
> 
> the driver ver is:
> bnx2x_main.c
> #define DRV_MODULE_VERSION      "1.45.26"
> 
> looks not support?
> 
> thanks
> jon

Per queue stats were added last year only (Thu Feb 12 08:36:33 2009)

You might check "grep eth5 /proc/interrupts"

Or upgrade to 2.6.33.x kernel :)
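
If you prefer to do that check from a program, a rough C equivalent of
the grep is sketched below. count_irq_lines() is a made-up helper; it
does a plain substring match like grep (so "eth5" also matches
"eth50") and assumes each line fits in the buffer. More than one
matching line suggests several interrupt vectors are registered for
the interface, though how vectors are named is up to the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of an already-open stream that mention ifname. */
static inline int count_irq_lines(FILE *fp, const char *ifname)
{
	char line[4096];
	int count = 0;

	assert(fp && ifname);
	while (fgets(line, sizeof(line), fp))
		if (strstr(line, ifname))
			count++;
	return count;
}
```

Typical use would be to fopen("/proc/interrupts", "r"), pass the stream
with "eth5", and fclose() it afterwards.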




* does the broadcom bnx2x support RSS/multi queue
From: Jon Zhou @ 2010-05-12  7:31 UTC (permalink / raw)
  To: netdev@vger.kernel.org

hi there

I am not sure whether my Broadcom 10G NIC driver (bnx2x) supports RSS/multi-queue.

ibm-bc-53:/home/ruizhou/nprobe # uname -a
Linux ibm-bc-53 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux

ibm-bc-53:/home/ruizhou/nprobe # ethtool -S eth5
NIC statistics:
     rx_bytes: 68100170
     rx_error_bytes: 0
     tx_bytes: 0
     tx_error_bytes: 0
     rx_ucast_packets: 201654
     rx_mcast_packets: 0
     rx_bcast_packets: 0
     tx_packets: 0
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     tx_64_byte_packets: 0
     tx_65_to_127_byte_packets: 0
     tx_128_to_255_byte_packets: 0
     tx_256_to_511_byte_packets: 0
     tx_512_to_1023_byte_packets: 0
     tx_1024_to_1522_byte_packets: 0
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 0
     rx_discards: 0
     rx_fw_discards: 0
     brb_discard: 0
     brb_truncate: 0
     rx_phy_ip_err_discards: 0
     rx_skb_alloc_discard: 0
     rx_csum_offload_errors: 6

The driver version is:
bnx2x_main.c
#define DRV_MODULE_VERSION      "1.45.26"

It looks like it is not supported?

thanks
jon






* Re: [BUG] crashes with kvm/nat networking and net-next
From: Eric Dumazet @ 2010-05-12  7:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Bart De Schuymer, Patrick McHardy, netdev
In-Reply-To: <20100511202544.267d33ee@nehalam>

On Tuesday 11 May 2010 at 20:25 -0700, Stephen Hemminger wrote:
> This is a regression that is showing up now in net-next, not sure what
> changed recently in bridge netfilter that could be causing it?
> 
> [ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 4593.956219] IP: [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
> [ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0 
> [ 4593.956241] Oops: 0000 [#1] SMP 
> [ 4593.956248] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
> [ 4593.956253] CPU 3 
> [ 4593.956256] Modules linked in: netconsole configfs hid_belkin tun ntfs vfat msdos fat autofs4 binfmt_misc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm radeon ttm drm_kms_helper drm i2c_algo_bit snd_hda_codec_analog ipv6 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device psmouse asus_atk0110 snd serio_raw soundcore snd_page_alloc usbhid mvsas libsas scsi_transport_sas floppy sky2 e1000e [last unloaded: netconsole]
> [ 4593.956375] 
> [ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
> [ 4593.956384] RIP: 0010:[<ffffffffa03357a4>]  [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
> [ 4593.956395] RSP: 0018:ffff880001e63b78  EFLAGS: 00010246
> [ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
> [ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
> [ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
> [ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
> [ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
> [ 4593.956418] FS:  00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
> [ 4593.956422] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
> [ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
> [ 4593.956441] Stack:
> [ 4593.956443]  0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
> [ 4593.956451] <0> ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
> [ 4593.956462] <0> ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
> [ 4593.956474] Call Trace:
> [ 4593.956478]  <IRQ> 
> [ 4593.956488]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
> [ 4593.956496]  [<ffffffffa03350e6>] NF_HOOK_THRESH+0x56/0x60 [bridge]
> [ 4593.956504]  [<ffffffffa0335282>] br_nf_forward_arp+0x112/0x120 [bridge]
> [ 4593.956511]  [<ffffffff813f7184>] nf_iterate+0x64/0xa0
> [ 4593.956519]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
> [ 4593.956524]  [<ffffffff813f722c>] nf_hook_slow+0x6c/0x100
> [ 4593.956531]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
> [ 4593.956538]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
> [ 4593.956545]  [<ffffffffa032f86d>] __br_forward+0x6d/0xc0 [bridge]
> [ 4593.956550]  [<ffffffff813c5d8e>] ? skb_clone+0x3e/0x70
> [ 4593.956557]  [<ffffffffa032f462>] deliver_clone+0x32/0x60 [bridge]
> [ 4593.956564]  [<ffffffffa032f6b6>] br_flood+0xa6/0xe0 [bridge]
> [ 4593.956571]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
> [ 4593.956578]  [<ffffffffa032f700>] br_flood_forward+0x10/0x20 [bridge]
> [ 4593.956586]  [<ffffffffa0330ace>] br_handle_frame_finish+0x23e/0x260 [bridge]
> [ 4593.956595]  [<ffffffffa03307ea>] br_handle_frame+0x1aa/0x250 [bridge]
> [ 4593.956605]  [<ffffffff81070331>] ? autoremove_wake_function+0x11/0x40
> [ 4593.956614]  [<ffffffff813cf537>] __netif_receive_skb+0x187/0x5d0
> [ 4593.956622]  [<ffffffff813cfa81>] process_backlog+0x101/0x210
> [ 4593.956630]  [<ffffffff813d092d>] net_rx_action+0x10d/0x260
> [ 4593.956639]  [<ffffffff81058100>] __do_softirq+0xb0/0x230
> [ 4593.956648]  [<ffffffff81009e5c>] call_softirq+0x1c/0x30
> [ 4593.956653]  <EOI> 
> [ 4593.956662]  [<ffffffff8100bad5>] ? do_softirq+0x65/0xa0
> [ 4593.956667]  [<ffffffff813d3e48>] netif_rx_ni+0x28/0x30
> [ 4593.956673]  [<ffffffffa03e2196>] tun_chr_aio_write+0x276/0x540 [tun]
> [ 4593.956679]  [<ffffffffa03e1f20>] ? tun_chr_aio_write+0x0/0x540 [tun]
> [ 4593.956686]  [<ffffffff8110cd0b>] do_sync_readv_writev+0xcb/0x110
> [ 4593.956692]  [<ffffffff8120d593>] ? selinux_file_permission+0xf3/0x150
> [ 4593.956699]  [<ffffffff81203081>] ? security_file_permission+0x11/0x20
> [ 4593.956704]  [<ffffffff8110dd9a>] do_readv_writev+0xca/0x1f0
> [ 4593.956710]  [<ffffffff8111c888>] ? vfs_ioctl+0x38/0xd0
> [ 4593.956714]  [<ffffffff8111ceda>] ? do_vfs_ioctl+0x8a/0x610
> [ 4593.956719]  [<ffffffff8110defe>] vfs_writev+0x3e/0x60
> [ 4593.956723]  [<ffffffff8110e02c>] sys_writev+0x4c/0xb0
> [ 4593.956730]  [<ffffffff81008f42>] system_call_fastpath+0x16/0x1b
> [ 4593.956733] Code: d8 00 00 00 66 81 7c 01 10 08 06 0f 85 fc fe ff ff 44 8b 15 ff 6e 00 00 45 85 d2 0f 84 ec fe ff ff 66 0f 1f 44 00 00 4c 8b 63 28 <8b> 42 18 e9 e5 fe ff ff 0f 1f 40 00 48 89 df e8 68 a1 ff ff e9 
> [ 4593.956838] RIP  [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
> [ 4593.956848]  RSP <ffff880001e63b78>
> [ 4593.956851] CR2: 0000000000000018
> [ 4593.956855] ---[ end trace 5703d55ac3604d1c ]---
> [ 4593.956859] Kernel panic - not syncing: Fatal exception in interrupt
> [ 4593.956864] Pid: 29512, comm: kvm Tainted: G      D    2.6.34-rc7-net #195
> [ 4593.956867] Call Trace:
> [ 4593.956869]  <IRQ>  [<ffffffff81484ff2>] panic+0x78/0xf1
> [ 4593.956880]  [<ffffffff81489449>] oops_end+0xa9/0xb0
> [ 4593.956885]  [<ffffffff81033963>] no_context+0xf3/0x260
> [ 4593.956891]  [<ffffffff81256664>] ? do_raw_spin_lock+0x54/0x150
> [ 4593.956896]  [<ffffffff81033be5>] __bad_area_nosemaphore+0x115/0x1d0
> [ 4593.956901]  [<ffffffff81033cae>] bad_area_nosemaphore+0xe/0x10
> [ 4593.956907]  [<ffffffff8148bb3f>] do_page_fault+0x28f/0x330
> [ 4593.956913]  [<ffffffff814887b5>] page_fault+0x25/0x30
> [ 4593.956921]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
> [ 4593.956929]  [<ffffffffa03357a4>] ? br_nf_forward_finish+0x154/0x170 [bridge]
> [ 4593.956938]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
> [ 4593.956951]  [<ffffffffa03350e6>] NF_HOOK_THRESH+0x56/0x60 [bridge]
> [ 4593.956963]  [<ffffffffa0335282>] br_nf_forward_arp+0x112/0x120 [bridge]
> [ 4593.956972]  [<ffffffff813f7184>] nf_iterate+0x64/0xa0
> [ 4593.956983]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
> [ 4593.956990]  [<ffffffff813f722c>] nf_hook_slow+0x6c/0x100
> [ 4593.956997]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
> [ 4593.957005]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
> [ 4593.957012]  [<ffffffffa032f86d>] __br_forward+0x6d/0xc0 [bridge]
> [ 4593.957017]  [<ffffffff813c5d8e>] ? skb_clone+0x3e/0x70
> [ 4593.957023]  [<ffffffffa032f462>] deliver_clone+0x32/0x60 [bridge]
> [ 4593.957030]  [<ffffffffa032f6b6>] br_flood+0xa6/0xe0 [bridge]
> [ 4593.957037]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
> [ 4593.957044]  [<ffffffffa032f700>] br_flood_forward+0x10/0x20 [bridge]
> [ 4593.957052]  [<ffffffffa0330ace>] br_handle_frame_finish+0x23e/0x260 [bridge]
> [ 4593.957059]  [<ffffffffa03307ea>] br_handle_frame+0x1aa/0x250 [bridge]
> [ 4593.957065]  [<ffffffff81070331>] ? autoremove_wake_function+0x11/0x40
> [ 4593.957070]  [<ffffffff813cf537>] __netif_receive_skb+0x187/0x5d0
> [ 4593.957076]  [<ffffffff813cfa81>] process_backlog+0x101/0x210
> [ 4593.957081]  [<ffffffff813d092d>] net_rx_action+0x10d/0x260
> [ 4593.957086]  [<ffffffff81058100>] __do_softirq+0xb0/0x230
> [ 4593.957091]  [<ffffffff81009e5c>] call_softirq+0x1c/0x30
> [ 4593.957094]  <EOI>  [<ffffffff8100bad5>] ? do_softirq+0x65/0xa0
> [ 4593.957102]  [<ffffffff813d3e48>] netif_rx_ni+0x28/0x30
> [ 4593.957108]  [<ffffffffa03e2196>] tun_chr_aio_write+0x276/0x540 [tun]
> [ 4593.957113]  [<ffffffffa03e1f20>] ? tun_chr_aio_write+0x0/0x540 [tun]
> [ 4593.957119]  [<ffffffff8110cd0b>] do_sync_readv_writev+0xcb/0x110
> [ 4593.957125]  [<ffffffff8120d593>] ? selinux_file_permission+0xf3/0x150
> [ 4593.957130]  [<ffffffff81203081>] ? security_file_permission+0x11/0x20
> [ 4593.957135]  [<ffffffff8110dd9a>] do_readv_writev+0xca/0x1f0
> [ 4593.957139]  [<ffffffff8111c888>] ? vfs_ioctl+0x38/0xd0
> [ 4593.957144]  [<ffffffff8111ceda>] ? do_vfs_ioctl+0x8a/0x610
> [ 4593.957148]  [<ffffffff8110defe>] vfs_writev+0x3e/0x60
> [ 4593.957153]  [<ffffffff8110e02c>] sys_writev+0x4c/0xb0
> [ 4593.957158]  [<ffffffff81008f42>] system_call_fastpath+0x16/0x1b

Not sure, but br_nf_forward_ip() has the following check:

if (!skb->nf_bridge)
	return NF_ACCEPT;

while br_nf_forward_arp() is missing this check ...

So we can dereference a NULL pointer later, in br_nf_forward_finish().

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 93f80fe..cd2e5f5 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -723,6 +723,9 @@ static unsigned int br_nf_forward_arp(unsigned int hook, struct sk_buff *skb,
 		return NF_ACCEPT;
 #endif
 
+	if (!skb->nf_bridge)
+		return NF_ACCEPT;
+
 	if (skb->protocol != htons(ETH_P_ARP)) {
 		if (!IS_VLAN_ARP(skb))
 			return NF_ACCEPT;
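
For illustration only, here is a minimal user-space sketch of the guard
pattern the patch adds. Every name in it (fake_skb, fake_nf_bridge,
fake_forward_arp, fake_forward_finish, the FAKE_* verdicts) is a
hypothetical stand-in, not a kernel definition; it only models an early
return that keeps a later stage from reading through a NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel definitions. */
enum fake_verdict { FAKE_ACCEPT = 1, FAKE_STOLEN = 2 };

struct fake_nf_bridge {
	unsigned int mask;
};

struct fake_skb {
	struct fake_nf_bridge *nf_bridge;	/* may be NULL */
};

/* Models a later stage that reads through nf_bridge unconditionally. */
static unsigned int fake_forward_finish(const struct fake_skb *skb)
{
	return skb->nf_bridge->mask;
}

/* Models the hook with the guard added by the patch. */
static enum fake_verdict fake_forward_arp(const struct fake_skb *skb)
{
	if (!skb->nf_bridge)
		return FAKE_ACCEPT;	/* no bridge state: let the packet pass */

	(void)fake_forward_finish(skb);	/* safe: nf_bridge is non-NULL here */
	return FAKE_STOLEN;
}
```

Calling fake_forward_arp() on a fake_skb whose nf_bridge is NULL returns
FAKE_ACCEPT without ever reaching fake_forward_finish(); without the
guard, the same call would fault on the read of mask.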




* Re: [Uclinux-dist-devel] [PATCH 1/9] netdev: bfin_mac: add support for IEEE 1588 PTP
From: Richard Cochran @ 2010-05-12  7:20 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Barry Song, netdev, Barry Song, David S. Miller,
	uclinux-dist-devel
In-Reply-To: <AANLkTimPOxc4RQ5iDMk8N9xYFb95VaqFleUa_nG4Gwwn@mail.gmail.com>

On Tue, May 11, 2010 at 11:31:37PM -0400, Mike Frysinger wrote:
> On Tue, May 11, 2010 at 23:20, Barry Song wrote:
> >
> > I think the API can work for blackfin.  But our PTP driver is based on
> > drivers/net/igb and has worked together with user-space PTPD utility.
> > Here he is writing a different driver framework. It is not the moment
> > for us to merge now. Maybe next kernel version after his patches have
> > been popular.
> 
> i'm not going to merge them into our tree ahead of the net->mainline
> merge.  Richard would just like some feedback on the proposed
> framework to make sure it doesnt have limitations we'd have to fix
> after things get merged.

Yes, that's right. It is enough just to know that the API *could* work
for blackfin. The idea is to have a standard API that works for all
current (and likely future) PTP hardware clocks.

The patch set is still under active development and review, so it is
better for you to wait until the dust settles.

Thanks,
Richard



* [PATCH net-next-2.6] [PPP] cleanup: remove pppoe_ioctl() declaration.
From: Rami Rosen @ 2010-05-12  5:37 UTC (permalink / raw)
  To: davem, netdev

[-- Attachment #1: Type: text/plain, Size: 167 bytes --]

Hi,
  - This patch removes the pppoe_ioctl() declaration in
drivers/net/pppoe.c, as it is unneeded.


Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 511 bytes --]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
old mode 100644
new mode 100755
index 99f031a..6fd84ed
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -89,7 +89,6 @@
 #define PPPOE_HASH_SIZE (1 << PPPOE_HASH_BITS)
 #define PPPOE_HASH_MASK	(PPPOE_HASH_SIZE - 1)
 
-static int pppoe_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 static int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb);
 static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb);
 


* [patch 2/3] [PATCH] qeth: new message if OLM limit is reached
From: frank.blaschka @ 2010-05-12  5:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20100512053444.035939000@de.ibm.com>

[-- Attachment #1: 602-qeth-olm-limit-msg.diff --]
[-- Type: text/plain, Size: 1621 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

z/OS may activate Optimized Latency Mode (OLM) for a connection
through an OSA Express3 adapter, which reduces the number of
allowed concurrent connections if the adapter is used in shared mode.
Create a meaningful message if activation of an OSA connection fails
due to an active OLM connection on the shared OSA adapter.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---

 drivers/s390/net/qeth_core_main.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff -urpN linux-2.6/drivers/s390/net/qeth_core_main.c linux-2.6-patched/drivers/s390/net/qeth_core_main.c
--- linux-2.6/drivers/s390/net/qeth_core_main.c	2010-05-11 22:10:12.000000000 +0200
+++ linux-2.6-patched/drivers/s390/net/qeth_core_main.c	2010-05-11 22:10:34.000000000 +0200
@@ -1976,6 +1976,7 @@ static int qeth_ulp_setup_cb(struct qeth
 		unsigned long data)
 {
 	struct qeth_cmd_buffer *iob;
+	int rc = 0;
 
 	QETH_DBF_TEXT(SETUP, 2, "ulpstpcb");
 
@@ -1983,8 +1984,15 @@ static int qeth_ulp_setup_cb(struct qeth
 	memcpy(&card->token.ulp_connection_r,
 	       QETH_ULP_SETUP_RESP_CONNECTION_TOKEN(iob->data),
 	       QETH_MPC_TOKEN_LENGTH);
+	if (!strncmp("00S", QETH_ULP_SETUP_RESP_CONNECTION_TOKEN(iob->data),
+		     3)) {
+		QETH_DBF_TEXT(SETUP, 2, "olmlimit");
+		dev_err(&card->gdev->dev, "A connection could not be "
+			"established because of an OLM limit\n");
+		rc = -EMLINK;
+	}
 	QETH_DBF_TEXT_(SETUP, 2, "  rc%d", iob->rc);
-	return 0;
+	return rc;
 }
 
 static int qeth_ulp_setup(struct qeth_card *card)
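
As a rough illustration of the check described above, the following
user-space sketch maps a response token that starts with "00S" to
-EMLINK and anything else to 0. The helper name check_olm_token and its
length parameter are assumptions made for this example; they are not
part of the qeth driver, which does the comparison inline in
qeth_ulp_setup_cb().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a qeth function.
 * Returns -EMLINK when the token begins with "00S" (the marker the
 * patch treats as "OLM limit reached"), 0 otherwise.
 */
static int check_olm_token(const char *token, size_t len)
{
	/* Only compare when at least three bytes are available. */
	if (len >= 3 && !strncmp("00S", token, 3))
		return -EMLINK;

	return 0;
}
```

A caller would pass the return value up unchanged, the same way the
patch returns rc instead of a hard-coded 0.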



* [patch 0/3] s390: qeth patches for 2.6.35
From: frank.blaschka @ 2010-05-12  5:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

here are some qeth patches for 2.6.35 (net-next).

shortlog:
Ursula Braun (1)
qeth: new message if OLM limit is reached

Frank Blaschka (2)
qeth: exploit HW TX checksumming
qeth: synchronize configuration interface

Thanks,
        Frank


