Netdev List


* [ofa-general] [PATCH 3/3] RDMA/cxgb3: Mark qp as privileged based on user capabilities.
From: Steve Wise @ 2008-01-21 20:39 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, general
In-Reply-To: <20080121203829.3143.26181.stgit@dell3.ogc.int>


RDMA/cxgb3: Mark qp as privileged based on user capabilities.

This is needed for zero-stag support.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_wr.h |    3 ++-
 drivers/infiniband/hw/cxgb3/iwch_qp.c |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index c84d4ac..d72b584 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -324,7 +324,8 @@ struct t3_genbit {
 };
 
 enum rdma_init_wr_flags {
-	RECVS_POSTED = 1,
+	RECVS_POSTED = (1<<0),
+	PRIV_QP = (1<<1),
 };
 
 union t3_wr {
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 7681fdc..ea2cdd7 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -717,6 +717,7 @@ static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp,
 	init_attr.qp_dma_addr = qhp->wq.dma_addr;
 	init_attr.qp_dma_size = (1UL << qhp->wq.size_log2);
 	init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0;
+	init_attr.flags |= capable(CAP_NET_BIND_SERVICE) ? PRIV_QP : 0;
 	init_attr.irs = qhp->ep->rcv_seq;
 	PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d "
 	     "flags 0x%x qpcaps 0x%x\n", __FUNCTION__,


* [ofa-general] [PATCH RESEND 0/3] RDMA/cxgb3 fixes
From: Steve Wise @ 2008-01-21 20:41 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, general

Hey Roland,

Please include these three iw_cxgb3 fixes for 2.6.25.  The first two fix
bugs found doing Lustre testing, and the last patch correctly marks
privileged qps.

Shortlog:
      RDMA/cxgb3: Flush the RQ when closing.
      RDMA/cxgb3: fix page shift calculation in build_phys_page_list()
      RDMA/cxgb3: Mark qp as privileged based on user capabilities.

-- 
Steve.


* [ofa-general] [PATCH RESEND 1/3] RDMA/cxgb3: Flush the RQ when closing.
From: Steve Wise @ 2008-01-21 20:42 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, general
In-Reply-To: <20080121204130.3820.11053.stgit@dell3.ogc.int>


RDMA/cxgb3: Flush the RQ when closing.

- for kernel mode cqs, call event notification handler when flushing
- flush qp when moving from RTS -> CLOSING
- fixed logic to identify a kernel mode qp

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 9bb8112..7681fdc 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -642,6 +642,7 @@ static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 	cxio_flush_rq(&qhp->wq, &rchp->cq, count);
 	spin_unlock(&qhp->lock);
 	spin_unlock_irqrestore(&rchp->lock, *flag);
+	(*rchp->ibcq.comp_handler)(&rchp->ibcq, rchp->ibcq.cq_context);
 
 	/* locking heirarchy: cq lock first, then qp lock. */
 	spin_lock_irqsave(&schp->lock, *flag);
@@ -651,6 +652,7 @@ static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 	cxio_flush_sq(&qhp->wq, &schp->cq, count);
 	spin_unlock(&qhp->lock);
 	spin_unlock_irqrestore(&schp->lock, *flag);
+	(*schp->ibcq.comp_handler)(&schp->ibcq, schp->ibcq.cq_context);
 
 	/* deref */
 	if (atomic_dec_and_test(&qhp->refcnt))
@@ -661,7 +663,7 @@ static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 
 static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 {
-	if (t3b_device(qhp->rhp))
+	if (qhp->ibqp.uobject)
 		cxio_set_wq_in_error(&qhp->wq);
 	else
 		__flush_qp(qhp, flag);
@@ -830,10 +832,11 @@ int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp,
 				disconnect = 1;
 				ep = qhp->ep;
 			}
+			flush_qp(qhp, &flag);
 			break;
 		case IWCH_QP_STATE_TERMINATE:
 			qhp->attr.state = IWCH_QP_STATE_TERMINATE;
-			if (t3b_device(qhp->rhp))
+			if (qhp->ibqp.uobject)
 				cxio_set_wq_in_error(&qhp->wq);
 			if (!internal)
 				terminate = 1;


* [ofa-general] [PATCH RESEND 2/3] RDMA/cxgb3: fix page shift calculation in build_phys_page_list()
From: Steve Wise @ 2008-01-21 20:42 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, general
In-Reply-To: <20080121204130.3820.11053.stgit@dell3.ogc.int>


RDMA/cxgb3: fix page shift calculation in build_phys_page_list()

The existing logic incorrectly maps this buffer list:

0: addr 0x10001000, size 0x1000
1: addr 0x10002000, size 0x1000

To this bogus page list:

0: 0x10000000
1: 0x10002000

The shift calculation must also take into account the address of the first
entry masked by the page_mask as well as the last address+size rounded
up to the next page size.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_mem.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index a6c2c4b..73bfd16 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -122,6 +122,13 @@ int build_phys_page_list(struct ib_phys_buf *buffer_list,
 		*total_size += buffer_list[i].size;
 		if (i > 0)
 			mask |= buffer_list[i].addr;
+		else
+			mask |= buffer_list[i].addr & PAGE_MASK;
+		if (i != num_phys_buf - 1)
+			mask |= buffer_list[i].addr + buffer_list[i].size;
+		else
+			mask |= (buffer_list[i].addr + buffer_list[i].size +
+				PAGE_SIZE - 1) & PAGE_MASK;
 	}
 
 	if (*total_size > 0xFFFFFFFFULL)
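
For illustration, the effect described in the commit message can be reproduced in a small user-space sketch. It assumes 4 KB pages, and it assumes the page shift is taken from the lowest bit set in the mask at or above PAGE_SHIFT (that search loop is not part of the hunk above). The names phys_buf, mask_old, mask_new and find_shift, the SKETCH_* macros, and the upper bound of 27 are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the sketch: 4 KB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for struct ib_phys_buf. */
struct phys_buf {
	uint64_t addr;
	uint64_t size;
};

/* Mask as built before the patch: only entries after the first contribute. */
static uint64_t mask_old(const struct phys_buf *b, int n)
{
	uint64_t mask = 0;
	int i;

	for (i = 1; i < n; ++i)
		mask |= b[i].addr;
	return mask;
}

/* Mask as built by the patch: first address and end addresses count too. */
static uint64_t mask_new(const struct phys_buf *b, int n)
{
	uint64_t mask = 0;
	int i;

	for (i = 0; i < n; ++i) {
		if (i > 0)
			mask |= b[i].addr;
		else
			mask |= b[i].addr & SKETCH_PAGE_MASK;
		if (i != n - 1)
			mask |= b[i].addr + b[i].size;
		else
			mask |= (b[i].addr + b[i].size +
				 SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	}
	return mask;
}

/* Assumed search: lowest set bit of the mask at or above the page shift. */
static int find_shift(uint64_t mask)
{
	int shift;

	for (shift = SKETCH_PAGE_SHIFT; shift < 27; ++shift)
		if ((1ULL << shift) & mask)
			break;
	return shift;
}
```

With the two-entry buffer list from the commit message, mask_old() yields 0x10002000, so the search settles on shift 13 and the first entry is rounded down to 0x10000000. mask_new() also folds in the first address and the end addresses, so bit 12 is set and the shift stays at 12.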


* [ofa-general] [PATCH RESEND 3/3] RDMA/cxgb3: Mark qp as privileged based on user capabilities.
From: Steve Wise @ 2008-01-21 20:42 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, general
In-Reply-To: <20080121204130.3820.11053.stgit@dell3.ogc.int>


RDMA/cxgb3: Mark qp as privileged based on user capabilities.

This is needed for zero-stag support.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_wr.h |    3 ++-
 drivers/infiniband/hw/cxgb3/iwch_qp.c |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index c84d4ac..d72b584 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -324,7 +324,8 @@ struct t3_genbit {
 };
 
 enum rdma_init_wr_flags {
-	RECVS_POSTED = 1,
+	RECVS_POSTED = (1<<0),
+	PRIV_QP = (1<<1),
 };
 
 union t3_wr {
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 7681fdc..ea2cdd7 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -717,6 +717,7 @@ static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp,
 	init_attr.qp_dma_addr = qhp->wq.dma_addr;
 	init_attr.qp_dma_size = (1UL << qhp->wq.size_log2);
 	init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0;
+	init_attr.flags |= capable(CAP_NET_BIND_SERVICE) ? PRIV_QP : 0;
 	init_attr.irs = qhp->ep->rcv_seq;
 	PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d "
 	     "flags 0x%x qpcaps 0x%x\n", __FUNCTION__,


* [PATCH] [NET] cpmac: convert to new Fixed PHY infrastructure (was: Re: fixed phy support (warning related to FIXED_MII_100_FDX))
From: Anton Vorontsov @ 2008-01-21 20:49 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list, netdev, Jeff Garzik, Eugene Konev
In-Reply-To: <416B8C5D-109C-4C03-807D-7EBB0AA9C366@kernel.crashing.org>

On Mon, Jan 21, 2008 at 01:19:41PM -0600, Kumar Gala wrote:
> Anton,
> 
> it looks like the "TI AR7 CPMAC Ethernet support" uses FIXED_PHY and  
> was selecting FIXED_MII_100_FDX which is gone.
> 
> Can you look into this.  I get the following warning now:
> 
> scripts/kconfig/conf -s arch/powerpc/Kconfig
> drivers/net/Kconfig:1713:warning: 'select' used by config symbol  
> 'CPMAC' refers to undefined symbol 'FIXED_MII_100_FDX'

Wow. I thought there were no Fixed PHY users. :-)

Jeff, as you've already Acked Fixed PHY rework to go through powerpc
tree, would you please Ack this patch in addition? I hope cpmac
maintainer will fix remaining issues as time goes by.

Thanks!

- - - -
From: Anton Vorontsov <avorontsov@ru.mvista.com>
Subject: [PATCH] [NET] cpmac: convert to new Fixed PHY infrastructure

This patch converts cpmac to the new Fixed PHY infrastructure, though it
doesn't fix all the problems with that driver. I didn't even try to
compile-test this patch, because the cpmac driver is broken in several ways:

1. The driver won't compile by itself because it lacks the header describing
   its platform data;
2. It assumes that fixed PHYs should be created by the ethernet driver.
   That assumption is wrong: creating fixed PHYs is the platform code's
   responsibility, and the driver must blindly accept the bus_id and phy_id
   platform data variables instead.

Also, it seems that the driver doesn't have any actual in-tree users, so
there is nothing further to fix.

The main purpose of that patch is to get rid of the following Kconfig
warning:

scripts/kconfig/conf -s arch/powerpc/Kconfig
drivers/net/Kconfig:1713:warning: 'select' used by config symbol
'CPMAC' refers to undefined symbol 'FIXED_MII_100_FDX'

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 drivers/net/Kconfig |    4 +--
 drivers/net/cpmac.c |   55 ++++++++++++++++----------------------------------
 2 files changed, 19 insertions(+), 40 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 114771a..5380ff9 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1707,10 +1707,8 @@ config SC92031
 
 config CPMAC
 	tristate "TI AR7 CPMAC Ethernet support (EXPERIMENTAL)"
-	depends on NET_ETHERNET && EXPERIMENTAL && AR7
+	depends on NET_ETHERNET && EXPERIMENTAL && AR7 && BROKEN
 	select PHYLIB
-	select FIXED_PHY
-	select FIXED_MII_100_FDX
 	help
 	  TI AR7 CPMAC Ethernet support
 
diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
index 6fd95a2..88eeb1d 100644
--- a/drivers/net/cpmac.c
+++ b/drivers/net/cpmac.c
@@ -848,15 +848,6 @@ static void cpmac_adjust_link(struct net_device *dev)
 	spin_unlock(&priv->lock);
 }
 
-static int cpmac_link_update(struct net_device *dev,
-			     struct fixed_phy_status *status)
-{
-	status->link = 1;
-	status->speed = 100;
-	status->duplex = 1;
-	return 0;
-}
-
 static int cpmac_open(struct net_device *dev)
 {
 	int i, size, res;
@@ -999,11 +990,11 @@ static int external_switch;
 static int __devinit cpmac_probe(struct platform_device *pdev)
 {
 	int rc, phy_id, i;
+	int mdio_bus_id = cpmac_mii.id;
 	struct resource *mem;
 	struct cpmac_priv *priv;
 	struct net_device *dev;
 	struct plat_cpmac_data *pdata;
-	struct fixed_info *fixed_phy;
 	DECLARE_MAC_BUF(mac);
 
 	pdata = pdev->dev.platform_data;
@@ -1017,9 +1008,23 @@ static int __devinit cpmac_probe(struct platform_device *pdev)
 	}
 
 	if (phy_id == PHY_MAX_ADDR) {
-		if (external_switch || dumb_switch)
+		if (external_switch || dumb_switch) {
+			struct fixed_phy_status status = {};
+
+			mdio_bus_id = 0;
+
+			/*
+			 * FIXME: this should be in the platform code!
+			 * Since there is not platform code at all (that is,
+			 * no mainline users of that driver), place it here
+			 * for now.
+			 */
 			phy_id = 0;
-		else {
+			status.link = 1;
+			status.duplex = 1;
+			status.speed = 100;
+			fixed_phy_add(PHY_POLL, phy_id, &status);
+		} else {
 			printk(KERN_ERR "cpmac: no PHY present\n");
 			return -ENODEV;
 		}
@@ -1063,32 +1068,8 @@ static int __devinit cpmac_probe(struct platform_device *pdev)
 	priv->msg_enable = netif_msg_init(debug_level, 0xff);
 	memcpy(dev->dev_addr, pdata->dev_addr, sizeof(dev->dev_addr));
 
-	if (phy_id == 31) {
-		snprintf(priv->phy_name, BUS_ID_SIZE, PHY_ID_FMT, cpmac_mii.id,
-			 phy_id);
-	} else {
-		/* Let's try to get a free fixed phy... */
-		for (i = 0; i < MAX_PHY_AMNT; i++) {
-			fixed_phy = fixed_mdio_get_phydev(i);
-			if (!fixed_phy)
-				continue;
-			if (!fixed_phy->phydev->attached_dev) {
-				strncpy(priv->phy_name,
-					fixed_phy->phydev->dev.bus_id,
-					BUS_ID_SIZE);
-				fixed_mdio_set_link_update(fixed_phy->phydev,
-							   &cpmac_link_update);
-				goto phy_found;
-			}
-		}
-		if (netif_msg_drv(priv))
-			printk(KERN_ERR "%s: Could not find fixed PHY\n",
-			       dev->name);
-		rc = -ENODEV;
-		goto fail;
-	}
+	snprintf(priv->phy_name, BUS_ID_SIZE, PHY_ID_FMT, mdio_bus_id, phy_id);
 
-phy_found:
 	priv->phy = phy_connect(dev, priv->phy_name, &cpmac_adjust_link, 0,
 				PHY_INTERFACE_MODE_MII);
 	if (IS_ERR(priv->phy)) {
-- 
1.5.2.2



* Re: questions on NAPI processing latency and dropped network packets
From: Ben Greear @ 2008-01-21 21:11 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, linux-kernel
In-Reply-To: <4794F848.9020402@nortel.com>

Chris Friesen wrote:

> Is there anything else we can do to minimize the latency of network 
> packet processing and avoid having to crank the rx ring size up so high?

Why is it such a big deal to crank up the rx queue length?  Seems like
a perfectly normal way to handle bursts like this...

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Ilpo Järvinen @ 2008-01-21 21:14 UTC (permalink / raw)
  To: Dave Young; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <20080121075548.GA22177@darkstar.te-china.tietoenator.com>

On Mon, 21 Jan 2008, Dave Young wrote:

> Please see the kernel messages following,(trigged while using some qemu session)
> BTW, seems there's some e100 error message as well.
> 
> PCI: Setting latency timer of device 0000:00:1b.0 to 64
> e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> e100: Copyright(c) 1999-2006 Intel Corporation
> ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> ACPI: PCI interrupt for device 0000:03:08.0 disabled
> e100: probe of 0000:03:08.0 failed with error -12
> eth0:  setting full-duplex.
> ------------[ cut here ]------------
> WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
>  [<c0132100>] ? printk+0x0/0x20
>  [<c0131834>] warn_on_slowpath+0x54/0x80
>  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
>  [<c03e9527>] ? ip_output+0xe7/0x100
>  [<c03e8a88>] ? ip_local_out+0x18/0x20
>  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
>  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
>  [<c0186be1>] ? check_pad_bytes+0x61/0x80
>  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
>  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
>  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
>  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
>  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
>  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
>  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660

Doh, these S+L things once more... The rest are symptoms of the first
problem.

What is strange is that it hasn't shown up until now; the last TCP
changes that could have some significance are from early Dec/Nov. Is
there some reason why you haven't seen this before (e.g., not
tested with a similar cfg or so)? I'm a bit worried about its
reproducibility if it takes this long to see it...


-- 
 i.


* Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
From: Eric Dumazet @ 2008-01-21 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: joonwpark81, netdev
In-Reply-To: <20080121.024043.105024413.davem@davemloft.net>

David Miller a écrit :
> From: Joonwoo Park <joonwpark81@gmail.com>
> Date: Tue, 22 Jan 2008 00:08:57 +0900
> 
>> The rt_run_flush() can be stucked if it was called while netdev is on the 
>> high load.
>> It's possible when pushing rtable to rt_hash is faster than pulling
>> from it.
>>
>> The commands 'ifconfig up or ifconfig mtu' and netif_carrier_on() can
>> introduce soft lockup like this:
>>
>> [  363.528001] BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>> [  363.531492]
>> [  363.535027] Pid: 9, comm: events/0 Not tainted (2.6.24-rc8 #14)
>> [  363.538837] EIP: 0060:[<c4086a39>] EFLAGS: 00000286 CPU: 0
>> [  363.542762] EIP is at kfree+0xa9/0xf0
>> ...
>> [  363.660815]  [<c42fb0fd>] skb_release_data+0x5d/0x90
>> [  363.666989]  [<c42fb7dc>] skb_release_all+0x5c/0xd0
>> [  363.673207]  [<c42faf8b>] __kfree_skb+0xb/0x90
>> [  363.679474]  [<c42fb029>] kfree_skb+0x19/0x40
>> [  363.685811]  [<c4322d87>] ip_rcv+0x27/0x290
>> [  363.692223]  [<c4300ae5>] netif_receive_skb+0x255/0x320
>> [  363.698759]  [<f88465aa>] e1000_clean_rx_irq+0x14a/0x4f0 [e1000]
>> [  363.705456]  [<f88437c2>] e1000_clean+0x62/0x270 [e1000]
>> [  363.712217]  [<c43031ee>] net_rx_action+0x16e/0x220
>> [  363.719065]  [<c40346d7>] __do_softirq+0x87/0x100
>> [  363.726001]  [<c40347a7>] do_softirq+0x57/0x60
>> [  363.732979]  [<c4034b4e>] local_bh_enable_ip+0xae/0x100
>> [  363.740094]  [<c43e73f5>] _spin_unlock_bh+0x25/0x30
>> [  363.747283]  [<c431ec88>] rt_run_flush+0xc8/0xe0
>> [  363.754566]  [<c4320c76>] rt_cache_flush+0xd6/0xe0
>> [  363.761917]  [<c4350269>] fib_netdev_event+0x89/0xa0
>> [  363.769361]  [<c4047d67>] notifier_call_chain+0x37/0x80
>> ...
>>
>> This patch makes rt_run_flush() to run with softirq is disabled.
>>
>> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> 
> I agree with the analysis of the problem, however not the solution.
> 
> This will absolutely kill software interrupt latency.
> 
> In fact, we have moved much of the flush work into a workqueue in
> net-2.6.25 because of how important that is
> 
> We need to find some other way to solve this.
> 
> Eric, any ideas?

Hum... 2.6.25 is certainly better in this aspect, but I remember we left 
something to finish :)

We currently can have a worker doing the automatic flush every 600 seconds, 
and another task doing a rt_cache_flush(...)

On very loaded machines (DDOS), routes might be added faster than deleted.

Also, each change in routes must invalidate the rtcache, but a full scan of
this cache is way too expensive (a huge number of MBytes must be read/written).

One possibility is to use a genid marker so that each entry can be thrown away
if its genid differs from the global one.

rt_cache_flush(-1) or rt_secret_build() would just have to increment the 
global genid.
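
A minimal user-space sketch of the genid idea, assuming a single global counter and a per-entry copy taken at insertion time. All names here (cache_genid, cache_entry, entry_fill, entry_is_stale, cache_flush_all) are hypothetical and not taken from the routing code; locking and the actual freeing of stale entries are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global generation counter for the whole cache. */
static unsigned int cache_genid;

/* Hypothetical cache entry; not the kernel's struct rtable. */
struct cache_entry {
	unsigned int genid;	/* generation at insertion time */
	unsigned int key;
	unsigned int value;
};

/* Insert or refresh an entry: stamp it with the current generation. */
static void entry_fill(struct cache_entry *e, unsigned int key,
		       unsigned int value)
{
	e->key = key;
	e->value = value;
	e->genid = cache_genid;
}

/* An entry is stale once its generation differs from the global one. */
static bool entry_is_stale(const struct cache_entry *e)
{
	return e->genid != cache_genid;
}

/* "Flush" without scanning: bump the generation, nothing else. */
static void cache_flush_all(void)
{
	cache_genid++;
}
```

In this sketch the flush is O(1). Stale entries are only noticed later, when a lookup or a garbage-collection pass happens to visit them and can drop them one at a time.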



* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-21 21:31 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, linux-kernel
In-Reply-To: <4794F848.9020402@nortel.com>

Chris Friesen a écrit :
> I've done some further digging, and it appears that one of the problems 
> we may be facing is very high instantaneous traffic rates.
> 
> Instrumentation showed up to 222K packets/sec for short periods (at 
> least 1.1 ms, possibly longer), although the long-term average is down 
> around 14-16K packets/sec.

Instrumentation done where exactly?

tcpdump on the host will lie, since timestamps are probably going to be host
timestamps, giving the time of RX dequeuing, and not the time of packet arrival
given by the NIC.

> 
> If I bump the rx ring size up to 4096, we can handle all the packets and 
> we still have 44% idle on cpu0 and 27% idle on cpu1.
> 
> Is there anything else we can do to minimize the latency of network 
> packet processing and avoid having to crank the rx ring size up so high?
> 

You have some tasks that disable softirqs for too long. Sometimes, bumping the
RX ring size is OK (but you will still have delays), sometimes it is not an
option, since 4096 is the limit on current hardware.


* [PATCH] ipv6: update MSS even if MTU is unchanged
From: Jim Paris @ 2008-01-21 22:02 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20080121105227.GA8306@jim.sh>

This is needed because in ndisc.c, we have:

  static void ndisc_router_discovery(struct sk_buff *skb)
  {
  // ...
  	if (ndopts.nd_opts_mtu) {
  // ...
  			if (rt)
  				rt->u.dst.metrics[RTAX_MTU-1] = mtu;

  			rt6_mtu_change(skb->dev, mtu);
  // ...
  }

Since the mtu is set directly here, rt6_mtu_change_route thinks that
it is unchanged, and so it fails to update the MSS accordingly.  This
patch lets rt6_mtu_change_route still update MSS if old_mtu == new_mtu.

Signed-off-by: Jim Paris <jim@jtan.com>
---
This fixes the problem I reported earlier where IPv6 autoconfiguration
ends up with mtu < advmss on the default route.

I don't know if this is the best way to fix the problem, but it works
for me.  Other options: set rt->u.dst.metrics[RTAX_ADVMSS-1] directly
in ndisc_router_discovery (but ipv6_advmss function isn't available);
don't set MTU at all in ndisc_router_discovery and let rt6_mtu_change
handle it; etc.

 net/ipv6/route.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6ecb5e6..0965fb3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1912,7 +1912,7 @@ static int rt6_mtu_change_route(struct rt6_info *rt, void *p_arg)
 	 */
 	if (rt->rt6i_dev == arg->dev &&
 	    !dst_metric_locked(&rt->u.dst, RTAX_MTU) &&
-	    (dst_mtu(&rt->u.dst) > arg->mtu ||
+	    (dst_mtu(&rt->u.dst) >= arg->mtu ||
 	     (dst_mtu(&rt->u.dst) < arg->mtu &&
 	      dst_mtu(&rt->u.dst) == idev->cnf.mtu6))) {
 		rt->u.dst.metrics[RTAX_MTU-1] = arg->mtu;
-- 
1.5.3.8
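
The change to the condition can be illustrated in isolation. This is only a sketch of the MTU comparison from the hunk above: the device and metric-lock checks are omitted, and the function names mtu_update_old and mtu_update_new and their parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * cur_mtu  : MTU currently stored on the route (dst_mtu() in the hunk)
 * new_mtu  : MTU being announced (arg->mtu in the hunk)
 * dev_mtu6 : device IPv6 MTU (idev->cnf.mtu6 in the hunk)
 */

/* Condition before the patch: an equal MTU is treated as "no change". */
static bool mtu_update_old(unsigned int cur_mtu, unsigned int new_mtu,
			   unsigned int dev_mtu6)
{
	return cur_mtu > new_mtu ||
	       (cur_mtu < new_mtu && cur_mtu == dev_mtu6);
}

/* Condition after the patch: an equal MTU still takes the update path. */
static bool mtu_update_new(unsigned int cur_mtu, unsigned int new_mtu,
			   unsigned int dev_mtu6)
{
	return cur_mtu >= new_mtu ||
	       (cur_mtu < new_mtu && cur_mtu == dev_mtu6);
}
```

When the route's MTU has already been written by ndisc_router_discovery(), cur_mtu equals new_mtu; only the patched condition is true in that case, which is what lets the MSS be recomputed.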


* [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-21 22:31 UTC (permalink / raw)
  To: netdev; +Cc: Badalian Vyacheslav, Patrick McHardy, jamal, David Miller
In-Reply-To: <20080120234659.GA2691@ami.dom.local>


So, let's try something easy first: #ifdef __KERNEL__. (I know there
are many esthetes around, but since this subject looks quite dirty...)

Alternatively we could change the API, and as a matter of fact there was
such an attempt some time ago, but is it really worth such a mess?

Regards,
Jarek P.
-----------> (take 2)

gen_kill_estimator() is called during qdisc_destroy() with BHs disabled,
and each call searches the estimator lists for the matching entry. This can
block soft interrupts for quite a long time when many classes are used. This
patch changes this by storing pointers to the internal gen_estimator structures
in the gnet_stats_rate_est structures used by clients of gen_estimator.

With this method it is no longer possible to register the same structures
in gen_estimator more than once, but that isn't used anyway. (A warning is
added for when gen_new_estimator() is called with structures that are already
in use by gen_estimator.) Thanks to David Miller for pointing out an
error in the first version of this patch.
 
Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
 
[needs more testing]
 
---

 include/linux/gen_stats.h |    4 ++++
 net/core/gen_estimator.c  |   36 +++++++++++++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff -Nurp 2.6.24-rc8-mm1-/include/linux/gen_stats.h 2.6.24-rc8-mm1+/include/linux/gen_stats.h
--- 2.6.24-rc8-mm1-/include/linux/gen_stats.h	2007-10-09 22:31:38.000000000 +0200
+++ 2.6.24-rc8-mm1+/include/linux/gen_stats.h	2008-01-21 22:53:47.000000000 +0100
@@ -28,11 +28,15 @@ struct gnet_stats_basic
  * struct gnet_stats_rate_est - rate estimator
  * @bps: current byte rate
  * @pps: current packet rate
+ * @gen_estimator: internal data
  */
 struct gnet_stats_rate_est
 {
 	__u32	bps;
 	__u32	pps;
+#ifdef __KERNEL__
+	unsigned long	gen_estimator;
+#endif
 };
 
 /**
diff -Nurp 2.6.24-rc8-mm1-/net/core/gen_estimator.c 2.6.24-rc8-mm1+/net/core/gen_estimator.c
--- 2.6.24-rc8-mm1-/net/core/gen_estimator.c	2008-01-19 17:54:45.000000000 +0100
+++ 2.6.24-rc8-mm1+/net/core/gen_estimator.c	2008-01-20 20:58:35.000000000 +0100
@@ -139,6 +139,9 @@ skip:
 	rcu_read_unlock();
 }
 
+static void gen_kill_estimator_find(struct gnet_stats_basic *bstats,
+				    struct gnet_stats_rate_est *rate_est);
+
 /**
  * gen_new_estimator - create a new rate estimator
  * @bstats: basic statistics
@@ -171,6 +174,10 @@ int gen_new_estimator(struct gnet_stats_
 	if (parm->interval < -2 || parm->interval > 3)
 		return -EINVAL;
 
+	if (rate_est->gen_estimator)
+		/* not sure: not zeroed in alloc or reused */
+		gen_kill_estimator_find(bstats, rate_est);
+
 	est = kzalloc(sizeof(*est), GFP_KERNEL);
 	if (est == NULL)
 		return -ENOBUFS;
@@ -184,6 +191,7 @@ int gen_new_estimator(struct gnet_stats_
 	est->avbps = rate_est->bps<<5;
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
+	rate_est->gen_estimator = (unsigned long)est;
 
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
@@ -209,13 +217,30 @@ static void __gen_kill_estimator(struct 
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  *
- * Removes the rate estimator specified by &bstats and &rate_est
- * and deletes the timer.
+ * Removes the rate estimator specified by &bstats and &rate_est.
  *
  * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic *bstats,
-	struct gnet_stats_rate_est *rate_est)
+			struct gnet_stats_rate_est *rate_est)
+{
+	if (rate_est && rate_est->gen_estimator) {
+		struct gen_estimator *e;
+		
+		e = (struct gen_estimator *)rate_est->gen_estimator;
+
+		rate_est->gen_estimator = 0;
+		write_lock_bh(&est_lock);
+		e->bstats = NULL;
+		write_unlock_bh(&est_lock);
+
+		list_del_rcu(&e->list);
+		call_rcu(&e->e_rcu, __gen_kill_estimator);
+	}
+}
+
+static void gen_kill_estimator_find(struct gnet_stats_basic *bstats,
+				    struct gnet_stats_rate_est *rate_est)
 {
 	int idx;
 	struct gen_estimator *e, *n;
@@ -236,6 +261,11 @@ void gen_kill_estimator(struct gnet_stat
 
 			list_del_rcu(&e->list);
 			call_rcu(&e->e_rcu, __gen_kill_estimator);
+
+			WARN_ON_ONCE(1); /* gen_new_estimator() repeated? */
+			rate_est->gen_estimator = 0;
+
+			return;
 		}
 	}
 }
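
The back-pointer idea from the description can be sketched in user space as follows. This is a simplified model under stated assumptions: a plain doubly linked list with a sentinel head, no RCU, no timers, and no locking, which is where the real difficulty lies. All names (rate_est, estimator, est_register, est_kill, est_count) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct estimator;

/* Hypothetical client-side structure carrying a back-pointer. */
struct rate_est {
	unsigned int bps;
	unsigned int pps;
	struct estimator *owner;	/* NULL when not registered */
};

/* Hypothetical internal list node. */
struct estimator {
	struct estimator *prev, *next;
	struct rate_est *rate_est;
};

/* Initialise a sentinel head for a circular list. */
static void est_list_init(struct estimator *head)
{
	head->prev = head->next = head;
	head->rate_est = NULL;
}

/* Register once; a second registration of the same rate_est is refused. */
static int est_register(struct estimator *head, struct rate_est *re)
{
	struct estimator *e;

	if (re->owner)
		return -1;
	e = calloc(1, sizeof(*e));
	if (!e)
		return -1;
	e->rate_est = re;
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
	re->owner = e;
	return 0;
}

/* Remove via the back-pointer: no list walk is needed. */
static void est_kill(struct rate_est *re)
{
	struct estimator *e = re->owner;

	if (!e)
		return;
	re->owner = NULL;
	e->prev->next = e->next;
	e->next->prev = e->prev;
	free(e);
}

/* Count registered estimators (walks the list; for checking only). */
static size_t est_count(const struct estimator *head)
{
	const struct estimator *e;
	size_t n = 0;

	for (e = head->next; e != head; e = e->next)
		n++;
	return n;
}
```

Removal cost in this model no longer depends on the number of registered estimators, which is the property the patch description is after.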


* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-21 22:41 UTC (permalink / raw)
  To: netdev; +Cc: Badalian Vyacheslav, Patrick McHardy, jamal, David Miller
In-Reply-To: <20080121223137.GB2758@ami.dom.local>

On Mon, Jan 21, 2008 at 11:31:37PM +0100, Jarek Poplawski wrote:
> 
> So, let's try something easy first: #ifdef __KERNEL__. (I know there
...

SORRY!!! Of course this is still wrong, I withdraw this patch.

Jarek P.


* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-21 23:15 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-kernel
In-Reply-To: <47950A71.5010304@candelatech.com>

Ben Greear wrote:
> Chris Friesen wrote:
> 
>> Is there anything else we can do to minimize the latency of network 
>> packet processing and avoid having to crank the rx ring size up so high?
> 
> 
> Why is it such a big deal to crank up the rx queue length?  Seems like
> a perfectly normal way to handle bursts like this...

It means that the latency for handling those packets is higher than it 
could be.  Draining 4096 packets from the queue will take a while.

Ideally we'd like to bring the latency down as much as possible, and 
then bump up the rx queue length to handle the rest.

Chris


* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-21 23:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel
In-Reply-To: <47950F1D.4010508@cosmosbay.com>

Eric Dumazet wrote:
> Chris Friesen a écrit :
> 
>> I've done some further digging, and it appears that one of the 
>> problems we may be facing is very high instantaneous traffic rates.
>>
>> Instrumentation showed up to 222K packets/sec for short periods (at 
>> least 1.1 ms, possibly longer), although the long-term average is down 
>> around 14-16K packets/sec.
> 
> 
> Instrumentation done where exactly ?

I added some code to e1000_clean_rx_irq() to track rx_fifo drops, total 
packets received, and an accurate timestamp.

If rx_fifo errors changed, it would dump the information.

>> Is there anything else we can do to minimize the latency of network 
>> packet processing and avoid having to crank the rx ring size up so high?

> You have some tasks that disable softirqs too long. Sometimes, bumping 
> RX ring size is OK (but you will still have delays), sometimes it is not 
> an option, since 4096 is the limit on current hardware.

I added some instrumentation to take timestamps in __do_softirq() as 
well.  Based on these timestamps, I can see the following code sequence:

2374604616 usec, start processing softirqs in __do_softirq()
2374610337 usec, log values in e1000_clean_rx_irq()
2374611411 usec, log values in e1000_clean_rx_irq()

In between the successive calls to e1000_clean_rx_irq() the rx_fifo 
counts went up.
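
As a rough back-of-the-envelope check, the timestamps above can be combined with the peak rate quoted earlier in the thread. This assumes the 222K packets/sec burst rate held for the whole gap, which the measurements do not establish; the helper name packets_in_window is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packets arriving during a window at a constant rate (integer math). */
static uint64_t packets_in_window(uint64_t rate_pps, uint64_t window_usec)
{
	return rate_pps * window_usec / 1000000ULL;
}
```

The gap between the start of softirq processing and the first e1000_clean_rx_irq() log is 2374610337 - 2374604616 = 5721 usec. At 222000 packets/sec that would be about 1270 packets arriving before the ring is first serviced, and about 238 more in the 1074 usec before the second log.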

Does anyone have any patchsets to track down what softirqs are taking a 
long time, and/or who's disabling softirqs?

Chris


* Re: questions on NAPI processing latency and dropped network packets
From: Ben Greear @ 2008-01-21 23:32 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev
In-Reply-To: <47952783.8080505@nortel.com>

Chris Friesen wrote:
> Ben Greear wrote:
>> Chris Friesen wrote:
>>
>>> Is there anything else we can do to minimize the latency of network 
>>> packet processing and avoid having to crank the rx ring size up so high?
>>
>>
>> Why is it such a big deal to crank up the rx queue length?  Seems like
>> a perfectly normal way to handle bursts like this...
> 
> It means that the latency for handling those packets is higher than it 
> could be.  Draining 4096 packets from the queue will take a while.
> 
> Ideally we'd like to bring the latency down as much as possible, and 
> then bump up the rx queue length to handle the rest.

Unless having a bigger queue somehow makes the IRQ fire later,
then it wouldn't seem to increase the time for packets that arrive
early in the queue, though it would of course take longer to process
the ones at the back.  Still, if the option is process them slightly
later or drop them, the choice seems obvious...

Decreasing your latencies seems a mostly un-related optimization that
should be tackled independently of queue size.

Do you perhaps have any network equipment that would queue up the pkts
and burst them?  Maybe a firewall, router, or something?  That could
potentially cause small bursts of pkts...

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* [PATCH 2.6.25 1/4][BNX2]: Disable jumbo rx paging on 5709 Ax.
From: Michael Chan @ 2008-01-22  1:07 UTC (permalink / raw)
  To: davem, netdev

[BNX2]: Disable jumbo rx paging on 5709 Ax.

The chip has problems running in this mode, so the mode needs to be disabled.

Signed-off-by: Michael Chan <mchan@broadcom.com>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 94d1857..8348af2 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4669,7 +4669,7 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
 	bp->rx_pg_ring_size = 0;
 	bp->rx_max_pg_ring = 0;
 	bp->rx_max_pg_ring_idx = 0;
-	if (rx_space > PAGE_SIZE) {
+	if ((rx_space > PAGE_SIZE) && !(bp->flags & JUMBO_BROKEN_FLAG)) {
 		int pages = PAGE_ALIGN(bp->dev->mtu - 40) >> PAGE_SHIFT;
 
 		jumbo_size = size * pages;
@@ -7031,6 +7031,8 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 			goto err_out_unmap;
 		}
 		bp->flags |= PCIE_FLAG;
+		if (CHIP_REV(bp) == CHIP_REV_Ax)
+			bp->flags |= JUMBO_BROKEN_FLAG;
 	} else {
 		bp->pcix_cap = pci_find_capability(pdev, PCI_CAP_ID_PCIX);
 		if (bp->pcix_cap == 0) {
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index 09bd665..c1ab30b 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6582,6 +6582,7 @@ struct bnx2 {
 #define PCIE_FLAG			0x00000200
 #define USING_MSIX_FLAG			0x00000400
 #define USING_MSI_OR_MSIX_FLAG		(USING_MSI_FLAG | USING_MSIX_FLAG)
+#define JUMBO_BROKEN_FLAG		0x00000800
 
 	/* Put tx producer and consumer fields in separate cache lines. */
 



^ permalink raw reply related

* [PATCH 2.6.25 3/4][BNX2]: Add link-down workaround on 5706 serdes.
From: Michael Chan @ 2008-01-22  1:08 UTC (permalink / raw)
  To: davem, netdev

[BNX2]: Add link-down workaround on 5706 serdes.

In some blade systems using the 5706 serdes, the hardware sometimes
does not properly generate link down interrupts.  We add a workaround
in the driver's timer to force a link-down when some PHY registers
report loss of SYNC.

The parallel detect logic is cleaned up slightly to better integrate
the workaround.

Signed-off-by: Michael Chan <mchan@broadcom.com>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 7ee7746..834675f 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1186,6 +1186,19 @@ bnx2_disable_forced_2g5(struct bnx2 *bp)
 	bnx2_write_phy(bp, bp->mii_bmcr, bmcr);
 }
 
+static void
+bnx2_5706s_force_link_dn(struct bnx2 *bp, int start)
+{
+	u32 val;
+
+	bnx2_write_phy(bp, MII_BNX2_DSP_ADDRESS, MII_EXPAND_SERDES_CTL);
+	bnx2_read_phy(bp, MII_BNX2_DSP_RW_PORT, &val);
+	if (start)
+		bnx2_write_phy(bp, MII_BNX2_DSP_RW_PORT, val & 0xff0f);
+	else
+		bnx2_write_phy(bp, MII_BNX2_DSP_RW_PORT, val | 0xc0);
+}
+
 static int
 bnx2_set_link(struct bnx2 *bp)
 {
@@ -1211,6 +1224,10 @@ bnx2_set_link(struct bnx2 *bp)
 	    (CHIP_NUM(bp) == CHIP_NUM_5706)) {
 		u32 val;
 
+		if (bp->phy_flags & PHY_FORCED_DOWN_FLAG) {
+			bnx2_5706s_force_link_dn(bp, 0);
+			bp->phy_flags &= ~PHY_FORCED_DOWN_FLAG;
+		}
 		val = REG_RD(bp, BNX2_EMAC_STATUS);
 		if (val & BNX2_EMAC_STATUS_LINK)
 			bmsr |= BMSR_LSTATUS;
@@ -1239,7 +1256,15 @@ bnx2_set_link(struct bnx2 *bp)
 		    (bp->autoneg & AUTONEG_SPEED))
 			bnx2_disable_forced_2g5(bp);
 
-		bp->phy_flags &= ~PHY_PARALLEL_DETECT_FLAG;
+		if (bp->phy_flags & PHY_PARALLEL_DETECT_FLAG) {
+			u32 bmcr;
+
+			bnx2_read_phy(bp, bp->mii_bmcr, &bmcr);
+			bmcr |= BMCR_ANENABLE;
+			bnx2_write_phy(bp, bp->mii_bmcr, bmcr);
+
+			bp->phy_flags &= ~PHY_PARALLEL_DETECT_FLAG;
+		}
 		bp->link_up = 0;
 	}
 
@@ -5276,13 +5301,51 @@ bnx2_test_intr(struct bnx2 *bp)
 	return -ENODEV;
 }
 
+static int
+bnx2_5706_serdes_has_link(struct bnx2 *bp)
+{
+	u32 mode_ctl, an_dbg, exp;
+
+	bnx2_write_phy(bp, MII_BNX2_MISC_SHADOW, MISC_SHDW_MODE_CTL);
+	bnx2_read_phy(bp, MII_BNX2_MISC_SHADOW, &mode_ctl);
+
+	if (!(mode_ctl & MISC_SHDW_MODE_CTL_SIG_DET))
+		return 0;
+
+	bnx2_write_phy(bp, MII_BNX2_MISC_SHADOW, MISC_SHDW_AN_DBG);
+	bnx2_read_phy(bp, MII_BNX2_MISC_SHADOW, &an_dbg);
+	bnx2_read_phy(bp, MII_BNX2_MISC_SHADOW, &an_dbg);
+
+	if (an_dbg & MISC_SHDW_AN_DBG_NOSYNC)
+		return 0;
+
+	bnx2_write_phy(bp, MII_BNX2_DSP_ADDRESS, MII_EXPAND_REG1);
+	bnx2_read_phy(bp, MII_BNX2_DSP_RW_PORT, &exp);
+	bnx2_read_phy(bp, MII_BNX2_DSP_RW_PORT, &exp);
+
+	if (exp & MII_EXPAND_REG1_RUDI_C)	/* receiving CONFIG */
+		return 0;
+
+	return 1;
+}
+
 static void
 bnx2_5706_serdes_timer(struct bnx2 *bp)
 {
+	int check_link = 1;
+
 	spin_lock(&bp->phy_lock);
-	if (bp->serdes_an_pending)
+	if (bp->phy_flags & PHY_FORCED_DOWN_FLAG) {
+		bnx2_5706s_force_link_dn(bp, 0);
+		bp->phy_flags &= ~PHY_FORCED_DOWN_FLAG;
+		spin_unlock(&bp->phy_lock);
+		return;
+	}
+
+	if (bp->serdes_an_pending) {
 		bp->serdes_an_pending--;
-	else if ((bp->link_up == 0) && (bp->autoneg & AUTONEG_SPEED)) {
+		check_link = 0;
+	} else if ((bp->link_up == 0) && (bp->autoneg & AUTONEG_SPEED)) {
 		u32 bmcr;
 
 		bp->current_interval = bp->timer_interval;
@@ -5290,19 +5353,7 @@ bnx2_5706_serdes_timer(struct bnx2 *bp)
 		bnx2_read_phy(bp, bp->mii_bmcr, &bmcr);
 
 		if (bmcr & BMCR_ANENABLE) {
-			u32 phy1, phy2;
-
-			bnx2_write_phy(bp, 0x1c, 0x7c00);
-			bnx2_read_phy(bp, 0x1c, &phy1);
-
-			bnx2_write_phy(bp, 0x17, 0x0f01);
-			bnx2_read_phy(bp, 0x15, &phy2);
-			bnx2_write_phy(bp, 0x17, 0x0f01);
-			bnx2_read_phy(bp, 0x15, &phy2);
-
-			if ((phy1 & 0x10) &&	/* SIGNAL DETECT */
-				!(phy2 & 0x20)) {	/* no CONFIG */
-
+			if (bnx2_5706_serdes_has_link(bp)) {
 				bmcr &= ~BMCR_ANENABLE;
 				bmcr |= BMCR_SPEED1000 | BMCR_FULLDPLX;
 				bnx2_write_phy(bp, bp->mii_bmcr, bmcr);
@@ -5314,6 +5365,7 @@ bnx2_5706_serdes_timer(struct bnx2 *bp)
 		 (bp->phy_flags & PHY_PARALLEL_DETECT_FLAG)) {
 		u32 phy2;
 
+		check_link = 0;
 		bnx2_write_phy(bp, 0x17, 0x0f01);
 		bnx2_read_phy(bp, 0x15, &phy2);
 		if (phy2 & 0x20) {
@@ -5328,6 +5380,18 @@ bnx2_5706_serdes_timer(struct bnx2 *bp)
 	} else
 		bp->current_interval = bp->timer_interval;
 
+	if (bp->link_up && (bp->autoneg & AUTONEG_SPEED) && check_link) {
+		u32 val;
+
+		bnx2_write_phy(bp, MII_BNX2_MISC_SHADOW, MISC_SHDW_AN_DBG);
+		bnx2_read_phy(bp, MII_BNX2_MISC_SHADOW, &val);
+		bnx2_read_phy(bp, MII_BNX2_MISC_SHADOW, &val);
+
+		if (val & MISC_SHDW_AN_DBG_NOSYNC) {
+			bnx2_5706s_force_link_dn(bp, 1);
+			bp->phy_flags |= PHY_FORCED_DOWN_FLAG;
+		}
+	}
 	spin_unlock(&bp->phy_lock);
 }
 
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index c1ab30b..31a030a 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6344,6 +6344,15 @@ struct l2_fhdr {
 #define MII_BNX2_DSP_RW_PORT			0x15
 #define MII_BNX2_DSP_ADDRESS			0x17
 #define MII_BNX2_DSP_EXPAND_REG			 0x0f00
+#define MII_EXPAND_REG1				  (MII_BNX2_DSP_EXPAND_REG | 1)
+#define MII_EXPAND_REG1_RUDI_C			   0x20
+#define MII_EXPAND_SERDES_CTL			  (MII_BNX2_DSP_EXPAND_REG | 2)
+
+#define MII_BNX2_MISC_SHADOW			0x1c
+#define MISC_SHDW_AN_DBG			 0x6800
+#define MISC_SHDW_AN_DBG_NOSYNC			  0x0002
+#define MISC_SHDW_MODE_CTL			 0x7c00
+#define MISC_SHDW_MODE_CTL_SIG_DET		  0x0010
 
 #define MII_BNX2_BLK_ADDR			0x1f
 #define MII_BNX2_BLK_ADDR_IEEE0			 0x0000
@@ -6643,6 +6652,7 @@ struct bnx2 {
 #define PHY_INT_MODE_LINK_READY_FLAG	0x200
 #define PHY_DIS_EARLY_DAC_FLAG		0x400
 #define REMOTE_PHY_CAP_FLAG		0x800
+#define PHY_FORCED_DOWN_FLAG		0x1000
 
 	u32			mii_bmcr;
 	u32			mii_bmsr;



^ permalink raw reply related

* [PATCH 2.6.25 2/4][BNX2]: Fix minor bug in bnx2_has_work().
From: Michael Chan @ 2008-01-22  1:08 UTC (permalink / raw)
  To: davem, netdev

[BNX2]: Fix minor bug in bnx2_has_work().

It is more correct to get the status block from the bnx2_napi struct
instead of the bnx2 struct.  It happens that they are the same in this
case because we are using the first MSIX vector.

Signed-off-by: Michael Chan <mchan@broadcom.com>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 8348af2..7ee7746 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2903,8 +2903,7 @@ bnx2_tx_msix(int irq, void *dev_instance)
 static inline int
 bnx2_has_work(struct bnx2_napi *bnapi)
 {
-	struct bnx2 *bp = bnapi->bp;
-	struct status_block *sblk = bp->status_blk;
+	struct status_block *sblk = bnapi->status_blk;
 
 	if ((bnx2_get_hw_rx_cons(bnapi) != bnapi->rx_cons) ||
 	    (bnx2_get_hw_tx_cons(bnapi) != bnapi->hw_tx_cons))



^ permalink raw reply related

* [PATCH 2.6.25 4/4][BNX2] Update version to 1.7.2.
From: Michael Chan @ 2008-01-22  1:08 UTC (permalink / raw)
  To: davem, netdev

[BNX2] Update version to 1.7.2.

Signed-off-by: Michael Chan <mchan@broadcom.com>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 834675f..970a003 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1,6 +1,6 @@
 /* bnx2.c: Broadcom NX2 network driver.
  *
- * Copyright (c) 2004-2007 Broadcom Corporation
+ * Copyright (c) 2004-2008 Broadcom Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -56,8 +56,8 @@
 
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.7.1"
-#define DRV_MODULE_RELDATE	"December 19, 2007"
+#define DRV_MODULE_VERSION	"1.7.2"
+#define DRV_MODULE_RELDATE	"January 21, 2008"
 
 #define RUN_AT(x) (jiffies + (x))
 



^ permalink raw reply related

* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: David Miller @ 2008-01-22  0:29 UTC (permalink / raw)
  To: jarkao2; +Cc: netdev, slavon, kaber, hadi
In-Reply-To: <20080121223137.GB2758@ami.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 21 Jan 2008 23:31:37 +0100

> 
> So, let's try something easy first: #ifdef __KERNEL__. (I know there
> are many esthetes around, but since this subject looks quite dirty...)

You can't do this, the attribute is copied to the user netlink SKB
using sizeof(struct gnet_stats_rate_est) as the size (or more
specifically sizeof(*p) where p is a pointer to this structure).

Therefore this new patch has the same exact problem.

Fix this right, make a structure like:

struct kernel_gnet_stats_rate_est {
	struct gnet_stats_rate_est	est;
	void				*gen_estimator;
};

And update all the code as needed.

^ permalink raw reply

* Re: Please pull 'upstream-davem' branch of wireless-2.6 (2008-01-21)
From: David Miller @ 2008-01-22  1:05 UTC (permalink / raw)
  To: linville; +Cc: netdev, linux-wireless
In-Reply-To: <20080121202835.GA3065@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Mon, 21 Jan 2008 15:28:35 -0500

> A slew of patches intended for 2.6.25...  Included are some big updates to
> b43, iwlwifi, libertas, and rt2x00.  Also included are some sizeable
> mac80211 updates and a bunch of cleanup work from Al Viro, as well as a
> smattering of other patches.
> 
> More notable are two new drivers: ath5k and rtl8180.  Both of these
> drivers have seen a lot of development during this cycle, and both have
> been serving commendably in Fedora 8 for some time.  I think we are better
> off with them in the upstream tree now.
> 
> Please let me know if there are problems!

Pulled and pushed back out to net-2.6.25, thanks John!

^ permalink raw reply

* Re: [PATCH 2.6.25 1/4][BNX2]: Disable jumbo rx paging on 5709 Ax.
From: David Miller @ 2008-01-22  1:06 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1200964052.10010.46.camel@dell>

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 21 Jan 2008 17:07:32 -0800

> [BNX2]: Disable jumbo rx paging on 5709 Ax.
> 
> The chip has problems running in this mode, so the mode needs to be disabled.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2.6.25 2/4][BNX2]: Fix minor bug in bnx2_has_work().
From: David Miller @ 2008-01-22  1:06 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1200964081.10010.47.camel@dell>

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 21 Jan 2008 17:08:01 -0800

> [BNX2]: Fix minor bug in bnx2_has_work().
> 
> It is more correct to get the status block from the bnx2_napi struct
> instead of the bnx2 struct.  It happens that they are the same in this
> case because we are using the first MSIX vector.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2.6.25 3/4][BNX2]: Add link-down workaround on 5706 serdes.
From: David Miller @ 2008-01-22  1:07 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1200964104.10010.48.camel@dell>

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 21 Jan 2008 17:08:24 -0800

> [BNX2]: Add link-down workaround on 5706 serdes.
> 
> In some blade systems using the 5706 serdes, the hardware sometimes
> does not properly generate link down interrupts.  We add a workaround
> in the driver's timer to force a link-down when some PHY registers
> report loss of SYNC.
> 
> The parallel detect logic is cleaned up slightly to better integrate
> the workaround.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply
