Netdev List

* [patch 5/9] PCI Error Recovery: e100 network device driver
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, linas, jesse.brandeburg


From: linas@austin.ibm.com (Linas Vepstas)

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Intel Ethernet e100
device driver.  The patch has been tested, and appears to work well.
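
For readers unfamiliar with the callback flow, here is a rough userspace
sketch of the sequence the three callbacks take part in. It is a toy
model only: every toy_ name is hypothetical and stands in for the kernel
types, and the real recovery core has more states than are modelled here.

```c
#include <stddef.h>

/* Toy stand-ins for pci_ers_result_t values (hypothetical names). */
enum toy_ers_result {
	TOY_ERS_NEED_RESET,
	TOY_ERS_DISCONNECT,
	TOY_ERS_RECOVERED,
};

struct toy_dev {
	int attached;	/* 1 while the network stack may use the device */
	int resets;	/* number of slot resets the device has seen */
};

struct toy_err_handlers {
	enum toy_ers_result (*error_detected)(struct toy_dev *dev);
	enum toy_ers_result (*slot_reset)(struct toy_dev *dev);
	void (*resume)(struct toy_dev *dev);
};

/* Stop using the device and ask the core for a slot reset. */
static enum toy_ers_result toy_error_detected(struct toy_dev *dev)
{
	dev->attached = 0;
	return TOY_ERS_NEED_RESET;
}

/* Re-initialize the device after the slot has been reset. */
static enum toy_ers_result toy_slot_reset(struct toy_dev *dev)
{
	dev->resets++;
	return TOY_ERS_RECOVERED;
}

/* Let traffic flow again. */
static void toy_resume(struct toy_dev *dev)
{
	dev->attached = 1;
}

static const struct toy_err_handlers toy_handlers = {
	.error_detected = toy_error_detected,
	.slot_reset = toy_slot_reset,
	.resume = toy_resume,
};

/*
 * Model of the recovery core: returns 1 if the device came back,
 * 0 if it had to be given up.
 */
static int toy_recover(const struct toy_err_handlers *h, struct toy_dev *dev)
{
	enum toy_ers_result res = h->error_detected(dev);

	if (res == TOY_ERS_NEED_RESET)
		res = h->slot_reset(dev);
	if (res != TOY_ERS_RECOVERED)
		return 0;
	h->resume(dev);
	return 1;
}
```

Calling toy_recover(&toy_handlers, &dev) on an attached device detaches
it, resets it once, and attaches it again, which is the order in which
the callbacks in this patch are expected to run.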

Signed-off-by: Linas Vepstas <linas@linas.org>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/e100.c |   75 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 75 insertions(+)

diff -puN drivers/net/e100.c~pci-error-recovery-e100-network-device-driver drivers/net/e100.c
--- devel/drivers/net/e100.c~pci-error-recovery-e100-network-device-driver	2006-04-10 23:21:20.000000000 -0700
+++ devel-akpm/drivers/net/e100.c	2006-04-10 23:21:20.000000000 -0700
@@ -2726,6 +2726,80 @@ static void e100_shutdown(struct pci_dev
 		DPRINTK(PROBE,ERR, "Error enabling wake\n");
 }
 
+/* ------------------ PCI Error Recovery infrastructure  -------------- */
+/**
+ * e100_io_error_detected - called when PCI error is detected.
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ */
+static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+
+	/* Similar to calling e100_down(), but avoids adapter I/O. */
+	netdev->stop(netdev);
+
+	/* Detach; put netif into state similar to hotplug unplug. */
+	netif_poll_enable(netdev);
+	netif_device_detach(netdev);
+
+	/* Request a slot reset. */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * e100_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch.
+ */
+static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Only one device per card can do a reset */
+	if (0 != PCI_FUNC(pdev->devfn))
+		return PCI_ERS_RESULT_RECOVERED;
+	e100_hw_reset(nic);
+	e100_phy_init(nic);
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * e100_io_resume - resume normal operations
+ * @pdev: Pointer to PCI device
+ *
+ * Resume normal operations after an error recovery
+ * sequence has been completed.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	/* ack any pending wake events, disable PME */
+	pci_enable_wake(pdev, 0, 0);
+
+	netif_device_attach(netdev);
+	if (netif_running(netdev)) {
+		e100_open(netdev);
+		mod_timer(&nic->watchdog, jiffies);
+	}
+}
+
+static struct pci_error_handlers e100_err_handler = {
+	.error_detected = e100_io_error_detected,
+	.slot_reset = e100_io_slot_reset,
+	.resume = e100_io_resume,
+};
 
 static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
@@ -2737,6 +2811,7 @@ static struct pci_driver e100_driver = {
 	.resume =       e100_resume,
 #endif
 	.shutdown =     e100_shutdown,
+	.err_handler = &e100_err_handler,
 };
 
 static int __init e100_init_module(void)
_

* [patch 4/9] PCI Error Recovery: e1000 network device driver
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, linas, jesse.brandeburg


From: Linas Vepstas <linas@linas.org>

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Intel Gigabit Ethernet
e1000 device driver.  The patch has been tested, and appears to work well.

[akpm@osdl.org: minor cleanups]
Signed-off-by: Linas Vepstas <linas@linas.org>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/e1000/e1000_main.c |  116 ++++++++++++++++++++++++++++++-
 1 files changed, 115 insertions(+), 1 deletion(-)

diff -puN drivers/net/e1000/e1000_main.c~pci-error-recovery-e1000-network-device-driver drivers/net/e1000/e1000_main.c
--- devel/drivers/net/e1000/e1000_main.c~pci-error-recovery-e1000-network-device-driver	2006-04-22 01:39:15.000000000 -0700
+++ devel-akpm/drivers/net/e1000/e1000_main.c	2006-04-22 01:39:15.000000000 -0700
@@ -227,6 +227,16 @@ static int e1000_resume(struct pci_dev *
 static void e1000_netpoll (struct net_device *netdev);
 #endif
 
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
+                     pci_channel_state_t state);
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+	.error_detected = e1000_io_error_detected,
+	.slot_reset = e1000_io_slot_reset,
+	.resume = e1000_io_resume,
+};
 
 static struct pci_driver e1000_driver = {
 	.name     = e1000_driver_name,
@@ -236,8 +246,9 @@ static struct pci_driver e1000_driver = 
 	/* Power Managment Hooks */
 #ifdef CONFIG_PM
 	.suspend  = e1000_suspend,
-	.resume   = e1000_resume
+	.resume   = e1000_resume,
 #endif
+	.err_handler = &e1000_err_handler,
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -3076,6 +3087,10 @@ e1000_update_stats(struct e1000_adapter 
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
+	/* Prevent stats update while adapter is being reset */
+	if (adapter->link_speed == 0)
+		return;
+
 	spin_lock_irqsave(&adapter->stats_lock, flags);
 
 	/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4625,4 +4640,103 @@ e1000_netpoll(struct net_device *netdev)
 }
 #endif
 
+/**
+ * e1000_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
+						pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	netif_device_detach(netdev);
+
+	if (netif_running(netdev))
+		e1000_down(adapter);
+
+	/* Request a slot reset. */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * e1000_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch, as if from a cold-boot. Implementation
+ * resembles the first-half of the e1000_resume routine.
+ */
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e1000: Cannot re-enable PCI device after "
+				"reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	pci_enable_wake(pdev, 3, 0);
+	pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+	/* Perform card reset only on one instance of the card */
+	if (PCI_FUNC (pdev->devfn) != 0)
+		return PCI_ERS_RESULT_RECOVERED;
+
+	e1000_reset(adapter);
+	E1000_WRITE_REG(&adapter->hw, WUS, ~0);
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * e1000_io_resume - called when traffic can start flowing again.
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the error recovery driver tells us that
+ * it's OK to resume normal operation. Implementation resembles the
+ * second-half of the e1000_resume routine.
+ */
+static void e1000_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+	uint32_t manc, swsm;
+
+	if (netif_running(netdev)) {
+		if (e1000_up(adapter)) {
+			printk(KERN_ERR "e1000: can't bring device back up "
+					"after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+
+	if (adapter->hw.mac_type >= e1000_82540 &&
+	    adapter->hw.media_type == e1000_media_type_copper) {
+		manc = E1000_READ_REG(&adapter->hw, MANC);
+		manc &= ~(E1000_MANC_ARP_EN);
+		E1000_WRITE_REG(&adapter->hw, MANC, manc);
+	}
+
+	switch (adapter->hw.mac_type) {
+	case e1000_82573:
+		swsm = E1000_READ_REG(&adapter->hw, SWSM);
+		E1000_WRITE_REG(&adapter->hw, SWSM, swsm | E1000_SWSM_DRV_LOAD);
+		break;
+	default:
+		break;
+	}
+
+	if (netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
 /* e1000_main.c */
_

* [patch 2/9] tulip: NatSemi DP83840A PHY fix
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, T-Bone, grundler, jgarzik, varenet


From: Thibaut VARENE <T-Bone@parisc-linux.org>

Fix a problem with Tulip 21142 HP branded PCI cards (PN#: B5509-66001),
which feature a NatSemi DP83840A PHY.

Without this patch, it is impossible to properly initialize the card's PHY,
and it's thus impossible to monitor/configure it.

It's a timing/posting problem, and it is solved exactly the same way Grant
fixed it elsewhere already.
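
The waiting part of the fix can be sketched in userspace C as follows.
This is an illustration under assumptions, not the driver code: the
toy_ names and the fake PHY are hypothetical, the delay is a no-op where
the driver uses udelay(), and unlike the patch the helper reports
whether the reset bit actually cleared.

```c
#include <stdint.h>

#define TOY_BMCR_RESET 0x8000	/* same bit position as the MII BMCR reset bit */

/* Hypothetical fake PHY: the reset bit clears after a number of reads. */
struct toy_phy {
	int reads_until_ready;
	int reads;
};

static uint16_t toy_mdio_read(struct toy_phy *phy)
{
	phy->reads++;
	if (phy->reads_until_ready > 0) {
		phy->reads_until_ready--;
		return TOY_BMCR_RESET;
	}
	return 0;
}

/* A real driver would call udelay() here; the toy does nothing. */
static void toy_delay(void)
{
}

/*
 * Poll until the reset bit clears or the retry budget runs out.
 * Returns 1 if the PHY left reset, 0 on timeout.
 */
static int toy_wait_reset_done(struct toy_phy *phy, int timeout)
{
	while (timeout-- > 0) {
		if (!(toy_mdio_read(phy) & TOY_BMCR_RESET))
			return 1;
		toy_delay();
	}
	return 0;
}
```

The retry budget bounds the worst case: with 10 tries and a 100 us delay
per try the loop gives up after roughly 1 ms, matching the "max 1 ms"
comment in the patch.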

Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/tulip/media.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletion(-)

diff -puN drivers/net/tulip/media.c~tulip-natsemi-dp83840a-phy-fix drivers/net/tulip/media.c
--- devel/drivers/net/tulip/media.c~tulip-natsemi-dp83840a-phy-fix	2006-04-10 23:21:18.000000000 -0700
+++ devel-akpm/drivers/net/tulip/media.c	2006-04-10 23:21:18.000000000 -0700
@@ -261,11 +261,27 @@ void tulip_select_media(struct net_devic
 				u16 *reset_sequence = &((u16*)(p+3))[init_length];
 				int reset_length = p[2 + init_length*2];
 				misc_info = reset_sequence + reset_length;
-				if (startup)
+				if (startup) {
+					int timeout = 10;	/* max 1 ms */
 					for (i = 0; i < reset_length; i++)
 						iowrite32(get_u16(&reset_sequence[i]) << 16, ioaddr + CSR15);
+
+					/* flush posted writes */
+					ioread32(ioaddr + CSR15);
+
+					/* Sect 3.10.3 in DP83840A.pdf (p39) */
+					udelay(500);
+
+					/* Section 4.2 in DP83840A.pdf (p43) */
+					/* and IEEE 802.3 "22.2.4.1.1 Reset" */
+					while (timeout-- &&
+						(tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
+						udelay(100);
+				}
 				for (i = 0; i < init_length; i++)
 					iowrite32(get_u16(&init_sequence[i]) << 16, ioaddr + CSR15);
+
+				ioread32(ioaddr + CSR15);	/* flush posted writes */
 			} else {
 				u8 *init_sequence = p + 2;
 				u8 *reset_sequence = p + 3 + init_length;
_

* [patch 6/9] e1000: prevent statistics from getting garbled during reset
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, linas, jeffrey.t.kirsher, jesse.brandeburg,
	john.ronciak


From: Linas Vepstas <linas@austin.ibm.com>

If a PCI bus error/fault triggers a PCI bus reset, attempts to get the
ethernet packet count statistics from the hardware will fail, returning
garbage data upstream.  This patch skips statistics data collection if the
PCI device is not on the bus.

This patch presumes that an earlier patch,
[PATCH] PCI Error Recovery: e1000 network device driver
has already been applied.
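
The guard described above can be modelled in a few lines of userspace C.
This is a sketch under assumptions: the toy_ names are hypothetical, and
the numeric channel-state values are chosen for illustration (0 meaning
"never set", 1 meaning normal).

```c
/* Toy copies of the channel states; the numeric values are assumptions. */
enum toy_channel_state {
	TOY_IO_UNSET = 0,	/* field never initialized */
	TOY_IO_NORMAL = 1,
	TOY_IO_FROZEN = 2,
	TOY_IO_PERM_FAILURE = 3,
};

struct toy_adapter {
	int link_speed;			/* 0 while the adapter is being reset */
	enum toy_channel_state error_state;
	unsigned long rx_packets;	/* accumulated statistic */
};

/*
 * Add a hardware counter reading to the running total, unless the
 * reading cannot be trusted. Returns 1 if the stats were updated.
 */
static int toy_update_stats(struct toy_adapter *a, unsigned long hw_delta)
{
	if (a->link_speed == 0)
		return 0;
	if (a->error_state && a->error_state != TOY_IO_NORMAL)
		return 0;
	a->rx_packets += hw_delta;
	return 1;
}
```

The two-part test on error_state treats an unset field the same as a
normal channel, so the guard only skips the update when an error state
has actually been recorded.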

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/e1000/e1000_main.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletion(-)

diff -puN drivers/net/e1000/e1000_main.c~e1000-prevent-statistics-from-getting-garbled-during-reset drivers/net/e1000/e1000_main.c
--- devel/drivers/net/e1000/e1000_main.c~e1000-prevent-statistics-from-getting-garbled-during-reset	2006-04-14 23:41:34.000000000 -0700
+++ devel-akpm/drivers/net/e1000/e1000_main.c	2006-04-14 23:41:34.000000000 -0700
@@ -3082,14 +3082,20 @@ void
 e1000_update_stats(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
+	struct pci_dev *pdev = adapter->pdev;
 	unsigned long flags;
 	uint16_t phy_tmp;
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
-	/* Prevent stats update while adapter is being reset */
+	/*
+	 * Prevent stats update while adapter is being reset, or if the pci
+	 * connection is down.
+	 */
 	if (adapter->link_speed == 0)
 		return;
+	if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
+		return;
 
 	spin_lock_irqsave(&adapter->stats_lock, flags);
 
_

* [patch 9/9] via-rhine: zero pad short packets on Rhine I ethernet cards
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, craigbrind, rl

From: Craig Brind <craigbrind@gmail.com>

Fixes Rhine I cards disclosing fragments of previously transmitted frames
in new transmissions.

Before transmission, any socket buffer (skb) shorter than the ethernet
minimum length of 60 bytes was zero-padded.  On Rhine I cards the data can
later be copied into an aligned transmission buffer without copying this
padding.  This resulted in the transmission of the frame with the extra
bytes beyond the provided content leaking the previous contents of this
buffer on to the network.

Now zero-padding is repeated in the local aligned buffer if one is used.
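
The copy-then-pad step can be sketched in userspace C like this. The
helper name and TOY_ETH_ZLEN are hypothetical stand-ins; the driver uses
skb_copy_and_csum_dev() and ETH_ZLEN, and the caller here is assumed to
provide a buffer of at least 60 bytes.

```c
#include <string.h>

#define TOY_ETH_ZLEN 60	/* minimum frame length, as stated in the text */

/*
 * Copy a frame into a transmit buffer and zero the tail of short frames.
 * tx_buf must hold at least max(len, TOY_ETH_ZLEN) bytes.
 * Returns the number of bytes that would go on the wire.
 */
static size_t toy_copy_and_pad(unsigned char *tx_buf,
			       const unsigned char *frame, size_t len)
{
	memcpy(tx_buf, frame, len);
	if (len < TOY_ETH_ZLEN) {
		memset(tx_buf + len, 0, TOY_ETH_ZLEN - len);
		return TOY_ETH_ZLEN;
	}
	return len;
}
```

Without the memset(), bytes len..59 of the reused buffer would still
hold whatever an earlier frame left there, which is the leak described
above.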

Following a suggestion from the via-rhine maintainer, no attempt is made
here to avoid the duplicated effort of padding the skb if it is known that
an aligned buffer will definitely be used.  This is to make the change
"obviously correct" and allow it to be applied to a stable kernel if
necessary.  There is no change to the flow of control and the changes are
only to the Rhine I code path.

The patch has run on an in-service Rhine-I host without incident.  Frames
shorter than 60 bytes are now correctly zero-padded when captured on a
separate host.  I see no unusual stats reported by ifconfig, and no unusual
log messages.

Signed-off-by: Craig Brind <craigbrind@gmail.com>
Signed-off-by: Roger Luethi <rl@hellgate.ch>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/via-rhine.c |    6 ++++++
 1 files changed, 6 insertions(+)

diff -puN drivers/net/via-rhine.c~via-rhine-zero-pad-short-packets-on-rhine-i-ethernet-cards drivers/net/via-rhine.c
--- devel/drivers/net/via-rhine.c~via-rhine-zero-pad-short-packets-on-rhine-i-ethernet-cards	2006-04-24 18:22:43.000000000 -0700
+++ devel-akpm/drivers/net/via-rhine.c	2006-04-24 18:22:43.000000000 -0700
@@ -129,6 +129,7 @@
 	- Massive clean-up
 	- Rewrite PHY, media handling (remove options, full_duplex, backoff)
 	- Fix Tx engine race for good
+	- Craig Brind: Zero padded aligned buffers for short packets.

 */

@@ -1326,7 +1327,12 @@ static int rhine_start_tx(struct sk_buff
 			rp->stats.tx_dropped++;
 			return 0;
 		}
+
+		/* Padding is not copied and so must be redone. */
 		skb_copy_and_csum_dev(skb, rp->tx_buf[entry]);
+		if (skb->len < ETH_ZLEN)
+			memset(rp->tx_buf[entry] + skb->len, 0,
+			       ETH_ZLEN - skb->len);
 		rp->tx_skbuff_dma[entry] = 0;
 		rp->tx_ring[entry].addr = cpu_to_le32(rp->tx_bufs_dma +
 						      (rp->tx_buf[entry] -
_

* [patch 7/9] e100: disable interrupts at boot
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff
  Cc: netdev, akpm, bjorn.helgaas, jeffrey.t.kirsher, jesse.brandeburg,
	john.ronciak, nils.rennebarth, stern


From: Bjorn Helgaas <bjorn.helgaas@hp.com>

Apparently the Intel PRO/100 device enables interrupts on reset.  Unless
firmware explicitly disables PRO/100 interrupts, we can get a flood of
interrupts when a driver attaches to an unrelated device that happens to
share the PRO/100 IRQ.

This should resolve this "irq 11: nobody cared" bug report:
    http://bugzilla.kernel.org/show_bug.cgi?id=5918
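
The core of the quirk is a read-check-write on one register byte. A
userspace sketch, assuming the register block is an ordinary byte array
(the helper name is hypothetical; the offset 3 and the value 1 are taken
from the patch below):

```c
#include <stdint.h>

#define TOY_SCB_CMD_HI 3	/* byte offset used by the patch: csr + 3 */

/* Returns 1 if interrupts were found enabled and have now been masked. */
static int toy_mask_e100_irq(volatile uint8_t *csr)
{
	if (csr[TOY_SCB_CMD_HI] != 0)
		return 0;	/* not in the all-enabled state: leave it */
	csr[TOY_SCB_CMD_HI] = 1;	/* write the mask value */
	return 1;
}
```

A second call on the same registers does nothing, so the check is safe
to run on a device whose firmware already masked the interrupts.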

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: <stern@rowland.harvard.edu>
Cc: <nils.rennebarth@packetalarm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/pci/quirks.c |   57 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 57 insertions(+)

diff -puN drivers/pci/quirks.c~e100-disable-interrupts-at-boot drivers/pci/quirks.c
--- devel/drivers/pci/quirks.c~e100-disable-interrupts-at-boot	2006-04-14 23:41:34.000000000 -0700
+++ devel-akpm/drivers/pci/quirks.c	2006-04-14 23:41:34.000000000 -0700
@@ -1374,6 +1374,63 @@ static void __devinit quirk_netmos(struc
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID, quirk_netmos);
 
+static void __devinit quirk_e100_interrupt(struct pci_dev *dev)
+{
+	u16 command;
+	u32 bar;
+	u8 __iomem *csr;
+	u8 cmd_hi;
+
+	switch (dev->device) {
+	/* PCI IDs taken from drivers/net/e100.c */
+	case 0x1029:
+	case 0x1030 ... 0x1034:
+	case 0x1038 ... 0x103E:
+	case 0x1050 ... 0x1057:
+	case 0x1059:
+	case 0x1064 ... 0x106B:
+	case 0x1091 ... 0x1095:
+	case 0x1209:
+	case 0x1229:
+	case 0x2449:
+	case 0x2459:
+	case 0x245D:
+	case 0x27DC:
+		break;
+	default:
+		return;
+	}
+
+	/*
+	 * Some firmware hands off the e100 with interrupts enabled,
+	 * which can cause a flood of interrupts if packets are
+	 * received before the driver attaches to the device.  So
+	 * disable all e100 interrupts here.  The driver will
+	 * re-enable them when it's ready.
+	 */
+	pci_read_config_word(dev, PCI_COMMAND, &command);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &bar);
+
+	if (!(command & PCI_COMMAND_MEMORY) || !bar)
+		return;
+
+	csr = ioremap(bar, 8);
+	if (!csr) {
+		printk(KERN_WARNING "PCI: Can't map %s e100 registers\n",
+			pci_name(dev));
+		return;
+	}
+
+	cmd_hi = readb(csr + 3);
+	if (cmd_hi == 0) {
+		printk(KERN_WARNING "PCI: Firmware left %s e100 interrupts "
+			"enabled, disabling\n", pci_name(dev));
+		writeb(1, csr + 3);
+	}
+
+	iounmap(csr);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_e100_interrupt);
 
 static void __devinit fixup_rev1_53c810(struct pci_dev* dev)
 {
_

* [patch 8/9] forcedeth: suggested cleanups
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, ioe-lkml, manfred


From: Ingo Oeser <ioe-lkml@rameria.de>

general:
	- endian annotation of the ring descriptors

nv_getlen():
	- use htons() instead of __constant_htons()
          to improve readability and let the compiler constant fold it.

nv_rx_process():
	- use a real for() loop in processing instead of goto and break
	- consolidate rx_errors increment
	- count detected rx_length_errors
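
The nv_rx_process() items above amount to the following loop shape,
shown here as a userspace sketch. The toy_ flags and struct are
hypothetical; only the control flow (a real for loop, "continue" for the
normal paths, one shared error label) mirrors the patch.

```c
#include <stddef.h>

#define TOY_VALID   0x1	/* descriptor holds a frame */
#define TOY_CRCERR  0x2	/* frame failed the CRC check */
#define TOY_LENERR  0x4	/* frame length is inconsistent */

struct toy_stats {
	unsigned long rx_packets;
	unsigned long rx_errors;
	unsigned long rx_crc_errors;
	unsigned long rx_length_errors;
};

/* Walk the descriptors once; every error path bumps rx_errors in one place. */
static void toy_rx_process(const unsigned int *flags, size_t n,
			   struct toy_stats *st)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(flags[i] & TOY_VALID))
			continue;	/* nothing to count for this slot */
		if (flags[i] & TOY_CRCERR) {
			st->rx_crc_errors++;
			goto error_pkt;
		}
		if (flags[i] & TOY_LENERR) {
			st->rx_length_errors++;
			goto error_pkt;
		}
		st->rx_packets++;
		continue;
error_pkt:
		st->rx_errors++;
	}
}
```

Because the loop increment runs on both "continue" and fall-through from
error_pkt, the index advances exactly once per descriptor on every path.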

Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/forcedeth.c |   59 ++++++++++++++++----------------------
 1 files changed, 26 insertions(+), 33 deletions(-)

diff -puN drivers/net/forcedeth.c~forcedeth-suggested-cleanups drivers/net/forcedeth.c
--- devel/drivers/net/forcedeth.c~forcedeth-suggested-cleanups	2006-04-10 23:21:26.000000000 -0700
+++ devel-akpm/drivers/net/forcedeth.c	2006-04-10 23:21:26.000000000 -0700
@@ -328,17 +328,18 @@ enum {
 	NvRegMSIXIrqStatus = 0x3f0,
 };
 
-/* Big endian: should work, but is untested */
+/* Big endian: should work, but is untested.
+ * So give arch maintainers a hint here. -ioe */
 struct ring_desc {
-	u32 PacketBuffer;
-	u32 FlagLen;
+	__le32 PacketBuffer;
+	__le32 FlagLen;
 };
 
 struct ring_desc_ex {
-	u32 PacketBufferHigh;
-	u32 PacketBufferLow;
-	u32 TxVlan;
-	u32 FlagLen;
+	__le32 PacketBufferHigh;
+	__le32 PacketBufferLow;
+	__le32 TxVlan;
+	__le32 FlagLen;
 };
 
 typedef union _ring_type {
@@ -1403,7 +1404,7 @@ static int nv_getlen(struct net_device *
 	int protolen;	/* length as stored in the proto field */
 
 	/* 1) calculate len according to header */
-	if ( ((struct vlan_ethhdr *)packet)->h_vlan_proto == __constant_htons(ETH_P_8021Q)) {
+	if (((struct vlan_ethhdr *)packet)->h_vlan_proto == htons(ETH_P_8021Q)) {
 		protolen = ntohs( ((struct vlan_ethhdr *)packet)->h_vlan_encapsulated_proto );
 		hdrlen = VLAN_HLEN;
 	} else {
@@ -1453,12 +1454,10 @@ static void nv_rx_process(struct net_dev
 	u32 vlanflags = 0;
 
 
-	for (;;) {
+	for (; np->cur_rx - np->refill_rx < RX_RING; np->cur_rx++) {
 		struct sk_buff *skb;
 		int len;
 		int i;
-		if (np->cur_rx - np->refill_rx >= RX_RING)
-			break;	/* we scanned the whole ring - do not continue */
 
 		i = np->cur_rx % RX_RING;
 		if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
@@ -1498,33 +1497,29 @@ static void nv_rx_process(struct net_dev
 		/* look at what we actually got: */
 		if (np->desc_ver == DESC_VER_1) {
 			if (!(Flags & NV_RX_DESCRIPTORVALID))
-				goto next_pkt;
+				continue;
 
 			if (Flags & NV_RX_ERROR) {
 				if (Flags & NV_RX_MISSEDFRAME) {
 					np->stats.rx_missed_errors++;
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & (NV_RX_ERROR1|NV_RX_ERROR2|NV_RX_ERROR3)) {
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX_CRCERR) {
 					np->stats.rx_crc_errors++;
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX_OVERFLOW) {
 					np->stats.rx_over_errors++;
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX_ERROR4) {
 					len = nv_getlen(dev, np->rx_skbuff[i]->data, len);
 					if (len < 0) {
-						np->stats.rx_errors++;
-						goto next_pkt;
+						np->stats.rx_length_errors++;
+						goto error_pkt;
 					}
 				}
 				/* framing errors are soft errors. */
@@ -1536,28 +1531,25 @@ static void nv_rx_process(struct net_dev
 			}
 		} else {
 			if (!(Flags & NV_RX2_DESCRIPTORVALID))
-				goto next_pkt;
+				continue;
 
 			if (Flags & NV_RX2_ERROR) {
 				if (Flags & (NV_RX2_ERROR1|NV_RX2_ERROR2|NV_RX2_ERROR3)) {
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX2_CRCERR) {
 					np->stats.rx_crc_errors++;
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX2_OVERFLOW) {
 					np->stats.rx_over_errors++;
-					np->stats.rx_errors++;
-					goto next_pkt;
+					goto error_pkt;
 				}
 				if (Flags & NV_RX2_ERROR4) {
 					len = nv_getlen(dev, np->rx_skbuff[i]->data, len);
 					if (len < 0) {
-						np->stats.rx_errors++;
-						goto next_pkt;
+						np->stats.rx_length_errors++;
+						goto error_pkt;
 					}
 				}
 				/* framing errors are soft errors */
@@ -1593,8 +1585,9 @@ static void nv_rx_process(struct net_dev
 		dev->last_rx = jiffies;
 		np->stats.rx_packets++;
 		np->stats.rx_bytes += len;
-next_pkt:
-		np->cur_rx++;
+		continue;
+error_pkt:
+		np->stats.rx_errors++;
 	}
 }
 
_

* [patch 3/9] natsemi: Add support for using MII port with no PHY
From: akpm @ 2006-04-27  9:30 UTC
  To: jeff; +Cc: netdev, akpm, broonie, jgarzik, thockin


From: Mark Brown <broonie@sirena.org.uk>

Provide a module option which configures the natsemi driver to use the
external MII port on the chip but ignore any PHYs that may be attached to it. 
The link state will be left as it was when the driver started and can be
configured via ethtool.  Any PHYs that are present can be accessed via the MII
ioctl()s.

This is useful for systems where the device is connected without a PHY or
where either information or actions outside the scope of the driver are
required in order to use the PHYs.
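
How the option word maps to the new behaviour can be sketched in
userspace C. The toy_ names are hypothetical; the masks 0x200 and 0x400
and the "assume carrier on" rule are taken from the diff below, and the
sketch leaves out everything else the probe routine does.

```c
#define TOY_OPT_MEDIA_MASK  0x00f	/* lower four bits: media type */
#define TOY_OPT_FULL_DUPLEX 0x200
#define TOY_OPT_IGNORE_PHY  0x400

struct toy_cfg {
	int media;
	int full_duplex;
	int ignore_phy;
	int carrier_on;	/* initial carrier state reported to the stack */
};

static struct toy_cfg toy_parse_option(unsigned int option)
{
	struct toy_cfg cfg;

	cfg.media = option & TOY_OPT_MEDIA_MASK;
	cfg.full_duplex = (option & TOY_OPT_FULL_DUPLEX) != 0;
	cfg.ignore_phy = (option & TOY_OPT_IGNORE_PHY) != 0;
	/* With no PHY to consult, the link is simply assumed to be up. */
	cfg.carrier_on = cfg.ignore_phy;
	return cfg;
}
```

For example, toy_parse_option(0x400) yields ignore_phy set and the
carrier reported as on, while toy_parse_option(0) leaves the carrier off
until a link check finds one.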

Signed-off-by: Mark Brown <broonie@sirena.org.uk>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/natsemi.c |  105 +++++++++++++++++++++++++---------------
 1 files changed, 67 insertions(+), 38 deletions(-)

diff -puN drivers/net/natsemi.c~natsemi-add-support-for-using-mii-port-with-no-phy drivers/net/natsemi.c
--- devel/drivers/net/natsemi.c~natsemi-add-support-for-using-mii-port-with-no-phy	2006-04-10 23:21:19.000000000 -0700
+++ devel-akpm/drivers/net/natsemi.c	2006-04-10 23:21:19.000000000 -0700
@@ -259,7 +259,7 @@ MODULE_PARM_DESC(debug, "DP8381x default
 MODULE_PARM_DESC(rx_copybreak, 
 	"DP8381x copy breakpoint for copy-only-tiny-frames");
 MODULE_PARM_DESC(options, 
-	"DP8381x: Bits 0-3: media type, bit 17: full duplex");
+	"DP8381x: Bits 0-3: media type, bit 17: full duplex, bit 18: ignore PHY");
 MODULE_PARM_DESC(full_duplex, "DP8381x full duplex setting(s) (1)");
 
 /*
@@ -690,6 +690,8 @@ struct netdev_private {
 	u32 intr_status;
 	/* Do not touch the nic registers */
 	int hands_off;
+	/* Don't pay attention to the reported link state. */
+	int ignore_phy;
 	/* external phy that is used: only valid if dev->if_port != PORT_TP */
 	int mii;
 	int phy_addr_external;
@@ -894,7 +896,19 @@ static int __devinit natsemi_probe1 (str
 	np->intr_status = 0;
 	np->eeprom_size = NATSEMI_DEF_EEPROM_SIZE;
 
+	option = find_cnt < MAX_UNITS ? options[find_cnt] : 0;
+	if (dev->mem_start)
+		option = dev->mem_start;
+
+	/* Ignore the PHY status? */
+	if (option & 0x400) {
+		np->ignore_phy = 1;
+	} else {
+		np->ignore_phy = 0;
+	}
+
 	/* Initial port:
+	 * - If configured to ignore the PHY set up for external.
 	 * - If the nic was configured to use an external phy and if find_mii
 	 *   finds a phy: use external port, first phy that replies.
 	 * - Otherwise: internal port.
@@ -902,7 +916,7 @@ static int __devinit natsemi_probe1 (str
 	 * The address would be used to access a phy over the mii bus, but
 	 * the internal phy is accessed through mapped registers.
 	 */
-	if (readl(ioaddr + ChipConfig) & CfgExtPhy)
+	if (np->ignore_phy || readl(ioaddr + ChipConfig) & CfgExtPhy)
 		dev->if_port = PORT_MII;
 	else
 		dev->if_port = PORT_TP;
@@ -912,7 +926,9 @@ static int __devinit natsemi_probe1 (str
 
 	if (dev->if_port != PORT_TP) {
 		np->phy_addr_external = find_mii(dev);
-		if (np->phy_addr_external == PHY_ADDR_NONE) {
+		/* If we're ignoring the PHY it doesn't matter if we can't
+		 * find one. */
+		if (!np->ignore_phy && np->phy_addr_external == PHY_ADDR_NONE) {
 			dev->if_port = PORT_TP;
 			np->phy_addr_external = PHY_ADDR_INTERNAL;
 		}
@@ -920,10 +936,6 @@ static int __devinit natsemi_probe1 (str
 		np->phy_addr_external = PHY_ADDR_INTERNAL;
 	}
 
-	option = find_cnt < MAX_UNITS ? options[find_cnt] : 0;
-	if (dev->mem_start)
-		option = dev->mem_start;
-
 	/* The lower four bits are the media type. */
 	if (option) {
 		if (option & 0x200)
@@ -957,7 +969,10 @@ static int __devinit natsemi_probe1 (str
 	if (mtu)
 		dev->mtu = mtu;
 
-	netif_carrier_off(dev);
+	if (np->ignore_phy)
+		netif_carrier_on(dev);
+	else
+		netif_carrier_off(dev);
 
 	/* get the initial settings from hardware */
 	tmp            = mdio_read(dev, MII_BMCR);
@@ -1005,6 +1020,8 @@ static int __devinit natsemi_probe1 (str
 		printk("%02x, IRQ %d", dev->dev_addr[i], irq);
 		if (dev->if_port == PORT_TP)
 			printk(", port TP.\n");
+		else if (np->ignore_phy)
+			printk(", port MII, ignoring PHY\n");
 		else
 			printk(", port MII, phy ad %d.\n", np->phy_addr_external);
 	}
@@ -1685,42 +1702,44 @@ static void check_link(struct net_device
 {
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem * ioaddr = ns_ioaddr(dev);
-	int duplex;
+	int duplex = np->full_duplex;
 	u16 bmsr;
-       
-	/* The link status field is latched: it remains low after a temporary
-	 * link failure until it's read. We need the current link status,
-	 * thus read twice.
-	 */
-	mdio_read(dev, MII_BMSR);
-	bmsr = mdio_read(dev, MII_BMSR);
 
-	if (!(bmsr & BMSR_LSTATUS)) {
-		if (netif_carrier_ok(dev)) {
+	/* If we're not paying attention to the PHY status then don't check. */
+	if (!np->ignore_phy) {
+		/* The link status field is latched: it remains low
+		 * after a temporary link failure until it's read. We
+		 * need the current link status, thus read twice.
+		 */
+		mdio_read(dev, MII_BMSR);
+		bmsr = mdio_read(dev, MII_BMSR);
+
+		if (!(bmsr & BMSR_LSTATUS)) {
+			if (netif_carrier_ok(dev)) {
+				if (netif_msg_link(np))
+					printk(KERN_NOTICE "%s: link down.\n",
+					       dev->name);
+				netif_carrier_off(dev);
+				undo_cable_magic(dev);
+			}
+			return;
+		}
+		if (!netif_carrier_ok(dev)) {
 			if (netif_msg_link(np))
-				printk(KERN_NOTICE "%s: link down.\n",
-					dev->name);
-			netif_carrier_off(dev);
-			undo_cable_magic(dev);
+				printk(KERN_NOTICE "%s: link up.\n", dev->name);
+			netif_carrier_on(dev);
+			do_cable_magic(dev);
 		}
-		return;
-	}
-	if (!netif_carrier_ok(dev)) {
-		if (netif_msg_link(np))
-			printk(KERN_NOTICE "%s: link up.\n", dev->name);
-		netif_carrier_on(dev);
-		do_cable_magic(dev);
-	}
 
-	duplex = np->full_duplex;
-	if (!duplex) {
-		if (bmsr & BMSR_ANEGCOMPLETE) {
-			int tmp = mii_nway_result(
-				np->advertising & mdio_read(dev, MII_LPA));
-			if (tmp == LPA_100FULL || tmp == LPA_10FULL)
+		if (!duplex) {
+			if (bmsr & BMSR_ANEGCOMPLETE) {
+				int tmp = mii_nway_result(
+					np->advertising & mdio_read(dev, MII_LPA));
+				if (tmp == LPA_100FULL || tmp == LPA_10FULL)
+					duplex = 1;
+			} else if (mdio_read(dev, MII_BMCR) & BMCR_FULLDPLX)
 				duplex = 1;
-		} else if (mdio_read(dev, MII_BMCR) & BMCR_FULLDPLX)
-			duplex = 1;
+		}
 	}
 
 	/* if duplex is set then bit 28 must be set, too */
@@ -2936,6 +2955,16 @@ static int netdev_set_ecmd(struct net_de
 	}
 
 	/*
+	 * If we're ignoring the PHY then autoneg and the internal
+	 * transceiver are really not going to work so don't let the
+	 * user select them.
+	 */
+	if (np->ignore_phy && (ecmd->autoneg == AUTONEG_ENABLE ||
+			       ecmd->port == PORT_TP)) {
+		return -EINVAL;
+	}
+
+	/*
 	 * maxtxpkt, maxrxpkt: ignored for now.
 	 *
 	 * transceiver:
_

* Re: [patch 2/9] tulip: NatSemi DP83840A PHY fix
From: Jeff Garzik @ 2006-04-27  9:52 UTC
  To: akpm; +Cc: netdev, T-Bone, grundler, jgarzik, varenet, Francois Romieu
In-Reply-To: <200604270932.k3R9W4Xj025312@shell0.pdx.osdl.net>

akpm@osdl.org wrote:
> +				if (startup) {
> +					int timeout = 10;	/* max 1 ms */
>  					for (i = 0; i < reset_length; i++)
>  						iowrite32(get_u16(&reset_sequence[i]) << 16, ioaddr + CSR15);
> +
> +					/* flush posted writes */
> +					ioread32(ioaddr + CSR15);
> +
> +					/* Sect 3.10.3 in DP83840A.pdf (p39) */
> +					udelay(500);
> +
> +					/* Section 4.2 in DP83840A.pdf (p43) */
> +					/* and IEEE 802.3 "22.2.4.1.1 Reset" */
> +					while (timeout-- &&
> +						(tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
> +						udelay(100);


What can we do about this?

It's a huge delay to be taken inside a spinlock.

Anybody interested in converting the driver to use schedule_work() or
similar?

	Jeff



* Re: [patch 3/9] natsemi: Add support for using MII port with no PHY
From: Jeff Garzik @ 2006-04-27  9:54 UTC
  To: akpm; +Cc: netdev, broonie, jgarzik, thockin
In-Reply-To: <200604270932.k3R9W5lK025315@shell0.pdx.osdl.net>

akpm@osdl.org wrote:
> From: Mark Brown <broonie@sirena.org.uk>
> 
> Provide a module option which configures the natsemi driver to use the
> external MII port on the chip but ignore any PHYs that may be attached to it. 
> The link state will be left as it was when the driver started and can be
> configured via ethtool.  Any PHYs that are present can be accessed via the MII
> ioctl()s.
> 
> This is useful for systems where the device is connected without a PHY or
> where either information or actions outside the scope of the driver are
> required in order to use the PHYs.
> 
> Signed-off-by: Mark Brown <broonie@sirena.org.uk>
> Cc: Tim Hockin <thockin@hockin.org>
> Cc: Jeff Garzik <jgarzik@pobox.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>

NAK.

The proper way to do this is via the force_media boolean flag found in 
several net drivers.

In general I agree with the motivation to do something like this...

	Jeff




^ permalink raw reply

* Re: [patch 8/9] forcedeth: suggested cleanups
From: Jeff Garzik @ 2006-04-27  9:56 UTC (permalink / raw)
  To: akpm; +Cc: netdev, ioe-lkml, manfred
In-Reply-To: <200604270932.k3R9WD14025332@shell0.pdx.osdl.net>

akpm@osdl.org wrote:
> From: Ingo Oeser <ioe-lkml@rameria.de>
> 
> general:
> 	- endian annotation of the ring descriptors
> 
> nv_getlen():
> 	- use htons() instead of __constant_htons()
>           to improve readability and let the compiler constant fold it.
> 
> nv_rx_process():
> 	- use a real for() loop in processing instead of goto and break
> 	- consolidate rx_errors increment
> 	- count detected rx_length_errors
> 
> Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
> Cc: Manfred Spraul <manfred@colorfullife.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>

ACK, but I'm holding off applying this while trying to get 4 apply-able 
patches out of NVIDIA and Manfred :)

	Jeff




^ permalink raw reply

* Re: [patch 7/9] e100: disable interrupts at boot
From: Jeff Garzik @ 2006-04-27 10:00 UTC (permalink / raw)
  To: akpm
  Cc: netdev, bjorn.helgaas, jeffrey.t.kirsher, jesse.brandeburg,
	john.ronciak, nils.rennebarth, stern
In-Reply-To: <200604270932.k3R9WBhr025327@shell0.pdx.osdl.net>

akpm@osdl.org wrote:
> From: Bjorn Helgaas <bjorn.helgaas@hp.com>
> 
> Apparently the Intel PRO/100 device enables interrupts on reset.  Unless
> firmware explicitly disables PRO/100 interrupts, we can get a flood of
> interrupts when a driver attaches to an unrelated device that happens to
> share the PRO/100 IRQ.
> 
> This should resolve this "irq 11: nobody cared" bug report:
>     http://bugzilla.kernel.org/show_bug.cgi?id=5918
> 
> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: John Ronciak <john.ronciak@intel.com>
> Cc: <stern@rowland.harvard.edu>
> Cc: <nils.rennebarth@packetalarm.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>

I'm really worried that we are burdening the kernel for a very very rare 
condition.  Do we want to apply this for one stupid firmware?

Can't early userspace just run setpci to fix this one?

	Jeff




^ permalink raw reply

* Re: [patch 2/9] tulip: NatSemi DP83840A PHY fix
From: Thibaut VARENE @ 2006-04-27 10:21 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: akpm, netdev, grundler, jgarzik, Francois Romieu
In-Reply-To: <44509477.4010001@garzik.org>

On 4/27/06, Jeff Garzik <jeff@garzik.org> wrote:
> akpm@osdl.org wrote:
> > +                             if (startup) {
> > +                                     int timeout = 10;       /* max 1 ms */
> >                                       for (i = 0; i < reset_length; i++)
> >                                               iowrite32(get_u16(&reset_sequence[i]) << 16, ioaddr + CSR15);
> > +
> > +                                     /* flush posted writes */
> > +                                     ioread32(ioaddr + CSR15);
> > +
> > +                                     /* Sect 3.10.3 in DP83840A.pdf (p39) */
> > +                                     udelay(500);
> > +
> > +                                     /* Section 4.2 in DP83840A.pdf (p43) */
> > +                                     /* and IEEE 802.3 "22.2.4.1.1 Reset" */
> > +                                     while (timeout-- &&
> > +                                             (tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
> > +                                             udelay(100);
>
>
> What can we do about this?
>
> It's a huge delay to be taken inside a spinlock.

This is device setup code. ISTR Grant showing other similar examples
of delays in such code in the kernel. Unless you keep
configuring/deconfiguring the device, and assuming you hit worst case
scenario every time, it won't be a problem. But if you're doing that,
you already have a problem elsewhere. Or am I missing something?
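
For scale, the quoted sequence can be modelled in user space. This is
only a sketch: fake_mdio_read() is a hypothetical stand-in for
tulip_mdio_read(), BMCR_RESET is assumed to be bit 15 of the MII control
register, and the time spent in the MDIO reads themselves is not counted.
Under those assumptions the busy-wait is about 500 us when the PHY is
already out of reset and about 1500 us when the poll times out (the
"max 1 ms" comment covers only the polling loop).

```c
#define BMCR_RESET 0x8000	/* assumed: self-clearing reset bit, MII_BMCR bit 15 */

/* Hypothetical stand-in for tulip_mdio_read(): reports the reset bit as
 * still set for the first `polls_left` reads, then as cleared. */
static int polls_left;

static int fake_mdio_read(void)
{
	if (polls_left > 0) {
		polls_left--;
		return BMCR_RESET;
	}
	return 0;
}

/* Total microseconds the quoted sequence would spend in udelay(). */
static unsigned int reset_wait_us(int busy_polls)
{
	unsigned int waited = 0;
	int timeout = 10;

	polls_left = busy_polls;

	waited += 500;			/* udelay(500) after the reset sequence */

	while (timeout-- && (fake_mdio_read() & BMCR_RESET))
		waited += 100;		/* udelay(100) per poll */

	return waited;
}
```

Calling reset_wait_us() with the number of polls for which the PHY keeps
the reset bit set gives the delay for that case; any value of 10 or more
hits the timeout.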

> Anybody interested in converting the driver to use schedule_work() or
> similar?

That question was raised months ago without any significant
outcome. Maybe it's time to move on? This code does respect hardware
specs, at least, which isn't the case of existing code, and fixes a
bug...

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Evgeniy Polyakov @ 2006-04-27 11:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: kelly, rusty, netdev
In-Reply-To: <20060426.232501.119306252.davem@davemloft.net>

On Wed, Apr 26, 2006 at 11:25:01PM -0700, David S. Miller (davem@davemloft.net) wrote:
> > We approached this from the understanding that an intelligent NIC
> > will be able to transition directly to userspace, which is a major
> > win.  0 copies to userspace would be sweet.  I think we can still
> > achieve this using your scheme without *too* much pain.
> 
> Understood.  What's your basic idea?  Just make the buffers in the
> pool large enough to fit the SKB encapsulation at the end?

There are some caveats here, found while developing a zero-copy sniffer
[1]. The project's goal was to remap skbs into userspace in real time.
While the absolute numbers (posted to netdev@) were really high, this is
only applicable to read-only applications. As was shown in the IOAT thread,
data must be warmed in caches, so reading from the mapped area will be as
fast as memcpy() (read+write), and copy_to_user() is actually almost equal
to memcpy() (benchmarks were posted to netdev@). And we must add the
remapping overhead.

If we want to DMA data from the NIC into a premapped userspace area, this
will run into problems with message sizes, misalignment, slow reads and so
on, so preallocation has even more problems.

This change also requires significant changes in applications, at least
until recv/send are changed, which is not the best thing to do.

So I think that the mapping itself can be done as some additional socket
option or something not turned on by default.

I do think that the significant win in VJ's tests comes not from remapping
and cache-oriented changes, but from moving all protocol processing into
the process' context.

I fully agree with Dave that it must be implemented step by step, and
the most significant step, IMHO, is moving protocol processing into the
socket's "place". This will force netfilter changes, but I do think that
for the proof-of-concept code we can turn it off.

I will start to work in this direction next week after aio_sendfile() is
completed.

So, we will have three attempts to write incompatible stacks - and that is good :)
No one needs an excuse to rewrite something, as I read in Rusty's blog...

Thanks.

[1]. http://tservice.net.ru/~s0mbre/old/?section=projects&item=af_tlb

-- 
	Evgeniy Polyakov

^ permalink raw reply

* [IPSEC]: Fix IP ID selection
From: Herbert Xu @ 2006-04-27 11:56 UTC (permalink / raw)
  To: David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 800 bytes --]

Hi Dave:

I was looking through the xfrm input/output code in order to abstract
out the address family specific encapsulation/decapsulation code.  During
that process I found this bug in the IP ID selection code in xfrm4_output.c.

At that point dst is still the xfrm_dst for the current SA which
represents an internal flow as far as the IPsec tunnel is concerned.
Since the IP ID is going to sit on the outside of the encapsulated
packet, we obviously want the external flow which is just dst->child.

The fix is trivial.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: xfrm4-output-fid.patch --]
[-- Type: text/plain, Size: 437 bytes --]

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 32ad229..4ef8efa 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -62,7 +62,7 @@
 	top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ?
 		0 : (iph->frag_off & htons(IP_DF));
 	if (!top_iph->frag_off)
-		__ip_select_ident(top_iph, dst, 0);
+		__ip_select_ident(top_iph, dst->child, 0);

 	top_iph->ttl = dst_metric(dst->child, RTAX_HOPLIMIT);

^ permalink raw reply related

* [PATCH] DECnet: Fix level1 router hello
From: Patrick Caulfield @ 2006-04-27 12:37 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Steven Whitehouse

This patch fixes hello messages sent when a node is a level 1 router. Slightly
contrary to the spec (maybe), VMS ignores hello messages that do not name
level2 routers that it also knows about.

So, here we simply name all the routers that the node knows about rather than
just other level1 routers.
(I hope the patch is clearer than the description. sorry).
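
The change in filtering can be sketched as two predicates. This is a toy
model, not the kernel code: the TOY_NDFLAG_* values and both function names
are made up for illustration, the per-device check in neigh_elist_cb() is
left out, and "forwarding == 1" is taken to mean "level 1 router" as in the
removed lines of the patch.

```c
#define TOY_NDFLAG_R1 0x0001	/* neighbour is a level 1 router (value assumed) */
#define TOY_NDFLAG_R2 0x0002	/* neighbour is a level 2 router (value assumed) */

/* Before the patch: a level 1 router (forwarding == 1) left level 2
 * routers out of the list carried in its hello message. */
static int listed_before(unsigned int flags, int forwarding)
{
	if (!(flags & (TOY_NDFLAG_R1 | TOY_NDFLAG_R2)))
		return 0;	/* not a router at all */
	if (forwarding == 1 && (flags & TOY_NDFLAG_R2))
		return 0;	/* level 2 router skipped by a level 1 node */
	return 1;
}

/* After the patch: every known router is named, whatever its level. */
static int listed_after(unsigned int flags)
{
	return (flags & (TOY_NDFLAG_R1 | TOY_NDFLAG_R2)) != 0;
}
```

The only input for which the two differ is a level 2 neighbour seen from a
level 1 router, which is the case VMS appears to care about.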

Patrick

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>

diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 7c8692c..66e230c 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -493,7 +493,6 @@ struct elist_cb_state {
 static void neigh_elist_cb(struct neighbour *neigh, void *_info)
 {
 	struct elist_cb_state *s = _info;
-	struct dn_dev *dn_db;
 	struct dn_neigh *dn;

 	if (neigh->dev != s->dev)
@@ -503,10 +502,6 @@ static void neigh_elist_cb(struct neighb
 	if (!(dn->flags & (DN_NDFLAG_R1|DN_NDFLAG_R2)))
 		return;

-	dn_db = (struct dn_dev *) s->dev->dn_ptr;
-	if (dn_db->parms.forwarding == 1 && (dn->flags & DN_NDFLAG_R2))
-		return;
-
 	if (s->t == s->n)
 		s->rs = dn_find_slot(s->ptr, s->n, dn->priority);
 	else

^ permalink raw reply related

* Re: [PATCH 0/17] d80211 patches
From: Michael Buesch @ 2006-04-27 12:49 UTC (permalink / raw)
  To: John W. Linville; +Cc: Jiri Benc, netdev
In-Reply-To: <20060426193907.GB7922@tuxdriver.com>

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On Wednesday 26 April 2006 21:39, John W. Linville wrote:
> On Fri, Apr 21, 2006 at 10:52:10PM +0200, Jiri Benc wrote:
> > On Fri, 21 Apr 2006 22:52:08 +0200, Michael Buesch wrote:
> > > Can you please send your hacky patch for the bcm43xx
> > > to me, so I can come up with a clean one?
> > 
> > Sure, actually I planned to do it in a few minutes :-)
> 
> Hacky or not, I'm applying this patch to keep the bcm43xx driver
> from breaking.
That's perfectly fine.
The correct patch needs a little bit of rewriting in the "mode" code.
I will do that after I have finished the 4318 fixes (should take quite
some time).

-- 
Greetings Michael.

[-- Attachment #2: Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply

* Re: Netpoll checksum issue
From: Aubrey @ 2006-04-27 12:57 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Stephen Hemminger, netdev
In-Reply-To: <6d6a94c50604240622v4cc3f0efi20429b474927c814@mail.gmail.com>

Hi Herbert,

Is there any update on this issue?

Regards,
- Aubrey

On 4/24/06, Aubrey <aubreylee@gmail.com> wrote:
> On 4/24/06, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > On Mon, Apr 24, 2006 at 01:42:12PM +0800, Aubrey wrote:
> > >
> > >   dev->last_rx = jiffies;
> > >   skb->dev = dev;
> > >   skb->protocol = eth_type_trans(skb, dev);
> > >   skb->ip_summed = CHECKSUM_UNNECESSARY;
> > >   netif_rx(skb);
> >
> > This doesn't make sense.  First of all you're setting ip_summed to
> > CHECKSUM_UNNECESSARY unconditionally which is most likely wrong.
> >
> > What's more, if this was the driver that you were using, then
> > checksum_udp couldn't possibly fail since the first thing it does
> > is check ip_summed.
>
> Hmm, when I changed the code in __netpoll_rx, the following line was
> not in the driver:
> ==============================================
> skb->ip_summed = CHECKSUM_UNNECESSARY;
> ==============================================
>
> The above line of code in the driver is just a workaround.
>
> Regards,
> -Aubrey
>

^ permalink raw reply

* Re: [patch 7/9] e100: disable interrupts at boot
From: Bjorn Helgaas @ 2006-04-27 15:05 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, netdev, jeffrey.t.kirsher, jesse.brandeburg, john.ronciak,
	nils.rennebarth, stern
In-Reply-To: <44509628.4090704@garzik.org>

On Thursday 27 April 2006 04:00, Jeff Garzik wrote:
> akpm@osdl.org wrote:
> > From: Bjorn Helgaas <bjorn.helgaas@hp.com>
> > 
> > Apparently the Intel PRO/100 device enables interrupts on reset.  Unless
> > firmware explicitly disables PRO/100 interrupts, we can get a flood of
> > interrupts when a driver attaches to an unrelated device that happens to
> > share the PRO/100 IRQ.
> > 
> > This should resolve this "irq 11: nobody cared" bug report:
> >     http://bugzilla.kernel.org/show_bug.cgi?id=5918
> > 
> > Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> > Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > Cc: John Ronciak <john.ronciak@intel.com>
> > Cc: <stern@rowland.harvard.edu>
> > Cc: <nils.rennebarth@packetalarm.com>
> > Signed-off-by: Andrew Morton <akpm@osdl.org>
> 
> I'm really worried that we are burdening the kernel for a very very rare 
> condition.  Do we want to apply this for one stupid firmware?
> 
> Can't early userspace just run setpci to fix this one?

I don't think so, because the problem occurs as soon as any driver
enables IRQ 11.  I agree that it's rare, but it's a real pain to
debug it and identify it when it occurs.

^ permalink raw reply

* Re: [RFC] e1000 performance patch
From: Rick Jones @ 2006-04-27 16:07 UTC (permalink / raw)
  To: Robin Humble; +Cc: netdev
In-Reply-To: <20060427024326.GA6318@lemming.cita.utoronto.ca>


> 
> but clearly I should be using netperf to get more accurate cpu numbers
> and a more convincing aggregate table :-)

Well, I'll not stop you  :)

> 
> 
>>It is a bit rough/messy as a writeup, but here is what I've seen wrt the 
>>latency vs throughput tradeoffs:
>>ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt
> 
> 
> from a quick read it looks like just the case with 32kB messages,
> multiple simultaneous clients, and driver set to unlimited ITR sees
> reduced throughput. is that right?
> 
> if so, then I'm not surprised.

There should be three basic measures there - one is the single-instance 
request-response test. The idea is to see minimum latency.  That test 
likes to see the interrupt throttle rate made very high, or disabled 
completely.

The aggregate TCP_RR's and the TCP_STREAM tests are there to show what 
effect that has on the ability to do aggregate request/response and a 
bulk transfer.

> but overall I'm actually more worried about a mix of small and large
> messages than multiple clients.


> 
> a large/small mix might well occur in 'the real world' and it'll be 2s
> until the watchdog routine can adapt the ITR. potentially that 2s will
> be at 200k ITR which is too high for large messages, and up to 2s of
> cpu will be burnt needlessly.
> 
> can netperf (or some other tool) mix up big and small message sizes
> like 'the real world' perhaps does?
> that might help me find a good frequency at which to try to adapt the
> ITR... (eg. 1, 10, 100 or 1000 times a second)

There is the "vst" (variable size test IIRC) in netperf4:

http://www.netperf.org/svn/netperf4/branches/glib_migration

The docs for netperf4 are presently pathetic.  Feel free to email me for 
bootstrapping information.  Basically, you'll need pkg-config, libxml2 
and glib-2.0 on the system.

> 
> cheers,
> robin


^ permalink raw reply

* [RFC PATCH] [IPV6]: Fix race in route selection.
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-04-27 17:04 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hello.

We eliminated rt6_dflt_lock (which protected the default router pointer)
in 2.6.17-rc1, and introduced rt6_select() for general router selection.
The function is called with the rt6_lock read lock held,
which means we have some race conditions when we do round-robin.

Am I correct?

If so, we should add a spin_lock for serialization.

Comments?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Or, should we run whole part of rt6_select() under some lock?
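
To make the race concrete, the round-robin step in the patch below can be
modelled on a plain singly linked list. This is a toy sketch only:
struct toy_route and rotate_first_behind() are made-up names, and no
locking is shown. The point is that the rotation takes three separate
pointer stores, so two CPUs that both hold only the read lock could
interleave them and leave the list inconsistent.

```c
#include <stddef.h>

/* Toy stand-in for struct rt6_info; only the list linkage is modelled. */
struct toy_route {
	int id;
	struct toy_route *next;
};

/* Move the first entry so that it sits right behind `last`,
 * mirroring the three assignments in the round-robin branch. */
static void rotate_first_behind(struct toy_route **head, struct toy_route *last)
{
	struct toy_route *rt0 = *head;

	if (!rt0 || !last || last == rt0)
		return;			/* nothing to rotate */

	*head = rt0->next;		/* store 1: new list head */
	rt0->next = last->next;		/* store 2: rt0 points past last */
	last->next = rt0;		/* store 3: last points at rt0 */
}
```

With a list 1 -> 2 -> 3 -> 4 and `last` pointing at 3, the result is
2 -> 3 -> 1 -> 4; a second writer running between any two of the stores
would see a half-rotated list.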

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7907874..0190e39 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -317,7 +317,7 @@ static struct rt6_info *rt6_select(struc
 		  __FUNCTION__, head, head ? *head : NULL, oif);
 
 	for (rt = rt0, metric = rt0->rt6i_metric;
-	     rt && rt->rt6i_metric == metric;
+	     rt && rt->rt6i_metric == metric && (!last || rt != rt0);
 	     rt = rt->u.next) {
 		int m;
 
@@ -343,9 +343,12 @@ static struct rt6_info *rt6_select(struc
 	    (strict & RT6_SELECT_F_REACHABLE) &&
 	    last && last != rt0) {
 		/* no entries matched; do round-robin */
+		static spinlock_t lock = SPIN_LOCK_UNLOCKED;
+		spin_lock(&lock);
 		*head = rt0->u.next;
 		rt0->u.next = last->u.next;
 		last->u.next = rt0;
+		spin_unlock(&lock);
 	}
 
 	RT6_TRACE("%s() => %p, score=%d\n",

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

^ permalink raw reply related

* 2.6.16.11 BUG at tg3.c:2917
From: Ed L. Cashin @ 2006-04-27 16:52 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

Hi.  On 2.6.15.7 and 2.6.16.11, I have seen panics under heavy NFS
write load on an x86_64 system with two onboard Broadcom gigabit NICs.
It's a Supermicro P8SCi motherboard with an EM64T Intel CPU.  The aoe
driver in use is the aoe6-26 driver from the Coraid website.

I haven't yet trimmed down the test case or tried using the aoe driver
that comes with 2.6.16.11.  Right now there's kernel NFS exporting an
XFS filesystem on a logical volume backed by 3 AoE devices.

I'm including two panics here.

There's a relevant-looking discussion of the same bug from May 2004 at
the URL below.

  http://oss.sgi.com/projects/netdev/archive/2004-05/msg00378.html


----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tg3.c:2917
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: nfsd lockd nfs_acl sunrpc xfs exportfs dm_mod aoe ipv6 rtc piix i2c_i801 psmouse evdev i2c_core unix
Pid: 3053, comm: nfsd Not tainted 2.6.16.11-c1 #1
RIP: 0010:[<ffffffff802302ac>] <ffffffff802302ac>{tg3_poll+179}
RSP: 0000:ffffffff8039cc38  EFLAGS: 00010246
RAX: 00000000000001fb RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000038 RSI: ffff81003f03f180 RDI: ffff810001fbb980
RBP: ffff81003d82df88 R08: 0000000000000400 R09: ffff81003e5fae18
R10: ffff81003ee86a80 R11: 00000000000000c4 R12: ffff81003f0d0500
R13: 00000000000001fb R14: 0000000000000016 R15: ffff810023088c30
FS:  00002b4cde2ee6d0(0000) GS:ffffffff803e6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000438010 CR3: 0000000025729000 CR4: 00000000000006e0
Process nfsd (pid: 3053, threadinfo ffff81003d6fc000, task ffff81003f304140)
Stack: 0000000000000046 ffffffff802427b4 ffffffff8039ccd4 ffff81003f0d0000 
       ffff81003dfec000 000000140000002c 00000000000000ca 00ca8100000000ca 
       ffff81003dfdd920 ffff81003f0d059c 
Call Trace: <IRQ> <ffffffff802427b4>{task_in_intr+240}
       <ffffffff802720b5>{net_rx_action+165} <ffffffff8012e449>{__do_softirq+86}
       <ffffffff8010ba52>{call_softirq+30} <EOI> <ffffffff8010d13f>{do_softirq+44}
       <ffffffff8012e16c>{local_bh_enable+105} <ffffffff80273172>{dev_queue_xmit+551}
       <ffffffff88074579>{:aoe:aoenet_xmit+26} <ffffffff880723af>{:aoe:aoeblk_make_request+413}
       <ffffffff801b207a>{generic_make_request+335} <ffffffff8807bca2>{:dm_mod:__map_bio+66}
       <ffffffff8807befc>{:dm_mod:__split_bio+365} <ffffffff880e0f96>{:xfs:linvfs_get_block+0}
       <ffffffff8807c30a>{:dm_mod:dm_request+262} <ffffffff801b207a>{generic_make_request+335}
       <ffffffff801b24e7>{submit_bio+184} <ffffffff880e39cd>{:xfs:xfs_buf_iorequest+828}
       <ffffffff80124b9d>{default_wake_function+0} <ffffffff880e3240>{:xfs:xfs_buf_associate_memory+117}
       <ffffffff880cc103>{:xfs:xlog_bdstrat_cb+22} <ffffffff880cc794>{:xfs:xlog_state_release_iclog+695}
       <ffffffff880ce890>{:xfs:xlog_write+1509} <ffffffff880ce95c>{:xfs:xfs_log_write+42}
       <ffffffff880d66f4>{:xfs:_xfs_trans_commit+1294} <ffffffff880e0790>{:xfs:kmem_zone_alloc+73}
       <ffffffff880e07f9>{:xfs:kmem_zone_zalloc+28} <ffffffff880c5957>{:xfs:xfs_itruncate_finish+530}
       <ffffffff880daeb3>{:xfs:xfs_inactive_free_eofblocks+384}
       <ffffffff880e40e3>{:xfs:linvfs_release+0} <ffffffff880daf90>{:xfs:xfs_release+152}
       <ffffffff880e40fa>{:xfs:linvfs_release+23} <ffffffff80164fe2>{__fput+155}
       <ffffffff88144d64>{:nfsd:nfsd_write+196} <ffffffff8814bc1c>{:nfsd:nfsd3_proc_write+231}
       <ffffffff881413c2>{:nfsd:nfsd_dispatch+221} <ffffffff8810c360>{:sunrpc:svc_process+975}
       <ffffffff802c672f>{__down_read+18} <ffffffff88141648>{:nfsd:nfsd+451}
       <ffffffff8010b702>{child_rip+8} <ffffffff88141485>{:nfsd:nfsd+0}
       <ffffffff8010b6fa>{child_rip+0}

Code: 0f 0b 68 83 5f 2f 80 c2 65 0b 49 8b 44 24 40 8b 93 88 00 00 
RIP <ffffffff802302ac>{tg3_poll+179} RSP <ffffffff8039cc38>
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tg3.c:2914
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: nfsd lockd nfs_acl sunrpc dm_mod aoe xfs exportfs ipv6 i2c_i801 i2c_core piix md_mod rtc psmouse unix
Pid: 88, comm: kswapd0 Not tainted 2.6.15.7-c1 #1
RIP: 0010:[<ffffffff802329ee>] <ffffffff802329ee>{tg3_poll+179}
RSP: 0000:ffffffff80395e08  EFLAGS: 00010246
RAX: 0000000000000066 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000028 RSI: ffff81003e999d80 RDI: ffff810001fbba40
RBP: ffff81003dd63990 R08: ffffffff80395ea8 R09: ffff81003dc2ce18
R10: 000000000000003a R11: ffffffff80395ea8 R12: ffff81003f1a3500
R13: 0000000000000066 R14: 00000000000000a9 R15: ffffffff80395f08
FS:  0000000000000000(0000) GS:ffffffff803e1800(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000004a12a7 CR3: 00000000077ab000 CR4: 00000000000006e0
Process kswapd0 (pid: 88, threadinfo ffff81003f5d8000, task ffff81003f594790)
Stack: ffffffff803c8980 0000000000001d4c ffffffff80395ea4 ffff81003f1a3000
       ffff81003db45000 0000004000000000 0000000000000049 004900000000003b
       ffff81003e52c740 ffff81003f1a359c
Call Trace: <IRQ> <ffffffff80273e14>{net_rx_action+165} <ffffffff8013348c>{__do_softirq+86}
       <ffffffff8010eaef>{call_softirq+31} <ffffffff80110187>{do_softirq+44}
       <ffffffff801101bf>{do_IRQ+52} <ffffffff8010dd10>{ret_from_intr+0}
        <EOI> <ffffffff80154a80>{cache_flusharray+30} <ffffffff880d776c>{:xfs:linvfs_release_page+0}
       <ffffffff802c81d7>{_write_unlock_irqrestore+9} <ffffffff80152a41>{test_clear_page_dirty+152}
       <ffffffff8016ad64>{try_to_free_buffers+116} <ffffffff880d776c>{:xfs:linvfs_release_page+0}
       <ffffffff880d77f1>{:xfs:linvfs_release_page+133} <ffffffff801577d0>{shrink_zone+2695}
       <ffffffff80129aa5>{activate_task+140} <ffffffff8012a713>{try_to_wake_up+1110}
       <ffffffff80157d53>{balance_pgdat+535} <ffffffff80157fb2>{kswapd+256}
       <ffffffff80141334>{autoremove_wake_function+0} <ffffffff8010e65e>{child_rip+8}
       <ffffffff80157eb2>{kswapd+0} <ffffffff8010e656>{child_rip+0}


Code: 0f 0b 68 ba 2d 2f 80 c2 62 0b 49 8b 44 24 40 8b 93 80 00 00
RIP <ffffffff802329ee>{tg3_poll+179} RSP <ffffffff80395e08>
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!


-- 
  Ed L Cashin <ecashin@coraid.com>

^ permalink raw reply

* Re: 2.6.16.11 BUG at tg3.c:2917
From: Michael Chan @ 2006-04-27 15:45 UTC (permalink / raw)
  To: Ed L. Cashin; +Cc: netdev, David S. Miller
In-Reply-To: <20060427165234.GC29045@coraid.com>

On Thu, 2006-04-27 at 12:52 -0400, Ed L. Cashin wrote:
> -- [please bite here ] ---------
> Kernel BUG at drivers/net/tg3.c:2917
> invalid opcode: 0000 [1] SMP 
> CPU 0 

Most likely caused by IO re-ordering. Try the test patch in this
discussion:

http://marc.theaimsgroup.com/?l=linux-netdev&m=113890239404768&w=2


^ permalink raw reply

* Re: tune back idle cwnd closing?
From: John Heffner @ 2006-04-27 17:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: zach.brown, netdev
In-Reply-To: <20060426.144540.39973302.davem@davemloft.net>

David S. Miller wrote:
> From: John Heffner <jheffner@psc.edu>
>> Given that RFC2861 is Experimental (and I'm not aware of any current 
>> efforts in the IETF to push it to the standard track), IMHO it would not 
>> be inappropriate to make this behavior controlled via sysctl.
> 
> I have to respectfully disagree.
> 
> This is the price you pay when the network's congestion is being
> measured by probing, information becomes stale over time if you don't
> send any probes.
> 
> And this change of congestion state is real and happens frequently for
> most end to end users.
> 
> When your bursty application is not sending, other flows can take up
> the pipe space you are not using, and you must reprobe to figure that
> out.

A lot of the time doing 2861 is a good thing, since if you have a long 
pause, you've lost your ack clock, and you don't want to send a 
window-sized burst because you'll probably overflow a queue somewhere 
and step on your own feet.  Since we don't have a pacing mechanism, a 
slow start is really the only way to do this.
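
As a rough illustration of what is being traded off, the idle decay can be
sketched like this. It is a simplified model, not the kernel's code: cwnd
is counted in segments, it is halved once for every full RTO of idle time,
it never drops below a restart window, and the ssthresh adjustment is left
out. The function and parameter names are invented for the example.

```c
/* Simplified congestion-window decay after an idle period. */
static unsigned int cwnd_after_idle(unsigned int cwnd,
				    unsigned int restart_cwnd,
				    unsigned int idle_ms,
				    unsigned int rto_ms)
{
	unsigned int periods;

	if (rto_ms == 0)
		return cwnd;		/* no usable RTO estimate: leave cwnd alone */

	if (restart_cwnd > cwnd)
		restart_cwnd = cwnd;	/* idling never grows the window */

	periods = idle_ms / rto_ms;	/* full RTOs spent idle */

	while (periods-- > 0 && cwnd > restart_cwnd)
		cwnd >>= 1;		/* one halving per idle RTO */

	return cwnd > restart_cwnd ? cwnd : restart_cwnd;
}
```

With a 64-segment window, a restart window of 4 and a 200 ms RTO, a pause
of 400 ms leaves 16 segments and a pause of one second is already enough to
fall back to the restart window, after which the sender has to slow-start
again.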

I don't entirely buy the "staleness" argument.  I don't think that *not* 
doing 2861 will affect the stability of congestion control, since all of 
the response mechanisms are still in place.  (Most OS's don't do 2861, 
and it is not a standard.)  If you have a long RTT, short RTT flows can 
make a big difference in congestion in a period much smaller than your 
timeout.  In fact, congestion information is *always* stale by the time 
you get it. :)

Sometimes having cwnd validation turned on will make your applications 
perform better, sometimes worse.  I don't think it would be incorrect to 
add a switch.  One question is whether it's worth adding the switch 
(i.e., do enough people care?).

Myself, I'd be interested to see some quantitative comparisons of 
performance with a "real" application affected by this.

Thanks,
   -John

^ permalink raw reply
