* [PATCH 1/18] spidernet: skb used after netif_receive_skb
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
@ 2007-06-07 19:20 ` Linas Vepstas
2007-06-07 19:22 ` [PATCH 2/18] spidernet: checksum and ethtool Linas Vepstas
` (17 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:20 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev, Florin Malita
From: Florin Malita <fmalita@gmail.com>
The stats update code in spider_net_pass_skb_up() is touching the skb
after it's been passed up to the stack. To avoid that, just update the
stats first.
Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
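Note for readers: the ordering rule behind this fix can be sketched in
plain userspace C. This is only an illustration, not driver code; the
toy_* names are hypothetical stand-ins for the skb, the stats counters
and netif_receive_skb(). The point is that every field is read while the
caller still owns the buffer, and the hand-off comes last.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff and the driver's counters. */
struct toy_skb { unsigned int len; };
struct toy_stats { unsigned long rx_packets; unsigned long rx_bytes; };

/* Stand-in for netif_receive_skb(): it consumes the buffer, so the
 * caller must not touch skb once this returns. */
static void toy_receive(struct toy_skb *skb)
{
	free(skb);
}

static void toy_pass_up(struct toy_stats *stats, struct toy_skb *skb)
{
	/* Read everything we need while we still own the buffer. */
	stats->rx_packets++;
	stats->rx_bytes += skb->len;

	/* Hand off ownership last. */
	toy_receive(skb);
}
```

With the two steps swapped, the read of skb->len would touch memory that
the consumer may already have released.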
drivers/net/spider_net.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:04.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:40.000000000 -0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
*/
}
- /* pass skb up to stack */
- netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+ /* pass skb up to stack */
+ netif_receive_skb(skb);
}
#ifdef DEBUG
* [PATCH 2/18] spidernet: checksum and ethtool
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
2007-06-07 19:20 ` [PATCH 1/18] spidernet: skb used after netif_receive_skb Linas Vepstas
@ 2007-06-07 19:22 ` Linas Vepstas
2007-06-07 19:24 ` [PATCH 3/18] spidernet: beautify error messages Linas Vepstas
` (16 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:22 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev, Stephen Hemminger
From: Stephen Hemminger <shemminger@linux-foundation.org>
It doesn't look like spidernet hardware can really checksum all protocols;
the code looks like it handles IPv4 only. If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.
The driver doesn't need its own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
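Note for readers: a rough userspace sketch of the TX decision after this
change. It assumes that with NETIF_F_IP_CSUM the stack only requests
checksum offload for TCP or UDP over IPv4, which is why the explicit
ETH_P_IP test can go. The toy_* names and the enum are hypothetical; the
UDP case is an assumption, since only the TCP case is visible in the
hunk below.

```c
/* IANA protocol numbers, as carried in the IPv4 header. */
#define TOY_IPPROTO_TCP 6
#define TOY_IPPROTO_UDP 17

/* Hypothetical stand-in for the bits or'ed into dmac_cmd_status. */
enum toy_csum_cmd { TOY_CSUM_NONE, TOY_CSUM_TCP, TOY_CSUM_UDP };

/* csum_partial models skb->ip_summed == CHECKSUM_PARTIAL. */
static enum toy_csum_cmd toy_pick_csum(int csum_partial, int ip_proto)
{
	if (!csum_partial)
		return TOY_CSUM_NONE;	/* stack already did the checksum */

	switch (ip_proto) {
	case TOY_IPPROTO_TCP:
		return TOY_CSUM_TCP;
	case TOY_IPPROTO_UDP:
		return TOY_CSUM_UDP;
	default:
		return TOY_CSUM_NONE;	/* nothing the hardware can help with */
	}
}
```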
drivers/net/spider_net.c | 4 ++--
drivers/net/spider_net_ethtool.c | 21 +++------------------
2 files changed, 5 insertions(+), 20 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:40.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:45.000000000 -0500
@@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);
- if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
switch (ip_hdr(skb)->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
@@ -2225,7 +2225,7 @@ spider_net_setup_netdev(struct spider_ne
spider_net_setup_netdev_ops(netdev);
- netdev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
+ netdev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX;
/* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
* NETIF_F_HW_VLAN_FILTER */
Index: linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net_ethtool.c 2007-06-07 11:49:01.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c 2007-06-07 11:51:45.000000000 -0500
@@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne
return 0;
}
-static uint32_t
-spider_net_ethtool_get_tx_csum(struct net_device *netdev)
-{
- return (netdev->features & NETIF_F_HW_CSUM) != 0;
-}
-
-static int
-spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data)
-{
- if (data)
- netdev->features |= NETIF_F_HW_CSUM;
- else
- netdev->features &= ~NETIF_F_HW_CSUM;
-
- return 0;
-}
static void
spider_net_ethtool_get_ringparam(struct net_device *netdev,
@@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht
.get_wol = spider_net_ethtool_get_wol,
.get_msglevel = spider_net_ethtool_get_msglevel,
.set_msglevel = spider_net_ethtool_set_msglevel,
+ .get_link = ethtool_op_get_link,
.nway_reset = spider_net_ethtool_nway_reset,
.get_rx_csum = spider_net_ethtool_get_rx_csum,
.set_rx_csum = spider_net_ethtool_set_rx_csum,
- .get_tx_csum = spider_net_ethtool_get_tx_csum,
- .set_tx_csum = spider_net_ethtool_set_tx_csum,
+ .get_tx_csum = ethtool_op_get_tx_csum,
+ .set_tx_csum = ethtool_op_set_tx_csum,
.get_ringparam = spider_net_ethtool_get_ringparam,
.get_strings = spider_net_get_strings,
.get_stats_count = spider_net_get_stats_count,
* [PATCH 3/18] spidernet: beautify error messages
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
2007-06-07 19:20 ` [PATCH 1/18] spidernet: skb used after netif_receive_skb Linas Vepstas
2007-06-07 19:22 ` [PATCH 2/18] spidernet: checksum and ethtool Linas Vepstas
@ 2007-06-07 19:24 ` Linas Vepstas
2007-06-07 19:25 ` [PATCH 4/18] spidernet: move a block of code around Linas Vepstas
` (15 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:24 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Use dev_err() to print device error messages.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 67 ++++++++++++++++++++++++-----------------------
drivers/net/spider_net.h | 2 -
2 files changed, 36 insertions(+), 33 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:45.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:47.000000000 -0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
- pr_err("Not enough memory to allocate rx buffer\n");
+ dev_err(&card->netdev->dev,
+ "Not enough memory to allocate rx buffer\n");
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,7 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
- pr_err("Could not iommu-map rx buffer\n");
+ dev_err(&card->netdev->dev, "Could not iommu-map rx buffer\n");
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -692,7 +693,7 @@ spider_net_prepare_tx_descr(struct spide
buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (pci_dma_mapping_error(buf)) {
if (netif_msg_tx_err(card) && net_ratelimit())
- pr_err("could not iommu-map packet (%p, %i). "
+ dev_err(&card->netdev->dev, "could not iommu-map packet (%p, %i). "
"Dropping packet\n", skb->data, skb->len);
card->spider_stats.tx_iommu_map_error++;
return -ENOMEM;
@@ -832,9 +833,8 @@ spider_net_release_tx_chain(struct spide
case SPIDER_NET_DESCR_PROTECTION_ERROR:
case SPIDER_NET_DESCR_FORCE_END:
if (netif_msg_tx_err(card))
- pr_err("%s: forcing end of tx descriptor "
- "with status x%02x\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev, "forcing end of tx descriptor "
+ "with status x%02x\n", status);
card->netdev_stats.tx_errors++;
break;
@@ -1087,8 +1087,8 @@ spider_net_decode_one_descr(struct spide
(status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
(status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
- pr_err("%s: dropping RX descriptor with state %d\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev,
+ "dropping RX descriptor with state %d\n", status);
card->netdev_stats.rx_dropped++;
goto bad_desc;
}
@@ -1096,8 +1096,8 @@ spider_net_decode_one_descr(struct spide
if ( (status != SPIDER_NET_DESCR_COMPLETE) &&
(status != SPIDER_NET_DESCR_FRAME_END) ) {
if (netif_msg_rx_err(card))
- pr_err("%s: RX descriptor with unknown state %d\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev,
+ "RX descriptor with unknown state %d\n", status);
card->spider_stats.rx_desc_unk_state++;
goto bad_desc;
}
@@ -1105,16 +1105,14 @@ spider_net_decode_one_descr(struct spide
/* The cases we'll throw away the packet immediately */
if (hwdescr->data_error & SPIDER_NET_DESTROY_RX_FLAGS) {
if (netif_msg_rx_err(card))
- pr_err("%s: error in received descriptor found, "
+ dev_err(&card->netdev->dev, "error in received descriptor found, "
"data_status=x%08x, data_error=x%08x\n",
- card->netdev->name,
hwdescr->data_status, hwdescr->data_error);
goto bad_desc;
}
if (hwdescr->dmac_cmd_status & 0xfefe) {
- pr_err("%s: bad status, cmd_status=x%08x\n",
- card->netdev->name,
+ dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
@@ -1384,7 +1382,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GPWFFINT:
/* PHY command queue full */
if (netif_msg_intr(card))
- pr_err("PHY write queue full\n");
+ dev_err(&card->netdev->dev, "PHY write queue full\n");
show_error = 0;
break;
@@ -1455,7 +1453,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
if (netif_msg_intr(card) && net_ratelimit())
- pr_err("Spider RX RAM full, incoming packets "
+ dev_err(&card->netdev->dev, "Spider RX RAM full, incoming packets "
"might be discarded!\n");
spider_net_rx_irq_off(card);
netif_rx_schedule(card->netdev);
@@ -1474,7 +1472,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDBDCEINT: /* fallthrough */
case SPIDER_NET_GDADCEINT:
if (netif_msg_intr(card) && net_ratelimit())
- pr_err("got descriptor chain end interrupt, "
+ dev_err(&card->netdev->dev, "got descriptor chain end interrupt, "
"restarting DMAC %c.\n",
'D'-(i-SPIDER_NET_GDDDCEINT)/3);
spider_net_refill_rx_chain(card);
@@ -1545,9 +1543,8 @@ spider_net_handle_error_irq(struct spide
}
if ((show_error) && (netif_msg_intr(card)) && net_ratelimit())
- pr_err("Got error interrupt on %s, GHIINT0STS = 0x%08x, "
+ dev_err(&card->netdev->dev, "Error interrupt, GHIINT0STS = 0x%08x, "
"GHIINT1STS = 0x%08x, GHIINT2STS = 0x%08x\n",
- card->netdev->name,
status_reg, error_reg1, error_reg2);
/* clear interrupt sources */
@@ -1811,7 +1808,8 @@ spider_net_init_firmware(struct spider_n
SPIDER_NET_FIRMWARE_NAME, &card->pdev->dev) == 0) {
if ( (firmware->size != SPIDER_NET_FIRMWARE_LEN) &&
netif_msg_probe(card) ) {
- pr_err("Incorrect size of spidernet firmware in " \
+ dev_err(&card->netdev->dev,
+ "Incorrect size of spidernet firmware in " \
"filesystem. Looking in host firmware...\n");
goto try_host_fw;
}
@@ -1835,8 +1833,8 @@ try_host_fw:
if ( (fw_size != SPIDER_NET_FIRMWARE_LEN) &&
netif_msg_probe(card) ) {
- pr_err("Incorrect size of spidernet firmware in " \
- "host firmware\n");
+ dev_err(&card->netdev->dev,
+ "Incorrect size of spidernet firmware in host firmware\n");
goto done;
}
@@ -1846,7 +1844,8 @@ done:
return err;
out_err:
if (netif_msg_probe(card))
- pr_err("Couldn't find spidernet firmware in filesystem " \
+ dev_err(&card->netdev->dev,
+ "Couldn't find spidernet firmware in filesystem " \
"or host firmware\n");
return err;
}
@@ -2242,13 +2241,14 @@ spider_net_setup_netdev(struct spider_ne
result = spider_net_set_mac(netdev, &addr);
if ((result) && (netif_msg_probe(card)))
- pr_err("Failed to set MAC address: %i\n", result);
+ dev_err(&card->netdev->dev,
+ "Failed to set MAC address: %i\n", result);
result = register_netdev(netdev);
if (result) {
if (netif_msg_probe(card))
- pr_err("Couldn't register net_device: %i\n",
- result);
+ dev_err(&card->netdev->dev,
+ "Couldn't register net_device: %i\n", result);
return result;
}
@@ -2326,17 +2326,19 @@ spider_net_setup_pci_dev(struct pci_dev
unsigned long mmio_start, mmio_len;
if (pci_enable_device(pdev)) {
- pr_err("Couldn't enable PCI device\n");
+ dev_err(&pdev->dev, "Couldn't enable PCI device\n");
return NULL;
}
if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
- pr_err("Couldn't find proper PCI device base address.\n");
+ dev_err(&pdev->dev,
+ "Couldn't find proper PCI device base address.\n");
goto out_disable_dev;
}
if (pci_request_regions(pdev, spider_net_driver_name)) {
- pr_err("Couldn't obtain PCI resources, aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't obtain PCI resources, aborting.\n");
goto out_disable_dev;
}
@@ -2344,8 +2346,8 @@ spider_net_setup_pci_dev(struct pci_dev
card = spider_net_alloc_card();
if (!card) {
- pr_err("Couldn't allocate net_device structure, "
- "aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't allocate net_device structure, aborting.\n");
goto out_release_regions;
}
card->pdev = pdev;
@@ -2359,7 +2361,8 @@ spider_net_setup_pci_dev(struct pci_dev
card->regs = ioremap(mmio_start, mmio_len);
if (!card->regs) {
- pr_err("Couldn't obtain PCI resources, aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't obtain PCI resources, aborting.\n");
goto out_release_regions;
}
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:42:17.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:51:47.000000000 -0500
@@ -25,7 +25,7 @@
#ifndef _SPIDER_NET_H
#define _SPIDER_NET_H
-#define VERSION "2.0 A"
+#define VERSION "2.0 B"
#include "sungem_phy.h"
* [PATCH 4/18] spidernet: move a block of code around
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (2 preceding siblings ...)
2007-06-07 19:24 ` [PATCH 3/18] spidernet: beautify error messages Linas Vepstas
@ 2007-06-07 19:25 ` Linas Vepstas
2007-06-07 19:27 ` [PATCH 5/18] spidernet: zero out a pointer Linas Vepstas
` (14 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:25 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Put the enable and disable routines next to one another,
as this makes verifying their symmetry that much easier.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:47.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:48.000000000 -0500
@@ -505,6 +505,20 @@ spider_net_enable_rxdmac(struct spider_n
}
/**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turning off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+ spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+ SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
* spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
* @card: card structure
*
@@ -656,20 +670,6 @@ write_hash:
}
/**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
- spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
- SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
* spider_net_prepare_tx_descr - fill tx descriptor with skb data
* @card: card structure
* @descr: descriptor structure to fill out
* [PATCH 5/18] spidernet: zero out a pointer.
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (3 preceding siblings ...)
2007-06-07 19:25 ` [PATCH 4/18] spidernet: move a block of code around Linas Vepstas
@ 2007-06-07 19:27 ` Linas Vepstas
2007-06-07 19:29 ` [PATCH 6/18] spidernet: null out skb pointer after its been used Linas Vepstas
` (13 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:27 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Invalidate a pointer as it's pci_unmap'ed; this is a bit of
paranoia to make sure hardware doesn't continue trying to
DMA to it.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
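Note for readers: the save, poison, unmap sequence can be sketched in
userspace C as follows. The toy_* names are hypothetical; only the
0xffffffff poison value is taken from the patch. A copy of the address
is kept because both the unmap call and the later debug print still need
it after the descriptor field has been overwritten.

```c
#include <stdint.h>

#define TOY_POISON_ADDR 0xffffffffu	/* same poison value the patch writes */

/* Hypothetical stand-in for the hardware descriptor. */
struct toy_hw_descr { uint32_t buf_addr; };

/* Records the last address handed to the stand-in unmap routine. */
static uint32_t toy_last_unmapped;

static void toy_unmap(uint32_t bus_addr)
{
	toy_last_unmapped = bus_addr;
}

static uint32_t toy_release_buffer(struct toy_hw_descr *hwd)
{
	uint32_t saved = hwd->buf_addr;	/* keep a copy for unmap and logging */

	hwd->buf_addr = TOY_POISON_ADDR;	/* descriptor no longer names the buffer */
	toy_unmap(saved);
	return saved;
}
```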
drivers/net/spider_net.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:48.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:51.000000000 -0500
@@ -1067,6 +1067,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+ u32 hw_buf_addr;
int status;
status = spider_net_get_descr_status(hwdescr);
@@ -1080,7 +1081,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
/* unmap descriptor */
- pci_unmap_single(card->pdev, hwdescr->buf_addr,
+ hw_buf_addr = hwdescr->buf_addr;
+ hwdescr->buf_addr = 0xffffffff;
+ pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1114,7 +1117,7 @@ spider_net_decode_one_descr(struct spide
if (hwdescr->dmac_cmd_status & 0xfefe) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
hwdescr->dmac_cmd_status);
- pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+ pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);
* [PATCH 6/18] spidernet: null out skb pointer after its been used.
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (4 preceding siblings ...)
2007-06-07 19:27 ` [PATCH 5/18] spidernet: zero out a pointer Linas Vepstas
@ 2007-06-07 19:29 ` Linas Vepstas
2007-06-07 19:33 ` [PATCH 7/18] spidernet: Don't terminate the RX ring Linas Vepstas
` (12 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c.
Avoid the double-free by nulling out the skb pointer.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
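Note for readers: a small userspace sketch of why clearing the pointer
helps. The toy_* names are hypothetical; toy_teardown() plays the role
of the shutdown path that frees whatever the ring still references.
Once the descriptor forgets a buffer it has handed off, teardown skips
it instead of freeing it a second time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ring descriptor. */
struct toy_descr { void *skb; };

/* Counts how many buffers the teardown path released. */
static int toy_teardown_frees;

/* Stand-in for the network stack: takes ownership and frees. */
static void toy_stack_consume(void *skb)
{
	free(skb);
}

static void toy_decode(struct toy_descr *d)
{
	toy_stack_consume(d->skb);
	d->skb = NULL;		/* the descriptor no longer owns the buffer */
}

static void toy_teardown(struct toy_descr *ring, int n)
{
	for (int i = 0; i < n; i++) {
		if (ring[i].skb == NULL)
			continue;	/* already handed off, nothing to free */
		free(ring[i].skb);
		ring[i].skb = NULL;
		toy_teardown_frees++;
	}
}
```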
drivers/net/spider_net.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:51.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:52.000000000 -0500
@@ -1132,6 +1132,7 @@ spider_net_decode_one_descr(struct spide
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+ descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
* [PATCH 7/18] spidernet: Don't terminate the RX ring
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (5 preceding siblings ...)
2007-06-07 19:29 ` [PATCH 6/18] spidernet: null out skb pointer after its been used Linas Vepstas
@ 2007-06-07 19:33 ` Linas Vepstas
2007-06-07 19:35 ` [PATCH 8/18] spidernet: enhance the dump routine Linas Vepstas
` (11 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:33 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Subject: [PATCH 7/18] spidernet: Don't terminate the RX ring
There is no real reason to terminate the RX ring; it
doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
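Note for readers: the one-time linking loop can be tried out in
userspace with the sketch below. The toy_* names, the embedded hardware
descriptor and the 0x20 spacing of bus addresses are assumptions made
for the example. After toy_link_ring() runs, every next_descr_addr is
non-zero, so the ring is closed and never terminated.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware and software descriptors. */
struct toy_hw_descr { uint32_t next_descr_addr; };
struct toy_descr {
	struct toy_hw_descr hw;
	uint32_t bus_addr;
	struct toy_descr *next, *prev;
};

/* Build a circular software ring with unlinked hardware pointers. */
static void toy_init_ring(struct toy_descr *ring, int n, uint32_t base)
{
	for (int i = 0; i < n; i++) {
		ring[i].bus_addr = base + (uint32_t)i * 0x20u;
		ring[i].next = &ring[(i + 1) % n];
		ring[i].prev = &ring[(i + n - 1) % n];
		ring[i].hw.next_descr_addr = 0;
	}
}

/* Same loop shape as the patch: point each predecessor at its successor. */
static void toy_link_ring(struct toy_descr *start)
{
	struct toy_descr *d = start;

	do {
		d->prev->hw.next_descr_addr = d->bus_addr;
		d = d->next;
	} while (d != start);
}
```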
drivers/net/spider_net.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:51:52.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:51:55.000000000 -0500
@@ -461,13 +461,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
- hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
- wmb();
- descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
return 0;
@@ -556,12 +552,16 @@ spider_net_refill_rx_chain(struct spider
static int
spider_net_alloc_rx_skbs(struct spider_net_card *card)
{
- int result;
- struct spider_net_descr_chain *chain;
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *start = chain->tail;
+ struct spider_net_descr *descr = start;
- result = -ENOMEM;
+ /* Link up the hardware chain pointers */
+ do {
+ descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+ descr = descr->next;
+ } while (descr != start);
- chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
* we've got a problem. If not, spider_net_refill_rx_chain
* will do the rest at the end of this function. */
@@ -578,7 +578,7 @@ spider_net_alloc_rx_skbs(struct spider_n
error:
spider_net_free_rx_chain_contents(card);
- return result;
+ return -ENOMEM;
}
/**
* [PATCH 8/18] spidernet: enhance the dump routine
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (6 preceding siblings ...)
2007-06-07 19:33 ` [PATCH 7/18] spidernet: Don't terminate the RX ring Linas Vepstas
@ 2007-06-07 19:35 ` Linas Vepstas
2007-06-07 19:39 ` [PATCH 9/18] spidernet: reset the card when an rxramfull is seen Linas Vepstas
` (10 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:35 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
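Crazy device problems are hard to debug when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine: it now reports the chain head and tail, the
hardware's current and next descriptor pointers, and any cut in
the chain, and it is called whenever a bad RX descriptor is seen.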
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
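Note for readers: the "from %d to %d" arithmetic in the new dump code
wraps around the ring, which is easy to get wrong by one. The sketch
below isolates it; toy_run and toy_run_bounds are hypothetical names,
and it assumes 0 <= off < num_desc and 0 < cnt <= num_desc. Here off is
the index where the status changed and cnt is the length of the run that
just ended.

```c
/* First and last ring index of a run of equal-status descriptors. */
struct toy_run { int from; int to; };

static struct toy_run toy_run_bounds(int num_desc, int off, int cnt)
{
	struct toy_run r;

	/* Adding num_desc keeps the left operand of % non-negative. */
	r.from = (num_desc + off - cnt) % num_desc;
	r.to = (num_desc + off - 1) % num_desc;
	return r;
}
```

For example, with 256 descriptors, a run of 5 that ends just before
index 3 covers 254, 255, 0, 1 and 2.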
drivers/net/spider_net.c | 78 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 70 insertions(+), 8 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 14:07:26.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 14:07:55.000000000 -0500
@@ -1022,34 +1022,94 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
}
-#ifdef DEBUG
static void show_rx_chain(struct spider_net_card *card)
{
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+ struct spider_net_hw_descr *hwd = start->hwdescr;
+ struct device *dev = &card->netdev->dev;
+ u32 curr_desc, next_desc;
int status;
+ int tot = 0;
int cnt = 0;
- int cstat = spider_net_get_descr_status(descr);
- printk(KERN_INFO "RX chain tail at descr=%ld\n",
- (start - card->descr) - card->tx_chain.num_desc);
+ int off = start - chain->ring;
+ int cstat = hwd->dmac_cmd_status;
+
+ dev_info(dev, "Total number of descrs=%d\n",
+ chain->num_desc);
+ dev_info(dev, "Chain tail located at descr=%d, status=0x%x\n",
+ off, cstat);
+
+ curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+ next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
- status = spider_net_get_descr_status(descr);
+ hwd = descr->hwdescr;
+ off = descr - chain->ring;
+ status = hwd->dmac_cmd_status;
+
+ if (descr == chain->head)
+ dev_info(dev, "Chain head is at %d, head status=0x%x\n",
+ off, status);
+
+ if (curr_desc == descr->bus_addr)
+ dev_info(dev, "HW curr desc (GDACTDPA) is at %d, status=0x%x\n",
+ off, status);
+
+ if (next_desc == descr->bus_addr)
+ dev_info(dev, "HW next desc (GDACNEXTDA) is at %d, status=0x%x\n",
+ off, status);
+
+ if (hwd->next_descr_addr == 0)
+ dev_info(dev, "chain is cut at %d\n", off);
+
if (cstat != status) {
- printk(KERN_INFO "Have %d descrs with stat=x%08x\n", cnt, cstat);
+ int from = (chain->num_desc + off - cnt) % chain->num_desc;
+ int to = (chain->num_desc + off - 1) % chain->num_desc;
+ dev_info(dev, "Have %d (from %d to %d) descrs "
+ "with stat=0x%08x\n", cnt, from, to, cstat);
cstat = status;
cnt = 0;
}
+
cnt ++;
+ tot ++;
+ descr = descr->next;
+ } while (descr != start);
+
+ dev_info(dev, "Last %d descrs with stat=0x%08x "
+ "for a total of %d descrs\n", cnt, cstat, tot);
+
+#ifdef DEBUG
+ /* Now dump the whole ring */
+ descr = start;
+ do
+ {
+ struct spider_net_hw_descr *hwd = descr->hwdescr;
+ status = spider_net_get_descr_status(hwd);
+ cnt = descr - chain->ring;
+ dev_info(dev, "Descr %d stat=0x%08x skb=%p\n",
+ cnt, status, descr->skb);
+ dev_info(dev, "bus addr=%08x buf addr=%08x sz=%d\n",
+ descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+ dev_info(dev, "next=%08x result sz=%d valid sz=%d\n",
+ hwd->next_descr_addr, hwd->result_size,
+ hwd->valid_size);
+ dev_info(dev, "dmac=%08x data stat=%08x data err=%08x\n",
+ hwd->dmac_cmd_status, hwd->data_status,
+ hwd->data_error);
+ dev_info(dev, "\n");
+
descr = descr->next;
} while (descr != start);
- printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
#endif
+}
+
/**
* spider_net_decode_one_descr - processes an RX descriptor
* @card: card structure
@@ -1137,6 +1197,8 @@ spider_net_decode_one_descr(struct spide
return 1;
bad_desc:
+ if (netif_msg_rx_err(card))
+ show_rx_chain(card);
dev_kfree_skb_irq(descr->skb);
descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
* [PATCH 9/18] spidernet: reset the card when an rxramfull is seen
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (7 preceding siblings ...)
2007-06-07 19:35 ` [PATCH 8/18] spidernet: enhance the dump routine Linas Vepstas
@ 2007-06-07 19:39 ` Linas Vepstas
2007-06-07 19:41 ` [PATCH 10/18] spidernet: service TX later Linas Vepstas
` (9 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:39 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Some versions of the spider have a firmware bug, where the
RX ring sequencer goes crazy when the RX RAM on the device
fills up. Apparently the only viable workaround is a soft
reset of the card.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:52:12.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:52:17.000000000 -0500
@@ -1518,11 +1518,16 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
- if (netif_msg_intr(card) && net_ratelimit())
- dev_err(&card->netdev->dev, "Spider RX RAM full, incoming packets "
- "might be discarded!\n");
+ if (netif_msg_intr(card) && net_ratelimit()) {
+ dev_err(&card->netdev->dev, "Spider RX RAM full, "
+ "incoming packets might be discarded!\n");
+ show_rx_chain(card);
+ }
spider_net_rx_irq_off(card);
- netif_rx_schedule(card->netdev);
+
+ /* If the card is spewing rxramfulls, then reset */
+ atomic_inc(&card->tx_timeout_task_counter);
+ schedule_work(&card->tx_timeout_task);
show_error = 0;
break;
@@ -2100,6 +2105,8 @@ spider_net_workaround_rxramfull(struct s
{
int i, sequencer = 0;
+ dev_info(&card->pdev->dev, "calling rxramfull workaround\n");
+
/* cancel reset */
spider_net_write_reg(card, SPIDER_NET_CKRCTRL,
SPIDER_NET_CKRCTRL_RUN_VALUE);
* [PATCH 10/18] spidernet: service TX later.
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (8 preceding siblings ...)
2007-06-07 19:39 ` [PATCH 9/18] spidernet: reset the card when an rxramfull is seen Linas Vepstas
@ 2007-06-07 19:41 ` Linas Vepstas
2007-06-07 19:43 ` [PATCH 11/18] spidernet: increase the NAPI weight Linas Vepstas
` (8 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:41 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:52:17.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:52:20.000000000 -0500
@@ -1224,7 +1224,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
- spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
while (packets_to_do) {
@@ -1243,6 +1242,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {
* [PATCH 11/18] spidernet: increase the NAPI weight
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (9 preceding siblings ...)
2007-06-07 19:41 ` [PATCH 10/18] spidernet: service TX later Linas Vepstas
@ 2007-06-07 19:43 ` Linas Vepstas
2007-06-07 19:45 ` [PATCH 12/18] spidernet: don't flag rare packets as bad packets Linas Vepstas
` (7 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:43 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Another way of minimizing the likelihood of the RX RAM overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:51:47.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:52:22.000000000 -0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_RX_CSUM_DEFAULT 1
-#define SPIDER_NET_WATCHDOG_TIMEOUT 50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT 3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
#define SPIDER_NET_FIRMWARE_SEQS 6
#define SPIDER_NET_FIRMWARE_SEQWORDS 1024
* [PATCH 12/18] spidernet: don't flag rare packets as bad packets
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (10 preceding siblings ...)
2007-06-07 19:43 ` [PATCH 11/18] spidernet: increase the NAPI weight Linas Vepstas
@ 2007-06-07 19:45 ` Linas Vepstas
2007-06-07 19:51 ` [PATCH 13/18] spidernet: Cure RX ram full bug Linas Vepstas
` (6 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
The current error checking is flagging some perfectly normal, but
usually rare packets as being bad. Do not flag these packets.
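
The effect of the mask change in the diff below can be checked with a
small user-space sketch. The two mask values come from the patch; the
macro names, the helper functions and the sample status words are
hypothetical, and no claim is made here about what the individual bits
mean in the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_BAD_STATUS_MASK 0xfefeu	/* mask tested before this patch */
#define NEW_BAD_STATUS_MASK 0xfcf4u	/* mask tested after this patch */

/* Bits that used to mark a descriptor as bad but no longer do. */
static uint32_t relaxed_bits(void)
{
	/* The new mask only removes bits; it adds none. */
	assert((NEW_BAD_STATUS_MASK & ~OLD_BAD_STATUS_MASK) == 0);
	return OLD_BAD_STATUS_MASK & ~NEW_BAD_STATUS_MASK;
}

/* Same test as the driver's `cmd_status & mask`, as a predicate. */
static int is_bad(uint32_t cmd_status, uint32_t mask)
{
	return (cmd_status & mask) != 0;
}
```

relaxed_bits() evaluates to 0x020a, so a status word whose only set bits
in the low half-word fall inside 0x020a was rejected by the old test and
is accepted by the new one.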
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:52:20.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:52:24.000000000 -0500
@@ -1174,7 +1174,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
- if (hwdescr->dmac_cmd_status & 0xfefe) {
+ if (hwdescr->dmac_cmd_status & 0xfcf4) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hw_buf_addr);
@@ -1543,10 +1543,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCDCEINT: /* fallthrough */
case SPIDER_NET_GDBDCEINT: /* fallthrough */
case SPIDER_NET_GDADCEINT:
- if (netif_msg_intr(card) && net_ratelimit())
- dev_err(&card->netdev->dev, "got descriptor chain end interrupt, "
- "restarting DMAC %c.\n",
- 'D'-(i-SPIDER_NET_GDDDCEINT)/3);
+ /* Could happen when rx chain is full */
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
show_error = 0;
@@ -1557,7 +1554,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCINVDINT: /* fallthrough */
case SPIDER_NET_GDBINVDINT: /* fallthrough */
case SPIDER_NET_GDAINVDINT:
- /* could happen when rx chain is full */
+ /* Could happen when rx chain is full */
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
show_error = 0;
* [PATCH 13/18] spidernet: Cure RX ram full bug
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (11 preceding siblings ...)
2007-06-07 19:45 ` [PATCH 12/18] spidernet: don't flag rare packets as bad packets Linas Vepstas
@ 2007-06-07 19:51 ` Linas Vepstas
2007-06-07 19:53 ` [PATCH 14/18] spidernet: silence the ramfull messages Linas Vepstas
` (5 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:51 UTC (permalink / raw)
To: Jeff Garzik
Cc: cbe-oss-dev, netdev, joseferr, mlui, Utz Bacher, Abdullah Dagli,
Jens Osterkamp, MOKUNO Masakazu, Tsutomu OWA, Kou Ishizaki,
Geoff Levand, Geert Uytterhoeven
This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS). When the RX ram full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.
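
The pointer resync can be tried out in user space with the sketch below.
It is a simplified model and not the driver code: the ring is a plain
array of status nibbles instead of a linked list of descriptors, there is
no locking, and struct rx_ring, ring_next(), resync_head(), resync_tail()
and make_hung_ring() are hypothetical names. The status values 0xa, 0x4
and 0xf are the ones listed in the documentation patch of this series,
and the loops follow the ones in the diff below.

```c
#include <assert.h>
#include <stddef.h>

enum {
	DESCR_CARDOWNED  = 0xa,	/* "empty": ready for the hardware */
	DESCR_FRAME_END  = 0x4,	/* "full": holds a received frame */
	DESCR_NOT_IN_USE = 0xf	/* neither; needs a fresh buffer */
};

#define RING_SIZE 256

struct rx_ring {
	unsigned char status[RING_SIZE];
	size_t head, tail;
};

static size_t ring_next(size_t i)
{
	return (i + 1) % RING_SIZE;
}

/* Advance head past card-owned (empty) descrs, as the patch does. */
static void resync_head(struct rx_ring *r)
{
	size_t i, pos = r->head;

	if (r->status[pos] == DESCR_NOT_IN_USE)
		return;
	for (i = 0; i < RING_SIZE; i++) {
		if (r->status[pos] != DESCR_CARDOWNED)
			break;
		pos = ring_next(pos);
	}
	r->head = pos;
}

/* Advance tail past empty and reaped descrs. Returns 0 when the tail
 * moved onto a descr that still needs processing, 1 otherwise. */
static int resync_tail(struct rx_ring *r)
{
	size_t i, pos = r->tail;

	for (i = 0; i < RING_SIZE; i++) {
		unsigned char s = r->status[pos];

		if (s != DESCR_CARDOWNED && s != DESCR_NOT_IN_USE)
			break;
		pos = ring_next(pos);
	}
	r->tail = pos;
	return (i != 0 && i != RING_SIZE) ? 0 : 1;
}

/* Rebuild the hung state from the log: 0..253 full, 254 and 255 empty. */
static void make_hung_ring(struct rx_ring *r)
{
	size_t i;

	for (i = 0; i < RING_SIZE; i++)
		r->status[i] = DESCR_FRAME_END;
	r->status[254] = DESCR_CARDOWNED;
	r->status[255] = DESCR_CARDOWNED;
	r->head = r->tail = 255;
}
```

Starting from the hung state, resync_tail() returns 0 and leaves the tail
at descr 0, the first full descr, so the poll loop has work to do again;
resync_head() likewise moves the head off descr 255.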
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 86 +++++++++++++++++++++++++++++++++++++++++++----
drivers/net/spider_net.h | 1
2 files changed, 81 insertions(+), 6 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:52:24.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:53:55.000000000 -0500
@@ -1111,6 +1111,65 @@ static void show_rx_chain(struct spider_
}
/**
+ * spider_net_resync_head_ptr - Advance head ptr past empty descrs
+ *
+ * If the driver fails to keep up and empty the queue, then the
+ * hardware will run out of room to put incoming packets. This
+ * will cause the hardware to skip descrs that are full (instead
+ * of halting/retrying). Thus, once the driver runs, it will need
+ * to "catch up" to where the hardware chain pointer is at.
+ */
+static void spider_net_resync_head_ptr(struct spider_net_card *card)
+{
+ unsigned long flags;
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *descr;
+ int i, status;
+
+ /* Advance head pointer past any empty descrs */
+ descr = chain->head;
+ status = spider_net_get_descr_status(descr->hwdescr);
+
+ if (status == SPIDER_NET_DESCR_NOT_IN_USE)
+ return;
+
+ spin_lock_irqsave(&chain->lock, flags);
+
+ descr = chain->head;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ for (i=0; i<chain->num_desc; i++) {
+ if (status != SPIDER_NET_DESCR_CARDOWNED) break;
+ descr = descr->next;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ }
+ chain->head = descr;
+
+ spin_unlock_irqrestore(&chain->lock, flags);
+}
+
+static int spider_net_resync_tail_ptr(struct spider_net_card *card)
+{
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *descr;
+ int i, status;
+
+ /* Advance tail pointer past any empty and reaped descrs */
+ descr = chain->tail;
+ status = spider_net_get_descr_status(descr->hwdescr);
+
+ for (i=0; i<chain->num_desc; i++) {
+ if ((status != SPIDER_NET_DESCR_CARDOWNED) &&
+ (status != SPIDER_NET_DESCR_NOT_IN_USE)) break;
+ descr = descr->next;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ }
+ chain->tail = descr;
+ if ((i != 0) && (i != chain->num_desc))
+ return 0;
+ return 1;
+}
+
+/**
* spider_net_decode_one_descr - processes an RX descriptor
* @card: card structure
*
@@ -1237,6 +1296,12 @@ spider_net_poll(struct net_device *netde
}
}
+ if ((packets_done == 0) && (card->num_rx_ints != 0)) {
+ no_more_packets = spider_net_resync_tail_ptr(card);
+ spider_net_resync_head_ptr(card);
+ }
+ card->num_rx_ints = 0;
+
netdev->quota -= packets_done;
*budget -= packets_done;
spider_net_refill_rx_chain(card);
@@ -1520,15 +1585,16 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
if (netif_msg_intr(card) && net_ratelimit()) {
- dev_err(&card->netdev->dev, "Spider RX RAM full, "
+ dev_info(&card->netdev->dev, "Spider RX RAM full, "
"incoming packets might be discarded!\n");
show_rx_chain(card);
}
- spider_net_rx_irq_off(card);
-
- /* If the card is spewing rxramfulls, then reset */
- atomic_inc(&card->tx_timeout_task_counter);
- schedule_work(&card->tx_timeout_task);
+ /* Could happen when rx chain is full */
+ spider_net_resync_head_ptr(card);
+ spider_net_refill_rx_chain(card);
+ spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1544,8 +1610,11 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDBDCEINT: /* fallthrough */
case SPIDER_NET_GDADCEINT:
/* Could happen when rx chain is full */
+ spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1555,8 +1624,11 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDBINVDINT: /* fallthrough */
case SPIDER_NET_GDAINVDINT:
/* Could happen when rx chain is full */
+ spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1648,6 +1720,7 @@ spider_net_interrupt(int irq, void *ptr)
if (status_reg & SPIDER_NET_RXINT ) {
spider_net_rx_irq_off(card);
netif_rx_schedule(netdev);
+ card->num_rx_ints ++;
}
if (status_reg & SPIDER_NET_TXINT)
netif_rx_schedule(netdev);
@@ -2300,6 +2373,7 @@ spider_net_setup_netdev(struct spider_ne
* NETIF_F_HW_VLAN_FILTER */
netdev->irq = card->pdev->irq;
+ card->num_rx_ints = 0;
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:52:22.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:52:35.000000000 -0500
@@ -466,6 +466,7 @@ struct spider_net_card {
struct work_struct tx_timeout_task;
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
+ int num_rx_ints;
/* for ethtool */
int msg_enable;
* [PATCH 14/18] spidernet: silence the ramfull messages
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (12 preceding siblings ...)
2007-06-07 19:51 ` [PATCH 13/18] spidernet: Cure RX ram full bug Linas Vepstas
@ 2007-06-07 19:53 ` Linas Vepstas
2007-06-07 19:55 ` [PATCH 15/18] spidernet: minor RX optimization Linas Vepstas
` (4 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:53 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.
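
The diff below does this with a one-shot flag: the first ramfull interrupt
triggers the recovery and sets the flag, later ones are ignored until the
poll routine completes and clears it. A minimal user-space sketch of that
latch follows; only the field name ignore_rx_ramfull comes from the patch,
while struct ramfull_state, the recoveries counter and the two functions
are hypothetical stand-ins for the interrupt handler and the poll
completion path.

```c
#include <assert.h>
#include <stddef.h>

struct ramfull_state {
	int ignore_rx_ramfull;	/* same name as the field added below */
	int recoveries;		/* hypothetical counter, illustration only */
};

/* Stand-in for the ramfull case of the error interrupt handler. */
static void on_ramfull_irq(struct ramfull_state *s)
{
	assert(s != NULL);
	if (s->ignore_rx_ramfull == 0) {
		s->ignore_rx_ramfull = 1;
		s->recoveries++;	/* resync, refill, schedule poll */
	}
}

/* Stand-in for the point where poll finds no more packets. */
static void on_poll_complete(struct ramfull_state *s)
{
	assert(s != NULL);
	s->ignore_rx_ramfull = 0;
}
```

A burst of ramfull interrupts between two poll completions therefore
costs one recovery, not one per interrupt.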
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 20 ++++++++++----------
drivers/net/spider_net.h | 3 ++-
2 files changed, 12 insertions(+), 11 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:53:55.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:56:10.000000000 -0500
@@ -1314,6 +1314,7 @@ spider_net_poll(struct net_device *netde
if (no_more_packets) {
netif_rx_complete(netdev);
spider_net_rx_irq_on(card);
+ card->ignore_rx_ramfull = 0;
return 0;
}
@@ -1584,17 +1585,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
- if (netif_msg_intr(card) && net_ratelimit()) {
- dev_info(&card->netdev->dev, "Spider RX RAM full, "
- "incoming packets might be discarded!\n");
- show_rx_chain(card);
- }
/* Could happen when rx chain is full */
- spider_net_resync_head_ptr(card);
- spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
- card->num_rx_ints ++;
- netif_rx_schedule(card->netdev);
+ if (card->ignore_rx_ramfull == 0) {
+ card->ignore_rx_ramfull = 1;
+ spider_net_resync_head_ptr(card);
+ spider_net_refill_rx_chain(card);
+ spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
+ }
show_error = 0;
break;
@@ -2374,6 +2373,7 @@ spider_net_setup_netdev(struct spider_ne
netdev->irq = card->pdev->irq;
card->num_rx_ints = 0;
+ card->ignore_rx_ramfull = 0;
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:52:35.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:55:06.000000000 -0500
@@ -164,7 +164,7 @@ extern char spider_net_driver_name[];
/** interrupt mask registers */
#define SPIDER_NET_INT0_MASK_VALUE 0x3f7fe2c7
-#define SPIDER_NET_INT1_MASK_VALUE 0xffff7ff7
+#define SPIDER_NET_INT1_MASK_VALUE 0xffff5ff5
/* no MAC aborts -> auto retransmission */
#define SPIDER_NET_INT2_MASK_VALUE 0xffef7ff1
@@ -467,6 +467,7 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
int num_rx_ints;
+ int ignore_rx_ramfull;
/* for ethtool */
int msg_enable;
* [PATCH 15/18] spidernet: minor RX optimization
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (13 preceding siblings ...)
2007-06-07 19:53 ` [PATCH 14/18] spidernet: silence the ramfull messages Linas Vepstas
@ 2007-06-07 19:55 ` Linas Vepstas
2007-06-07 19:57 ` [PATCH 16/18] spidernet: fix misnamed flag Linas Vepstas
` (3 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:55 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
A minor optimization on the RX side is that the hardware does
not need to be kicked if space did not open up in the RX ring.
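
The idea can be sketched in user space as follows. This is a simplified
model under stated assumptions, not the driver code: the ring is a small
array, struct rx_model, rx_model_init(), refill() and the dmac_kicks
counter are hypothetical, and "kicking" the hardware is reduced to
incrementing that counter.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

enum slot_state { SLOT_NOT_IN_USE, SLOT_READY };

struct rx_model {
	enum slot_state slot[RING_SIZE];
	size_t head;
	int dmac_kicks;		/* how often the hardware was restarted */
};

/* Start with every slot ready, i.e. nothing to refill. */
static void rx_model_init(struct rx_model *m)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < RING_SIZE; i++)
		m->slot[i] = SLOT_READY;
	m->head = 0;
	m->dmac_kicks = 0;
}

/* Refill not-in-use slots starting at head; kick the hardware only
 * if at least one slot was actually made available. */
static int refill(struct rx_model *m)
{
	int cnt = 0;

	assert(m != NULL);
	while (m->slot[m->head] == SLOT_NOT_IN_USE) {
		m->slot[m->head] = SLOT_READY;
		m->head = (m->head + 1) % RING_SIZE;
		cnt++;
	}
	if (cnt)
		m->dmac_kicks++;
	return cnt;
}
```

Calling refill() on a ring with nothing to refill returns 0 and leaves
dmac_kicks unchanged, which is the register write this patch avoids.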
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:56:10.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:56:23.000000000 -0500
@@ -525,6 +525,7 @@ spider_net_refill_rx_chain(struct spider
{
struct spider_net_descr_chain *chain = &card->rx_chain;
unsigned long flags;
+ int cnt = 0;
/* one context doing the refill (and a second context seeing that
* and omitting it) is ok. If called by NAPI, we'll be called again
@@ -538,9 +539,13 @@ spider_net_refill_rx_chain(struct spider
if (spider_net_prepare_rx_descr(card, chain->head))
break;
chain->head = chain->head->next;
+ cnt ++;
}
spin_unlock_irqrestore(&chain->lock, flags);
+
+ if (cnt)
+ spider_net_enable_rxdmac(card);
}
/**
@@ -573,7 +578,6 @@ spider_net_alloc_rx_skbs(struct spider_n
/* This will allocate the rest of the rx buffers;
* if not, it's business as usual later on. */
spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
return 0;
error:
@@ -1305,7 +1309,6 @@ spider_net_poll(struct net_device *netde
netdev->quota -= packets_done;
*budget -= packets_done;
spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
spider_net_cleanup_tx_ring(card);
@@ -1590,7 +1593,6 @@ spider_net_handle_error_irq(struct spide
card->ignore_rx_ramfull = 1;
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
}
@@ -1611,7 +1613,6 @@ spider_net_handle_error_irq(struct spide
/* Could happen when rx chain is full */
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
show_error = 0;
@@ -1625,7 +1626,6 @@ spider_net_handle_error_irq(struct spide
/* Could happen when rx chain is full */
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
show_error = 0;
* [PATCH 16/18] spidernet: fix misnamed flag
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (14 preceding siblings ...)
2007-06-07 19:55 ` [PATCH 15/18] spidernet: minor RX optimization Linas Vepstas
@ 2007-06-07 19:57 ` Linas Vepstas
2007-06-07 20:01 ` [PATCH 17/18] spidernet: turn off descriptor chain end interrupt Linas Vepstas
` (2 subsequent siblings)
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 19:57 UTC (permalink / raw)
To: Jeff Garzik; +Cc: cbe-oss-dev, netdev
The transmit frame tail bit is strangely misnamed as
"no checksum". Fix the name to what it should be:
"transmit frame tail". No functional change,
just a name change.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 2 +-
drivers/net/spider_net.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-07 11:56:23.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-07 11:56:31.000000000 -0500
@@ -720,7 +720,7 @@ spider_net_prepare_tx_descr(struct spide
hwdescr->data_status = 0;
hwdescr->dmac_cmd_status =
- SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+ SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_TXFRMTL;
spin_unlock_irqrestore(&chain->lock, flags);
if (skb->ip_summed == CHECKSUM_PARTIAL)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:55:06.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:56:31.000000000 -0500
@@ -354,7 +354,7 @@ enum spider_net_int2_status {
#define SPIDER_NET_GPRDAT_MASK 0x0000ffff
#define SPIDER_NET_DMAC_NOINTR_COMPLETE 0x00800000
-#define SPIDER_NET_DMAC_NOCS 0x00040000
+#define SPIDER_NET_DMAC_TXFRMTL 0x00040000
#define SPIDER_NET_DMAC_TCP 0x00020000
#define SPIDER_NET_DMAC_UDP 0x00030000
#define SPIDER_NET_TXDCEST 0x08000000
* [PATCH 17/18] spidernet: turn off descriptor chain end interrupt.
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (15 preceding siblings ...)
2007-06-07 19:57 ` [PATCH 16/18] spidernet: fix misnamed flag Linas Vepstas
@ 2007-06-07 20:01 ` Linas Vepstas
2007-06-07 20:05 ` [PATCH 18/18] spidernet: driver docmentation Linas Vepstas
2007-06-08 1:12 ` [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes Michael Ellerman
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 20:01 UTC (permalink / raw)
To: Jeff Garzik
Cc: cbe-oss-dev, netdev, Nathan J Lee, Ling Shao, Utz Bacher,
Zhen Bo Zhu, Zhu Han, Jens Osterkamp, Yan Qi Wang
At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake: it hurts small packet
transmit performance, as it results in a huge storm of interrupts.
Turn it off.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-07 11:56:31.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-07 11:56:36.000000000 -0500
@@ -227,6 +227,7 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_GDTBSTA 0x00000300
#define SPIDER_NET_GDTDCEIDIS 0x00000002
#define SPIDER_NET_DMA_TX_VALUE SPIDER_NET_TX_DMA_EN | \
+ SPIDER_NET_GDTDCEIDIS | \
SPIDER_NET_GDTBSTA
#define SPIDER_NET_DMA_TX_FEND_VALUE 0x00030003
@@ -337,8 +338,7 @@ enum spider_net_int2_status {
SPIDER_NET_GRISPDNGINT
};
-#define SPIDER_NET_TXINT ( (1 << SPIDER_NET_GDTFDCINT) | \
- (1 << SPIDER_NET_GDTDCEINT) )
+#define SPIDER_NET_TXINT (1 << SPIDER_NET_GDTFDCINT)
/* We rely on flagged descriptor interrupts */
#define SPIDER_NET_RXINT ( (1 << SPIDER_NET_GDAFDCINT) )
* [PATCH 18/18] spidernet: driver docmentation
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (16 preceding siblings ...)
2007-06-07 20:01 ` [PATCH 17/18] spidernet: turn off descriptor chain end interrupt Linas Vepstas
@ 2007-06-07 20:05 ` Linas Vepstas
2007-06-08 1:12 ` [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes Michael Ellerman
18 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-07 20:05 UTC (permalink / raw)
To: Jeff Garzik
Cc: cbe-oss-dev, netdev, Nathan J Lee, Ling Shao, Utz Bacher,
Zhen Bo Zhu, Zhu Han, Jens Osterkamp, Yan Qi Wang
Documentation for the spidernet driver.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Documentation/networking/spider_net.txt | 204 ++++++++++++++++++++++++++++++++
1 file changed, 204 insertions(+)
Index: linux-2.6.22-rc1/Documentation/networking/spider_net.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.22-rc1/Documentation/networking/spider_net.txt 2007-06-07 14:01:52.000000000 -0500
@@ -0,0 +1,204 @@
+
+ The Spidernet Device Driver
+ ===========================
+
+Written by Linas Vepstas <linas@austin.ibm.com>
+
+Version of 7 June 2007
+
+Abstract
+========
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use". An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty nor full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one. Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty". If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer. Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa0800000
+
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied.
+Meanwhile, hw is pointing at 21, which is free.
+
+The "Have nnn descrs" refers to the descr starting at the tail: in this
+case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
+to all of the rest of the descrs, from the last status change. The "nnn"
+is a count of how many descrs have exactly the same status.
+
+The status x4... corresponds to "full" and status xa... corresponds
+to "empty". The actual value printed is RXCOMST_A.
+
+In the device driver source code, a different set of names are
+used for these same concepts, so that
+
+"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
+"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
+
+
+The RX RAM full bug/feature
+===========================
+
+As long as the OS can empty out the RX buffers at a rate faster than
+the hardware can fill them, there is no problem. If, for some reason,
+the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
+pointer will catch up to the head, notice the not-empty condition,
+and stop. However, RX packets may still continue arriving on the wire.
+The spidernet chip can save some limited number of these in local RAM.
+When this local ram fills up, the spider chip will issue an interrupt
+indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
+will be set in GHIINT1STS). When the RX ram full condition occurs,
+a certain bug/feature is triggered that has to be specially handled.
+This section describes the special handling for this condition.
+
+When the OS finally has a chance to run, it will empty out the RX ring.
+In particular, it will clear the descriptor on which the hardware had
+stopped. However, once the hardware has decided that a certain
+descriptor is invalid, it will not restart at that descriptor; instead
+it will restart at the next descr. This potentially will lead to a
+deadlock condition, as the tail pointer will be pointing at this descr,
+which, from the OS point of view, is empty; the OS will be waiting for
+this descr to be filled. However, the hardware has skipped this descr,
+and is filling the next descrs. Since the OS doesn't see this, there
+is a potential deadlock, with the OS waiting for one descr to fill,
+while the hardware is waiting for a different set of descrs to become
+empty.
+
+A call to show_rx_chain() at this point indicates the nature of the
+problem. A typical print when the network is hung shows the following:
+
+net eth1: Spider RX RAM full, incoming packets might be discarded!
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=255
+net eth1: Chain head is at 255
+net eth1: HW curr desc (GDACTDPA) is at 0
+net eth1: Have 1 descrs with stat=xa0800000
+net eth1: HW next desc (GDACNEXTDA) is at 1
+net eth1: Have 127 descrs with stat=x40800101
+net eth1: Have 1 descrs with stat=x40800001
+net eth1: Have 126 descrs with stat=x40800101
+net eth1: Last 1 descrs with stat=xa0800000
+
+Both the tail and head pointers are pointing at descr 255, which is
+marked xa... which is "empty". Thus, from the OS point of view, there
+is nothing to be done. In particular, there is the implicit assumption
+that everything in front of the "empty" descr must surely also be empty,
+as explained in the last section. The OS is waiting for descr 255 to
+become non-empty, which, in this case, will never happen.
+
+The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
+Since it's already full, the hardware can do nothing more, and thus has
+halted processing. Notice that descrs 0 through 253 are all marked
+"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
+descr 254, since tail was at 255.) Thus, the system is deadlocked,
+and there can be no forward progress; the OS thinks there's nothing
+to do, and the hardware has nowhere to put incoming data.
+
+This bug/feature is worked around with the spider_net_resync_head_ptr()
+routine. When the driver receives RX interrupts, but an examination
+of the RX chain seems to show it is empty, then it is probable that
+the hardware has skipped a descr or two (sometimes dozens under heavy
+network conditions). The spider_net_resync_head_ptr() subroutine will
+search the ring for the next full descr, and the driver will resume
+operations there. Since this will leave "holes" in the ring, there
+is also a spider_net_resync_tail_ptr() that will skip over such holes.
+
+As of this writing, the spider_net_resync() strategy seems to work very
+well, even under heavy network loads.
+
+
+The TX ring
+===========
+The TX ring uses a low-watermark interrupt scheme to make sure that
+the TX queue is appropriately serviced for large packet sizes.
+
+For packet sizes greater than about 1KBytes, the kernel can fill
+the TX ring quicker than the device can drain it. Once the ring
+is full, the netdev is stopped. When there is room in the ring,
+the netdev needs to be reawakened, so that more TX packets are placed
+in the ring. The hardware can empty the ring about four times per jiffy,
+so it's not appropriate to wait for the poll routine to refill, since
+the poll routine runs only once per jiffy. The low-watermark mechanism
+marks a descr about 1/4th of the way from the bottom of the queue, so
+that an interrupt is generated when the descr is processed. This
+interrupt wakes up the netdev, which can then refill the queue.
+For large packets, this mechanism generates a relatively small number
+of interrupts, about 1K/sec. For smaller packets, this will drop to zero
+interrupts, as the hardware can empty the queue faster than the kernel
+can fill it.
+
+
+ ======= END OF DOCUMENT ========
+
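For readers who want to experiment, the low-watermark scheme from the TX
ring section of the document above can be modelled in a few lines of
userspace C. This is only a sketch, not driver code: tx_ring, tx_descr and
the tx_ring_*() functions are hypothetical names, the ring size is
arbitrary, and "1/4th of the way from the bottom" is read here as "fire the
interrupt when roughly a quarter of the queued descrs remain", which is an
assumption about the intended meaning.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 256	/* illustrative; not taken from the driver */

struct tx_descr {
	bool in_use;		/* queued for transmit, not yet sent */
	bool irq_on_done;	/* "hardware" raises an interrupt after sending */
};

struct tx_ring {
	struct tx_descr d[TX_RING_SIZE];
	int head;		/* next slot the kernel fills */
	int tail;		/* next slot the hardware sends */
	int count;		/* descrs currently queued */
	bool stopped;		/* models a stopped netdev queue */
};

/* Queue one packet; stop the queue once the ring is full. */
static bool tx_ring_enqueue(struct tx_ring *r)
{
	if (r->count == TX_RING_SIZE)
		return false;
	r->d[r->head].in_use = true;
	r->d[r->head].irq_on_done = false;
	r->head = (r->head + 1) % TX_RING_SIZE;
	r->count++;
	if (r->count == TX_RING_SIZE)
		r->stopped = true;
	return true;
}

/* Flag the descr 3/4 of the way from the tail, so that the interrupt
 * fires when about 1/4 of the queued descrs are still waiting. */
static int tx_ring_set_low_watermark(struct tx_ring *r)
{
	int mark;

	assert(r->count > 0);
	mark = (r->tail + (r->count * 3) / 4) % TX_RING_SIZE;
	r->d[mark].irq_on_done = true;
	return mark;
}

/* "Hardware" sends one descr; returns true if that raised an interrupt.
 * The interrupt is modelled as waking the queue directly. */
static bool tx_ring_hw_send_one(struct tx_ring *r)
{
	bool irq;

	if (r->count == 0)
		return false;
	irq = r->d[r->tail].irq_on_done;
	r->d[r->tail].in_use = false;
	r->d[r->tail].irq_on_done = false;
	r->tail = (r->tail + 1) % TX_RING_SIZE;
	r->count--;
	if (irq)
		r->stopped = false;
	return irq;
}
```

With a full ring of 256 descrs, the mark lands on descr 192, so the queue
is woken while 63 descrs are still pending and the hardware does not run
dry before the kernel refills.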
* Re: [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes
2007-06-07 19:17 [PATCH 0/18] spidernet driver bug fixes Linas Vepstas
` (17 preceding siblings ...)
2007-06-07 20:05 ` [PATCH 18/18] spidernet: driver docmentation Linas Vepstas
@ 2007-06-08 1:12 ` Michael Ellerman
2007-06-08 17:06 ` Linas Vepstas
18 siblings, 1 reply; 69+ messages in thread
From: Michael Ellerman @ 2007-06-08 1:12 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Jeff Garzik, netdev, cbe-oss-dev
On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> Jeff, please apply for the 2.6.23 kernel tree. The patch series
> consists of two major bugfixes, and several bits of cleanup.
>
> The major bug fixes are:
>
> 1) a rare but fatal bug involving "RX ram full" messages,
> which results in a driver deadlock.
>
> 2) misconfigured TX interrupts, causing a severe performance
> degradation for small packets.
I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes
2007-06-08 1:12 ` [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes Michael Ellerman
@ 2007-06-08 17:06 ` Linas Vepstas
2007-06-08 17:20 ` Jeff Garzik
0 siblings, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-08 17:06 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Jeff Garzik, netdev, cbe-oss-dev
On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > Jeff, please apply for the 2.6.23 kernel tree. The patch series
> > consists of two major bugfixes, and several bits of cleanup.
> >
> > The major bug fixes are:
> >
> > 1) a rare but fatal bug involving "RX ram full" messages,
> > which results in a driver deadlock.
> >
> > 2) misconfigured TX interrupts, causing a severe performance
> > degradation for small packets.
>
> I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
Yeah, I suppose, I admit I've lost track of the process.
I'm not sure how to submit patches for this case. The "major fixes"
are patches 6/18, 13/18, 14/18 and 17/18 (the rest of the patches are
cruft-fixes). Taken alone, these four will not apply cleanly.
I could prepare a new set, with just these four; assuming these are
accepted into 2.6.22, then once 22 comes out, Jeff's .23 tree won't
merge cleanly.
What's the right way to do this?
--linas
* Re: [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes
2007-06-08 17:06 ` Linas Vepstas
@ 2007-06-08 17:20 ` Jeff Garzik
2007-06-11 18:14 ` [PATCH 0/15] " Linas Vepstas
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-08 17:20 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, Jeff Garzik, netdev, cbe-oss-dev
On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> > On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > > Jeff, please apply for the 2.6.23 kernel tree. The patch series
> > > consists of two major bugfixes, and several bits of cleanup.
> > >
> > > The major bug fixes are:
> > >
> > > 1) a rare but fatal bug involving "RX ram full" messages,
> > > which results in a driver deadlock.
> > >
> > > 2) misconfigured TX interrupts, causing a severe performance
> > > degradation for small packets.
> >
> > I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
>
> Yeah, I suppose, I admit I've lost track of the process.
>
> I'm not sure how to submit patches for this case. The "major fixes"
> are patches 6/18, 13/18 14/18 and 17/18; (the rest of the patches are
> cruft-fixes). Taken alone, these four will not apply cleanly.
>
> I could prepare a new set, with just these four; assuming these are
> accepted into 2.6.22, then once 22 comes out, Jeff's .23 tree won't
> merge cleanly.
You need to order your bug fixes first in the queue. I push those
upstream, and simultaneously merge the result into netdev#upstream (2.6.23
queue).
Jeff
* [PATCH 0/15] spidernet driver bug fixes
2007-06-08 17:20 ` Jeff Garzik
@ 2007-06-11 18:14 ` Linas Vepstas
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (2 more replies)
0 siblings, 3 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, Jeff Garzik, netdev, cbe-oss-dev
On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> > On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> > > On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > > >
> > > > The major bug fixes are:
> > > I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> > Yeah, I suppose, I admit I've lost track of the process.
>
> You need to order your bug fixes first in the queue.
OK, here are the patches, re-ordered. There is a different number
than last time, as I threw out one, merged one, and got cold feet
on a third one. They still pass the tests.
The first five patches focus on three serious bugs, fixing crashes or
hangs.
-- patch 1 -- kernel crash when ifdown while receiving packets.
-- patches 2,3,4 -- device driver deadlocks on "RX ram full" messages
(kernel stays up, ifdown/up clears the problem).
-- patch 5 -- misconfigured TX interrupts result in a 3x-4x performance
degradation for small packets.
-- patch 6 -- rx stats may be mangled
-- patch 7 -- hw checksum sometimes breaks ipv6 operation
-- patches 8-15 -- misc tweaks, and documentation.
I re-ran my stress tests with patches 1-7 applied; they pass.
I suggest that patches 1-5 or 1-7 be applied asap.
--linas
* [PATCH 1/15] spidernet: null out skb pointer after its been used.
2007-06-11 18:14 ` [PATCH 0/15] " Linas Vepstas
@ 2007-06-11 18:17 ` Linas Vepstas
2007-06-11 18:21 ` [PATCH 2/15] spidernet: Cure RX ram full bug Linas Vepstas
` (14 more replies)
2007-06-12 2:01 ` [PATCH 0/15] spidernet driver bug fixes Michael Ellerman
2007-06-12 23:00 ` Jeff Garzik
2 siblings, 15 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Avoid kernel crash in mm/slab.c due to double-free of pointer.
If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c.
Avoid the double-free by nulling out the skb pointer.
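Purely as an illustration, the double-free and its cure can be modelled in
userspace C. Nothing below is driver code: rx_descr, pass_up(), rx_one()
and shutdown_ring() are hypothetical stand-ins, and free() plays the role
of the network stack consuming the skb once it has been handed up.

```c
#include <assert.h>
#include <stdlib.h>

struct rx_descr {
	void *buf;	/* stands in for descr->skb */
};

static int frees;	/* counts how many times a buffer was released */

/* Stand-in for handing the buffer to the stack, which then owns
 * and eventually frees it. */
static void pass_up(struct rx_descr *d)
{
	assert(d->buf != NULL);
	free(d->buf);
	frees++;
}

/* Receive path: after the hand-off, forget the pointer. */
static void rx_one(struct rx_descr *d)
{
	pass_up(d);
	d->buf = NULL;	/* the one-line fix: we no longer own it */
}

/* Shutdown path: release only the buffers the ring still owns. */
static void shutdown_ring(struct rx_descr *ring, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (ring[i].buf) {
			free(ring[i].buf);
			frees++;
			ring[i].buf = NULL;
		}
	}
}
```

Without the assignment in rx_one(), shutdown_ring() would see a stale
non-NULL pointer and free the same buffer a second time.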
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-08 15:45:33.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-08 15:48:10.000000000 -0500
@@ -1131,6 +1131,7 @@ spider_net_decode_one_descr(struct spide
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+ descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
* [PATCH 2/15] spidernet: Cure RX ram full bug
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
@ 2007-06-11 18:21 ` Linas Vepstas
2007-06-11 18:23 ` [PATCH 3/15] spidernet: Don't terminate the RX ring Linas Vepstas
` (13 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:21 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.
As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS). When the RX ram full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.
When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.
A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:
net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000
Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.
The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.
This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.
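The head resync can be tried out on the hung state shown in the print
above with a small userspace model. This is a sketch only: rx_ring,
resync_head() and make_hung_ring() are hypothetical names, descr status is
reduced to empty/full, and the locking done by the real routine is left
out.

```c
#include <assert.h>

#define NUM_DESC 256

/* "xa..." and "x4..." in the print, respectively */
enum descr_status { DESCR_EMPTY, DESCR_FULL };

struct rx_ring {
	enum descr_status status[NUM_DESC];
	int head;	/* where the OS expects the next received packet */
};

/* Advance head past empty descrs to the next full one.
 * Returns the number of descrs skipped; gives up after one full lap. */
static int resync_head(struct rx_ring *r)
{
	int i;

	assert(r->head >= 0 && r->head < NUM_DESC);
	for (i = 0; i < NUM_DESC; i++) {
		if (r->status[r->head] != DESCR_EMPTY)
			break;
		r->head = (r->head + 1) % NUM_DESC;
	}
	return i;
}

/* Rebuild the state from the print: descrs 254 and 255 empty,
 * everything else full, head stuck on 255. */
static void make_hung_ring(struct rx_ring *r)
{
	int i;

	for (i = 0; i < NUM_DESC; i++)
		r->status[i] = DESCR_FULL;
	r->status[254] = DESCR_EMPTY;
	r->status[255] = DESCR_EMPTY;
	r->head = 255;
}
```

On the hung ring, resync_head() skips the single empty descr at 255 and
lands on descr 0, which is where the hardware stopped; the OS then has
full descrs to process again.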
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 86 +++++++++++++++++++++++++++++++++++++++++++----
drivers/net/spider_net.h | 3 +
2 files changed, 82 insertions(+), 7 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-08 15:48:10.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 10:02:12.000000000 -0500
@@ -1051,6 +1051,66 @@ static void show_rx_chain(struct spider_
#endif
/**
+ * spider_net_resync_head_ptr - Advance head ptr past empty descrs
+ *
+ * If the driver fails to keep up and empty the queue, then the
+ * hardware will run out of room to put incoming packets. This
+ * will cause the hardware to skip descrs that are full (instead
+ * of halting/retrying). Thus, once the driver runs, it will need
+ * to "catch up" to where the hardware chain pointer is at.
+ */
+static void spider_net_resync_head_ptr(struct spider_net_card *card)
+{
+ unsigned long flags;
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *descr;
+ int i, status;
+
+ /* Advance head pointer past any empty descrs */
+ descr = chain->head;
+ status = spider_net_get_descr_status(descr->hwdescr);
+
+ if (status == SPIDER_NET_DESCR_NOT_IN_USE)
+ return;
+
+ spin_lock_irqsave(&chain->lock, flags);
+
+ descr = chain->head;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ for (i=0; i<chain->num_desc; i++) {
+ if (status != SPIDER_NET_DESCR_CARDOWNED) break;
+ descr = descr->next;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ }
+ chain->head = descr;
+
+ spin_unlock_irqrestore(&chain->lock, flags);
+}
+
+static int spider_net_resync_tail_ptr(struct spider_net_card *card)
+{
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *descr;
+ int i, status;
+
+ /* Advance tail pointer past any empty and reaped descrs */
+ descr = chain->tail;
+ status = spider_net_get_descr_status(descr->hwdescr);
+
+ for (i=0; i<chain->num_desc; i++) {
+ if ((status != SPIDER_NET_DESCR_CARDOWNED) &&
+ (status != SPIDER_NET_DESCR_NOT_IN_USE)) break;
+ descr = descr->next;
+ status = spider_net_get_descr_status(descr->hwdescr);
+ }
+ chain->tail = descr;
+
+ if ((i == chain->num_desc) || (i == 0))
+ return 1;
+ return 0;
+}
+
+/**
* spider_net_decode_one_descr - processes an RX descriptor
* @card: card structure
*
@@ -1175,6 +1235,12 @@ spider_net_poll(struct net_device *netde
}
}
+ if ((packets_done == 0) && (card->num_rx_ints != 0)) {
+ no_more_packets = spider_net_resync_tail_ptr(card);
+ spider_net_resync_head_ptr(card);
+ }
+ card->num_rx_ints = 0;
+
netdev->quota -= packets_done;
*budget -= packets_done;
spider_net_refill_rx_chain(card);
@@ -1458,7 +1524,11 @@ spider_net_handle_error_irq(struct spide
if (netif_msg_intr(card) && net_ratelimit())
pr_err("Spider RX RAM full, incoming packets "
"might be discarded!\n");
- spider_net_rx_irq_off(card);
+ /* Could happen when rx chain is full */
+ spider_net_resync_head_ptr(card);
+ spider_net_refill_rx_chain(card);
+ spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1474,12 +1544,11 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCDCEINT: /* fallthrough */
case SPIDER_NET_GDBDCEINT: /* fallthrough */
case SPIDER_NET_GDADCEINT:
- if (netif_msg_intr(card) && net_ratelimit())
- pr_err("got descriptor chain end interrupt, "
- "restarting DMAC %c.\n",
- 'D'-(i-SPIDER_NET_GDDDCEINT)/3);
+ spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1488,9 +1557,12 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCINVDINT: /* fallthrough */
case SPIDER_NET_GDBINVDINT: /* fallthrough */
case SPIDER_NET_GDAINVDINT:
- /* could happen when rx chain is full */
+ /* Could happen when rx chain is full */
+ spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
show_error = 0;
break;
@@ -1583,6 +1655,7 @@ spider_net_interrupt(int irq, void *ptr)
if (status_reg & SPIDER_NET_RXINT ) {
spider_net_rx_irq_off(card);
netif_rx_schedule(netdev);
+ card->num_rx_ints ++;
}
if (status_reg & SPIDER_NET_TXINT)
netif_rx_schedule(netdev);
@@ -2231,6 +2304,7 @@ spider_net_setup_netdev(struct spider_ne
* NETIF_F_HW_VLAN_FILTER */
netdev->irq = card->pdev->irq;
+ card->num_rx_ints = 0;
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-08 15:45:32.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-11 10:02:25.000000000 -0500
@@ -25,7 +25,7 @@
#ifndef _SPIDER_NET_H
#define _SPIDER_NET_H
-#define VERSION "2.0 A"
+#define VERSION "2.0 B"
#include "sungem_phy.h"
@@ -461,6 +461,7 @@ struct spider_net_card {
struct work_struct tx_timeout_task;
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
+ int num_rx_ints;
/* for ethtool */
int msg_enable;
* [PATCH 3/15] spidernet: Don't terminate the RX ring
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
2007-06-11 18:21 ` [PATCH 2/15] spidernet: Cure RX ram full bug Linas Vepstas
@ 2007-06-11 18:23 ` Linas Vepstas
2007-06-11 18:26 ` [PATCH 4/15] spidernet: silence the ramfull messages Linas Vepstas
` (12 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
The terminated RX ring will cause trouble during the RX ram full
conditions, leading to a hung driver, as the hardware can't find
the next descr. There is no real reason to terminate the RX ring;
it doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.
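As a rough illustration of the idea, here is a userspace model of linking
the hardware next pointers into a closed ring once, at setup time, so that
no descr ever carries a terminating 0. The types, the ring size and the
bus addresses are made up for the example; only the do/while loop mirrors
the shape of the loop added to spider_net_alloc_rx_skbs() below.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 4	/* tiny ring, for illustration only */

struct hw_descr {
	uint32_t next_descr_addr;	/* bus address of next descr; 0 terminates */
};

struct descr {
	struct hw_descr hw;
	uint32_t bus_addr;
	struct descr *next, *prev;
};

/* Build the circular software list and assign made-up bus addresses.
 * The hardware pointers start out terminated (0). */
static void ring_init(struct descr *ring, int n)
{
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		ring[i].bus_addr = 0x1000u + 0x20u * (uint32_t)i;
		ring[i].hw.next_descr_addr = 0;
		ring[i].next = &ring[(i + 1) % n];
		ring[i].prev = &ring[(i + n - 1) % n];
	}
}

/* Point every hardware next pointer at its successor, exactly once. */
static void ring_link_hw(struct descr *start)
{
	struct descr *d = start;

	do {
		d->prev->hw.next_descr_addr = d->bus_addr;
		d = d->next;
	} while (d != start);
}
```

After ring_link_hw() the hardware can always find a next descr, whichever
descr it happens to resume from.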
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-08 17:35:33.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-08 17:36:19.000000000 -0500
@@ -460,13 +460,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
- hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
- wmb();
- descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
return 0;
@@ -541,12 +537,16 @@ spider_net_refill_rx_chain(struct spider
static int
spider_net_alloc_rx_skbs(struct spider_net_card *card)
{
- int result;
- struct spider_net_descr_chain *chain;
+ struct spider_net_descr_chain *chain = &card->rx_chain;
+ struct spider_net_descr *start = chain->tail;
+ struct spider_net_descr *descr = start;
- result = -ENOMEM;
+ /* Link up the hardware chain pointers */
+ do {
+ descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+ descr = descr->next;
+ } while (descr != start);
- chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
* we've got a problem. If not, spider_net_refill_rx_chain
* will do the rest at the end of this function. */
@@ -563,7 +563,7 @@ spider_net_alloc_rx_skbs(struct spider_n
error:
spider_net_free_rx_chain_contents(card);
- return result;
+ return -ENOMEM;
}
/**
* [PATCH 4/15] spidernet: silence the ramfull messages
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
2007-06-11 18:21 ` [PATCH 2/15] spidernet: Cure RX ram full bug Linas Vepstas
2007-06-11 18:23 ` [PATCH 3/15] spidernet: Don't terminate the RX ring Linas Vepstas
@ 2007-06-11 18:26 ` Linas Vepstas
2007-06-13 20:12 ` Jeff Garzik
2007-06-11 18:29 ` [PATCH 5/15] spidernet: turn off descriptor chain end interrupt Linas Vepstas
` (11 subsequent siblings)
14 siblings, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:26 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.
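The approach taken in the diff below is a one-shot latch: handle the first
ramfull interrupt, ignore repeats, and re-arm once polling has caught up.
A minimal userspace sketch of that pattern follows. The struct and
function names are hypothetical; only the ignore_rx_ramfull field name is
taken from the patch, and the recoveries counter stands in for the
resync/refill/re-enable sequence.

```c
#include <assert.h>

struct card {
	int ignore_rx_ramfull;	/* set once a ramfull irq has been handled */
	int recoveries;		/* how many times the recovery path ran */
};

/* Error-interrupt path: act on the first ramfull irq only. */
static void on_ramfull_irq(struct card *c)
{
	assert(c);
	if (c->ignore_rx_ramfull == 0) {
		c->ignore_rx_ramfull = 1;
		c->recoveries++;	/* stands in for resync + refill + re-enable */
	}
}

/* Poll path: once there are no more packets, re-arm the latch. */
static void on_poll_done(struct card *c)
{
	assert(c);
	c->ignore_rx_ramfull = 0;
}
```

A burst of ramfull interrupts between two completed polls therefore
triggers the recovery path only once.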
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 20 +++++++++++---------
drivers/net/spider_net.h | 1 +
2 files changed, 12 insertions(+), 9 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 10:02:34.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:45:25.000000000 -0500
@@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
- if (hwdescr->dmac_cmd_status & 0xfefe) {
+ if (hwdescr->dmac_cmd_status & 0xfcf4) {
pr_err("%s: bad status, cmd_status=x%08x\n",
card->netdev->name,
hwdescr->dmac_cmd_status);
@@ -1251,6 +1251,7 @@ spider_net_poll(struct net_device *netde
if (no_more_packets) {
netif_rx_complete(netdev);
spider_net_rx_irq_on(card);
+ card->ignore_rx_ramfull = 0;
return 0;
}
@@ -1521,15 +1522,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
- if (netif_msg_intr(card) && net_ratelimit())
- pr_err("Spider RX RAM full, incoming packets "
- "might be discarded!\n");
/* Could happen when rx chain is full */
- spider_net_resync_head_ptr(card);
- spider_net_refill_rx_chain(card);
- spider_net_enable_rxdmac(card);
- card->num_rx_ints ++;
- netif_rx_schedule(card->netdev);
+ if (card->ignore_rx_ramfull == 0) {
+ card->ignore_rx_ramfull = 1;
+ spider_net_resync_head_ptr(card);
+ spider_net_refill_rx_chain(card);
+ spider_net_enable_rxdmac(card);
+ card->num_rx_ints ++;
+ netif_rx_schedule(card->netdev);
+ }
show_error = 0;
break;
@@ -2305,6 +2306,7 @@ spider_net_setup_netdev(struct spider_ne
netdev->irq = card->pdev->irq;
card->num_rx_ints = 0;
+ card->ignore_rx_ramfull = 0;
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-11 10:02:25.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-11 11:45:50.000000000 -0500
@@ -462,6 +462,7 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
int num_rx_ints;
+ int ignore_rx_ramfull;
/* for ethtool */
int msg_enable;
* Re: [PATCH 4/15] spidernet: silence the ramfull messages
2007-06-11 18:26 ` [PATCH 4/15] spidernet: silence the ramfull messages Linas Vepstas
@ 2007-06-13 20:12 ` Jeff Garzik
2007-06-14 22:29 ` Linas Vepstas
2007-06-14 23:12 ` [PATCH] spidernet: Replace literal with const Linas Vepstas
0 siblings, 2 replies; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 20:12 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> --- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 10:02:34.000000000 -0500
> +++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:45:25.000000000 -0500
> @@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
> goto bad_desc;
> }
>
> - if (hwdescr->dmac_cmd_status & 0xfefe) {
> + if (hwdescr->dmac_cmd_status & 0xfcf4) {
> pr_err("%s: bad status, cmd_status=x%08x\n",
> card->netdev->name,
> hwdescr->dmac_cmd_status);
A follow-up patch needs to remove the above magic numbers (==numeric
constants), replacing them with named constants
I only accepted the above patch because it was needed for the fixes.
Otherwise I would have requested a SPIDERNET_BAD_STATUS constant or
similar, containing the relevant split-out bits
* Re: [PATCH 4/15] spidernet: silence the ramfull messages
2007-06-13 20:12 ` Jeff Garzik
@ 2007-06-14 22:29 ` Linas Vepstas
2007-06-14 23:12 ` [PATCH] spidernet: Replace literal with const Linas Vepstas
1 sibling, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-14 22:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
On Wed, Jun 13, 2007 at 04:12:00PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11
> >10:02:34.000000000 -0500
> >+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11
> >11:45:25.000000000 -0500
> >@@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
> > goto bad_desc;
> > }
> >
> >- if (hwdescr->dmac_cmd_status & 0xfefe) {
> >+ if (hwdescr->dmac_cmd_status & 0xfcf4) {
> > pr_err("%s: bad status, cmd_status=x%08x\n",
> > card->netdev->name,
> > hwdescr->dmac_cmd_status);
>
>
> A follow-up patch needs to remove the above magic numbers (==numeric
> constants), replacing them with named constants
I thought laziness was a virtue ... oh, wait, wrong programming language.
Patch coming shortly.
--linas
* [PATCH] spidernet: Replace literal with const
2007-06-13 20:12 ` Jeff Garzik
2007-06-14 22:29 ` Linas Vepstas
@ 2007-06-14 23:12 ` Linas Vepstas
2007-07-02 12:37 ` Jeff Garzik
1 sibling, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-14 23:12 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Replace literal with const; add bit definitions.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
On Wed, Jun 13, 2007 at 04:12:00PM -0400, Jeff Garzik wrote:
> A follow-up patch needs to remove the above magic numbers (==numeric
> constants), replacing them with named constants
Here it is. Lightly stress-tested (about 1/2 hour), as this patch
tests some additional bits.
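For anyone checking the arithmetic, the relation between the old literal
and the new named mask can be worked out with a short userspace program.
The bit values are copied from the spider_net.h hunk below (with a "u"
suffix added); the helper functions and the OLD_/OLDER_LITERAL names are
made up for this check and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in the spider_net.h hunk of this patch. */
#define SPIDER_NET_DESCR_RXFDIS		0x00000001u
#define SPIDER_NET_DESCR_RXDCEIS	0x00000002u
#define SPIDER_NET_DESCR_RXDEN0IS	0x00000004u
#define SPIDER_NET_DESCR_RXINVDIS	0x00000008u
#define SPIDER_NET_DESCR_RXRERRIS	0x00000010u
#define SPIDER_NET_DESCR_RXFDCIMS	0x00000100u
#define SPIDER_NET_DESCR_RXDCEIMS	0x00000200u
#define SPIDER_NET_DESCR_RXDEN0IMS	0x00000400u
#define SPIDER_NET_DESCR_RXINVDIMS	0x00000800u
#define SPIDER_NET_DESCR_RXRERRMIS	0x00001000u
#define SPIDER_NET_DESCR_UNUSED		0x077fe0e0u

#define SPIDER_NET_DESCR_BAD_STATUS	(SPIDER_NET_DESCR_RXDEN0IS | \
					 SPIDER_NET_DESCR_RXRERRIS | \
					 SPIDER_NET_DESCR_RXDEN0IMS | \
					 SPIDER_NET_DESCR_RXINVDIMS | \
					 SPIDER_NET_DESCR_RXRERRMIS | \
					 SPIDER_NET_DESCR_UNUSED)

#define OLD_LITERAL	0xfcf4u	/* literal replaced by this patch */
#define OLDER_LITERAL	0xfefeu	/* literal used before patch 4/15 */

static uint32_t bad_status_mask(void)
{
	return SPIDER_NET_DESCR_BAD_STATUS;
}

/* Bits tested by the named mask but not by the 16-bit literal. */
static uint32_t extra_bits(void)
{
	return bad_status_mask() & ~(uint32_t)OLD_LITERAL;
}

/* Bits that patch 4/15 stopped treating as errors. */
static uint32_t dropped_in_patch4(void)
{
	return OLDER_LITERAL & ~(uint32_t)OLD_LITERAL;
}

/* The low 16 bits of the named mask must reproduce the old literal. */
static void check_low_half(void)
{
	assert((bad_status_mask() & 0xffffu) == OLD_LITERAL);
}
```

With these definitions the named mask works out to 0x077ffcf4: the low 16
bits equal the old 0xfcf4, and the additional bits are 0x077f0000, which
come from SPIDER_NET_DESCR_UNUSED. The bits dropped earlier, going from
0xfefe to 0xfcf4, are 0x020a, i.e. RXDCEIS, RXINVDIS and RXDCEIMS.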
drivers/net/spider_net.c | 2 +-
drivers/net/spider_net.h | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 1 deletion(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 15:39:03.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-14 17:23:32.000000000 -0500
@@ -1235,7 +1235,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
- if (hwdescr->dmac_cmd_status & 0xfcf4) {
+ if (hwdescr->dmac_cmd_status & SPIDER_NET_DESCR_BAD_STATUS) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hw_buf_addr);
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-11 15:39:03.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-14 17:34:56.000000000 -0500
@@ -359,6 +359,18 @@ enum spider_net_int2_status {
#define SPIDER_NET_DMAC_UDP 0x00030000
#define SPIDER_NET_TXDCEST 0x08000000
+#define SPIDER_NET_DESCR_RXFDIS 0x00000001
+#define SPIDER_NET_DESCR_RXDCEIS 0x00000002
+#define SPIDER_NET_DESCR_RXDEN0IS 0x00000004
+#define SPIDER_NET_DESCR_RXINVDIS 0x00000008
+#define SPIDER_NET_DESCR_RXRERRIS 0x00000010
+#define SPIDER_NET_DESCR_RXFDCIMS 0x00000100
+#define SPIDER_NET_DESCR_RXDCEIMS 0x00000200
+#define SPIDER_NET_DESCR_RXDEN0IMS 0x00000400
+#define SPIDER_NET_DESCR_RXINVDIMS 0x00000800
+#define SPIDER_NET_DESCR_RXRERRMIS 0x00001000
+#define SPIDER_NET_DESCR_UNUSED 0x077fe0e0
+
#define SPIDER_NET_DESCR_IND_PROC_MASK 0xF0000000
#define SPIDER_NET_DESCR_COMPLETE 0x00000000 /* used in rx and tx */
#define SPIDER_NET_DESCR_RESPONSE_ERROR 0x10000000 /* used in rx and tx */
@@ -369,6 +381,13 @@ enum spider_net_int2_status {
#define SPIDER_NET_DESCR_NOT_IN_USE 0xF0000000
#define SPIDER_NET_DESCR_TXDESFLG 0x00800000
+#define SPIDER_NET_DESCR_BAD_STATUS (SPIDER_NET_DESCR_RXDEN0IS | \
+ SPIDER_NET_DESCR_RXRERRIS | \
+ SPIDER_NET_DESCR_RXDEN0IMS | \
+ SPIDER_NET_DESCR_RXINVDIMS | \
+ SPIDER_NET_DESCR_RXRERRMIS | \
+ SPIDER_NET_DESCR_UNUSED)
+
/* Descriptor, as defined by the hardware */
struct spider_net_hw_descr {
u32 buf_addr;
* Re: [PATCH] spidernet: Replace literal with const
2007-06-14 23:12 ` [PATCH] spidernet: Replace literal with const Linas Vepstas
@ 2007-07-02 12:37 ` Jeff Garzik
0 siblings, 0 replies; 69+ messages in thread
From: Jeff Garzik @ 2007-07-02 12:37 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> Replace literal with const; add bit definitions.
>
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
>
> ----
>
> On Wed, Jun 13, 2007 at 04:12:00PM -0400, Jeff Garzik wrote:
>> A follow-up patch needs to remove the above magic numbers (==numeric
>> constants), replacing them with named constants
>
> Here it is. Lightly stress-tested (about 1/2 hour), as this patch
> tests some additional bits.
>
> drivers/net/spider_net.c | 2 +-
> drivers/net/spider_net.h | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+), 1 deletion(-)
applied to #upstream (2.6.23)
* [PATCH 5/15] spidernet: turn off descriptor chain end interrupt.
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (2 preceding siblings ...)
2007-06-11 18:26 ` [PATCH 4/15] spidernet: silence the ramfull messages Linas Vepstas
@ 2007-06-11 18:29 ` Linas Vepstas
2007-06-11 18:32 ` [PATCH 6/15] spidernet: skb used after netif_receive_skb Linas Vepstas
` (10 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake, and it damages small-packet
transmit performance, as it results in a huge storm of interrupts.
Turn it off.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-08 17:40:02.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-08 17:40:05.000000000 -0500
@@ -222,6 +222,7 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_GDTBSTA 0x00000300
#define SPIDER_NET_GDTDCEIDIS 0x00000002
#define SPIDER_NET_DMA_TX_VALUE SPIDER_NET_TX_DMA_EN | \
+ SPIDER_NET_GDTDCEIDIS | \
SPIDER_NET_GDTBSTA
#define SPIDER_NET_DMA_TX_FEND_VALUE 0x00030003
@@ -332,8 +333,7 @@ enum spider_net_int2_status {
SPIDER_NET_GRISPDNGINT
};
-#define SPIDER_NET_TXINT ( (1 << SPIDER_NET_GDTFDCINT) | \
- (1 << SPIDER_NET_GDTDCEINT) )
+#define SPIDER_NET_TXINT (1 << SPIDER_NET_GDTFDCINT)
/* We rely on flagged descriptor interrupts */
#define SPIDER_NET_RXINT ( (1 << SPIDER_NET_GDAFDCINT) )
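
A rough sketch of what the SPIDER_NET_TXINT change does to interrupt
filtering, for anyone reading the thread without the header at hand. The
bit positions below are invented (the real ones come from the driver's
interrupt status enums, which are not shown in this hunk), and
sketch_tx_events() is a hypothetical helper, not a driver function.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
enum sketch_int_status {
	SKETCH_GDTFDCINT = 4,	/* assumed: TX flagged descriptor interrupt */
	SKETCH_GDTDCEINT = 5	/* assumed: TX descriptor chain end interrupt */
};

/* Mask before the patch: both TX sources are serviced. */
#define SKETCH_TXINT_OLD ((1u << SKETCH_GDTFDCINT) | \
			  (1u << SKETCH_GDTDCEINT))
/* Mask after the patch: only the flagged descriptor source remains. */
#define SKETCH_TXINT_NEW (1u << SKETCH_GDTFDCINT)

/* Return the TX events in a status word that the given mask lets through. */
uint32_t sketch_tx_events(uint32_t status, uint32_t txint_mask)
{
	return status & txint_mask;
}
```

Under these assumptions a chain-end event passes the old mask and is
filtered out by the new one, which is the point of the patch.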
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 6/15] spidernet: skb used after netif_receive_skb
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (3 preceding siblings ...)
2007-06-11 18:29 ` [PATCH 5/15] spidernet: turn off descriptor chain end interrupt Linas Vepstas
@ 2007-06-11 18:32 ` Linas Vepstas
2007-06-11 18:35 ` [PATCH 7/15] spidernet: checksum and ethtool Linas Vepstas
` (9 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:32 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
From: Florin Malita <fmalita@gmail.com>
The stats update code in spider_net_pass_skb_up() is touching the skb
after it's been passed up to the stack. To avoid that, just update the
stats first.
Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-08 17:40:02.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-08 17:40:09.000000000 -0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
*/
}
- /* pass skb up to stack */
- netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+ /* pass skb up to stack */
+ netif_receive_skb(skb);
}
#ifdef DEBUG
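
The ordering issue fixed above can be shown with a small user-space
analogue. This is only a sketch: struct sketch_pkt stands in for the skb,
and sketch_receive() stands in for netif_receive_skb() under the
assumption that the receiver takes ownership and may free the buffer
before returning. None of these names exist in the driver.

```c
#include <stdlib.h>
#include <stddef.h>

struct sketch_pkt {
	size_t len;
};

struct sketch_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

/* Stand-in for the stack: takes ownership of pkt and frees it. */
void sketch_receive(struct sketch_pkt *pkt)
{
	free(pkt);
}

/* Read everything we need from pkt first, then hand it over. */
void sketch_pass_up(struct sketch_stats *stats, struct sketch_pkt *pkt)
{
	stats->rx_packets++;
	stats->rx_bytes += pkt->len;	/* last access to pkt */

	sketch_receive(pkt);		/* pkt must not be touched after this */
}
```

Swapping the two halves of sketch_pass_up() would read pkt->len from
freed memory, which is the bug the patch removes.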
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 7/15] spidernet: checksum and ethtool
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (4 preceding siblings ...)
2007-06-11 18:32 ` [PATCH 6/15] spidernet: skb used after netif_receive_skb Linas Vepstas
@ 2007-06-11 18:35 ` Linas Vepstas
2007-06-11 18:41 ` [PATCH 8/15] spidernet: beautify error messages Linas Vepstas
` (8 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:35 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
From: Stephen Hemminger <shemminger@linux-foundation.org>
It doesn't look like spidernet hardware can really checksum all protocols;
the code looks like it does IPv4 only. If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.
The driver doesn't need its own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
drivers/net/spider_net.c | 4 ++--
drivers/net/spider_net_ethtool.c | 21 +++------------------
2 files changed, 5 insertions(+), 20 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-08 17:28:55.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-08 17:28:58.000000000 -0500
@@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);
- if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
switch (ip_hdr(skb)->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
@@ -2300,7 +2300,7 @@ spider_net_setup_netdev(struct spider_ne
spider_net_setup_netdev_ops(netdev);
- netdev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
+ netdev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX;
/* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
* NETIF_F_HW_VLAN_FILTER */
Index: linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net_ethtool.c 2007-06-08 17:27:01.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c 2007-06-08 17:28:58.000000000 -0500
@@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne
return 0;
}
-static uint32_t
-spider_net_ethtool_get_tx_csum(struct net_device *netdev)
-{
- return (netdev->features & NETIF_F_HW_CSUM) != 0;
-}
-
-static int
-spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data)
-{
- if (data)
- netdev->features |= NETIF_F_HW_CSUM;
- else
- netdev->features &= ~NETIF_F_HW_CSUM;
-
- return 0;
-}
static void
spider_net_ethtool_get_ringparam(struct net_device *netdev,
@@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht
.get_wol = spider_net_ethtool_get_wol,
.get_msglevel = spider_net_ethtool_get_msglevel,
.set_msglevel = spider_net_ethtool_set_msglevel,
+ .get_link = ethtool_op_get_link,
.nway_reset = spider_net_ethtool_nway_reset,
.get_rx_csum = spider_net_ethtool_get_rx_csum,
.set_rx_csum = spider_net_ethtool_set_rx_csum,
- .get_tx_csum = spider_net_ethtool_get_tx_csum,
- .set_tx_csum = spider_net_ethtool_set_tx_csum,
+ .get_tx_csum = ethtool_op_get_tx_csum,
+ .set_tx_csum = ethtool_op_set_tx_csum,
.get_ringparam = spider_net_ethtool_get_ringparam,
.get_strings = spider_net_get_strings,
.get_stats_count = spider_net_get_stats_count,
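
A sketch of the transmit-side decision left after this patch. As far as
I understand it, NETIF_F_IP_CSUM advertises TCP/UDP-over-IPv4 checksum
offload only, so the stack should not hand the driver CHECKSUM_PARTIAL
packets of any other kind, which is why the explicit ETH_P_IP test can
go. The command bits are the SPIDER_NET_DMAC_TCP and SPIDER_NET_DMAC_UDP
values quoted elsewhere in this thread; the UDP branch is assumed by
symmetry, since the hunk only shows the TCP case. All sketch_ and
SKETCH_ names are invented.

```c
#include <stdint.h>

#define SKETCH_IPPROTO_TCP	6	/* IANA protocol number for TCP */
#define SKETCH_IPPROTO_UDP	17	/* IANA protocol number for UDP */

#define SKETCH_DMAC_TCP		0x00020000u
#define SKETCH_DMAC_UDP		0x00030000u

/* Return the checksum command bits to OR into dmac_cmd_status.
 * csum_partial is nonzero when the stack asked for checksum offload. */
uint32_t sketch_csum_cmd_bits(int csum_partial, uint8_t ip_protocol)
{
	if (!csum_partial)
		return 0;

	switch (ip_protocol) {
	case SKETCH_IPPROTO_TCP:
		return SKETCH_DMAC_TCP;
	case SKETCH_IPPROTO_UDP:
		return SKETCH_DMAC_UDP;	/* assumed; not shown in the hunk */
	default:
		return 0;		/* nothing the hardware can help with */
	}
}
```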
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 8/15] spidernet: beautify error messages
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (5 preceding siblings ...)
2007-06-11 18:35 ` [PATCH 7/15] spidernet: checksum and ethtool Linas Vepstas
@ 2007-06-11 18:41 ` Linas Vepstas
2007-06-13 20:15 ` Jeff Garzik
2007-06-11 18:48 ` [PATCH 9/15] spidernet: enhance the dump routine Linas Vepstas
` (7 subsequent siblings)
14 siblings, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:41 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Use dev_err() to print device error messages.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 64 ++++++++++++++++++++++++-----------------------
1 file changed, 34 insertions(+), 30 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 13:09:46.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 13:11:29.000000000 -0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
- pr_err("Not enough memory to allocate rx buffer\n");
+ dev_err(&card->netdev->dev,
+ "Not enough memory to allocate rx buffer\n");
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,7 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
- pr_err("Could not iommu-map rx buffer\n");
+ dev_err(&card->netdev->dev, "Could not iommu-map rx buffer\n");
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -692,7 +693,7 @@ spider_net_prepare_tx_descr(struct spide
buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (pci_dma_mapping_error(buf)) {
if (netif_msg_tx_err(card) && net_ratelimit())
- pr_err("could not iommu-map packet (%p, %i). "
+ dev_err(&card->netdev->dev, "could not iommu-map packet (%p, %i). "
"Dropping packet\n", skb->data, skb->len);
card->spider_stats.tx_iommu_map_error++;
return -ENOMEM;
@@ -832,9 +833,8 @@ spider_net_release_tx_chain(struct spide
case SPIDER_NET_DESCR_PROTECTION_ERROR:
case SPIDER_NET_DESCR_FORCE_END:
if (netif_msg_tx_err(card))
- pr_err("%s: forcing end of tx descriptor "
- "with status x%02x\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev, "forcing end of tx descriptor "
+ "with status x%02x\n", status);
card->netdev_stats.tx_errors++;
break;
@@ -1147,8 +1147,8 @@ spider_net_decode_one_descr(struct spide
(status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
(status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
- pr_err("%s: dropping RX descriptor with state %d\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev,
+ "dropping RX descriptor with state %d\n", status);
card->netdev_stats.rx_dropped++;
goto bad_desc;
}
@@ -1156,8 +1156,8 @@ spider_net_decode_one_descr(struct spide
if ( (status != SPIDER_NET_DESCR_COMPLETE) &&
(status != SPIDER_NET_DESCR_FRAME_END) ) {
if (netif_msg_rx_err(card))
- pr_err("%s: RX descriptor with unknown state %d\n",
- card->netdev->name, status);
+ dev_err(&card->netdev->dev,
+ "RX descriptor with unknown state %d\n", status);
card->spider_stats.rx_desc_unk_state++;
goto bad_desc;
}
@@ -1165,16 +1165,15 @@ spider_net_decode_one_descr(struct spide
/* The cases we'll throw away the packet immediately */
if (hwdescr->data_error & SPIDER_NET_DESTROY_RX_FLAGS) {
if (netif_msg_rx_err(card))
- pr_err("%s: error in received descriptor found, "
+ dev_err(&card->netdev->dev,
+ "error in received descriptor found, "
"data_status=x%08x, data_error=x%08x\n",
- card->netdev->name,
hwdescr->data_status, hwdescr->data_error);
goto bad_desc;
}
if (hwdescr->dmac_cmd_status & 0xfcf4) {
- pr_err("%s: bad status, cmd_status=x%08x\n",
- card->netdev->name,
+ dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
@@ -1452,7 +1451,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GPWFFINT:
/* PHY command queue full */
if (netif_msg_intr(card))
- pr_err("PHY write queue full\n");
+ dev_err(&card->netdev->dev, "PHY write queue full\n");
show_error = 0;
break;
@@ -1619,9 +1618,8 @@ spider_net_handle_error_irq(struct spide
}
if ((show_error) && (netif_msg_intr(card)) && net_ratelimit())
- pr_err("Got error interrupt on %s, GHIINT0STS = 0x%08x, "
+ dev_err(&card->netdev->dev, "Error interrupt, GHIINT0STS = 0x%08x, "
"GHIINT1STS = 0x%08x, GHIINT2STS = 0x%08x\n",
- card->netdev->name,
status_reg, error_reg1, error_reg2);
/* clear interrupt sources */
@@ -1886,7 +1884,8 @@ spider_net_init_firmware(struct spider_n
SPIDER_NET_FIRMWARE_NAME, &card->pdev->dev) == 0) {
if ( (firmware->size != SPIDER_NET_FIRMWARE_LEN) &&
netif_msg_probe(card) ) {
- pr_err("Incorrect size of spidernet firmware in " \
+ dev_err(&card->netdev->dev,
+ "Incorrect size of spidernet firmware in " \
"filesystem. Looking in host firmware...\n");
goto try_host_fw;
}
@@ -1910,8 +1909,8 @@ try_host_fw:
if ( (fw_size != SPIDER_NET_FIRMWARE_LEN) &&
netif_msg_probe(card) ) {
- pr_err("Incorrect size of spidernet firmware in " \
- "host firmware\n");
+ dev_err(&card->netdev->dev,
+ "Incorrect size of spidernet firmware in host firmware\n");
goto done;
}
@@ -1921,7 +1920,8 @@ done:
return err;
out_err:
if (netif_msg_probe(card))
- pr_err("Couldn't find spidernet firmware in filesystem " \
+ dev_err(&card->netdev->dev,
+ "Couldn't find spidernet firmware in filesystem " \
"or host firmware\n");
return err;
}
@@ -2319,13 +2319,14 @@ spider_net_setup_netdev(struct spider_ne
result = spider_net_set_mac(netdev, &addr);
if ((result) && (netif_msg_probe(card)))
- pr_err("Failed to set MAC address: %i\n", result);
+ dev_err(&card->netdev->dev,
+ "Failed to set MAC address: %i\n", result);
result = register_netdev(netdev);
if (result) {
if (netif_msg_probe(card))
- pr_err("Couldn't register net_device: %i\n",
- result);
+ dev_err(&card->netdev->dev,
+ "Couldn't register net_device: %i\n", result);
return result;
}
@@ -2403,17 +2404,19 @@ spider_net_setup_pci_dev(struct pci_dev
unsigned long mmio_start, mmio_len;
if (pci_enable_device(pdev)) {
- pr_err("Couldn't enable PCI device\n");
+ dev_err(&pdev->dev, "Couldn't enable PCI device\n");
return NULL;
}
if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
- pr_err("Couldn't find proper PCI device base address.\n");
+ dev_err(&pdev->dev,
+ "Couldn't find proper PCI device base address.\n");
goto out_disable_dev;
}
if (pci_request_regions(pdev, spider_net_driver_name)) {
- pr_err("Couldn't obtain PCI resources, aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't obtain PCI resources, aborting.\n");
goto out_disable_dev;
}
@@ -2421,8 +2424,8 @@ spider_net_setup_pci_dev(struct pci_dev
card = spider_net_alloc_card();
if (!card) {
- pr_err("Couldn't allocate net_device structure, "
- "aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't allocate net_device structure, aborting.\n");
goto out_release_regions;
}
card->pdev = pdev;
@@ -2436,7 +2439,8 @@ spider_net_setup_pci_dev(struct pci_dev
card->regs = ioremap(mmio_start, mmio_len);
if (!card->regs) {
- pr_err("Couldn't obtain PCI resources, aborting.\n");
+ dev_err(&pdev->dev,
+ "Couldn't obtain PCI resources, aborting.\n");
goto out_release_regions;
}
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 9/15] spidernet: enhance the dump routine
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (6 preceding siblings ...)
2007-06-11 18:41 ` [PATCH 8/15] spidernet: beautify error messages Linas Vepstas
@ 2007-06-11 18:48 ` Linas Vepstas
2007-06-11 18:52 ` [PATCH 10/15] spidernet: invalidate unused pointer Linas Vepstas
` (6 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:48 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 78 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 70 insertions(+), 8 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 11:50:03.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:51:19.000000000 -0500
@@ -1022,34 +1022,94 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
}
-#ifdef DEBUG
static void show_rx_chain(struct spider_net_card *card)
{
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+ struct spider_net_hw_descr *hwd = start->hwdescr;
+ struct device *dev = &card->netdev->dev;
+ u32 curr_desc, next_desc;
int status;
+ int tot = 0;
int cnt = 0;
- int cstat = spider_net_get_descr_status(descr);
- printk(KERN_INFO "RX chain tail at descr=%ld\n",
- (start - card->descr) - card->tx_chain.num_desc);
+ int off = start - chain->ring;
+ int cstat = hwd->dmac_cmd_status;
+
+ dev_info(dev, "Total number of descrs=%d\n",
+ chain->num_desc);
+ dev_info(dev, "Chain tail located at descr=%d, status=0x%x\n",
+ off, cstat);
+
+ curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+ next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
- status = spider_net_get_descr_status(descr);
+ hwd = descr->hwdescr;
+ off = descr - chain->ring;
+ status = hwd->dmac_cmd_status;
+
+ if (descr == chain->head)
+ dev_info(dev, "Chain head is at %d, head status=0x%x\n",
+ off, status);
+
+ if (curr_desc == descr->bus_addr)
+ dev_info(dev, "HW curr desc (GDACTDPA) is at %d, status=0x%x\n",
+ off, status);
+
+ if (next_desc == descr->bus_addr)
+ dev_info(dev, "HW next desc (GDACNEXTDA) is at %d, status=0x%x\n",
+ off, status);
+
+ if (hwd->next_descr_addr == 0)
+ dev_info(dev, "chain is cut at %d\n", off);
+
if (cstat != status) {
- printk(KERN_INFO "Have %d descrs with stat=x%08x\n", cnt, cstat);
+ int from = (chain->num_desc + off - cnt) % chain->num_desc;
+ int to = (chain->num_desc + off - 1) % chain->num_desc;
+ dev_info(dev, "Have %d (from %d to %d) descrs "
+ "with stat=0x%08x\n", cnt, from, to, cstat);
cstat = status;
cnt = 0;
}
+
cnt ++;
+ tot ++;
+ descr = descr->next;
+ } while (descr != start);
+
+ dev_info(dev, "Last %d descrs with stat=0x%08x "
+ "for a total of %d descrs\n", cnt, cstat, tot);
+
+#ifdef DEBUG
+ /* Now dump the whole ring */
+ descr = start;
+ do
+ {
+ struct spider_net_hw_descr *hwd = descr->hwdescr;
+ status = spider_net_get_descr_status(hwd);
+ cnt = descr - chain->ring;
+ dev_info(dev, "Descr %d stat=0x%08x skb=%p\n",
+ cnt, status, descr->skb);
+ dev_info(dev, "bus addr=%08x buf addr=%08x sz=%d\n",
+ descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+ dev_info(dev, "next=%08x result sz=%d valid sz=%d\n",
+ hwd->next_descr_addr, hwd->result_size,
+ hwd->valid_size);
+ dev_info(dev, "dmac=%08x data stat=%08x data err=%08x\n",
+ hwd->dmac_cmd_status, hwd->data_status,
+ hwd->data_error);
+ dev_info(dev, "\n");
+
descr = descr->next;
} while (descr != start);
- printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
#endif
+}
+
/**
* spider_net_resync_head_ptr - Advance head ptr past empty descrs
*
@@ -1197,6 +1257,8 @@ spider_net_decode_one_descr(struct spide
return 1;
bad_desc:
+ if (netif_msg_rx_err(card))
+ show_rx_chain(card);
dev_kfree_skb_irq(descr->skb);
descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
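
The run-length summary that show_rx_chain() prints ("Have %d (from %d to
%d) descrs with stat=...") boils down to one lap around a ring with some
modular index arithmetic. Below is a user-space sketch of that idea over
a plain array of status words. It is not the driver code: struct
sketch_run and sketch_summarize_ring() are invented names, and the real
routine walks a linked descriptor ring and prints instead of storing.

```c
#include <stddef.h>

struct sketch_run {
	int from;		/* first ring index of the run */
	int to;			/* last ring index of the run */
	int count;		/* number of descriptors in the run */
	unsigned status;	/* status shared by the whole run */
};

/* Walk a ring of n status words once, starting at index start, and record
 * each run of consecutive equal statuses. Returns the number of runs
 * stored, which is at most max_runs. */
int sketch_summarize_ring(const unsigned *status, int n, int start,
			  struct sketch_run *runs, int max_runs)
{
	int nruns = 0, cnt = 0, i;
	unsigned cstat;

	if (n <= 0)
		return 0;

	cstat = status[start];
	for (i = 0; i <= n; i++) {
		int off = (start + i) % n;

		/* i == n is one step past the full lap; it only flushes
		 * the final run and never starts a new one. */
		if (i == n || status[off] != cstat) {
			if (nruns < max_runs) {
				/* Same index math as the patch: add n
				 * before subtracting so '%' never sees a
				 * negative value. */
				runs[nruns].from = (n + off - cnt) % n;
				runs[nruns].to = (n + off - 1) % n;
				runs[nruns].count = cnt;
				runs[nruns].status = cstat;
				nruns++;
			}
			if (i == n)
				break;
			cstat = status[off];
			cnt = 0;
		}
		cnt++;
	}
	return nruns;
}
```

For a six-entry ring with statuses {1, 1, 2, 2, 2, 1} walked from index
1, this yields three runs: index 1 alone, indexes 2 to 4, and a last run
that wraps from index 5 to index 0.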
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 10/15] spidernet: invalidate unused pointer.
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (7 preceding siblings ...)
2007-06-11 18:48 ` [PATCH 9/15] spidernet: enhance the dump routine Linas Vepstas
@ 2007-06-11 18:52 ` Linas Vepstas
2007-06-11 18:59 ` [PATCH 11/15] spidernet: service TX later Linas Vepstas
` (5 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:52 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Invalidate a pointer as it's pci_unmap'ed; this is a bit of
paranoia to make sure hardware doesn't continue trying to
DMA to it.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 11:51:19.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:53:21.000000000 -0500
@@ -1187,6 +1187,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+ u32 hw_buf_addr;
int status;
status = spider_net_get_descr_status(hwdescr);
@@ -1200,7 +1201,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
/* unmap descriptor */
- pci_unmap_single(card->pdev, hwdescr->buf_addr,
+ hw_buf_addr = hwdescr->buf_addr;
+ hwdescr->buf_addr = 0xffffffff;
+ pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1237,7 +1240,7 @@ spider_net_decode_one_descr(struct spide
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
card->netdev->name,
hwdescr->dmac_cmd_status);
- pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+ pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);
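
The save-then-poison pattern used in this patch, reduced to a
self-contained sketch. The struct and helper names are invented; only the
0xffffffff poison value and the order of operations (copy the address,
overwrite the descriptor field, then unmap and report using the copy)
come from the hunk above.

```c
#include <stdint.h>

#define SKETCH_INVALID_BUS_ADDR	0xffffffffu

struct sketch_hw_descr {
	uint32_t buf_addr;
};

/* Return the bus address the descriptor held and poison the field, so
 * that later code (unmap, error dump) works from the saved copy and the
 * descriptor no longer points at a buffer that is about to go away. */
uint32_t sketch_take_buf_addr(struct sketch_hw_descr *hwd)
{
	uint32_t addr = hwd->buf_addr;

	hwd->buf_addr = SKETCH_INVALID_BUS_ADDR;
	return addr;
}
```

This is also why the error dump in the last hunk switches to printing
hw_buf_addr: after the poisoning, hwdescr->buf_addr would only ever show
0xffffffff.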
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 11/15] spidernet: service TX later.
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (8 preceding siblings ...)
2007-06-11 18:52 ` [PATCH 10/15] spidernet: invalidate unused pointer Linas Vepstas
@ 2007-06-11 18:59 ` Linas Vepstas
2007-06-11 19:02 ` [PATCH 12/15] spidernet: increase the NAPI weight Linas Vepstas
` (4 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 18:59 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 11:53:21.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:53:24.000000000 -0500
@@ -1287,7 +1287,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
- spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
while (packets_to_do) {
@@ -1312,6 +1311,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
+ spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 12/15] spidernet: increase the NAPI weight
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (9 preceding siblings ...)
2007-06-11 18:59 ` [PATCH 11/15] spidernet: service TX later Linas Vepstas
@ 2007-06-11 19:02 ` Linas Vepstas
2007-06-13 20:14 ` Jeff Garzik
2007-06-11 19:05 ` [PATCH 13/15] spidernet: move a block of code around Linas Vepstas
` (3 subsequent siblings)
14 siblings, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 19:02 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Another way of minimizing the likelihood of the RX ram overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-11 11:50:03.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-11 11:53:26.000000000 -0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_RX_CSUM_DEFAULT 1
-#define SPIDER_NET_WATCHDOG_TIMEOUT 50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT 3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
#define SPIDER_NET_FIRMWARE_SEQS 6
#define SPIDER_NET_FIRMWARE_SEQWORDS 1024
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 12/15] spidernet: increase the NAPI weight
2007-06-11 19:02 ` [PATCH 12/15] spidernet: increase the NAPI weight Linas Vepstas
@ 2007-06-13 20:14 ` Jeff Garzik
2007-06-13 20:49 ` [Cbe-oss-dev] " Arnd Bergmann
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 20:14 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> Another way of minimizing the likelihood of the RX ram overflowing
> is to empty out the entire rx ring every chance we get. Change
> the crazy watchdog timeout from 50 seconds to 3 seconds, while
> we're here.
>
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
>
> ----
> drivers/net/spider_net.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.22-rc1/drivers/net/spider_net.h
> ===================================================================
> --- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-11 11:50:03.000000000 -0500
> +++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-11 11:53:26.000000000 -0500
> @@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
>
> #define SPIDER_NET_RX_CSUM_DEFAULT 1
>
> -#define SPIDER_NET_WATCHDOG_TIMEOUT 50*HZ
> -#define SPIDER_NET_NAPI_WEIGHT 64
> +#define SPIDER_NET_WATCHDOG_TIMEOUT 3*HZ
> +
> +/* We really really want to empty the ring buffer every time,
> + * so as to avoid the RX ram full bug. So set the napi weight
> + * to the ring size.
> + */
> +#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
I don't see why spider_net should have a different NAPI weight from
other drivers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 12/15] spidernet: increase the NAPI weight
2007-06-13 20:14 ` Jeff Garzik
@ 2007-06-13 20:49 ` Arnd Bergmann
2007-06-14 22:08 ` Linas Vepstas
0 siblings, 1 reply; 69+ messages in thread
From: Arnd Bergmann @ 2007-06-13 20:49 UTC (permalink / raw)
To: cbe-oss-dev; +Cc: netdev, Jeff Garzik, Linas Vepstas
On Wednesday 13 June 2007, Jeff Garzik wrote:
> > +/* We really really want to empty the ring buffer every time,
> > + * so as to avoid the RX ram full bug. So set the napi weight
> > + * to the ring size.
> > + */
> > +#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
>
> I don't see why spider_net should have a different NAPI weight from
> other drivers
>
Would it help to do it the other way round, as in
#define SPIDER_NET_RX_DESCRIPTORS_DEFAULT SPIDER_NET_NAPI_WEIGHT
and leave that at 64 instead of 256?
Arnd <><
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 12/15] spidernet: increase the NAPI weight
2007-06-13 20:49 ` [Cbe-oss-dev] " Arnd Bergmann
@ 2007-06-14 22:08 ` Linas Vepstas
0 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-14 22:08 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: cbe-oss-dev, netdev, Jeff Garzik
On Wed, Jun 13, 2007 at 10:49:51PM +0200, Arnd Bergmann wrote:
> On Wednesday 13 June 2007, Jeff Garzik wrote:
> > > +/* We really really want to empty the ring buffer every time,
> > > + * so as to avoid the RX ram full bug. So set the napi weight
> > > + * to the ring size.
> > > + */
> > > +#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
> >
> > I don't see why spider_net should have a different NAPI weight from
> > other drivers
It was a lame attempt to try to trick napi into draining the entire
RX queue in one go, with the goal of avoiding the dreaded rx ram full.
I'm not sure it made much of a difference, so we can let this slide.
> Would it help to do it the other way round, as in
No, that would shorten the RX queue, thus making it more likely
to overflow. At gigabit speeds, it's pretty easy to fill this thing
up multiple times per jiffy. The driver should continue to operate
either way, but the larger queue should keep it from being a busy
beaver.
--linas
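
A back-of-the-envelope check of the "multiple times per jiffy" remark,
written as a small C sketch. It assumes the worst case of minimum-size
Ethernet frames (64 bytes of frame plus 8 bytes of preamble and 12 bytes
of inter-frame gap on the wire), takes the ring size of 256 from Arnd's
message above, and treats HZ as a parameter because the kernel config
used is not stated in this thread. The function names are invented.

```c
#include <stdint.h>

/* Minimum Ethernet frame as seen on the wire: 64 bytes of frame,
 * 8 bytes of preamble/SFD, 12 bytes of inter-frame gap. */
#define SKETCH_WIRE_BYTES_MIN_FRAME	(64 + 8 + 12)

/* Upper bound on frames per second for a given link speed. */
uint64_t sketch_max_frames_per_sec(uint64_t link_bits_per_sec)
{
	return link_bits_per_sec / (SKETCH_WIRE_BYTES_MIN_FRAME * 8);
}

/* How many times a ring of ring_size descriptors could fill up in one
 * jiffy at that rate (integer division, so rounded down). */
uint64_t sketch_ring_fills_per_jiffy(uint64_t link_bits_per_sec,
				     unsigned hz, unsigned ring_size)
{
	return sketch_max_frames_per_sec(link_bits_per_sec) / hz / ring_size;
}
```

At 1 Gbit/s this gives about 1.49 million frames per second. With HZ=1000
that is roughly 1488 frames per jiffy, enough to fill a 256-entry ring
five times over; with HZ=250 the same ring could fill about 23 times per
jiffy. Real traffic with larger frames would be well below these bounds.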
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 13/15] spidernet: move a block of code around
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (10 preceding siblings ...)
2007-06-11 19:02 ` [PATCH 12/15] spidernet: increase the NAPI weight Linas Vepstas
@ 2007-06-11 19:05 ` Linas Vepstas
2007-06-11 19:09 ` [PATCH 14/15] spidernet: fix misnamed flag Linas Vepstas
` (2 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 19:05 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Put the enable and disable routines next to one another,
as this makes verifying their symmetry that much easier.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 11:53:24.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:53:27.000000000 -0500
@@ -501,6 +501,20 @@ spider_net_enable_rxdmac(struct spider_n
}
/**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turning off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+ spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+ SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
* spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
* @card: card structure
*
@@ -656,20 +670,6 @@ write_hash:
}
/**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
- spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
- SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
* spider_net_prepare_tx_descr - fill tx descriptor with skb data
* @card: card structure
* @descr: descriptor structure to fill out
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 14/15] spidernet: fix misnamed flag
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (11 preceding siblings ...)
2007-06-11 19:05 ` [PATCH 13/15] spidernet: move a block of code around Linas Vepstas
@ 2007-06-11 19:09 ` Linas Vepstas
2007-06-11 19:12 ` [PATCH 15/15] spidernet: driver docmentation Linas Vepstas
2007-06-13 20:10 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Jeff Garzik
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 19:09 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
The transmit frame tail bit is strangely misnamed as
"no checksum". Fix the name to what it should be:
"transmit frame tail". No functional change,
just a name change.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/spider_net.c | 2 +-
drivers/net/spider_net.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-11 11:53:27.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-06-11 11:53:29.000000000 -0500
@@ -716,7 +716,7 @@ spider_net_prepare_tx_descr(struct spide
hwdescr->data_status = 0;
hwdescr->dmac_cmd_status =
- SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+ SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_TXFRMTL;
spin_unlock_irqrestore(&chain->lock, flags);
if (skb->ip_summed == CHECKSUM_PARTIAL)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===================================================================
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h 2007-06-11 11:53:26.000000000 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h 2007-06-11 11:53:29.000000000 -0500
@@ -354,7 +354,7 @@ enum spider_net_int2_status {
#define SPIDER_NET_GPRDAT_MASK 0x0000ffff
#define SPIDER_NET_DMAC_NOINTR_COMPLETE 0x00800000
-#define SPIDER_NET_DMAC_NOCS 0x00040000
+#define SPIDER_NET_DMAC_TXFRMTL 0x00040000
#define SPIDER_NET_DMAC_TCP 0x00020000
#define SPIDER_NET_DMAC_UDP 0x00030000
#define SPIDER_NET_TXDCEST 0x08000000
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 15/15] spidernet: driver docmentation
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (12 preceding siblings ...)
2007-06-11 19:09 ` [PATCH 14/15] spidernet: fix misnamed flag Linas Vepstas
@ 2007-06-11 19:12 ` Linas Vepstas
2007-06-13 20:10 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Jeff Garzik
14 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-11 19:12 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Documentation for the spidernet driver.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Documentation/networking/spider_net.txt | 204 ++++++++++++++++++++++++++++++++
1 file changed, 204 insertions(+)
Index: linux-2.6.22-rc1/Documentation/networking/spider_net.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.22-rc1/Documentation/networking/spider_net.txt 2007-06-11 11:53:31.000000000 -0500
@@ -0,0 +1,204 @@
+
+ The Spidernet Device Driver
+ ===========================
+
+Written by Linas Vepstas <linas@austin.ibm.com>
+
+Version of 7 June 2007
+
+Abstract
+========
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use". An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty nor full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers: the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one. Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty". If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer. Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa0800000
+
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied.
+Meanwhile, hw is pointing at 21, which is free.
+
+The "Have nnn descrs" refers to the descr starting at the tail: in this
+case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
+to all of the rest of the descrs, from the last status change. The "nnn"
+is a count of how many descrs have exactly the same status.
+
+The status x4... corresponds to "full" and status xa... corresponds
+to "empty". The actual value printed is RXCOMST_A.
+
+In the device driver source code, a different set of names is
+used for these same concepts, so that
+
+"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
+"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
+
+
+The RX RAM full bug/feature
+===========================
+
+As long as the OS can empty out the RX buffers at a rate faster than
+the hardware can fill them, there is no problem. If, for some reason,
+the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
+pointer will catch up to the head, notice the not-empty condition,
+and stop. However, RX packets may still continue arriving on the wire.
+The spidernet chip can save some limited number of these in local RAM.
+When this local ram fills up, the spider chip will issue an interrupt
+indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
+will be set in GHIINT1STS). When the RX ram full condition occurs,
+a certain bug/feature is triggered that has to be specially handled.
+This section describes the special handling for this condition.
+
+When the OS finally has a chance to run, it will empty out the RX ring.
+In particular, it will clear the descriptor on which the hardware had
+stopped. However, once the hardware has decided that a certain
+descriptor is invalid, it will not restart at that descriptor; instead
+it will restart at the next descr. This potentially will lead to a
+deadlock condition, as the tail pointer will be pointing at this descr,
+which, from the OS point of view, is empty; the OS will be waiting for
+this descr to be filled. However, the hardware has skipped this descr,
+and is filling the next descrs. Since the OS doesn't see this, there
+is a potential deadlock, with the OS waiting for one descr to fill,
+while the hardware is waiting for a different set of descrs to become
+empty.
+
+A call to show_rx_chain() at this point indicates the nature of the
+problem. A typical print when the network is hung shows the following:
+
+net eth1: Spider RX RAM full, incoming packets might be discarded!
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=255
+net eth1: Chain head is at 255
+net eth1: HW curr desc (GDACTDPA) is at 0
+net eth1: Have 1 descrs with stat=xa0800000
+net eth1: HW next desc (GDACNEXTDA) is at 1
+net eth1: Have 127 descrs with stat=x40800101
+net eth1: Have 1 descrs with stat=x40800001
+net eth1: Have 126 descrs with stat=x40800101
+net eth1: Last 1 descrs with stat=xa0800000
+
+Both the tail and head pointers are pointing at descr 255, which is
+marked xa... which is "empty". Thus, from the OS point of view, there
+is nothing to be done. In particular, there is the implicit assumption
+that everything in front of the "empty" descr must surely also be empty,
+as explained in the last section. The OS is waiting for descr 255 to
+become non-empty, which, in this case, will never happen.
+
+The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
+Since it's already full, the hardware can do nothing more, and thus has
+halted processing. Notice that descrs 0 through 253 are all marked
+"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
+descr 254, since tail was at 255.) Thus, the system is deadlocked,
+and there can be no forward progress; the OS thinks there's nothing
+to do, and the hardware has nowhere to put incoming data.
+
+This bug/feature is worked around with the spider_net_resync_head_ptr()
+routine. When the driver receives RX interrupts, but an examination
+of the RX chain seems to show it is empty, then it is probable that
+the hardware has skipped a descr or two (sometimes dozens under heavy
+network conditions). The spider_net_resync_head_ptr() subroutine will
+search the ring for the next full descr, and the driver will resume
+operations there. Since this will leave "holes" in the ring, there
+is also a spider_net_resync_tail_ptr() that will skip over such holes.
+
+As of this writing, the spider_net_resync() strategy seems to work very
+well, even under heavy network loads.
+
+
+The TX ring
+===========
+The TX ring uses a low-watermark interrupt scheme to make sure that
+the TX queue is appropriately serviced for large packet sizes.
+
+For packet sizes greater than about 1KBytes, the kernel can fill
+the TX ring quicker than the device can drain it. Once the ring
+is full, the netdev is stopped. When there is room in the ring,
+the netdev needs to be reawakened, so that more TX packets are placed
+in the ring. The hardware can empty the ring about four times per jiffy,
+so it's not appropriate to wait for the poll routine to refill, since
+the poll routine runs only once per jiffy. The low-watermark mechanism
+marks a descr about 1/4th of the way from the bottom of the queue, so
+that an interrupt is generated when the descr is processed. This
+interrupt wakes up the netdev, which can then refill the queue.
+For large packets, this mechanism generates a relatively small number
+of interrupts, about 1K/sec. For smaller packets, this will drop to zero
+interrupts, as the hardware can empty the queue faster than the kernel
+can fill it.
+
+
+ ======= END OF DOCUMENT ========
+
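As a rough illustration of the descriptor states and the resync search
described in the document above, here is a minimal, self-contained C
sketch. It is not the driver's code: rx_state_of() and find_next_full()
are hypothetical helpers, and the ring is modelled as a plain array of
status words. The only facts taken from the text are the top-nibble
values (0xa "empty", 0x4 "full", 0xf "not in use") and the idea of
searching the ring for the next full descr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor states, keyed on the top nibble of the status word as
 * listed in the document: 0xa = empty, 0x4 = full, 0xf = not in use. */
enum rx_state { RX_EMPTY, RX_FULL, RX_NOT_IN_USE, RX_OTHER };

/* Hypothetical helper: classify a raw status word such as 0x40800101. */
static enum rx_state rx_state_of(uint32_t status)
{
	switch (status >> 28) {
	case 0xa: return RX_EMPTY;
	case 0x4: return RX_FULL;
	case 0xf: return RX_NOT_IN_USE;
	default:  return RX_OTHER;
	}
}

/* Hypothetical helper: starting at the tail, walk the ring at most once
 * and return the index of the first "full" descr, or -1 if there is none.
 * This models the search described for the resync routine; the real
 * driver walks a linked list of descriptors under a lock. */
static int find_next_full(const uint32_t *status, size_t n, size_t tail)
{
	assert(n > 0 && tail < n);

	for (size_t i = 0; i < n; i++) {
		size_t idx = (tail + i) % n;	/* wrap around the ring */

		if (rx_state_of(status[idx]) == RX_FULL)
			return (int)idx;
	}
	return -1;
}
```

With the hung ring from the example dump (tail at 255, descrs 254 and
255 empty, everything else full), find_next_full(ring, 256, 255) skips
the empty descr at 255 and returns 0, which is where the hardware
pointer has stopped.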
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/15] spidernet: null out skb pointer after its been used.
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
` (13 preceding siblings ...)
2007-06-11 19:12 ` [PATCH 15/15] spidernet: driver docmentation Linas Vepstas
@ 2007-06-13 20:10 ` Jeff Garzik
2007-06-14 22:00 ` Linas Vepstas
14 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 20:10 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> Avoid kernel crash in mm/slab.c due to double-free of pointer.
>
> If the ethernet interface is brought down while there is still
> RX traffic in flight, the device shutdown routine can end up
> trying to double-free an skb, leading to a crash in mm/slab.c
> Avoid the double-free by nulling out the skb pointer.
>
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
>
> ----
> drivers/net/spider_net.c | 1 +
> 1 file changed, 1 insertion(+)
applied 1-5, 7 to #upstream-fixes (2.6.22)
patch #6 was ignored, because it was already upstream
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 1/15] spidernet: null out skb pointer after its been used.
2007-06-13 20:10 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Jeff Garzik
@ 2007-06-14 22:00 ` Linas Vepstas
0 siblings, 0 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-14 22:00 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
On Wed, Jun 13, 2007 at 04:10:17PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >Avoid kernel crash in mm/slab.c due to double-free of pointer.
> >
> >If the ethernet interface is brought down while there is still
> >RX traffic in flight, the device shutdown routine can end up
> >trying to double-free an skb, leading to a crash in mm/slab.c
> >Avoid the double-free by nulling out the skb pointer.
> >
> >Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> >
> >----
> > drivers/net/spider_net.c | 1 +
> > 1 file changed, 1 insertion(+)
>
> applied 1-5, 7 to #upstream-fixes (2.6.22)
>
> patch #6 was ignored, because it was already upstream
Thank you!
--linas
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-11 18:14 ` [PATCH 0/15] " Linas Vepstas
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
@ 2007-06-12 2:01 ` Michael Ellerman
2007-06-12 23:00 ` Jeff Garzik
2 siblings, 0 replies; 69+ messages in thread
From: Michael Ellerman @ 2007-06-12 2:01 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Jeff Garzik, Jeff Garzik, netdev, cbe-oss-dev
On Mon, 2007-06-11 at 13:14 -0500, Linas Vepstas wrote:
> On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> > On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> > > On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> > > > On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > > > >
> > > > > The major bug fixes are:
> > > > I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> > > Yeah, I suppose, I admit I've lost track of the process.
> >
> > You need to order your bug fixes first in the queue.
>
> OK, here are the patches, re-ordered. There is a different number
> than last time, as I threw out one, merged one, and got cold feet
> on a third one. They still pass the tests.
>
> The first five patches focus on three serious bugs, fixing crashes or
> hangs.
Nice one.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-11 18:14 ` [PATCH 0/15] " Linas Vepstas
2007-06-11 18:17 ` [PATCH 1/15] spidernet: null out skb pointer after its been used Linas Vepstas
2007-06-12 2:01 ` [PATCH 0/15] spidernet driver bug fixes Michael Ellerman
@ 2007-06-12 23:00 ` Jeff Garzik
2007-06-12 23:32 ` Linas Vepstas
2007-06-13 1:33 ` Michael Ellerman
2 siblings, 2 replies; 69+ messages in thread
From: Jeff Garzik @ 2007-06-12 23:00 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, Jeff Garzik, netdev, cbe-oss-dev
Linas Vepstas wrote:
> On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
>> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
>>> On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
>>>> On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
>>>>> The major bug fixes are:
>>>> I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
>>> Yeah, I suppose, I admit I've lost track of the process.
>> You need to order your bug fixes first in the queue.
>
> OK, here are the patches, re-ordered. There is a different number
> than last time, as I threw out one, merged one, and got cold feet
> on a third one. They still pass the tests.
>
> The first five patches focus on three serious bugs, fixing crashes or
> hangs.
>
> -- patch 1 -- kernel crash when ifdown while receiving packets.
> -- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
> (kernel stays up, ifdown/up clear the problem).
> -- patch 5 -- misconfigured TX interrupts results in 3x-4x per
> degradation for small packets.
>
> -- patch 6 -- rx stats may be mangled
> -- patch 7 -- hw checksum sometimes breaks ipv6 operation
>
> -- patches 8-15 -- misc tweaks, and documentation.
>
>
> I re-ran my stress tests with patches 1-7 applied; they pass.
This is a bit frustrating, because this includes many patches that you
ALREADY told me to queue for 2.6.23, which I did, in
netdev-2.6.git#upstream.
Should I just drop all spidernet patches and start over?
Jeff
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-12 23:00 ` Jeff Garzik
@ 2007-06-12 23:32 ` Linas Vepstas
2007-06-13 0:04 ` Jeff Garzik
2007-06-13 1:33 ` Michael Ellerman
1 sibling, 1 reply; 69+ messages in thread
From: Linas Vepstas @ 2007-06-12 23:32 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, Jeff Garzik, netdev, cbe-oss-dev
On Tue, Jun 12, 2007 at 07:00:17PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> >>On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> >>>On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> >>>>On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> >>>>>The major bug fixes are:
> >>>>I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> >>>Yeah, I suppose, I admit I've lost track of the process.
> >>You need to order your bug fixes first in the queue.
> >
> >OK, here are the patches, re-ordered. There is a different number
> >than last time, as I threw out one, merged one, and got cold feet
> >on a third one. They still pass the tests.
> >
> >The first five patches focus on three serious bugs, fixing crashes or
> >hangs.
> >
> >-- patch 1 -- kernel crash when ifdown while receiving packets.
> >-- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
> > (kernel stays up, ifdown/up clear the problem).
> >-- patch 5 -- misconfigured TX interrupts results in 3x-4x per
> > degradation for small packets.
> >
> >-- patch 6 -- rx stats may be mangled
> >-- patch 7 -- hw checksum sometimes breaks ipv6 operation
> >
> >-- patches 8-15 -- misc tweaks, and documentation.
> >
> >
> >I re-ran my stress tests with patches 1-7 applied; they pass.
>
> This is a bit frustrating, because this includes many patches that you
> ALREADY told me to queue for 2.6.23, which I did, in
> netdev-2.6.git#upstream.
Sigh. I redid the series so as to avoid this problem, per the
previous conversation.
> Should I just drop all spidernet patches and start over?
No. Apply the series I just sent you, dropping the one called
"patch 6/15", the one from Florin Malita, as it appears you'd
previously picked this up. The rest of the patches should apply
cleanly; I just checked. I just did a "git pull" of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
and checked. The result of patching is exactly as it should be.
Just in case it wasn't clear, I'd like to see patches 1-5 go
into 2.6.22 ... as these address the most critical complaints I'd
gotten recently.
--linas
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-12 23:32 ` Linas Vepstas
@ 2007-06-13 0:04 ` Jeff Garzik
2007-06-13 16:14 ` Linas Vepstas
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 0:04 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> On Tue, Jun 12, 2007 at 07:00:17PM -0400, Jeff Garzik wrote:
>> Linas Vepstas wrote:
>>> On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
>>>> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
>>>>> On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
>>>>>> On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
>>>>>>> The major bug fixes are:
>>>>>> I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
>>>>> Yeah, I suppose, I admit I've lost track of the process.
>>>> You need to order your bug fixes first in the queue.
>>> OK, here are the patches, re-ordered. There is a different number
>>> than last time, as I threw out one, merged one, and got cold feet
>>> on a third one. They still pass the tests.
>>>
>>> The first five patches focus on three serious bugs, fixing crashes or
>>> hangs.
>>>
>>> -- patch 1 -- kernel crash when ifdown while receiving packets.
>>> -- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
>>> (kernel stays up, ifdown/up clear the problem).
>>> -- patch 5 -- misconfigured TX interrupts results in 3x-4x per
>>> degradation for small packets.
>>>
>>> -- patch 6 -- rx stats may be mangled
>>> -- patch 7 -- hw checksum sometimes breaks ipv6 operation
>>>
>>> -- patches 8-15 -- misc tweaks, and documentation.
>>>
>>>
>>> I re-ran my stress tests with patches 1-7 applied; they pass.
>> This is a bit frustrating, because this includes many patches that you
>> ALREADY told me to queue for 2.6.23, which I did, in
>> netdev-2.6.git#upstream.
>
> Sigh. I redid the series so as to avoid this problem, per the
> previous conversation.
>
>> Should I just drop all spidernet patches and start over?
>
> No. Apply the series I just sent you, dropping the one called
> "patch 6/15", the one from Florin Malita, as it appears you'd
> previously picked this up. The rest of the patches should apply
> cleanly; I just checked. I just did a "git pull" of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
> and checked. The result of patching is exactly as it should be.
>
> Just in case it wasn't clear, I'd like to see patches 1-5 go
> into 2.6.22 ... as these address the most critical complaints I'd
> gotten recently.
>
> --linas
>
>
As I just stated, many of the patches in the "current" patch series have
already been applied to netdev-2.6.git#upstream:
Linas Vepstas (11):
s2io: add PCI error recovery support
s2io: add PCI error recovery support
spidernet: beautify error messages
spidernet: move a block of code around
spidernet: zero out a pointer.
spidernet: null out skb pointer after its been used.
spidernet: Don't terminate the RX ring
spidernet: enhance the dump routine
spidernet: reset the card when an rxramfull is seen
spidernet: service TX later.
spidernet: increase the NAPI weight
These are clearly duplicating some of the patches in your patchseries,
which means you are woefully out of sync with upstream.
Jeff
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-13 0:04 ` Jeff Garzik
@ 2007-06-13 16:14 ` Linas Vepstas
2007-06-13 18:51 ` Jeff Garzik
` (2 more replies)
0 siblings, 3 replies; 69+ messages in thread
From: Linas Vepstas @ 2007-06-13 16:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Michael Ellerman, netdev, cbe-oss-dev
On Tue, Jun 12, 2007 at 08:04:18PM -0400, Jeff Garzik wrote:
> >
> >>Should I just drop all spidernet patches and start over?
> >
> >No. Apply the series I just sent you, dropping the one called
> >"patch 6/15", the one from Florin Malita, as it appears you'd
> >previously picked this up. The rest of the patches should apply
> >cleanly; I just checked. I just did a "git pull" of
> >git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
> >and checked. The result of patching is exactly as it should be.
> >
>
> As I just stated, many of the patches in the "current" patch series have
> already been applied to netdev-2.6.git#upstream:
>
> Linas Vepstas (11):
> s2io: add PCI error recovery support
> s2io: add PCI error recovery support
> spidernet: beautify error messages
> spidernet: move a block of code around
> spidernet: zero out a pointer.
> spidernet: null out skb pointer after its been used.
> spidernet: Don't terminate the RX ring
> spidernet: enhance the dump routine
> spidernet: reset the card when an rxramfull is seen
> spidernet: service TX later.
> spidernet: increase the NAPI weight
>
> These are clearly duplicating some of the patches in your patchseries,
> which means you are woefully out of sync with upstream.
My apologies; I'm trying. Seems that I've tripped over a git "feature".
"git branch" shows that I'm on "upstream". So I performed a "git pull"
(without any additional arguments) assuming that it would sync to your
"upstream" branch. And so my email was based on this.
Some googling seems to show that "git pull" has a bug/feature of
ignoring the branch that one is working in, and pulling "master"
no matter what. I have no clue why; this seems broken to me.
So ... let me try again ...
git pull git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 upstream
...
Automatic merge failed; fix up by hand
So not only did "git pull" not fetch the correct branch, but it also
wrecked the repository. Glug. I have no clue how to recover from this.
I suggest dropping the above series of spidernet patches, and reapplying
the series of 15 I'd sent in.
--linas
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-13 16:14 ` Linas Vepstas
@ 2007-06-13 18:51 ` Jeff Garzik
2007-06-13 19:01 ` [Cbe-oss-dev] " Segher Boessenkool
2007-06-13 18:52 ` Jeff Garzik
2007-06-14 22:08 ` [Cbe-oss-dev] " David Woodhouse
2 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 18:51 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
Linas Vepstas wrote:
> "git branch" shows that I'm on "upstream". So I performed a "git pull"
> (without any additional arguments) assuming that it would sync to your
> "upstream" branch. And so my email was based on this.
>
> Some googling seems to show that "git pull" has a bug/feature of
> ignoring the branch that one is working in, and pulling "master"
> no matter what. I have no clue why; this seems broken to me.
>
> So ... let me try again ...
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 upstream
> ...
> Automatic merge failed; fix up by hand
Unfortunately git isn't the greatest for saying "just give me what is on
the remote", since each repository is an independent peer.
You need to:
* grab the latest torvalds/linux-2.6.git
* erase local netdev-2.6
* clone to create local netdev-2.6:
URL=git://git.kernel.org/.../jgarzik/netdev-2.6
git-clone --reference linux-2.6 $URL netdev-2.6
* that creates 'master' branch, which always equals vanilla upstream
* now create a local upstream branch:
git checkout -b upstream master
* and finally, pull remote upstream branch into local upstream branch:
git pull $URL upstream:upstream
Occasionally the remote 'upstream' will get "rebased", which means it
has been completely replaced by a new linear history. If you pull
'upstream' after a rebase, into a local 'upstream', git will attempt to
merge the same patches all over again, with disastrous results.
I wish there was a git option to "just make my shit look like the
remote, dammit!" The above is the "easiest" way I know how to do that.
Jeff
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-13 18:51 ` Jeff Garzik
@ 2007-06-13 19:01 ` Segher Boessenkool
2007-06-13 19:02 ` Jeff Garzik
2007-06-13 23:55 ` Michael Ellerman
0 siblings, 2 replies; 69+ messages in thread
From: Segher Boessenkool @ 2007-06-13 19:01 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linas Vepstas, netdev, cbe-oss-dev
> I wish there was a git option to "just make my shit look like the
> remote, dammit!" The above is the "easiest" way I know how to do that.
git-fetch -f remote:local ?
Segher
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-13 19:01 ` [Cbe-oss-dev] " Segher Boessenkool
@ 2007-06-13 19:02 ` Jeff Garzik
2007-06-13 20:52 ` Arnd Bergmann
2007-06-13 23:55 ` Michael Ellerman
1 sibling, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 19:02 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Linas Vepstas, netdev, cbe-oss-dev
Segher Boessenkool wrote:
>> I wish there was a git option to "just make my shit look like the
>> remote, dammit!" The above is the "easiest" way I know how to do that.
>
> git-fetch -f remote:local ?
If that works... great :) Much better than what I described.
Jeff
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-13 19:02 ` Jeff Garzik
@ 2007-06-13 20:52 ` Arnd Bergmann
0 siblings, 0 replies; 69+ messages in thread
From: Arnd Bergmann @ 2007-06-13 20:52 UTC (permalink / raw)
To: cbe-oss-dev; +Cc: Jeff Garzik, Segher Boessenkool, netdev, Linas Vepstas
On Wednesday 13 June 2007, Jeff Garzik wrote:
> Segher Boessenkool wrote:
> >> I wish there was a git option to "just make my shit look like the
> >> remote, dammit!" The above is the "easiest" way I know how to do that.
> >
> > git-fetch -f remote:local ?
>
> If that works... great :) Much better than what I described.
It works as long as you are not on branch 'local', but in that case
you can do 'git-fetch -f remote:local2' or something.
Arnd <><
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-13 19:01 ` [Cbe-oss-dev] " Segher Boessenkool
2007-06-13 19:02 ` Jeff Garzik
@ 2007-06-13 23:55 ` Michael Ellerman
1 sibling, 0 replies; 69+ messages in thread
From: Michael Ellerman @ 2007-06-13 23:55 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Jeff Garzik, netdev, Linas Vepstas, cbe-oss-dev
On Wed, 2007-06-13 at 21:01 +0200, Segher Boessenkool wrote:
> > I wish there was a git option to "just make my shit look like the
> > remote, dammit!" The above is the "easiest" way I know how to do that.
>
> git-fetch -f remote:local ?
There's always "git reset --hard <sha1>"
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
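
As a rough sketch of the "git reset --hard <sha1>" suggestion, the
following throwaway example moves the current branch back to a known
commit and discards what came after it. The temporary repository and
the commit messages are assumptions made for the demo, not anything
from the netdev tree.

```shell
#!/bin/sh
# Sketch: force the current branch (and working tree) to a given commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper for the demo only: create an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.org \
        commit -q --allow-empty -m "$1"
}

commit "state to return to"
target=$(git rev-parse HEAD)

commit "local work to discard"

# Point the branch, the index and the working tree at the target commit.
git reset -q --hard "$target"

current=$(git rev-parse HEAD)
echo "now at: $current"
```

Note that this throws away uncommitted changes as well, so it is only
appropriate when the local state really is disposable.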
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-13 16:14 ` Linas Vepstas
2007-06-13 18:51 ` Jeff Garzik
@ 2007-06-13 18:52 ` Jeff Garzik
2007-06-14 22:08 ` [Cbe-oss-dev] " David Woodhouse
2 siblings, 0 replies; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 18:52 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Michael Ellerman, netdev, cbe-oss-dev
As of this moment there are -no- spidernet patches in netdev. I just
rebased 'upstream', and dropped the existing spidernet patches.
Jeff
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-13 16:14 ` Linas Vepstas
2007-06-13 18:51 ` Jeff Garzik
2007-06-13 18:52 ` Jeff Garzik
@ 2007-06-14 22:08 ` David Woodhouse
2007-06-14 23:01 ` Jeff Garzik
2 siblings, 1 reply; 69+ messages in thread
From: David Woodhouse @ 2007-06-14 22:08 UTC (permalink / raw)
To: Linas Vepstas; +Cc: Jeff Garzik, netdev, cbe-oss-dev
On Wed, 2007-06-13 at 11:14 -0500, Linas Vepstas wrote:
> Some googling seems to show that "git pull" has a bug/feature of
> ignoring the branch that one is working in, and pulling "master"
> no matter what. I have no clue why; this seems broken to me.
Branches are generally a PITA -- it's probably best just to avoid them
entirely. It's not as if it's hard to create separate trees, and even
share objects between the trees.
--
dwmw2
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-14 22:08 ` [Cbe-oss-dev] " David Woodhouse
@ 2007-06-14 23:01 ` Jeff Garzik
2007-06-14 23:03 ` David Woodhouse
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-14 23:01 UTC (permalink / raw)
To: David Woodhouse; +Cc: Linas Vepstas, netdev, cbe-oss-dev
David Woodhouse wrote:
> On Wed, 2007-06-13 at 11:14 -0500, Linas Vepstas wrote:
>> Some googling seems to show that "git pull" has a bug/feature of
>> ignoring the branch that one is working in, and pulling "master"
>> no matter what. I have no clue why; this seems broken to me.
>
> Branches are generally a PITA -- it's probably best just to avoid them
> entirely. It's not as if it's hard to create separate trees, and even
> share objects between the trees.
It makes diffing between lines of development more difficult, takes up
more overall space, less cache friendly, ...
Jeff
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-14 23:01 ` Jeff Garzik
@ 2007-06-14 23:03 ` David Woodhouse
2007-06-14 23:04 ` Jeff Garzik
0 siblings, 1 reply; 69+ messages in thread
From: David Woodhouse @ 2007-06-14 23:03 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linas Vepstas, netdev, cbe-oss-dev
On Thu, 2007-06-14 at 19:01 -0400, Jeff Garzik wrote:
> It makes diffing between lines of development more difficult, takes up
> more overall space, less cache friendly, ...
All of which is much less true if you're sharing object directories or
even using alternates.
--
dwmw2
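
One way the separate-trees-with-shared-objects arrangement could look is
sketched below, using "git clone --shared", which records the first
tree's object directory in the second tree's alternates file. The tree
names (main-tree, second-tree) are hypothetical, and this is only one of
several ways to share objects.

```shell
#!/bin/sh
# Sketch: two separate trees, with the second borrowing objects from the first.
set -e
work=$(mktemp -d)
cd "$work"

# First tree, holding the actual objects.
git init -q main-tree
git -C main-tree -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"

# Second tree: --shared copies no objects, it only writes an alternates
# entry pointing at main-tree's object directory.
git clone -q --shared main-tree second-tree

alternates_file="$work/second-tree/.git/objects/info/alternates"
cat "$alternates_file"
```

The second tree depends on the first one's objects staying in place, so
pruning or deleting main-tree can break it.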
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-14 23:03 ` David Woodhouse
@ 2007-06-14 23:04 ` Jeff Garzik
2007-06-14 23:07 ` David Woodhouse
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-14 23:04 UTC (permalink / raw)
To: David Woodhouse; +Cc: Linas Vepstas, netdev, cbe-oss-dev
David Woodhouse wrote:
> On Thu, 2007-06-14 at 19:01 -0400, Jeff Garzik wrote:
>> It makes diffing between lines of development more difficult, takes up
>> more overall space, less cache friendly, ...
>
> All of which is much less true if you're sharing object directories or
> even using alternates.
Think about the actual kernel tree source code, not just the metadata...
Jeff
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-14 23:04 ` Jeff Garzik
@ 2007-06-14 23:07 ` David Woodhouse
2007-06-14 23:32 ` Michael Ellerman
0 siblings, 1 reply; 69+ messages in thread
From: David Woodhouse @ 2007-06-14 23:07 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linas Vepstas, netdev, cbe-oss-dev
On Thu, 2007-06-14 at 19:04 -0400, Jeff Garzik wrote:
> Think about the actual kernel tree source code, not just the
> metadata...
Disk is cheap. Waiting for the whole damn thing to rebuild after
switching branches and back again is less so.
Besides, checking it out is optional.
--
dwmw2
* Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
2007-06-14 23:07 ` David Woodhouse
@ 2007-06-14 23:32 ` Michael Ellerman
0 siblings, 0 replies; 69+ messages in thread
From: Michael Ellerman @ 2007-06-14 23:32 UTC (permalink / raw)
To: David Woodhouse; +Cc: Jeff Garzik, netdev, Linas Vepstas, cbe-oss-dev
On Fri, 2007-06-15 at 00:07 +0100, David Woodhouse wrote:
> On Thu, 2007-06-14 at 19:04 -0400, Jeff Garzik wrote:
> > Think about the actual kernel tree source code, not just the
> > metadata...
>
> Disk is cheap. Waiting for the whole damn thing to rebuild after
> switching branches and back again is less so.
ccache!
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-12 23:00 ` Jeff Garzik
2007-06-12 23:32 ` Linas Vepstas
@ 2007-06-13 1:33 ` Michael Ellerman
2007-06-13 1:54 ` Jeff Garzik
1 sibling, 1 reply; 69+ messages in thread
From: Michael Ellerman @ 2007-06-13 1:33 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linas Vepstas, Jeff Garzik, netdev, cbe-oss-dev
On Tue, 2007-06-12 at 19:00 -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> > On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> >> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> >>> On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> >>>> On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> >>>>> The major bug fixes are:
> >>>> I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> >>> Yeah, I suppose, I admit I've lost track of the process.
> >> You need to order your bug fixes first in the queue.
> >
> > OK, here are the patches, re-ordered. There is a different number
> > than last time, as I threw out one, merged one, and got cold feet
> > on a third one. They still pass the tests.
> >
> > The first five patches focus on three serious bugs, fixing crashes or
> > hangs.
> >
> > -- patch 1 -- kernel crash when ifdown while receiving packets.
> > -- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
> > (kernel stays up, ifdown/up clear the problem).
> > -- patch 5 -- misconfigured TX interrupts result in 3x-4x perf
> > degradation for small packets.
> >
> > -- patch 6 -- rx stats may be mangled
> > -- patch 7 -- hw checksum sometimes breaks ipv6 operation
> >
> > -- patches 8-15 -- misc tweaks, and documentation.
> >
> >
> > I re-ran my stress tests with patches 1-7 applied; they pass.
>
> This is a bit frustrating, because this includes many patches that you
> ALREADY told me to queue for 2.6.23, which I did, in
> netdev-2.6.git#upstream.
Linas posted the patches, I responded querying whether the bug fixes
should go into 2.6.22, and then you told him "you need to order your bug
fixes first in the queue". Which seemed pretty clear to me that you'd
wait for the reordered series.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-13 1:33 ` Michael Ellerman
@ 2007-06-13 1:54 ` Jeff Garzik
2007-06-13 13:53 ` Michael Ellerman
0 siblings, 1 reply; 69+ messages in thread
From: Jeff Garzik @ 2007-06-13 1:54 UTC (permalink / raw)
To: michael; +Cc: Linas Vepstas, netdev, cbe-oss-dev
Michael Ellerman wrote:
> Linas posted the patches, I responded querying whether the bug fixes
> should go into 2.6.22, and then you told him "you need to order your bug
> fixes first in the queue". Which seemed pretty clear to me that you'd
> wait for the reordered series.
This was presuming Linas actually knew what he himself had submitted
previously, and had been accepted...
I explicitly emailed Linas on May 24, 2007 detailing each patch that had
been applied, and to which netdev-2.6.git branch it had been applied
(and thus whether it was queued for 2.6.22 or 2.6.23). Relevant
Message-id is <4656033F.1060505@garzik.org>, and was sent not only to
Linas but also to netdev@vger.kernel.org, linuxppc-dev@ozlabs.org, and
cbe-oss-dev@ozlabs.org.
These changes were subsequently made public immediately via
git://git.kernel.org/.../jgarzik/netdev-2.6.git branches
'upstream-fixes' and 'upstream', and were followed a few days later by
akpm's public tree, starting with 2.6.22-rc3-mm1 (and all subsequent
releases).
All of the above seemed pretty clear, too.
To move forward, it sounds like the best thing to do is drop all
spidernet patches and start over, yes?
Jeff
* Re: [PATCH 0/15] spidernet driver bug fixes
2007-06-13 1:54 ` Jeff Garzik
@ 2007-06-13 13:53 ` Michael Ellerman
2007-06-13 18:45 ` Jeff Garzik
0 siblings, 1 reply; 69+ messages in thread
From: Michael Ellerman @ 2007-06-13 13:53 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linas Vepstas, netdev, cbe-oss-dev
On Tue, 2007-06-12 at 21:54 -0400, Jeff Garzik wrote:
> Michael Ellerman wrote:
> > Linas posted the patches, I responded querying whether the bug fixes
> > should go into 2.6.22, and then you told him "you need to order your bug
> > fixes first in the queue". Which seemed pretty clear to me that you'd
> > wait for the reordered series.
>
> This was presuming Linas actually knew what he himself had submitted
> previously, and had been accepted...
>
> I explicitly emailed Linas on May 24, 2007 detailing each patch that had
> been applied, and to which netdev-2.6.git branch it had been applied
> (and thus whether it was queued for 2.6.22 or 2.6.23). Relevant
> Message-id is <4656033F.1060505@garzik.org>, and was sent not only to
> Linas but also to netdev@vger.kernel.org, linuxppc-dev@ozlabs.org, and
> cbe-oss-dev@ozlabs.org.
>
> These changes were subsequently made public immediately via
> git://git.kernel.org/.../jgarzik/netdev-2.6.git branches
> 'upstream-fixes' and 'upstream', and were followed a few days later by
> akpm's public tree, starting with 2.6.22-rc3-mm1 (and all subsequent
> releases).
>
> All of the above seemed pretty clear, too.
Yeah fair enuf, that's fairly clear.
> To move forward, it sounds like the best thing to do is drop all
> spidernet patches and start over, yes?
Well see what Linas thinks, but that is probably easiest. I was just
keen to see the "major bugfixes" get into 22, rather than waiting
another few months for 23.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person