* [RFT 1/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
@ 2006-06-13 8:17 ` Stephen Hemminger
2006-06-13 8:17 ` [RFT 2/5] " Stephen Hemminger
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-06-13 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linus, netdev
[-- Attachment #1: sky2-set-power-state.patch --]
[-- Type: text/plain, Size: 1932 bytes --]
Subject: [RFT 1/5] sky2: set_power_state should be void
The set power state function is cleaner if it doesn't return anything.
The only caller that could fail is in suspend() and it can check the argument
there.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
--- test.orig/drivers/net/sky2.c
+++ test/drivers/net/sky2.c
@@ -187,12 +187,11 @@ static u16 gm_phy_read(struct sky2_hw *h
return v;
}
-static int sky2_set_power_state(struct sky2_hw *hw, pci_power_t state)
+static void sky2_set_power_state(struct sky2_hw *hw, pci_power_t state)
{
u16 power_control;
u32 reg1;
int vaux;
- int ret = 0;
pr_debug("sky2_set_power_state %d\n", state);
sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
@@ -275,12 +274,10 @@ static int sky2_set_power_state(struct s
break;
default:
printk(KERN_ERR PFX "Unknown power state %d\n", state);
- ret = -1;
}
sky2_pci_write16(hw, hw->pm_cap + PCI_PM_CTRL, power_control);
sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
- return ret;
}
static void sky2_phy_reset(struct sky2_hw *hw, unsigned port)
@@ -3428,6 +3425,10 @@ static int sky2_suspend(struct pci_dev *
{
struct sky2_hw *hw = pci_get_drvdata(pdev);
int i;
+ pci_power_t pstate = pci_choose_state(pdev, state);
+
+ if (!(pstate == PCI_D3hot || pstate == PCI_D3cold))
+ return -EINVAL;
for (i = 0; i < 2; i++) {
struct net_device *dev = hw->dev[i];
@@ -3442,7 +3443,8 @@ static int sky2_suspend(struct pci_dev *
}
pci_save_state(pdev);
- return sky2_set_power_state(hw, pci_choose_state(pdev, state));
+ sky2_set_power_state(hw, pstate);
+ return 0;
}
static int sky2_resume(struct pci_dev *pdev)
@@ -3452,9 +3454,7 @@ static int sky2_resume(struct pci_dev *p
pci_restore_state(pdev);
pci_enable_wake(pdev, PCI_D0, 0);
- err = sky2_set_power_state(hw, PCI_D0);
- if (err)
- goto out;
+ sky2_set_power_state(hw, PCI_D0);
err = sky2_reset(hw);
if (err)
--
^ permalink raw reply [flat|nested] 8+ messages in thread* [RFT 2/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
2006-06-13 8:17 ` [RFT 1/5] " Stephen Hemminger
@ 2006-06-13 8:17 ` Stephen Hemminger
2006-06-13 8:17 ` [RFT 3/5] " Stephen Hemminger
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-06-13 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linus, netdev
[-- Attachment #1: sky2-ports.patch --]
[-- Type: text/plain, Size: 761 bytes --]
Subject: [RFT 2/5] sky2: don't hard code number of ports
It is cleaner, to not loop over both ports if only one exists.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
--- test.orig/drivers/net/sky2.c
+++ test/drivers/net/sky2.c
@@ -3430,7 +3430,7 @@ static int sky2_suspend(struct pci_dev *
if (!(pstate == PCI_D3hot || pstate == PCI_D3cold))
return -EINVAL;
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (dev) {
@@ -3460,7 +3460,7 @@ static int sky2_resume(struct pci_dev *p
if (err)
goto out;
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (dev && netif_running(dev)) {
netif_device_attach(dev);
--
^ permalink raw reply [flat|nested] 8+ messages in thread* [RFT 3/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
2006-06-13 8:17 ` [RFT 1/5] " Stephen Hemminger
2006-06-13 8:17 ` [RFT 2/5] " Stephen Hemminger
@ 2006-06-13 8:17 ` Stephen Hemminger
2006-06-13 8:17 ` [RFT 4/5] " Stephen Hemminger
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-06-13 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linus, netdev
[-- Attachment #1: sky2-poll-complete.patch --]
[-- Type: text/plain, Size: 700 bytes --]
Subject: [RFT 3/5] sky2: fix hotplug detect during poll
Linus, doesn't understand NAPI.
If the poll routine detects no hardware available, it needs to dequeue
it self from the network poll list.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
--- test.orig/drivers/net/sky2.c
+++ test/drivers/net/sky2.c
@@ -2181,7 +2181,7 @@ static int sky2_poll(struct net_device *
u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
if (!~status)
- return 0;
+ goto out;
if (status & Y2_IS_HW_ERR)
sky2_hw_intr(hw);
@@ -2219,7 +2219,7 @@ static int sky2_poll(struct net_device *
if (sky2_more_work(hw))
return 1;
-
+out:
netif_rx_complete(dev0);
sky2_read32(hw, B0_Y2_SP_LISR);
--
^ permalink raw reply [flat|nested] 8+ messages in thread* [RFT 4/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
` (2 preceding siblings ...)
2006-06-13 8:17 ` [RFT 3/5] " Stephen Hemminger
@ 2006-06-13 8:17 ` Stephen Hemminger
2006-06-13 8:17 ` [RFT 5/5] " Stephen Hemminger
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-06-13 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linus, netdev
[-- Attachment #1: sky2-suspend-nohwirq.patch --]
[-- Type: text/plain, Size: 725 bytes --]
Subject: [RFT 4/5] sky2: save/restore base hardware irq during suspend/resume
The hardware should be fully shut off during suspend, and the base
irq mask restored during resume.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
--- test.orig/drivers/net/sky2.c
+++ test/drivers/net/sky2.c
@@ -3442,6 +3442,7 @@ static int sky2_suspend(struct pci_dev *
}
}
+ sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
sky2_set_power_state(hw, pstate);
return 0;
@@ -3460,6 +3461,8 @@ static int sky2_resume(struct pci_dev *p
if (err)
goto out;
+ sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (dev && netif_running(dev)) {
--
^ permalink raw reply [flat|nested] 8+ messages in thread* [RFT 5/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
` (3 preceding siblings ...)
2006-06-13 8:17 ` [RFT 4/5] " Stephen Hemminger
@ 2006-06-13 8:17 ` Stephen Hemminger
2006-06-13 13:22 ` [RFT 0/5] " Jeff Garzik
2006-06-13 16:38 ` Linus Torvalds
6 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-06-13 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linus, netdev
[-- Attachment #1: sky2-resume.patch --]
[-- Type: text/plain, Size: 1685 bytes --]
Subject: [RFT 5/5] sky2: stop/start hardware idle timer on suspend/resume
The resume bug was cause not by an early interrupt but because
the idle timeout was not being stopped on suspend. Also disable hardware
IRQ's on suspend. Will need to revisit this with hotplug?
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
--- test.orig/drivers/net/sky2.c
+++ test/drivers/net/sky2.c
@@ -2161,6 +2161,13 @@ static void sky2_descriptor_error(struct
/* If idle then force a fake soft NAPI poll once a second
* to work around cases where sharing an edge triggered interrupt.
*/
+static inline void sky2_idle_start(struct sky2_hw *hw)
+{
+ if (idle_timeout > 0)
+ mod_timer(&hw->idle_timer,
+ jiffies + msecs_to_jiffies(idle_timeout));
+}
+
static void sky2_idle(unsigned long arg)
{
struct sky2_hw *hw = (struct sky2_hw *) arg;
@@ -3350,9 +3357,7 @@ static int __devinit sky2_probe(struct p
sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) hw);
- if (idle_timeout > 0)
- mod_timer(&hw->idle_timer,
- jiffies + msecs_to_jiffies(idle_timeout));
+ sky2_idle_start(hw);
pci_set_drvdata(pdev, hw);
@@ -3430,6 +3435,8 @@ static int sky2_suspend(struct pci_dev *
if (!(pstate == PCI_D3hot || pstate == PCI_D3cold))
return -EINVAL;
+ del_timer_sync(&hw->idle_timer);
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
@@ -3472,10 +3479,12 @@ static int sky2_resume(struct pci_dev *p
printk(KERN_ERR PFX "%s: could not up: %d\n",
dev->name, err);
dev_close(dev);
- break;
+ goto out;
}
}
}
+
+ sky2_idle_start(hw);
out:
return err;
}
--
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [RFT 0/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
` (4 preceding siblings ...)
2006-06-13 8:17 ` [RFT 5/5] " Stephen Hemminger
@ 2006-06-13 13:22 ` Jeff Garzik
2006-06-13 16:38 ` Linus Torvalds
6 siblings, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2006-06-13 13:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Linus Torvalds, netdev
Stephen Hemminger wrote:
> There were a several problems buried in suspend/resume. The real
> failure was caused by the idle timer not being stopped/restarted.
> But several other races, and cleanups were needed.
>
> Since I don't have a machine that will suspend successfully with
> that hardware, I can't test it.
All five patches look OK to me. Did Linus get the patches? The emails
were sent to linus@osdl.org not torvalds@osdl.org.
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [RFT 0/5] sky2: suspend/resume patchlets
2006-06-13 8:17 [RFT 0/5] sky2: suspend/resume patchlets Stephen Hemminger
` (5 preceding siblings ...)
2006-06-13 13:22 ` [RFT 0/5] " Jeff Garzik
@ 2006-06-13 16:38 ` Linus Torvalds
6 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2006-06-13 16:38 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Jeff Garzik, netdev, David S. Miller
[ Davem - see the final conclusion: this might not be a driver bug as much
as a netconsole problem, where netconsole might perhaps continue sendign
on a device that really can't take it any more? ]
On Tue, 13 Jun 2006, Stephen Hemminger wrote:
>
> There were a several problems buried in suspend/resume. The real
> failure was caused by the idle timer not being stopped/restarted.
> But several other races, and cleanups were needed.
>
> Since I don't have a machine that will suspend successfully with
> that hardware, I can't test it.
With this, I get a page-fault in sky2_tx_complete+0x91 (with traceback to
sky2_poll, net_rx_action, do_softirq, do_IRQ, skb_release_data, kfree_skb,
sky2_rx_clean, sky2_down, sky2_suspend, pci_device_suspend, all the way
down to suspend_device()).
So an IRQ happened while the sky2 driver was doign sky2_rx_clean, which is
just _after_ it did "sky2_tx_clean()", and then the TX side was unhappy
for some reason.
Again, the driver has actually tried to disable its _own_ irq, but that
doesn't much help. Also, with write posting, even its own irq might have
gotten delayed (ie if you really want to synchronize irq's, you need to
read from the device, and then also wait a bit to see that the irq isn't
being posted int he _other_ direction), but in the presense of shared
irq's, it just doesn't do anything at all.
I can't seem to get a bigger VGA console on the Mac mini, so I'm unable to
see the exact register values.
Btw, this probably happens with my patch too, and is likely
timing-related.
Oh, and to make matters worse, I also enabled netconsole (in order to see
what goes wrong), which is probably what brought on the horrid timing
issue (ie packets going out _just_ at the right time saying "shutting
down sky2")
Btw, that "sky2_tx_complete+0x91" seems to be
loop:
inc %edx
mov 0x9c(%ecx),%eax
** movzwl 0x4(%eax),%eax **
cmp %eax,%edx
jb loop
which in turn is:
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
struct tx_ring_info *fre;
fre = sky2->tx_ring + RING_NEXT(put + i, TX_RING_SIZE);
pci_unmap_page(pdev, pci_unmap_addr(fre, mapaddr),
skb_shinfo(skb)->frags[i].size,
PCI_DMA_TODEVICE);
}
since pci_unmap_page() is a no-op here ;)
So it looks like it's the "skb_shinfo(skb)->nr_frags" access that oopses.
Which probably means that
skb = re->skb;
just got garbage (rememebr: the pci_unmap_single() directly after it is
_also_ a no-op, so it wouldn't oops there).
I dunno the details. I'd have _expected_ tx_cons to be equal to tx_prod
here (since we just did a sky2_tx_clean() before), and the loop to not
have been entered at all, but I wonder if maybe it's the netconsole that
doesn't honor "netif_stop_queue()"?
Dunno.
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread