netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] b44: fix resume, request_irq after hw reset
@ 2010-10-11 23:22 James Hogan
  2010-10-11 23:34 ` Andrew Morton
  2010-10-13 16:46 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: James Hogan @ 2010-10-11 23:22 UTC (permalink / raw)
  To: Gary Zambrano, Jiri Pirko, FUJITA Tomonori, Hauke Mehrtens,
	Larry Finger <Larry.Finger@
  Cc: David S. Miller

This driver was hanging on resume because it was requesting a shared irq
that it wasn't ready to immediately handle, which was tested in
request_irq because of the CONFIG_DEBUG_SHIRQ config option. The
interrupt handler tried to read the interrupt status register but for
some reason it hung the system.

The request_irq is now moved a bit later after resetting the hardware
which seems to fix it.

Signed-off-by: James Hogan <james@albanarts.com>
---
 drivers/net/b44.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 1e620e2..dbba981 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -2296,12 +2296,6 @@ static int b44_resume(struct ssb_device *sdev)
 	if (!netif_running(dev))
 		return 0;
 
-	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
-	if (rc) {
-		netdev_err(dev, "request_irq failed\n");
-		return rc;
-	}
-
 	spin_lock_irq(&bp->lock);
 
 	b44_init_rings(bp);
@@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
 	netif_device_attach(bp->dev);
 	spin_unlock_irq(&bp->lock);
 
+	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
+	if (rc) {
+		netdev_err(dev, "request_irq failed\n");
+		return rc;
+	}
+
 	b44_enable_ints(bp);
 	netif_wake_queue(dev);
 
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-11 23:22 [PATCH] b44: fix resume, request_irq after hw reset James Hogan
@ 2010-10-11 23:34 ` Andrew Morton
  2010-10-12  7:08   ` James Hogan
  2010-10-13 16:46 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-10-11 23:34 UTC (permalink / raw)
  To: James Hogan
  Cc: Gary Zambrano, Jiri Pirko, FUJITA Tomonori, Hauke Mehrtens,
	Larry Finger, netdev, linux-kernel, David S. Miller

On Tue, 12 Oct 2010 00:22:12 +0100
James Hogan <james@albanarts.com> wrote:

> This driver was hanging on resume because it was requesting a shared irq
> that it wasn't ready to immediately handle, which was tested in
> request_irq because of the CONFIG_DEBUG_SHIRQ config option. The
> interrupt handler tried to read the interrupt status register but for
> some reason it hung the system.
> 
> The request_irq is now moved a bit later after resetting the hardware
> which seems to fix it.
> 
> Signed-off-by: James Hogan <james@albanarts.com>
> ---
>  drivers/net/b44.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 1e620e2..dbba981 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -2296,12 +2296,6 @@ static int b44_resume(struct ssb_device *sdev)
>  	if (!netif_running(dev))
>  		return 0;
>  
> -	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
> -	if (rc) {
> -		netdev_err(dev, "request_irq failed\n");
> -		return rc;
> -	}
> -
>  	spin_lock_irq(&bp->lock);
>  
>  	b44_init_rings(bp);
> @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
>  	netif_device_attach(bp->dev);
>  	spin_unlock_irq(&bp->lock);
>  
> +	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
> +	if (rc) {
> +		netdev_err(dev, "request_irq failed\n");
> +		return rc;
> +	}
> +
>  	b44_enable_ints(bp);
>  	netif_wake_queue(dev);

OK, running the interrupt handler before b44_init_hw() is presumably
the problem here.

The hardware surely won't be generating interrupts until we've run
b44_init_hw() and b44_enable_ints(), so this patch really is only to
keep CONFIG_DEBUG_SHIRQ happy.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-11 23:34 ` Andrew Morton
@ 2010-10-12  7:08   ` James Hogan
  2010-10-12  7:27     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: James Hogan @ 2010-10-12  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gary Zambrano, Jiri Pirko, FUJITA Tomonori, Hauke Mehrtens,
	Larry Finger, netdev, linux-kernel, David S. Miller

On Tuesday 12 October 2010 00:34:40 Andrew Morton wrote:
> On Tue, 12 Oct 2010 00:22:12 +0100
> 
> James Hogan <james@albanarts.com> wrote:
> > This driver was hanging on resume because it was requesting a shared irq
> > that it wasn't ready to immediately handle, which was tested in
> > request_irq because of the CONFIG_DEBUG_SHIRQ config option. The
> > interrupt handler tried to read the interrupt status register but for
> > some reason it hung the system.
> > 
> > The request_irq is now moved a bit later after resetting the hardware
> > which seems to fix it.
> > 
> > Signed-off-by: James Hogan <james@albanarts.com>
> > ---
> > 
> >  drivers/net/b44.c |   12 ++++++------
> >  1 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> > index 1e620e2..dbba981 100644
> > --- a/drivers/net/b44.c
> > +++ b/drivers/net/b44.c
> > @@ -2296,12 +2296,6 @@ static int b44_resume(struct ssb_device *sdev)
> > 
> >  	if (!netif_running(dev))
> >  	
> >  		return 0;
> > 
> > -	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, 
dev);
> > -	if (rc) {
> > -		netdev_err(dev, "request_irq failed\n");
> > -		return rc;
> > -	}
> > -
> > 
> >  	spin_lock_irq(&bp->lock);
> >  	
> >  	b44_init_rings(bp);
> > 
> > @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
> > 
> >  	netif_device_attach(bp->dev);
> >  	spin_unlock_irq(&bp->lock);
> > 
> > +	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, 
dev);
> > +	if (rc) {
> > +		netdev_err(dev, "request_irq failed\n");
> > +		return rc;
> > +	}
> > +
> > 
> >  	b44_enable_ints(bp);
> >  	netif_wake_queue(dev);
> 
> OK, running the interrupt handler before b44_init_hw() is presumably
> the problem here.
> 
> The hardware surely won't be generating interrupts until we've run
> b44_init_hw() and b44_enable_ints(), so this patch really is only to
> keep CONFIG_DEBUG_SHIRQ happy.

For me it's mainly to keep CONFIG_DEBUG_SHIRQ happy (Fedora has this switched 
on), but since it's a shared IRQ, there is still a chance it could be 
called before enabling it's own interrupts by a different device on the same 
IRQ.

It makes sense to me why it's disabling the IRQ now, in case another device 
triggers it when it cannot handle it safely. I also tried calling the 
interrupt directly before the free_irq in the suspend function to check that 
it wasn't being done too late, and it didn't fail, so possibly it is the core 
suspension that makes it start failing until it is brought back up properly.

Cheers
James

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-12  7:08   ` James Hogan
@ 2010-10-12  7:27     ` Andrew Morton
  2010-10-12  8:40       ` James Hogan
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-10-12  7:27 UTC (permalink / raw)
  To: James Hogan
  Cc: Gary Zambrano, Jiri Pirko, FUJITA Tomonori, Hauke Mehrtens,
	Larry Finger, netdev, linux-kernel, David S. Miller

On Tue, 12 Oct 2010 08:08:05 +0100 James Hogan <james@albanarts.com> wrote:

> > > @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
> > > 
> > >  	netif_device_attach(bp->dev);
> > >  	spin_unlock_irq(&bp->lock);
> > > 
> > > +	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, 
> dev);
> > > +	if (rc) {
> > > +		netdev_err(dev, "request_irq failed\n");
> > > +		return rc;
> > > +	}
> > > +
> > > 
> > >  	b44_enable_ints(bp);
> > >  	netif_wake_queue(dev);
> > 
> > OK, running the interrupt handler before b44_init_hw() is presumably
> > the problem here.
> > 
> > The hardware surely won't be generating interrupts until we've run
> > b44_init_hw() and b44_enable_ints(), so this patch really is only to
> > keep CONFIG_DEBUG_SHIRQ happy.
> 
> For me it's mainly to keep CONFIG_DEBUG_SHIRQ happy (Fedora has this switched 
> on), but since it's a shared IRQ, there is still a chance it could be 
> called before enabling it's own interrupts by a different device on the same 
> IRQ.

ooh, yes, you're right, I forgot about that.  It's indeed a bug.

> It makes sense to me why it's disabling the IRQ now, in case another device 
> triggers it when it cannot handle it safely.

What code are you referring to here?  There's no disable_irq() in that area?

> I also tried calling the 
> interrupt directly before the free_irq in the suspend function to check that 
> it wasn't being done too late, and it didn't fail, so possibly it is the core 
> suspension that makes it start failing until it is brought back up properly.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-12  7:27     ` Andrew Morton
@ 2010-10-12  8:40       ` James Hogan
  0 siblings, 0 replies; 7+ messages in thread
From: James Hogan @ 2010-10-12  8:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gary Zambrano, Jiri Pirko, FUJITA Tomonori, Hauke Mehrtens,
	Larry Finger, netdev, linux-kernel, David S. Miller

On 12 October 2010 08:27, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 12 Oct 2010 08:08:05 +0100 James Hogan <james@albanarts.com> wrote:
>
>> > > @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
>> > >
>> > >   netif_device_attach(bp->dev);
>> > >   spin_unlock_irq(&bp->lock);
>> > >
>> > > + rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name,
>> dev);
>> > > + if (rc) {
>> > > +         netdev_err(dev, "request_irq failed\n");
>> > > +         return rc;
>> > > + }
>> > > +
>> > >
>> > >   b44_enable_ints(bp);
>> > >   netif_wake_queue(dev);
>> >
>> > OK, running the interrupt handler before b44_init_hw() is presumably
>> > the problem here.
>> >
>> > The hardware surely won't be generating interrupts until we've run
>> > b44_init_hw() and b44_enable_ints(), so this patch really is only to
>> > keep CONFIG_DEBUG_SHIRQ happy.
>>
>> For me it's mainly to keep CONFIG_DEBUG_SHIRQ happy (Fedora has this switched
>> on), but since it's a shared IRQ, there is still a chance it could be
>> called before enabling it's own interrupts by a different device on the same
>> IRQ.
>
> ooh, yes, you're right, I forgot about that.  It's indeed a bug.
>
>> It makes sense to me why it's disabling the IRQ now, in case another device
>> triggers it when it cannot handle it safely.
>
> What code are you referring to here?  There's no disable_irq() in that area?

Sorry, I meant freeing the irq (free_irq() in b44_suspend).

Thinking about it this should also go in stable too.

Cheers
James

>
>> I also tried calling the
>> interrupt directly before the free_irq in the suspend function to check that
>> it wasn't being done too late, and it didn't fail, so possibly it is the core
>> suspension that makes it start failing until it is brought back up properly.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-11 23:22 [PATCH] b44: fix resume, request_irq after hw reset James Hogan
  2010-10-11 23:34 ` Andrew Morton
@ 2010-10-13 16:46 ` David Miller
  2010-10-13 21:39   ` James Hogan
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2010-10-13 16:46 UTC (permalink / raw)
  To: james
  Cc: zambrano, jpirko, fujita.tomonori, hauke, Larry.Finger, netdev,
	linux-kernel

From: James Hogan <james@albanarts.com>
Date: Tue, 12 Oct 2010 00:22:12 +0100

> @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
>  	netif_device_attach(bp->dev);
>  	spin_unlock_irq(&bp->lock);
>  
> +	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
> +	if (rc) {
> +		netdev_err(dev, "request_irq failed\n");
> +		return rc;
> +	}
> +
>  	b44_enable_ints(bp);
>  	netif_wake_queue(dev);

Since you've moved the request_irq() down, you'll need to adjust
the error handling so that it undoes side effects made by this
function up until this point.

F.e. netif_device_attach() has to be undone for one thing.

Next, b44_init_rings() allocates memory that you must now free.

Etc. etc. etc.

This change is not so simple. :-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] b44: fix resume, request_irq after hw reset
  2010-10-13 16:46 ` David Miller
@ 2010-10-13 21:39   ` James Hogan
  0 siblings, 0 replies; 7+ messages in thread
From: James Hogan @ 2010-10-13 21:39 UTC (permalink / raw)
  To: David Miller
  Cc: zambrano, jpirko, fujita.tomonori, hauke, Larry.Finger, netdev,
	linux-kernel

On Wednesday 13 October 2010 17:46:59 David Miller wrote:
> From: James Hogan <james@albanarts.com>
> Date: Tue, 12 Oct 2010 00:22:12 +0100
> 
> > @@ -2309,6 +2303,12 @@ static int b44_resume(struct ssb_device *sdev)
> > 
> >  	netif_device_attach(bp->dev);
> >  	spin_unlock_irq(&bp->lock);
> > 
> > +	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, 
dev);
> > +	if (rc) {
> > +		netdev_err(dev, "request_irq failed\n");
> > +		return rc;
> > +	}
> > +
> > 
> >  	b44_enable_ints(bp);
> >  	netif_wake_queue(dev);
> 
> Since you've moved the request_irq() down, you'll need to adjust
> the error handling so that it undoes side effects made by this
> function up until this point.
> 
> F.e. netif_device_attach() has to be undone for one thing.
> 
> Next, b44_init_rings() allocates memory that you must now free.
> 
> Etc. etc. etc.
> 
> This change is not so simple. :-)

Very good point!

Does the ssb_bus_powerup need undoing as well? I'm guessing not since it
wasn't undone before.

How's the patch (at the bottom) looking? it does some better error handling 
and leaves the netif_device_attach(bp->dev); until after the irq is obtained.

I just noticed I actually get the following in my log after resume, so it 
appears something's going wrong even without the request_irq failing. Any idea 
what could be causing the padding to be overwritten? (my experience of net 
drivers and DMA are both non existent). If the net device is closed before 
suspend, this happens on open instead of resume.

Thanks
James

=============================================================================
BUG kmalloc_dma-2048: Padding overwritten. 
0xffff88000003fda8-0xffff88000003fdff
-----------------------------------------------------------------------------

INFO: Slab 0xffffea0000000c40 objects=15 used=1 fp=0xffff880000038000 
flags=0x40c1
Pid: 21848, comm: bash Not tainted 2.6.36-rc7-custom+ #18
Call Trace:
 [<ffffffff8111582d>] slab_err+0xaa/0xcc
 [<ffffffff81006666>] ? xen_set_pud+0x18/0x49
 [<ffffffff811171f4>] ? unfreeze_slab+0x53/0xb0
 [<ffffffff8111756c>] ? get_partial_node+0x20/0x79
 [<ffffffff81115cbd>] slab_pad_check+0xd2/0x124
 [<ffffffff81115da4>] check_slab+0x95/0x9c
 [<ffffffff811178eb>] __slab_alloc+0x326/0x42a
 [<ffffffff813d9096>] ? __netdev_alloc_skb+0x34/0x52
 [<ffffffff812467c6>] ? should_fail+0x91/0xf3
 [<ffffffff81119abb>] __kmalloc_node_track_caller+0x115/0x193
 [<ffffffff813d9096>] ? __netdev_alloc_skb+0x34/0x52
 [<ffffffff813d8076>] __alloc_skb+0x83/0x141
 [<ffffffff813d9096>] __netdev_alloc_skb+0x34/0x52
 [<ffffffffa030f6cf>] b44_alloc_rx_skb+0xf9/0x247 [b44]
 [<ffffffffa03b8000>] ? ssb_device_resume+0x0/0x36 [ssb]
 [<ffffffffa030f8b0>] b44_init_rings+0x93/0xa8 [b44]
 [<ffffffffa030fab3>] b44_resume+0x86/0x142 [b44]
 [<ffffffffa03b8030>] ssb_device_resume+0x30/0x36 [ssb]
 [<ffffffff813031df>] legacy_resume+0x24/0x5c
 [<ffffffff81303b9e>] device_resume+0xcd/0x1ba
 [<ffffffff81303dc3>] dpm_resume_end+0x138/0x3d8
 [<ffffffff8108db83>] suspend_devices_and_enter+0x1ba/0x203
 [<ffffffff8108dcb3>] enter_state+0xe7/0x12e
 [<ffffffff8108d3a5>] state_store+0xb6/0xd3
 [<ffffffff81235877>] kobj_attr_store+0x17/0x19
 [<ffffffff81183223>] sysfs_write_file+0x108/0x144
 [<ffffffff81126330>] vfs_write+0xae/0x10a
 [<ffffffff8112644f>] sys_write+0x4d/0x74
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
 Padding 0xffff88000003fa38:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 
ZZZZZZZZZZZZZZZZ
  <snip>
 Padding 0xffff88000003fd98:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 
ZZZZZZZZZZZZZZZZ
 Padding 0xffff88000003fda8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
................
 Padding 0xffff88000003fdb8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
................
 Padding 0xffff88000003fdc8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
................
 Padding 0xffff88000003fdd8:  00 00 00 00 5a 5a 5a 5a 00 00 00 00 00 00 00 00 
....ZZZZ........
 Padding 0xffff88000003fde8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
................
 Padding 0xffff88000003fdf8:  64 2a 8b 00 00 00 00 00                         
d*......        




---
 drivers/net/b44.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 1e620e2..5fd251c 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -2296,16 +2296,22 @@ static int b44_resume(struct ssb_device *sdev)
 	if (!netif_running(dev))
 		return 0;
 
+	spin_lock_irq(&bp->lock);
+	b44_init_rings(bp);
+	b44_init_hw(bp, B44_FULL_RESET);
+	spin_unlock_irq(&bp->lock);
+
 	rc = request_irq(dev->irq, b44_interrupt, IRQF_SHARED, dev->name, dev);
 	if (rc) {
 		netdev_err(dev, "request_irq failed\n");
+		spin_lock_irq(&bp->lock);
+		b44_halt(bp);
+		b44_free_rings(bp);
+		spin_unlock_irq(&bp->lock);
 		return rc;
 	}
 
 	spin_lock_irq(&bp->lock);
-
-	b44_init_rings(bp);
-	b44_init_hw(bp, B44_FULL_RESET);
 	netif_device_attach(bp->dev);
 	spin_unlock_irq(&bp->lock);
 
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-10-13 21:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-11 23:22 [PATCH] b44: fix resume, request_irq after hw reset James Hogan
2010-10-11 23:34 ` Andrew Morton
2010-10-12  7:08   ` James Hogan
2010-10-12  7:27     ` Andrew Morton
2010-10-12  8:40       ` James Hogan
2010-10-13 16:46 ` David Miller
2010-10-13 21:39   ` James Hogan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).