public inbox for linux-mmc@vger.kernel.org
* crash in mmc subsystem during suspend
From: Daniel Drake @ 2009-11-20  8:51 UTC
  To: linux-mmc

Hi,

We're busy testing the suspend/resume functionality on the new OLPC
XO-1.5 laptop. This uses an SD card as its primary disk, which is on an
SDHCI controller on the VIA VX855 chipset. We are running Linux v2.6.31.

We are occasionally encountering a crash from below mmc_rescan() during
suspend:

BUG: unable to handle kernel paging request at 6b6b6c57          
IP: [<b066d6e2>] sdio_remove_func+0x9/0x27
Call Trace:
[<b066cfb4>] ? mmc_sdio_remove+0x34/0x65                        
[<b066d1fc>] ? mmc_attach_sdio+0x217/0x240                      
[<b066a22f>] ? mmc_rescan+0x1a2/0x20f                           
[<b042e9a0>] ? worker_thread+0x156/0x1e1


This needs at least a few hundred suspend/resume cycles before it
reproduces, and often more than 2000.

Here is a log of the final couple of suspend/resumes and the crash:
http://dev.laptop.org/attachment/ticket/9707/p1_crash.log

and some of our diagnosis/discussion so far:
http://dev.laptop.org/ticket/9707

Does anyone have any theories on why this is happening?

Thanks,
Daniel




* Re: crash in mmc subsystem during suspend
From: Daniel Drake @ 2009-11-20 15:10 UTC
  To: linux-mmc

On Fri, 2009-11-20 at 08:51 +0000, Daniel Drake wrote:
> We're busy testing the suspend/resume functionality on the new OLPC
> XO-1.5 laptop. This uses an SD card as its primary disk, which is on an
> SDHCI controller on the VIA VX855 chipset. We are running Linux v2.6.31.
> 
> We are occasionally encountering a crash from below mmc_rescan() during
> suspend:

OK, I just realised that our kernel includes various patches that are not
(yet) upstream, some of which run during suspend/resume and may be
contributing to this problem.
(the patches allow us to retain power to our SDIO wifi card during
suspend)

Nevertheless, any help in our battle toward stable suspend/resume and the
other bits we're working on is always appreciated. :)

Daniel




* Re: crash in mmc subsystem during suspend
From: Matt Fleming @ 2009-11-21 12:31 UTC
  To: Daniel Drake; +Cc: linux-mmc

On Fri, Nov 20, 2009 at 03:10:58PM +0000, Daniel Drake wrote:
> On Fri, 2009-11-20 at 08:51 +0000, Daniel Drake wrote:
> > We're busy testing the suspend/resume functionality on the new OLPC
> > XO-1.5 laptop. This uses an SD card as its primary disk, which is on an
> > SDHCI controller on the VIA VX855 chipset. We are running Linux v2.6.31.
> > 
> > We are occasionally encountering a crash from below mmc_rescan() during
> > suspend:
> 
> OK, I just realised that our kernel includes various patches that are not
> (yet) upstream, some of which run during suspend/resume and may be
> contributing to this problem.
> (the patches allow us to retain power to our SDIO wifi card during
> suspend)
> 
> Nevertheless, any help in our battle toward stable suspend/resume and the
> other bits we're working on is always appreciated. :)
> 
> Daniel
> 
> 

Fancy giving this patch a try? I think it's just a case of removing too
many funcs in the error path.

diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index cdb845b..cdec9c8 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -516,7 +516,8 @@ int mmc_attach_sdio(struct mmc_host *host, u32 ocr)
 	 * The number of functions on the card is encoded inside
 	 * the ocr.
 	 */
-	card->sdio_funcs = funcs = (ocr & 0x70000000) >> 28;
+	funcs = (ocr & 0x70000000) >> 28;
+	card->sdio_funcs = 0;
 
 	/*
 	 * If needed, disconnect card detection pull-up resistor.
@@ -528,7 +529,7 @@ int mmc_attach_sdio(struct mmc_host *host, u32 ocr)
 	/*
 	 * Initialize (but don't add) all present functions.
 	 */
-	for (i = 0;i < funcs;i++) {
+	for (i = 0;i < funcs;i++,card->sdio_funcs++) {
 		err = sdio_init_func(host->card, i + 1);
 		if (err)
 			goto remove;


* Re: crash in mmc subsystem during suspend
From: Daniel Drake @ 2009-12-01 12:49 UTC
  To: Matt Fleming; +Cc: linux-mmc

On Sat, 2009-11-21 at 12:31 +0000, Matt Fleming wrote:
> Fancy giving this patch a try? I think it's just a case of removing too
> many funcs in the error path.

Thanks!
Yes, I agree that looks like the culprit.
I applied something very similar to your patch and the crash went away.

The one additional change I made is in sdio_bus.c:

 void sdio_remove_func(struct sdio_func *func)
 {
-       if (sdio_func_present(func))
-               device_del(&func->dev);
+       if (!sdio_func_present(func))
+               return;
 
+       device_del(&func->dev);
        put_device(&func->dev);
 }

I think this is necessary because the error path goes mmc_sdio_remove
--> sdio_remove_func.
Hence sdio_remove_func() will be called when sdio_add_func() was never
called beforehand, so there is no func->dev reference to drop.

Do you agree? I'm not certain about this one.

Thanks!
Daniel





* Re: crash in mmc subsystem during suspend
From: Matt Fleming @ 2009-12-01 13:40 UTC
  To: Daniel Drake; +Cc: linux-mmc

On Tue, Dec 01, 2009 at 12:49:26PM +0000, Daniel Drake wrote:
> On Sat, 2009-11-21 at 12:31 +0000, Matt Fleming wrote:
> > Fancy giving this patch a try? I think it's just a case of removing too
> > many funcs in the error path.
> 
> Thanks!
> Yes, I agree that looks like the culprit.
> I applied something very similar to your patch and the crash went away.
> 
> The one additional change I made is in sdio_bus.c:
> 
>  void sdio_remove_func(struct sdio_func *func)
>  {
> -       if (sdio_func_present(func))
> -               device_del(&func->dev);
> +       if (!sdio_func_present(func))
> +               return;
>  
> +       device_del(&func->dev);
>         put_device(&func->dev);
>  }
> 
> I think this is necessary because the error path goes mmc_sdio_remove
> --> sdio_remove_func.
> Hence sdio_remove_func() will be called when sdio_add_func() was never
> called beforehand, so there is no func->dev reference to drop.
> 
> Do you agree? I'm not certain about this one.
> 
> Thanks!
> Daniel
> 

Yep, your patch looks correct. Good catch.

Would you mind making a proper patch (S-O-B line and all) and submitting
it to the linux-mmc mailing list and CC'ing Andrew Morton?

Cheers


