public inbox for linux-mmc@vger.kernel.org
* [PATCH 0/2] Race fixes in sdhci
@ 2011-04-27 13:23 Mark Brown
  2011-04-27 19:33 ` Andrei Warkentin
  2011-04-27 21:44 ` Chris Ball
  0 siblings, 2 replies; 5+ messages in thread
From: Mark Brown @ 2011-04-27 13:23 UTC (permalink / raw)
  To: Chris Ball; +Cc: linux-mmc, Ben Dooks, Dimitris Papastamos

I've had this pair of patches sitting in my tree for a while now (I
believe they were previously posted) providing stability improvements in
sdhci on my systems.  Having looked through the code I believe but have
not confirmed that the issue is that the timeout is racing with an
actual completion of a pending task - both paths will trigger the
tasklet, and if you trigger a tasklet while it is running this causes it
to be rescheduled.  The result will be that the tasklet gets run a
second time with no work pending for it.

I'm not convinced that these are the best fixes (it feels like we should
instead be closing the races down) but I don't really have time to come
up with something better myself right now so I'm pushing them out as-is
for comment.

Ben Dooks (1):
      MMC: SDHCI: Check mrq->cmd in sdhci_tasklet_finish

Dimitris Papastamos (1):
      MMC: SDHCI: Check mrq != NULL in sdhci_tasklet_finish

 drivers/mmc/host/sdhci.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

* Re: [PATCH 0/2] Race fixes in sdhci
  2011-04-27 13:23 [PATCH 0/2] Race fixes in sdhci Mark Brown
@ 2011-04-27 19:33 ` Andrei Warkentin
  2011-04-27 20:41   ` Chris Ball
  2011-04-27 21:44 ` Chris Ball
  1 sibling, 1 reply; 5+ messages in thread
From: Andrei Warkentin @ 2011-04-27 19:33 UTC (permalink / raw)
  To: Mark Brown; +Cc: Chris Ball, linux-mmc, Ben Dooks, Dimitris Papastamos

On Wed, Apr 27, 2011 at 8:23 AM, Mark Brown
<broonie@opensource.wolfsonmicro.com> wrote:
> I've had this pair of patches sitting in my tree for a while now (I
> believe they were previously posted) providing stability improvements in
> sdhci on my systems.  Having looked through the code I believe but have
> not confirmed that the issue is that the timeout is racing with an
> actual completion of a pending task - both paths will trigger the
> tasklet, and if you trigger a tasklet while it is running this causes it
> to be rescheduled.  The result will be that the tasklet gets run a
> second time with no work pending for it.
>
> I'm not convinced that these are the best fixes (it feels like we should
> instead be closing the races down) but I don't really have time to come
> up with something better myself right now so I'm pushing them out as-is
> for comment.
>
> Ben Dooks (1):
>      MMC: SDHCI: Check mrq->cmd in sdhci_tasklet_finish
>
> Dimitris Papastamos (1):
>      MMC: SDHCI: Check mrq != NULL in sdhci_tasklet_finish
>
>  drivers/mmc/host/sdhci.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

So the timeout interrupt occurs after even though the command
succeeds? Am I interpreting that correctly?

A

* Re: [PATCH 0/2] Race fixes in sdhci
  2011-04-27 19:33 ` Andrei Warkentin
@ 2011-04-27 20:41   ` Chris Ball
  2011-04-27 21:04     ` Mark Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Ball @ 2011-04-27 20:41 UTC (permalink / raw)
  To: Andrei Warkentin; +Cc: Mark Brown, linux-mmc, Ben Dooks, Dimitris Papastamos

Hi,

On Wed, Apr 27 2011, Andrei Warkentin wrote:
> On Wed, Apr 27, 2011 at 8:23 AM, Mark Brown
> <broonie@opensource.wolfsonmicro.com> wrote:
>> I've had this pair of patches sitting in my tree for a while now (I
>> believe they were previously posted) providing stability improvements in
>> sdhci on my systems.  Having looked through the code I believe but have
>> not confirmed that the issue is that the timeout is racing with an
>> actual completion of a pending task - both paths will trigger the
>> tasklet, and if you trigger a tasklet while it is running this causes it
>> to be rescheduled.  The result will be that the tasklet gets run a
>> second time with no work pending for it.
>>
>> I'm not convinced that these are the best fixes (it feels like we should
>> instead be closing the races down) but I don't really have time to come
>> up with something better myself right now so I'm pushing them out as-is
>> for comment.
>
> So the timeout interrupt occurs after even though the command
> succeeds? Am I interpreting that correctly?

No, I think Mark's saying there's a race of:

* the successful completion interrupt fires, and

* the host timer fires to signify timeout due to *lack* of an interrupt
  (via sdhci_timeout_timer()).  i.e., the completion interrupt fires
  very close to the timeout period.

Both cases call tasklet_schedule(&host->finish_tasklet), and if you
manage to schedule a tasklet while it's already running, it runs again
after it completes -- but during the first run we set host->mrq->cmd to
NULL, so then it oopses on the second run.

We could consider taking Mark's first patch and also adding to the top:

        /*
         * If we get scheduled twice concurrently, this tasklet will
         * be run again afterwards but without any active request.
         */
        if (!host->mrq)
                return;

.. and pushing to .39 with a stable@ tag.

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

* Re: [PATCH 0/2] Race fixes in sdhci
  2011-04-27 20:41   ` Chris Ball
@ 2011-04-27 21:04     ` Mark Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Mark Brown @ 2011-04-27 21:04 UTC (permalink / raw)
  To: Chris Ball; +Cc: Andrei Warkentin, linux-mmc, Ben Dooks, Dimitris Papastamos

On Wed, Apr 27, 2011 at 04:41:27PM -0400, Chris Ball wrote:

> No, I think Mark's saying there's a race of:

> * the successful completion interrupt fires, and

> * the host timer fires to signify timeout due to *lack* of an interrupt
>   (via sdhci_timeout_timer()).  i.e., the completion interrupt fires
>   very close to the timeout period.

> Both cases call tasklet_schedule(&host->finish_tasklet), and if you
> manage to schedule a tasklet while it's already running, it runs again
> after it completes -- but during the first run we set host->mrq->cmd to
> NULL, so then it oopses on the second run.

Yes, exactly - that's what I found through code inspection.  I haven't
actually observed this directly, nor analysed it enough to pin down a
particular race; I just looked at the fixes that we are carrying in our
tree and read enough of the code to determine that they address an
issue I could identify.

> We could consider taking Mark's first patch and also adding to the top:

Ben's patch.

>         /*
>          * If we get scheduled twice concurrently, this tasklet will
>          * be run again afterwards but without any active request.
>          */
>         if (!host->mrq)
>                 return;

> .. and pushing to .39 with a stable@ tag.

Yes, that's a broader way of writing Dimitris' patch.  The issue isn't
concurrent *schedules* - those just collapse into a single pending run
of the tasklet - it's that the tasklet can be rescheduled while it's
running.

* Re: [PATCH 0/2] Race fixes in sdhci
  2011-04-27 13:23 [PATCH 0/2] Race fixes in sdhci Mark Brown
  2011-04-27 19:33 ` Andrei Warkentin
@ 2011-04-27 21:44 ` Chris Ball
  1 sibling, 0 replies; 5+ messages in thread
From: Chris Ball @ 2011-04-27 21:44 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-mmc, Ben Dooks, Dimitris Papastamos

Hi Mark,

On Wed, Apr 27 2011, Mark Brown wrote:
> Ben Dooks (1):
>       MMC: SDHCI: Check mrq->cmd in sdhci_tasklet_finish
>
> Dimitris Papastamos (1):
>       MMC: SDHCI: Check mrq != NULL in sdhci_tasklet_finish

Thanks.  I've merged Ben's patch for .39, and also:

From: Chris Ball <cjb@laptop.org>
Subject: [PATCH] mmc: sdhci: Check mrq != NULL in sdhci_tasklet_finish

It seems that under certain circumstances sdhci_tasklet_finish() can
be entered with mrq set to NULL, causing the system to crash with a
NULL pointer dereference.

Seen on S3C6410 system.  Based on a patch by Dimitris Papastamos.

Reported-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
---
 drivers/mmc/host/sdhci.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index e4084a3..f197c67 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1808,6 +1808,13 @@ static void sdhci_tasklet_finish(unsigned long param)
 
 	host = (struct sdhci_host*)param;
 
+        /*
+         * If this tasklet gets rescheduled while running, it will
+         * be run again afterwards but without any active request.
+         */
+	if (!host->mrq)
+		return;
+
 	spin_lock_irqsave(&host->lock, flags);
 
 	del_timer(&host->timer);
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

Thread overview: 5+ messages
2011-04-27 13:23 [PATCH 0/2] Race fixes in sdhci Mark Brown
2011-04-27 19:33 ` Andrei Warkentin
2011-04-27 20:41   ` Chris Ball
2011-04-27 21:04     ` Mark Brown
2011-04-27 21:44 ` Chris Ball