deadlock between mmc_pm_notify() and mmcqd

linux-mmc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* deadlock between mmc_pm_notify() and mmcqd
@ 2011-03-30 10:52 Frank Hofmann
  2011-04-04 12:54 ` Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Frank Hofmann @ 2011-03-30 10:52 UTC (permalink / raw)
  To: linux-mmc

Hi linux-mmc'ers,

(have originally posted this to linux-pm, got suggested I ask here as well)

I'm encountering the following deadlock:



First, an "echo disk >/sys/power/state":

[<c03bad2c>] (schedule+0x48c/0x50c) from [<c01a5ca4>] (log_wait_commit+0xb8/0x110)
[<c01a5ca4>] (log_wait_commit+0xb8/0x110) from [<c0193254>] (ext3_sync_fs+0x3c/0x44)
[<c0193254>] (ext3_sync_fs+0x3c/0x44) from [<c015d7a0>] (__sync_filesystem+0x50/0x60)
[<c015d7a0>] (__sync_filesystem+0x50/0x60) from [<c01663ac>] (fsync_bdev+0x18/0x38)
[<c01663ac>] (fsync_bdev+0x18/0x38) from [<c01d4ef8>] (invalidate_partition+0x18/0x34)
[<c01d4ef8>] (invalidate_partition+0x18/0x34) from [<c0182260>] (del_gendisk+0x24/0xc0)
[<c0182260>] (del_gendisk+0x24/0xc0) from [<c02f7f48>] (mmc_blk_remove+0x20/0x40)
[<c02f7f48>] (mmc_blk_remove+0x20/0x40) from [<c02f233c>] (mmc_bus_remove+0x18/0x20)
[<c02f233c>] (mmc_bus_remove+0x18/0x20) from [<c024649c>] (__device_release_driver+0x64/0xa4)
[<c024649c>] (__device_release_driver+0x64/0xa4) from [<c02465a4>] (device_release_driver+0x1c/0x28)
[<c02465a4>] (device_release_driver+0x1c/0x28) from [<c0245ae0>] (bus_remove_device+0x6c/0x7c)
[<c0245ae0>] (bus_remove_device+0x6c/0x7c) from [<c0244178>] (device_del+0x118/0x170)
[<c0244178>] (device_del+0x118/0x170) from [<c02f23f4>] (mmc_remove_card+0x50/0x64)
[<c02f23f4>] (mmc_remove_card+0x50/0x64) from [<c02f3ed4>] (mmc_sd_remove+0x24/0x30)
[<c02f3ed4>] (mmc_sd_remove+0x24/0x30) from [<c02f1c0c>] (mmc_pm_notify+0x88/0xd8)
[<c02f1c0c>] (mmc_pm_notify+0x88/0xd8) from [<c00d6780>] (notifier_call_chain+0x2c/0x70)
[<c00d6780>] (notifier_call_chain+0x2c/0x70) from [<c00d6998>] (__blocking_notifier_call_chain+0x48/0x5c)
[<c00d6998>] (__blocking_notifier_call_chain+0x48/0x5c) from [<c00d69c0>] (blocking_notifier_call_chain+0x14/0x18)
[<c00d69c0>] (blocking_notifier_call_chain+0x14/0x18) from [<c00ebc70>] (pm_notifier_call_chain+0x14/0x2c)
[<c00ebc70>] (pm_notifier_call_chain+0x14/0x2c) from [<c00ed630>] (hibernate+0x1a8/0x1d8)
[<c00ed630>] (hibernate+0x1a8/0x1d8) from [<c00ebbc4>] (state_store+0x4c/0xe4)
[<c00ebbc4>] (state_store+0x4c/0xe4) from [<c01d9d34>] (kobj_attr_store+0x18/0x1c)
[<c01d9d34>] (kobj_attr_store+0x18/0x1c) from [<c0183bf8>] (sysfs_write_file+0x10c/0x140)
[<c0183bf8>] (sysfs_write_file+0x10c/0x140) from [<c013d8bc>] (vfs_write+0xac/0x154)
[<c013d8bc>] (vfs_write+0xac/0x154) from [<c013da10>] (sys_write+0x3c/0x68)
[<c013da10>] (sys_write+0x3c/0x68) from [<c007d700>] (ret_fast_syscall+0x0/0x2c)

This waits for I/O. Which would be processed by mmcqd:

mmcqd D c03bad2c 0 516 2 0x00000000
[<c03bad2c>] (schedule+0x48c/0x50c) from [<c02f1ae8>] (__mmc_claim_host+0xbc/0x158)
[<c02f1ae8>] (__mmc_claim_host+0xbc/0x158) from [<c02f8378>] (mmc_blk_issue_rq+0x2c/0x728)
[<c02f8378>] (mmc_blk_issue_rq+0x2c/0x728) from [<c02f9184>] (mmc_queue_thread+0xd8/0xdc)
[<c02f9184>] (mmc_queue_thread+0xd8/0xdc) from [<c00d199c>] (kthread+0x80/0x88)
[<c00d199c>] (kthread+0x80/0x88) from [<c007e0e4>] (kernel_thread_exit+0x0/0x8)

and that's waiting to claim the MMC host.

Which will never happen - mmc_pm_notify(), by the point above, holds that host 
ransom already, the code does the mmc_claim_host() directly before calling into 
bus_ops->suspend().


Is this is a known problem ? We're running a customized 2.6.32 kernel, so 
admittedly not the latest; mmc_pm_notify, on the other hand, is already newer 
than that, introduced via:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c2ef25fe0b847d2ae818f74758ddb0be1c27d8e;hp=7310ece86ad7da027f85a37a0638164118a5d12f

in 2.6.35, and in itself not changed since.


I've found https://lkml.org/lkml/2010/10/21/494 which talks about a regression 
from said change, but nothing came out of that, and the codepaths mentioned 
there is mmc_suspend_host not mmc_sd_remove.

The device I'm testing this on is an OMAP3 box that boots via MMC, hence the 
root filesystem is on there and it'd be expected dirty.

Removing the abovementioned commit prevents the deadlock from happening, but 
then I wonder ? Also, I've found that even if I remove the commit, MMC doesn't 
suspend cleanly (gives a DPM timeout crash a little later).



FrankH.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: deadlock between mmc_pm_notify() and mmcqd
  2011-03-30 10:52 deadlock between mmc_pm_notify() and mmcqd Frank Hofmann
@ 2011-04-04 12:54 ` Andrei Warkentin
  2011-04-04 13:11   ` Frank Hofmann
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 12:54 UTC (permalink / raw)
  To: frank.hofmann; +Cc: linux-mmc

On Wed, Mar 30, 2011 at 5:52 AM, Frank Hofmann <frank.hofmann@tomtom.com> wrote:
> Hi linux-mmc'ers,
>
> (have originally posted this to linux-pm, got suggested I ask here as well)
>
> I'm encountering the following deadlock:
>
>
>
> First, an "echo disk >/sys/power/state":
>
> [<c03bad2c>] (schedule+0x48c/0x50c) from [<c01a5ca4>]
> (log_wait_commit+0xb8/0x110)
> [<c01a5ca4>] (log_wait_commit+0xb8/0x110) from [<c0193254>]
> (ext3_sync_fs+0x3c/0x44)
> [<c0193254>] (ext3_sync_fs+0x3c/0x44) from [<c015d7a0>]
> (__sync_filesystem+0x50/0x60)
> [<c015d7a0>] (__sync_filesystem+0x50/0x60) from [<c01663ac>]
> (fsync_bdev+0x18/0x38)
> [<c01663ac>] (fsync_bdev+0x18/0x38) from [<c01d4ef8>]
> (invalidate_partition+0x18/0x34)
> [<c01d4ef8>] (invalidate_partition+0x18/0x34) from [<c0182260>]
> (del_gendisk+0x24/0xc0)
> [<c0182260>] (del_gendisk+0x24/0xc0) from [<c02f7f48>]
> (mmc_blk_remove+0x20/0x40)
> [<c02f7f48>] (mmc_blk_remove+0x20/0x40) from [<c02f233c>]
> (mmc_bus_remove+0x18/0x20)
> [<c02f233c>] (mmc_bus_remove+0x18/0x20) from [<c024649c>]
> (__device_release_driver+0x64/0xa4)
> [<c024649c>] (__device_release_driver+0x64/0xa4) from [<c02465a4>]
> (device_release_driver+0x1c/0x28)
> [<c02465a4>] (device_release_driver+0x1c/0x28) from [<c0245ae0>]
> (bus_remove_device+0x6c/0x7c)
> [<c0245ae0>] (bus_remove_device+0x6c/0x7c) from [<c0244178>]
> (device_del+0x118/0x170)
> [<c0244178>] (device_del+0x118/0x170) from [<c02f23f4>]
> (mmc_remove_card+0x50/0x64)
> [<c02f23f4>] (mmc_remove_card+0x50/0x64) from [<c02f3ed4>]
> (mmc_sd_remove+0x24/0x30)
> [<c02f3ed4>] (mmc_sd_remove+0x24/0x30) from [<c02f1c0c>]
> (mmc_pm_notify+0x88/0xd8)
> [<c02f1c0c>] (mmc_pm_notify+0x88/0xd8) from [<c00d6780>]
> (notifier_call_chain+0x2c/0x70)
> [<c00d6780>] (notifier_call_chain+0x2c/0x70) from [<c00d6998>]
> (__blocking_notifier_call_chain+0x48/0x5c)
> [<c00d6998>] (__blocking_notifier_call_chain+0x48/0x5c) from [<c00d69c0>]
> (blocking_notifier_call_chain+0x14/0x18)
> [<c00d69c0>] (blocking_notifier_call_chain+0x14/0x18) from [<c00ebc70>]
> (pm_notifier_call_chain+0x14/0x2c)
> [<c00ebc70>] (pm_notifier_call_chain+0x14/0x2c) from [<c00ed630>]
> (hibernate+0x1a8/0x1d8)
> [<c00ed630>] (hibernate+0x1a8/0x1d8) from [<c00ebbc4>]
> (state_store+0x4c/0xe4)
> [<c00ebbc4>] (state_store+0x4c/0xe4) from [<c01d9d34>]
> (kobj_attr_store+0x18/0x1c)
> [<c01d9d34>] (kobj_attr_store+0x18/0x1c) from [<c0183bf8>]
> (sysfs_write_file+0x10c/0x140)
> [<c0183bf8>] (sysfs_write_file+0x10c/0x140) from [<c013d8bc>]
> (vfs_write+0xac/0x154)
> [<c013d8bc>] (vfs_write+0xac/0x154) from [<c013da10>] (sys_write+0x3c/0x68)
> [<c013da10>] (sys_write+0x3c/0x68) from [<c007d700>]
> (ret_fast_syscall+0x0/0x2c)
>
> This waits for I/O. Which would be processed by mmcqd:
>
> mmcqd D c03bad2c 0 516 2 0x00000000
> [<c03bad2c>] (schedule+0x48c/0x50c) from [<c02f1ae8>]
> (__mmc_claim_host+0xbc/0x158)
> [<c02f1ae8>] (__mmc_claim_host+0xbc/0x158) from [<c02f8378>]
> (mmc_blk_issue_rq+0x2c/0x728)
> [<c02f8378>] (mmc_blk_issue_rq+0x2c/0x728) from [<c02f9184>]
> (mmc_queue_thread+0xd8/0xdc)
> [<c02f9184>] (mmc_queue_thread+0xd8/0xdc) from [<c00d199c>]
> (kthread+0x80/0x88)
> [<c00d199c>] (kthread+0x80/0x88) from [<c007e0e4>]
> (kernel_thread_exit+0x0/0x8)
>
> and that's waiting to claim the MMC host.
>
> Which will never happen - mmc_pm_notify(), by the point above, holds that
> host ransom already, the code does the mmc_claim_host() directly before
> calling into bus_ops->suspend().
>
>
> Is this is a known problem ? We're running a customized 2.6.32 kernel, so
> admittedly not the latest; mmc_pm_notify, on the other hand, is already
> newer than that, introduced via:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c2ef25fe0b847d2ae818f74758ddb0be1c27d8e;hp=7310ece86ad7da027f85a37a0638164118a5d12f
>
> in 2.6.35, and in itself not changed since.
>
>
> I've found https://lkml.org/lkml/2010/10/21/494 which talks about a
> regression from said change, but nothing came out of that, and the codepaths
> mentioned there is mmc_suspend_host not mmc_sd_remove.
>
> The device I'm testing this on is an OMAP3 box that boots via MMC, hence the
> root filesystem is on there and it'd be expected dirty.
>
> Removing the abovementioned commit prevents the deadlock from happening, but
> then I wonder ? Also, I've found that even if I remove the commit, MMC
> doesn't suspend cleanly (gives a DPM timeout crash a little later).
>

Hi Frank,

I was able to reproduce it on linux-next. No need for rootfs on
mmcblk, and a "echo mem > /sys/power/state" is sufficient to trigger
the issue.

I am looking into it.

Thanks,
A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: deadlock between mmc_pm_notify() and mmcqd
  2011-04-04 12:54 ` Andrei Warkentin
@ 2011-04-04 13:11   ` Frank Hofmann
  2011-04-04 14:00     ` [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Frank Hofmann @ 2011-04-04 13:11 UTC (permalink / raw)
  To: Andrei Warkentin; +Cc: frank.hofmann, linux-mmc

On Mon, 4 Apr 2011, Andrei Warkentin wrote:

> On Wed, Mar 30, 2011 at 5:52 AM, Frank Hofmann <frank.hofmann@tomtom.com> wrote:
>> Hi linux-mmc'ers,
>>
>> (have originally posted this to linux-pm, got suggested I ask here as well)
>>
>> I'm encountering the following deadlock:
>>
[ mmcqd vs mmc_pm_notify ]
>
> Hi Frank,
>
> I was able to reproduce it on linux-next. No need for rootfs on
> mmcblk, and a "echo mem > /sys/power/state" is sufficient to trigger
> the issue.
>
> I am looking into it.
>
> Thanks,
> A
>

Thanks Andrei.

Missed to make it clear in the original email that disabling the notifier 
via CONFIG_MMC_UNSAFE_RESUME stops it from occurring.

Also found that simply reverting _part_ of the commit isn't helping 
because the deadlock just moves ... to mmc_suspend_host() eventually also 
ending up in del_gendisk(), and things are identical from there.

FrankH.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 14:00     ` [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock Andrei Warkentin
@ 2011-04-04 13:27       ` Andrei Warkentin
  2011-04-04 15:01         ` Ohad Ben-Cohen
  2011-04-04 15:01         ` Andrei Warkentin
  0 siblings, 2 replies; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 13:27 UTC (permalink / raw)
  To: linux-mmc, frank.hofmann; +Cc: Andrei Warkentin, ohad

On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
> This resolves the deadlock issue with suspend. There is no
> need to claim host before the remove op.
>
> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>

Frank,

Can you try this out? I think this fixes it.

Ohad,

I think this means we can take
1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
suspend/resume regression). What do you think?

Thanks,
A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 13:11   ` Frank Hofmann
@ 2011-04-04 14:00     ` Andrei Warkentin
  2011-04-04 13:27       ` Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 14:00 UTC (permalink / raw)
  To: linux-mmc; +Cc: Andrei Warkentin

This resolves the deadlock issue with suspend. There is no
need to claim host before the remove op.

Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
---
 drivers/mmc/core/core.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 85ef72c..87c4af7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1823,11 +1823,10 @@ int mmc_pm_notify(struct notifier_block *notify_block,
 		if (!host->bus_ops || host->bus_ops->suspend)
 			break;
 
-		mmc_claim_host(host);
-
 		if (host->bus_ops->remove)
 			host->bus_ops->remove(host);
 
+		mmc_claim_host(host);
 		mmc_detach_bus(host);
 		mmc_release_host(host);
 		host->pm_flags = 0;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 13:27       ` Andrei Warkentin
@ 2011-04-04 15:01         ` Ohad Ben-Cohen
  2011-04-04 15:10           ` Andrei Warkentin
  2011-04-04 15:01         ` Andrei Warkentin
  1 sibling, 1 reply; 13+ messages in thread
From: Ohad Ben-Cohen @ 2011-04-04 15:01 UTC (permalink / raw)
  To: Andrei Warkentin; +Cc: linux-mmc, frank.hofmann

On Mon, Apr 4, 2011 at 4:27 PM, Andrei Warkentin <andreiw@motorola.com> wrote:
> On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
>> This resolves the deadlock issue with suspend. There is no
>> need to claim host before the remove op.
>>
>> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
...
> I think this means we can take
> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
> suspend/resume regression).

I don't see how is this related ?

1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 makes sure the -ENOSYS
semantics of mmc_sdio_suspend() are followed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 13:27       ` Andrei Warkentin
  2011-04-04 15:01         ` Ohad Ben-Cohen
@ 2011-04-04 15:01         ` Andrei Warkentin
       [not found]           ` <alpine.DEB.2.00.1104041624310.690@localhost6.localdomain6>
  1 sibling, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 15:01 UTC (permalink / raw)
  To: linux-mmc, frank.hofmann; +Cc: Andrei Warkentin, ohad

On Mon, Apr 4, 2011 at 8:27 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
> On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
>> This resolves the deadlock issue with suspend. There is no
>> need to claim host before the remove op.
>>
>> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
>
> Frank,
>
> Can you try this out? I think this fixes it.
>
> Ohad,
>
> I think this means we can take
> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
> suspend/resume regression). What do you think?
>
> Thanks,
> A
>

Ohad, nevermind.

I think there is a bigger issue here at stake for removeable cards. If
you have a mounted file system,
the usage count for mmc_blk usage will never drop enough to remove the device!

I think the block.c for removal needs to change somewhat. Removed MDs
need to clear devidx and be put on an "orphan list" from where
they will remove themselves if usage count drops to zero. On re-probe,
the "orphan list" needs to be scanned for allocated devidx, and if it
is found, then that MD should be reused.
I'll see if I can put something together.

A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 15:01         ` Ohad Ben-Cohen
@ 2011-04-04 15:10           ` Andrei Warkentin
  0 siblings, 0 replies; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 15:10 UTC (permalink / raw)
  To: Ohad Ben-Cohen; +Cc: linux-mmc

On Mon, Apr 4, 2011 at 10:01 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Mon, Apr 4, 2011 at 4:27 PM, Andrei Warkentin <andreiw@motorola.com> wrote:
>> On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
>>> This resolves the deadlock issue with suspend. There is no
>>> need to claim host before the remove op.
>>>
>>> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
> ...
>> I think this means we can take
>> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
>> suspend/resume regression).
>
> I don't see how is this related ?
>
> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 makes sure the -ENOSYS
> semantics of mmc_sdio_suspend() are followed.
>

I thought that code was added as a fix for problems in the
pm_notifier, but I misread the code and I was wrong.

Thanks,
A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
       [not found]           ` <alpine.DEB.2.00.1104041624310.690@localhost6.localdomain6>
@ 2011-04-04 15:47             ` Andrei Warkentin
  2011-04-04 16:08               ` Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 15:47 UTC (permalink / raw)
  To: frank.hofmann, linux-mmc

On Mon, Apr 4, 2011 at 10:26 AM, Frank Hofmann <frank.hofmann@tomtom.com> wrote:
>
>
> On Mon, 4 Apr 2011, Andrei Warkentin wrote:
>
>> On Mon, Apr 4, 2011 at 8:27 AM, Andrei Warkentin <andreiw@motorola.com>
>> wrote:
>>>
>>> On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com>
>>> wrote:
>>>>
>>>> This resolves the deadlock issue with suspend. There is no
>>>> need to claim host before the remove op.
>>>>
>>>> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
>>>
>>> Frank,
>>>
>>> Can you try this out? I think this fixes it.
>>>
>>> Ohad,
>>>
>>> I think this means we can take
>>> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
>>> suspend/resume regression). What do you think?
>>>
>>> Thanks,
>>> A
>>>
>>
>> Ohad, nevermind.
>>
>> I think there is a bigger issue here at stake for removeable cards. If
>> you have a mounted file system,
>> the usage count for mmc_blk usage will never drop enough to remove the
>> device!
>>
>> I think the block.c for removal needs to change somewhat. Removed MDs
>> need to clear devidx and be put on an "orphan list" from where
>> they will remove themselves if usage count drops to zero. On re-probe,
>> the "orphan list" needs to be scanned for allocated devidx, and if it
>> is found, then that MD should be reused.
>> I'll see if I can put something together.
>>
>> A
>>
>
> Just to clarify, do you want a test result without
> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 or better wait for now ?
>
> FrankH.
>

I don't think there is a point. For example, for mounted media, if the
userspace unmounts the filesystem on suspend, then the device will
successfully remove.
For root fs on a removeable card, it will never tear down the mmcblk
MD, because mmc_blk_release will never be called enough times (fs is
mounted).

Is the card actually an external card for you or an eMMC? You could
just get away with unsafe resume, I suppose...

A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock.
  2011-04-04 15:47             ` Andrei Warkentin
@ 2011-04-04 16:08               ` Andrei Warkentin
  2011-04-04 20:31                 ` [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 16:08 UTC (permalink / raw)
  To: frank.hofmann, linux-mmc

On Mon, Apr 4, 2011 at 10:47 AM, Andrei Warkentin <andreiw@motorola.com> wrote:
> On Mon, Apr 4, 2011 at 10:26 AM, Frank Hofmann <frank.hofmann@tomtom.com> wrote:
>>
>>
>> On Mon, 4 Apr 2011, Andrei Warkentin wrote:
>>
>>> On Mon, Apr 4, 2011 at 8:27 AM, Andrei Warkentin <andreiw@motorola.com>
>>> wrote:
>>>>
>>>> On Mon, Apr 4, 2011 at 9:00 AM, Andrei Warkentin <andreiw@motorola.com>
>>>> wrote:
>>>>>
>>>>> This resolves the deadlock issue with suspend. There is no
>>>>> need to claim host before the remove op.
>>>>>
>>>>> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
>>>>
>>>> Frank,
>>>>
>>>> Can you try this out? I think this fixes it.
>>>>
>>>> Ohad,
>>>>
>>>> I think this means we can take
>>>> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 out (mmc: sdio: fix SDIO
>>>> suspend/resume regression). What do you think?
>>>>
>>>> Thanks,
>>>> A
>>>>
>>>
>>> Ohad, nevermind.
>>>
>>> I think there is a bigger issue here at stake for removeable cards. If
>>> you have a mounted file system,
>>> the usage count for mmc_blk usage will never drop enough to remove the
>>> device!
>>>
>>> I think the block.c for removal needs to change somewhat. Removed MDs
>>> need to clear devidx and be put on an "orphan list" from where
>>> they will remove themselves if usage count drops to zero. On re-probe,
>>> the "orphan list" needs to be scanned for allocated devidx, and if it
>>> is found, then that MD should be reused.
>>> I'll see if I can put something together.
>>>
>>> A
>>>
>>
>> Just to clarify, do you want a test result without
>> 1c8cf9c997a4a6b36e907c7ede5f048aeaab1644 or better wait for now ?
>>
>> FrankH.
>>
>
> I don't think there is a point. For example, for mounted media, if the
> userspace unmounts the filesystem on suspend, then the device will
> successfully remove.
> For root fs on a removeable card, it will never tear down the mmcblk
> MD, because mmc_blk_release will never be called enough times (fs is
> mounted).
>
> Is the card actually an external card for you or an eMMC? You could
> just get away with unsafe resume, I suppose...
>
> A
>

I don't think there is a good fix. Suspend with a removable card is
equivalent to pulling out the card with a file system mounted after
manually mounting the fs,
with the exception that all outstanding I/O is flushed. You could play
games with caching removed MDs if the use count doesn't drop to zero
and reusing them, but
the best you can achieve is a successful resume after a suspend as
long as the hardware configuration didn't change (no other cards
added).

A

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume.
  2011-04-04 16:08               ` Andrei Warkentin
@ 2011-04-04 20:31                 ` Andrei Warkentin
  2011-05-17 10:15                   ` Dong, Chuanxiao
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Warkentin @ 2011-04-04 20:31 UTC (permalink / raw)
  To: linux-mmc; +Cc: Andrei Warkentin

Is there any value to doing something like this in order to be able to suspend/resume
with a (manually, or rootfs) mounted filesystem on mmcblk?

Thoughts?

Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
---
 drivers/mmc/card/block.c |   76 +++++++++++++++++++++++++++++++++++++++++----
 drivers/mmc/core/core.c  |    3 +-
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index ee8f7a9..19eb5b6 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -53,6 +53,9 @@ MODULE_ALIAS("mmc:block");
      ((card)->ext_csd.rel_sectors)))
 
 static DEFINE_MUTEX(block_mutex);
+static DEFINE_MUTEX(orphan_mutex);
+
+struct list_head orphans = LIST_HEAD_INIT(orphans);
 
 /*
  * The defaults come from config options but can be overriden by module
@@ -77,6 +80,7 @@ struct mmc_blk_data {
 	struct gendisk	*disk;
 	struct mmc_queue queue;
 	struct list_head part;
+	struct list_head orphan;
 
 	unsigned int	usage;
 	unsigned int	read_only;
@@ -88,6 +92,7 @@ struct mmc_blk_data {
 	 * track of the current selected device partition.
 	 */
 	unsigned int	part_curr;
+	u32		raw_cid[4];
 	struct device_attribute force_ro;
 };
 
@@ -126,10 +131,12 @@ static void mmc_blk_put(struct mmc_blk_data *md)
 	mutex_lock(&open_lock);
 	md->usage--;
 	if (md->usage == 0) {
-		int devidx = mmc_get_devidx(md->disk);
-		blk_cleanup_queue(md->queue.queue);
+		mutex_lock(&orphan_mutex);
+		list_del(&md->orphan);
+		mutex_unlock(&orphan_mutex);
 
-		__clear_bit(devidx, dev_use);
+		blk_cleanup_queue(md->queue.queue);
+		__clear_bit(mmc_get_devidx(md->disk), dev_use);
 
 		put_disk(md->disk);
 		kfree(md);
@@ -718,6 +725,49 @@ static inline int mmc_blk_readonly(struct mmc_card *card)
 	       !(card->csd.cmdclass & CCC_BLOCK_WRITE);
 }
 
+static inline struct mmc_blk_data *mmc_lookup_orphan(struct mmc_card *card,
+						     struct device *parent,
+						     unsigned int part_type,
+						     sector_t size)
+{
+	int ret;
+	struct list_head *pos, *q;
+	struct mmc_blk_data *md;
+	bool found = false;
+
+	mutex_lock(&orphan_mutex);
+	list_for_each_safe(pos, q, &orphans) {
+		md = list_entry(pos, struct mmc_blk_data, orphan);
+		if (!memcmp(md->raw_cid, card->raw_cid, sizeof(md->raw_cid)) &&
+		    md->part_type == part_type) {
+			list_del(pos);
+			found = true;
+			mmc_blk_get(md->disk);
+			break;
+		}
+	}
+	mutex_unlock(&orphan_mutex);
+
+	if (!found)
+		return NULL;
+
+	ret = mmc_init_queue(&md->queue, card, &md->lock);
+	if (ret)
+		return NULL;
+
+	INIT_LIST_HEAD(&md->part);
+	md->disk->driverfs_dev = parent;
+	md->queue.issue_fn = mmc_blk_issue_rq;
+	md->queue.data = md;
+	md->disk->queue = md->queue.queue;
+	if (REL_WRITES_SUPPORTED(card))
+		blk_queue_flush(md->queue.queue, REQ_FLUSH | REQ_FUA);
+	blk_queue_logical_block_size(md->queue.queue, 512);
+	set_capacity(md->disk, size);
+	printk("set cap to %x\n", (unsigned int) get_capacity(md->disk));
+	return md;
+}
+
 static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 					      struct device *parent,
 					      sector_t size,
@@ -752,7 +802,9 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 
 	spin_lock_init(&md->lock);
 	INIT_LIST_HEAD(&md->part);
+	INIT_LIST_HEAD(&md->orphan);
 	md->usage = 1;
+	memcpy(md->raw_cid, card->raw_cid, sizeof(card->raw_cid));
 
 	ret = mmc_init_queue(&md->queue, card, &md->lock);
 	if (ret)
@@ -822,7 +874,9 @@ static struct mmc_blk_data *mmc_blk_alloc(struct mmc_card *card)
 		size = card->csd.capacity << (card->csd.read_blkbits - 9);
 	}
 
-	md = mmc_blk_alloc_req(card, &card->dev, size, false, NULL);
+	md = mmc_lookup_orphan(card, &card->dev, 0, size);
+	if (!md)
+		md = mmc_blk_alloc_req(card, &card->dev, size, false, NULL);
 	return md;
 }
 
@@ -836,8 +890,10 @@ static int mmc_blk_alloc_part(struct mmc_card *card,
 	char cap_str[10];
 	struct mmc_blk_data *part_md;
 
-	part_md = mmc_blk_alloc_req(card, disk_to_dev(md->disk), size, default_ro,
-				    subname);
+	part_md = mmc_lookup_orphan(card, disk_to_dev(md->disk), part_type, size);
+	if (!part_md)
+		part_md = mmc_blk_alloc_req(card, disk_to_dev(md->disk), size,
+					    default_ro, subname);
 	if (IS_ERR(part_md))
 		return PTR_ERR(part_md);
 	part_md->part_type = part_type;
@@ -906,6 +962,10 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md)
 
 		/* Then flush out any already in there */
 		mmc_cleanup_queue(&md->queue);
+
+		mutex_lock(&orphan_mutex);
+		list_add(&md->orphan, &orphans);
+		mutex_unlock(&orphan_mutex);
 		mmc_blk_put(md);
 	}
 }
@@ -933,8 +993,10 @@ static int mmc_add_disk(struct mmc_blk_data *md)
 	md->force_ro.attr.name = "force_ro";
 	md->force_ro.attr.mode = S_IRUGO | S_IWUSR;
 	ret = device_create_file(disk_to_dev(md->disk), &md->force_ro);
-	if (ret)
+	if (ret) {
 		del_gendisk(md->disk);
+		return ret;
+	}
 
 	return ret;
 }
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 85ef72c..87c4af7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1823,11 +1823,10 @@ int mmc_pm_notify(struct notifier_block *notify_block,
 		if (!host->bus_ops || host->bus_ops->suspend)
 			break;
 
-		mmc_claim_host(host);
-
 		if (host->bus_ops->remove)
 			host->bus_ops->remove(host);
 
+		mmc_claim_host(host);
 		mmc_detach_bus(host);
 		mmc_release_host(host);
 		host->pm_flags = 0;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume.
  2011-04-04 20:31                 ` [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume Andrei Warkentin
@ 2011-05-17 10:15                   ` Dong, Chuanxiao
  2011-05-18  8:06                     ` Andrei Warkentin
  0 siblings, 1 reply; 13+ messages in thread
From: Dong, Chuanxiao @ 2011-05-17 10:15 UTC (permalink / raw)
  To: Andrei Warkentin, linux-mmc@vger.kernel.org



> -----Original Message-----
> From: linux-mmc-owner@vger.kernel.org
> [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Andrei Warkentin
> Sent: Tuesday, April 05, 2011 4:31 AM
> To: linux-mmc@vger.kernel.org
> Cc: Andrei Warkentin
> Subject: [RFC] MMC: Request for comments attempt at dealing with removeable
> suspend/resume.
> 
> Is there any value to doing something like this in order to be able to
> suspend/resume
> with a (manually, or rootfs) mounted filesystem on mmcblk?
> 
> Thoughts?
Hi Andrei,
I also encountered the same issue with you when system is suspending. And the modification in core.c actually can fix my issue.
For the mounted root file system, I think it makes sense for such medias to use MMC_UNSAFE_RESUME. Is there some use case need to remove root file system media?

> 
> Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
> ---
>  drivers/mmc/card/block.c |   76
> +++++++++++++++++++++++++++++++++++++++++----
>  drivers/mmc/core/core.c  |    3 +-
>  2 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index ee8f7a9..19eb5b6 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
> @@ -53,6 +53,9 @@ MODULE_ALIAS("mmc:block");
>       ((card)->ext_csd.rel_sectors)))
> 
>  static DEFINE_MUTEX(block_mutex);
> +static DEFINE_MUTEX(orphan_mutex);
> +
> +struct list_head orphans = LIST_HEAD_INIT(orphans);
> 
>  /*
>   * The defaults come from config options but can be overriden by module
> @@ -77,6 +80,7 @@ struct mmc_blk_data {
>  	struct gendisk	*disk;
>  	struct mmc_queue queue;
>  	struct list_head part;
> +	struct list_head orphan;
> 
>  	unsigned int	usage;
>  	unsigned int	read_only;
> @@ -88,6 +92,7 @@ struct mmc_blk_data {
>  	 * track of the current selected device partition.
>  	 */
>  	unsigned int	part_curr;
> +	u32		raw_cid[4];
>  	struct device_attribute force_ro;
>  };
> 
> @@ -126,10 +131,12 @@ static void mmc_blk_put(struct mmc_blk_data *md)
>  	mutex_lock(&open_lock);
>  	md->usage--;
>  	if (md->usage == 0) {
> -		int devidx = mmc_get_devidx(md->disk);
> -		blk_cleanup_queue(md->queue.queue);
> +		mutex_lock(&orphan_mutex);
> +		list_del(&md->orphan);
> +		mutex_unlock(&orphan_mutex);
> 
> -		__clear_bit(devidx, dev_use);
> +		blk_cleanup_queue(md->queue.queue);
> +		__clear_bit(mmc_get_devidx(md->disk), dev_use);
> 
>  		put_disk(md->disk);
>  		kfree(md);
> @@ -718,6 +725,49 @@ static inline int mmc_blk_readonly(struct mmc_card
> *card)
>  	       !(card->csd.cmdclass & CCC_BLOCK_WRITE);
>  }
> 
> +static inline struct mmc_blk_data *mmc_lookup_orphan(struct mmc_card *card,
> +						     struct device *parent,
> +						     unsigned int part_type,
> +						     sector_t size)
> +{
> +	int ret;
> +	struct list_head *pos, *q;
> +	struct mmc_blk_data *md;
> +	bool found = false;
> +
> +	mutex_lock(&orphan_mutex);
> +	list_for_each_safe(pos, q, &orphans) {
> +		md = list_entry(pos, struct mmc_blk_data, orphan);
> +		if (!memcmp(md->raw_cid, card->raw_cid, sizeof(md->raw_cid)) &&
> +		    md->part_type == part_type) {
> +			list_del(pos);
> +			found = true;
> +			mmc_blk_get(md->disk);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&orphan_mutex);
> +
> +	if (!found)
> +		return NULL;
> +
> +	ret = mmc_init_queue(&md->queue, card, &md->lock);
> +	if (ret)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&md->part);
> +	md->disk->driverfs_dev = parent;
> +	md->queue.issue_fn = mmc_blk_issue_rq;
> +	md->queue.data = md;
> +	md->disk->queue = md->queue.queue;
> +	if (REL_WRITES_SUPPORTED(card))
> +		blk_queue_flush(md->queue.queue, REQ_FLUSH | REQ_FUA);
> +	blk_queue_logical_block_size(md->queue.queue, 512);
> +	set_capacity(md->disk, size);
> +	printk("set cap to %x\n", (unsigned int) get_capacity(md->disk));
> +	return md;
> +}
> +
>  static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
>  					      struct device *parent,
>  					      sector_t size,
> @@ -752,7 +802,9 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct
> mmc_card *card,
> 
>  	spin_lock_init(&md->lock);
>  	INIT_LIST_HEAD(&md->part);
> +	INIT_LIST_HEAD(&md->orphan);
>  	md->usage = 1;
> +	memcpy(md->raw_cid, card->raw_cid, sizeof(card->raw_cid));
> 
>  	ret = mmc_init_queue(&md->queue, card, &md->lock);
>  	if (ret)
> @@ -822,7 +874,9 @@ static struct mmc_blk_data *mmc_blk_alloc(struct
> mmc_card *card)
>  		size = card->csd.capacity << (card->csd.read_blkbits - 9);
>  	}
> 
> -	md = mmc_blk_alloc_req(card, &card->dev, size, false, NULL);
> +	md = mmc_lookup_orphan(card, &card->dev, 0, size);
> +	if (!md)
> +		md = mmc_blk_alloc_req(card, &card->dev, size, false, NULL);
>  	return md;
>  }
> 
> @@ -836,8 +890,10 @@ static int mmc_blk_alloc_part(struct mmc_card *card,
>  	char cap_str[10];
>  	struct mmc_blk_data *part_md;
> 
> -	part_md = mmc_blk_alloc_req(card, disk_to_dev(md->disk), size, default_ro,
> -				    subname);
> +	part_md = mmc_lookup_orphan(card, disk_to_dev(md->disk), part_type, size);
> +	if (!part_md)
> +		part_md = mmc_blk_alloc_req(card, disk_to_dev(md->disk), size,
> +					    default_ro, subname);
>  	if (IS_ERR(part_md))
>  		return PTR_ERR(part_md);
>  	part_md->part_type = part_type;
> @@ -906,6 +962,10 @@ static void mmc_blk_remove_req(struct mmc_blk_data
> *md)
> 
>  		/* Then flush out any already in there */
>  		mmc_cleanup_queue(&md->queue);
> +
> +		mutex_lock(&orphan_mutex);
> +		list_add(&md->orphan, &orphans);
> +		mutex_unlock(&orphan_mutex);
>  		mmc_blk_put(md);
>  	}
>  }
> @@ -933,8 +993,10 @@ static int mmc_add_disk(struct mmc_blk_data *md)
>  	md->force_ro.attr.name = "force_ro";
>  	md->force_ro.attr.mode = S_IRUGO | S_IWUSR;
>  	ret = device_create_file(disk_to_dev(md->disk), &md->force_ro);
> -	if (ret)
> +	if (ret) {
>  		del_gendisk(md->disk);
> +		return ret;
> +	}
> 
>  	return ret;
>  }
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 85ef72c..87c4af7 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -1823,11 +1823,10 @@ int mmc_pm_notify(struct notifier_block
> *notify_block,
>  		if (!host->bus_ops || host->bus_ops->suspend)
>  			break;
> 
> -		mmc_claim_host(host);
> -
>  		if (host->bus_ops->remove)
>  			host->bus_ops->remove(host);
> 
> +		mmc_claim_host(host);
>  		mmc_detach_bus(host);
>  		mmc_release_host(host);
>  		host->pm_flags = 0;
> --
> 1.7.0.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume.
  2011-05-17 10:15                   ` Dong, Chuanxiao
@ 2011-05-18  8:06                     ` Andrei Warkentin
  0 siblings, 0 replies; 13+ messages in thread
From: Andrei Warkentin @ 2011-05-18  8:06 UTC (permalink / raw)
  To: Dong, Chuanxiao; +Cc: linux-mmc@vger.kernel.org

On Tue, May 17, 2011 at 5:15 AM, Dong, Chuanxiao
<chuanxiao.dong@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: linux-mmc-owner@vger.kernel.org
>> [mailto:linux-mmc-owner@vger.kernel.org] On Behalf Of Andrei Warkentin
>> Sent: Tuesday, April 05, 2011 4:31 AM
>> To: linux-mmc@vger.kernel.org
>> Cc: Andrei Warkentin
>> Subject: [RFC] MMC: Request for comments attempt at dealing with removeable
>> suspend/resume.
>>
>> Is there any value to doing something like this in order to be able to
>> suspend/resume
>> with a (manually, or rootfs) mounted filesystem on mmcblk?
>>
>> Thoughts?
> Hi Andrei,
> I also encountered the same issue with you when system is suspending. And the modification in core.c actually can fix my issue.
> For the mounted root file system, I think it makes sense for such medias to use MMC_UNSAFE_RESUME. Is there some use case need to remove root file system media?
>

Except that if the filesystem (non-root) fails to unmount or it's a
root filesystem on a device on a host which isn't MMC_UNSAFE_RESUME,
then you will re-enumerate incorrectly on resume. On resume, fs will
be on a now "dead" device (say mmc0), while kernel will re-enumerate
card as mmc1 now since mmc0 never tore down.

Basically even if you're smart enough to avoid switching cards on
suspend/resume, this can still get to you and you will wind up with
either just a hang (if root fs on card), or weird re-enumeration and
countless I/O errors to dead device.

What above patch does is that it uses a two step process to clean up
MDs. MDs are put on an orphan list when the backing device goes away.
If they can be safely cleaned up - great, and they go away completely
when the refcount drops. Otherwise, they'll be "reconnected" to if the
*same* backing device reappears again.

A

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-05-18  8:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-30 10:52 deadlock between mmc_pm_notify() and mmcqd Frank Hofmann
2011-04-04 12:54 ` Andrei Warkentin
2011-04-04 13:11   ` Frank Hofmann
2011-04-04 14:00     ` [PATCH] MMC: fix mmc_pm_notify bus_ops->remove deadlock Andrei Warkentin
2011-04-04 13:27       ` Andrei Warkentin
2011-04-04 15:01         ` Ohad Ben-Cohen
2011-04-04 15:10           ` Andrei Warkentin
2011-04-04 15:01         ` Andrei Warkentin
     [not found]           ` <alpine.DEB.2.00.1104041624310.690@localhost6.localdomain6>
2011-04-04 15:47             ` Andrei Warkentin
2011-04-04 16:08               ` Andrei Warkentin
2011-04-04 20:31                 ` [RFC] MMC: Request for comments attempt at dealing with removeable suspend/resume Andrei Warkentin
2011-05-17 10:15                   ` Dong, Chuanxiao
2011-05-18  8:06                     ` Andrei Warkentin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).