* Occasional crashes in suspend-resume with MMC transactions
@ 2012-03-22 4:23 Bedia, Vaibhav
2012-03-22 13:20 ` Bedia, Vaibhav
0 siblings, 1 reply; 8+ messages in thread
From: Bedia, Vaibhav @ 2012-03-22 4:23 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
I am trying to do suspend-resume test with a file copy on MMC/SD going on
in the background. The test involves simply copying a 450MB file on an ext3
partition to the same partition under a different name.
This is on an AM335x board which uses the omap_hsmmc driver.
The kernel is v3.2 and I have also applied the following patch
commit 6b9dc6b5f2a1b7953e46e897eb3e91e885322ed0
Author: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Date: Wed Jan 4 15:28:45 2012 +0100
mmc: fix a deadlock between system suspend and MMC block IO
Performing MMC block IO with simultaneous STR can lead to a deadlock: the
mmc_pm_notify() function claims the host and then calls bus .remove()
method, which lands in mmc_blk_remove(), which calls mmc_blk_remove_req()
then it goes to -> mmc_cleanup_queue() -> kthread_stop(), which waits for
the mmc-block thread to stop. If the mmc-block thread at that time is
processing block requests, it will also try to claim the host in
mmc_blk_issue_rq() and block there. This patch fixes the problem by
calling .remove() before claiming the host.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
In some iterations the resume part gets into issues with the following dump
...
[ 229.716033] PM: Syncing filesystems ... done.
[ 281.389129] Freezing user space processes ... (elapsed 0.95 seconds) done.
[ 282.357757] Freezing remaining freezable tasks ... (elapsed 4.36 seconds) done.
[ 286.764099] PM: suspend of devices complete after 29.815 msecs
[ 286.765777] PM: late suspend of devices complete after 1.647 msecs
( resume using a uart keystroke)
[ 290.121765] GFX domain entered low power stat
[ 290.123748] Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa06022c
[ 290.123809] Internal error: : 1028 [#1]
[ 290.123840] Modules linked in:
[ 290.123870] CPU: 0 Not tainted (3.2.0-12280-g6b9dc6b-dirty #12)
[ 290.123931] PC is at omap_hsmmc_request+0xcc/0x4d8
[ 290.123962] LR is at mmc_start_request+0x110/0x20c
[ 290.123992] pc : [<c030baec>] lr : [<c02fbff8>] psr: a0000113
[ 290.124023] sp : cf373e58 ip : cf373ea0 fp : cf373e9c
[ 290.124053] r10: 00000000 r9 : ce55e218 r8 : cfa6202c
[ 290.124053] r7 : cfa620dc r6 : cf3b6000 r5 : cfa6202c r4 : cf3b6280
[ 290.124084] r3 : fa060100 r2 : fa060100 r1 : 00000080 r0 : cf3b6000
[ 290.124145] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 290.124176] Control: 10c5387d Table: 8edf4019 DAC: 00000015
[ 290.124206] Process mmcqd/0 (pid: 666, stack limit = 0xcf3722f0)
[ 290.124237] Stack: (0xcf373e58 to 0xcf374000)
[ 290.124237] 3e40: cf373e9c cf373e68
[ 290.124298] 3e60: c001991c c01c7898 cf362c40 00000001 00002000 cfa6202c cf3b6000 c0616550
[ 290.124359] 3e80: cfa620dc cfa6202c ce55e218 00000000 cf373ed4 cf373ea0 c02fbff8 c030ba2c
[ 290.124389] 3ea0: 00000001 00000000 cfa6202c 00000000 cf373ed4 cf3b6000 cfa6211c 00000000
[ 290.124420] 3ec0: cf373f14 cfa6202c cf373efc cf373ed8 c02fce8c c02fbef4 00000000 cfa62400
[ 290.124481] 3ee0: cfa62004 cf372000 cfa62400 cfa62000 cf373f44 cf373f00 c03089fc c02fcd88
[ 290.124542] 3f00: cf373f54 00000000 cfa6223c c312e8d8 00100100 00200200 c02fc47c cfa62400
[ 290.124572] 3f20: cfa62004 cf372000 cfa62000 c312e8d8 ce55e218 cfa62000 cf373f8c cf373f48
[ 290.124633] 3f40: c0308e5c c0308998 00000000 c312e8d8 cf373f74 cf373f60 c01ab760 c01af878
[ 290.124664] 3f60: c312e8d8 cfa62004 00000000 cf372000 00000001 cf3a8ae8 ce55e218 cfa6200c
[ 290.124725] 3f80: cf373fbc cf373f90 c03095f8 c0308cd0 00000000 cf835cb0 cfa62004 c030959c
[ 290.124755] 3fa0: 00000013 00000000 00000000 00000000 cf373ff4 cf373fc0 c0055a30 c03095a8
[ 290.124786] 3fc0: cf835cb0 00000000 cfa62004 00000000 cf373fd0 cf373fd0 00000000 cf835cb0
[ 290.124847] 3fe0: c00559a0 c004018c 00000000 cf373ff8 c004018c c00559ac f52322cc f291067e
[ 290.124877] Backtrace:
[ 290.124938] [<c030ba20>] (omap_hsmmc_request+0x0/0x4d8) from [<c02fbff8>] (mmc_start_request+0x110/0x20c)
[ 290.125000] [<c02fbee8>] (mmc_start_request+0x0/0x20c) from [<c02fce8c>] (mmc_start_req+0x110/0x160)
[ 290.125030] r8:cfa6202c r7:cf373f14 r6:00000000 r5:cfa6211c r4:cf3b6000
[ 290.125091] [<c02fcd7c>] (mmc_start_req+0x0/0x160) from [<c03089fc>] (mmc_blk_issue_rw_rq+0x70/0x338)
[ 290.125122] r8:cfa62000 r7:cfa62400 r6:cf372000 r5:cfa62004 r4:cfa62400
[ 290.125183] r3:00000000
[ 290.125213] [<c030898c>] (mmc_blk_issue_rw_rq+0x0/0x338) from [<c0308e5c>] (mmc_blk_issue_rq+0x198/0x460)
[ 290.125274] [<c0308cc4>] (mmc_blk_issue_rq+0x0/0x460) from [<c03095f8>] (mmc_queue_thread+0x5c/0xf0)
[ 290.125335] [<c030959c>] (mmc_queue_thread+0x0/0xf0) from [<c0055a30>] (kthread+0x90/0x94)
[ 290.125396] [<c00559a0>] (kthread+0x0/0x94) from [<c004018c>] (do_exit+0x0/0x67c)
[ 290.125427] r6:c004018c r5:c00559a0 r4:cf835cb0
[ 290.125457] Code: e5933008 e1833801 e5823104 e5943034 (e593212c)
[ 290.125488] ---[ end trace 82f48b7910402960 ]---
[ 290.329986] PM: early resume of devices complete after 207.732 msecs
[ 290.465484] CPSW phy found : id is : 0x4dd074
[ 290.467193] PHY 0:01 not found
[ 290.469512] PM: resume of devices complete after 139.156 msecs
[ 290.921264] Successfully transitioned all domains to low power state
[ 290.928375] PM: Finishing wakeup.
[ 290.932250] Restarting tasks ...
[ 290.960052] done.
[ 293.461608] PHY: 0:00 - Link is Up - 100/Full
Whenever this happens, the file copy doesn't continue after resume.
A snapshot of the blocked tasks gives the following info
root at arago-armv7:/media/mmcblk0p3# echo w > /proc/sysrq-trigger
[ 364.364807] SysRq : Show Blocked State
[ 364.368743] task PC stack pid father
[ 364.374206] kworker/u:0 D c0411548 0 5 2 0x00000000
[ 364.380859] Backtrace:
[ 364.383422] [<c0411388>] (__schedule+0x0/0x394) from [<c0411810>] (schedule+0x50/0x68)
[ 364.391693] [<c04117c0>] (schedule+0x0/0x68) from [<c02fc604>] (__mmc_claim_host+0x138/0x144)
[ 364.400604] [<c02fc4cc>] (__mmc_claim_host+0x0/0x144) from [<c030184c>] (mmc_sd_detect+0x28/0x80)
[ 364.409881] [<c0301824>] (mmc_sd_detect+0x0/0x80) from [<c02fdff0>] (mmc_rescan+0x120/0x250)
[ 364.418670] r5:cf8d2e00 r4:cf3b6000
[ 364.422424] [<c02fded0>] (mmc_rescan+0x0/0x250) from [<c004fba0>] (process_one_work+0x124/0x384)
[ 364.431579] r6:cf834000 r5:cf8d2e00 r4:cf817f00 r3:60000013
[ 364.437500] [<c004fa7c>] (process_one_work+0x0/0x384) from [<c0051a18>] (worker_thread+0x15c/0x330)
[ 364.446960] [<c00518bc>] (worker_thread+0x0/0x330) from [<c0055a30>] (kthread+0x90/0x94)
[ 364.455413] [<c00559a0>] (kthread+0x0/0x94) from [<c004018c>] (do_exit+0x0/0x67c)
[ 364.463226] r6:c004018c r5:c00559a0 r4:cf81dee0
[ 364.468078] kjournald D c0411548 0 1747 2 0x00000000
[ 364.474700] Backtrace:
[ 364.477264] [<c0411388>] (__schedule+0x0/0x394) from [<c0411810>] (schedule+0x50/0x68)
[ 364.485534] [<c04117c0>] (schedule+0x0/0x68) from [<c0411890>] (io_schedule+0x68/0x90)
[ 364.493804] [<c0411828>] (io_schedule+0x0/0x90) from [<c01aabc4>] (get_request_wait+0xcc/0x170)
[ 364.502868] r6:cf3a8ae8 r5:cfb53d9c r4:cfb53da8 r3:cede3c00
[ 364.508789] [<c01aaaf8>] (get_request_wait+0x0/0x170) from [<c01abde8>] (blk_queue_bio+0x4c/0x298)
[ 364.518157] [<c01abd9c>] (blk_queue_bio+0x0/0x298) from [<c01a9c64>] (generic_make_request+0xb4/0xd4)
[ 364.527770] [<c01a9bb0>] (generic_make_request+0x0/0xd4) from [<c01a9d08>] (submit_bio+0x84/0xf8)
[ 364.537017] r6:00000211 r5:00000008 r4:c3538240
[ 364.541870] [<c01a9c84>] (submit_bio+0x0/0xf8) from [<c00cd198>] (submit_bh+0x13c/0x16c)
[ 364.550323] [<c00cd05c>] (submit_bh+0x0/0x16c) from [<c011ecb0>] (journal_do_submit_data.clone.23+0x3c/0x48)
[ 364.560577] r7:00000200 r6:c00d0c68 r5:cee2ee80 r4:000001a0
[ 364.566497] [<c011ec74>] (journal_do_submit_data.clone.23+0x0/0x48) from [<c011fab0>] (journal_commit_transaction+0xdf4/0xf1c)
[ 364.578369] r7:c600cec0 r6:00000200 r5:c600ce88 r4:cec053c0
[ 364.584289] [<c011ecbc>] (journal_commit_transaction+0x0/0xf1c) from [<c0121e54>] (kjournald+0xc4/0x1d8)
[ 364.594207] [<c0121d90>] (kjournald+0x0/0x1d8) from [<c0055a30>] (kthread+0x90/0x94)
[ 364.602264] [<c00559a0>] (kthread+0x0/0x94) from [<c004018c>] (do_exit+0x0/0x67c)
[ 364.610076] r6:c004018c r5:c00559a0 r4:cedfbcd8
[ 364.614929] cp D c0411548 0 1915 1877 0x00000000
[ 364.621551] Backtrace:
[ 364.624114] [<c0411388>] (__schedule+0x0/0x394) from [<c0411810>] (schedule+0x50/0x68)
[ 364.632354] [<c04117c0>] (schedule+0x0/0x68) from [<c0411890>] (io_schedule+0x68/0x90)
[ 364.640625] [<c0411828>] (io_schedule+0x0/0x90) from [<c01aabc4>] (get_request_wait+0xcc/0x170)
[ 364.649719] r6:cf3a8ae8 r5:c17aba84 r4:c17aba90 r3:cede2600
[ 364.655639] [<c01aaaf8>] (get_request_wait+0x0/0x170) from [<c01abde8>] (blk_queue_bio+0x4c/0x298)
[ 364.664978] [<c01abd9c>] (blk_queue_bio+0x0/0x298) from [<c01a9c64>] (generic_make_request+0xb4/0xd4)
[ 364.674591] [<c01a9bb0>] (generic_make_request+0x0/0xd4) from [<c01a9d08>] (submit_bio+0x84/0xf8)
[ 364.683868] r6:00000001 r5:00000080 r4:cfac3a40
[ 364.688690] [<c01a9c84>] (submit_bio+0x0/0xf8) from [<c00d9870>] (do_mpage_readpage+0x468/0x774)
[ 364.697845] [<c00d9408>] (do_mpage_readpage+0x0/0x774) from [<c00d9cec>] (mpage_readpages+0xec/0x148)
[ 364.707489] [<c00d9c00>] (mpage_readpages+0x0/0x148) from [<c01060a0>] (ext3_readpages+0x24/0x28)
[ 364.716766] [<c010607c>] (ext3_readpages+0x0/0x28) from [<c0083448>] (__do_page_cache_readahead+0x198/0x27c)
[ 364.727020] [<c00832b0>] (__do_page_cache_readahead+0x0/0x27c) from [<c00837f8>] (ra_submit+0x28/0x30)
[ 364.736724] [<c00837d0>] (ra_submit+0x0/0x30) from [<c0083908>] (ondemand_readahead+0x108/0x1e8)
[ 364.745910] [<c0083800>] (ondemand_readahead+0x0/0x1e8) from [<c0083ae0>] (page_cache_sync_readahead+0x4c/0x68)
[ 364.756439] [<c0083a94>] (page_cache_sync_readahead+0x0/0x68) from [<c007be18>] (generic_file_aio_read+0x42c/0x770)
[ 364.767333] r5:cf576888 r4:00000000
[ 364.771057] [<c007b9ec>] (generic_file_aio_read+0x0/0x770) from [<c00a7694>] (do_sync_read+0xac/0xec)
[ 364.780700] [<c00a75e8>] (do_sync_read+0x0/0xec) from [<c00a7e20>] (vfs_read+0xb8/0x14c)
[ 364.789123] r8:00001000 r7:c17abf70 r6:bea1da58 r5:00001000 r4:ced3a740
[ 364.796142] [<c00a7d68>] (vfs_read+0x0/0x14c) from [<c00a7ef8>] (sys_read+0x44/0x74)
[ 364.804229] r8:00001000 r7:bea1da58 r6:ced3a740 r5:00000000 r4:09adc000
[ 364.811248] [<c00a7eb4>] (sys_read+0x0/0x74) from [<c0014200>] (ret_fast_syscall+0x0/0x30)
[ 364.819854] r8:c00143a8 r7:00000003 r6:bea1da58 r5:00000003 r4:00001000
Debugging a bit further I found that runtime resume of hsmmc does not take place in the
failed iterations. Since this doesn't happen in every iteration I would hazard a guess
that there's some sort of race condition between the MMC core resuming and the
pm_runtime_get_sync() call.
Any pointers on what could be going wrong here and how to debug this further.
Regards,
Vaibhav
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 4:23 Occasional crashes in suspend-resume with MMC transactions Bedia, Vaibhav
@ 2012-03-22 13:20 ` Bedia, Vaibhav
2012-03-22 14:27 ` S, Venkatraman
0 siblings, 1 reply; 8+ messages in thread
From: Bedia, Vaibhav @ 2012-03-22 13:20 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 09:53:23, Bedia, Vaibhav wrote:
> Hi,
>
> I am trying to do suspend-resume test with a file copy on MMC/SD going on
> in the background. The test involves simply copying a 450MB file on an ext3
> partition to the same partition under a different name.
>
> This is on an AM335x board which uses the omap_hsmmc driver.
> The kernel is v3.2 and I have also applied the following patch
>
[...]
I found that whenever this issue crops up, mmc_host_suspend() is not able to claim the host
And returns -EBUSY. omap_hsmmc driver does not pass on this error code to the PM core and
hence the suspend process continues. When the driver is made to return -EBUSY, the suspend
process gets aborted and the user can try suspending again. I am not sure whether this sort
of suspend failure is acceptable or the driver is doing something wrong. The following
workaround is what I came up with. Do this look a reasonable thing to do?
---
>From a4040dd1869b351a5fa29dacd08facf6e24df609 Mon Sep 17 00:00:00 2001
From: Vaibhav Bedia <vaibhav.bedia@ti.com>
Date: Thu, 22 Mar 2012 17:14:49 +0530
Subject: [PATCH 1/1] mmc: omap_hsmmc: Pass on the suspend failure to the PM core
In some cases mmc_host_suspend() is not able to claim the
host and proceed with the suspend process. The core returns
-EBUSY to the host controller driver. Unfortunately, the
host controller driver does not pass on this information
to the PM core and hence the system suspend process continues.
In these cases the MMC core gets to an unexpected state
during resume and multiple issues related to MMC crop up.
1. Host controller driver starts accessing the device registers
before the clocks are enabled which leads to a prefetch abort.
2. A file copy thread which was launched before suspend gets
stuck due to the host not being reclaimed during resume.
To avoid such problems pass on the -EBUSY status to the PM core
from the host controller driver. With this change, MMC core
suspend might still fail but it does not end up making the
system unusable. Suspend gets aborted and the user can try
suspending the system again.
Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com>
---
drivers/mmc/host/omap_hsmmc.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index 3d8dbbb..1f938d9 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -2238,6 +2238,7 @@ static int omap_hsmmc_suspend(struct device *dev)
dev_dbg(mmc_dev(host->mmc),
"Unmask interrupt failed\n");
}
+ ret = -EBUSY;
goto err;
}
--
1.7.0.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 13:20 ` Bedia, Vaibhav
@ 2012-03-22 14:27 ` S, Venkatraman
2012-03-22 14:43 ` Bedia, Vaibhav
2012-03-22 14:49 ` Hebbar, Gururaja
0 siblings, 2 replies; 8+ messages in thread
From: S, Venkatraman @ 2012-03-22 14:27 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 6:50 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
> On Thu, Mar 22, 2012 at 09:53:23, Bedia, Vaibhav wrote:
>> Hi,
>>
>> I am trying to do suspend-resume test with a file copy on MMC/SD going on
>> in the background. The test involves simply copying a 450MB file on an ext3
>> partition to the same partition under a different name.
>>
>> This is on an AM335x board which uses the omap_hsmmc driver.
>> The kernel is v3.2 and I have also applied the following patch
>>
>
> [...]
>
> I found that whenever this issue crops up, mmc_host_suspend() is not able to claim the host
> And returns -EBUSY. omap_hsmmc driver does not pass on this error code to the PM core and
> hence the suspend process continues. When the driver is made to return -EBUSY, the suspend
> process gets aborted and the user can try suspending again. I am not sure whether this sort
> of suspend failure is acceptable or the driver is doing something wrong. The following
> workaround is what I came up with. Do this look a reasonable thing to do?
>
> ---
>
> From a4040dd1869b351a5fa29dacd08facf6e24df609 Mon Sep 17 00:00:00 2001
> From: Vaibhav Bedia <vaibhav.bedia@ti.com>
> Date: Thu, 22 Mar 2012 17:14:49 +0530
> Subject: [PATCH 1/1] mmc: omap_hsmmc: Pass on the suspend failure to the PM core
>
> In some cases mmc_host_suspend() is not able to claim the
> host and proceed with the suspend process. The core returns
> -EBUSY to the host controller driver. Unfortunately, the
> host controller driver does not pass on this information
> to the PM core and hence the system suspend process continues.
>
> In these cases the MMC core gets to an unexpected state
> during resume and multiple issues related to MMC crop up.
> 1. Host controller driver starts accessing the device registers
> before the clocks are enabled which leads to a prefetch abort.
> 2. A file copy thread which was launched before suspend gets
> stuck due to the host not being reclaimed during resume.
>
> To avoid such problems pass on the -EBUSY status to the PM core
> from the host controller driver. With this change, MMC core
> suspend might still fail but it does not end up making the
> system unusable. Suspend gets aborted and the user can try
> suspending the system again.
>
> Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com>
> ---
> ?drivers/mmc/host/omap_hsmmc.c | ? ?1 +
> ?1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
> index 3d8dbbb..1f938d9 100644
> --- a/drivers/mmc/host/omap_hsmmc.c
> +++ b/drivers/mmc/host/omap_hsmmc.c
> @@ -2238,6 +2238,7 @@ static int omap_hsmmc_suspend(struct device *dev)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?dev_dbg(mmc_dev(host->mmc),
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Unmask interrupt failed\n");
> ? ? ? ? ? ? ? ? ? ? ? ?}
> + ? ? ? ? ? ? ? ? ? ? ? ret = -EBUSY;
> ? ? ? ? ? ? ? ? ? ? ? ?goto err;
> ? ? ? ? ? ? ? ?}
>
> --
> 1.7.0.4
I see (in 3.3) that the host controller driver does a "return ret" and
that means the errors is propagated.
Where is the return code lost /overridden ?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 14:27 ` S, Venkatraman
@ 2012-03-22 14:43 ` Bedia, Vaibhav
2012-03-22 16:06 ` S, Venkatraman
2012-03-22 14:49 ` Hebbar, Gururaja
1 sibling, 1 reply; 8+ messages in thread
From: Bedia, Vaibhav @ 2012-03-22 14:43 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 19:57:16, S, Venkatraman wrote:
[...]
>
> I see (in 3.3) that the host controller driver does a "return ret" and
> that means the errors is propagated.
> Where is the return code lost /overridden ?
>
The return code gets overridden due to the call to host->pdata->resume()
which always returns 0.
static int omap_hsmmc_resume_cdirq(struct device *dev, int slot)
{
struct omap_mmc_platform_data *mmc = dev->platform_data;
enable_irq(mmc->slots[0].card_detect_irq);
return 0;
}
Regards,
Vaibhav
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 14:27 ` S, Venkatraman
2012-03-22 14:43 ` Bedia, Vaibhav
@ 2012-03-22 14:49 ` Hebbar, Gururaja
1 sibling, 0 replies; 8+ messages in thread
From: Hebbar, Gururaja @ 2012-03-22 14:49 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 19:57:16, S, Venkatraman wrote:
> On Thu, Mar 22, 2012 at 6:50 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
> > On Thu, Mar 22, 2012 at 09:53:23, Bedia, Vaibhav wrote:
> >> Hi,
> >>
> >> I am trying to do suspend-resume test with a file copy on MMC/SD going on
> >> in the background. The test involves simply copying a 450MB file on an ext3
> >> partition to the same partition under a different name.
> >>
> >> This is on an AM335x board which uses the omap_hsmmc driver.
> >> The kernel is v3.2 and I have also applied the following patch
> >>
> >
> > [...]
> >
> > I found that whenever this issue crops up, mmc_host_suspend() is not able to claim the host
> > And returns -EBUSY. omap_hsmmc driver does not pass on this error code to the PM core and
> > hence the suspend process continues. When the driver is made to return -EBUSY, the suspend
> > process gets aborted and the user can try suspending again. I am not sure whether this sort
> > of suspend failure is acceptable or the driver is doing something wrong. The following
> > workaround is what I came up with. Do this look a reasonable thing to do?
> >
> > ---
> >
> > From a4040dd1869b351a5fa29dacd08facf6e24df609 Mon Sep 17 00:00:00 2001
> > From: Vaibhav Bedia <vaibhav.bedia@ti.com>
> > Date: Thu, 22 Mar 2012 17:14:49 +0530
> > Subject: [PATCH 1/1] mmc: omap_hsmmc: Pass on the suspend failure to the PM core
> >
> > In some cases mmc_host_suspend() is not able to claim the
> > host and proceed with the suspend process. The core returns
> > -EBUSY to the host controller driver. Unfortunately, the
> > host controller driver does not pass on this information
> > to the PM core and hence the system suspend process continues.
> >
> > In these cases the MMC core gets to an unexpected state
> > during resume and multiple issues related to MMC crop up.
> > 1. Host controller driver starts accessing the device registers
> > before the clocks are enabled which leads to a prefetch abort.
> > 2. A file copy thread which was launched before suspend gets
> > stuck due to the host not being reclaimed during resume.
> >
> > To avoid such problems pass on the -EBUSY status to the PM core
> > from the host controller driver. With this change, MMC core
> > suspend might still fail but it does not end up making the
> > system unusable. Suspend gets aborted and the user can try
> > suspending the system again.
> >
> > Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com>
> > ---
> > ?drivers/mmc/host/omap_hsmmc.c | ? ?1 +
> > ?1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
> > index 3d8dbbb..1f938d9 100644
> > --- a/drivers/mmc/host/omap_hsmmc.c
> > +++ b/drivers/mmc/host/omap_hsmmc.c
> > @@ -2238,6 +2238,7 @@ static int omap_hsmmc_suspend(struct device *dev)
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?dev_dbg(mmc_dev(host->mmc),
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Unmask interrupt failed\n");
> > ? ? ? ? ? ? ? ? ? ? ? ?}
> > + ? ? ? ? ? ? ? ? ? ? ? ret = -EBUSY;
> > ? ? ? ? ? ? ? ? ? ? ? ?goto err;
> > ? ? ? ? ? ? ? ?}
> >
> > --
> > 1.7.0.4
>
> I see (in 3.3) that the host controller driver does a "return ret" and
> that means the errors is propagated.
> Where is the return code lost /overridden ?
As seen from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/mmc/host/omap_hsmmc.c;h=fd0c661bbad38a7152a37955146ff6068d4d7137;hb=refs/tags/v3.3
omap_hsmmc.c
@ 2045
pdata->resume = omap_hsmmc_resume_cdirq;
@227
static int omap_hsmmc_resume_cdirq(struct device *dev, int slot)
{
struct omap_mmc_platform_data *mmc = dev->platform_data;
enable_irq(mmc->slots[0].card_detect_irq);
return 0; -----> always return 0
}
@2167
if (host->pdata->resume) {
ret = host->pdata->resume(&pdev->dev, --> ret becomes zero irrespective of its previous value.
host->slot_id);
if (ret)
dev_dbg(mmc_dev(host->mmc),
"Unmask interrupt failed\n");
}
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Regards,
Gururaja
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 14:43 ` Bedia, Vaibhav
@ 2012-03-22 16:06 ` S, Venkatraman
2012-03-22 16:13 ` Bedia, Vaibhav
0 siblings, 1 reply; 8+ messages in thread
From: S, Venkatraman @ 2012-03-22 16:06 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 8:13 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
> On Thu, Mar 22, 2012 at 19:57:16, S, Venkatraman wrote:
> [...]
>>
>> I see (in 3.3) that the host controller driver does a "return ret" and
>> that means the errors is propagated.
>> Where is the return code lost /overridden ?
>>
>
> The return code gets overridden due to the call to host->pdata->resume()
> which always returns 0.
Thanks - I see what you mean. But the patch is clunky.
A clean fix would be to not capture the return code of resume() in
"ret" and let the old
value of ret be propagated as is.
>
> static int omap_hsmmc_resume_cdirq(struct device *dev, int slot)
> {
> ? ? ? ?struct omap_mmc_platform_data *mmc = dev->platform_data;
>
> ? ? ? ?enable_irq(mmc->slots[0].card_detect_irq);
> ? ? ? ?return 0;
> }
>
> Regards,
> Vaibhav
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 16:06 ` S, Venkatraman
@ 2012-03-22 16:13 ` Bedia, Vaibhav
2012-03-26 6:49 ` S, Venkatraman
0 siblings, 1 reply; 8+ messages in thread
From: Bedia, Vaibhav @ 2012-03-22 16:13 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 21:36:36, S, Venkatraman wrote:
> On Thu, Mar 22, 2012 at 8:13 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
> > On Thu, Mar 22, 2012 at 19:57:16, S, Venkatraman wrote:
> > [...]
> >>
> >> I see (in 3.3) that the host controller driver does a "return ret" and
> >> that means the errors is propagated.
> >> Where is the return code lost /overridden ?
> >>
> >
> > The return code gets overridden due to the call to host->pdata->resume()
> > which always returns 0.
>
> Thanks - I see what you mean. But the patch is clunky.
> A clean fix would be to not capture the return code of resume() in
> "ret" and let the old
> value of ret be propagated as is.
>
I agree. However, right now there's a dev_dbg() which checks for the return code.
I could change the host->pdata->resume() and host->pdata->suspend() to not return
anything if that's acceptable.
More importantly, is this the right fix or does the driver need some other fix
to make sure that such suspend failures do not occur at all?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Occasional crashes in suspend-resume with MMC transactions
2012-03-22 16:13 ` Bedia, Vaibhav
@ 2012-03-26 6:49 ` S, Venkatraman
0 siblings, 0 replies; 8+ messages in thread
From: S, Venkatraman @ 2012-03-26 6:49 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Mar 22, 2012 at 9:43 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
> On Thu, Mar 22, 2012 at 21:36:36, S, Venkatraman wrote:
>> On Thu, Mar 22, 2012 at 8:13 PM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote:
>> > On Thu, Mar 22, 2012 at 19:57:16, S, Venkatraman wrote:
>> > [...]
>> >>
>> >> I see (in 3.3) that the host controller driver does a "return ret" and
>> >> that means the errors is propagated.
>> >> Where is the return code lost /overridden ?
>> >>
>> >
>> > The return code gets overridden due to the call to host->pdata->resume()
>> > which always returns 0.
>>
>> Thanks - I see what you mean. But the patch is clunky.
>> A clean fix would be to not capture the return code of resume() in
>> "ret" and let the old
>> value of ret be propagated as is.
>>
>
> I agree. However, right now there's a dev_dbg() which checks for the return code.
> I could change the host->pdata->resume() and host->pdata->suspend() to not return
> anything if that's acceptable.
>
That's a good idea.
> More importantly, is this the right fix or does the driver need some other fix
> to make sure that such suspend failures do not occur at all?
I will look into this - but haven't seen such failures here to know
what's behind the symptom.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-03-26 6:49 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-22 4:23 Occasional crashes in suspend-resume with MMC transactions Bedia, Vaibhav
2012-03-22 13:20 ` Bedia, Vaibhav
2012-03-22 14:27 ` S, Venkatraman
2012-03-22 14:43 ` Bedia, Vaibhav
2012-03-22 16:06 ` S, Venkatraman
2012-03-22 16:13 ` Bedia, Vaibhav
2012-03-26 6:49 ` S, Venkatraman
2012-03-22 14:49 ` Hebbar, Gururaja
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).