From: Andrei Warkentin
Subject: Re: deadlock between mmc_pm_notify() and mmcqd
Date: Mon, 4 Apr 2011 07:54:11 -0500
To: frank.hofmann@tomtom.com
Cc: linux-mmc@vger.kernel.org
List-Id: linux-mmc@vger.kernel.org

On Wed, Mar 30, 2011 at 5:52 AM, Frank Hofmann wrote:
> Hi linux-mmc'ers,
>
> (have originally posted this to linux-pm, got suggested I ask here as well)
>
> I'm encountering the following deadlock:
>
> First, an "echo disk >/sys/power/state":
>
> [] (schedule+0x48c/0x50c) from [] (log_wait_commit+0xb8/0x110)
> [] (log_wait_commit+0xb8/0x110) from [] (ext3_sync_fs+0x3c/0x44)
> [] (ext3_sync_fs+0x3c/0x44) from [] (__sync_filesystem+0x50/0x60)
> [] (__sync_filesystem+0x50/0x60) from [] (fsync_bdev+0x18/0x38)
> [] (fsync_bdev+0x18/0x38) from [] (invalidate_partition+0x18/0x34)
> [] (invalidate_partition+0x18/0x34) from [] (del_gendisk+0x24/0xc0)
> [] (del_gendisk+0x24/0xc0) from [] (mmc_blk_remove+0x20/0x40)
> [] (mmc_blk_remove+0x20/0x40) from [] (mmc_bus_remove+0x18/0x20)
> [] (mmc_bus_remove+0x18/0x20)
> from [] (__device_release_driver+0x64/0xa4)
> [] (__device_release_driver+0x64/0xa4) from [] (device_release_driver+0x1c/0x28)
> [] (device_release_driver+0x1c/0x28) from [] (bus_remove_device+0x6c/0x7c)
> [] (bus_remove_device+0x6c/0x7c) from [] (device_del+0x118/0x170)
> [] (device_del+0x118/0x170) from [] (mmc_remove_card+0x50/0x64)
> [] (mmc_remove_card+0x50/0x64) from [] (mmc_sd_remove+0x24/0x30)
> [] (mmc_sd_remove+0x24/0x30) from [] (mmc_pm_notify+0x88/0xd8)
> [] (mmc_pm_notify+0x88/0xd8) from [] (notifier_call_chain+0x2c/0x70)
> [] (notifier_call_chain+0x2c/0x70) from [] (__blocking_notifier_call_chain+0x48/0x5c)
> [] (__blocking_notifier_call_chain+0x48/0x5c) from [] (blocking_notifier_call_chain+0x14/0x18)
> [] (blocking_notifier_call_chain+0x14/0x18) from [] (pm_notifier_call_chain+0x14/0x2c)
> [] (pm_notifier_call_chain+0x14/0x2c) from [] (hibernate+0x1a8/0x1d8)
> [] (hibernate+0x1a8/0x1d8) from [] (state_store+0x4c/0xe4)
> [] (state_store+0x4c/0xe4) from [] (kobj_attr_store+0x18/0x1c)
> [] (kobj_attr_store+0x18/0x1c) from [] (sysfs_write_file+0x10c/0x140)
> [] (sysfs_write_file+0x10c/0x140) from [] (vfs_write+0xac/0x154)
> [] (vfs_write+0xac/0x154) from [] (sys_write+0x3c/0x68)
> [] (sys_write+0x3c/0x68) from [] (ret_fast_syscall+0x0/0x2c)
>
> This waits for I/O. Which would be processed by mmcqd:
>
> mmcqd D c03bad2c 0 516 2 0x00000000
> [] (schedule+0x48c/0x50c) from [] (__mmc_claim_host+0xbc/0x158)
> [] (__mmc_claim_host+0xbc/0x158) from [] (mmc_blk_issue_rq+0x2c/0x728)
> [] (mmc_blk_issue_rq+0x2c/0x728) from [] (mmc_queue_thread+0xd8/0xdc)
> [] (mmc_queue_thread+0xd8/0xdc) from [] (kthread+0x80/0x88)
> [] (kthread+0x80/0x88) from [] (kernel_thread_exit+0x0/0x8)
>
> and that's waiting to claim the MMC host.
>
> Which will never happen - mmc_pm_notify(), by the point above, holds that
> host ransom already, the code does the mmc_claim_host() directly before
> calling into bus_ops->suspend().
>
>
> Is this a known problem? We're running a customized 2.6.32 kernel, so
> admittedly not the latest; mmc_pm_notify, on the other hand, is already
> newer than that, introduced via:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c2ef25fe0b847d2ae818f74758ddb0be1c27d8e;hp=7310ece86ad7da027f85a37a0638164118a5d12f
>
> in 2.6.35, and in itself not changed since.
>
> I've found https://lkml.org/lkml/2010/10/21/494 which talks about a
> regression from said change, but nothing came out of that, and the codepath
> mentioned there is mmc_suspend_host, not mmc_sd_remove.
>
> The device I'm testing this on is an OMAP3 box that boots via MMC, hence the
> root filesystem is on there and it'd be expected dirty.
>
> Removing the abovementioned commit prevents the deadlock from happening, but
> then I wonder ? Also, I've found that even if I remove the commit, MMC
> doesn't suspend cleanly (gives a DPM timeout crash a little later).
>

Hi Frank,

I was able to reproduce it on linux-next. No need for rootfs on mmcblk,
and an "echo mem > /sys/power/state" is sufficient to trigger the issue.

I am looking into it.

Thanks,
A
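
The wait cycle described in the two traces (one task holds the MMC host and then blocks on a filesystem sync, while the queue thread that would service that I/O blocks trying to claim the same host) can be modeled in user space. What follows is only a minimal Python sketch under stated assumptions: it is not kernel code and not a proposed fix. The names `simulate`, `host`, `io_done` and `release_host_before_sync` are hypothetical, and timeouts stand in for what would be an indefinite hang in the kernel.

```python
import threading


def simulate(release_host_before_sync: bool, timeout: float = 0.5) -> bool:
    """Return True if the modeled sync completes, False if it stalls."""
    host = threading.Lock()       # stands in for the claimed MMC host
    io_done = threading.Event()   # set once the queue thread has issued the I/O

    def queue_thread() -> None:
        # Models mmcqd: it has to claim the host before issuing a request.
        if host.acquire(timeout=timeout):
            try:
                io_done.set()
            finally:
                host.release()

    host.acquire()                # models the PM notifier claiming the host
    worker = threading.Thread(target=queue_thread)
    worker.start()

    if release_host_before_sync:
        host.release()            # contrast case only, not the actual kernel fix

    # Models the filesystem sync waiting for the queue thread's I/O.
    completed = io_done.wait(timeout=timeout)

    if not release_host_before_sync:
        host.release()
    worker.join()
    return completed


if __name__ == "__main__":
    print("host held across sync:", simulate(False))
    print("host released before sync:", simulate(True))
```

With the host held across the wait, the modeled sync can only time out, because the only thread able to complete the I/O needs the lock the waiter is holding. The second call exists purely to show that the stall disappears once that dependency is removed.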