From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Warren Subject: Shutdown regression due to mmc workqueue/locking changes Date: Wed, 30 Oct 2013 15:05:32 -0600 Message-ID: <5271749C.2000206@wwwdotorg.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: Sender: linux-next-owner@vger.kernel.org To: Jeremie Samuel , Chris Ball Cc: "linux-tegra@vger.kernel.org" , "linux-next@vger.kernel.org" , "linux-mmc@vger.kernel.org" List-Id: linux-tegra@vger.kernel.org When I run "shutdown -h now" in next-20131030, I get a hung task. git bisect shows that the first problem that shows up is due to 35a8fca "mmc: sdhci: Turn tuning timeout timer into delayed work", which yields the following crash immediately that poweroff happens: > [ 76.314514] BUG: scheduling while atomic: halt/1367/0x00000002 > [ 76.320614] Modules linked in: > [ 76.323802] CPU: 2 PID: 1367 Comm: halt Not tainted 3.12.0-rc2-00141-g35a8fca #147 > [ 76.331682] [] (unwind_backtrace+0x0/0xf8) from [] (show_stack+0x10/0x14) > [ 76.340501] [] (show_stack+0x10/0x14) from [] (dump_stack+0x80/0xc4) > [ 76.348860] [] (dump_stack+0x80/0xc4) from [] (__schedule_bug+0x48/0x60) > [ 76.357559] [] (__schedule_bug+0x48/0x60) from [] (__schedule+0x64c/0x6d0) > [ 76.366404] [] (__schedule+0x64c/0x6d0) from [] (schedule_timeout+0x174/0x1ec) > [ 76.375552] [] (schedule_timeout+0x174/0x1ec) from [] (wait_for_common+0xb8/0x14c) > [ 76.385074] [] (wait_for_common+0xb8/0x14c) from [] (__flush_work+0xa4/0x12c) > [ 76.394078] [] (__flush_work+0xa4/0x12c) from [] (sdhci_do_set_ios+0x90/0x6fc) > [ 76.403220] [] (sdhci_do_set_ios+0x90/0x6fc) from [] (sdhci_set_ios+0x20/0x2c) > [ 76.412380] [] (sdhci_set_ios+0x20/0x2c) from [] (mmc_power_off+0x70/0x90) > [ 76.421212] [] (mmc_power_off+0x70/0x90) from [] (_mmc_suspend+0xb4/0x274) > [ 76.430037] [] (_mmc_suspend+0xb4/0x274) from [] (mmc_bus_shutdown+0x40/0x68) > [ 76.439129] [] (mmc_bus_shutdown+0x40/0x68) from [] (device_shutdown+0x34/0x188) > [ 76.448478] [] (device_shutdown+0x34/0x188) from [] (kernel_power_off+0x2c/0x70) > [ 76.457814] [] (kernel_power_off+0x2c/0x70) from [] (SyS_reboot+0x108/0x1d4) > [ 76.466812] [] (SyS_reboot+0x108/0x1d4) from [] (ret_fast_syscall+0x0/0x30) A few commits later, 769303b "mmc: sdhci: Turn host->lock into a mutex" changes this to a hang, followed by a hung task being detected: > [ 240.870402] INFO: task kworker/1:1:70 blocked for more than 120 seconds. > [ 240.877203] Not tainted 3.12.0-rc7-next-20131030 #133 > [ 240.882886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 240.890824] kworker/1:1 D c0554ec0 0 70 2 0x00000000 > [ 240.897345] Workqueue: events sdhci_tuning_timeout_work > [ 240.902734] [] (__schedule+0x300/0x674) from [] (schedule_preempt_disabled+0x24/0x34) > [ 240.912461] [] (schedule_preempt_disabled+0x24/0x34) from [] (__mutex_lock_slowpath+0x17c/0x374) > [ 240.923146] [] (__mutex_lock_slowpath+0x17c/0x374) from [] (mutex_lock+0xc/0x24) > [ 240.932430] [] (mutex_lock+0xc/0x24) from [] (sdhci_tuning_timeout_work+0x14/0x2c) > [ 240.941890] [] (sdhci_tuning_timeout_work+0x14/0x2c) from [] (process_one_work+0xfc/0x34c) > [ 240.952033] [] (process_one_work+0xfc/0x34c) from [] (process_scheduled_works+0x24/0x34) > [ 240.961998] [] (process_scheduled_works+0x24/0x34) from [] (worker_thread+0x228/0x384) > [ 240.971787] [] (worker_thread+0x228/0x384) from [] (kthread+0xc4/0xe0) > [ 240.980160] [] (kthread+0xc4/0xe0) from [] (ret_from_fork+0x14/0x3c) > [ 240.988366] INFO: task halt:1272 blocked for more than 120 seconds. > [ 240.994724] Not tainted 3.12.0-rc7-next-20131030 #133 > [ 241.000373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 241.008273] halt D c0554ec0 0 1272 1270 0x00000000 > [ 241.014820] [] (__schedule+0x300/0x674) from [] (schedule_timeout+0x174/0x1ec) > [ 241.023919] [] (schedule_timeout+0x174/0x1ec) from [] (wait_for_common+0xb8/0x14c) > [ 241.033359] [] (wait_for_common+0xb8/0x14c) from [] (__flush_work+0xa4/0x12c) > [ 241.042362] [] (__flush_work+0xa4/0x12c) from [] (sdhci_do_set_ios+0x8c/0x6e8) > [ 241.051448] [] (sdhci_do_set_ios+0x8c/0x6e8) from [] (sdhci_set_ios+0x20/0x2c) > [ 241.060553] [] (sdhci_set_ios+0x20/0x2c) from [] (mmc_power_off+0x5c/0x7c) > [ 241.069275] [] (mmc_power_off+0x5c/0x7c) from [] (_mmc_suspend+0x194/0x2a0) > [ 241.078110] [] (_mmc_suspend+0x194/0x2a0) from [] (mmc_bus_shutdown+0x40/0x68) > [ 241.087339] [] (mmc_bus_shutdown+0x40/0x68) from [] (device_shutdown+0x34/0x188) > [ 241.096627] [] (device_shutdown+0x34/0x188) from [] (kernel_power_off+0x2c/0x70) > [ 241.105893] [] (kernel_power_off+0x2c/0x70) from [] (SyS_reboot+0x108/0x1d4) > [ 241.114814] [] (SyS_reboot+0x108/0x1d4) from [] (ret_fast_syscall+0x0/0x30) This is on the Tegra114 Dalmore board, which is an ARM system. Looking at the backtraces, I'd guess that all Tegra devices are affected, but haven't tested that. I haven't looked at the commits to see if there's anything obviously wrong, but perhaps you can suggest something to try?