From mboxrd@z Thu Jan 1 00:00:00 1970
From: l.stach@pengutronix.de (Lucas Stach)
Date: Tue, 15 Oct 2013 08:38:24 +0200
Subject: BUG: imx6: scheduling-while-atomic bug detected with commit: 8ba9580
In-Reply-To: <525CE0B2.7040209@de.bosch.com>
References: <857E9EDCA6C0904DB3357321AA9123EB625007EA@NA-MBX-01.mgc.mentorg.com> <525CE0B2.7040209@de.bosch.com>
Message-ID: <1381819104.4243.3.camel@weser.hi.pengutronix.de>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tuesday, 15.10.2013, 08:29 +0200, Dirk Behme wrote:
> Hi Shawn and Lucas,
>
> On 09.10.2013 08:10, Wang, Jiada (ESD) wrote:
> > Hi Shawn and Lucas
> >
> > The following scheduling-while-atomic bug was detected in our kernel as well as v3.11.0-rc5 vanilla
> >
> > [ 3.421817] BUG: scheduling while atomic: kworker/u:0/6/0x00000002
> > [ 3.421825] 4 locks held by kworker/u:0/6:
> > [ 3.421857] #0:  (events_unbound){.+.+.+}, at: [<80042028>] process_one_work+0x20c/0x668
> > [ 3.421878] #1:  ((&entry->work)){+.+.+.}, at: [<80042028>] process_one_work+0x20c/0x668
> > [ 3.421901] #2:  (&(&host->lock)->rlock){......}, at: [<802fe084>] sdhci_do_set_ios+0x24/0x6a8
> > [ 3.421923] #3:  (prepare_lock){+.+...}, at: [<803116f8>] clk_get_rate+0x1c/0x5c
> > [ 3.421932] Modules linked in:
> > [ 3.421936] irq event stamp: 640
> > [ 3.421948] hardirqs last enabled at (639): [<800412a4>] __cancel_work_timer+0x94/0xd0
> > [ 3.421964] hardirqs last disabled at (640): [<8041579c>] _raw_spin_lock_irqsave+0x24/0x60
> > [ 3.421980] softirqs last enabled at (378): [<8002abc0>] __do_softirq+0x368/0x3f8
> > [ 3.421992] softirqs last disabled at (373): [<8002b00c>] irq_exit+0x5c/0xb0
> > [ 3.421998] Backtrace:
> > [ 3.422025] [<80011aa0>] (dump_backtrace+0x0/0x10c) from [<8040dbd4>] (dump_stack+0x18/0x1c)
> > [ 3.422044]  r6:b40bfbf8 r5:81392b80 r4:b40a2e40 r3:b40a2e40
> > [ 3.422058] [<8040dbbc>] (dump_stack+0x0/0x1c) from [<8040e49c>] (__schedule_bug+0x64/0x78)
> > [ 3.422072] [<8040e438>] (__schedule_bug+0x0/0x78) from [<8041424c>] (__schedule+0x68/0x7f8)
> > [ 3.422083]  r4:b40a2e40 r3:00000002
> > [ 3.422094] [<804141e4>] (__schedule+0x0/0x7f8) from [<80414b6c>] (schedule+0x84/0x88)
> > [ 3.422106] [<80414ae8>] (schedule+0x0/0x88) from [<80414e38>] (schedule_preempt_disabled+0x18/0x24)
> > [ 3.422120] [<80414e20>] (schedule_preempt_disabled+0x0/0x24) from [<80413478>] (mutex_lock_nested+0x220/0x3c0)
> > [ 3.422134] [<80413258>] (mutex_lock_nested+0x0/0x3c0) from [<803116f8>] (clk_get_rate+0x1c/0x5c)
> > [ 3.422148] [<803116dc>] (clk_get_rate+0x0/0x5c) from [<80302a74>] (esdhc_pltfm_set_clock+0x1c/0x200)
> > [ 3.422158]  r4:b405e400 r3:80302a58
> > [ 3.422171] [<80302a58>] (esdhc_pltfm_set_clock+0x0/0x200) from [<802fd3d4>] (sdhci_set_clock+0x4c/0x31c)
> > [ 3.422183] [<802fd388>] (sdhci_set_clock+0x0/0x31c) from [<802fe174>] (sdhci_do_set_ios+0x114/0x6a8)
> > [ 3.422194] [<802fe060>] (sdhci_do_set_ios+0x0/0x6a8) from [<802fe730>] (sdhci_set_ios+0x28/0x34)
> > [ 3.422213] [<802fe708>] (sdhci_set_ios+0x0/0x34) from [<802efa6c>] (mmc_power_up+0xb8/0xc4)
> > [ 3.422224]  r5:b405e2ac r4:b405e000
> > [ 3.422239] [<802ef9b4>] (mmc_power_up+0x0/0xc4) from [<802f06b4>] (mmc_start_host+0x30/0x44)
> > [ 3.422255]  r7:b405e4b4 r6:00000000 r5:00000000 r4:b405e000
> > [ 3.422267] [<802f0684>] (mmc_start_host+0x0/0x44) from [<802f0fd8>] (mmc_add_host+0x58/0x6c)
> > [ 3.422277]  r5:00000000 r4:b405e000
> > [ 3.422289] [<802f0f80>] (mmc_add_host+0x0/0x6c) from [<802ff188>] (sdhci_add_host+0x9f8/0xb88)
> > [ 3.422299]  r5:b405e000 r4:b405e400
> > [ 3.422313] [<802fe790>] (sdhci_add_host+0x0/0xb88) from [<80303dd8>] (sdhci_esdhc_imx_probe_wrapper+0x5c4/0x6a4)
> > [ 3.422334]  r8:b4128400 r7:b4128410 r6:00002801 r5:b4394110 r4:b405e400
> > [ 3.422356] [<80303814>] (sdhci_esdhc_imx_probe_wrapper+0x0/0x6a4) from [<8004fe34>] (async_run_entry_fn+0xf0/0x20c)
> > [ 3.422370] [<8004fd44>] (async_run_entry_fn+0x0/0x20c) from [<8004219c>] (process_one_work+0x380/0x668)
> > [ 3.422393]  r9:00000089 r8:00000000 r7:b4004600 r6:8064ca40 r5:b4391e08
> > [ 3.422393]  r4:b4081200
> > [ 3.422406] [<80041e1c>] (process_one_work+0x0/0x668) from [<800428e0>] (worker_thread+0x290/0x3a8)
> > [ 3.422423] [<80042650>] (worker_thread+0x0/0x3a8) from [<800476b0>] (kthread+0xb4/0xc0)
> > [ 3.422439] [<800475fc>] (kthread+0x0/0xc0) from [<8000d970>] (ret_from_fork+0x14/0x24)
> > [ 3.422455]  r7:00000000 r6:00000000 r5:800475fc r4:b409de58
> >
> > The scheduling-while-atomic bug is due to clk_get_rate() trying to lock a mutex while a spinlock is being held by sdhci_do_set_ios().
> > The sequence is:
> >   sdhci_do_set_ios()
> >     spin_lock_irqsave(&host->lock, flags);
> >     sdhci_set_clock(host, ios->clock);
> >       esdhc_pltfm_set_clock(host, clock);    /* called via host->ops->set_clock */
> >         esdhc_set_clock(host, clock, clk_get_rate(pltfm_host->clk));
> > The locking violation occurs when clk_get_rate() tries to lock a mutex.
> >
> > The esdhc_pltfm_set_clock() routine was introduced in commit:
> > 8ba9580 mmc: sdhci-esdhc: calculate sdclk divider from controller clock
> >
> > Would you please have a look at this issue?
>
> Any opinions on this?
>

I've had a brief look into this and IMHO it's totally bogus that the
SDHCI framework is protecting such potentially long-running sections
with a spinlock. I'll spin a patch to work around the issue for now,
but honestly in the end the spinlock should go away.

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax: +49-5121-206917-5555   |
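
P.S.: To make the workaround idea above a bit more concrete, here is a rough,
untested sketch: read the parent clock rate once in a sleepable context (e.g.
at probe time) and let the set_clock callback use that cached value, so that
clk_get_rate() - which takes the clock framework's prepare_lock mutex and may
sleep - is never called while sdhci_do_set_ios() holds host->lock. The field
name per_clk_rate, the pltfm_host->priv layout and the probe function name
below are illustrative assumptions, not the actual patch.

    /*
     * Rough sketch only: cache the controller clock rate where sleeping
     * is allowed, and only use the cached value from the set_clock
     * callback, which runs under spin_lock_irqsave(&host->lock, ...)
     * in sdhci_do_set_ios().
     */

    struct pltfm_imx_data {
            /* ... existing members ... */
            unsigned long per_clk_rate;     /* assumed field name */
    };

    static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
    {
            /* ... host allocated, clocks acquired and enabled above ... */

            /* Sleepable context here, so calling clk_get_rate() is fine. */
            imx_data->per_clk_rate = clk_get_rate(pltfm_host->clk);

            /* ... rest of probe ... */
            return 0;
    }

    static void esdhc_pltfm_set_clock(struct sdhci_host *host,
                                      unsigned int clock)
    {
            struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
            struct pltfm_imx_data *imx_data = pltfm_host->priv;

            /*
             * Called with host->lock held and IRQs disabled, so no
             * clk_get_rate() here; use the rate cached at probe time.
             */
            esdhc_set_clock(host, clock, imx_data->per_clk_rate);
    }

Caching like this obviously assumes the controller clock rate does not change
behind the driver's back; if it can, the cached value would have to be
refreshed from a sleepable path instead.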