* [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 1:17 Brian Norris
From: Brian Norris @ 2017-03-30 1:17 UTC (permalink / raw)
To: linux-mmc, linux-rockchip
Cc: Heiko Stuebner, amstan, Ziyuan Xu, Shawn Lin, Jaehoon Chung
Hi all,
I haven't managed to get as far as a bugfix for this, but I've bisected
some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
in particular). v4.9 works fine.
Issue #1 - eMMC complains periodically:
[ 4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M >
/dev/null), it will eventually croak:
[ 359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153
[ 360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0
[ 360.221936] mmcblk2: retrying using single block read
[ 363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
[ 363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152
[ 363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
[ 363.761938] mmcblk2: retrying using single block read
[ 366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds.
[ 366.618134] Not tainted 4.10.0 #284
[ 366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.629960] mmcqd/2boot1 D 0 92 2 0x00000000
[ 366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0)
[ 366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c)
[ 366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34)
[ 366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c)
[ 366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4)
[ 366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144)
[ 366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c)
...
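To get a rough sense of how often the "Bus speed" message from Issue #1 fires for each host, a saved kernel log can be tallied. This is only a minimal sketch: the function name `count_bus_speed` is hypothetical, and it assumes the log lines have the same shape as the excerpts above.

```shell
# Hypothetical helper (not from the thread): count "Bus speed" messages
# per mmc host in kernel log text read from stdin.
count_bus_speed() {
    grep -oE 'mmc_host mmc[0-9]+: Bus speed' |  # keep only the matching prefix
        awk '{ print $2 }' |                    # second field is e.g. "mmc2:"
        tr -d ':' |                             # strip the trailing colon
        sort | uniq -c |                        # count occurrences per host
        awk '{ print $2, $1 }'                  # print "host count"
}

# Possible usage on a live system (may need root to read the log):
#   dmesg | count_bus_speed
```

A high count for a host that is otherwise idle would suggest the message is being printed on every clock change rather than only at card initialization.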
Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
[ 1.444125] mmc_host mmc1: card is non-removable.
[ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
[ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 25.691666] mwifiex: rx work enabled, cpus 4
[ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
[ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
[ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
[ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
[ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
[ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
[ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
[ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
[ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
[ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[ 37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
[ 37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
[ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
[ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
[ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
[ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
[ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
For either of these issues, if I simply revert the dw_mmc driver back to
its v4.9 version (but keep everything else at v4.10), things seem to
work fine.
At this point, I'm pretty sure that it's the runtime PM support added to
dw_mmc that causes the regression.
Any thoughts? I don't exactly plan on trying to debug a solution myself here,
but I thought I'd report it in case somebody else has ideas.
Brian
^ permalink raw reply [flat|nested] 18+ messages in thread
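One way to probe the suspicion in the report above (that runtime PM is involved) from userspace is the generic sysfs runtime PM interface: each device exposes `power/control` and `power/runtime_status`, and writing `on` to `power/control` keeps the device active. The sketch below is an assumption-laden illustration, not something taken from the thread: the function names and the `SYSFS_ROOT` override are hypothetical, and it assumes `/sys/class/mmc_host/<host>/device` points at the host controller. Whether pinning the controller this way avoids the failures on a given board is untested here.

```shell
# Hypothetical helpers (not from the thread) for inspecting and pinning
# runtime PM on an MMC host controller via sysfs.
# SYSFS_ROOT defaults to /sys; it is overridable so the functions can be
# exercised against a scratch directory.

# Print the runtime PM control setting and current status for a host.
mmc_rpm_status() {
    host="$1"
    dir="${SYSFS_ROOT:-/sys}/class/mmc_host/$host/device/power"
    printf '%s control=%s runtime_status=%s\n' \
        "$host" "$(cat "$dir/control")" "$(cat "$dir/runtime_status")"
}

# Keep the host controller active; writing "on" forbids runtime suspend.
mmc_rpm_forbid() {
    host="$1"
    echo on > "${SYSFS_ROOT:-/sys}/class/mmc_host/$host/device/power/control"
}

# Possible usage on real hardware (as root), using the SDIO host from the log:
#   mmc_rpm_status mmc1
#   mmc_rpm_forbid mmc1
```

Writing `auto` back to the same file re-enables runtime PM for the device.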
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 1:32 Shawn Lin
From: Shawn Lin @ 2017-03-30 1:32 UTC (permalink / raw)
To: Brian Norris, linux-mmc, linux-rockchip
Cc: Jaehoon Chung, Ziyuan Xu, Heiko Stuebner, amstan

Hi Brian,

On 2017/3/30 9:17, Brian Norris wrote:
> Hi all,
>
> I haven't managed to get as far as a bugfix for this, but I've bisected
> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> in particular). v4.9 works fine.

Does your v4.10+ kernel have these commits?

commit e9748e0364fe82dc037d22900ff13a62d04518bf
Author: Ziyuan Xu <xzy.xu@rock-chips.com>
Date: Tue Jan 17 09:22:56 2017 +0800

    mmc: dw_mmc: force setup bus if active slots exist

commit df9bcc2bc0a1f8d2963bd916698268fb2470713b
Author: Joonyoung Shim <jy0922.shim@samsung.com>
Date: Fri Nov 25 12:47:15 2016 +0900

    mmc: dw_mmc: add missing codes for runtime resume

commit ce69e2fea093b7fa3991c87849c4955cd47796c9
Author: Shawn Lin <shawn.lin@rock-chips.com>
Date: Tue Jan 17 09:22:55 2017 +0800

    mmc: dw_mmc: silent verbose log when calling from PM context

> [...]
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 1:42 Brian Norris
From: Brian Norris @ 2017-03-30 1:42 UTC (permalink / raw)
To: Shawn Lin
Cc: linux-mmc, linux-rockchip, Heiko Stuebner, amstan, Ziyuan Xu, Jaehoon Chung

On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
> Hi Brian,
>
> On 2017/3/30 9:17, Brian Norris wrote:
> >Hi all,
> >
> >I haven't managed to get as far as a bugfix for this, but I've bisected
> >some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> >in particular). v4.9 works fine.
>
> Does your v4.10+ kernel have these commits?

By "4.10+", I meant that pure 4.10 is broken, as are all subsequent
versions (e.g., 4.11-rc1).

> commit e9748e0364fe82dc037d22900ff13a62d04518bf
> Author: Ziyuan Xu <xzy.xu@rock-chips.com>
> Date: Tue Jan 17 09:22:56 2017 +0800
>
>     mmc: dw_mmc: force setup bus if active slots exist
>
>
> commit df9bcc2bc0a1f8d2963bd916698268fb2470713b
> Author: Joonyoung Shim <jy0922.shim@samsung.com>
> Date: Fri Nov 25 12:47:15 2016 +0900
>
>     mmc: dw_mmc: add missing codes for runtime resume

'git describe' tells me these are in 4.10-rc1 and -rc6. So yes.

> commit ce69e2fea093b7fa3991c87849c4955cd47796c9
> Author: Shawn Lin <shawn.lin@rock-chips.com>
> Date: Tue Jan 17 09:22:55 2017 +0800
>
>     mmc: dw_mmc: silent verbose log when calling from PM context

'git describe' tells me this is in 4.11-rc1, so no.

Brian
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 2:18 Eddie Cai
From: Eddie Cai @ 2017-03-30 2:18 UTC (permalink / raw)
To: Brian Norris
Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, Jaehoon Chung, linux-rockchip, amstan

Hi Brian,

I tested on an rk3288 Firefly Reload with 4.11-rc4. It works fine.

2017-03-30 9:42 GMT+08:00 Brian Norris <briannorris@chromium.org>:
> [...]
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 2:53 Brian Norris
From: Brian Norris @ 2017-03-30 2:53 UTC (permalink / raw)
To: Eddie Cai
Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, Jaehoon Chung, linux-rockchip, amstan, Kevin Mihelich

Hi Eddie,

On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
> I tested on an rk3288 Firefly Reload with 4.11-rc4. It works fine.

OK, thanks for checking.

[...]

By the way, Kevin (CC'd) says he noticed similar Wifi issues on an
Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.
I haven't picked these apart yet to see what the differences and
similarities are, but presumably it's not actually a Rockchip-specific
bug. Maybe related to the way power sequencing is plumbed for these, for
example?

Brian
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30 5:11 Jaehoon Chung
From: Jaehoon Chung @ 2017-03-30 5:11 UTC (permalink / raw)
To: Brian Norris, Eddie Cai
Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip, amstan, Kevin Mihelich

Hi,

On 03/30/2017 11:53 AM, Brian Norris wrote:
> [...]
>
> By the way, Kevin (CC'd) says he noticed similar Wifi issues on an
> Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.
> I haven't picked these apart yet to see what the differences and
> similarities are, but presumably it's not actually a Rockchip-specific
> bug. Maybe related to the way power sequencing is plumbed for these, for
> example?

I'm not sure, but if card detection uses polling, a timing issue could
occur.

Best Regards,
Jaehoon Chung
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-04-06 22:04 Brian Norris
From: Brian Norris @ 2017-04-06 22:04 UTC (permalink / raw)
To: Jaehoon Chung
Cc: Eddie Cai, Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

Hi,

On Thu, Mar 30, 2017 at 02:11:19PM +0900, Jaehoon Chung wrote:
> On 03/30/2017 11:53 AM, Brian Norris wrote:
> > [...]
> >
> > By the way, Kevin (CC'd) says he noticed similar Wifi issues on an
> > Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.
> > I haven't picked these apart yet to see what the differences and
> > similarities are, but presumably it's not actually a Rockchip-specific
> > bug. Maybe related to the way power sequencing is plumbed for these, for
> > example?
>
> I'm not sure, but if card detection uses polling, a timing issue could
> occur.

I don't know much about MMC in general, nor about this driver. Any
chance you'd accept reverts of the patches in question though? This is a
huge regression, and there were only a few relevant changes that seem to
have triggered this. I can try to come up with something targeted, but
I'm not going to even try if that'd get rejected up front.

Brian
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-04-07 4:59 Jaehoon Chung
From: Jaehoon Chung @ 2017-04-07 4:59 UTC (permalink / raw)
To: Brian Norris
Cc: Eddie Cai, Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

On 04/07/2017 07:04 AM, Brian Norris wrote:
> [...]
>
> I don't know much about MMC in general, nor about this driver. Any
> chance you'd accept reverts of the patches in question though? This is a
> huge regression, and there were only a few relevant changes that seem to
> have triggered this. I can try to come up with something targeted, but
> I'm not going to even try if that'd get rejected up front.

Sure, if it's a big regression, we can revert the patches relevant to
the problem. After fixing it, we can re-apply them. Before reverting, I
will try to fix it by next Wednesday; otherwise, I will revert them at
that time.

Best Regards,
Jaehoon Chung
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-04-07 6:50 Shawn Lin
From: Shawn Lin @ 2017-04-07 6:50 UTC (permalink / raw)
To: Brian Norris, Jaehoon Chung
Cc: Eddie Cai, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

Hi Brian,

On 2017/4/7 6:04, Brian Norris wrote:
> [...]
>
> I don't know much about MMC in general, nor about this driver. Any
> chance you'd accept reverts of the patches in question though? This is a
> huge regression, and there were only a few relevant changes that seem to
> have triggered this. I can try to come up with something targeted, but
> I'm not going to even try if that'd get rejected up front.

Until now, none of my (or my colleagues') Rockchip platforms has been
able to reproduce this issue, so I can't tell what exactly the problem
is. However, I noticed you mentioned that the Exynos platforms are also
affected by runtime PM (rpm) in dw_mmc. I don't see dw_mmc-exynos
enabling this feature, so it looks quite odd to me!

Can I or Eddie get a Veyron board to help you debug it?
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-04-07 7:38 Jaehoon Chung
From: Jaehoon Chung @ 2017-04-07 7:38 UTC (permalink / raw)
To: Shawn Lin, Brian Norris
Cc: Eddie Cai, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

On 04/07/2017 03:50 PM, Shawn Lin wrote:
> Hi Brian,
>
> [...]
>
> Until now, none of my (or my colleagues') Rockchip platforms has been
> able to reproduce this issue, so I can't tell what exactly the problem
> is. However, I noticed you mentioned that the Exynos platforms are also
> affected by runtime PM (rpm) in dw_mmc. I don't see dw_mmc-exynos
> enabling this feature, so it looks quite odd to me!

Well, the Exynos boards I have didn't show a similar issue, but I will
test all of my Exynos boards to try to reproduce this.

> Can I or Eddie get a Veyron board to help you debug it?
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards 2017-03-30 1:17 [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards Brian Norris [not found] ` <20170330011709.GA110687-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> @ 2017-04-10 23:35 ` Doug Anderson 2017-04-11 10:21 ` Ulf Hansson 2017-04-12 0:54 ` Shawn Lin 1 sibling, 2 replies; 18+ messages in thread From: Doug Anderson @ 2017-04-10 23:35 UTC (permalink / raw) To: Brian Norris Cc: linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin, Jaehoon Chung, kevin Hi, On Wed, Mar 29, 2017 at 6:17 PM, Brian Norris <briannorris@chromium.org> wrote: > Hi all, > > I haven't managed to get as far as a bugfix for this, but I've bisected > some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq, > in particular). v4.9 works fine. OK, I finally got everything up and running to test this too... > Issue #1 - eMMC complains periodically: > > [ 4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) I don't believe that this is an error, actually. Really we just need to quiet this message or (since I've always found it useful) move it to a different place. I believe that with runtime PM we're effectively turning the clock off whenever the MMC device isn't in use. 
On dw_mmc we have a "helpful" printout every time the clock is changed, and that's what you're seeing here. You can see with: while true; do dd if=/dev/mmcblk2 of=/dev/null bs=512 count=1 iflag=direct; sleep .1; done ...that you'll get a printout every 100ms. Ah, looks like this is in: ce69e2fea093 mmc: dw_mmc: silent verbose log when calling from PM context ...as pointed out by Shawn Lin. So I think in 4.10 we can just ignore those messages and they're good on 4.11. > and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M > > /dev/null), it will eventually croak: > > [ 359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153 > [ 360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0 > [ 360.221936] mmcblk2: retrying using single block read > [ 363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 > [ 363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) > [ 363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152 > [ 363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 > [ 363.761938] mmcblk2: retrying using single block read > [ 366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds. > [ 366.618134] Not tainted 4.10.0 #284 > [ 366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [ 366.629960] mmcqd/2boot1 D 0 92 2 0x00000000 > [ 366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0) > [ 366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c) > [ 366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34) > [ 366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c) > [ 366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4) > [ 366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144) > [ 366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c) I'm not convinced this is a regression. I remember Heiko saying that he's heard reports that on some boards eMMC doesn't work with high speed, and I'be believe that's what you're seeing here. It would be interesting to try to debug this. I can't personally reproduce, though. I think veyron_minnie already has UHS turned off for eMMC upstream. I guess we could do it for other veyron boards too until someone can debug? > Issue #2 - Wifi (via SDIO, mmc1) is completely dead: > > [ 1.444125] mmc_host mmc1: card is non-removable. 
> [ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) > [ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001 > [ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 25.691666] mwifiex: rx work enabled, cpus 4 > [ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes > [ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active > [ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0 > [ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0 > [ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0 > [ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1 > [ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0 > [ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1 > [ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00 > [ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00 > [ 37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0 > [ 37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00 > [ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0 > [ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00 > [ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0 > [ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0 > [ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > [ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device This doesn't surprise me at all. 
What surprises me, though, is that
nobody else seems to be able to reproduce this.

On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
Interrupts. See this bit in the device tree:

cap-sdio-irq;

SDIO interrupts (in 4-bit mode) specifically need the card clock to be
running all the time to work. I can reproduce your regression (on
veyron-jerry, which also has Marvell WiFi) and I can also find that
the regression is "gone" if I take out the "cap-sdio-irq" in the
veyron device tree. Ah, interestingly enough, turning off SDIO
interrupts has the side effect of sending enough (polling) traffic
that we never seem to runtime suspend, either. ;-P

In general I'd question whether dw_mmc actually gets much power
benefit from Runtime PM in Linux. The dw_mmc IP blocks already have a
feature in them to automatically stop and restart the card clock. See
SDMMC_CLKEN_LOW_PWR. Maybe you're getting the benefit of turning off
VMMC or VQMMC? Is that really a lot of power? Presumably those power
savings would be for eMMC or normal SD cards (not SDIO).

Maybe someone else on this thread knows how Runtime PM is supposed to
work in general for SDIO? I notice that in sdio.c the
mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off().
That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if
mmc_card_keep_power(). Hrmmm...

OK, so I just tried this on veyron-minnie. On minnie we have Broadcom
WiFi. That actually works (!). Presumably this is because
brcmf_sdiod_host_fixup() calls pm_runtime_forbid(). Commenting that
out breaks things.

OK, and I can make Marvell work by adding
"pm_runtime_forbid(func->card->host->parent);" to the end of
mwifiex_sdio_probe().

--

So where does that leave us?

A) Technically we can fix Marvell's driver to work like Broadcom's.
One could possibly assert that this is the wrong fix because
technically we could make Runtime PM work with SDIO with enough work.
We could theoretically move into 1-bit mode and there (I think) you
can get interrupts with the clock off. ...or we could have a
dedicated SDIO Interrupt pin (for the embedded case), which is talked
about in the SDIO spec.

B) Technically we could hack this in the dw_mmc code to disable
Runtime PM if we see that an SDIO interrupt is used. One advantage of
doing it here is that if we ever add support in dw_mmc for the
external SDIO interrupt we could allow Runtime PM in that case. In
theory the dw_mmc IP block has some basic support for a dedicated SDIO
interrupt pin, but there's no code to support it.

C) Technically we could add this into the MMC core.

D) Technically we could remove Runtime PM support from dw_mmc for now
until someone can address all these issues (and ideally show a real
power savings).


I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
long time, so probably my vote isn't worth that much...

Shawn: I think you actually enabled runtime PM. Did you really see
power savings, or did it just seem like enabling Runtime PM would be a
neat thing to do?


-Doug
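The constraints in the message above can be condensed into a small
decision helper. The following is only an illustrative Python sketch,
not kernel code: runtime_suspend_is_safe() and its parameters are
hypothetical names, and the 1-bit branch merely encodes the "(I think)"
guess above rather than a confirmed behaviour.

```python
def runtime_suspend_is_safe(uses_sdio_irq, bus_width=4, dedicated_irq_pin=False):
    """Return True if gating the card clock should not lose SDIO interrupts.

    Hypothetical helper that encodes the reasoning in the message above;
    it is not part of any kernel API.
    """
    if not uses_sdio_irq:
        # No in-band interrupt is expected (e.g. eMMC or a plain SD card).
        return True
    if dedicated_irq_pin:
        # The interrupt arrives on a separate pin, independent of the bus clock.
        return True
    if bus_width == 1:
        # Assumption taken from the "(I think)" above: in 1-bit mode the
        # interrupt can still be signalled with the clock off.
        return True
    # An in-band SDIO interrupt in 4-bit mode needs the card clock running.
    return False
```

For the veyron configuration described here (cap-sdio-irq, 4-bit bus,
no dedicated interrupt pin) the helper returns False, which matches
the breakage seen once the controller is allowed to runtime suspend.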
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards 2017-04-10 23:35 ` Doug Anderson @ 2017-04-11 10:21 ` Ulf Hansson 2017-04-11 22:57 ` Doug Anderson 2017-04-12 0:54 ` Shawn Lin 1 sibling, 1 reply; 18+ messages in thread From: Ulf Hansson @ 2017-04-11 10:21 UTC (permalink / raw) To: Doug Anderson Cc: Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin, Jaehoon Chung, kevin [...] > >> Issue #2 - Wifi (via SDIO, mmc1) is completely dead: >> >> [ 1.444125] mmc_host mmc1: card is non-removable. >> [ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >> [ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001 >> [ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 25.691666] mwifiex: rx work enabled, cpus 4 >> [ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes >> [ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active >> [ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0 >> [ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0 >> [ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0 >> [ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1 >> [ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0 >> [ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1 >> [ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00 >> [ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00 >> [ 37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0 >> [ 37.708269] mwifiex_sdio 
mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00 >> [ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0 >> [ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00 >> [ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0 >> [ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0 >> [ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device > > This doesn't surprise me at all. What surprises me, though, is that > nobody else seems to be able to reproduce this. Me too. As I have stated several times, the PM code for SDIO is fragile/broken for many scenarios. This is just one case. > > On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO > Interrupts. See this bit in the device tree: > > cap-sdio-irq; > > SDIO interrupts (in 4-bit mode) specifically need the card clock to be > running all the time to work. I can reproduce your regression (on > veryron-jerry, which also has Marvell WiFi) and I can also find that > the regression is "gone' if I take out the "cap-sdio-irq" in the > veyron device tree. Ah, interestingly enough, turning off SDIO > interrupts has the side effect of sending enough (polling) traffic > that we never seem to runtime suspend, either. ;-P We did a similar fix for sdhci recently. Simply, in cases when sdio IRQ is turned on, we call pm_runtime_get_noresume() to prevent the device from being runtime suspended. Unless the SoC supports the SDIO irq to be re-routed to a wakeup IRQ at runtime suspend, there is no other solution. However, re-routing to a wakeup IRQ should be done, when switching to 1-bit mode is completed. This is currently not supported by the mmc core. > > In general I'd question whether dw_mmc actually gets much power > benefit from Runtime PM in Linux. 
The dw_mmc IP blocks already have a
> feature in them to automatically stop and restart the card clock. See
> SDMMC_CLKEN_LOW_PWR. Maybe you're getting the benefit of turning off
> VMMC or VQMMC? Is that really a lot of power? Presumably those power
> savings would be for eMMC or normal SD cards (not SDIO).

I think that depends on the PM topology of the SoC. Perhaps the dw_mmc
devices are in PM domains sharing power rails etc, and preventing
runtime PM could be very costly as it then prevents those shared
resources from being put into a low power state.

I think the best we can do at this point is something similar to what
we do for sdhci.

>
> Maybe someone else on this thread knows how Runtime PM is supposed to
> work in general for SDIO? I notice that in sdio.c the
> mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off().
> That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if
> mmc_card_keep_power(). Hrmmm...

As stated above, the system PM and runtime PM code for SDIO is fragile
and needs an update. I know how to do it, but it requires some work. I
have started to hack on it several times, maybe I just need to put
everything else aside and focus on this. :-)

Moreover, I would really like to add a feature that can defer the
system PM resume of the SDIO card (in cases where it means a full
re-init of the SDIO card) to runtime PM resume instead. Why? It would
save several hundred milliseconds of system PM resume time. It is very
similar to the feature we already have for SD/(e)MMC.

>
> OK, so I just tried this on veyron-minnie. On minnie we have Broadcom
> WiFi. That actually works (!). Presumably this is because
> brcmf_sdiod_host_fixup() calls pm_runtime_forbid(). Commenting that
> out breaks things.

I think this should be managed in the dw_mmc driver instead, as that
is where the problem lies.

>
> OK, and I can make Marvell work by adding
> "pm_runtime_forbid(func->card->host->parent);" to the end of
> mwifiex_sdio_probe().
Again, dw_mmc is the correct place.

>
> --
>
> So where does that leave us?
>
> A) Technically we can fix Marvell's driver to work like Broadcom's.
> One could possibly assert that this is the wrong fix because
> technically we could make Runtime PM work with SDIO with enough work.
> We could theoretically move into 1-bit mode and there (I think) you
> can get interrupts with the clock off. ...or we could have a
> dedicated SDIO Interrupt pin (for the embedded case), which is talked
> about in the SDIO spec.

If the WIFI chip supports an external SDIO irq pin, that is very much
preferred, both from a PM point of view and from a performance point
of view (mainly because it's faster to ack the IRQ).

That said, for these scenarios, I assume that switching to 1-bit mode
isn't necessary before gating the clock, as the IRQ is driven
completely separately from the SDIO bus. From my own experience, this
is how the cw1200 WIFI chip behaves on ux500.

>
> B) Technically we could hack this in the dw_mmc code to disable
> Runtime PM if we see that an SDIO interrupt is used. One advantage of
> doing it here is that if we ever add support in dw_mmc for the
> external SDIO interrupt we could allow Runtime PM in that case. In
> theory the dw_mmc IP block has some basic support for a dedicated SDIO
> interrupt pin, but there's no code to support it.

Right.

>
> C) Technically we could add this into the MMC core.

Perhaps the MMC core needs to play a role, not sure exactly how yet.

>
> D) Technically we could remove Runtime PM support from dw_mmc for now
> until someone can address all these issues (and ideally show a real
> power savings).

No. Then it's better to just prevent runtime suspend when SDIO irq
becomes enabled.

>
>
> I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
> long time, so probably my vote isn't worth that much...
>
> Shawn: I think you actually enabled runtime PM.
Did you really see
> power savings, or did it just seem like enabling Runtime PM would be a
> neat thing to do?
>
>
> -Doug

Doug, really appreciate your efforts in testing this and the detailed
way you described the problem.

Kind regards
Uffe
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards 2017-04-11 10:21 ` Ulf Hansson @ 2017-04-11 22:57 ` Doug Anderson 0 siblings, 0 replies; 18+ messages in thread From: Doug Anderson @ 2017-04-11 22:57 UTC (permalink / raw) To: Ulf Hansson Cc: Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin, Jaehoon Chung, kevin Hi, On Tue, Apr 11, 2017 at 3:21 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > [...] > >> >>> Issue #2 - Wifi (via SDIO, mmc1) is completely dead: >>> >>> [ 1.444125] mmc_host mmc1: card is non-removable. >>> [ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >>> [ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >>> [ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001 >>> [ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >>> [ 25.691666] mwifiex: rx work enabled, cpus 4 >>> [ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes >>> [ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active >>> [ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >>> [ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0 >>> [ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0 >>> [ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0 >>> [ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1 >>> [ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0 >>> [ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1 >>> [ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00 >>> [ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00 >>> [ 37.702536] mwifiex_sdio mmc1:0001:1: 
last_cmd_resp_index = 0
>>> [ 37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
>>> [ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
>>> [ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
>>> [ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
>>> [ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
>>> [ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>>> [ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
>>
>> This doesn't surprise me at all. What surprises me, though, is that
>> nobody else seems to be able to reproduce this.
>
> Me too.
>
> As I have stated several times, the PM code for SDIO is fragile/broken
> for many scenarios. This is just one case.
>
>>
>> On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
>> Interrupts. See this bit in the device tree:
>>
>> cap-sdio-irq;
>>
>> SDIO interrupts (in 4-bit mode) specifically need the card clock to be
>> running all the time to work. I can reproduce your regression (on
>> veyron-jerry, which also has Marvell WiFi) and I can also find that
>> the regression is "gone" if I take out the "cap-sdio-irq" in the
>> veyron device tree. Ah, interestingly enough, turning off SDIO
>> interrupts has the side effect of sending enough (polling) traffic
>> that we never seem to runtime suspend, either. ;-P
>
> We did a similar fix for sdhci recently. Simply, in cases when sdio
> IRQ is turned on, we call pm_runtime_get_noresume() to prevent the
> device from being runtime suspended.

I tried a similar mechanism for dw_mmc and it mostly worked. Until I
stressed it out. I stressed it out by running:

while true; do
  ifconfig mlan0 up;
  ifconfig mlan0 down;
done

When I did this long enough, I somehow managed to get into a state
where the card allowed itself to temporarily be runtime suspended.
This caused communication errors and eventually the mwifiex driver did
a full reset of itself. I found that when communication errors were
happening, the device was runtime suspended. For whatever reason this
was easiest to reproduce when I added a printk to a serial console in
the enable_sdio_irq() callback, but I could also reproduce by setting
the autosuspend_delay_ms to 1 and using pm_runtime_put() instead of
pm_runtime_put_noidle() in my patch.

It looks like on dw_mmc the enable_sdio_irq(0) / enable_sdio_irq(1)
pair is called almost constantly. I don't have boards with SDHCI and
SDIO to test with, but it appears that this is different from how
things work with SDHCI. For dw_mmc enable/disable is called in
sdio_irq_thread() to mask new interrupts while processing the current
one. SDHCI doesn't use sdio_irq_thread() because it sets
MMC_CAP2_SDIO_IRQ_NOTHREAD.

Because of this constant stream of disable / enable calls we spend
some amount of time with Runtime PM enabled. If we happen to have
enough delays (or printks) and we happen to get lucky then we can end
up running the Runtime PM suspend code for dw_mmc. This is bad because
dw_mci_runtime_resume() fully resets the host controller and clears
all interrupts. That doesn't seem so great, but I believe the specific
problem is that we might be clearing the next SDIO interrupt which
might have already come in (I haven't proven this).

Overall we really just don't want any Runtime PM at all when SDIO
Interrupts are being used.

There's currently no callback into dw_mmc that gets called when
someone holds an SDIO IRQ. I suppose we could add one from
sdio_claim_irq() / sdio_release_irq() if that was desired... We could
also (in theory) get the core SDIO code to get / put Runtime PM
whenever it's currently processing an interrupt. I coded this up and I
can post it if you want, but it feels a bit complicated.
Instead I'm thinking of using the same solution I came up with for
DW_MMC_CARD_NO_LOW_PWR in dw_mmc: putting this in dw_mci_init_card().
It's not 100% perfect if there are use cases where the SDIO interrupt
is really disabled for long periods of time (and we want to save
power) but seems like a good stopgap and eliminates this particular
regression quickly.

OK, I've posted that now. https://patchwork.kernel.org/patch/9676197/

Whew, that took a lot longer to dig into than I originally thought it
would. :-P


>> In general I'd question whether dw_mmc actually gets much power
>> benefit from Runtime PM in Linux. The dw_mmc IP blocks already have a
>> feature in them to automatically stop and restart the card clock. See
>> SDMMC_CLKEN_LOW_PWR. Maybe you're getting the benefit of turning off
>> VMMC or VQMMC? Is that really a lot of power? Presumably those power
>> savings would be for eMMC or normal SD cards (not SDIO).
>
> I think that depends on the PM topology of the SoC. Perhaps the dw_mmc
> devices are in PM domains sharing power rails etc, and preventing
> runtime PM could be very costly as it then prevents those shared
> resources from being put into a low power state.

Ah, good point. I hadn't thought about the shared power domain case.

-Doug
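The race described above can be illustrated with a toy model of the
Runtime PM usage counter. This is a hedged Python sketch rather than
driver code: suspend_windows() and the two strategy names are
hypothetical, the model ignores autosuspend delays and real timing,
and it simply assumes the device may suspend whenever the counter
reads zero.

```python
def suspend_windows(strategy, interrupts):
    """Count how often the usage counter hits zero while handling IRQs.

    strategy "per_irq_toggle": take a reference in enable_sdio_irq(1) and
    drop it in enable_sdio_irq(0).
    strategy "per_card": take one reference when the SDIO card is set up
    and hold it for as long as the card is present.
    """
    if strategy not in ("per_irq_toggle", "per_card"):
        raise ValueError("unknown strategy: %r" % (strategy,))

    usage = 1      # both strategies start out holding one reference
    windows = 0    # moments at which nothing keeps the device active

    for _ in range(interrupts):
        # The IRQ thread masks the interrupt first: enable_sdio_irq(0).
        if strategy == "per_irq_toggle":
            usage -= 1
        if usage == 0:
            windows += 1
        # ... the interrupt is handled, then enable_sdio_irq(1) unmasks it.
        if strategy == "per_irq_toggle":
            usage += 1
    return windows
```

In this model suspend_windows("per_irq_toggle", 1000) reports 1000
windows and suspend_windows("per_card", 1000) reports none, which is
the intuition behind moving the decision to card init time as proposed
above.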
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards 2017-04-10 23:35 ` Doug Anderson 2017-04-11 10:21 ` Ulf Hansson @ 2017-04-12 0:54 ` Shawn Lin 2017-04-12 16:12 ` Doug Anderson 1 sibling, 1 reply; 18+ messages in thread From: Shawn Lin @ 2017-04-12 0:54 UTC (permalink / raw) To: Doug Anderson Cc: Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung, kevin Hi Doug, 在 2017/4/11 7:35, Doug Anderson 写道: > Hi, > > On Wed, Mar 29, 2017 at 6:17 PM, Brian Norris <briannorris@chromium.org> wrote: >> Hi all, >> >> I haven't managed to get as far as a bugfix for this, but I've bisected >> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq, >> in particular). v4.9 works fine. > > OK, I finally got everything up and running to test this too... > > >> Issue #1 - eMMC complains periodically: >> >> [ 4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) > > I don't believe that this is an error, actually. Really we just need > to quiet this message or (since I've always found it useful) move it > to a different place. I believe that with runtime PM we're > effectively turning the clock off whenever the MMC device isn't in > use. On dw_mmc we have a "helpful" printout every time the clock is > changed, and that's what you're seeing here. 
> > You can see with: > > while true; do > dd if=/dev/mmcblk2 of=/dev/null bs=512 count=1 iflag=direct; > sleep .1; > done > > ...that you'll get a printout every 100ms. > > > Ah, looks like this is in: > > ce69e2fea093 mmc: dw_mmc: silent verbose log when calling from PM context > > ...as pointed out by Shawn Lin. So I think in 4.10 we can just ignore > those messages and they're good on 4.11. > > >> and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M > >> /dev/null), it will eventually croak: >> >> [ 359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153 >> [ 360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0 >> [ 360.221936] mmcblk2: retrying using single block read >> [ 363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 >> [ 363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >> [ 363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152 >> [ 363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0 >> [ 363.761938] mmcblk2: retrying using single block read >> [ 366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds. >> [ 366.618134] Not tainted 4.10.0 #284 >> [ 366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
>> [ 366.629960] mmcqd/2boot1 D 0 92 2 0x00000000 >> [ 366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0) >> [ 366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c) >> [ 366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34) >> [ 366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c) >> [ 366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4) >> [ 366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144) >> [ 366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c) > > I'm not convinced this is a regression. > > I remember Heiko saying that he's heard reports that on some boards > eMMC doesn't work with high speed, and I'be believe that's what you're > seeing here. It would be interesting to try to debug this. I can't > personally reproduce, though. > > I think veyron_minnie already has UHS turned off for eMMC upstream. I > guess we could do it for other veyron boards too until someone can > debug? > > >> Issue #2 - Wifi (via SDIO, mmc1) is completely dead: >> >> [ 1.444125] mmc_host mmc1: card is non-removable. 
>> [ 1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0) >> [ 1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001 >> [ 25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 25.691666] mwifiex: rx work enabled, cpus 4 >> [ 26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes >> [ 27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active >> [ 33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0 >> [ 37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0 >> [ 37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0 >> [ 37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1 >> [ 37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0 >> [ 37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1 >> [ 37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00 >> [ 37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00 >> [ 37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0 >> [ 37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00 >> [ 37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0 >> [ 37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00 >> [ 37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0 >> [ 37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0 >> [ 37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) >> [ 37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device > > This doesn't surprise me at all. 
What surprises me, though, is that > nobody else seems to be able to reproduce this. > > On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO > Interrupts. See this bit in the device tree: > > cap-sdio-irq; > all of *my* boards are using side-band interrupt, so there are no "cap-sdio-irq". > SDIO interrupts (in 4-bit mode) specifically need the card clock to be > running all the time to work. I can reproduce your regression (on > veryron-jerry, which also has Marvell WiFi) and I can also find that > the regression is "gone' if I take out the "cap-sdio-irq" in the > veyron device tree. Ah, interestingly enough, turning off SDIO > interrupts has the side effect of sending enough (polling) traffic > that we never seem to runtime suspend, either. ;-P > > In general I'd question whether dw_mmc actually gets much power > benefit from Runtime PM in Linux. The dw_mmc IP blocks already have a > feature in them to automatically stop and restart the card clock. See > SDMMC_CLKEN_LOW_PWR. Maybe you're getting the benefit of turning off > VMMC or VQMMC? Is that really a lot of power? Presumably those power > savings would be for eMMC or normal SD cards (not SDIO). > > Maybe someone else on this thread knows how Runtime PM is supposed to > work in general for SDIO? I notice that in sdio.c the > mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off(). > That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if > mmc_card_keep_power(). Hrmmm... > > OK, so I just tried this on veyron-minnie. On minnie we have Broadcom > WiFi. That actually works (!). Presumably this is because > brcmf_sdiod_host_fixup() calls pm_runtime_forbid(). Commenting that > out breaks things. > > OK, and I can make Marvell work by adding > "pm_runtime_forbid(func->card->host->parent);" to the end of > mwifiex_sdio_probe(). > > -- > > So where does that leave us? > > A) Technically we can fix Marvell's driver to work like Broadcom's. 
> One could possibly assert that this is the wrong fix because
> technically we could make Runtime PM work with SDIO with enough work.
> We could theoretically move into 1-bit mode and there (I think) you
> can get interrupts with the clock off. ...or we could have a
> dedicated SDIO Interrupt pin (for the embedded case), which is talked
> about in the SDIO spec.
>
> B) Technically we could hack this in the dw_mmc code to disable
> Runtime PM if we see that an SDIO interrupt is used. One advantage of
> doing it here is that if we ever add support in dw_mmc for the
> external SDIO interrupt we could allow Runtime PM in that case. In
> theory the dw_mmc IP block has some basic support for a dedicated SDIO
> interrupt pin, but there's no code to support it.
>
> C) Technically we could add this into the MMC core.
>
> D) Technically we could remove Runtime PM support from dw_mmc for now
> until someone can address all these issues (and ideally show a real
> power savings).
>
>
> I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
> long time, so probably my vote isn't worth that much...
>
> Shawn: I think you actually enabled runtime PM. Did you really see
> power savings, or did it just seem like enabling Runtime PM would be a
> neat thing to do?

As Ulf pointed out, the genpd for the mmc IP on Rockchip platforms is
shared with others, so it's worth adding runtime PM.

>
>
> -Doug
>
>
>
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
2017-04-12 0:54 ` Shawn Lin
@ 2017-04-12 16:12 ` Doug Anderson
2017-04-13 7:17 ` Ulf Hansson
2017-04-13 8:28 ` Shawn Lin
0 siblings, 2 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-12 16:12 UTC (permalink / raw)
To: Shawn Lin
Cc: Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung, kevin

Shawn

On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> This doesn't surprise me at all. What surprises me, though, is that
>> nobody else seems to be able to reproduce this.
>>
>> On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
>> Interrupts. See this bit in the device tree:
>>
>> cap-sdio-irq;
>>
>
> all of *my* boards are using side-band interrupt, so there are no
> "cap-sdio-irq".

They are all using side-band interrupt? What WiFi device do you have connected?

If you're truly using a side-band interrupt using the dedicated SDIO
interrupt pin on your SoC, I'm pretty sure you still need to define
cap-sdio-irq in order for things to work properly. If you don't do
that, you'll get "polling mode" for SDIO Interrupts. See
sdio_irq_thread() where you can see that the kernel will poll your
device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.

Maybe you should try defining cap-sdio-irq and see if you get a big
performance boost?
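To put the 10 ms polling figure above in perspective, here is a
back-of-the-envelope Python sketch. It assumes a fixed polling period
and interrupt requests arriving uniformly at random within a period;
both are simplifying assumptions, and the function names are
hypothetical.

```python
def polling_latency_ms(period_ms=10.0):
    """Return (mean, worst_case) extra latency for fixed-period polling."""
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    worst = period_ms        # the event arrives just after a poll
    mean = period_ms / 2.0   # events spread uniformly over one period
    return mean, worst


def polls_per_second(period_ms=10.0):
    """Return how many polls are issued per second, even when idle."""
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return 1000.0 / period_ms
```

With a 10 ms period this gives a mean added latency of 5 ms, a worst
case of 10 ms, and 100 polls per second even when the card has nothing
to report.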
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
2017-04-12 16:12 ` Doug Anderson
@ 2017-04-13 7:17 ` Ulf Hansson
2017-04-13 15:45 ` Doug Anderson
2017-04-13 8:28 ` Shawn Lin
1 sibling, 1 reply; 18+ messages in thread
From: Ulf Hansson @ 2017-04-13 7:17 UTC (permalink / raw)
To: Doug Anderson
Cc: Shawn Lin, Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung, kevin

On 12 April 2017 at 18:12, Doug Anderson <dianders@google.com> wrote:
> Shawn
>
> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>> This doesn't surprise me at all. What surprises me, though, is that
>>> nobody else seems to be able to reproduce this.
>>>
>>> On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
>>> Interrupts. See this bit in the device tree:
>>>
>>>   cap-sdio-irq;
>>>
>>
>> all of *my* boards are using side-band interrupt, so there are no
>> "cap-sdio-irq".
>
> They are all using side-band interrupt? What WiFi device do you have connected?
>
> If you're truly using a side-band interrupt using the dedicated SDIO
> interrupt pin on your SoC, I'm pretty sure you still need to define
> cap-sdio-irq in order for things to work properly. If you don't do
> that, you'll get "polling mode" for SDIO Interrupts. See
> sdio_irq_thread() where you can see that the kernel will poll your
> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.

In these cases I would expect the WIFI driver to deal with the SDIO
IRQ itself and not requesting it via calling sdio_claim_irq(). Because
of this, there should be no polling performed by the sdio_irq_thread.

>
> Maybe you should try defining cap-sdio-irq and see if you get a big
> performance boost?

No, that seems like a bad idea. I think it would rather add overhead -
decreasing performance. Likely it will also make us wake up the mmc
host from its low power state, when it actually isn't needed.

Kind regards
Uffe

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
2017-04-13 7:17 ` Ulf Hansson
@ 2017-04-13 15:45 ` Doug Anderson
0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-13 15:45 UTC (permalink / raw)
To: Ulf Hansson
Cc: Shawn Lin, Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung, kevin

Hi,

On Thu, Apr 13, 2017 at 12:17 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 April 2017 at 18:12, Doug Anderson <dianders@google.com> wrote:
>> Shawn
>>
>> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>>> This doesn't surprise me at all. What surprises me, though, is that
>>>> nobody else seems to be able to reproduce this.
>>>>
>>>> On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
>>>> Interrupts. See this bit in the device tree:
>>>>
>>>>   cap-sdio-irq;
>>>>
>>>
>>> all of *my* boards are using side-band interrupt, so there are no
>>> "cap-sdio-irq".
>>
>> They are all using side-band interrupt? What WiFi device do you have connected?
>>
>> If you're truly using a side-band interrupt using the dedicated SDIO
>> interrupt pin on your SoC, I'm pretty sure you still need to define
>> cap-sdio-irq in order for things to work properly. If you don't do
>> that, you'll get "polling mode" for SDIO Interrupts. See
>> sdio_irq_thread() where you can see that the kernel will poll your
>> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.
>
> In these cases I would expect the WIFI driver to deal with the SDIO
> IRQ itself and not requesting it via calling sdio_claim_irq(). Because
> of this, there should be no polling performed by the sdio_irq_thread.

You're the boss here, but that's not how I envisioned it if I ever
found time to dig deeper. My vision of the world is probably colored
by the dw_mmc IP block, though.

Both of the two SoC families that I've dealt with that have dw_mmc
(both Exynos and Rockchip) have always had a pin that could be muxed
as "SDIO Interrupt". If you choose this pinmux, my understanding is
that it will assert the dw_mmc's normal SDIO interrupt in the IP
block.

The DesignWare datasheet talks about this in terms of eSDIO. It does
talk a little bit about the fact that this method of interrupting can
happen even when the card clock is off.

Given that this concept seems generic, is directly supported by the
dw_mmc hardware, and is talked about in the dw_mmc datasheet, it seems
as if dw_mmc would be the place to deal with it. If other controllers
don't support this concept in a generic way, I see no reason why we
still couldn't handle it in a generic way (via a GPIO) at the mmc
level rather than forcing each WiFi driver to invent this themselves.

...but obviously I haven't worked through all the details and have
never actually coded this up successfully.

>> Maybe you should try defining cap-sdio-irq and see if you get a big
>> performance boost?
>
> No, that seems like a bad idea. I think it would rather add overhead -
> decreasing performance. Likely it will also make us wake up the mmc
> host from its low power state, when it actually isn't needed.

Sounds like Shawn is using an out-of-tree driver, but if it's anything
like the in-tree driver then there's no Runtime PM anyway. See the
pm_runtime_forbid() in brcmf_sdiod_probe().

-Doug

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
2017-04-12 16:12 ` Doug Anderson
2017-04-13 7:17 ` Ulf Hansson
@ 2017-04-13 8:28 ` Shawn Lin
1 sibling, 0 replies; 18+ messages in thread
From: Shawn Lin @ 2017-04-13 8:28 UTC (permalink / raw)
To: Doug Anderson
Cc: Brian Norris, linux-mmc@vger.kernel.org, open list:ARM/Rockchip SoC..., Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung, kevin

Hi,

On 2017/4/13 0:12, Doug Anderson wrote:
> Shawn
>
> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>> This doesn't surprise me at all. What surprises me, though, is that
>>> nobody else seems to be able to reproduce this.
>>>
>>> On veyron, WiFi is connected via SDIO. For good speed, it uses SDIO
>>> Interrupts. See this bit in the device tree:
>>>
>>>   cap-sdio-irq;
>>>
>>
>> all of *my* boards are using side-band interrupt, so there are no
>> "cap-sdio-irq".
>
> They are all using side-band interrupt? What WiFi device do you have connected?

I'm using brcm wifi with out-of-tree drivers.

>
> If you're truly using a side-band interrupt using the dedicated SDIO
> interrupt pin on your SoC, I'm pretty sure you still need to define

Not really. The intention of using a side-band interrupt is that we
can put the host into low power mode (maybe with pd off) while the
wifi still works with the SoC. And mostly, we don't need to keep the
controller on when in S3. The side-band IO can be registered as a GPIO
interrupt (wakeup source), and once the wifi chip needs to communicate
with the SoC, it can wake up the system (of course the SDIO controller
will be alive then).

Also, when using a side-band interrupt, the interrupt service and
management should be done by the wifi function drivers. I'm pretty
sure that the drivers I have at hand, for instance brcm and realtek,
actually do that.

> cap-sdio-irq in order for things to work properly. If you don't do
> that, you'll get "polling mode" for SDIO Interrupts. See
> sdio_irq_thread() where you can see that the kernel will poll your
> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.
>
> Maybe you should try defining cap-sdio-irq and see if you get a big
> performance boost?

Sorry, I didn't test the upstream wifi drivers, but from testing my
out-of-tree wifi drivers, there is not much difference.

>
>
>

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-04-13 15:45 UTC | newest]
Thread overview: 18+ messages
2017-03-30 1:17 [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards Brian Norris
[not found] ` <20170330011709.GA110687-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2017-03-30 1:32 ` Shawn Lin
2017-03-30 1:42 ` Brian Norris
2017-03-30 2:18 ` Eddie Cai
2017-03-30 2:53 ` Brian Norris
2017-03-30 5:11 ` Jaehoon Chung
2017-04-06 22:04 ` Brian Norris
2017-04-07 4:59 ` Jaehoon Chung
2017-04-07 6:50 ` Shawn Lin
2017-04-07 7:38 ` Jaehoon Chung
2017-04-10 23:35 ` Doug Anderson
2017-04-11 10:21 ` Ulf Hansson
2017-04-11 22:57 ` Doug Anderson
2017-04-12 0:54 ` Shawn Lin
2017-04-12 16:12 ` Doug Anderson
2017-04-13 7:17 ` Ulf Hansson
2017-04-13 15:45 ` Doug Anderson
2017-04-13 8:28 ` Shawn Lin