* Continuous SD IO causes hung task messages
@ 2019-08-27 13:43 Russell King - ARM Linux admin
2019-08-27 13:55 ` Ulf Hansson
0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2019-08-27 13:43 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, linux-mmc
Hi,
While dd'ing the contents of a SD card, I get hung task timeout
messages as per below. However, the dd is making progress. Any
ideas?
Presumably, mmc_rescan doesn't get a look-in while IO is progressing
for the card?
ARM64 host, Macchiatobin, uSD card.
Thanks.
root@arm-d06300000000:~# dd if=/dev/mmcblk1 | md5sum
INFO: task kworker/2:1:52 blocked for more than 120 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 120 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 241 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 362 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Continuous SD IO causes hung task messages 2019-08-27 13:43 Continuous SD IO causes hung task messages Russell King - ARM Linux admin @ 2019-08-27 13:55 ` Ulf Hansson 2019-08-27 14:36 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 6+ messages in thread From: Ulf Hansson @ 2019-08-27 13:55 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > Hi, > > While dd'ing the contents of a SD card, I get hung task timeout > messages as per below. However, the dd is making progress. Any > ideas? > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing > for the card? Is it a regression? There not much of recent mmc core and mmc block changes, that I can think of at this point. > > ARM64 host, Macchiatobin, uSD card. What mmc host driver is it? mmci? Kind regards Uffe > > Thanks. > > root@arm-d06300000000:~# dd if=/dev/mmcblk1 | md5sum > INFO: task kworker/2:1:52 blocked for more than 120 seconds. > Not tainted 5.2.0+ #309 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/2:1 D 0 52 2 0x00000028 > Workqueue: events_freezable mmc_rescan > Call trace: > __switch_to+0xb0/0x198 > __schedule+0x22c/0x604 > schedule+0x38/0xc8 > __mmc_claim_host+0xcc/0x1ec > mmc_get_card+0x30/0x3c > mmc_sd_detect+0x1c/0x78 > mmc_rescan+0x1c4/0x35c > process_one_work+0x14c/0x408 > worker_thread+0x140/0x3f4 > kthread+0xfc/0x128 > ret_from_fork+0x10/0x18 > INFO: task kworker/2:1:52 blocked for more than 120 seconds. > Not tainted 5.2.0+ #309 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/2:1 D 0 52 2 0x00000028 > Workqueue: events_freezable mmc_rescan > Call trace: > __switch_to+0xb0/0x198 > __schedule+0x22c/0x604 > schedule+0x38/0xc8 > __mmc_claim_host+0xcc/0x1ec > mmc_get_card+0x30/0x3c > mmc_sd_detect+0x1c/0x78 > mmc_rescan+0x1c4/0x35c > process_one_work+0x14c/0x408 > worker_thread+0x140/0x3f4 > kthread+0xfc/0x128 > ret_from_fork+0x10/0x18 > INFO: task kworker/2:1:52 blocked for more than 241 seconds. > Not tainted 5.2.0+ #309 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/2:1 D 0 52 2 0x00000028 > Workqueue: events_freezable mmc_rescan > Call trace: > __switch_to+0xb0/0x198 > __schedule+0x22c/0x604 > schedule+0x38/0xc8 > __mmc_claim_host+0xcc/0x1ec > mmc_get_card+0x30/0x3c > mmc_sd_detect+0x1c/0x78 > mmc_rescan+0x1c4/0x35c > process_one_work+0x14c/0x408 > worker_thread+0x140/0x3f4 > kthread+0xfc/0x128 > ret_from_fork+0x10/0x18 > INFO: task kworker/2:1:52 blocked for more than 362 seconds. > Not tainted 5.2.0+ #309 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/2:1 D 0 52 2 0x00000028 > Workqueue: events_freezable mmc_rescan > Call trace: > __switch_to+0xb0/0x198 > __schedule+0x22c/0x604 > schedule+0x38/0xc8 > __mmc_claim_host+0xcc/0x1ec > mmc_get_card+0x30/0x3c > mmc_sd_detect+0x1c/0x78 > mmc_rescan+0x1c4/0x35c > process_one_work+0x14c/0x408 > worker_thread+0x140/0x3f4 > kthread+0xfc/0x128 > ret_from_fork+0x10/0x18 > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up > According to speedtest.net: 11.9Mbps down 500kbps up _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages 2019-08-27 13:55 ` Ulf Hansson @ 2019-08-27 14:36 ` Russell King - ARM Linux admin 2019-08-27 14:52 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 6+ messages in thread From: Russell King - ARM Linux admin @ 2019-08-27 14:36 UTC (permalink / raw) To: Ulf Hansson Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote: > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > Hi, > > > > While dd'ing the contents of a SD card, I get hung task timeout > > messages as per below. However, the dd is making progress. Any > > ideas? > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing > > for the card? > > Is it a regression? > > There not much of recent mmc core and mmc block changes, that I can > think of at this point. No idea - I just repaired the SD socket after the D2 line became disconnected, and decided to run that command as a test. > > ARM64 host, Macchiatobin, uSD card. > > What mmc host driver is it? mmci? sdhci-xenon. I'm just trying with one CPU online, then I'll try with two. My suspicion is that there's a problem in the ARM64 arch code where unlocking a mutex doesn't get noticed on other CPUs. Hmm, I thought I'd try bringing another CPU online, but it seems like the ARM64 CPU hotplug code is broken: [ 3552.029689] CPU1: shutdown [ 3552.031099] psci: CPU1 killed. [ 3949.835212] CPU1: failed to come online [ 3949.837753] CPU1: failed in unknown state : 0x0 which means I can only take CPUs down, I can't bring them back online without rebooting. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages 2019-08-27 14:36 ` Russell King - ARM Linux admin @ 2019-08-27 14:52 ` Russell King - ARM Linux admin 2019-08-27 15:06 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 6+ messages in thread From: Russell King - ARM Linux admin @ 2019-08-27 14:52 UTC (permalink / raw) To: Ulf Hansson Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote: > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote: > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > Hi, > > > > > > While dd'ing the contents of a SD card, I get hung task timeout > > > messages as per below. However, the dd is making progress. Any > > > ideas? > > > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing > > > for the card? > > > > Is it a regression? > > > > There not much of recent mmc core and mmc block changes, that I can > > think of at this point. > > No idea - I just repaired the SD socket after the D2 line became > disconnected, and decided to run that command as a test. > > > > ARM64 host, Macchiatobin, uSD card. > > > > What mmc host driver is it? mmci? > > sdhci-xenon. > > I'm just trying with one CPU online, then I'll try with two. My > suspicion is that there's a problem in the ARM64 arch code where > unlocking a mutex doesn't get noticed on other CPUs. > > Hmm, I thought I'd try bringing another CPU online, but it seems > like the ARM64 CPU hotplug code is broken: > > [ 3552.029689] CPU1: shutdown > [ 3552.031099] psci: CPU1 killed. > [ 3949.835212] CPU1: failed to come online > [ 3949.837753] CPU1: failed in unknown state : 0x0 > > which means I can only take CPUs down, I can't bring them back > online without rebooting. Okay, running on a single CPU shows no problems. Running on four CPUs (as originally) shows that the kworker thread _never_ gets scheduled, so the warning is not false. With three CPUs, same problem. root@arm-d06300000000:~# ps aux | grep ' D ' root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] root@arm-d06300000000:~# cat /proc/34/sched kworker/1:1 (34, #threads: 1) ------------------------------------------------------------------- se.exec_start : 318689.992440 se.vruntime : 37750.882357 se.sum_exec_runtime : 9.421240 se.nr_migrations : 0 nr_switches : 1174 nr_voluntary_switches : 1171 nr_involuntary_switches : 3 se.load.weight : 1048576 se.runnable_weight : 1048576 se.avg.load_sum : 6 se.avg.runnable_load_sum : 6 se.avg.util_sum : 5170 se.avg.load_avg : 0 se.avg.runnable_load_avg : 0 se.avg.util_avg : 0 se.avg.last_update_time : 318689991680 se.avg.util_est.ewma : 10 se.avg.util_est.enqueued : 0 policy : 0 prio : 120 clock-delta : 0 The only thing that changes there is "clock-delta". When I kill the dd, I get: root@arm-d06300000000:~# cat /proc/34/sched kworker/1:1 (34, #threads: 1) ------------------------------------------------------------------- se.exec_start : 574025.791680 se.vruntime : 79996.657300 se.sum_exec_runtime : 10.916400 se.nr_migrations : 0 nr_switches : 1403 nr_voluntary_switches : 1400 nr_involuntary_switches : 3 se.load.weight : 1048576 se.runnable_weight : 1048576 se.avg.load_sum : 15 se.avg.runnable_load_sum : 15 se.avg.util_sum : 15007 se.avg.load_avg : 0 se.avg.runnable_load_avg : 0 se.avg.util_avg : 0 se.avg.last_update_time : 574025791488 se.avg.util_est.ewma : 10 se.avg.util_est.enqueued : 0 policy : 0 prio : 120 clock-delta : 40 so the thread makes forward progress. Down to two CPUs: root@arm-d06300000000:~# ps aux | grep ' D ' root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] Same symptoms. dd and md5sum switch between CPU 0 and CPU1. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages 2019-08-27 14:52 ` Russell King - ARM Linux admin @ 2019-08-27 15:06 ` Russell King - ARM Linux admin 2019-08-28 10:31 ` Ulf Hansson 0 siblings, 1 reply; 6+ messages in thread From: Russell King - ARM Linux admin @ 2019-08-27 15:06 UTC (permalink / raw) To: Ulf Hansson Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM On Tue, Aug 27, 2019 at 03:52:17PM +0100, Russell King - ARM Linux admin wrote: > On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote: > > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote: > > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > Hi, > > > > > > > > While dd'ing the contents of a SD card, I get hung task timeout > > > > messages as per below. However, the dd is making progress. Any > > > > ideas? > > > > > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing > > > > for the card? > > > > > > Is it a regression? > > > > > > There not much of recent mmc core and mmc block changes, that I can > > > think of at this point. > > > > No idea - I just repaired the SD socket after the D2 line became > > disconnected, and decided to run that command as a test. > > > > > > ARM64 host, Macchiatobin, uSD card. > > > > > > What mmc host driver is it? mmci? > > > > sdhci-xenon. > > > > I'm just trying with one CPU online, then I'll try with two. My > > suspicion is that there's a problem in the ARM64 arch code where > > unlocking a mutex doesn't get noticed on other CPUs. > > > > Hmm, I thought I'd try bringing another CPU online, but it seems > > like the ARM64 CPU hotplug code is broken: > > > > [ 3552.029689] CPU1: shutdown > > [ 3552.031099] psci: CPU1 killed. > > [ 3949.835212] CPU1: failed to come online > > [ 3949.837753] CPU1: failed in unknown state : 0x0 > > > > which means I can only take CPUs down, I can't bring them back > > online without rebooting. > > Okay, running on a single CPU shows no problems. > > Running on four CPUs (as originally) shows that the kworker thread > _never_ gets scheduled, so the warning is not false. > > With three CPUs, same problem. > > root@arm-d06300000000:~# ps aux | grep ' D ' > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] > root@arm-d06300000000:~# cat /proc/34/sched > kworker/1:1 (34, #threads: 1) > ------------------------------------------------------------------- > se.exec_start : 318689.992440 > se.vruntime : 37750.882357 > se.sum_exec_runtime : 9.421240 > se.nr_migrations : 0 > nr_switches : 1174 > nr_voluntary_switches : 1171 > nr_involuntary_switches : 3 > se.load.weight : 1048576 > se.runnable_weight : 1048576 > se.avg.load_sum : 6 > se.avg.runnable_load_sum : 6 > se.avg.util_sum : 5170 > se.avg.load_avg : 0 > se.avg.runnable_load_avg : 0 > se.avg.util_avg : 0 > se.avg.last_update_time : 318689991680 > se.avg.util_est.ewma : 10 > se.avg.util_est.enqueued : 0 > policy : 0 > prio : 120 > clock-delta : 0 > > The only thing that changes there is "clock-delta". When I kill the > dd, I get: > > root@arm-d06300000000:~# cat /proc/34/sched > kworker/1:1 (34, #threads: 1) > ------------------------------------------------------------------- > se.exec_start : 574025.791680 > se.vruntime : 79996.657300 > se.sum_exec_runtime : 10.916400 > se.nr_migrations : 0 > nr_switches : 1403 > nr_voluntary_switches : 1400 > nr_involuntary_switches : 3 > se.load.weight : 1048576 > se.runnable_weight : 1048576 > se.avg.load_sum : 15 > se.avg.runnable_load_sum : 15 > se.avg.util_sum : 15007 > se.avg.load_avg : 0 > se.avg.runnable_load_avg : 0 > se.avg.util_avg : 0 > se.avg.last_update_time : 574025791488 > se.avg.util_est.ewma : 10 > se.avg.util_est.enqueued : 0 > policy : 0 > prio : 120 > clock-delta : 40 > > so the thread makes forward progress. > > Down to two CPUs: > > root@arm-d06300000000:~# ps aux | grep ' D ' > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] > > Same symptoms. dd and md5sum switch between CPU 0 and CPU1. Hmm. static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) mq->in_flight[issue_type] += 1; get_card = (mmc_tot_in_flight(mq) == 1); if (get_card) mmc_get_card(card, &mq->ctx); mmc_get_card() gets the host lock according to the card. So, if we always have requests in flight (which is probably the case here) we never drop the host lock, and mmc_rescan() never gets a look in - hence blocking the kworker. So this is a real issue with MMC, and not down to something in the arch. I suspect the reason that single-CPU doesn't show it is because it is unable to keep multiple requests in flight. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages 2019-08-27 15:06 ` Russell King - ARM Linux admin @ 2019-08-28 10:31 ` Ulf Hansson 0 siblings, 0 replies; 6+ messages in thread From: Ulf Hansson @ 2019-08-28 10:31 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM On Tue, 27 Aug 2019 at 17:06, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Tue, Aug 27, 2019 at 03:52:17PM +0100, Russell King - ARM Linux admin wrote: > > On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote: > > > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote: > > > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > Hi, > > > > > > > > > > While dd'ing the contents of a SD card, I get hung task timeout > > > > > messages as per below. However, the dd is making progress. Any > > > > > ideas? > > > > > > > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing > > > > > for the card? > > > > > > > > Is it a regression? > > > > > > > > There not much of recent mmc core and mmc block changes, that I can > > > > think of at this point. > > > > > > No idea - I just repaired the SD socket after the D2 line became > > > disconnected, and decided to run that command as a test. > > > > > > > > ARM64 host, Macchiatobin, uSD card. > > > > > > > > What mmc host driver is it? mmci? > > > > > > sdhci-xenon. > > > > > > I'm just trying with one CPU online, then I'll try with two. My > > > suspicion is that there's a problem in the ARM64 arch code where > > > unlocking a mutex doesn't get noticed on other CPUs. > > > > > > Hmm, I thought I'd try bringing another CPU online, but it seems > > > like the ARM64 CPU hotplug code is broken: > > > > > > [ 3552.029689] CPU1: shutdown > > > [ 3552.031099] psci: CPU1 killed. > > > [ 3949.835212] CPU1: failed to come online > > > [ 3949.837753] CPU1: failed in unknown state : 0x0 > > > > > > which means I can only take CPUs down, I can't bring them back > > > online without rebooting. > > > > Okay, running on a single CPU shows no problems. > > > > Running on four CPUs (as originally) shows that the kworker thread > > _never_ gets scheduled, so the warning is not false. > > > > With three CPUs, same problem. > > > > root@arm-d06300000000:~# ps aux | grep ' D ' > > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] > > root@arm-d06300000000:~# cat /proc/34/sched > > kworker/1:1 (34, #threads: 1) > > ------------------------------------------------------------------- > > se.exec_start : 318689.992440 > > se.vruntime : 37750.882357 > > se.sum_exec_runtime : 9.421240 > > se.nr_migrations : 0 > > nr_switches : 1174 > > nr_voluntary_switches : 1171 > > nr_involuntary_switches : 3 > > se.load.weight : 1048576 > > se.runnable_weight : 1048576 > > se.avg.load_sum : 6 > > se.avg.runnable_load_sum : 6 > > se.avg.util_sum : 5170 > > se.avg.load_avg : 0 > > se.avg.runnable_load_avg : 0 > > se.avg.util_avg : 0 > > se.avg.last_update_time : 318689991680 > > se.avg.util_est.ewma : 10 > > se.avg.util_est.enqueued : 0 > > policy : 0 > > prio : 120 > > clock-delta : 0 > > > > The only thing that changes there is "clock-delta". When I kill the > > dd, I get: > > > > root@arm-d06300000000:~# cat /proc/34/sched > > kworker/1:1 (34, #threads: 1) > > ------------------------------------------------------------------- > > se.exec_start : 574025.791680 > > se.vruntime : 79996.657300 > > se.sum_exec_runtime : 10.916400 > > se.nr_migrations : 0 > > nr_switches : 1403 > > nr_voluntary_switches : 1400 > > nr_involuntary_switches : 3 > > se.load.weight : 1048576 > > se.runnable_weight : 1048576 > > se.avg.load_sum : 15 > > se.avg.runnable_load_sum : 15 > > se.avg.util_sum : 15007 > > se.avg.load_avg : 0 > > se.avg.runnable_load_avg : 0 > > se.avg.util_avg : 0 > > se.avg.last_update_time : 574025791488 > > se.avg.util_est.ewma : 10 > > se.avg.util_est.enqueued : 0 > > policy : 0 > > prio : 120 > > clock-delta : 40 > > > > so the thread makes forward progress. > > > > Down to two CPUs: > > > > root@arm-d06300000000:~# ps aux | grep ' D ' > > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable] > > > > Same symptoms. dd and md5sum switch between CPU 0 and CPU1. > > Hmm. > > static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx, > const struct blk_mq_queue_data *bd) > > mq->in_flight[issue_type] += 1; > get_card = (mmc_tot_in_flight(mq) == 1); > > if (get_card) > mmc_get_card(card, &mq->ctx); > > mmc_get_card() gets the host lock according to the card. > > So, if we always have requests in flight (which is probably the case > here) we never drop the host lock, and mmc_rescan() never gets a look > in - hence blocking the kworker. Ahh, you are right. However, this isn't a new problem I believe. Even if we did some re-work of the locking mechanism while converting to blk-mq, I still think the worker could starve the mmc_rescan work before. In practice this shouldn't be a problem though, unless I am overlooking something. This is because it's not until there is an I/O error, that causes the block worker to release the host, to it makes sense to let mmc_rescan to claim the host to check for card removal. > > So this is a real issue with MMC, and not down to something in the > arch. Yep, thanks for running the test and providing more details! > > I suspect the reason that single-CPU doesn't show it is because it is > unable to keep multiple requests in flight. Yes, most likely. Now, how to solve this problem I need to think more about.... FYI: The long term goal has been to try to remove the big fat host lock altogether and slowly we have moved more an more things to be executed as a part of the block worker, which is one of the needed steps. Like the mmc ioctls for example... Kind regards Uffe _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-08-28 10:32 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-08-27 13:43 Continuous SD IO causes hung task messages Russell King - ARM Linux admin 2019-08-27 13:55 ` Ulf Hansson 2019-08-27 14:36 ` Russell King - ARM Linux admin 2019-08-27 14:52 ` Russell King - ARM Linux admin 2019-08-27 15:06 ` Russell King - ARM Linux admin 2019-08-28 10:31 ` Ulf Hansson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).