* About a possibility of long latency to claim host [not found] <CGME20210309041059epcas2p49567b092c8d82bb80aa76ad26f8d212b@epcas2p4.samsung.com> @ 2021-03-09 4:10 ` Kiwoong Kim 2021-03-09 13:07 ` Ulf Hansson 0 siblings, 1 reply; 5+ messages in thread From: Kiwoong Kim @ 2021-03-09 4:10 UTC (permalink / raw) To: Ulf Hansson, linux-mmc Dear All I want to discuss about this topic with you guys. There is an application to put many IO requests to a SD card right after completion of system resume. Current MMC stack invokes mmc_rescan asynchronously for PM_POST_SUSPEND. As reported to me, there could be a race between an IO thread and the kworker for mmc_rescan, especially when the application mentioned before is installed and the function of mmc_rescan is run later than expected For a series of IO requests, mmc_rescan, particularily at __mmc_claim_host called in mmc_sd_detect, didn't acquire a host for longer than expected. Below is the call stacks shown after the symptom happened and the system tried to enter into suspend again. In this case, mmc pm notifier is called with PM_SUSPEND_PREPARE, so it waits for completion or cancelling the work for mmc_rescan. For the latency, mobile users can see black screen for a long time sometimes, even with pushing a power button to wake up the system. If there is another way to fix it w/o any change, please let me know. Or I want to post a patch not to do _mmc_detect_change conditionally because it helps remove the race and the case assumes SD card initialization is already done. -- <6>[ 3965.259816] I[0: swapper/0: 0] 22631 0 227 3902403994884 D(2) 3 ffffffc0007ee338 0 ffffffc85678e000 * kworker/u16:4 [__mmc_claim_host] <6>[ 3965.259839] I[0: swapper/0: 0] Workqueue: kmmcd mmc_rescan <0>[ 3965.259854] I[0: swapper/0: 0] Call trace: <4>[ 3965.259870] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 <4>[ 3965.259886] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c <4>[ 3965.259902] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c <4>[ 3965.259919] I[0: swapper/0: 0] [<ffffffc0007ee338>] __mmc_claim_host+0xb8/0x188 <4>[ 3965.259937] I[0: swapper/0: 0] [<ffffffc0007eebfc>] mmc_get_card+0x24/0x38 <4>[ 3965.259956] I[0: swapper/0: 0] [<ffffffc0007f5c44>] mmc_sd_detect+0x90/0xe8 <4>[ 3965.259973] I[0: swapper/0: 0] [<ffffffc0007f08d8>] mmc_rescan+0xa0/0x2ec <4>[ 3965.259991] I[0: swapper/0: 0] [<ffffffc0001688e0>] process_one_work+0x26c/0x434 <4>[ 3965.260008] I[0: swapper/0: 0] [<ffffffc000168dc8>] worker_thread+0x320/0x444 <4>[ 3965.260026] I[0: swapper/0: 0] [<ffffffc00016dac0>] kthread+0xdc/0xec .. <0>[ 3965.259854] I[0: swapper/0: 0] Call trace: <4>[ 3965.259870] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 <4>[ 3965.259886] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c <4>[ 3965.259902] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c <4>[ 3965.259919] I[0: swapper/0: 0] [<ffffffc0007ee338>] __mmc_claim_host+0xb8/0x188 <4>[ 3965.259937] I[0: swapper/0: 0] [<ffffffc0007eebfc>] mmc_get_card+0x24/0x38 <4>[ 3965.259956] I[0: swapper/0: 0] [<ffffffc0007f5c44>] mmc_sd_detect+0x90/0xe8 <4>[ 3965.259973] I[0: swapper/0: 0] [<ffffffc0007f08d8>] mmc_rescan+0xa0/0x2ec <4>[ 3965.259991] I[0: swapper/0: 0] [<ffffffc0001688e0>] process_one_work+0x26c/0x434 <4>[ 3965.260008] I[0: swapper/0: 0] [<ffffffc000168dc8>] worker_thread+0x320/0x444 <4>[ 3965.260026] I[0: swapper/0: 0] [<ffffffc00016dac0>] kthread+0xdc/0xec .. <0>[ 3965.203379] I[0: swapper/0: 0] Call trace: <4>[ 3965.203397] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 <4>[ 3965.203414] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c <4>[ 3965.203430] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c <4>[ 3965.203448] I[0: swapper/0: 0] [<ffffffc000b160e8>] schedule_timeout+0x28/0x214 <4>[ 3965.203464] I[0: swapper/0: 0] [<ffffffc000b145a0>] wait_for_common+0x114/0x15c <4>[ 3965.203482] I[0: swapper/0: 0] [<ffffffc000b14710>] wait_for_completion+0x10/0x20 <4>[ 3965.203498] I[0: swapper/0: 0] [<ffffffc000169990>] flush_work+0xf0/0x11c <4>[ 3965.203516] I[0: swapper/0: 0] [<ffffffc000169ad0>] __cancel_work_timer+0x114/0x1a0 <4>[ 3965.203533] I[0: swapper/0: 0] [<ffffffc000169c44>] cancel_delayed_work_sync+0xc/0x1c <4>[ 3965.203554] I[0: swapper/0: 0] [<ffffffc0007f1000>] mmc_pm_notify+0x78/0x12c <4>[ 3965.203573] I[0: swapper/0: 0] [<ffffffc00016ebf4>] notifier_call_chain+0x7c/0xec <4>[ 3965.203590] I[0: swapper/0: 0] [<ffffffc00016ef08>] __blocking_notifier_call_chain+0x44/0x68 <4>[ 3965.203608] I[0: swapper/0: 0] [<ffffffc00016ef3c>] blocking_notifier_call_chain+0x10/0x20 <4>[ 3965.203628] I[0: swapper/0: 0] [<ffffffc0001964ec>] pm_notifier_call_chain+0x1c/0x44 <4>[ 3965.203644] I[0: swapper/0: 0] [<ffffffc000197ad0>] pm_suspend+0x2e0/0x5fc <4>[ 3965.203662] I[0: swapper/0: 0] [<ffffffc0001961cc>] state_store+0x4c/0x8c <4>[ 3965.203683] I[0: swapper/0: 0] [<ffffffc0003f35ec>] kobj_attr_store+0x14/0x2c <4>[ 3965.203703] I[0: swapper/0: 0] [<ffffffc0002aac2c>] sysfs_kf_write+0x40/0x50 <4>[ 3965.203721] I[0: swapper/0: 0] [<ffffffc0002aa000>] kernfs_fop_write+0xfc/0x148 <4>[ 3965.203739] I[0: swapper/0: 0] [<ffffffc000249b4c>] vfs_write+0xb8/0x190 <4>[ 3965.203755] I[0: swapper/0: 0] [<ffffffc000249cf0>] SyS_write+0x44/0x88 Thanks. Kiwoong Kim ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: About a possibility of long latency to claim host 2021-03-09 4:10 ` About a possibility of long latency to claim host Kiwoong Kim @ 2021-03-09 13:07 ` Ulf Hansson 2021-03-09 17:21 ` Ulf Hansson 2021-03-10 3:55 ` Kiwoong Kim 0 siblings, 2 replies; 5+ messages in thread From: Ulf Hansson @ 2021-03-09 13:07 UTC (permalink / raw) To: Kiwoong Kim; +Cc: linux-mmc@vger.kernel.org On Tue, 9 Mar 2021 at 05:11, Kiwoong Kim <kwmad.kim@samsung.com> wrote: > > Dear All > > I want to discuss about this topic with you guys. > > There is an application to put many IO requests to a SD card right after completion of system resume. > Current MMC stack invokes mmc_rescan asynchronously for PM_POST_SUSPEND. > As reported to me, there could be a race between an IO thread and the kworker for mmc_rescan, > especially when the application mentioned before is installed and the function of mmc_rescan is run later than expected > For a series of IO requests, mmc_rescan, particularily at __mmc_claim_host called in mmc_sd_detect, didn't acquire a host for longer than expected. That's not a problem, in general. As long as the card is inserted and functional, the I/O operations should be completed successfully. It doesn't matter if mmc_rescan() is waiting to claim the host, as it's not important that it gets to run as long as the card remains inserted/functional. > > Below is the call stacks shown after the symptom happened and the system tried to enter into suspend again. > In this case, mmc pm notifier is called with PM_SUSPEND_PREPARE, so it waits for completion or cancelling the work for mmc_rescan. > For the latency, mobile users can see black screen for a long time sometimes, even with pushing a power button to wake up the system. I think I understand what you are saying, but please correct me if I am wrong. The I/O requests keep flowing into the blk queue even after PM_SUSPEND_PREPARE has been fired, thus preventing the earlier scheduled mmc_rescan() from claiming the host? This sounds quite reasonable that it could happen, at least theoretically. Although, I am a bit surprised that nobody has reported about this problem, until now. The design in the mmc core, has remained unchanged in regards to this behaviour, for a very very long time. Let me try to reproduce the problem, I will get back to you soon. In the meantime, I would also appreciate it if you could share, more exactly, how to trigger this problem at your side. > > If there is another way to fix it w/o any change, please let me know. > Or I want to post a patch not to do _mmc_detect_change conditionally > because it helps remove the race and the case assumes SD card initialization is already done. Unfortunately, it's not as easy as that, because we need to detect whether a removable card has been removed after the system has resumed. There are other ways to solve this, but allow me to think a little more about it and get back to you. > > -- > <6>[ 3965.259816] I[0: swapper/0: 0] 22631 0 227 3902403994884 D(2) 3 ffffffc0007ee338 0 ffffffc85678e000 * kworker/u16:4 [__mmc_claim_host] > <6>[ 3965.259839] I[0: swapper/0: 0] Workqueue: kmmcd mmc_rescan > <0>[ 3965.259854] I[0: swapper/0: 0] Call trace: > <4>[ 3965.259870] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 > <4>[ 3965.259886] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c > <4>[ 3965.259902] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c > <4>[ 3965.259919] I[0: swapper/0: 0] [<ffffffc0007ee338>] __mmc_claim_host+0xb8/0x188 > <4>[ 3965.259937] I[0: swapper/0: 0] [<ffffffc0007eebfc>] mmc_get_card+0x24/0x38 > <4>[ 3965.259956] I[0: swapper/0: 0] [<ffffffc0007f5c44>] mmc_sd_detect+0x90/0xe8 > <4>[ 3965.259973] I[0: swapper/0: 0] [<ffffffc0007f08d8>] mmc_rescan+0xa0/0x2ec > <4>[ 3965.259991] I[0: swapper/0: 0] [<ffffffc0001688e0>] process_one_work+0x26c/0x434 > <4>[ 3965.260008] I[0: swapper/0: 0] [<ffffffc000168dc8>] worker_thread+0x320/0x444 > <4>[ 3965.260026] I[0: swapper/0: 0] [<ffffffc00016dac0>] kthread+0xdc/0xec > .. > <0>[ 3965.259854] I[0: swapper/0: 0] Call trace: > <4>[ 3965.259870] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 > <4>[ 3965.259886] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c > <4>[ 3965.259902] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c > <4>[ 3965.259919] I[0: swapper/0: 0] [<ffffffc0007ee338>] __mmc_claim_host+0xb8/0x188 > <4>[ 3965.259937] I[0: swapper/0: 0] [<ffffffc0007eebfc>] mmc_get_card+0x24/0x38 > <4>[ 3965.259956] I[0: swapper/0: 0] [<ffffffc0007f5c44>] mmc_sd_detect+0x90/0xe8 > <4>[ 3965.259973] I[0: swapper/0: 0] [<ffffffc0007f08d8>] mmc_rescan+0xa0/0x2ec > <4>[ 3965.259991] I[0: swapper/0: 0] [<ffffffc0001688e0>] process_one_work+0x26c/0x434 > <4>[ 3965.260008] I[0: swapper/0: 0] [<ffffffc000168dc8>] worker_thread+0x320/0x444 > <4>[ 3965.260026] I[0: swapper/0: 0] [<ffffffc00016dac0>] kthread+0xdc/0xec > .. > <0>[ 3965.203379] I[0: swapper/0: 0] Call trace: > <4>[ 3965.203397] I[0: swapper/0: 0] [<ffffffc00013e674>] __switch_to+0x80/0x90 > <4>[ 3965.203414] I[0: swapper/0: 0] [<ffffffc000b137f4>] __schedule+0x530/0x76c > <4>[ 3965.203430] I[0: swapper/0: 0] [<ffffffc000b13a9c>] schedule+0x6c/0x7c > <4>[ 3965.203448] I[0: swapper/0: 0] [<ffffffc000b160e8>] schedule_timeout+0x28/0x214 > <4>[ 3965.203464] I[0: swapper/0: 0] [<ffffffc000b145a0>] wait_for_common+0x114/0x15c > <4>[ 3965.203482] I[0: swapper/0: 0] [<ffffffc000b14710>] wait_for_completion+0x10/0x20 > <4>[ 3965.203498] I[0: swapper/0: 0] [<ffffffc000169990>] flush_work+0xf0/0x11c > <4>[ 3965.203516] I[0: swapper/0: 0] [<ffffffc000169ad0>] __cancel_work_timer+0x114/0x1a0 > <4>[ 3965.203533] I[0: swapper/0: 0] [<ffffffc000169c44>] cancel_delayed_work_sync+0xc/0x1c > <4>[ 3965.203554] I[0: swapper/0: 0] [<ffffffc0007f1000>] mmc_pm_notify+0x78/0x12c > <4>[ 3965.203573] I[0: swapper/0: 0] [<ffffffc00016ebf4>] notifier_call_chain+0x7c/0xec > <4>[ 3965.203590] I[0: swapper/0: 0] [<ffffffc00016ef08>] __blocking_notifier_call_chain+0x44/0x68 > <4>[ 3965.203608] I[0: swapper/0: 0] [<ffffffc00016ef3c>] blocking_notifier_call_chain+0x10/0x20 > <4>[ 3965.203628] I[0: swapper/0: 0] [<ffffffc0001964ec>] pm_notifier_call_chain+0x1c/0x44 > <4>[ 3965.203644] I[0: swapper/0: 0] [<ffffffc000197ad0>] pm_suspend+0x2e0/0x5fc > <4>[ 3965.203662] I[0: swapper/0: 0] [<ffffffc0001961cc>] state_store+0x4c/0x8c > <4>[ 3965.203683] I[0: swapper/0: 0] [<ffffffc0003f35ec>] kobj_attr_store+0x14/0x2c > <4>[ 3965.203703] I[0: swapper/0: 0] [<ffffffc0002aac2c>] sysfs_kf_write+0x40/0x50 > <4>[ 3965.203721] I[0: swapper/0: 0] [<ffffffc0002aa000>] kernfs_fop_write+0xfc/0x148 > <4>[ 3965.203739] I[0: swapper/0: 0] [<ffffffc000249b4c>] vfs_write+0xb8/0x190 > <4>[ 3965.203755] I[0: swapper/0: 0] [<ffffffc000249cf0>] SyS_write+0x44/0x88 Kind regards Uffe ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: About a possibility of long latency to claim host 2021-03-09 13:07 ` Ulf Hansson @ 2021-03-09 17:21 ` Ulf Hansson 2021-03-10 3:56 ` Kiwoong Kim 2021-03-10 3:55 ` Kiwoong Kim 1 sibling, 1 reply; 5+ messages in thread From: Ulf Hansson @ 2021-03-09 17:21 UTC (permalink / raw) To: Kiwoong Kim; +Cc: linux-mmc@vger.kernel.org On Tue, 9 Mar 2021 at 14:07, Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Tue, 9 Mar 2021 at 05:11, Kiwoong Kim <kwmad.kim@samsung.com> wrote: > > > > Dear All > > > > I want to discuss about this topic with you guys. > > > > There is an application to put many IO requests to a SD card right after completion of system resume. > > Current MMC stack invokes mmc_rescan asynchronously for PM_POST_SUSPEND. > > As reported to me, there could be a race between an IO thread and the kworker for mmc_rescan, > > especially when the application mentioned before is installed and the function of mmc_rescan is run later than expected > > For a series of IO requests, mmc_rescan, particularily at __mmc_claim_host called in mmc_sd_detect, didn't acquire a host for longer than expected. > > That's not a problem, in general. > > As long as the card is inserted and functional, the I/O operations > should be completed successfully. It doesn't matter if mmc_rescan() is > waiting to claim the host, as it's not important that it gets to run > as long as the card remains inserted/functional. > > > > > Below is the call stacks shown after the symptom happened and the system tried to enter into suspend again. > > In this case, mmc pm notifier is called with PM_SUSPEND_PREPARE, so it waits for completion or cancelling the work for mmc_rescan. > > For the latency, mobile users can see black screen for a long time sometimes, even with pushing a power button to wake up the system. > > I think I understand what you are saying, but please correct me if I am wrong. > > The I/O requests keep flowing into the blk queue even after > PM_SUSPEND_PREPARE has been fired, thus preventing the earlier > scheduled mmc_rescan() from claiming the host? > > This sounds quite reasonable that it could happen, at least > theoretically. Although, I am a bit surprised that nobody has reported > about this problem, until now. The design in the mmc core, has > remained unchanged in regards to this behaviour, for a very very long > time. > > Let me try to reproduce the problem, I will get back to you soon. In > the meantime, I would also appreciate it if you could share, more > exactly, how to trigger this problem at your side. I have managed to reproduce the problem! Wow, I wonder how this could have slipped through without anybody reporting about this, until now. So, thanks for bringing this to my attention! My rough guess is that SD cards, used in these kinds of configurations (Android), are actually set to non removable. This would prevent mmc_rescan() from claiming the host. In any case, this needs to be fixed properly. I will continue to work on a solution and get back to you with a patch. [...] Kind regards Uffe ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: About a possibility of long latency to claim host 2021-03-09 17:21 ` Ulf Hansson @ 2021-03-10 3:56 ` Kiwoong Kim 0 siblings, 0 replies; 5+ messages in thread From: Kiwoong Kim @ 2021-03-10 3:56 UTC (permalink / raw) To: 'Ulf Hansson'; +Cc: linux-mmc > On Tue, 9 Mar 2021 at 14:07, Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Tue, 9 Mar 2021 at 05:11, Kiwoong Kim <kwmad.kim@samsung.com> wrote: > > > > > > Dear All > > > > > > I want to discuss about this topic with you guys. > > > > > > There is an application to put many IO requests to a SD card right > after completion of system resume. > > > Current MMC stack invokes mmc_rescan asynchronously for > PM_POST_SUSPEND. > > > As reported to me, there could be a race between an IO thread and > > > the kworker for mmc_rescan, especially when the application > > > mentioned before is installed and the function of mmc_rescan is run > later than expected For a series of IO requests, mmc_rescan, particularily > at __mmc_claim_host called in mmc_sd_detect, didn't acquire a host for > longer than expected. > > > > That's not a problem, in general. > > > > As long as the card is inserted and functional, the I/O operations > > should be completed successfully. It doesn't matter if mmc_rescan() is > > waiting to claim the host, as it's not important that it gets to run > > as long as the card remains inserted/functional. > > > > > > > > Below is the call stacks shown after the symptom happened and the > system tried to enter into suspend again. > > > In this case, mmc pm notifier is called with PM_SUSPEND_PREPARE, so it > waits for completion or cancelling the work for mmc_rescan. > > > For the latency, mobile users can see black screen for a long time > sometimes, even with pushing a power button to wake up the system. > > > > I think I understand what you are saying, but please correct me if I am > wrong. > > > > The I/O requests keep flowing into the blk queue even after > > PM_SUSPEND_PREPARE has been fired, thus preventing the earlier > > scheduled mmc_rescan() from claiming the host? > > > > This sounds quite reasonable that it could happen, at least > > theoretically. Although, I am a bit surprised that nobody has reported > > about this problem, until now. The design in the mmc core, has > > remained unchanged in regards to this behaviour, for a very very long > > time. > > > > Let me try to reproduce the problem, I will get back to you soon. In > > the meantime, I would also appreciate it if you could share, more > > exactly, how to trigger this problem at your side. > > I have managed to reproduce the problem! Wow, I wonder how this could > have slipped through without anybody reporting about this, until now. > So, thanks for bringing this to my attention! > > My rough guess is that SD cards, used in these kinds of configurations > (Android), are actually set to non removable. This would prevent > mmc_rescan() from claiming the host. > > In any case, this needs to be fixed properly. I will continue to work on a > solution and get back to you with a patch. Okay and I'll also thing about its fix. Thanks. Kiwoong Kim > > [...] > > Kind regards > Uffe ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: About a possibility of long latency to claim host 2021-03-09 13:07 ` Ulf Hansson 2021-03-09 17:21 ` Ulf Hansson @ 2021-03-10 3:55 ` Kiwoong Kim 1 sibling, 0 replies; 5+ messages in thread From: Kiwoong Kim @ 2021-03-10 3:55 UTC (permalink / raw) To: 'Ulf Hansson'; +Cc: linux-mmc > > Below is the call stacks shown after the symptom happened and the system > tried to enter into suspend again. > > In this case, mmc pm notifier is called with PM_SUSPEND_PREPARE, so it > waits for completion or cancelling the work for mmc_rescan. > > For the latency, mobile users can see black screen for a long time > sometimes, even with pushing a power button to wake up the system. > > I think I understand what you are saying, but please correct me if I am > wrong. > > The I/O requests keep flowing into the blk queue even after > PM_SUSPEND_PREPARE has been fired, thus preventing the earlier scheduled > mmc_rescan() from claiming the host? Correct. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-03-10 3:57 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20210309041059epcas2p49567b092c8d82bb80aa76ad26f8d212b@epcas2p4.samsung.com>
2021-03-09 4:10 ` About a possibility of long latency to claim host Kiwoong Kim
2021-03-09 13:07 ` Ulf Hansson
2021-03-09 17:21 ` Ulf Hansson
2021-03-10 3:56 ` Kiwoong Kim
2021-03-10 3:55 ` Kiwoong Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox