* [PATCH] drivercore: deferral race condition fix
@ 2014-02-26 7:06 Peter Ujfalusi
2014-03-01 0:33 ` Greg Kroah-Hartman
0 siblings, 1 reply; 7+ messages in thread
From: Peter Ujfalusi @ 2014-02-26 7:06 UTC (permalink / raw)
To: Greg Kroah-Hartman, Grant Likely; +Cc: linux-kernel, Liam Girdwood, Mark Brown
When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
when all modules are loaded but some driver still stuck in the deferred list
and there is a need for external event to kick the deferred queue to probe
these drivers.
The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
audio support built as modules and using nfsroot for root filesystem.
The following fragment of a log shows such sequence when all audio modules
were loaded but the sound card is not present since the machine driver has
failed to probe due to missing dependency during it's probe.
The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
machine driver:
...
[ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
[ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
[ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
[ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
[ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
[ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
[ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
[ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
[ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
[ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
[ 13.346755] davinci_mcasp_driver_init: LEAVE
[ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
[ 13.592527] platform sound.3: really_probe: probe_count = 0
In the log the machine driver enters it's probe at 12.719969 (this point it
has been removed from the deferred lists). McASP driver already executing
it's probing (12.615118) and finishes first as well.
The machine driver tries to construct the sound card (12.950839) but did
not found one of the components so it fails. After this McASP driver
registers all the ASoC components and the deferred work is prepared at
13.099026 (note that this time the machine driver is not in the lists so it
is not going to be handled when the work is executing).
Lastly the machine driver exit from it's probe and the core places it to the
deferred list but there will be no other driver going to load and the
deferred queue is not going to be kicked again - till we have external event
like connecting USB stick, etc.
The proposed solution is to try the deferred queue once more when the last
driver is asking for deferring and we had drivers loaded while this last
driver was probing.
This way we can avoid drivers stuck in the deferred queue.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
Hi Greg, Grant,
I have fixed up the commit message and rebased the patch on top of 3.14-rc4
since the RFC version [1].
[1] https://lkml.org/lkml/2013/12/23/142
Regards,
Peter
drivers/base/dd.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06051767393f..80703de6e6ad 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,6 +53,10 @@ static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
static struct workqueue_struct *deferred_wq;
+static atomic_t probe_count = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
+static bool deferral_retry;
+
/**
* deferred_probe_work_func() - Retry probing devices in the active list.
*/
@@ -141,6 +145,11 @@ static void driver_deferred_probe_trigger(void)
if (!driver_deferred_probe_enable)
return;
+ if (atomic_read(&probe_count) > 1)
+ deferral_retry = true;
+ else
+ deferral_retry = false;
+
/*
* A successful probe means that all the devices in the pending list
* should be triggered to be reprobed. Move all the deferred devices
@@ -259,9 +268,6 @@ int device_bind_driver(struct device *dev)
}
EXPORT_SYMBOL_GPL(device_bind_driver);
-static atomic_t probe_count = ATOMIC_INIT(0);
-static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
-
static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = 0;
@@ -310,6 +316,16 @@ probe_failed:
/* Driver requested deferred probing */
dev_info(dev, "Driver %s requests probe deferral\n", drv->name);
driver_deferred_probe_add(dev);
+ /*
+ * This is the last driver to load and asking to be deferred.
+ * If other driver(s) loaded while this driver was loading, we
+ * should try the deferred modules again to avoid missing
+ * dependency for this driver.
+ */
+ if (atomic_read(&probe_count) == 1 && deferral_retry) {
+ deferral_retry = false;
+ driver_deferred_probe_trigger();
+ }
} else if (ret != -ENODEV && ret != -ENXIO) {
/* driver matched but the probe failed */
printk(KERN_WARNING
--
1.9.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] drivercore: deferral race condition fix
2014-02-26 7:06 [PATCH] drivercore: deferral race condition fix Peter Ujfalusi
@ 2014-03-01 0:33 ` Greg Kroah-Hartman
2014-03-03 8:26 ` Peter Ujfalusi
0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2014-03-01 0:33 UTC (permalink / raw)
To: Peter Ujfalusi; +Cc: Grant Likely, linux-kernel, Liam Girdwood, Mark Brown
On Wed, Feb 26, 2014 at 09:06:54AM +0200, Peter Ujfalusi wrote:
> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
> when all modules are loaded but some driver still stuck in the deferred list
> and there is a need for external event to kick the deferred queue to probe
> these drivers.
>
> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
> audio support built as modules and using nfsroot for root filesystem.
>
> The following fragment of a log shows such sequence when all audio modules
> were loaded but the sound card is not present since the machine driver has
> failed to probe due to missing dependency during it's probe.
> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
> machine driver:
>
> ...
> [ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
> [ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
> [ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
> [ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
> [ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
> [ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
> [ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
> [ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
> [ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
> [ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
> [ 13.346755] davinci_mcasp_driver_init: LEAVE
> [ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
> [ 13.592527] platform sound.3: really_probe: probe_count = 0
>
> In the log the machine driver enters it's probe at 12.719969 (this point it
> has been removed from the deferred lists). McASP driver already executing
> it's probing (12.615118) and finishes first as well.
> The machine driver tries to construct the sound card (12.950839) but did
> not found one of the components so it fails. After this McASP driver
> registers all the ASoC components and the deferred work is prepared at
> 13.099026 (note that this time the machine driver is not in the lists so it
> is not going to be handled when the work is executing).
> Lastly the machine driver exit from it's probe and the core places it to the
> deferred list but there will be no other driver going to load and the
> deferred queue is not going to be kicked again - till we have external event
> like connecting USB stick, etc.
>
> The proposed solution is to try the deferred queue once more when the last
> driver is asking for deferring and we had drivers loaded while this last
> driver was probing.
"once more"? What happens if we get a new driver in when that one is
being probed?
It sounds like there's a race condition here somewhere, or improper
locking going on, just "let's try it again" doesn't sound like the
correct fix to me, does it to you?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drivercore: deferral race condition fix
2014-03-01 0:33 ` Greg Kroah-Hartman
@ 2014-03-03 8:26 ` Peter Ujfalusi
2014-03-04 4:56 ` Mark Brown
0 siblings, 1 reply; 7+ messages in thread
From: Peter Ujfalusi @ 2014-03-03 8:26 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: Grant Likely, linux-kernel, Liam Girdwood, Mark Brown
On 03/01/2014 02:33 AM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2014 at 09:06:54AM +0200, Peter Ujfalusi wrote:
>> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
>> when all modules are loaded but some driver still stuck in the deferred list
>> and there is a need for external event to kick the deferred queue to probe
>> these drivers.
>>
>> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
>> audio support built as modules and using nfsroot for root filesystem.
>>
>> The following fragment of a log shows such sequence when all audio modules
>> were loaded but the sound card is not present since the machine driver has
>> failed to probe due to missing dependency during it's probe.
>> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
>> machine driver:
>>
>> ...
>> [ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
>> [ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
>> [ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
>> [ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
>> [ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
>> [ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
>> [ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
>> [ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
>> [ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
>> [ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
>> [ 13.346755] davinci_mcasp_driver_init: LEAVE
>> [ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
>> [ 13.592527] platform sound.3: really_probe: probe_count = 0
>>
>> In the log the machine driver enters it's probe at 12.719969 (this point it
>> has been removed from the deferred lists). McASP driver already executing
>> it's probing (12.615118) and finishes first as well.
>> The machine driver tries to construct the sound card (12.950839) but did
>> not found one of the components so it fails. After this McASP driver
>> registers all the ASoC components and the deferred work is prepared at
>> 13.099026 (note that this time the machine driver is not in the lists so it
>> is not going to be handled when the work is executing).
>> Lastly the machine driver exit from it's probe and the core places it to the
>> deferred list but there will be no other driver going to load and the
>> deferred queue is not going to be kicked again - till we have external event
>> like connecting USB stick, etc.
>>
>> The proposed solution is to try the deferred queue once more when the last
>> driver is asking for deferring and we had drivers loaded while this last
>> driver was probing.
>
> "once more"? What happens if we get a new driver in when that one is
> being probed?
>
> It sounds like there's a race condition here somewhere, or improper
> locking going on, just "let's try it again" doesn't sound like the
> correct fix to me, does it to you?
The whole idea of deferred_probe is to "let's try it again". The issue
surfaces under certain conditions (main factors being CONFIG_PREEMPT + nfsroot
+ single core vs multi core?) so this could be a race condition or it could be
just how things are.
After the first driver asks to be deferred we for sure have two paths from
where the drivers are probed: still not probed drivers and drivers from the
deferred probe list.
At this point nothing stops drivers to be probed in parallel and it is just a
matter of timing if we hit the state which I have been experiencing.
In this case we have two driver processing their probe and we do not yet know
if one of them going to be asking to be deferred so neither of them should be
in the deferred list.
So we have an empty deferred list and two drivers probing in parallel. driver2
needs driver1 to be loaded:
driver1_probe(..)
{
driver1_setup();
register_me_to_somewhere();
return 0;
}
driver2_probe(..)
{
driver2_setup();
if(!is_driver1_there()) {
driver2_cleanup();
return -EPROBE_DEFER;
}
return 0;
}
It is perfectly normal that these are running in parallel. driver2 checks if
driver1 is loaded already but driver1 still inside of it's driver1_setup()
function. driver2 starts to clean up before returning with -EPROBE_DEFER
meanwhile driver1 registers itself and it is now loaded. In dd.c the deferred
probe list is empty (the deferred work has nothing to do) since driver1 has
just finished with success and driver2 not yet returned from it's probe.
Now driver2 finished the cleanup and returned with -EPROBE_DEFER, it is placed
to the deferred list. And that's where we need an external event (USB device
plug) which will kick the deferred logic to probe the drivers in the deferred
list.
I think it is correct to detect this situation without the need to have non
related drivers to be probed.
The patch is doing this exactly: detects if we had successful parallel driver
probe(s) while another driver was probing which ends up requesting to be
deferred. We only try the deferred list again if this condition has been
detected, we do not loop on the deferred list, we do not try the list again if
there were no other drivers loaded since nothing happened which could satisfy
the driver asking to be deferred.
--
Péter
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] drivercore: deferral race condition fix
2014-03-03 8:26 ` Peter Ujfalusi
@ 2014-03-04 4:56 ` Mark Brown
2014-04-02 6:38 ` Peter Ujfalusi
0 siblings, 1 reply; 7+ messages in thread
From: Mark Brown @ 2014-03-04 4:56 UTC (permalink / raw)
To: Peter Ujfalusi
Cc: Greg Kroah-Hartman, Grant Likely, linux-kernel, Liam Girdwood
[-- Attachment #1: Type: text/plain, Size: 836 bytes --]
On Mon, Mar 03, 2014 at 10:26:59AM +0200, Peter Ujfalusi wrote:
> I think it is correct to detect this situation without the need to have non
> related drivers to be probed.
> The patch is doing this exactly: detects if we had successful parallel driver
> probe(s) while another driver was probing which ends up requesting to be
> deferred. We only try the deferred list again if this condition has been
> detected, we do not loop on the deferred list, we do not try the list again if
> there were no other drivers loaded since nothing happened which could satisfy
> the driver asking to be deferred.
It's certainly the simplest approach I can think of - anything else
would seem to involve looking to see if we're running deferred probes
and trying to add things to the list while that's going on which seems
like it might be hairy.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drivercore: deferral race condition fix
2014-03-04 4:56 ` Mark Brown
@ 2014-04-02 6:38 ` Peter Ujfalusi
2014-04-02 23:18 ` Greg Kroah-Hartman
0 siblings, 1 reply; 7+ messages in thread
From: Peter Ujfalusi @ 2014-04-02 6:38 UTC (permalink / raw)
To: Mark Brown; +Cc: Greg Kroah-Hartman, Grant Likely, linux-kernel, Liam Girdwood
Hi Greg,
On 03/04/2014 06:56 AM, Mark Brown wrote:
> On Mon, Mar 03, 2014 at 10:26:59AM +0200, Peter Ujfalusi wrote:
>
>> I think it is correct to detect this situation without the need to have non
>> related drivers to be probed.
>> The patch is doing this exactly: detects if we had successful parallel driver
>> probe(s) while another driver was probing which ends up requesting to be
>> deferred. We only try the deferred list again if this condition has been
>> detected, we do not loop on the deferred list, we do not try the list again if
>> there were no other drivers loaded since nothing happened which could satisfy
>> the driver asking to be deferred.
>
> It's certainly the simplest approach I can think of - anything else
> would seem to involve looking to see if we're running deferred probes
> and trying to add things to the list while that's going on which seems
> like it might be hairy.
Do you want me to resend this patch in hope that it is going to be taken or do
you have other method in mind to deal with the situation I have described and
fixed with this patch?
Regards,
Péter
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drivercore: deferral race condition fix
2014-04-02 6:38 ` Peter Ujfalusi
@ 2014-04-02 23:18 ` Greg Kroah-Hartman
2014-04-03 9:07 ` Mark Brown
0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2014-04-02 23:18 UTC (permalink / raw)
To: Peter Ujfalusi; +Cc: Mark Brown, Grant Likely, linux-kernel, Liam Girdwood
On Wed, Apr 02, 2014 at 09:38:06AM +0300, Peter Ujfalusi wrote:
> Hi Greg,
>
> On 03/04/2014 06:56 AM, Mark Brown wrote:
> > On Mon, Mar 03, 2014 at 10:26:59AM +0200, Peter Ujfalusi wrote:
> >
> >> I think it is correct to detect this situation without the need to have non
> >> related drivers to be probed.
> >> The patch is doing this exactly: detects if we had successful parallel driver
> >> probe(s) while another driver was probing which ends up requesting to be
> >> deferred. We only try the deferred list again if this condition has been
> >> detected, we do not loop on the deferred list, we do not try the list again if
> >> there were no other drivers loaded since nothing happened which could satisfy
> >> the driver asking to be deferred.
> >
> > It's certainly the simplest approach I can think of - anything else
> > would seem to involve looking to see if we're running deferred probes
> > and trying to add things to the list while that's going on which seems
> > like it might be hairy.
>
> Do you want me to resend this patch in hope that it is going to be taken or do
> you have other method in mind to deal with the situation I have described and
> fixed with this patch?
Can you resend, I've totally lost the idea of the original patch. And
if others agree with it, getting acks from them (like Grant) would be
great.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drivercore: deferral race condition fix
2014-04-02 23:18 ` Greg Kroah-Hartman
@ 2014-04-03 9:07 ` Mark Brown
0 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2014-04-03 9:07 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Peter Ujfalusi, Grant Likely, linux-kernel, Liam Girdwood
[-- Attachment #1: Type: text/plain, Size: 604 bytes --]
On Wed, Apr 02, 2014 at 04:18:57PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Apr 02, 2014 at 09:38:06AM +0300, Peter Ujfalusi wrote:
> > Do you want me to resend this patch in hope that it is going to be taken or do
> > you have other method in mind to deal with the situation I have described and
> > fixed with this patch?
> Can you resend, I've totally lost the idea of the original patch. And
> if others agree with it, getting acks from them (like Grant) would be
> great.
Acked-by: Mark Brown <broonie@linaro.org>
FWIW. It's not the most elegant thing ever but then nor is deferred
probing.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-04-03 9:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-26 7:06 [PATCH] drivercore: deferral race condition fix Peter Ujfalusi
2014-03-01 0:33 ` Greg Kroah-Hartman
2014-03-03 8:26 ` Peter Ujfalusi
2014-03-04 4:56 ` Mark Brown
2014-04-02 6:38 ` Peter Ujfalusi
2014-04-02 23:18 ` Greg Kroah-Hartman
2014-04-03 9:07 ` Mark Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox