* Re: SBus devices sometimes detected, sometimes not [not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee> @ 2011-05-17 20:43 ` David Miller 2011-05-18 4:29 ` Grant Likely 2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller 1 sibling, 1 reply; 9+ messages in thread From: David Miller @ 2011-05-17 20:43 UTC (permalink / raw) To: mroos; +Cc: sparclinux, grant.likely, linux-kernel From: Meelis Roos <mroos@linux.ee> Date: Mon, 2 May 2011 13:39:46 +0300 (EEST) > This is 2.6.39-rc3..rc5 on Sun Ultra 1 with SBus hme card. > On about 50% of boots, hme is found fine and works. On other half boots > (not strictly in any order) SBus hme is not detected with this error: > > hme: probe of f006be34 failed with error -22 > > grepping the logs, I sometimes also find similar errors about qlogicpti: > > qpti: probe of f00798c4 failed with error -22 > > The node addresses are always the same for the device. > > 2.6.38 did not have this problem. Grant, I'm trying to diagnose this bug report which is a 2.6.39-rcX regression and I think it's a side-effect of your changes to move away from of_platform_{,un}register_driver(). Somehow an OF device driver's ->probe() is being called with a NULL "->of_match", that's why the probe returns -22 (-EINVAL) since in your transformations that is the first thing you check. There are only a few ways this can happen that I can tell, firstly the of_driver_match_device() call has to fail. It could then happen if pdrv->id_table is non-NULL, but in these two cases (drivers/net/sunhme.c:hme_sbus_driver and drivers/scsi/qlogicpti.c:qpti_sbus_driver) no id_table is assigned. The next possibility would be if pdrv->name matches the device's name, but in these two cases ("SUNW,hme" vs. "hme" and "QLGC,pti" vs. "qpti") that should not be the case either. There is only one more way this could happen, and that is if the device bus pointer is NULL. Because in that situation the match function doesn't get called and all devices can match. This is also very unlikely, because platform_device_add() always assigns pdev->dev.bus to &platform_bus_type before the device_add() call. The fact that this failure happens only on some boots for Meelis leads me to belive that some datastructure is corrupted. Perhaps there is an incorrect __init or __devinit somewhere, or we reference freed up data some other way. Any ideas? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: SBus devices sometimes detected, sometimes not 2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller @ 2011-05-18 4:29 ` Grant Likely 2011-05-18 13:59 ` mroos 0 siblings, 1 reply; 9+ messages in thread From: Grant Likely @ 2011-05-18 4:29 UTC (permalink / raw) To: David Miller; +Cc: mroos, sparclinux, linux-kernel On Tue, May 17, 2011 at 2:43 PM, David Miller <davem@davemloft.net> wrote: > From: Meelis Roos <mroos@linux.ee> > Date: Mon, 2 May 2011 13:39:46 +0300 (EEST) > >> This is 2.6.39-rc3..rc5 on Sun Ultra 1 with SBus hme card. >> On about 50% of boots, hme is found fine and works. On other half boots >> (not strictly in any order) SBus hme is not detected with this error: >> >> hme: probe of f006be34 failed with error -22 >> >> grepping the logs, I sometimes also find similar errors about qlogicpti: >> >> qpti: probe of f00798c4 failed with error -22 >> >> The node addresses are always the same for the device. >> >> 2.6.38 did not have this problem. > > Grant, I'm trying to diagnose this bug report which is a 2.6.39-rcX regression > and I think it's a side-effect of your changes to move away from > of_platform_{,un}register_driver(). > > Somehow an OF device driver's ->probe() is being called with a NULL > "->of_match", that's why the probe returns -22 (-EINVAL) since in > your transformations that is the first thing you check. > > There are only a few ways this can happen that I can tell, firstly > the of_driver_match_device() call has to fail. > > It could then happen if pdrv->id_table is non-NULL, but in these two > cases (drivers/net/sunhme.c:hme_sbus_driver and > drivers/scsi/qlogicpti.c:qpti_sbus_driver) no id_table is assigned. > > The next possibility would be if pdrv->name matches the device's > name, but in these two cases ("SUNW,hme" vs. "hme" and "QLGC,pti" > vs. "qpti") that should not be the case either. This is the failure case I was most concerned about when merging the busses, but as you say the device naming convention is different for OF-generated devices and therefore unlikely, and also easy to check for. > There is only one more way this could happen, and that is if the > device bus pointer is NULL. Because in that situation the > match function doesn't get called and all devices can match. > > This is also very unlikely, because platform_device_add() always > assigns pdev->dev.bus to &platform_bus_type before the device_add() > call. Actually, platform_device_add() doesn't get called for the of_device case; mostly because there are still a couple of things that need to be unified before platform_device_add() can replace of_device_add(). of_device_register() is called instead which doesn't set the bus pointer. However, both versions of scan_one_device() in arch/sparc do explicitly set the bus pointer before adding the device, so this still doesn't look like the issue, and if it was it doesn't explain the intermittent nature of the problem. > The fact that this failure happens only on some boots for Meelis leads > me to belive that some datastructure is corrupted. Perhaps there is > an incorrect __init or __devinit somewhere, or we reference freed up > data some other way. I would agree that that is a likely scenario. > Any ideas? It would be good to know the exact failure mode, so we need to know if of_driver_match_device() is setting of_match which is subsequently getting cleared, or if of_driver_match_device() isn't getting called at all. Meelis, can you add something like the following code to include/linux/of_device.h in the of_driver_match_device() function between the of_match_device() call and the return statement: if (dev->of_match) printk("found match; node=%s, driver=%s\n", dev->of_node->full_name, drv->name); g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: SBus devices sometimes detected, sometimes not 2011-05-18 4:29 ` Grant Likely @ 2011-05-18 13:59 ` mroos 0 siblings, 0 replies; 9+ messages in thread From: mroos @ 2011-05-18 13:59 UTC (permalink / raw) To: Grant Likely; +Cc: David Miller, sparclinux, linux-kernel > It would be good to know the exact failure mode, so we need to know if > of_driver_match_device() is setting of_match which is subsequently > getting cleared, or if of_driver_match_device() isn't getting called > at all. Meelis, can you add something like the following code to > include/linux/of_device.h in the of_driver_match_device() function > between the of_match_device() call and the return statement: > > if (dev->of_match) > printk("found match; node=%s, driver=%s\n", > dev->of_node->full_name, drv->name); > Tried it with todays 2.6.39-rc7-00211-gc1d10d1. It was harder to reproduce this time, about 1 time out of 10. In normal operation: [ 77.794877] found match; node=/sbus@1f,0/SUNW,hme@0,8c00000, driver=hme [ 77.872379] sunhme.c:v3.10 August 26, 2008 David S. Miller (davem@davemloft.net) [ 77.962865] eth0: HAPPY MEAL (SBUS) 10/100baseT Ethernet 08:00:20:88:6d:6a In case of error (bpp show only as an example that some things work): [ 70.312166] found match; node=/sbus@1f,0/SUNW,bpp@e,c800000, driver=bpp [ 70.389649] parport0: sunbpp at 0x1ffec800000 [ 70.477720] found match; node=/sbus@1f,0/SUNW,hme@0,8c00000, driver=hme [ 70.578023] found match; node=/sbus@1f,0/QLGC,isp@1,10000, driver=qpti [ 70.915544] hme: probe of f006be34 failed with error -22 [ 70.978825] qpti: probe of f00798c4 failed with error -22 -- Meelis Roos (mroos@linux.ee) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] of: fix race when matching drivers [not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee> 2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller @ 2011-05-18 15:27 ` Milton Miller 2011-05-18 15:41 ` Grant Likely 2011-05-18 15:45 ` Josip Rodin 1 sibling, 2 replies; 9+ messages in thread From: Milton Miller @ 2011-05-18 15:27 UTC (permalink / raw) To: Grant Likely; +Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux If two drivers are probing devices at the same time, both will write their match table result to the dev->of_match cache at the same time. Only write the result if the device matches. In a thread titled "SBus devices sometimes detected, sometimes not", Meelis reported his SBus hme was not detected about 50% of the time. >From the debug suggested by Grant it was obvious another driver matched some devices between the call to match the hme and the hme discovery failling. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Milton Miller <miltonm@bga.com> --- Grant, I really think this of_match cache in the device node is bad a bad tradeoff, and am willing to submit patches to remove it for 2.6.40. It is only used by about 26 drivers and all use it once during probe to fill out their driver data. It comes at the cost of a long for every struct device in every system. I'll even offer to throw in a patch to cache the parsing of the compatible property to speed up of_device_is_compatible if needed. Index: work.git/include/linux/of_device.h =================================================================== --- work.git.orig/include/linux/of_device.h 2011-05-18 09:57:01.014386816 -0500 +++ work.git/include/linux/of_device.h 2011-05-18 09:58:27.537431575 -0500 @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct static inline int of_driver_match_device(struct device *dev, const struct device_driver *drv) { - dev->of_match = of_match_device(drv->of_match_table, dev); - return dev->of_match != NULL; + const struct of_device_id *match; + + match = of_match_device(drv->of_match_table, dev); + if (match) { + dev->of_match = of_match_device(drv->of_match_table, dev); + return 1; + } + + return 0; } extern struct platform_device *of_dev_get(struct platform_device *dev); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] of: fix race when matching drivers 2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller @ 2011-05-18 15:41 ` Grant Likely 2011-05-18 15:47 ` Grant Likely 2011-05-18 15:45 ` Josip Rodin 1 sibling, 1 reply; 9+ messages in thread From: Grant Likely @ 2011-05-18 15:41 UTC (permalink / raw) To: Milton Miller; +Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux On Wed, May 18, 2011 at 9:27 AM, Milton Miller <miltonm@bga.com> wrote: > If two drivers are probing devices at the same time, both will write > their match table result to the dev->of_match cache at the same time. > > Only write the result if the device matches. > > In a thread titled "SBus devices sometimes detected, sometimes not", > Meelis reported his SBus hme was not detected about 50% of the time. > From the debug suggested by Grant it was obvious another driver matched > some devices between the call to match the hme and the hme discovery > failling. > > Reported-by: Meelis Roos <mroos@linux.ee> > Signed-off-by: Milton Miller <miltonm@bga.com> > --- > > Grant, I really think this of_match cache in the device node is bad a > bad tradeoff, and am willing to submit patches to remove it for 2.6.40. > It is only used by about 26 drivers and all use it once during probe > to fill out their driver data. It comes at the cost of a long for > every struct device in every system. Ah, bugger. I had /thought/ that matching and probing were kept together with a mutex. So, yes, this is bad and the of_match needs to be removed. Thanks for volunteering to submit the patch. It should be backported to 2.6.39 too. > I'll even offer to throw in a patch to cache the parsing of the > compatible property to speed up of_device_is_compatible if needed. That would be useful. :-) I'll pick up your patch right now and fire it off to Linus. g. > > > > Index: work.git/include/linux/of_device.h > =================================================================== > --- work.git.orig/include/linux/of_device.h 2011-05-18 09:57:01.014386816 -0500 > +++ work.git/include/linux/of_device.h 2011-05-18 09:58:27.537431575 -0500 > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct > static inline int of_driver_match_device(struct device *dev, > const struct device_driver *drv) > { > - dev->of_match = of_match_device(drv->of_match_table, dev); > - return dev->of_match != NULL; > + const struct of_device_id *match; > + > + match = of_match_device(drv->of_match_table, dev); > + if (match) { > + dev->of_match = of_match_device(drv->of_match_table, dev); > + return 1; > + } > + > + return 0; > } > > extern struct platform_device *of_dev_get(struct platform_device *dev); > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] of: fix race when matching drivers 2011-05-18 15:41 ` Grant Likely @ 2011-05-18 15:47 ` Grant Likely 0 siblings, 0 replies; 9+ messages in thread From: Grant Likely @ 2011-05-18 15:47 UTC (permalink / raw) To: Milton Miller Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux, Linus Torvalds [cc'ing Linus: This is fix for a 2.6.39 regression, so just a heads up that I'll be sending you a pull req ASAP (later this morning)] On Wed, May 18, 2011 at 9:41 AM, Grant Likely <grant.likely@secretlab.ca> wrote: > On Wed, May 18, 2011 at 9:27 AM, Milton Miller <miltonm@bga.com> wrote: >> If two drivers are probing devices at the same time, both will write >> their match table result to the dev->of_match cache at the same time. >> >> Only write the result if the device matches. >> >> In a thread titled "SBus devices sometimes detected, sometimes not", >> Meelis reported his SBus hme was not detected about 50% of the time. >> From the debug suggested by Grant it was obvious another driver matched >> some devices between the call to match the hme and the hme discovery >> failling. >> >> Reported-by: Meelis Roos <mroos@linux.ee> >> Signed-off-by: Milton Miller <miltonm@bga.com> >> --- >> >> Grant, I really think this of_match cache in the device node is bad a >> bad tradeoff, and am willing to submit patches to remove it for 2.6.40. >> It is only used by about 26 drivers and all use it once during probe >> to fill out their driver data. It comes at the cost of a long for >> every struct device in every system. > > Ah, bugger. I had /thought/ that matching and probing were kept > together with a mutex. So, yes, this is bad and the of_match needs to > be removed. Thanks for volunteering to submit the patch. It should > be backported to 2.6.39 too. > >> I'll even offer to throw in a patch to cache the parsing of the >> compatible property to speed up of_device_is_compatible if needed. > > That would be useful. :-) > > I'll pick up your patch right now and fire it off to Linus. ... it's not perfect since it will break some theoretical drivers unbind/rebind use-cases, but I cannot think of any actual examples, and it is far safer than the current code. Regardless, the removal of of_match will definitely need to go into stable when it is ready. g. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] of: fix race when matching drivers 2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller 2011-05-18 15:41 ` Grant Likely @ 2011-05-18 15:45 ` Josip Rodin 2011-05-18 16:21 ` Grant Likely 1 sibling, 1 reply; 9+ messages in thread From: Josip Rodin @ 2011-05-18 15:45 UTC (permalink / raw) To: Milton Miller Cc: Grant Likely, David Miller, devicetree-discuss, linux-kernel, sparclinux On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote: > --- work.git.orig/include/linux/of_device.h 2011-05-18 09:57:01.014386816 -0500 > +++ work.git/include/linux/of_device.h 2011-05-18 09:58:27.537431575 -0500 > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct > static inline int of_driver_match_device(struct device *dev, > const struct device_driver *drv) > { > - dev->of_match = of_match_device(drv->of_match_table, dev); > - return dev->of_match != NULL; > + const struct of_device_id *match; > + > + match = of_match_device(drv->of_match_table, dev); > + if (match) { > + dev->of_match = of_match_device(drv->of_match_table, dev); > + return 1; > + } > + > + return 0; > } Err, is there some reason to avoid simply assigning the existing result: dev->of_match = match; ? -- 2. That which causes joy or happiness. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] of: fix race when matching drivers 2011-05-18 15:45 ` Josip Rodin @ 2011-05-18 16:21 ` Grant Likely 2011-05-18 17:03 ` Milton Miller 0 siblings, 1 reply; 9+ messages in thread From: Grant Likely @ 2011-05-18 16:21 UTC (permalink / raw) To: Josip Rodin Cc: Milton Miller, David Miller, devicetree-discuss, linux-kernel, sparclinux On Wed, May 18, 2011 at 05:45:17PM +0200, Josip Rodin wrote: > On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote: > > --- work.git.orig/include/linux/of_device.h 2011-05-18 09:57:01.014386816 -0500 > > +++ work.git/include/linux/of_device.h 2011-05-18 09:58:27.537431575 -0500 > > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct > > static inline int of_driver_match_device(struct device *dev, > > const struct device_driver *drv) > > { > > - dev->of_match = of_match_device(drv->of_match_table, dev); > > - return dev->of_match != NULL; > > + const struct of_device_id *match; > > + > > + match = of_match_device(drv->of_match_table, dev); > > + if (match) { > > + dev->of_match = of_match_device(drv->of_match_table, dev); > > + return 1; > > + } > > + > > + return 0; > > } > > Err, is there some reason to avoid simply assigning the existing result: > dev->of_match = match; ? Good point. I've committed and am currently testing the modified version. g. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] of: fix race when matching drivers 2011-05-18 16:21 ` Grant Likely @ 2011-05-18 17:03 ` Milton Miller 0 siblings, 0 replies; 9+ messages in thread From: Milton Miller @ 2011-05-18 17:03 UTC (permalink / raw) To: Grant Likely, Josip Rodin Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux On Wed, 18 May 2011 about 10:21:39 -0600, Grant Likely wrote: > On Wed, May 18, 2011 at 05:45:17PM +0200, Josip Rodin wrote: > > On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote: > > > --- work.git.orig/include/linux/of_device.h 2011-05-18 09:57:01.014386816 -0500 > > > +++ work.git/include/linux/of_device.h 2011-05-18 09:58:27.537431575 -0500 > > > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct > > > static inline int of_driver_match_device(struct device *dev, > > > const struct device_driver *drv) > > > { > > > - dev->of_match = of_match_device(drv->of_match_table, dev); > > > - return dev->of_match != NULL; > > > + const struct of_device_id *match; > > > + > > > + match = of_match_device(drv->of_match_table, dev); > > > + if (match) { > > > + dev->of_match = of_match_device(drv->of_match_table, dev); > > > + return 1; > > > + } > > > + > > > + return 0; > > > } > > > > Err, is there some reason to avoid simply assigning the existing result: > > dev->of_match = match; ? > > Good point. I've committed and am currently testing the modified version. > > g. Sorry about that. I had intended to do that (hence creating the variable), but obvioiusly I forgot when I hurried to compile test and send out the patch. thanks to Josip for finding and Grant for fixing. milton ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-05-18 17:03 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee>
2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller
2011-05-18 4:29 ` Grant Likely
2011-05-18 13:59 ` mroos
2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller
2011-05-18 15:41 ` Grant Likely
2011-05-18 15:47 ` Grant Likely
2011-05-18 15:45 ` Josip Rodin
2011-05-18 16:21 ` Grant Likely
2011-05-18 17:03 ` Milton Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).