From mboxrd@z Thu Jan 1 00:00:00 1970 From: jbrunet@baylibre.com (Jerome Brunet) Date: Wed, 28 Jun 2017 18:46:21 +0200 Subject: [PATCH] clk: scpi: error when clock fails to register In-Reply-To: <76efd418-9fd5-6ac5-b4c9-c75c5df69df0@arm.com> References: <20170628135345.9337-1-jbrunet@baylibre.com> <1498664332.2337.6.camel@baylibre.com> <76efd418-9fd5-6ac5-b4c9-c75c5df69df0@arm.com> Message-ID: <1498668381.2337.10.camel@baylibre.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Wed, 2017-06-28 at 16:52 +0100, Sudeep Holla wrote: > > On 28/06/17 16:38, Jerome Brunet wrote: > > On Wed, 2017-06-28 at 16:04 +0100, Sudeep Holla wrote: > > > > > > On 28/06/17 14:53, Jerome Brunet wrote: > > > > Current implementation of scpi_clk_add just print a warning when clock > > > > fails to register but then keep going as if nothing happened. The > > > > provider is then registered with bogus data. > > > > > > > > This may latter lead to an Oops in __clk_create_clk when > > > > hlist_add_head(&clk->clks_node, &hw->core->clks) is called. > > > > > > > > > > What's the path of this call ? I see one in devm_clk_hw_register > > > but that's one which failed. > > > > > > > bL cpu freq driver requesting the cpu clock, which failed to register. Here > > the > > Oops call trace: > > > > [????2.202284] [] __clk_create_clk.part.18+0x68/0xb0 > > [????2.208494] [] __of_clk_get_from_provider+0xfc/0x140 > > [????2.214962] [] __of_clk_get_by_name+0x100/0x118 > > [????2.220999] [] clk_get+0x2c/0x78 > > [????2.225744] [] dev_pm_opp_get_opp_table+0xb0/0x118 > > [????2.232039] [] dev_pm_opp_add+0x20/0x68 > > [????2.237388] [] scpi_init_opp_table+0xa8/0x188 > > [????2.243252] [] > > _get_cluster_clk_and_freq_table+0x80/0x180 > > [????2.250151] [] bL_cpufreq_init+0x3f0/0x480 > > [????2.255758] [] cpufreq_online+0xc0/0x658 > > [????2.261191] [] cpufreq_add_dev+0x78/0x88 > > [????2.266625] [] subsys_interface_register+0x84/0xc8 > > [????2.272922] [] cpufreq_register_driver+0x138/0x1b8 > > [????2.279218] [] bL_cpufreq_register+0x74/0x120 > > [????2.285083] [] scpi_cpufreq_probe+0x28/0x38 > > [????2.290776] [] platform_drv_probe+0x50/0xb8 > > [????2.296468] [] driver_probe_device+0x21c/0x2d8 > > > > Thanks for this stack. I just worked out the same path now. I did come > up with the patch as below. That should work if my understanding is correct. I tried. It does not work unfortunately. Still crashes but somewhere else: [????2.301482] [] scpi_of_clk_src_get+0x14/0x58 [????2.307261] [] __of_clk_get_by_name+0x100/0x118 [????2.313297] [] clk_get+0x2c/0x78 [????2.318044] [] dev_pm_opp_get_opp_table+0xb0/0x118 [????2.324338] [] dev_pm_opp_add+0x20/0x68 [????2.329687] [] scpi_init_opp_table+0xa8/0x188 [????2.335550] [] _get_cluster_clk_and_freq_table+0x80/0x180 [????2.342450] [] bL_cpufreq_init+0x3f0/0x480 [????2.348056] [] cpufreq_online+0xc0/0x658 [????2.353490] [] cpufreq_add_dev+0x78/0x88 [????2.358924] [] subsys_interface_register+0x84/0xc8 [????2.365220] [] cpufreq_register_driver+0x138/0x1b8 [????2.371516] [] bL_cpufreq_register+0x74/0x120 [????2.377381] [] scpi_cpufreq_probe+0x28/0x38 [????2.383076] [] platform_drv_probe+0x50/0xb8 [????2.388766] [] driver_probe_device+0x21c/0x2d8 I have not looked at ALL the clock providers, but I have seen a few and I don't remember seeing any which fails, at some point, to register a clocks and still register successfully. It seems strange to continue with a broken controller. > > > But that's not the point. The point is there is path in clk-scpi driver > > which > > registers uninitialized data in the clock provider. That's not good.? > > > > > Also one of the reason for keeping it continuing is, if firmware fails > > > on some non-critical clock, that's fine rather than punishing the entire > > > set of clocks and may even fail the boot. > > > > I understand, but you have no way to know whether a clock is critical or not > > so? > > this explanation looks a bit weak, plus it does not keep the boot from > > failing > > ... not for me at least. > > > > As explained this approach is registering corrupt data in the provider when > > failing. It makes the kernel Oops, it shall be fixed. > > > > Agreed, I want to look at ways to fix that, hence requested you more data. > > > If you have a better solution later on, I don't think there would be any > > problem > > to revert this patch. > > > > Sure I am not against the patch as a fix. I was just trying to better > understand the problem. I had seen the usefulness of skipping on Juno > platforms > in earlier days. Also I am now working on the new SCMI[1] specification > and just want to cover this. > > --- > > diff --git i/drivers/clk/clk-scpi.c w/drivers/clk/clk-scpi.c > index 96d37175d0ad..d83c0b84798d 100644 > --- i/drivers/clk/clk-scpi.c > +++ w/drivers/clk/clk-scpi.c > @@ -245,11 +245,14 @@ static int scpi_clk_add(struct device *dev, struct > device_node *np, > ????????????????sclk->id = val; > > ????????????????err = scpi_clk_ops_init(dev, match, sclk, name); > -???????????????if (err) > +???????????????if (err) { > ????????????????????????dev_err(dev, "failed to register clock '%s'\n", > name); > -???????????????else > +???????????????????????clk_data->clk[idx] = NULL; > +???????????????????????devm_kfree(dev, sclk); > +???????????????} else { > ????????????????????????dev_dbg(dev, "Registered clock '%s'\n", name); > -???????????????clk_data->clk[idx] = sclk; > +???????????????????????clk_data->clk[idx] = sclk; > +???????????????} > ????????} > > ????????return of_clk_add_hw_provider(np, scpi_of_clk_src_get, clk_data); >