* [PATCH 00/18] pwm: Provide devm_pwmchip_alloc() function
@ 2023-07-18 18:18 Uwe Kleine-König
2023-07-18 18:18 ` [PATCH 01/18] " Uwe Kleine-König
0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2023-07-18 18:18 UTC (permalink / raw)
To: Jonathan Corbet, Thierry Reding, Yang Yingliang, Andy Shevchenko,
Greg Kroah-Hartman, Mark Brown, Matti Vaittinen, James Clark,
Hector Martin, Sven Peter, Shawn Guo, Sascha Hauer, Paul Cercueil,
Vladimir Zapolskiy, Jonathan Neuschäfer, Fabrice Gasnier,
Maxime Coquelin, Alexandre Torgue, Linus Walleij,
Bartosz Golaszewski
Cc: linux-doc, kernel, linux-pwm, Alyssa Rosenzweig, asahi,
linux-arm-kernel, Fabio Estevam, NXP Linux Team, linux-mips,
linux-stm32, Andy Shevchenko, linux-gpio, Wolfram Sang
Hello,
my eventual goal is to provide a chardev API to PWMs similar to the gpioctl
API. For that to work flawlessly it's required that a pwmchip stays
around while the corresponding device is opened even if the respective
lowlevel driver goes away. See Wolfram's EOSS talk[1] for some more
details.
This series provides a new function devm_pwmchip_alloc() that allocates
a struct pwm_chip together with driver data. Currently this is still
using devm_kzalloc and so goes away when the device is unbound from the
driver. However this can be changed without having to touch all drivers
again.
The function devm_pwmchip_alloc() is modelled after similar functions
from spi, counter and networking code. The first patch provides the
allocator function and an accessor for driver data, the following
patches convert a subset of the available drivers to this new API.
The series is fully bisectable and the only interdependency is that
patch #1 is needed for all other patches. The idea is to complete
conversion of all remaining drivers and then add a struct device to
struct pwm_chip and so make pwm_chip reference counted. There are still
a few more drivers to convert, but I thought to send out the current
patch set to get some early feedback.
The base for this series is v6.5-rc1 plus the following series:
[PATCH v2 0/8] pwm: Get rid of pwm_[sg]et_chip_data()
20230705080650.2353391-1-u.kleine-koenig@pengutronix.de
[PATCH 0/2] pwm: stm32: A (small) fix and a cleanup
20230713155142.2454010-1-u.kleine-koenig@pengutronix.de
[PATCH 00/10] pwm: Constistenly name pwm_chip variables "chip"
20230714205623.2496590-1-u.kleine-koenig@pengutronix.de
[PATCH] staging: greybus: pwm: Drop unused member from driver struct
20230714201622.2490792-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: lpc18xx-sct: Simplify using devm_clk_get_enabled()
20230718144128.371818-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: lpc32xx: remove handling of PWM channels
20230717155257.2568627-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: pxa: Don't reimplement of_device_get_match_data()
20230718150657.1728166-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: ntxec: Drop a write-only variable from driver data
20230718152327.2583886-1-u.kleine-koenig@pengutronix.de
[PATCH] gpio: mvebu: Make use of devm_pwmchip_add
20230717142743.2555739-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: ntxec: Use device_set_of_node_from_dev()
20230718175310.3946687-1-u.kleine-koenig@pengutronix.de
[PATCH] pwm: berlin: Simplify using devm functions
20230718175545.3946935-1-u.kleine-koenig@pengutronix.de
I'm not sure the build bots can properly handle that, so it would be
great to get these base series into next soon.
Best regards
Uwe
[1] https://static.sched.com/hosted_files/eoss2023/e3/LifecycleIssues_WolframSang_2023.pdf
Uwe Kleine-König (18):
pwm: Provide devm_pwmchip_alloc() function
pwm: ab8500: Make use of devm_pwmchip_alloc() function
pwm: apple: Make use of devm_pwmchip_alloc() function
pwm: berlin: Make use of devm_pwmchip_alloc() function
pwm: clk: Make use of devm_pwmchip_alloc() function
pwm: fsl-ftm: Make use of devm_pwmchip_alloc() function
pwm: hibvt: Make use of devm_pwmchip_alloc() function
pwm: imx1: Make use of devm_pwmchip_alloc() function
pwm: imx27: Make use of devm_pwmchip_alloc() function
pwm: jz4740: Make use of devm_pwmchip_alloc() function
pwm: keembay: Make use of devm_pwmchip_alloc() function
pwm: lpc18xx-sct: Make use of devm_pwmchip_alloc() function
pwm: lpc32xx: Make use of devm_pwmchip_alloc() function
pwm: mxs: Make use of devm_pwmchip_alloc() function
pwm: ntxec: Make use of devm_pwmchip_alloc() function
pwm: pxa: Make use of devm_pwmchip_alloc() function
pwm: stm32: Make use of devm_pwmchip_alloc() function
gpio: mvebu: Make use of devm_pwmchip_alloc() function
.../driver-api/driver-model/devres.rst | 1 +
Documentation/driver-api/pwm.rst | 10 ++--
drivers/gpio/gpio-mvebu.c | 17 +++----
drivers/pwm/core.c | 23 +++++++++
drivers/pwm/pwm-ab8500.c | 17 +++----
drivers/pwm/pwm-apple.c | 17 +++----
drivers/pwm/pwm-berlin.c | 28 ++++++-----
drivers/pwm/pwm-clk.c | 26 +++++-----
drivers/pwm/pwm-fsl-ftm.c | 47 ++++++++++---------
drivers/pwm/pwm-hibvt.c | 26 +++++-----
drivers/pwm/pwm-imx1.c | 16 +++----
drivers/pwm/pwm-imx27.c | 19 ++++----
drivers/pwm/pwm-jz4740.c | 25 +++++-----
drivers/pwm/pwm-keembay.c | 16 +++----
drivers/pwm/pwm-lpc18xx-sct.c | 24 +++++-----
drivers/pwm/pwm-lpc32xx.c | 18 +++----
drivers/pwm/pwm-mxs.c | 19 ++++----
drivers/pwm/pwm-ntxec.c | 27 +++++------
drivers/pwm/pwm-pxa.c | 20 ++++----
drivers/pwm/pwm-stm32.c | 42 +++++++++++------
include/linux/pwm.h | 4 ++
21 files changed, 249 insertions(+), 193 deletions(-)
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
prerequisite-patch-id: c856e0baabfc22d250b7ce881427cdb74613e69a
prerequisite-patch-id: 2a5f9c04e4b5794b5a5d0b30280f75b76a05092b
prerequisite-patch-id: a48b05b94b61f8029b4bb9a78ae1f2cd8c476d80
prerequisite-patch-id: c7dfd3798f024b27b6e236da1eae40b79bb3e281
prerequisite-patch-id: 313ada72ab57438a8d54df0df9c0926bb4f69b36
prerequisite-patch-id: 318824c08f3e7d6500e3c5a47a11c5daffaea34a
prerequisite-patch-id: ea75bfd48ea4d0132c637172564bc1a57061377a
prerequisite-patch-id: 278b25a8d5fef49f7e5c46b627d4862d0b24baaf
prerequisite-patch-id: a3b11e0a7c8f6564e668e2ed1b637351e5fd9dd2
prerequisite-patch-id: 55588b25ea7ce69d716a33a7aaed662d75bb9687
prerequisite-patch-id: 399d7e94bafc5dc970a0f213745e4982066b8583
prerequisite-patch-id: 87aca1beb6efdaab95ddce7e2d9eed89a89252a1
prerequisite-patch-id: e13e3db807f2c5b49180e24492f496f8e945e42d
prerequisite-patch-id: 8712b17dee2a5f2441908e2de260ceec8ce6ff37
prerequisite-patch-id: 987b0a351801b1e361932beeeedbc5245540037c
prerequisite-patch-id: 869d3932983f3ae6a8982e186b27df11c56e9e5e
prerequisite-patch-id: 4f60b0afb435011b58b7cc6bead1d385db7d8e11
prerequisite-patch-id: 0e86da52c02300d407d4c0b6f2b9f8293e3320dc
prerequisite-patch-id: a731856f5d7dee7f3465f021e5c56981a803b22d
prerequisite-patch-id: b759d036cf578083ec76aa7ba01dd8643667d4f6
prerequisite-patch-id: 9abd8b16e74625e2630655da7d22e75bbe0c6231
prerequisite-patch-id: 6ad03ceb505a293f2308235161c54ff6b508e59f
prerequisite-patch-id: cb6d75be9b72cc04069b6952ba9e5ac80a26a1ab
prerequisite-patch-id: c8fa7296a736f42ec49ab1334cb19947d647a2b4
prerequisite-patch-id: b601ea695815fb65ed704349302ba66442277fc3
prerequisite-patch-id: 7a0daa2918f8a317333e57c1d1698078a1968720
prerequisite-patch-id: b6762f6deeb74aaf73afe2d8dd816d48cec4e1ee
prerequisite-patch-id: a0c8d63424241e64c3e5f9991ea04018a51fcc94
--
2.39.2
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-18 18:18 [PATCH 00/18] pwm: Provide devm_pwmchip_alloc() function Uwe Kleine-König @ 2023-07-18 18:18 ` Uwe Kleine-König 2023-07-18 20:05 ` Andy Shevchenko 2023-07-19 7:59 ` Thierry Reding 0 siblings, 2 replies; 16+ messages in thread From: Uwe Kleine-König @ 2023-07-18 18:18 UTC (permalink / raw) To: Jonathan Corbet, Thierry Reding, Yang Yingliang, Andy Shevchenko, Greg Kroah-Hartman, Mark Brown, Matti Vaittinen, James Clark Cc: linux-doc, kernel, linux-pwm This function allocates a struct pwm_chip and driver data. Compared to the status quo the split into pwm_chip and driver data is new, otherwise it doesn't change anything relevant (yet). The intention is that after all drivers are switched to use this allocation function, its possible to add a struct device to struct pwm_chip to properly track the latter's lifetime without touching all drivers again. Proper lifetime tracking is a necessary precondition to introduce character device support for PWMs (that implements atomic setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm userspace support). The new function pwmchip_priv() (obviously?) only works for chips allocated with devm_pwmchip_alloc(). Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> --- .../driver-api/driver-model/devres.rst | 1 + Documentation/driver-api/pwm.rst | 10 ++++---- drivers/pwm/core.c | 23 +++++++++++++++++++ include/linux/pwm.h | 4 ++++ 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index 8be086b3f829..73a9ee074737 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -414,6 +414,7 @@ POWER devm_reboot_mode_unregister() PWM + devm_pwmchip_alloc() devm_pwmchip_add() devm_pwm_get() devm_fwnode_pwm_get() diff --git a/Documentation/driver-api/pwm.rst b/Documentation/driver-api/pwm.rst index 3fdc95f7a1d1..a3824bd58e4c 100644 --- a/Documentation/driver-api/pwm.rst +++ b/Documentation/driver-api/pwm.rst @@ -134,11 +134,11 @@ to implement the pwm_*() functions itself. This means that it's impossible to have multiple PWM drivers in the system. For this reason it's mandatory for new drivers to use the generic PWM framework. -A new PWM controller/chip can be added using pwmchip_add() and removed -again with pwmchip_remove(). pwmchip_add() takes a filled in struct -pwm_chip as argument which provides a description of the PWM chip, the -number of PWM devices provided by the chip and the chip-specific -implementation of the supported PWM operations to the framework. +A new PWM controller/chip can be allocated using devm_pwmchip_alloc, then added +using pwmchip_add() and removed again with pwmchip_remove(). pwmchip_add() +takes a filled in struct pwm_chip as argument which provides a description of +the PWM chip, the number of PWM devices provided by the chip and the +chip-specific implementation of the supported PWM operations to the framework. When implementing polarity support in a PWM driver, make sure to respect the signal conventions in the PWM framework. By definition, normal polarity diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c index 81d176142403..3f4c2d940d64 100644 --- a/drivers/pwm/core.c +++ b/drivers/pwm/core.c @@ -304,6 +304,29 @@ void pwmchip_remove(struct pwm_chip *chip) } EXPORT_SYMBOL_GPL(pwmchip_remove); +void *pwmchip_priv(struct pwm_chip *chip) +{ + return (char *)chip + ALIGN(sizeof(*chip), 32); +} +EXPORT_SYMBOL_GPL(pwmchip_priv); + +struct pwm_chip *devm_pwmchip_alloc(struct device *parent, size_t sizeof_priv) +{ + struct pwm_chip *chip; + size_t alloc_size; + + alloc_size = ALIGN(sizeof(*chip), 32) + sizeof_priv; + + chip = devm_kzalloc(parent, alloc_size, GFP_KERNEL); + if (!chip) + return NULL; + + chip->dev = parent; + + return chip; +} +EXPORT_SYMBOL_GPL(devm_pwmchip_alloc); + static void devm_pwmchip_remove(void *data) { struct pwm_chip *chip = data; diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 10bea923a1d5..648009c8499b 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -5,6 +5,7 @@ #include <linux/err.h> #include <linux/mutex.h> #include <linux/of.h> +#include <linux/compiler_attributes.h> struct pwm_chip; @@ -385,6 +386,9 @@ int pwm_capture(struct pwm_device *pwm, struct pwm_capture *result, int pwmchip_add(struct pwm_chip *chip); void pwmchip_remove(struct pwm_chip *chip); +void *pwmchip_priv(struct pwm_chip *chip) __attribute_const__; + +struct pwm_chip *devm_pwmchip_alloc(struct device *parent, size_t sizeof_priv); int devm_pwmchip_add(struct device *dev, struct pwm_chip *chip); struct pwm_device *pwm_request_from_chip(struct pwm_chip *chip, -- 2.39.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-18 18:18 ` [PATCH 01/18] " Uwe Kleine-König @ 2023-07-18 20:05 ` Andy Shevchenko 2023-07-19 10:31 ` Uwe Kleine-König 2023-07-19 7:59 ` Thierry Reding 1 sibling, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2023-07-18 20:05 UTC (permalink / raw) To: Uwe Kleine-König Cc: Jonathan Corbet, Thierry Reding, Yang Yingliang, Greg Kroah-Hartman, Mark Brown, Matti Vaittinen, James Clark, linux-doc, kernel, linux-pwm On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote: > This function allocates a struct pwm_chip and driver data. Compared to > the status quo the split into pwm_chip and driver data is new, otherwise > it doesn't change anything relevant (yet). > > The intention is that after all drivers are switched to use this > allocation function, its possible to add a struct device to struct > pwm_chip to properly track the latter's lifetime without touching all > drivers again. Proper lifetime tracking is a necessary precondition to > introduce character device support for PWMs (that implements atomic > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm > userspace support). > > The new function pwmchip_priv() (obviously?) only works for chips > allocated with devm_pwmchip_alloc(). ... > +void *pwmchip_priv(struct pwm_chip *chip) > +{ > + return (char *)chip + ALIGN(sizeof(*chip), 32); Why 32? I haven't found any explanation on the choice. I can understand arch minimum align, but hard coded value is a bit hard to get. > +} ... > +struct pwm_chip *devm_pwmchip_alloc(struct device *parent, size_t sizeof_priv) > +{ > + struct pwm_chip *chip; > + size_t alloc_size; > + alloc_size = ALIGN(sizeof(*chip), 32) + sizeof_priv; Ditto. Shouldn't it use a macro from overflow.h? > + chip = devm_kzalloc(parent, alloc_size, GFP_KERNEL); > + if (!chip) > + return NULL; > + > + chip->dev = parent; > + > + return chip; > +} -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-18 20:05 ` Andy Shevchenko @ 2023-07-19 10:31 ` Uwe Kleine-König 0 siblings, 0 replies; 16+ messages in thread From: Uwe Kleine-König @ 2023-07-19 10:31 UTC (permalink / raw) To: Andy Shevchenko Cc: linux-pwm, Matti Vaittinen, Greg Kroah-Hartman, Jonathan Corbet, linux-doc, Mark Brown, Thierry Reding, James Clark, kernel, Yang Yingliang [-- Attachment #1: Type: text/plain, Size: 2500 bytes --] Hello Andy, On Tue, Jul 18, 2023 at 11:05:29PM +0300, Andy Shevchenko wrote: > On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote: > > This function allocates a struct pwm_chip and driver data. Compared to > > the status quo the split into pwm_chip and driver data is new, otherwise > > it doesn't change anything relevant (yet). > > > > The intention is that after all drivers are switched to use this > > allocation function, its possible to add a struct device to struct > > pwm_chip to properly track the latter's lifetime without touching all > > drivers again. Proper lifetime tracking is a necessary precondition to > > introduce character device support for PWMs (that implements atomic > > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm > > userspace support). > > > > The new function pwmchip_priv() (obviously?) only works for chips > > allocated with devm_pwmchip_alloc(). > > ... > > > +void *pwmchip_priv(struct pwm_chip *chip) > > +{ > > + return (char *)chip + ALIGN(sizeof(*chip), 32); > > Why 32? I haven't found any explanation on the choice. I can understand arch > minimum align, but hard coded value is a bit hard to get. I copied that part from netdev_priv (without introducing the equivalent of NETDEV_ALIGN). The 32 there isn't motivated in a comment, and i didn't think much about it. Alternatives that I'm aware of are: dma_get_cache_alignment() (spi) Use a struct (counter's counter_device_allochelper uses ____cacheline_aligned) (but I wonder about dma_get_cache_alignment() which might return 1 on some archs and then doesn't even ensure natural aligning for shorts.) > > +} > > ... > > > +struct pwm_chip *devm_pwmchip_alloc(struct device *parent, size_t sizeof_priv) > > +{ > > + struct pwm_chip *chip; > > + size_t alloc_size; > > > + alloc_size = ALIGN(sizeof(*chip), 32) + sizeof_priv; > > Ditto. Of course this needs to match the above. spi uses dev_get_drvdata on the controller's dev. Also a nice idea. > Shouldn't it use a macro from overflow.h? Yupp, that would make sense. Something like: if (unlikely(check_add_overflow(ALIGN(sizeof(*chip), 32), sizeof_priv, &alloc_size))) return NULL; (with 32 replaced by something with a better name.) Thanks for your feedback, Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-18 18:18 ` [PATCH 01/18] " Uwe Kleine-König 2023-07-18 20:05 ` Andy Shevchenko @ 2023-07-19 7:59 ` Thierry Reding 2023-07-19 8:57 ` Uwe Kleine-König 2023-07-25 21:10 ` Uwe Kleine-König 1 sibling, 2 replies; 16+ messages in thread From: Thierry Reding @ 2023-07-19 7:59 UTC (permalink / raw) To: Uwe Kleine-König Cc: Jonathan Corbet, Yang Yingliang, Andy Shevchenko, Greg Kroah-Hartman, Mark Brown, Matti Vaittinen, James Clark, linux-doc, kernel, linux-pwm [-- Attachment #1: Type: text/plain, Size: 1175 bytes --] On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote: > This function allocates a struct pwm_chip and driver data. Compared to > the status quo the split into pwm_chip and driver data is new, otherwise > it doesn't change anything relevant (yet). > > The intention is that after all drivers are switched to use this > allocation function, its possible to add a struct device to struct > pwm_chip to properly track the latter's lifetime without touching all > drivers again. Proper lifetime tracking is a necessary precondition to > introduce character device support for PWMs (that implements atomic > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm > userspace support). > > The new function pwmchip_priv() (obviously?) only works for chips > allocated with devm_pwmchip_alloc(). If this is supposed to be similar to the GPIO chardev, why doesn't GPIO require this way of allocating a struct gpio_chip? I'm not a fan of doing all this upfront work without seeing where this is ultimately headed. Please hold off on reworking everything until you have a complete proposal that can be reviewed in full. Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-19 7:59 ` Thierry Reding @ 2023-07-19 8:57 ` Uwe Kleine-König 2023-07-25 21:10 ` Uwe Kleine-König 1 sibling, 0 replies; 16+ messages in thread From: Uwe Kleine-König @ 2023-07-19 8:57 UTC (permalink / raw) To: Thierry Reding Cc: Jonathan Corbet, Yang Yingliang, Andy Shevchenko, Greg Kroah-Hartman, Mark Brown, Matti Vaittinen, James Clark, linux-doc, kernel, linux-pwm [-- Attachment #1: Type: text/plain, Size: 2680 bytes --] Hello Thierry, On Wed, Jul 19, 2023 at 09:59:29AM +0200, Thierry Reding wrote: > On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote: > > This function allocates a struct pwm_chip and driver data. Compared to > > the status quo the split into pwm_chip and driver data is new, otherwise > > it doesn't change anything relevant (yet). > > > > The intention is that after all drivers are switched to use this > > allocation function, its possible to add a struct device to struct > > pwm_chip to properly track the latter's lifetime without touching all > > drivers again. Proper lifetime tracking is a necessary precondition to > > introduce character device support for PWMs (that implements atomic > > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm > > userspace support). > > > > The new function pwmchip_priv() (obviously?) only works for chips > > allocated with devm_pwmchip_alloc(). > > If this is supposed to be similar to the GPIO chardev, why doesn't GPIO > require this way of allocating a struct gpio_chip? For each gpio_chip there is an external struct gpio_device that is allocated in gpiochip_add(). This works but has some downsides. E.g. gpiochip_remove() has to grab a mutex that is also held while userspace polls a gpioctrl device and so gpiochip_remove blocks. While this works (as in: it doesn't crash) it's in the category "not as good as possible". I'm convinced that getting this right before adding the complexity of chardev support is a good idea. > I'm not a fan of doing all this upfront work without seeing where this > is ultimately headed. Please hold off on reworking everything until > you have a complete proposal that can be reviewed in full. I see some benefit of the conversion started in this patch set stand alone, too. But I agree it's for now a bit theoretic. With the goal to pick the agreed good approach for pwm this is a bigger pile of work. Completing it in private and only present the complete proposal has the downside that I get feedback for the house's seating only when the roof is assembled. While me spending much time isn't your problem, I have a problem with it. Splitting the task in chunks that go in one after another is surely the quicker and more effective approach. So it would be great if you could understand the issue tackled here even though there are subsystems that do it differently (and less optimal) and the usecase isn't complete yet. Thanks Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-19 7:59 ` Thierry Reding 2023-07-19 8:57 ` Uwe Kleine-König @ 2023-07-25 21:10 ` Uwe Kleine-König 2023-10-10 8:05 ` Uwe Kleine-König 1 sibling, 1 reply; 16+ messages in thread From: Uwe Kleine-König @ 2023-07-25 21:10 UTC (permalink / raw) To: Thierry Reding Cc: linux-pwm, Matti Vaittinen, Greg Kroah-Hartman, Jonathan Corbet, linux-doc, Mark Brown, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Wolfram Sang [-- Attachment #1: Type: text/plain, Size: 6811 bytes --] On Wed, Jul 19, 2023 at 09:59:29AM +0200, Thierry Reding wrote: > On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote: > > This function allocates a struct pwm_chip and driver data. Compared to > > the status quo the split into pwm_chip and driver data is new, otherwise > > it doesn't change anything relevant (yet). > > > > The intention is that after all drivers are switched to use this > > allocation function, its possible to add a struct device to struct > > pwm_chip to properly track the latter's lifetime without touching all > > drivers again. Proper lifetime tracking is a necessary precondition to > > introduce character device support for PWMs (that implements atomic > > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm > > userspace support). > > > > The new function pwmchip_priv() (obviously?) only works for chips > > allocated with devm_pwmchip_alloc(). > > If this is supposed to be similar to the GPIO chardev, why doesn't GPIO > require this way of allocating a struct gpio_chip? I'm not a fan of > doing all this upfront work without seeing where this is ultimately > headed. Please hold off on reworking everything until you have a > complete proposal that can be reviewed in full. I'm working on that and already have a patch stack with more than 100 patches, many of them are cleanups that I found while working on each PWM driver (most of these are already posted to the linux-pwm list). The biggest part is to convert each of the 50+ pwm drivers to pwmchip_alloc(). Today I managed to trigger the problem I intend to address with this series. My machine to test this on is an stm32mp157. To be able to trigger the problem reliably I applied the following patches on top of v6.5-rc1: - pwm: stm32: Don't modify HW state in .remove() callback This is a cleanup that I already sent out. https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de The purpose for reproducing the problem is to not trigger further calls to the apply callback. - The following patch: diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c index 687967d3265f..c7fc02b0fa3c 100644 --- a/drivers/pwm/pwm-stm32.c +++ b/drivers/pwm/pwm-stm32.c @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, struct stm32_pwm *priv = to_stm32_pwm_dev(chip); int ret; + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); + msleep(5000); + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); + enabled = pwm->state.enabled; if (enabled && !state->enabled) { @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) { struct stm32_pwm *priv = platform_get_drvdata(pdev); + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); pwmchip_remove(&priv->chip); + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); + + priv->regmap = NULL; } static int __maybe_unused stm32_pwm_suspend(struct device *dev) The first hunk is only there to widen the race window. The second is to give some diagnostics and make stm32_pwm_apply() crash if it continues to run after the msleep. (Without it it didn't crash reproducibly, don't understand why. *shrug*) The device tree contains a pwm-fan device making use of one of the PWMs. Now I do the following: echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind Unbinding the fan device has two effects: - The device link between fan and pwm looses its property to unbind fan when pwm gets unbound. (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) - It calls pwm_fan_cleanup() which triggers a call to pwm_apply_state(). So when the pwm device gets unbound the first thread is sleeping in stm32_pwm_apply(). The driver calls pwmchip_remove() and sets priv->regmap to NULL. Then a few seconds later the first thread wakes up in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! This looks as follows: root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 [ 192.240727] 8<--- cut here --- [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read ... Even without the crash you can see that stm32_pwm_apply() is still running after pwmchip_remove() completed. I'm unsure if the device link could be improved here to ensure that the fan is completely unbound even if it started unbinding already before the pwm device gets unbound. (And if it could, would this fit the device links purpose and so be a sensible improvement?) If not, the only possible other fix is to make sure that the pwm callbacks are serialized with pwmchip_remove() and stop calling the driver callbacks when pwmchip_remove() was called. For that the pwm_chip struct must stay around until all consumers are really gone. So the pwm_chip must not be allocated using devm_kzalloc for the pwm's parent device. Am I missing something? Is this good enough to convince you that we need the concept of devm_pwmchip_alloc() from this thread? My preferred way to continue fixing that would be to get all the preparing cleanups into next soon to keep my patch stack smaller. Another pre-condition to serializing the pwm callbacks is (I think) that all low-level drivers must be fixed to not call pwm-API functions[1]. Then I'll prepare switching all drivers based on this series plus some more patches to introduce a struct device in struct gpio_chip to track the lifetime and eventually fix the issue demonstrated above. While addressing this issue isn't a hard requirement for my final plan to introduce /dev/pwmchip character devices, fixing it now before the pwm core is complicated with the character device code should be easier at least. (Also these character devices introduce another "user" of pwm_chips and so another reason that these shouldn't be allocated using devm_kzalloc.) Best regards Uwe [1] first approximation: $ git grep -Ew 'pwm_(apply|get_state|is_enabled|set_period|get_period|[sg]et_duty_cycle|get_polarity|enable|disable)' v6.5-rc1 -- drivers/pwm/pwm-*.c | wc -l 43 -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-07-25 21:10 ` Uwe Kleine-König @ 2023-10-10 8:05 ` Uwe Kleine-König 2023-10-13 21:42 ` Saravana Kannan 0 siblings, 1 reply; 16+ messages in thread From: Uwe Kleine-König @ 2023-10-10 8:05 UTC (permalink / raw) To: Saravana Kannan Cc: linux-pwm, Jonathan Corbet, Thierry Reding, Greg Kroah-Hartman, linux-doc, Wolfram Sang, Mark Brown, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Matti Vaittinen [-- Attachment #1: Type: text/plain, Size: 4660 bytes --] Hello Saravana, you were pointed out to me as the expert for device links. I found a problem with these. On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > Today I managed to trigger the problem I intend to address with this > series. My machine to test this on is an stm32mp157. To be able to > trigger the problem reliably I applied the following patches on top of > v6.5-rc1: > > - pwm: stm32: Don't modify HW state in .remove() callback > This is a cleanup that I already sent out. > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > The purpose for reproducing the problem is to not trigger further > calls to the apply callback. > > - The following patch: > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > index 687967d3265f..c7fc02b0fa3c 100644 > --- a/drivers/pwm/pwm-stm32.c > +++ b/drivers/pwm/pwm-stm32.c > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > int ret; > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > + msleep(5000); > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > + > enabled = pwm->state.enabled; > > if (enabled && !state->enabled) { > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > { > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > pwmchip_remove(&priv->chip); > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > + > + priv->regmap = NULL; > } > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > The first hunk is only there to widen the race window. The second is to > give some diagnostics and make stm32_pwm_apply() crash if it continues > to run after the msleep. (Without it it didn't crash reproducibly, don't > understand why. *shrug*) > > The device tree contains a pwm-fan device making use of one of the PWMs. > > Now I do the following: > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > Unbinding the fan device has two effects: > > - The device link between fan and pwm looses its property to unbind fan > when pwm gets unbound. > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > - It calls pwm_fan_cleanup() which triggers a call to > pwm_apply_state(). > > So when the pwm device gets unbound the first thread is sleeping in > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > priv->regmap to NULL. Then a few seconds later the first thread wakes up > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > This looks as follows: > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > [ 192.240727] 8<--- cut here --- > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > ... > > Even without the crash you can see that stm32_pwm_apply() is still > running after pwmchip_remove() completed. > > I'm unsure if the device link could be improved here to ensure that the > fan is completely unbound even if it started unbinding already before > the pwm device gets unbound. (And if it could, would this fit the device > links purpose and so be a sensible improvement?) While I think that there is something to be done in the pwm core that this doesn't explode (i.e. do proper lifetime tracking such that a pwm_chip doesn't disappear while still being used---and I'm working on that) I expected that the device links between pwm consumer and provider would prevent the above described oops, too. But somehow the fan already going away (but still using the PWM) when the PWM is unbound, results in the PWM disappearing before the fan is completely gone. Is this expected, or a problem that can (and should?) be fixed? If you need more context or a tester, don't hesitate to ask. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-10 8:05 ` Uwe Kleine-König @ 2023-10-13 21:42 ` Saravana Kannan 2023-10-14 16:17 ` Uwe Kleine-König 0 siblings, 1 reply; 16+ messages in thread From: Saravana Kannan @ 2023-10-13 21:42 UTC (permalink / raw) To: Uwe Kleine-König Cc: linux-pwm, Jonathan Corbet, Thierry Reding, Greg Kroah-Hartman, linux-doc, Wolfram Sang, Mark Brown, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Matti Vaittinen On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote: > > Hello Saravana, > > you were pointed out to me as the expert for device links. I found a > problem with these. > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > Today I managed to trigger the problem I intend to address with this > > series. My machine to test this on is an stm32mp157. To be able to > > trigger the problem reliably I applied the following patches on top of > > v6.5-rc1: > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > This is a cleanup that I already sent out. > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > The purpose for reproducing the problem is to not trigger further > > calls to the apply callback. > > > > - The following patch: > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > index 687967d3265f..c7fc02b0fa3c 100644 > > --- a/drivers/pwm/pwm-stm32.c > > +++ b/drivers/pwm/pwm-stm32.c > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > int ret; > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > + msleep(5000); > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > + > > enabled = pwm->state.enabled; > > > > if (enabled && !state->enabled) { > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > { > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > pwmchip_remove(&priv->chip); > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > + > > + priv->regmap = NULL; > > } > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > The first hunk is only there to widen the race window. The second is to > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > understand why. *shrug*) > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > Now I do the following: > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > Unbinding the fan device has two effects: > > > > - The device link between fan and pwm looses its property to unbind fan > > when pwm gets unbound. > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > - It calls pwm_fan_cleanup() which triggers a call to > > pwm_apply_state(). > > > > So when the pwm device gets unbound the first thread is sleeping in > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > This looks as follows: > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > [ 192.240727] 8<--- cut here --- > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > ... > > > > Even without the crash you can see that stm32_pwm_apply() is still > > running after pwmchip_remove() completed. > > > > I'm unsure if the device link could be improved here to ensure that the > > fan is completely unbound even if it started unbinding already before > > the pwm device gets unbound. (And if it could, would this fit the device > > links purpose and so be a sensible improvement?) > > While I think that there is something to be done in the pwm core that > this doesn't explode (i.e. do proper lifetime tracking such that a > pwm_chip doesn't disappear while still being used---and I'm working on > that) I expected that the device links between pwm consumer and provider > would prevent the above described oops, too. But somehow the fan already > going away (but still using the PWM) when the PWM is unbound, results in > the PWM disappearing before the fan is completely gone. > > Is this expected, or a problem that can (and should?) be fixed? I didn't read your full series, but I read this email. With what's in this email, the problem seems to be in the driver or the pwm framework. The pwm driver/framework can't tell the driver core that you successfully unbound (returning from .remove()) before you have finish all your ongoing transactions with the device. If your "apply()" is still running, you need to make sure it's complete before .remove() does any resource releasing/clean up. Also, how is the consumer driver's .remove() succeeding if it has an ongoing pwm call()? This all sounds like insufficient locking and critical region protection in both the consumer and supplier. Device links can't do anything here because you are giving it wrong info -- that the unbind was successful before it actually is. Device links will and can make sure that the consumer is unbound successfully before the unbind is called on the supplier. And it looks like that's still true here. -Saravana > > If you need more context or a tester, don't hesitate to ask. > > Best regards > Uwe > > -- > Pengutronix e.K. | Uwe Kleine-König | > Industrial Linux Solutions | https://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-13 21:42 ` Saravana Kannan @ 2023-10-14 16:17 ` Uwe Kleine-König 2023-10-17 23:35 ` Saravana Kannan 0 siblings, 1 reply; 16+ messages in thread From: Uwe Kleine-König @ 2023-10-14 16:17 UTC (permalink / raw) To: Saravana Kannan Cc: linux-pwm, linux-doc, Greg Kroah-Hartman, Mark Brown, Jonathan Corbet, Wolfram Sang, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Matti Vaittinen [-- Attachment #1: Type: text/plain, Size: 7587 bytes --] On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > <u.kleine-koenig@pengutronix.de> wrote: > > > > Hello Saravana, > > > > you were pointed out to me as the expert for device links. I found a > > problem with these. > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > Today I managed to trigger the problem I intend to address with this > > > series. My machine to test this on is an stm32mp157. To be able to > > > trigger the problem reliably I applied the following patches on top of > > > v6.5-rc1: > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > This is a cleanup that I already sent out. > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > The purpose for reproducing the problem is to not trigger further > > > calls to the apply callback. > > > > > > - The following patch: > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > --- a/drivers/pwm/pwm-stm32.c > > > +++ b/drivers/pwm/pwm-stm32.c > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > int ret; > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > + msleep(5000); > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > + > > > enabled = pwm->state.enabled; > > > > > > if (enabled && !state->enabled) { > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > { > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > pwmchip_remove(&priv->chip); > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > + > > > + priv->regmap = NULL; > > > } > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > The first hunk is only there to widen the race window. The second is to > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > understand why. *shrug*) > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > Now I do the following: > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > Unbinding the fan device has two effects: > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > when pwm gets unbound. > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > - It calls pwm_fan_cleanup() which triggers a call to > > > pwm_apply_state(). > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > This looks as follows: > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > [ 192.240727] 8<--- cut here --- > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > ... > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > running after pwmchip_remove() completed. > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > fan is completely unbound even if it started unbinding already before > > > the pwm device gets unbound. (And if it could, would this fit the device > > > links purpose and so be a sensible improvement?) > > > > While I think that there is something to be done in the pwm core that > > this doesn't explode (i.e. do proper lifetime tracking such that a > > pwm_chip doesn't disappear while still being used---and I'm working on > > that) I expected that the device links between pwm consumer and provider > > would prevent the above described oops, too. But somehow the fan already > > going away (but still using the PWM) when the PWM is unbound, results in > > the PWM disappearing before the fan is completely gone. > > > > Is this expected, or a problem that can (and should?) be fixed? > > I didn't read your full series, but I read this email. With what's in > this email, the problem seems to be in the driver or the pwm > framework. The pwm driver/framework can't tell the driver core that > you successfully unbound (returning from .remove()) before you have > finish all your ongoing transactions with the device. If your > "apply()" is still running, you need to make sure it's complete before > .remove() does any resource releasing/clean up. > > Also, how is the consumer driver's .remove() succeeding if it has an > ongoing pwm call()? The thing that works fine and as expected is: - trigger unbind of PWM device via sysfs Because there is a device link PWM provider -> pwm consumer (fan), the fan is removed and once its gone (and not earlier), the PWM gets unbound. The failing sequence is: - trigger unbind of fan device in userspace thread A via sysfs. The fan's remove callback blocks for 5s in pwm_apply_state() and so .remove() doesn't complete yet. - a second later: trigger unbind of PWM device via sysfs in thread B. As before I'd expect that the device link results in waiting for the fan to be removed completely, but the PWM is removed immediately. - pwm_apply_state's sleep completes (in thread B) and operates on freed resources => bang! > This all sounds like insufficient locking and > critical region protection in both the consumer and supplier. My (and I think also Thierry's) expectation was, that the device link provides the needed synchronisation. But it doesn't as it doesn't block the PWM provider going away until the fan is completely gone. > Device links can't do anything here because you are giving it wrong > info -- that the unbind was successful before it actually is. The fan's unbind is ongoing, but not complete yet and I'd expect that the device link blocks unbinding the PWM until the fan is completely gone. So I think there is no wrong information. > Device links will and can make sure that the consumer is unbound > successfully before the unbind is called on the supplier. And it looks > like that's still true here. I hope you understood the situation better now and see the problem we have. The problem is fixable in the pwm framework (and I'm working on that), but I think there is also something to improve around devicelink handling. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-14 16:17 ` Uwe Kleine-König @ 2023-10-17 23:35 ` Saravana Kannan 2023-10-18 1:42 ` Saravana Kannan 0 siblings, 1 reply; 16+ messages in thread From: Saravana Kannan @ 2023-10-17 23:35 UTC (permalink / raw) To: Uwe Kleine-König Cc: linux-pwm, linux-doc, Greg Kroah-Hartman, Mark Brown, Jonathan Corbet, Wolfram Sang, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Matti Vaittinen On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote: > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > Hello Saravana, > > > > > > you were pointed out to me as the expert for device links. I found a > > > problem with these. > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > Today I managed to trigger the problem I intend to address with this > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > trigger the problem reliably I applied the following patches on top of > > > > v6.5-rc1: > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > This is a cleanup that I already sent out. > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > The purpose for reproducing the problem is to not trigger further > > > > calls to the apply callback. > > > > > > > > - The following patch: > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > --- a/drivers/pwm/pwm-stm32.c > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > int ret; > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > + msleep(5000); > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > + > > > > enabled = pwm->state.enabled; > > > > > > > > if (enabled && !state->enabled) { > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > { > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > pwmchip_remove(&priv->chip); > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > + > > > > + priv->regmap = NULL; > > > > } > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > understand why. *shrug*) > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > Now I do the following: > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > when pwm gets unbound. > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > pwm_apply_state(). > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > This looks as follows: > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > [ 192.240727] 8<--- cut here --- > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > ... > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > running after pwmchip_remove() completed. > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > fan is completely unbound even if it started unbinding already before > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > links purpose and so be a sensible improvement?) > > > > > > While I think that there is something to be done in the pwm core that > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > that) I expected that the device links between pwm consumer and provider > > > would prevent the above described oops, too. But somehow the fan already > > > going away (but still using the PWM) when the PWM is unbound, results in > > > the PWM disappearing before the fan is completely gone. > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > I didn't read your full series, but I read this email. With what's in > > this email, the problem seems to be in the driver or the pwm > > framework. The pwm driver/framework can't tell the driver core that > > you successfully unbound (returning from .remove()) before you have > > finish all your ongoing transactions with the device. If your > > "apply()" is still running, you need to make sure it's complete before > > .remove() does any resource releasing/clean up. > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > ongoing pwm call()? > > The thing that works fine and as expected is: > > - trigger unbind of PWM device via sysfs > > Because there is a device link PWM provider -> pwm consumer (fan), the > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > The failing sequence is: > > - trigger unbind of fan device in userspace thread A via sysfs. The > fan's remove callback blocks for 5s in pwm_apply_state() and so > .remove() doesn't complete yet. > > - a second later: trigger unbind of PWM device via sysfs in thread B. > As before I'd expect that the device link results in waiting for the > fan to be removed completely, but the PWM is removed immediately. > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > resources => bang! > > > This all sounds like insufficient locking and > > critical region protection in both the consumer and supplier. > > My (and I think also Thierry's) expectation was, that the device link > provides the needed synchronisation. But it doesn't as it doesn't block > the PWM provider going away until the fan is completely gone. > > > Device links can't do anything here because you are giving it wrong > > info -- that the unbind was successful before it actually is. > > The fan's unbind is ongoing, but not complete yet and I'd expect that > the device link blocks unbinding the PWM until the fan is completely > gone. So I think there is no wrong information. > > > Device links will and can make sure that the consumer is unbound > > successfully before the unbind is called on the supplier. And it looks > > like that's still true here. > > I hope you understood the situation better now and see the problem we > have. > > The problem is fixable in the pwm framework (and I'm working on that), > but I think there is also something to improve around devicelink > handling. Thanks for a better explanation of the issue. I agree, this seems like something device links should be able to take care of. I'll take a look into this. -Saravana ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-17 23:35 ` Saravana Kannan @ 2023-10-18 1:42 ` Saravana Kannan 2023-10-18 11:17 ` Uwe Kleine-König 0 siblings, 1 reply; 16+ messages in thread From: Saravana Kannan @ 2023-10-18 1:42 UTC (permalink / raw) To: Uwe Kleine-König Cc: linux-pwm, linux-doc, Greg Kroah-Hartman, Mark Brown, Jonathan Corbet, Wolfram Sang, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Matti Vaittinen, Android Kernel Team On Tue, Oct 17, 2023 at 4:35 PM Saravana Kannan <saravanak@google.com> wrote: > > On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König > <u.kleine-koenig@pengutronix.de> wrote: > > > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > Hello Saravana, > > > > > > > > you were pointed out to me as the expert for device links. I found a > > > > problem with these. > > > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > > Today I managed to trigger the problem I intend to address with this > > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > > trigger the problem reliably I applied the following patches on top of > > > > > v6.5-rc1: > > > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > > This is a cleanup that I already sent out. > > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > > The purpose for reproducing the problem is to not trigger further > > > > > calls to the apply callback. > > > > > > > > > > - The following patch: > > > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > > --- a/drivers/pwm/pwm-stm32.c > > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > > int ret; > > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > + msleep(5000); > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > + > > > > > enabled = pwm->state.enabled; > > > > > > > > > > if (enabled && !state->enabled) { > > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > > { > > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > pwmchip_remove(&priv->chip); > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > + > > > > > + priv->regmap = NULL; > > > > > } > > > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > > understand why. *shrug*) > > > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > > > Now I do the following: > > > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > > when pwm gets unbound. > > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > > pwm_apply_state(). > > > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > > > This looks as follows: > > > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > > [ 192.240727] 8<--- cut here --- > > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > > ... > > > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > > running after pwmchip_remove() completed. > > > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > > fan is completely unbound even if it started unbinding already before > > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > > links purpose and so be a sensible improvement?) > > > > > > > > While I think that there is something to be done in the pwm core that > > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > > that) I expected that the device links between pwm consumer and provider > > > > would prevent the above described oops, too. But somehow the fan already > > > > going away (but still using the PWM) when the PWM is unbound, results in > > > > the PWM disappearing before the fan is completely gone. > > > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > > > I didn't read your full series, but I read this email. With what's in > > > this email, the problem seems to be in the driver or the pwm > > > framework. The pwm driver/framework can't tell the driver core that > > > you successfully unbound (returning from .remove()) before you have > > > finish all your ongoing transactions with the device. If your > > > "apply()" is still running, you need to make sure it's complete before > > > .remove() does any resource releasing/clean up. > > > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > > ongoing pwm call()? > > > > The thing that works fine and as expected is: > > > > - trigger unbind of PWM device via sysfs > > > > Because there is a device link PWM provider -> pwm consumer (fan), the > > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > > > The failing sequence is: > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > .remove() doesn't complete yet. > > > > - a second later: trigger unbind of PWM device via sysfs in thread B. > > As before I'd expect that the device link results in waiting for the > > fan to be removed completely, but the PWM is removed immediately. > > > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > > resources => bang! > > > > > This all sounds like insufficient locking and > > > critical region protection in both the consumer and supplier. > > > > My (and I think also Thierry's) expectation was, that the device link > > provides the needed synchronisation. But it doesn't as it doesn't block > > the PWM provider going away until the fan is completely gone. > > > > > Device links can't do anything here because you are giving it wrong > > > info -- that the unbind was successful before it actually is. > > > > The fan's unbind is ongoing, but not complete yet and I'd expect that > > the device link blocks unbinding the PWM until the fan is completely > > gone. So I think there is no wrong information. > > > > > Device links will and can make sure that the consumer is unbound > > > successfully before the unbind is called on the supplier. And it looks > > > like that's still true here. > > > > I hope you understood the situation better now and see the problem we > > have. > > > > The problem is fixable in the pwm framework (and I'm working on that), > > but I think there is also something to improve around devicelink > > handling. > > Thanks for a better explanation of the issue. I agree, this seems like > something device links should be able to take care of. > > I'll take a look into this. Took me a while to debug this because I couldn't find the .remove() function and I was very confused about what's going on. I'm guessing you started hitting this issue only after moving to the devm_ variant of the pwm APIs. I think I have a fix. https://lore.kernel.org/lkml/20231018013851.3303928-1-saravanak@google.com/T/#u I didn't even compile test it, but it's a trivial change. Can you please test it out and give your "tested-by" if it fixes your issue? Thanks, Saravana ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-18 1:42 ` Saravana Kannan @ 2023-10-18 11:17 ` Uwe Kleine-König 2023-10-18 21:01 ` Saravana Kannan 0 siblings, 1 reply; 16+ messages in thread From: Uwe Kleine-König @ 2023-10-18 11:17 UTC (permalink / raw) To: Saravana Kannan Cc: linux-pwm, Jonathan Corbet, Greg Kroah-Hartman, linux-doc, Wolfram Sang, Mark Brown, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Android Kernel Team, Matti Vaittinen [-- Attachment #1: Type: text/plain, Size: 9516 bytes --] On Tue, Oct 17, 2023 at 06:42:40PM -0700, Saravana Kannan wrote: > On Tue, Oct 17, 2023 at 4:35 PM Saravana Kannan <saravanak@google.com> wrote: > > > > On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > Hello Saravana, > > > > > > > > > > you were pointed out to me as the expert for device links. I found a > > > > > problem with these. > > > > > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > > > Today I managed to trigger the problem I intend to address with this > > > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > > > trigger the problem reliably I applied the following patches on top of > > > > > > v6.5-rc1: > > > > > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > > > This is a cleanup that I already sent out. > > > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > > > The purpose for reproducing the problem is to not trigger further > > > > > > calls to the apply callback. > > > > > > > > > > > > - The following patch: > > > > > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > > > --- a/drivers/pwm/pwm-stm32.c > > > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > > > int ret; > > > > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > + msleep(5000); > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > + > > > > > > enabled = pwm->state.enabled; > > > > > > > > > > > > if (enabled && !state->enabled) { > > > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > > > { > > > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > pwmchip_remove(&priv->chip); > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > + > > > > > > + priv->regmap = NULL; > > > > > > } > > > > > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > > > understand why. *shrug*) > > > > > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > > > > > Now I do the following: > > > > > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > > > when pwm gets unbound. > > > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > > > pwm_apply_state(). > > > > > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > > > > > This looks as follows: > > > > > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > > > [ 192.240727] 8<--- cut here --- > > > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > > > ... > > > > > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > > > running after pwmchip_remove() completed. > > > > > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > > > fan is completely unbound even if it started unbinding already before > > > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > > > links purpose and so be a sensible improvement?) > > > > > > > > > > While I think that there is something to be done in the pwm core that > > > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > > > that) I expected that the device links between pwm consumer and provider > > > > > would prevent the above described oops, too. But somehow the fan already > > > > > going away (but still using the PWM) when the PWM is unbound, results in > > > > > the PWM disappearing before the fan is completely gone. > > > > > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > > > > > I didn't read your full series, but I read this email. With what's in > > > > this email, the problem seems to be in the driver or the pwm > > > > framework. The pwm driver/framework can't tell the driver core that > > > > you successfully unbound (returning from .remove()) before you have > > > > finish all your ongoing transactions with the device. If your > > > > "apply()" is still running, you need to make sure it's complete before > > > > .remove() does any resource releasing/clean up. > > > > > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > > > ongoing pwm call()? > > > > > > The thing that works fine and as expected is: > > > > > > - trigger unbind of PWM device via sysfs > > > > > > Because there is a device link PWM provider -> pwm consumer (fan), the > > > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > > > > > The failing sequence is: > > > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > > .remove() doesn't complete yet. > > > > > > - a second later: trigger unbind of PWM device via sysfs in thread B. > > > As before I'd expect that the device link results in waiting for the > > > fan to be removed completely, but the PWM is removed immediately. > > > > > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > > > resources => bang! > > > > > > > This all sounds like insufficient locking and > > > > critical region protection in both the consumer and supplier. > > > > > > My (and I think also Thierry's) expectation was, that the device link > > > provides the needed synchronisation. But it doesn't as it doesn't block > > > the PWM provider going away until the fan is completely gone. > > > > > > > Device links can't do anything here because you are giving it wrong > > > > info -- that the unbind was successful before it actually is. > > > > > > The fan's unbind is ongoing, but not complete yet and I'd expect that > > > the device link blocks unbinding the PWM until the fan is completely > > > gone. So I think there is no wrong information. > > > > > > > Device links will and can make sure that the consumer is unbound > > > > successfully before the unbind is called on the supplier. And it looks > > > > like that's still true here. > > > > > > I hope you understood the situation better now and see the problem we > > > have. > > > > > > The problem is fixable in the pwm framework (and I'm working on that), > > > but I think there is also something to improve around devicelink > > > handling. > > > > Thanks for a better explanation of the issue. I agree, this seems like > > something device links should be able to take care of. > > > > I'll take a look into this. > > Took me a while to debug this because I couldn't find the .remove() > function and I was very confused about what's going on. > I'm guessing you started hitting this issue only after moving to the > devm_ variant of the pwm APIs. Ah I see. That problem wouldn't happen if the fan called a pwm API function in its remove callback but that happens in a devm cleanup call (registered by devm_add_action_or_reset(dev, pwm_fan_cleanup, ctx) in pwm_fan_probe()). I first thought you talked about 8c89fd866ad221af037ef0ec3d60b83d0b859c65. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-18 11:17 ` Uwe Kleine-König @ 2023-10-18 21:01 ` Saravana Kannan 2023-10-18 21:20 ` Uwe Kleine-König 0 siblings, 1 reply; 16+ messages in thread From: Saravana Kannan @ 2023-10-18 21:01 UTC (permalink / raw) To: Uwe Kleine-König Cc: linux-pwm, Jonathan Corbet, Greg Kroah-Hartman, linux-doc, Wolfram Sang, Mark Brown, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Android Kernel Team, Matti Vaittinen On Wed, Oct 18, 2023 at 4:17 AM Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote: > > On Tue, Oct 17, 2023 at 06:42:40PM -0700, Saravana Kannan wrote: > > On Tue, Oct 17, 2023 at 4:35 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > > > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > > > Hello Saravana, > > > > > > > > > > > > you were pointed out to me as the expert for device links. I found a > > > > > > problem with these. > > > > > > > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > > > > Today I managed to trigger the problem I intend to address with this > > > > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > > > > trigger the problem reliably I applied the following patches on top of > > > > > > > v6.5-rc1: > > > > > > > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > > > > This is a cleanup that I already sent out. > > > > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > > > > The purpose for reproducing the problem is to not trigger further > > > > > > > calls to the apply callback. > > > > > > > > > > > > > > - The following patch: > > > > > > > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > > > > --- a/drivers/pwm/pwm-stm32.c > > > > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > > > > int ret; > > > > > > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > + msleep(5000); > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > + > > > > > > > enabled = pwm->state.enabled; > > > > > > > > > > > > > > if (enabled && !state->enabled) { > > > > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > > > > { > > > > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > pwmchip_remove(&priv->chip); > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > + > > > > > > > + priv->regmap = NULL; > > > > > > > } > > > > > > > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > > > > understand why. *shrug*) > > > > > > > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > > > > > > > Now I do the following: > > > > > > > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > > > > when pwm gets unbound. > > > > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > > > > pwm_apply_state(). > > > > > > > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > > > > > > > This looks as follows: > > > > > > > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > > > > [ 192.240727] 8<--- cut here --- > > > > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > > > > ... > > > > > > > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > > > > running after pwmchip_remove() completed. > > > > > > > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > > > > fan is completely unbound even if it started unbinding already before > > > > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > > > > links purpose and so be a sensible improvement?) > > > > > > > > > > > > While I think that there is something to be done in the pwm core that > > > > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > > > > that) I expected that the device links between pwm consumer and provider > > > > > > would prevent the above described oops, too. But somehow the fan already > > > > > > going away (but still using the PWM) when the PWM is unbound, results in > > > > > > the PWM disappearing before the fan is completely gone. > > > > > > > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > > > > > > > I didn't read your full series, but I read this email. With what's in > > > > > this email, the problem seems to be in the driver or the pwm > > > > > framework. The pwm driver/framework can't tell the driver core that > > > > > you successfully unbound (returning from .remove()) before you have > > > > > finish all your ongoing transactions with the device. If your > > > > > "apply()" is still running, you need to make sure it's complete before > > > > > .remove() does any resource releasing/clean up. > > > > > > > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > > > > ongoing pwm call()? > > > > > > > > The thing that works fine and as expected is: > > > > > > > > - trigger unbind of PWM device via sysfs > > > > > > > > Because there is a device link PWM provider -> pwm consumer (fan), the > > > > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > > > > > > > The failing sequence is: > > > > > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > > > .remove() doesn't complete yet. > > > > > > > > - a second later: trigger unbind of PWM device via sysfs in thread B. > > > > As before I'd expect that the device link results in waiting for the > > > > fan to be removed completely, but the PWM is removed immediately. > > > > > > > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > > > > resources => bang! > > > > > > > > > This all sounds like insufficient locking and > > > > > critical region protection in both the consumer and supplier. > > > > > > > > My (and I think also Thierry's) expectation was, that the device link > > > > provides the needed synchronisation. But it doesn't as it doesn't block > > > > the PWM provider going away until the fan is completely gone. > > > > > > > > > Device links can't do anything here because you are giving it wrong > > > > > info -- that the unbind was successful before it actually is. > > > > > > > > The fan's unbind is ongoing, but not complete yet and I'd expect that > > > > the device link blocks unbinding the PWM until the fan is completely > > > > gone. So I think there is no wrong information. > > > > > > > > > Device links will and can make sure that the consumer is unbound > > > > > successfully before the unbind is called on the supplier. And it looks > > > > > like that's still true here. > > > > > > > > I hope you understood the situation better now and see the problem we > > > > have. > > > > > > > > The problem is fixable in the pwm framework (and I'm working on that), > > > > but I think there is also something to improve around devicelink > > > > handling. > > > > > > Thanks for a better explanation of the issue. I agree, this seems like > > > something device links should be able to take care of. > > > > > > I'll take a look into this. > > > > Took me a while to debug this because I couldn't find the .remove() > > function and I was very confused about what's going on. > > I'm guessing you started hitting this issue only after moving to the > > devm_ variant of the pwm APIs. > > Ah I see. That problem wouldn't happen if the fan called a pwm API > function in its remove callback but that happens in a devm cleanup call > (registered by devm_add_action_or_reset(dev, pwm_fan_cleanup, ctx) in > pwm_fan_probe()). I first thought you talked about > 8c89fd866ad221af037ef0ec3d60b83d0b859c65. Am I not talking about that commit? Btw, I'm still a bit confused by this thread. In your earlier emails to me, you said: > - trigger unbind of fan device in userspace thread A via sysfs. The > fan's remove callback blocks for 5s in pwm_apply_state() and so > .remove() doesn't complete yet. But the latest tree (Tot) didn't have any .remove() function at all. So I just decided to see if there's any issue in Tot and just fix that. I'm glad my fix helps fixed the issue with the used of devm_*() APIs. However, are you really seeing the issue (supplier freed before consumer) if you do the clean up in the .remove() function? If so, there might still be another issue that needs to be fixed. Thanks, Saravana ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-18 21:01 ` Saravana Kannan @ 2023-10-18 21:20 ` Uwe Kleine-König 2023-10-18 22:25 ` Saravana Kannan 0 siblings, 1 reply; 16+ messages in thread From: Uwe Kleine-König @ 2023-10-18 21:20 UTC (permalink / raw) To: Saravana Kannan Cc: linux-pwm, linux-doc, Greg Kroah-Hartman, Jonathan Corbet, Wolfram Sang, Mark Brown, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Android Kernel Team, Matti Vaittinen [-- Attachment #1: Type: text/plain, Size: 11829 bytes --] On Wed, Oct 18, 2023 at 02:01:30PM -0700, Saravana Kannan wrote: > On Wed, Oct 18, 2023 at 4:17 AM Uwe Kleine-König > <u.kleine-koenig@pengutronix.de> wrote: > > > > On Tue, Oct 17, 2023 at 06:42:40PM -0700, Saravana Kannan wrote: > > > On Tue, Oct 17, 2023 at 4:35 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > > > > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > > > > > Hello Saravana, > > > > > > > > > > > > > > you were pointed out to me as the expert for device links. I found a > > > > > > > problem with these. > > > > > > > > > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > > > > > Today I managed to trigger the problem I intend to address with this > > > > > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > > > > > trigger the problem reliably I applied the following patches on top of > > > > > > > > v6.5-rc1: > > > > > > > > > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > > > > > This is a cleanup that I already sent out. > > > > > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > > > > > The purpose for reproducing the problem is to not trigger further > > > > > > > > calls to the apply callback. > > > > > > > > > > > > > > > > - The following patch: > > > > > > > > > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > > > > > --- a/drivers/pwm/pwm-stm32.c > > > > > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > > > > > int ret; > > > > > > > > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > + msleep(5000); > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > + > > > > > > > > enabled = pwm->state.enabled; > > > > > > > > > > > > > > > > if (enabled && !state->enabled) { > > > > > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > > > > > { > > > > > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > pwmchip_remove(&priv->chip); > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > + > > > > > > > > + priv->regmap = NULL; > > > > > > > > } > > > > > > > > > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > > > > > understand why. *shrug*) > > > > > > > > > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > > > > > > > > > Now I do the following: > > > > > > > > > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > > > > > when pwm gets unbound. > > > > > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > > > > > pwm_apply_state(). > > > > > > > > > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > > > > > > > > > This looks as follows: > > > > > > > > > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > > > > > [ 192.240727] 8<--- cut here --- > > > > > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > > > > > ... > > > > > > > > > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > > > > > running after pwmchip_remove() completed. > > > > > > > > > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > > > > > fan is completely unbound even if it started unbinding already before > > > > > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > > > > > links purpose and so be a sensible improvement?) > > > > > > > > > > > > > > While I think that there is something to be done in the pwm core that > > > > > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > > > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > > > > > that) I expected that the device links between pwm consumer and provider > > > > > > > would prevent the above described oops, too. But somehow the fan already > > > > > > > going away (but still using the PWM) when the PWM is unbound, results in > > > > > > > the PWM disappearing before the fan is completely gone. > > > > > > > > > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > > > > > > > > > I didn't read your full series, but I read this email. With what's in > > > > > > this email, the problem seems to be in the driver or the pwm > > > > > > framework. The pwm driver/framework can't tell the driver core that > > > > > > you successfully unbound (returning from .remove()) before you have > > > > > > finish all your ongoing transactions with the device. If your > > > > > > "apply()" is still running, you need to make sure it's complete before > > > > > > .remove() does any resource releasing/clean up. > > > > > > > > > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > > > > > ongoing pwm call()? > > > > > > > > > > The thing that works fine and as expected is: > > > > > > > > > > - trigger unbind of PWM device via sysfs > > > > > > > > > > Because there is a device link PWM provider -> pwm consumer (fan), the > > > > > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > > > > > > > > > The failing sequence is: > > > > > > > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > > > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > > > > .remove() doesn't complete yet. > > > > > > > > > > - a second later: trigger unbind of PWM device via sysfs in thread B. > > > > > As before I'd expect that the device link results in waiting for the > > > > > fan to be removed completely, but the PWM is removed immediately. > > > > > > > > > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > > > > > resources => bang! > > > > > > > > > > > This all sounds like insufficient locking and > > > > > > critical region protection in both the consumer and supplier. > > > > > > > > > > My (and I think also Thierry's) expectation was, that the device link > > > > > provides the needed synchronisation. But it doesn't as it doesn't block > > > > > the PWM provider going away until the fan is completely gone. > > > > > > > > > > > Device links can't do anything here because you are giving it wrong > > > > > > info -- that the unbind was successful before it actually is. > > > > > > > > > > The fan's unbind is ongoing, but not complete yet and I'd expect that > > > > > the device link blocks unbinding the PWM until the fan is completely > > > > > gone. So I think there is no wrong information. > > > > > > > > > > > Device links will and can make sure that the consumer is unbound > > > > > > successfully before the unbind is called on the supplier. And it looks > > > > > > like that's still true here. > > > > > > > > > > I hope you understood the situation better now and see the problem we > > > > > have. > > > > > > > > > > The problem is fixable in the pwm framework (and I'm working on that), > > > > > but I think there is also something to improve around devicelink > > > > > handling. > > > > > > > > Thanks for a better explanation of the issue. I agree, this seems like > > > > something device links should be able to take care of. > > > > > > > > I'll take a look into this. > > > > > > Took me a while to debug this because I couldn't find the .remove() > > > function and I was very confused about what's going on. > > > I'm guessing you started hitting this issue only after moving to the > > > devm_ variant of the pwm APIs. > > > > Ah I see. That problem wouldn't happen if the fan called a pwm API > > function in its remove callback but that happens in a devm cleanup call > > (registered by devm_add_action_or_reset(dev, pwm_fan_cleanup, ctx) in > > pwm_fan_probe()). I first thought you talked about > > 8c89fd866ad221af037ef0ec3d60b83d0b859c65. > > Am I not talking about that commit? The relevant thing is that the fan (i.e. the consumer) uses pwm_apply_state() in a devm cleanup. If the pwm provider uses devm_pwmchip_add or plain pwmchip_add + pwmchip_remove in .remove() doesn't matter. > Btw, I'm still a bit confused by this thread. In your earlier emails > to me, you said: > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > .remove() doesn't complete yet. > > But the latest tree (Tot) didn't have any .remove() function at all. > So I just decided to see if there's any issue in Tot and just fix > that. I'm glad my fix helps fixed the issue with the used of devm_*() > APIs. I was sloppy here and called it "remove callback" when it was really a devm cleanup call. Sorry if that confused you. I didn't expect this to make a difference (and I'm not even sure I was aware this is a devm cleanup and not directly .remove() when I wrote that). > However, are you really seeing the issue (supplier freed before > consumer) if you do the clean up in the .remove() function? If so, > there might still be another issue that needs to be fixed. I didn't test that and now having understood the issue you fixed and seeing the effect, I confidently claim there is nothing to fix for drivers that use pwm_apply_state in .remove(). Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function 2023-10-18 21:20 ` Uwe Kleine-König @ 2023-10-18 22:25 ` Saravana Kannan 0 siblings, 0 replies; 16+ messages in thread From: Saravana Kannan @ 2023-10-18 22:25 UTC (permalink / raw) To: Uwe Kleine-König Cc: linux-pwm, linux-doc, Greg Kroah-Hartman, Jonathan Corbet, Wolfram Sang, Mark Brown, Thierry Reding, James Clark, kernel, Yang Yingliang, Andy Shevchenko, Android Kernel Team, Matti Vaittinen On Wed, Oct 18, 2023 at 2:21 PM Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote: > > On Wed, Oct 18, 2023 at 02:01:30PM -0700, Saravana Kannan wrote: > > On Wed, Oct 18, 2023 at 4:17 AM Uwe Kleine-König > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > On Tue, Oct 17, 2023 at 06:42:40PM -0700, Saravana Kannan wrote: > > > > On Tue, Oct 17, 2023 at 4:35 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > > > On Sat, Oct 14, 2023 at 9:17 AM Uwe Kleine-König > > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > > > On Fri, Oct 13, 2023 at 02:42:20PM -0700, Saravana Kannan wrote: > > > > > > > On Tue, Oct 10, 2023 at 1:05 AM Uwe Kleine-König > > > > > > > <u.kleine-koenig@pengutronix.de> wrote: > > > > > > > > > > > > > > > > Hello Saravana, > > > > > > > > > > > > > > > > you were pointed out to me as the expert for device links. I found a > > > > > > > > problem with these. > > > > > > > > > > > > > > > > On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote: > > > > > > > > > Today I managed to trigger the problem I intend to address with this > > > > > > > > > series. My machine to test this on is an stm32mp157. To be able to > > > > > > > > > trigger the problem reliably I applied the following patches on top of > > > > > > > > > v6.5-rc1: > > > > > > > > > > > > > > > > > > - pwm: stm32: Don't modify HW state in .remove() callback > > > > > > > > > This is a cleanup that I already sent out. > > > > > > > > > https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@pengutronix.de > > > > > > > > > The purpose for reproducing the problem is to not trigger further > > > > > > > > > calls to the apply callback. > > > > > > > > > > > > > > > > > > - The following patch: > > > > > > > > > > > > > > > > > > diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > > > > > > > > > index 687967d3265f..c7fc02b0fa3c 100644 > > > > > > > > > --- a/drivers/pwm/pwm-stm32.c > > > > > > > > > +++ b/drivers/pwm/pwm-stm32.c > > > > > > > > > @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, > > > > > > > > > struct stm32_pwm *priv = to_stm32_pwm_dev(chip); > > > > > > > > > int ret; > > > > > > > > > > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > > + msleep(5000); > > > > > > > > > + dev_info(chip->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > > + > > > > > > > > > enabled = pwm->state.enabled; > > > > > > > > > > > > > > > > > > if (enabled && !state->enabled) { > > > > > > > > > @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev) > > > > > > > > > { > > > > > > > > > struct stm32_pwm *priv = platform_get_drvdata(pdev); > > > > > > > > > > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > > pwmchip_remove(&priv->chip); > > > > > > > > > + dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__); > > > > > > > > > + > > > > > > > > > + priv->regmap = NULL; > > > > > > > > > } > > > > > > > > > > > > > > > > > > static int __maybe_unused stm32_pwm_suspend(struct device *dev) > > > > > > > > > > > > > > > > > > The first hunk is only there to widen the race window. The second is to > > > > > > > > > give some diagnostics and make stm32_pwm_apply() crash if it continues > > > > > > > > > to run after the msleep. (Without it it didn't crash reproducibly, don't > > > > > > > > > understand why. *shrug*) > > > > > > > > > > > > > > > > > > The device tree contains a pwm-fan device making use of one of the PWMs. > > > > > > > > > > > > > > > > > > Now I do the following: > > > > > > > > > > > > > > > > > > echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > > > > > > > > > > Unbinding the fan device has two effects: > > > > > > > > > > > > > > > > > > - The device link between fan and pwm looses its property to unbind fan > > > > > > > > > when pwm gets unbound. > > > > > > > > > (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE) > > > > > > > > > - It calls pwm_fan_cleanup() which triggers a call to > > > > > > > > > pwm_apply_state(). > > > > > > > > > > > > > > > > > > So when the pwm device gets unbound the first thread is sleeping in > > > > > > > > > stm32_pwm_apply(). The driver calls pwmchip_remove() and sets > > > > > > > > > priv->regmap to NULL. Then a few seconds later the first thread wakes up > > > > > > > > > in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang! > > > > > > > > > > > > > > > > > > This looks as follows: > > > > > > > > > > > > > > > > > > root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind > > > > > > > > > [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454 > > > > > > > > > [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657 > > > > > > > > > [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659 > > > > > > > > > root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456 > > > > > > > > > [ 192.240727] 8<--- cut here --- > > > > > > > > > [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read > > > > > > > > > ... > > > > > > > > > > > > > > > > > > Even without the crash you can see that stm32_pwm_apply() is still > > > > > > > > > running after pwmchip_remove() completed. > > > > > > > > > > > > > > > > > > I'm unsure if the device link could be improved here to ensure that the > > > > > > > > > fan is completely unbound even if it started unbinding already before > > > > > > > > > the pwm device gets unbound. (And if it could, would this fit the device > > > > > > > > > links purpose and so be a sensible improvement?) > > > > > > > > > > > > > > > > While I think that there is something to be done in the pwm core that > > > > > > > > this doesn't explode (i.e. do proper lifetime tracking such that a > > > > > > > > pwm_chip doesn't disappear while still being used---and I'm working on > > > > > > > > that) I expected that the device links between pwm consumer and provider > > > > > > > > would prevent the above described oops, too. But somehow the fan already > > > > > > > > going away (but still using the PWM) when the PWM is unbound, results in > > > > > > > > the PWM disappearing before the fan is completely gone. > > > > > > > > > > > > > > > > Is this expected, or a problem that can (and should?) be fixed? > > > > > > > > > > > > > > I didn't read your full series, but I read this email. With what's in > > > > > > > this email, the problem seems to be in the driver or the pwm > > > > > > > framework. The pwm driver/framework can't tell the driver core that > > > > > > > you successfully unbound (returning from .remove()) before you have > > > > > > > finish all your ongoing transactions with the device. If your > > > > > > > "apply()" is still running, you need to make sure it's complete before > > > > > > > .remove() does any resource releasing/clean up. > > > > > > > > > > > > > > Also, how is the consumer driver's .remove() succeeding if it has an > > > > > > > ongoing pwm call()? > > > > > > > > > > > > The thing that works fine and as expected is: > > > > > > > > > > > > - trigger unbind of PWM device via sysfs > > > > > > > > > > > > Because there is a device link PWM provider -> pwm consumer (fan), the > > > > > > fan is removed and once its gone (and not earlier), the PWM gets unbound. > > > > > > > > > > > > The failing sequence is: > > > > > > > > > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > > > > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > > > > > .remove() doesn't complete yet. > > > > > > > > > > > > - a second later: trigger unbind of PWM device via sysfs in thread B. > > > > > > As before I'd expect that the device link results in waiting for the > > > > > > fan to be removed completely, but the PWM is removed immediately. > > > > > > > > > > > > - pwm_apply_state's sleep completes (in thread B) and operates on freed > > > > > > resources => bang! > > > > > > > > > > > > > This all sounds like insufficient locking and > > > > > > > critical region protection in both the consumer and supplier. > > > > > > > > > > > > My (and I think also Thierry's) expectation was, that the device link > > > > > > provides the needed synchronisation. But it doesn't as it doesn't block > > > > > > the PWM provider going away until the fan is completely gone. > > > > > > > > > > > > > Device links can't do anything here because you are giving it wrong > > > > > > > info -- that the unbind was successful before it actually is. > > > > > > > > > > > > The fan's unbind is ongoing, but not complete yet and I'd expect that > > > > > > the device link blocks unbinding the PWM until the fan is completely > > > > > > gone. So I think there is no wrong information. > > > > > > > > > > > > > Device links will and can make sure that the consumer is unbound > > > > > > > successfully before the unbind is called on the supplier. And it looks > > > > > > > like that's still true here. > > > > > > > > > > > > I hope you understood the situation better now and see the problem we > > > > > > have. > > > > > > > > > > > > The problem is fixable in the pwm framework (and I'm working on that), > > > > > > but I think there is also something to improve around devicelink > > > > > > handling. > > > > > > > > > > Thanks for a better explanation of the issue. I agree, this seems like > > > > > something device links should be able to take care of. > > > > > > > > > > I'll take a look into this. > > > > > > > > Took me a while to debug this because I couldn't find the .remove() > > > > function and I was very confused about what's going on. > > > > I'm guessing you started hitting this issue only after moving to the > > > > devm_ variant of the pwm APIs. > > > > > > Ah I see. That problem wouldn't happen if the fan called a pwm API > > > function in its remove callback but that happens in a devm cleanup call > > > (registered by devm_add_action_or_reset(dev, pwm_fan_cleanup, ctx) in > > > pwm_fan_probe()). I first thought you talked about > > > 8c89fd866ad221af037ef0ec3d60b83d0b859c65. > > > > Am I not talking about that commit? > > The relevant thing is that the fan (i.e. the consumer) uses > pwm_apply_state() in a devm cleanup. If the pwm provider uses > devm_pwmchip_add or plain pwmchip_add + pwmchip_remove in .remove() > doesn't matter. Duh! For whatever reason I had forgotten that stm32 was the supplier and was confusing myself. > > > Btw, I'm still a bit confused by this thread. In your earlier emails > > to me, you said: > > > > > - trigger unbind of fan device in userspace thread A via sysfs. The > > > fan's remove callback blocks for 5s in pwm_apply_state() and so > > > .remove() doesn't complete yet. > > > > But the latest tree (Tot) didn't have any .remove() function at all. > > So I just decided to see if there's any issue in Tot and just fix > > that. I'm glad my fix helps fixed the issue with the used of devm_*() > > APIs. > > I was sloppy here and called it "remove callback" when it was really a > devm cleanup call. Sorry if that confused you. I didn't expect this to > make a difference (and I'm not even sure I was aware this is a devm > cleanup and not directly .remove() when I wrote that). > > > However, are you really seeing the issue (supplier freed before > > consumer) if you do the clean up in the .remove() function? If so, > > there might still be another issue that needs to be fixed. > > I didn't test that and now having understood the issue you fixed and > seeing the effect, I confidently claim there is nothing to fix for > drivers that use pwm_apply_state in .remove(). Yeah, I'm convinced now too! -Saravana ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-10-18 22:26 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-18 18:18 [PATCH 00/18] pwm: Provide devm_pwmchip_alloc() function Uwe Kleine-König 2023-07-18 18:18 ` [PATCH 01/18] " Uwe Kleine-König 2023-07-18 20:05 ` Andy Shevchenko 2023-07-19 10:31 ` Uwe Kleine-König 2023-07-19 7:59 ` Thierry Reding 2023-07-19 8:57 ` Uwe Kleine-König 2023-07-25 21:10 ` Uwe Kleine-König 2023-10-10 8:05 ` Uwe Kleine-König 2023-10-13 21:42 ` Saravana Kannan 2023-10-14 16:17 ` Uwe Kleine-König 2023-10-17 23:35 ` Saravana Kannan 2023-10-18 1:42 ` Saravana Kannan 2023-10-18 11:17 ` Uwe Kleine-König 2023-10-18 21:01 ` Saravana Kannan 2023-10-18 21:20 ` Uwe Kleine-König 2023-10-18 22:25 ` Saravana Kannan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).