* About regulator error events @ 2023-02-27 13:05 Sascha Hauer 2023-02-27 13:18 ` Mark Brown 2023-02-27 13:19 ` Mark Brown 0 siblings, 2 replies; 6+ messages in thread From: Sascha Hauer @ 2023-02-27 13:05 UTC (permalink / raw) To: linux-kernel; +Cc: Mark Brown, Liam Girdwood, kernel, linux-arm-kernel I have a board here which has some current limited power switches on it and I wonder if I can do something reasonable with the error interrupt pins these switches have. The devices do not have a communication channel, instead they only have an enable pin and an error interrupt pin. See https://www.diodes.com/assets/Datasheets/AP22652_53_52A_53A.pdf for a datasheet. The devices come in two variants, one goes into current limiting mode in case of overcurrent and the other variant switches off until it gets re-enabled again. At first sight it seemed logical to me to wire up the error interrupt pins to REGULATOR_EVENT_OVER_CURRENT events. That was easy to do, but now the question is: What can a regulator consumer do with these events? The strategy I had in mind was to disable the regulator, enable it again to see if the errors persists and if it does, permanently disable the device. Disabling the regulator only works though when there's only one consumer. With multiple consumers only the enable count decreases, but the regulator itself stays enabled. This means implementing such a policy at the consumer side is not generally possible. Implementing a policy in the regulator core seems awkward as well, as a good strategy likely differs between different consumers. A first good step might be to notify the user somehow. While we can get the overcurrent status of a regulator from /sys/class/regulator/*/over_current there doesn't seem to be any way to get a regulator event in userspace, right? Would patches changing that be welcomed? There doesn't seem to be much prior art for handling regulator error events in the kernel. It would be great to get some input what others do in this situation, or to get some ideas what they would do if they had the time to do so ;) Sascha -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: About regulator error events 2023-02-27 13:05 About regulator error events Sascha Hauer @ 2023-02-27 13:18 ` Mark Brown 2023-02-27 13:19 ` Mark Brown 1 sibling, 0 replies; 6+ messages in thread From: Mark Brown @ 2023-02-27 13:18 UTC (permalink / raw) To: Sascha Hauer; +Cc: linux-kernel, Liam Girdwood, kernel, linux-arm-kernel [-- Attachment #1.1: Type: text/plain, Size: 1337 bytes --] On Mon, Feb 27, 2023 at 02:05:42PM +0100, Sascha Hauer wrote: > The strategy I had in mind was to disable the regulator, enable it again > to see if the errors persists and if it does, permanently disable the > device. Disabling the regulator only works though when there's only one > consumer. With multiple consumers only the enable count decreases, but Might also be a good idea to turn off any other supplies to the device as well but yes. > A first good step might be to notify the user somehow. While we can get > the overcurrent status of a regulator from > /sys/class/regulator/*/over_current there doesn't seem to be any way to > get a regulator event in userspace, right? Would patches changing that > be welcomed? Sure. > There doesn't seem to be much prior art for handling regulator error > events in the kernel. It would be great to get some input what others do > in this situation, or to get some ideas what they would do if they had > the time to do so ;) The general issue is that if the regulators get upset enough to start complaining something has generally gone really badly wrong and there isn't anything constructive that can be done other than logging. You might potentially want to power off the system as a whole or something too. I do think any big actions are going to be userspace policy things. [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: About regulator error events 2023-02-27 13:05 About regulator error events Sascha Hauer 2023-02-27 13:18 ` Mark Brown @ 2023-02-27 13:19 ` Mark Brown 2023-02-27 16:19 ` Matti Vaittinen 1 sibling, 1 reply; 6+ messages in thread From: Mark Brown @ 2023-02-27 13:19 UTC (permalink / raw) To: Sascha Hauer Cc: linux-kernel, Liam Girdwood, kernel, linux-arm-kernel, Matti Vaittinen [-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --] On Mon, Feb 27, 2023 at 02:05:42PM +0100, Sascha Hauer wrote: > I have a board here which has some current limited power switches on it > and I wonder if I can do something reasonable with the error interrupt > pins these switches have. Just noticed that Matti (who's been doing a bunch of work here) wasn't CCed so adding him. > The devices do not have a communication channel, instead they only have > an enable pin and an error interrupt pin. See > https://www.diodes.com/assets/Datasheets/AP22652_53_52A_53A.pdf for a > datasheet. The devices come in two variants, one goes into current > limiting mode in case of overcurrent and the other variant switches off > until it gets re-enabled again. > > At first sight it seemed logical to me to wire up the error interrupt > pins to REGULATOR_EVENT_OVER_CURRENT events. That was easy to do, but > now the question is: What can a regulator consumer do with these events? > > The strategy I had in mind was to disable the regulator, enable it again > to see if the errors persists and if it does, permanently disable the > device. Disabling the regulator only works though when there's only one > consumer. With multiple consumers only the enable count decreases, but > the regulator itself stays enabled. This means implementing such a > policy at the consumer side is not generally possible. Implementing a > policy in the regulator core seems awkward as well, as a good strategy > likely differs between different consumers. > > A first good step might be to notify the user somehow. While we can get > the overcurrent status of a regulator from > /sys/class/regulator/*/over_current there doesn't seem to be any way to > get a regulator event in userspace, right? Would patches changing that > be welcomed? > > There doesn't seem to be much prior art for handling regulator error > events in the kernel. It would be great to get some input what others do > in this situation, or to get some ideas what they would do if they had > the time to do so ;) > > Sascha > > -- > Pengutronix e.K. | | > Steuerwalder Str. 21 | http://www.pengutronix.de/ | > 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: About regulator error events 2023-02-27 13:19 ` Mark Brown @ 2023-02-27 16:19 ` Matti Vaittinen 2023-02-27 17:31 ` Mark Brown 2023-02-28 8:32 ` Sascha Hauer 0 siblings, 2 replies; 6+ messages in thread From: Matti Vaittinen @ 2023-02-27 16:19 UTC (permalink / raw) To: Mark Brown Cc: Sascha Hauer, linux-kernel, Liam Girdwood, kernel, linux-arm-kernel ma 27. helmik. 2023 klo 15.19 Mark Brown (broonie@kernel.org) kirjoitti: > > On Mon, Feb 27, 2023 at 02:05:42PM +0100, Sascha Hauer wrote: > > I have a board here which has some current limited power switches on it > > and I wonder if I can do something reasonable with the error interrupt > > pins these switches have. > > Just noticed that Matti (who's been doing a bunch of work here) wasn't > CCed so adding him. Thanks. I just hoped I had more to say... > > The devices do not have a communication channel, instead they only have > > an enable pin and an error interrupt pin. See > > https://www.diodes.com/assets/Datasheets/AP22652_53_52A_53A.pdf for a > > datasheet. The devices come in two variants, one goes into current > > limiting mode in case of overcurrent and the other variant switches off > > until it gets re-enabled again. > > > > At first sight it seemed logical to me to wire up the error interrupt > > pins to REGULATOR_EVENT_OVER_CURRENT events. That was easy to do, but > > now the question is: What can a regulator consumer do with these events? This is a question I have asked too. I was asked to create a driver for a ROHM BD9576 PMIC - which has the usual configurable over-current, over-voltage, under-voltage and over-temperature protection limits. (Well, the temperature limit is fixed). This means that when limits are exceeded - the PMIC shuts down the power outputs by hardware. What was new is that the BD9576 also had configurable warning-level limits (stricter than the protection limits) - and when these were exceeded only a 'warning IRQ' was issued. This setup was requested from ROHM by a customer - and the information I received was the customer had a use-case where they wanted to do 'mitigation actions' before things get more severely off. Unfortunately, I never received the information about these 'mitigation actions' when I tried to ask what those could be. I am under impression that either out HW colleagues did not know the customer use-case in details, or that this information was 'top secret' (which seems to be the case pretty often :( ) > > The strategy I had in mind was to disable the regulator, enable it again > > to see if the errors persists and if it does, permanently disable the > > device. Disabling the regulator only works though when there's only one > > consumer. If it is obvious that disabling the regulator is required for preventing the damage, then it might be justified to use the regulator_force_disable()? Now, the question when this is obvious is hard. I think it is the board designer who should be evaluating this - and as such, I would say that the information about the severity of error should come from hardware properties - eg. from device-tree. Now, I am not really sure but I have a feeling that ideally the regulator driver (provider, not the consumer) should have this information about the severity level in device-tree and select the use of notifier flag based on this. If an ERROR level event is emitted, it should mean the hardware has really failed and forced disable could be justified. If a WARNING level event is sent, then the handling should be more graceful - probably done by some very system specific driver. My problem here is that I don't have the visibility or understanding regarding current use of those events. Not sure if all the hell would break loose if the regulators are forcibly shut down. By the very least I would expect such a consumer handling to be disabled by default - either via configs or via some runtime enable/disable mechanism. With multiple consumers only the enable count decreases, but > > the regulator itself stays enabled. This means implementing such a > > policy at the consumer side is not generally possible. Implementing a > > policy in the regulator core seems awkward as well, as a good strategy > > likely differs between different consumers. > > > > A first good step might be to notify the user somehow. While we can get > > the overcurrent status of a regulator from > > /sys/class/regulator/*/over_current there doesn't seem to be any way to > > get a regulator event in userspace, right? Would patches changing that > > be welcomed? > > > > There doesn't seem to be much prior art for handling regulator error > > events in the kernel. It would be great to get some input what others do > > in this situation, or to get some ideas what they would do if they had > > the time to do so ;) I did submit a talk proposal about these regulator notifications to ELCE next summer. I don't really have much to say - but I do hope that I could gain some insight exactly to the question: "[where / how] these notifications [are / could be] used?" With a lot of luck the talk proposal gets accepted - and with a bit of luck people in the audience know more than I do ;) I think it was qcom-labibb-regulator.c which does try to disable the failing regulators and then call BUG_ON() if even that fails. Back when I did implement the regulator irq_helper, I did decide to try supporting similar logic for drivers which explicitly request failures to handle these protection IRQs to be treated as fatal errors - except that instead of calling BUG_ON() (which to my understanding does not necessarily lead to reset on all configs) - I try to run the poweroff. This, however, is probably not a good generic solution. I am all for adding notification sending to user-space. Still, I am not against a kernel-level protection either - that can be used in cases where user-space or user-space handling is not available / to be trusted for a reason or other. What should be known then in-kernel is whether the handling should be left to user-space (or other consumers) - or if the kernel driver should take some protective action(s). Yep, I know this was not much - sorry :( As I said, I do lack the insight to real-world use cases... Yours, -- Matti -- Matti Vaittinen Linux kernel developer at ROHM Semiconductors Oulu Finland ~~ When things go utterly wrong vim users can always type :help! ~~ Discuss - Estimate - Plan - Report and finally accomplish this: void do_work(int time) __attribute__ ((const)); _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: About regulator error events 2023-02-27 16:19 ` Matti Vaittinen @ 2023-02-27 17:31 ` Mark Brown 2023-02-28 8:32 ` Sascha Hauer 1 sibling, 0 replies; 6+ messages in thread From: Mark Brown @ 2023-02-27 17:31 UTC (permalink / raw) To: Matti Vaittinen Cc: Sascha Hauer, linux-kernel, Liam Girdwood, kernel, linux-arm-kernel [-- Attachment #1.1: Type: text/plain, Size: 2668 bytes --] On Mon, Feb 27, 2023 at 06:19:19PM +0200, Matti Vaittinen wrote: > What was new is that the BD9576 also had configurable warning-level > limits (stricter than the protection limits) - and when these were > exceeded only a 'warning IRQ' was issued. This setup was requested > from ROHM by a customer - and the information I received was the > customer had a use-case where they wanted to do 'mitigation actions' > before things get more severely off. Unfortunately, I never received > the information about these 'mitigation actions' when I tried to ask > what those could be. I am under impression that either out HW > colleagues did not know the customer use-case in details, or that this > information was 'top secret' (which seems to be the case pretty often > :( ) That sounds like an industrial application with redundant instances where you can fail the workload over to another system as thing start failing. > > > The strategy I had in mind was to disable the regulator, enable it again > > > to see if the errors persists and if it does, permanently disable the > > > device. Disabling the regulator only works though when there's only one > > > consumer. > If it is obvious that disabling the regulator is required for > preventing the damage, then it might be justified to use the > regulator_force_disable()? Now, the question when this is obvious is That is basically what it's there for, or at least such things when detected from a consumer device (eg, over temperature warnings). However it's an open question if you're likely to see a situation where a regulator flagged a problem that critical and didn't autonomously shut down. > Now, I am not really sure but I have a feeling that ideally the > regulator driver (provider, not the consumer) should have this > information about the severity level in device-tree and select the use > of notifier flag based on this. If an ERROR level event is emitted, it > should mean the hardware has really failed and forced disable could be > justified. If a WARNING level event is sent, then the handling should > be more graceful - probably done by some very system specific driver. It certainly seems like the regulator is a good place to inject the configuration. > My problem here is that I don't have the visibility or understanding > regarding current use of those events. Not sure if all the hell would > break loose if the regulators are forcibly shut down. By the very > least I would expect such a consumer handling to be disabled by > default - either via configs or via some runtime enable/disable > mechanism. Yeah, as we've discussed before AFAICT any real usage is entirely downstream. [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: About regulator error events 2023-02-27 16:19 ` Matti Vaittinen 2023-02-27 17:31 ` Mark Brown @ 2023-02-28 8:32 ` Sascha Hauer 1 sibling, 0 replies; 6+ messages in thread From: Sascha Hauer @ 2023-02-28 8:32 UTC (permalink / raw) To: Matti Vaittinen Cc: Mark Brown, linux-kernel, Liam Girdwood, kernel, linux-arm-kernel On Mon, Feb 27, 2023 at 06:19:19PM +0200, Matti Vaittinen wrote: > ma 27. helmik. 2023 klo 15.19 Mark Brown (broonie@kernel.org) kirjoitti: > > > > On Mon, Feb 27, 2023 at 02:05:42PM +0100, Sascha Hauer wrote: > > > I have a board here which has some current limited power switches on it > > > and I wonder if I can do something reasonable with the error interrupt > > > pins these switches have. > > > > Just noticed that Matti (who's been doing a bunch of work here) wasn't > > CCed so adding him. > > Thanks. I just hoped I had more to say... > > > > > The devices do not have a communication channel, instead they only have > > > an enable pin and an error interrupt pin. See > > > https://www.diodes.com/assets/Datasheets/AP22652_53_52A_53A.pdf for a > > > datasheet. The devices come in two variants, one goes into current > > > limiting mode in case of overcurrent and the other variant switches off > > > until it gets re-enabled again. > > > > > > At first sight it seemed logical to me to wire up the error interrupt > > > pins to REGULATOR_EVENT_OVER_CURRENT events. That was easy to do, but > > > now the question is: What can a regulator consumer do with these events? > > This is a question I have asked too. > > I was asked to create a driver for a ROHM BD9576 PMIC - which has the > usual configurable over-current, over-voltage, under-voltage and > over-temperature protection limits. (Well, the temperature limit is > fixed). This means that when limits are exceeded - the PMIC shuts down > the power outputs by hardware. > > What was new is that the BD9576 also had configurable warning-level > limits (stricter than the protection limits) - and when these were > exceeded only a 'warning IRQ' was issued. This setup was requested > from ROHM by a customer - and the information I received was the > customer had a use-case where they wanted to do 'mitigation actions' > before things get more severely off. Unfortunately, I never received > the information about these 'mitigation actions' when I tried to ask > what those could be. I am under impression that either out HW > colleagues did not know the customer use-case in details, or that this > information was 'top secret' (which seems to be the case pretty often > :( ) > > > > The strategy I had in mind was to disable the regulator, enable it again > > > to see if the errors persists and if it does, permanently disable the > > > device. Disabling the regulator only works though when there's only one > > > consumer. > > If it is obvious that disabling the regulator is required for > preventing the damage, then it might be justified to use the > regulator_force_disable()? Now, the question when this is obvious is > hard. I think it is the board designer who should be evaluating this - > and as such, I would say that the information about the severity of > error should come from hardware properties - eg. from device-tree. > Now, I am not really sure but I have a feeling that ideally the > regulator driver (provider, not the consumer) should have this > information about the severity level in device-tree and select the use > of notifier flag based on this. If an ERROR level event is emitted, it > should mean the hardware has really failed and forced disable could be > justified. If a WARNING level event is sent, then the handling should > be more graceful - probably done by some very system specific driver. > > My problem here is that I don't have the visibility or understanding > regarding current use of those events. Not sure if all the hell would > break loose if the regulators are forcibly shut down. By the very > least I would expect such a consumer handling to be disabled by > default - either via configs or via some runtime enable/disable > mechanism. > > With multiple consumers only the enable count decreases, but > > > the regulator itself stays enabled. This means implementing such a > > > policy at the consumer side is not generally possible. Implementing a > > > policy in the regulator core seems awkward as well, as a good strategy > > > likely differs between different consumers. > > > > > > A first good step might be to notify the user somehow. While we can get > > > the overcurrent status of a regulator from > > > /sys/class/regulator/*/over_current there doesn't seem to be any way to > > > get a regulator event in userspace, right? Would patches changing that > > > be welcomed? > > > > > > There doesn't seem to be much prior art for handling regulator error > > > events in the kernel. It would be great to get some input what others do > > > in this situation, or to get some ideas what they would do if they had > > > the time to do so ;) > > I did submit a talk proposal about these regulator notifications to > ELCE next summer. I don't really have much to say - but I do hope that > I could gain some insight exactly to the question: "[where / how] > these notifications [are / could be] used?" With a lot of luck the > talk proposal gets accepted - and with a bit of luck people in the > audience know more than I do ;) > > I think it was qcom-labibb-regulator.c which does try to disable the > failing regulators and then call BUG_ON() if even that fails. Back > when I did implement the regulator irq_helper, I did decide to try > supporting similar logic for drivers which explicitly request failures > to handle these protection IRQs to be treated as fatal errors - except > that instead of calling BUG_ON() (which to my understanding does not > necessarily lead to reset on all configs) - I try to run the poweroff. > This, however, is probably not a good generic solution. > > I am all for adding notification sending to user-space. Still, I am > not against a kernel-level protection either - that can be used in > cases where user-space or user-space handling is not available / to be > trusted for a reason or other. What should be known then in-kernel is > whether the handling should be left to user-space (or other consumers) > - or if the kernel driver should take some protective action(s). > > Yep, I know this was not much - sorry :( As I said, I do lack the > insight to real-world use cases... At least in my case I can tell something about the real-world use case. One of the current limiting switches supplies VBUS to a USB port on the board. The USB core already has an idea of over current handling. In theory we could wire up the over current event to the XHCI driver. In practice this doesn't seem to be that simple. Over current is normally read from a register bit in the XHCI controller. We would have to route the over current event from the DWC3 device tree binding through the DWC3 driver to the XHCI driver, short circuit the over current register bit and use the regulator information instead. While doable this sounds like a lot of work for just a special case. The other current limiting switch supplies the +5v output of a HDMI connector. Hell won't break loose in case of an over current event, it mainly only indicates that a broken cable is connected. In an ideal world the user would be notified about the over current and everything could go back to normal once the broken cable has been replaced with a working one. regulator_force_disable() wouldn't be needed in my case as the switch automatically turns off anyway when an over current is detected. Right now a regulator consumer doesn't know how the regulator behaves in case of an over current event. Does it turn off or does it go into current limiting mode? The consumer also doesn't know if it is exlusively using the regulator or if there are other consumers. Even if it had that information it would likely become quite a mess when each regulator consumer spread around hundreds of drivers get their own over current handling. With that I think over current events can only sensibly be handled in the regulator core, maybe with some help of device tree properties. I take from this discussion that it's at least a good idea to send user space notifications when something goes wrong with the regulators. Sascha -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-02-28 8:34 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-27 13:05 About regulator error events Sascha Hauer 2023-02-27 13:18 ` Mark Brown 2023-02-27 13:19 ` Mark Brown 2023-02-27 16:19 ` Matti Vaittinen 2023-02-27 17:31 ` Mark Brown 2023-02-28 8:32 ` Sascha Hauer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).