* CONFIG_DEBUG_SHIRQ and PM @ 2015-08-25 19:58 Felipe Balbi 2015-08-26 19:29 ` Ezequiel Garcia 2015-08-28 19:42 ` Thomas Gleixner 0 siblings, 2 replies; 10+ messages in thread From: Felipe Balbi @ 2015-08-25 19:58 UTC (permalink / raw) To: linux-arm-kernel Hi Ingo, I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using devm_request_*irq(). If we using devm_request_*irq(), that irq will be freed after device drivers' ->remove() gets called. If on ->remove(), we're calling pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get gated and, because we do an extra call to the device's IRQ handler when CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the IRQ handler, we try to read a register which is clocked by the device's clock. This is, of course, really old code which has been in tree for many, many years. I guess nobody has been running their tests in the setup mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on ->remove(), a register read on IRQ handler, and a shared IRQ handler), so that's why we never caught this before. Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but if driver *must* be ready to receive, and handle, an IRQ even during module removal, I wonder what the IRQ handler should do. We can't, in most cases, call pm_runtime_put_sync() from IRQ handler. I'm guessing the only way would be to move pm_runtime calls into the bus driver (in this case, the platform_bus) and make sure it only gets called after device managed resources are freed ? Any hints would be greatly appreciated. cheers -- balbi -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150825/091ce5a7/attachment.sig> ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-25 19:58 CONFIG_DEBUG_SHIRQ and PM Felipe Balbi @ 2015-08-26 19:29 ` Ezequiel Garcia 2015-08-26 19:38 ` Felipe Balbi 2015-08-28 19:42 ` Thomas Gleixner 1 sibling, 1 reply; 10+ messages in thread From: Ezequiel Garcia @ 2015-08-26 19:29 UTC (permalink / raw) To: linux-arm-kernel Felipe, On 25 August 2015 at 16:58, Felipe Balbi <balbi@ti.com> wrote: > Hi Ingo, > > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using > devm_request_*irq(). > I may be jumping on the gun here, but I believe here's your problem. Using devm_request_irq with shared IRQs is not a good idea. Or at least, it forces you to handle interrupts after your device is _completely_ removed (e.g. your IRQ cookie could be bogus). AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple spurious interrupts, that are expected to happen anyway. Your IRQ is shared, and so you may get any IRQ at any time, coming from another device (not yours). So, if I'm right, my suggestion is simple: use request_irq/free_irq and free your IRQ before you disable your clocks, remove your device, etc. > If we using devm_request_*irq(), that irq will be freed after device > drivers' ->remove() gets called. If on ->remove(), we're calling > pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get > gated and, because we do an extra call to the device's IRQ handler when > CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the > IRQ handler, we try to read a register which is clocked by the device's > clock. > > This is, of course, really old code which has been in tree for many, > many years. I guess nobody has been running their tests in the setup > mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on > ->remove(), a register read on IRQ handler, and a shared IRQ handler), > so that's why we never caught this before. > > Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but > if driver *must* be ready to receive, and handle, an IRQ even during > module removal, I wonder what the IRQ handler should do. We can't, in > most cases, call pm_runtime_put_sync() from IRQ handler. > > I'm guessing the only way would be to move pm_runtime calls into the bus > driver (in this case, the platform_bus) and make sure it only gets > called after device managed resources are freed ? > > Any hints would be greatly appreciated. > Hope it helps! -- Ezequiel Garc?a, VanguardiaSur www.vanguardiasur.com.ar ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 19:29 ` Ezequiel Garcia @ 2015-08-26 19:38 ` Felipe Balbi 2015-08-26 19:53 ` Ezequiel Garcia 0 siblings, 1 reply; 10+ messages in thread From: Felipe Balbi @ 2015-08-26 19:38 UTC (permalink / raw) To: linux-arm-kernel Hi, On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote: > Felipe, > > On 25 August 2015 at 16:58, Felipe Balbi <balbi@ti.com> wrote: > > Hi Ingo, > > > > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using > > devm_request_*irq(). > > > > I may be jumping on the gun here, but I believe here's your problem. > Using devm_request_irq with shared IRQs is not a good idea. > > Or at least, it forces you to handle interrupts after your device > is _completely_ removed (e.g. your IRQ cookie could be bogus). > > AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple > spurious interrupts, that are expected to happen anyway. > > Your IRQ is shared, and so you may get any IRQ at any time, > coming from another device (not yours). > > So, if I'm right, my suggestion is simple: use request_irq/free_irq > and free your IRQ before you disable your clocks, remove your device, > etc. yeah, that's just a workaround though. Specially with IRQ flags coming from DT, driver might have no knowledge that its IRQ is shared to start with. Besides, removing devm_* is just a workaround to the real problem. It seems, to me at least, that drivers shouldn't be calling pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(), rather the bus driver should be responsible for doing so; much usb_bus_type and pci_bus_type do. Of course, this has the side effect of requiring buses to make sure that by the time ->probe() is called, that device's clocks are stable and running and the device is active. However, that's not done today for the platform_bus_type and, frankly, that would be somewhat of a complex problem to solve, considering that different SoCs integrate IPs the way they want. Personally, I think removing devm_* is but a workaround to the real thing we're dealing with. -- balbi -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150826/b828fd38/attachment.sig> ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 19:38 ` Felipe Balbi @ 2015-08-26 19:53 ` Ezequiel Garcia 2015-08-26 20:03 ` Felipe Balbi 0 siblings, 1 reply; 10+ messages in thread From: Ezequiel Garcia @ 2015-08-26 19:53 UTC (permalink / raw) To: linux-arm-kernel On 26 August 2015 at 16:38, Felipe Balbi <balbi@ti.com> wrote: > Hi, > > On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote: >> Felipe, >> >> On 25 August 2015 at 16:58, Felipe Balbi <balbi@ti.com> wrote: >> > Hi Ingo, >> > >> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using >> > devm_request_*irq(). >> > >> >> I may be jumping on the gun here, but I believe here's your problem. >> Using devm_request_irq with shared IRQs is not a good idea. >> >> Or at least, it forces you to handle interrupts after your device >> is _completely_ removed (e.g. your IRQ cookie could be bogus). >> >> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple >> spurious interrupts, that are expected to happen anyway. >> >> Your IRQ is shared, and so you may get any IRQ at any time, >> coming from another device (not yours). >> >> So, if I'm right, my suggestion is simple: use request_irq/free_irq >> and free your IRQ before you disable your clocks, remove your device, >> etc. > > yeah, that's just a workaround though. Specially with IRQ flags coming > from DT, driver might have no knowledge that its IRQ is shared to start > with. > Really? Is there any way the DT can set the IRQF_SHARED flag? The caller of devm_request_irq / request_irq needs to pass the irqflags, so I'd say the driver is perfectly aware of the IRQ being shared or not. Maybe you can clarify what I'm missing here. > Besides, removing devm_* is just a workaround to the real problem. It > seems, to me at least, that drivers shouldn't be calling > pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(), > rather the bus driver should be responsible for doing so; much > usb_bus_type and pci_bus_type do. Of course, this has the side effect of > requiring buses to make sure that by the time ->probe() is called, that > device's clocks are stable and running and the device is active. > > However, that's not done today for the platform_bus_type and, frankly, > that would be somewhat of a complex problem to solve, considering that > different SoCs integrate IPs the way they want. > > Personally, I think removing devm_* is but a workaround to the real > thing we're dealing with. > I don't see any problems here: if your interrupt is shared, then you must be prepared to handle it any time, coming from any sources (not only your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to make sure all the drivers passing IRQF_SHARED comply with that rule. So you either avoid using devm_request_irq, or you prepare your handler accordingly to be ready to handle an interrupt _any time_. -- Ezequiel Garc?a, VanguardiaSur www.vanguardiasur.com.ar ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 19:53 ` Ezequiel Garcia @ 2015-08-26 20:03 ` Felipe Balbi 2015-08-26 20:15 ` Ezequiel Garcia 0 siblings, 1 reply; 10+ messages in thread From: Felipe Balbi @ 2015-08-26 20:03 UTC (permalink / raw) To: linux-arm-kernel On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote: > On 26 August 2015 at 16:38, Felipe Balbi <balbi@ti.com> wrote: > > Hi, > > > > On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote: > >> Felipe, > >> > >> On 25 August 2015 at 16:58, Felipe Balbi <balbi@ti.com> wrote: > >> > Hi Ingo, > >> > > >> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using > >> > devm_request_*irq(). > >> > > >> > >> I may be jumping on the gun here, but I believe here's your problem. > >> Using devm_request_irq with shared IRQs is not a good idea. > >> > >> Or at least, it forces you to handle interrupts after your device > >> is _completely_ removed (e.g. your IRQ cookie could be bogus). > >> > >> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple > >> spurious interrupts, that are expected to happen anyway. > >> > >> Your IRQ is shared, and so you may get any IRQ at any time, > >> coming from another device (not yours). > >> > >> So, if I'm right, my suggestion is simple: use request_irq/free_irq > >> and free your IRQ before you disable your clocks, remove your device, > >> etc. > > > > yeah, that's just a workaround though. Specially with IRQ flags coming > > from DT, driver might have no knowledge that its IRQ is shared to start > > with. > > > > Really? Is there any way the DT can set the IRQF_SHARED flag? > The caller of devm_request_irq / request_irq needs to pass the irqflags, > so I'd say the driver is perfectly aware of the IRQ being shared or not. > > Maybe you can clarify what I'm missing here. hmm, that's true. Now that I look at it, DT can pass triggering flags. > > Besides, removing devm_* is just a workaround to the real problem. It > > seems, to me at least, that drivers shouldn't be calling > > pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(), > > rather the bus driver should be responsible for doing so; much > > usb_bus_type and pci_bus_type do. Of course, this has the side effect of > > requiring buses to make sure that by the time ->probe() is called, that > > device's clocks are stable and running and the device is active. > > > > However, that's not done today for the platform_bus_type and, frankly, > > that would be somewhat of a complex problem to solve, considering that > > different SoCs integrate IPs the way they want. > > > > Personally, I think removing devm_* is but a workaround to the real > > thing we're dealing with. > > > > I don't see any problems here: if your interrupt is shared, then you must > be prepared to handle it any time, coming from any sources (not only > your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to > make sure all the drivers passing IRQF_SHARED comply with that rule. you need to be sure of that with non-shared IRQs anyway. Also, an IRQ which isn't shared in SoC A, might become shared in SoC B which uses the same IP. > So you either avoid using devm_request_irq, or you prepare your handler > accordingly to be ready to handle an interrupt _any time_. the handler is ready to handle at any time, what isn't correct is the fact that clocks get gated before IRQ is freed. There should be no such special case as "if your handler is shared, don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it works as expected anyway. -- balbi -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150826/e9f85e8f/attachment-0001.sig> ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 20:03 ` Felipe Balbi @ 2015-08-26 20:15 ` Ezequiel Garcia 2015-08-26 20:24 ` Felipe Balbi 0 siblings, 1 reply; 10+ messages in thread From: Ezequiel Garcia @ 2015-08-26 20:15 UTC (permalink / raw) To: linux-arm-kernel On 26 August 2015 at 17:03, Felipe Balbi <balbi@ti.com> wrote: > On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote: >> On 26 August 2015 at 16:38, Felipe Balbi <balbi@ti.com> wrote: >> > Hi, >> > >> > On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote: >> >> Felipe, >> >> >> >> On 25 August 2015 at 16:58, Felipe Balbi <balbi@ti.com> wrote: >> >> > Hi Ingo, >> >> > >> >> > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using >> >> > devm_request_*irq(). >> >> > >> >> >> >> I may be jumping on the gun here, but I believe here's your problem. >> >> Using devm_request_irq with shared IRQs is not a good idea. >> >> >> >> Or at least, it forces you to handle interrupts after your device >> >> is _completely_ removed (e.g. your IRQ cookie could be bogus). >> >> >> >> AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple >> >> spurious interrupts, that are expected to happen anyway. >> >> >> >> Your IRQ is shared, and so you may get any IRQ at any time, >> >> coming from another device (not yours). >> >> >> >> So, if I'm right, my suggestion is simple: use request_irq/free_irq >> >> and free your IRQ before you disable your clocks, remove your device, >> >> etc. >> > >> > yeah, that's just a workaround though. Specially with IRQ flags coming >> > from DT, driver might have no knowledge that its IRQ is shared to start >> > with. >> > >> >> Really? Is there any way the DT can set the IRQF_SHARED flag? >> The caller of devm_request_irq / request_irq needs to pass the irqflags, >> so I'd say the driver is perfectly aware of the IRQ being shared or not. >> >> Maybe you can clarify what I'm missing here. > > hmm, that's true. Now that I look at it, DT can pass triggering flags. > >> > Besides, removing devm_* is just a workaround to the real problem. It >> > seems, to me at least, that drivers shouldn't be calling >> > pm_runtime_put_sync(); pm_runtime_disable() from their ->remove(), >> > rather the bus driver should be responsible for doing so; much >> > usb_bus_type and pci_bus_type do. Of course, this has the side effect of >> > requiring buses to make sure that by the time ->probe() is called, that >> > device's clocks are stable and running and the device is active. >> > >> > However, that's not done today for the platform_bus_type and, frankly, >> > that would be somewhat of a complex problem to solve, considering that >> > different SoCs integrate IPs the way they want. >> > >> > Personally, I think removing devm_* is but a workaround to the real >> > thing we're dealing with. >> > >> >> I don't see any problems here: if your interrupt is shared, then you must >> be prepared to handle it any time, coming from any sources (not only >> your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to >> make sure all the drivers passing IRQF_SHARED comply with that rule. > > you need to be sure of that with non-shared IRQs anyway. Not entirely. If your IRQ is not shared, then you usually have a register to enable or unmask your peripheral interrupts. So the driver is in control of when it will get interrupts. If the IRQ is shared, this won't do. This is what I mean by "shared IRQs must be prepared to receive an interrupt any time", in the sense that the driver has no way of preventing IRQs (because they may be coming from any source). In the same sense, shared IRQs handlers need to double-check the IRQ is coming to the current device by checking some IRQ status register to see if there's pending work. > Also, an IRQ > which isn't shared in SoC A, might become shared in SoC B which uses the > same IP. > >> So you either avoid using devm_request_irq, or you prepare your handler >> accordingly to be ready to handle an interrupt _any time_. > > the handler is ready to handle at any time, what isn't correct is the > fact that clocks get gated before IRQ is freed. > > There should be no such special case as "if your handler is shared, > don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it > works as expected anyway. > Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED then you _must_ be prepared to get an IRQ *after* your remove() has been called. Let's consider this snippet from tw68: static irqreturn_t tw68_irq(int irq, void *dev_id) { struct tw68_dev *dev = dev_id; u32 status, orig; int loop; status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask; [etc] } The IRQ handler accesses the device struct and then reads through PCI. So if you use devm_request_irq you need to make sure the device struct is still allocated after remove(), and the PCI read won't stall or crash. Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-) Still, I don't think that's a good idea, since it relies on the IRQ being freed *before* the device struct. -- Ezequiel Garc?a, VanguardiaSur www.vanguardiasur.com.ar ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 20:15 ` Ezequiel Garcia @ 2015-08-26 20:24 ` Felipe Balbi 2015-08-26 20:36 ` Ezequiel Garcia 0 siblings, 1 reply; 10+ messages in thread From: Felipe Balbi @ 2015-08-26 20:24 UTC (permalink / raw) To: linux-arm-kernel Hi, On Wed, Aug 26, 2015 at 05:15:51PM -0300, Ezequiel Garcia wrote: <snip> > >> be prepared to handle it any time, coming from any sources (not only > >> your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to > >> make sure all the drivers passing IRQF_SHARED comply with that rule. > > > > you need to be sure of that with non-shared IRQs anyway. > > Not entirely. If your IRQ is not shared, then you usually have a register > to enable or unmask your peripheral interrupts. So the driver is in control > of when it will get interrupts. > > If the IRQ is shared, this won't do. This is what I mean by "shared IRQs > must be prepared to receive an interrupt any time", in the sense that > the driver has no way of preventing IRQs (because they may be > coming from any source). right, the problem is much less likely on non-shared lines but the fine that a line is shared or not is a function of HW integration, not the e.g. USB Controller, so that knowledge really doesn't fit the driver in a sense. We might as well get rid of IRQF_SHARED and assume all lines are shareable. > In the same sense, shared IRQs handlers need to double-check > the IRQ is coming to the current device by checking some IRQ > status register to see if there's pending work. you should the status register even on non-shared IRQs to catch spurious right ? > > Also, an IRQ > > which isn't shared in SoC A, might become shared in SoC B which uses the > > same IP. > > > >> So you either avoid using devm_request_irq, or you prepare your handler > >> accordingly to be ready to handle an interrupt _any time_. > > > > the handler is ready to handle at any time, what isn't correct is the > > fact that clocks get gated before IRQ is freed. > > > > There should be no such special case as "if your handler is shared, > > don't use devm_request_*irq()" because if we just disable PM_RUNTIME, it > > works as expected anyway. > > > > Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED > then you _must_ be prepared to get an IRQ *after* your remove() has > been called. > > Let's consider this snippet from tw68: > > static irqreturn_t tw68_irq(int irq, void *dev_id) > { > struct tw68_dev *dev = dev_id; > u32 status, orig; > int loop; > > status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask; Now try to read that register when your clock is gated. That's the problem I'm talking about. Everything about the handler is functioning correctly; however clocks are gated in ->remove() and free_irq() is only called *AFTER* ->remove() has returned. > [etc] > } > > The IRQ handler accesses the device struct and then > reads through PCI. So if you use devm_request_irq > you need to make sure the device struct is still allocated > after remove(), and the PCI read won't stall or crash. dude, that's not the problem I'm talking about. I still have my private_data around, what I don't have is: _ _ __ _ ___| | ___ ___| | __ / _` | / __| |/ _ \ / __| |/ / | (_| | | (__| | (_) | (__| < \__,_| \___|_|\___/ \___|_|\_\ > Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-) > > Still, I don't think that's a good idea, since it relies on > the IRQ being freed *before* the device struct. that's not an issue at all. If you're using devm_request_irq() you're likely using devm_kzalloc() for the device struct anyway. Also, you called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed before your private data; there's nothing wrong there. -- balbi -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150826/7edc4eeb/attachment.sig> ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 20:24 ` Felipe Balbi @ 2015-08-26 20:36 ` Ezequiel Garcia 2015-08-27 13:02 ` Felipe Balbi 0 siblings, 1 reply; 10+ messages in thread From: Ezequiel Garcia @ 2015-08-26 20:36 UTC (permalink / raw) To: linux-arm-kernel On 26 August 2015 at 17:24, Felipe Balbi <balbi@ti.com> wrote: [..] >> >> static irqreturn_t tw68_irq(int irq, void *dev_id) >> { >> struct tw68_dev *dev = dev_id; >> u32 status, orig; >> int loop; >> >> status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask; > > Now try to read that register when your clock is gated. That's the > problem I'm talking about. Everything about the handler is functioning > correctly; however clocks are gated in ->remove() and free_irq() is > only called *AFTER* ->remove() has returned. > Yeah, it's pretty clear you are talking about clocks here. That's why I said "read won't stall" in the next paragraph. >> [etc] >> } >> >> The IRQ handler accesses the device struct and then >> reads through PCI. So if you use devm_request_irq >> you need to make sure the device struct is still allocated >> after remove(), and the PCI read won't stall or crash. > > dude, that's not the problem I'm talking about. I still have my > private_data around, what I don't have is: > > _ _ > __ _ ___| | ___ ___| | __ > / _` | / __| |/ _ \ / __| |/ / > | (_| | | (__| | (_) | (__| < > \__,_| \___|_|\___/ \___|_|\_\ > > Yes, *you* may have your private data around and have a clock gated, others (the tw68 for instance) may have its region released and unmapped. And yet others may have $whatever resource released in the remove() and assume it's available in the IRQ handler. I honestly can't think why using request_irq / free_irq to solve this is a workaround. >> Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-) >> >> Still, I don't think that's a good idea, since it relies on >> the IRQ being freed *before* the device struct. > > that's not an issue at all. If you're using devm_request_irq() you're > likely using devm_kzalloc() for the device struct anyway. Also, you > called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed > before your private data; there's nothing wrong there. > -- Ezequiel Garc?a, VanguardiaSur www.vanguardiasur.com.ar ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-26 20:36 ` Ezequiel Garcia @ 2015-08-27 13:02 ` Felipe Balbi 0 siblings, 0 replies; 10+ messages in thread From: Felipe Balbi @ 2015-08-27 13:02 UTC (permalink / raw) To: linux-arm-kernel On Wed, Aug 26, 2015 at 05:36:24PM -0300, Ezequiel Garcia wrote: > On 26 August 2015 at 17:24, Felipe Balbi <balbi@ti.com> wrote: > [..] > >> > >> static irqreturn_t tw68_irq(int irq, void *dev_id) > >> { > >> struct tw68_dev *dev = dev_id; > >> u32 status, orig; > >> int loop; > >> > >> status = orig = tw_readl(TW68_INTSTAT) & dev->pci_irqmask; > > > > Now try to read that register when your clock is gated. That's the > > problem I'm talking about. Everything about the handler is functioning > > correctly; however clocks are gated in ->remove() and free_irq() is > > only called *AFTER* ->remove() has returned. > > > > Yeah, it's pretty clear you are talking about clocks here. That's > why I said "read won't stall" in the next paragraph. > > >> [etc] > >> } > >> > >> The IRQ handler accesses the device struct and then > >> reads through PCI. So if you use devm_request_irq > >> you need to make sure the device struct is still allocated > >> after remove(), and the PCI read won't stall or crash. > > > > dude, that's not the problem I'm talking about. I still have my > > private_data around, what I don't have is: > > > > _ _ > > __ _ ___| | ___ ___| | __ > > / _` | / __| |/ _ \ / __| |/ / > > | (_| | | (__| | (_) | (__| < > > \__,_| \___|_|\___/ \___|_|\_\ > > > > > > Yes, *you* may have your private data around and have a clock gated, > others (the tw68 for instance) may have its region released and unmapped. > > And yet others may have $whatever resource released in the > remove() and assume it's available in the IRQ handler. > > I honestly can't think why using request_irq / free_irq to solve this > is a workaround. because it'll, eventually, boil down to not using devm_* at all and that's pretty stupid. -- balbi -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150827/d7668b03/attachment.sig> ^ permalink raw reply [flat|nested] 10+ messages in thread
* CONFIG_DEBUG_SHIRQ and PM 2015-08-25 19:58 CONFIG_DEBUG_SHIRQ and PM Felipe Balbi 2015-08-26 19:29 ` Ezequiel Garcia @ 2015-08-28 19:42 ` Thomas Gleixner 1 sibling, 0 replies; 10+ messages in thread From: Thomas Gleixner @ 2015-08-28 19:42 UTC (permalink / raw) To: linux-arm-kernel On Tue, 25 Aug 2015, Felipe Balbi wrote: > Hi Ingo, Thanks for not cc'ing the irq maintainer .... > I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using > devm_request_*irq(). > > If we using devm_request_*irq(), that irq will be freed after device > drivers' ->remove() gets called. If on ->remove(), we're calling > pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get > gated and, because we do an extra call to the device's IRQ handler when > CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the > IRQ handler, we try to read a register which is clocked by the device's > clock. > > This is, of course, really old code which has been in tree for many, > many years. I guess nobody has been running their tests in the setup > mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on > ->remove(), a register read on IRQ handler, and a shared IRQ handler), > so that's why we never caught this before. > > Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but > if driver *must* be ready to receive, and handle, an IRQ even during > module removal, I wonder what the IRQ handler should do. We can't, in > most cases, call pm_runtime_put_sync() from IRQ handler. Well, a shared interrupt handler must handle this situation, no matter what. Assume the following: irqreturn_t dev_irq(int irq, void *data) { struct devdata *dd = data; u32 state; state = readl(dd->base); ... } void module_exit(void) { /* Write to the device interrupt register */ disable_device_irq(dd->base); /* * After this point the device does not longer * raise an interrupt */ iounmap(dd->base); free_irq(); If the other device which shares the interrupt line raises an interrupt after the unmap and before free_irq() removed the device handler from the irq, the machine is toast, because the dev_irq handler is still called. If the handler is shut down after critical parts of the driver/device are shut down, then you can - either can change the setup/teardown ordering disable_device_irq(dd->base); free_irq(); iounmap(dd->base); - or have a proper flag in the private data which tells the interrupt handler to sod off. irqreturn_t dev_irq(int irq, void *data) { struct devdata *dd = data; if (dd->shutdown) return IRQ_NONE; ... void module_exit(void) { disable_device_irq(dd->base); dd->shutdown = 1; /* On an SMP machine you also need: */ synchronize_irq(dd->irq); So for the problem at hand, the devm magic needs to make sure that the crucial parts are still alive when the devm allocated irq is released. I have no idea how that runtime PM stuff is integrated into devm (I fear not at all), so it's hard to give you a proper advise on that. Thanks, tglx ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-08-28 19:42 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-25 19:58 CONFIG_DEBUG_SHIRQ and PM Felipe Balbi 2015-08-26 19:29 ` Ezequiel Garcia 2015-08-26 19:38 ` Felipe Balbi 2015-08-26 19:53 ` Ezequiel Garcia 2015-08-26 20:03 ` Felipe Balbi 2015-08-26 20:15 ` Ezequiel Garcia 2015-08-26 20:24 ` Felipe Balbi 2015-08-26 20:36 ` Ezequiel Garcia 2015-08-27 13:02 ` Felipe Balbi 2015-08-28 19:42 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).