* [GIT PULL tip/genirq] Please pull from lost-spurious-irq
@ 2010-07-28 13:42 Tejun Heo
2010-07-28 13:46 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-07-28 13:42 UTC (permalink / raw)
To: Thomas Gleixner, lkml; +Cc: Jeff Garzik, Greg KH
Hello, Thomas.
With Jeff's acks added, patches to make libata use irq-expect are
commited. Please pull from the following branch to receive patches[1]
to improve lost/spurious irq handling.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git lost-spurious-irq
Thanks.
Tejun Heo (14):
irq: cleanup irqfixup
irq: make spurious poll timer per desc
irq: use desc->poll_timer for irqpoll
irq: kill IRQF_IRQPOLL
irq: misc preparations for further changes
irq: implement irq_schedule_poll()
irq: improve spurious IRQ handling
irq: implement IRQ watching
irq: implement IRQ expecting
irq: add comment about overall design of lost/spurious IRQ handling
usb: use IRQ watching
sata_fsl,mv,nv: prepare for NCQ command completion update
libata: always use ata_qc_complete_multiple() for NCQ command completion
libata: use IRQ expecting
arch/arm/mach-aaec2000/core.c | 2 +-
arch/arm/mach-at91/at91rm9200_time.c | 2 +-
arch/arm/mach-at91/at91sam926x_time.c | 2 +-
arch/arm/mach-bcmring/core.c | 2 +-
arch/arm/mach-clps711x/time.c | 2 +-
arch/arm/mach-cns3xxx/core.c | 2 +-
arch/arm/mach-ebsa110/core.c | 2 +-
arch/arm/mach-ep93xx/core.c | 2 +-
arch/arm/mach-footbridge/dc21285-timer.c | 2 +-
arch/arm/mach-footbridge/isa-timer.c | 2 +-
arch/arm/mach-h720x/cpu-h7201.c | 2 +-
arch/arm/mach-h720x/cpu-h7202.c | 2 +-
arch/arm/mach-integrator/integrator_ap.c | 2 +-
arch/arm/mach-ixp2000/core.c | 2 +-
arch/arm/mach-ixp23xx/core.c | 2 +-
arch/arm/mach-ixp4xx/common.c | 2 +-
arch/arm/mach-lh7a40x/time.c | 2 +-
arch/arm/mach-mmp/time.c | 2 +-
arch/arm/mach-netx/time.c | 2 +-
arch/arm/mach-ns9xxx/irq.c | 3 -
arch/arm/mach-ns9xxx/time-ns9360.c | 2 +-
arch/arm/mach-nuc93x/time.c | 2 +-
arch/arm/mach-omap1/time.c | 2 +-
arch/arm/mach-omap1/timer32k.c | 2 +-
arch/arm/mach-omap2/timer-gp.c | 2 +-
arch/arm/mach-pnx4008/time.c | 2 +-
arch/arm/mach-pxa/time.c | 2 +-
arch/arm/mach-sa1100/time.c | 2 +-
arch/arm/mach-shark/core.c | 2 +-
arch/arm/mach-u300/timer.c | 2 +-
arch/arm/mach-w90x900/time.c | 2 +-
arch/arm/plat-iop/time.c | 2 +-
arch/arm/plat-mxc/time.c | 2 +-
arch/arm/plat-samsung/time.c | 2 +-
arch/arm/plat-versatile/timer-sp.c | 2 +-
arch/blackfin/kernel/time-ts.c | 6 +-
arch/ia64/kernel/time.c | 2 +-
arch/parisc/kernel/irq.c | 2 +-
arch/powerpc/platforms/cell/interrupt.c | 5 +-
arch/x86/kernel/time.c | 2 +-
drivers/ata/libata-core.c | 54 ++-
drivers/ata/libata-eh.c | 4 +-
drivers/ata/libata-sff.c | 37 +-
drivers/ata/sata_fsl.c | 26 +-
drivers/ata/sata_mv.c | 58 +-
drivers/ata/sata_nv.c | 87 +--
drivers/clocksource/sh_cmt.c | 3 +-
drivers/clocksource/sh_mtu2.c | 3 +-
drivers/clocksource/sh_tmu.c | 3 +-
drivers/usb/core/hcd.c | 1 +
include/linux/interrupt.h | 43 +-
include/linux/irq.h | 40 +-
include/linux/libata.h | 2 +
kernel/irq/chip.c | 20 +-
kernel/irq/handle.c | 7 +-
kernel/irq/internals.h | 10 +-
kernel/irq/manage.c | 18 +-
kernel/irq/proc.c | 5 +-
kernel/irq/spurious.c | 978 +++++++++++++++++++++++++-----
59 files changed, 1101 insertions(+), 386 deletions(-)
--
tejun
[1] http://thread.gmane.org/gmane.linux.ide/46448
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-07-28 13:42 [GIT PULL tip/genirq] Please pull from lost-spurious-irq Tejun Heo @ 2010-07-28 13:46 ` Tejun Heo 2010-07-29 8:44 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-07-28 13:46 UTC (permalink / raw) To: Thomas Gleixner, lkml; +Cc: Jeff Garzik, Greg KH On 07/28/2010 03:42 PM, Tejun Heo wrote: > Hello, Thomas. > > With Jeff's acks added, patches to make libata use irq-expect are > commited. Please pull from the following branch to receive patches[1] > to improve lost/spurious irq handling. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git lost-spurious-irq I'll ask Stephen to pull the above branch into linux-next until it shows up via tip so that we don't lose any more linux-next testing time. Thank you. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-07-28 13:46 ` Tejun Heo @ 2010-07-29 8:44 ` Thomas Gleixner 2010-07-30 9:46 ` Tejun Heo 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2010-07-29 8:44 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH On Wed, 28 Jul 2010, Tejun Heo wrote: > On 07/28/2010 03:42 PM, Tejun Heo wrote: > > Hello, Thomas. > > > > With Jeff's acks added, patches to make libata use irq-expect are > > commited. Please pull from the following branch to receive patches[1] > > to improve lost/spurious irq handling. > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git lost-spurious-irq > > I'll ask Stephen to pull the above branch into linux-next until it > shows up via tip so that we don't lose any more linux-next testing > time. I'm working on it already and I'm currently twisting my brain around the problem this patches will impose to the RT stuff, where we cannot access timer_list timers from atomic regions :( Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-07-29 8:44 ` Thomas Gleixner @ 2010-07-30 9:46 ` Tejun Heo 2010-08-02 14:07 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-07-30 9:46 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, On 07/29/2010 10:44 AM, Thomas Gleixner wrote: >> I'll ask Stephen to pull the above branch into linux-next until it >> shows up via tip so that we don't lose any more linux-next testing >> time. > > I'm working on it already and I'm currently twisting my brain around > the problem this patches will impose to the RT stuff, where we cannot > access timer_list timers from atomic regions :( Oh, I see. A dull last ditch solution would be just disabling it on CONFIG_PREEMPT_RT, but yeah that will be dull. :-( Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-07-30 9:46 ` Tejun Heo @ 2010-08-02 14:07 ` Thomas Gleixner 2010-08-02 15:28 ` Tejun Heo 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2010-08-02 14:07 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH Tejun, On Fri, 30 Jul 2010, Tejun Heo wrote: > Hello, > > On 07/29/2010 10:44 AM, Thomas Gleixner wrote: > >> I'll ask Stephen to pull the above branch into linux-next until it > >> shows up via tip so that we don't lose any more linux-next testing > >> time. > > > > I'm working on it already and I'm currently twisting my brain around > > the problem this patches will impose to the RT stuff, where we cannot > > access timer_list timers from atomic regions :( > > Oh, I see. A dull last ditch solution would be just disabling it on > CONFIG_PREEMPT_RT, but yeah that will be dull. :-( Yup, but I have some other issues with this series as well. That thing is massively overengineered. You mangle all the various functions into irq_poll which makes it really hard to understand what the code does under which circumstances. I understand most of the problems you want to solve, but I don't like the outcome too much. Let's look at the various parts: 1) irq polling - Get rid of the for_each_irq() loop in the poll timer. You can solve this by adding the interrupt to a linked list and let the poll timer run through the list. Also why is the simple counter based decision not good enough ? Why do we need an extra time period for this ? - Reenable the interrupt from time to time to check whether the irq storm subsided. Generally a good idea, but we really do not want to wait for another 10k unhandled interrupts flooding the machine. Ideally we should read out the irq pending register from the irq chip to see whether the interrupt is still asserted. 2) irq watch I have to admit, that I have no clue what this thing exactly does. After staring into the code for quite a while I came to the conclusion that it's a dynamic polling machinery, which adjusts it's polling frequency depending on the number of good and bad samples. The heuristics for this are completely non obvious. Aside of that the watch mechanism adds unneccesary code to the hard interrupt context. There is no reason why you need to update that watch magic in the hard interrupt context. Analysing the stats can be done in the watch timer as well. All it takes is in handle_irq_event() case IRQ_HANDLED: action->handled++; and in the watch timer function if (ret == IRQ_HANDLED) action->polled++; irq_watch should use a separate global timer and a linked list. The timer is started when an interrupt is added to the watch mechanism and stopped when the list becomes empty. That's a clear separation of watch and poll and simplifies the whole business a lot. 3) irq expect So you basically poll the interrupt until either the interrupt happened or the device driver timeout occured. That's really weird. What does it solve ? Not running into the device driver timeout routine if the hardware interrupt has been lost? That's overkill, isn't it ? When the hardware loses an interrupt occasionally then why isn't the device driver timeout routine sufficient? If it does not check whether the device has an interrupt pending despite the fact that it has not been delivered, then this code needs to be fixed and not worked around with weird polling in the generic irq code. Btw, this polling is pretty bad in terms of power savings. - The timers are not aligned, so if there are several expects armed you get unbatched wakeups. - If the poll interval is smaller than the average hardware interrupt delivery time, you add more wakeups than necessary. - If the hardware interrupt has happened, you let the poll timer pending, which gives us an extra wakeup for each interrupt in the worst case. I assume your concern is to detect and deal with flaky interrupts, but avoiding the permanent poll when the device is not used, right? So that can be done simpler as well. One timer and a linked list again. expect_irq() adds the irq to the linked list and arms the timer, if it is not armed already. unexpect_irq() removes the irq from the linked list and deletes the timer when the list becomes empty. That can reuse the list_head in irq_desc which you need for irq poll and the timer interval should be fixed without dynamic adjustment. Keep it simple! Thanks tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 14:07 ` Thomas Gleixner @ 2010-08-02 15:28 ` Tejun Heo 2010-08-02 15:35 ` Tejun Heo 2010-08-02 17:10 ` Thomas Gleixner 0 siblings, 2 replies; 18+ messages in thread From: Tejun Heo @ 2010-08-02 15:28 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, Thomas. On 08/02/2010 04:07 PM, Thomas Gleixner wrote: > Yup, but I have some other issues with this series as well. That thing > is massively overengineered. You mangle all the various functions into > irq_poll which makes it really hard to understand what the code does > under which circumstances. Oh, I'll explain why I muliplexed timers this way below. > I understand most of the problems you want to solve, but I don't like > the outcome too much. > > Let's look at the various parts: Heh, fair enough. Let's talk about it. > 1) irq polling Do you mean spurious irq polling here? > - Get rid of the for_each_irq() loop in the poll timer. > > You can solve this by adding the interrupt to a linked list and > let the poll timer run through the list. I suppose you're talking about using per-desc timer. I first thought about using a single common timer too but decided against it because timer events can be merged from timer code (depending on the interval and slack) and I wanted to use different polling frequencies for different cases. We can of course implement logic to put polled irqs on a list in chronological order but that's basically redoing the work timer code already does. > Also why is the simple counter based decision not good enough? > Why do we need an extra time period for this? Because in many cases IRQ storms are transient and spurious IRQ polling worsens performance a lot, it's worthwhile to be able to recover from such conditions, so the extra time period is there to trigger reenabling of the offending IRQ to see whether the storm is over now. Please note that many of this type of IRQ storms are extremely obscure (for example, happens during resume from STR once in a blue moon due to bad interaction between legacy ATA state machine in the controller and the drive's certain behavior) and some are very difficult to avoid from the driver in any reasonable way. > - Reenable the interrupt from time to time to check whether the irq > storm subsided. Or maybe you were talking about something else? > Generally a good idea, but we really do not want to wait for > another 10k unhandled interrupts flooding the machine. Ideally we > should read out the irq pending register from the irq chip to see > whether the interrupt is still asserted. In simulated tests with hosed IRQ routing the behavior was already quite acceptable, but yeah if peeking at interrupt state can be done easily that would definitely be an improvement. > 2) irq watch > > I have to admit, that I have no clue what this thing exactly > does. After staring into the code for quite a while I came to the > conclusion that it's a dynamic polling machinery, which adjusts > it's polling frequency depending on the number of good and bad > samples. The heuristics for this are completely non obvious. Eh, I thought I did pretty good on documenting each mechanism. Obviously, not enough at all. :-) It basically allows drivers to tell the irq code that IRQs are likely to happen. In response, IRQ code polls the desc for a while and sees whether poll detects IRQ handling w/o actual IRQ deliveries. If that seems to happen more than usual, IRQ code can determine that IRQ is not being delivered and enables full blown IRQ polling. The WARY state is inbetween state between good and bad. Dropping this will simplify the logic a bit. It's basically to avoid false positives as it would suck to enable polling for a working IRQ line. Anyways, the basic heuristics is it watches certain number of IRQ events and count how many are good (via IRQ) and bad (via poll). If bad goes over certain threshold, polling is enabled. > Aside of that the watch mechanism adds unneccesary code to the hard > interrupt context. There is no reason why you need to update that > watch magic in the hard interrupt context. Analysing the stats can > be done in the watch timer as well. All it takes is in > handle_irq_event() > > case IRQ_HANDLED: > action->handled++; > > and in the watch timer function > > if (ret == IRQ_HANDLED) > action->polled++; The whole thing is executed iff unlikely(desc->status & IRQ_CHECK_WATCHES) which the processor will be predicting correctly most of the time. Do you think it would make any noticeable difference? > irq_watch should use a separate global timer and a linked list. The > timer is started when an interrupt is added to the watch mechanism > and stopped when the list becomes empty. > > That's a clear separation of watch and poll and simplifies the > whole business a lot. The reason why poll_irq() looks complicated is because different types of timers on the same desc shares the timer. The reason why it's shared this way instead of across different descs of the same type is to avoid locking overhead in maintaining the timer and linked lists of its targets. By sharing the timer in the same desc, everything can be protected by desc->lock but if we use timers across different descs, we need another layer of locking. > 3) irq expect > > So you basically poll the interrupt until either the interrupt > happened or the device driver timeout occured. > > That's really weird. What does it solve ? Not running into the > device driver timeout routine if the hardware interrupt has been > lost? > > That's overkill, isn't it ? When the hardware loses an interrupt > occasionally then why isn't the device driver timeout routine > sufficient? If it does not check whether the device has an > interrupt pending despite the fact that it has not been delivered, > then this code needs to be fixed and not worked around with weird > polling in the generic irq code. Delays caused by lost interrupt and by other failures can differ a lot in their magnitude and, for example, libata does check IRQ delivery failure in its timeout handler but at that point it doesn't really matter why the timeout has occurred for recovery of that specific command. Too much time has passed anyway. We definitely can implement such quick polling timeouts which check for IRQ misdeliveries in drivers so that it can detect such events and maybe add an interface in the IRQ polling code which the driver can call to enable polling on the IRQ. But then again the pattern of such failures and handling of them would be very similar across different drivers and we already of most of polling machinary implemented in IRQ code, so I think it makes sense to have a generic implementation which drivers can use. Moreover, given the difference in test/review coverage and general complexity of doing it right, I think it is far better to do it in the core code and make it easy to use for drivers. > Btw, this polling is pretty bad in terms of power savings. > > - The timers are not aligned, so if there are several expects armed > you get unbatched wakeups. In usual operation conditions, the timer interval will quickly become 3 seconds w/ 1 sec slack. I doubt it can cause much problem. Right? > - If the poll interval is smaller than the average hardware > interrupt delivery time, you add more wakeups than necessary. The same thing as above. If IRQ delivery is working, it will be polling every 3 seconds w/ 1 sec slack. It wouldn't really matter. > - If the hardware interrupt has happened, you let the poll timer > pending, which gives us an extra wakeup for each interrupt in the > worst case. That's to avoid the overhead of constantly manipulating the timer for high frequency commands. expect/unexpect fast paths don't do much, no locking, no timer manipulations. No matter how busy the IRQ is, the timer will only activate and rearmed occassionally. > I assume your concern is to detect and deal with flaky interrupts, > but avoiding the permanent poll when the device is not used, right? > > So that can be done simpler as well. One timer and a linked list > again. > > expect_irq() adds the irq to the linked list and arms the timer, if > it is not armed already. unexpect_irq() removes the irq from the > linked list and deletes the timer when the list becomes empty. > > That can reuse the list_head in irq_desc which you need for irq > poll and the timer interval should be fixed without dynamic > adjustment. The main problem here is that expect/unexpect_irq() needs to be low overhead so that drivers can easily call in w/o worrying too much about performance overhead and I would definitely want to avoid locking in fast paths. > Keep it simple! To me, the more interesting question is where to put the complexity. In many cases, the most we can do is pushing complexity around, and if we can make things easier for drivers by making the core IRQ code somewhat more complex, I'll definitely go for that every time. That's also the main theme of this IRQ spurious/missing patchset. The implementation might be a bit complex but using it from the driver side is almost no brainer. The drivers just need to give hints to the IRQ polling code and the IRQ polling code will take care of everything else, and the low overhead for irq_expect/unexpect() is important in that aspect because drivers can only use it without worrying iff its overhead is sufficiently low. That said, I definitely think poll_irq() can be better factored. Its current form is mainly due to its transition from spurious poller to the current multiplexed form. Factoring out different parts and possibly killing watch handling should simplify it quite a bit. Thank you. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 15:28 ` Tejun Heo @ 2010-08-02 15:35 ` Tejun Heo 2010-08-02 18:52 ` Thomas Gleixner 2010-08-02 17:10 ` Thomas Gleixner 1 sibling, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-08-02 15:35 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH On 08/02/2010 05:28 PM, Tejun Heo wrote: > The reason why poll_irq() looks complicated is because different types > of timers on the same desc shares the timer. The reason why it's > shared this way instead of across different descs of the same type is > to avoid locking overhead in maintaining the timer and linked lists of > its targets. By sharing the timer in the same desc, everything can be > protected by desc->lock but if we use timers across different descs, > we need another layer of locking. Ooh, another reason is timer locality. If timers are shared per desc, they have much higher chance of being on the same processor. Global timers would be pretty bad in that respect. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 15:35 ` Tejun Heo @ 2010-08-02 18:52 ` Thomas Gleixner 2010-08-02 19:57 ` Tejun Heo 2010-08-02 21:06 ` Tejun Heo 0 siblings, 2 replies; 18+ messages in thread From: Thomas Gleixner @ 2010-08-02 18:52 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH On Mon, 2 Aug 2010, Tejun Heo wrote: > On 08/02/2010 05:28 PM, Tejun Heo wrote: > > The reason why poll_irq() looks complicated is because different types > > of timers on the same desc shares the timer. The reason why it's > > shared this way instead of across different descs of the same type is > > to avoid locking overhead in maintaining the timer and linked lists of > > its targets. By sharing the timer in the same desc, everything can be > > protected by desc->lock but if we use timers across different descs, > > we need another layer of locking. > > Ooh, another reason is timer locality. If timers are shared per desc, > they have much higher chance of being on the same processor. Global > timers would be pretty bad in that respect. That's irrelevant. If you need to poll an interrupt, then it does not matter at all whether you bounce some cache lines or not. In fact we have two cases: 1) An interrupt needs to be polled all the time. That sucks whether the poll timer bounces a few cache lines or not. 2) Polling an irq for some time. Either it works again after a while, so your suckage is restricted to the poll period. If not see #1 And it's even less of an issue as the main users of this misfeature are laptops and desktop machines, where locality is not really that important. If an enterprise admin decides to ignore the fact that the hardware is flaky, then he does not care about the cache line bounces either. Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 18:52 ` Thomas Gleixner @ 2010-08-02 19:57 ` Tejun Heo 2010-08-03 10:06 ` Thomas Gleixner 2010-08-02 21:06 ` Tejun Heo 1 sibling, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-08-02 19:57 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, Thomas. On 08/02/2010 08:52 PM, Thomas Gleixner wrote: >> Ooh, another reason is timer locality. If timers are shared per desc, >> they have much higher chance of being on the same processor. Global >> timers would be pretty bad in that respect. > > That's irrelevant. If you need to poll an interrupt, then it does not > matter at all whether you bounce some cache lines or not. > > In fact we have two cases: > > 1) An interrupt needs to be polled all the time. That sucks whether > the poll timer bounces a few cache lines or not. > > 2) Polling an irq for some time. Either it works again after a > while, so your suckage is restricted to the poll period. If not > see #1 Hmm... for spurious and watch the above are true and if it were the above two it would definitely make more sense to use per-purpose global timers. The problem is w/ expect tho. It's supposed to be used with normal hot paths, so expect/unexpect operations better be low overhead and local. I'll talk more about it in the other reply. > And it's even less of an issue as the main users of this misfeature > are laptops and desktop machines, where locality is not really that > important. If an enterprise admin decides to ignore the fact that the > hardware is flaky, then he does not care about the cache line bounces > either. These problems do happen on intel chipset machines and is something which can be worked around with some effort. Eh, let's talk on the other reply. Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 19:57 ` Tejun Heo @ 2010-08-03 10:06 ` Thomas Gleixner 2010-08-03 10:15 ` Tejun Heo 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2010-08-03 10:06 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH On Mon, 2 Aug 2010, Tejun Heo wrote: > > And it's even less of an issue as the main users of this misfeature > > are laptops and desktop machines, where locality is not really that > > important. If an enterprise admin decides to ignore the fact that the > > hardware is flaky, then he does not care about the cache line bounces > > either. > > These problems do happen on intel chipset machines and is something > which can be worked around with some effort. Eh, let's talk on the > other reply. So you're saying that the ATA problem is restricted to Intel chipsets? Do we know the root cause ? Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-03 10:06 ` Thomas Gleixner @ 2010-08-03 10:15 ` Tejun Heo 0 siblings, 0 replies; 18+ messages in thread From: Tejun Heo @ 2010-08-03 10:15 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, On 08/03/2010 12:06 PM, Thomas Gleixner wrote: >> These problems do happen on intel chipset machines and is something >> which can be worked around with some effort. Eh, let's talk on the >> other reply. > > So you're saying that the ATA problem is restricted to Intel chipsets? > Do we know the root cause ? The original sentence is missing an 'also'. Severals have been root caused and worked around. For SATA, one of the notable problems was misinterpretation of nIEN (interrupt block bit on the ATA device side) on both device and host sides. In traditional IDE API, the interrupt bit is primarily under the control of the device, so it's pretty difficult to guarantee reliable operation from driver side only. For PATA, the device actually has full control of the interrupt line, so it's much worse. Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 18:52 ` Thomas Gleixner 2010-08-02 19:57 ` Tejun Heo @ 2010-08-02 21:06 ` Tejun Heo 2010-08-02 21:51 ` Thomas Gleixner 1 sibling, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-08-02 21:06 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, Thomas. On 08/02/2010 08:52 PM, Thomas Gleixner wrote: >> Ooh, another reason is timer locality. If timers are shared per desc, >> they have much higher chance of being on the same processor. Global >> timers would be pretty bad in that respect. > > That's irrelevant. If you need to poll an interrupt, then it does not > matter at all whether you bounce some cache lines or not. > > In fact we have two cases: > > 1) An interrupt needs to be polled all the time. That sucks whether > the poll timer bounces a few cache lines or not. > > 2) Polling an irq for some time. Either it works again after a > while, so your suckage is restricted to the poll period. If not > see #1 Hmm... for spurious and watch the above are true and if it were the above two it would definitely make more sense to use per-purpose global timers. The problem is w/ expect tho. It's supposed to be used with normal hot paths, so expect/unexpect operations better be low overhead and local. I'll talk more about it in the other reply. > And it's even less of an issue as the main users of this misfeature > are laptops and desktop machines, where locality is not really that > important. If an enterprise admin decides to ignore the fact that the > hardware is flaky, then he does not care about the cache line bounces > either. These problems do happen on intel chipset machines and is something which can be worked around with some effort. Eh, let's talk on the other reply. Thanks. -- tejun -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 21:06 ` Tejun Heo @ 2010-08-02 21:51 ` Thomas Gleixner 0 siblings, 0 replies; 18+ messages in thread From: Thomas Gleixner @ 2010-08-02 21:51 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH Tejun, On Mon, 2 Aug 2010, Tejun Heo wrote: > Hello, Thomas. > > On 08/02/2010 08:52 PM, Thomas Gleixner wrote: > >> Ooh, another reason is timer locality. If timers are shared per desc, > >> they have much higher chance of being on the same processor. Global > >> timers would be pretty bad in that respect. > > > > That's irrelevant. If you need to poll an interrupt, then it does not > > matter at all whether you bounce some cache lines or not. > > > > In fact we have two cases: > > > > 1) An interrupt needs to be polled all the time. That sucks whether > > the poll timer bounces a few cache lines or not. > > > > 2) Polling an irq for some time. Either it works again after a > > while, so your suckage is restricted to the poll period. If not > > see #1 > > Hmm... for spurious and watch the above are true and if it were the > above two it would definitely make more sense to use per-purpose > global timers. The problem is w/ expect tho. It's supposed to be > used with normal hot paths, so expect/unexpect operations better be > low overhead and local. I'll talk more about it in the other reply. No, it's not. You are just looking at it from the wrong perspective. The expect scenario is just a different form of watch. You worked around the problem by moving the timer to 3 seconds + slack if everything works as expected, but that's just sloppy. In fact you really want to kick that thing in when things go awry and take it away when it comes back to normal. When things go awry, then the cache line bouncing is the least of your worries, really. Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 15:28 ` Tejun Heo 2010-08-02 15:35 ` Tejun Heo @ 2010-08-02 17:10 ` Thomas Gleixner 2010-08-02 20:48 ` Tejun Heo 1 sibling, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2010-08-02 17:10 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH Tejun, On Mon, 2 Aug 2010, Tejun Heo wrote: > Hello, Thomas. > > On 08/02/2010 04:07 PM, Thomas Gleixner wrote: > > 1) irq polling > > Do you mean spurious irq polling here? Yep > > - Get rid of the for_each_irq() loop in the poll timer. > > > > You can solve this by adding the interrupt to a linked list and > > let the poll timer run through the list. > > I suppose you're talking about using per-desc timer. I first thought > about using a single common timer too but decided against it because > timer events can be merged from timer code (depending on the interval > and slack) and I wanted to use different polling frequencies for > different cases. We can of course implement logic to put polled irqs > on a list in chronological order but that's basically redoing the work > timer code already does. They might be merged depending on interval and slack, but why do we need different frequencies at all ? > > Also why is the simple counter based decision not good enough? > > Why do we need an extra time period for this? > > Because in many cases IRQ storms are transient and spurious IRQ > polling worsens performance a lot, it's worthwhile to be able to > recover from such conditions, so the extra time period is there to > trigger reenabling of the offending IRQ to see whether the storm is > over now. Please note that many of this type of IRQ storms are > extremely obscure (for example, happens during resume from STR once in > a blue moon due to bad interaction between legacy ATA state machine in > the controller and the drive's certain behavior) and some are very > difficult to avoid from the driver in any reasonable way. If the IRQ has been marked as spurious, then switch to polling for a defined time period (simply count the number of poll timer runs for that irq). After that switch back to non polling and keep a flag which kicks the stupid thing back to poll after 10 spurious ones in a row. > > - Reenable the interrupt from time to time to check whether the irq > > storm subsided. > > Or maybe you were talking about something else? No, that stuff is so convoluted that I did not understand it. > > Generally a good idea, but we really do not want to wait for > > another 10k unhandled interrupts flooding the machine. Ideally we > > should read out the irq pending register from the irq chip to see > > whether the interrupt is still asserted. > > In simulated tests with hosed IRQ routing the behavior was already > quite acceptable, but yeah if peeking at interrupt state can be done > easily that would definitely be an improvement. It could be done on various irq chips, but not on all. So we need that real retry anyway; no need to add another function to irq_chip. > > 2) irq watch > > The whole thing is executed iff unlikely(desc->status & > IRQ_CHECK_WATCHES) which the processor will be predicting correctly > most of the time. Do you think it would make any noticeable > difference? It's not about the !WATCH case. I don't like the extra work for those watched ones. > > irq_watch should use a separate global timer and a linked list. The > > timer is started when an interrupt is added to the watch mechanism > > and stopped when the list becomes empty. > > > > That's a clear separation of watch and poll and simplifies the > > whole business a lot. > > The reason why poll_irq() looks complicated is because different types > of timers on the same desc shares the timer. The reason why it's > shared this way instead of across different descs of the same type is > to avoid locking overhead in maintaining the timer and linked lists of > its targets. By sharing the timer in the same desc, everything can be > protected by desc->lock but if we use timers across different descs, > we need another layer of locking. The locking overhead for a global timer + linked list is minimal. There is only one additional lock, the one which protects the list_head. So you take it when watch_irq() adds the watch, when the watch is removed and in the timer routine. That's zero overhead as lock contention is totaly unlikely. There is no need to fiddle with the timer interval. Watch the stupid thing for a while, if it behaves remove it from the list, if not you need to keep polling anyway. > > 3) irq expect > > > We definitely can implement such quick polling timeouts which check > for IRQ misdeliveries in drivers so that it can detect such events and > maybe add an interface in the IRQ polling code which the driver can > call to enable polling on the IRQ. But then again the pattern of such > failures and handling of them would be very similar across different > drivers and we already of most of polling machinary implemented in IRQ > code, so I think it makes sense to have a generic implementation which > drivers can use. Moreover, given the difference in test/review > coverage and general complexity of doing it right, I think it is far > better to do it in the core code and make it easy to use for drivers. I'm not opposed to have a generic solution for this, but I want a simple and understandable one. The irq_poll() works for everything magic is definitely not in this category. > > Btw, this polling is pretty bad in terms of power savings. > > > > - The timers are not aligned, so if there are several expects armed > > you get unbatched wakeups. > > In usual operation conditions, the timer interval will quickly become > 3 seconds w/ 1 sec slack. I doubt it can cause much problem. Right? And those 4 seconds are probably larger than the device timeouts in most cases. So what's the point of running that timer at all ? > > I assume your concern is to detect and deal with flaky interrupts, > > but avoiding the permanent poll when the device is not used, right? > > > > So that can be done simpler as well. One timer and a linked list > > again. > > > > expect_irq() adds the irq to the linked list and arms the timer, if > > it is not armed already. unexpect_irq() removes the irq from the > > linked list and deletes the timer when the list becomes empty. > > > > That can reuse the list_head in irq_desc which you need for irq > > poll and the timer interval should be fixed without dynamic > > adjustment. > > The main problem here is that expect/unexpect_irq() needs to be low > overhead so that drivers can easily call in w/o worrying too much > about performance overhead and I would definitely want to avoid > locking in fast paths. In general you only want to use this expect thing when you detected that the device has a problem. So why not use the watch mechanism for this ? When you detect the first timeout in the device driver, set the watch on that action and let it figure out whether it goes back to normal. If it goes back to normal the watch disappears. If it happens again, repeat. expect_irq() would be basically a NOP, but I like to keep it, as it can be used for power management decisions. unexpect_irq() would either add or remove the watch depending on the timeout argument. So there is only overhead when things went bad and when we go back to normal in all other cases it's zero. Though we need a reference to irqaction.watch of that device, but that's not a big deal. With that we can call unexpect_irq(watch, false/true); which would be an inline function: static inline void unexpect_irq(watch, timeout) { if (unlikely(watch->active ^ timeout)) __unexpect_irq(watch, timeout); } Simple, right ? > To me, the more interesting question is where to put the complexity. > In many cases, the most we can do is pushing complexity around, and if > we can make things easier for drivers by making the core IRQ code > somewhat more complex, I'll definitely go for that every time. No objections, but I doubt, that we need all the heuristics and complex code to deal with it. Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 17:10 ` Thomas Gleixner @ 2010-08-02 20:48 ` Tejun Heo 2010-08-02 22:28 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-08-02 20:48 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, again. On 08/02/2010 07:10 PM, Thomas Gleixner wrote: >> I suppose you're talking about using per-desc timer. I first thought >> about using a single common timer too but decided against it because >> timer events can be merged from timer code (depending on the interval >> and slack) and I wanted to use different polling frequencies for >> different cases. We can of course implement logic to put polled irqs >> on a list in chronological order but that's basically redoing the work >> timer code already does. > > They might be merged depending on interval and slack, but why do we > need different frequencies at all ? Yeah, these differences all come down to expecting. If it weren't for expecting, one timer to rule them all should have been enough. I'll continue below. >> Because in many cases IRQ storms are transient and spurious IRQ >> polling worsens performance a lot, it's worthwhile to be able to >> recover from such conditions, so the extra time period is there to >> trigger reenabling of the offending IRQ to see whether the storm is >> over now. Please note that many of this type of IRQ storms are >> extremely obscure (for example, happens during resume from STR once in >> a blue moon due to bad interaction between legacy ATA state machine in >> the controller and the drive's certain behavior) and some are very >> difficult to avoid from the driver in any reasonable way. > > If the IRQ has been marked as spurious, then switch to polling for a > defined time period (simply count the number of poll timer runs for > that irq). After that switch back to non polling and keep a flag which > kicks the stupid thing back to poll after 10 spurious ones in a row. The current logic isn't that different except that it always considers periods instead of consecutive ones and uses exponential backoff for re-enable timing. >> Or maybe you were talking about something else? > > No, that stuff is so convoluted that I did not understand it. Hmmm... there _are_ some complications but they're mainly how different mechanisms may interact with each other (ie. watching delayed while spurious polling is in effect kind of thing) and turning IRQs on/off (mostly due to locking), but the spurious irq polling part isn't exactly convoluted. Which part do you find so convoluted? I do agree it's somewhat complex and if you have different model in mind, it might not match your expectations. I tried to clean it up at least in its current structure but I could have been looking at it for a bit too long. >>> 2) irq watch >> >> The whole thing is executed iff unlikely(desc->status & >> IRQ_CHECK_WATCHES) which the processor will be predicting correctly >> most of the time. Do you think it would make any noticeable >> difference? > > It's not about the !WATCH case. I don't like the extra work for those > watched ones. WATCH shouldn't be used in normal paths because the polling code doesn't have enough information to reduce overhead. It can only be used in occassional events like IRQ enable, timeout detected by driver kind of events, so WATCH case overhead doesn't really matter. Its primary use case would be invoking it right after requesting IRQ as USB does. > The locking overhead for a global timer + linked list is minimal. > > There is only one additional lock, the one which protects the > list_head. So you take it when watch_irq() adds the watch, when the > watch is removed and in the timer routine. That's zero overhead as > lock contention is totaly unlikely. There is no need to fiddle with > the timer interval. Watch the stupid thing for a while, if it behaves > remove it from the list, if not you need to keep polling anyway. For watch only, I agree that global timer would work better. >>> 3) irq expect > I'm not opposed to have a generic solution for this, but I want a > simple and understandable one. The irq_poll() works for everything > magic is definitely not in this category. It's just multiplexing timer. If the biggest issue is convolution in irq_poll(), I can try to make it prettier for sure, but I guess that discussion is for later time. >> In usual operation conditions, the timer interval will quickly become >> 3 seconds w/ 1 sec slack. I doubt it can cause much problem. Right? > > And those 4 seconds are probably larger than the device timeouts in > most cases. So what's the point of running that timer at all ? Well, it's fairly common to have tens of secs for timeouts. Even going into minute range isn't too uncommon. >> The main problem here is that expect/unexpect_irq() needs to be low >> overhead so that drivers can easily call in w/o worrying too much >> about performance overhead and I would definitely want to avoid >> locking in fast paths. > > In general you only want to use this expect thing when you detected > that the device has a problem. So why not use the watch mechanism for > this ? No, the goal is to use it for normal cases, so that we can provide reasonable level of protection against occassional hiccups at fairly low overhead. Otherwise, it wouldn't be different at all from IRQ watching. > When you detect the first timeout in the device driver, set the watch > on that action and let it figure out whether it goes back to > normal. If it goes back to normal the watch disappears. If it happens > again, repeat. Yeah, that's how irq_watch is supposed to be used. irq_expect is for cases where drivers can provide better information about when to start and finish expecting interrupts. For these cases, we can provide protection against occassional IRQ hiccups too. A few secs of hiccup would be an order of magnitude better and will actually make such failures mostly bearable. > Though we need a reference to irqaction.watch of that device, but > that's not a big deal. With that we can call > > unexpect_irq(watch, false/true); > > which would be an inline function: > > static inline void unexpect_irq(watch, timeout) > { > if (unlikely(watch->active ^ timeout)) > __unexpect_irq(watch, timeout); > } > > Simple, right ? Well, now we're talking about different problems. If it were not for using it in normal cases, it would be better to just rip off expecting and use watching. >> To me, the more interesting question is where to put the complexity. >> In many cases, the most we can do is pushing complexity around, and if >> we can make things easier for drivers by making the core IRQ code >> somewhat more complex, I'll definitely go for that every time. > > No objections, but I doubt, that we need all the heuristics and > complex code to deal with it. Cool, I'm happy as long as you agree on that. So, here's where I'm coming from: There are two classes of ATA bugs which have been bothering me for quite some time now. Both are pretty cumbersome to solve from libata proper. The first is some ATAPI devices locking up because we use different probing sequence than windows and other media presence polling related issues. I think we'll need in-kernel media presence detection and with cmwq it shouldn't be too hard to implement. The other, as you might have expected, is these frigging IRQ issues. There are several different patterns of failures. One is misrouting or other persistent delivery failure. irq_watch alone should be able to solve most part of this. Another common one is stuck IRQ line. Cases where this condition is sporadic and transient aren't too rare, so the updates to spurious handling. The last is occassional delivery failures which unnecessarily lead to full blown command timeouts. This is what irq_expect tries to work around. After spending some time w/ ATA, what I've learned is that shit happens no matter what and os/driver better be ready to handle them and, for a lot of cases, the driver has enough information to be able to work around and survive most IRQ problems. I don't think that IRQ expect/unexpect is pushing it too far. It sure adds some level of complexity but I don't think it is an inherently complex thing - it could be just that my code sucks. Anyways, so there are two things to discuss here, I guess. First, irq_expect itself and second the implementation deatils. If you can think of something which is simpler and achieves about the same thing, it would great. Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 20:48 ` Tejun Heo @ 2010-08-02 22:28 ` Thomas Gleixner 2010-08-03 8:49 ` Tejun Heo 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2010-08-02 22:28 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH Tejun, On Mon, 2 Aug 2010, Tejun Heo wrote: > >> Because in many cases IRQ storms are transient and spurious IRQ > >> polling worsens performance a lot, it's worthwhile to be able to > >> recover from such conditions, so the extra time period is there to > >> trigger reenabling of the offending IRQ to see whether the storm is > >> over now. Please note that many of this type of IRQ storms are > >> extremely obscure (for example, happens during resume from STR once in > >> a blue moon due to bad interaction between legacy ATA state machine in > >> the controller and the drive's certain behavior) and some are very > >> difficult to avoid from the driver in any reasonable way. > > > > If the IRQ has been marked as spurious, then switch to polling for a > > defined time period (simply count the number of poll timer runs for > > that irq). After that switch back to non polling and keep a flag which > > kicks the stupid thing back to poll after 10 spurious ones in a row. > > The current logic isn't that different except that it always considers > periods instead of consecutive ones and uses exponential backoff for > re-enable timing. There is no need for that exponential backoff. Either the thing works or it does not. Where is the fcking point? If you switch the thing to polling mode, then it does not matter at all whether you try once a second, once a minute or once an hour to restore the thing. It only matters when you insist on collecting another 10k spurious ones before deciding another time that the thing is hosed. > >> Or maybe you were talking about something else? > > > > No, that stuff is so convoluted that I did not understand it. > > Hmmm... there _are_ some complications but they're mainly how > different mechanisms may interact with each other (ie. watching > delayed while spurious polling is in effect kind of thing) and turning > IRQs on/off (mostly due to locking), but the spurious irq polling part > isn't exactly convoluted. Which part do you find so convoluted? I do > agree it's somewhat complex and if you have different model in mind, > it might not match your expectations. I tried to clean it up at least > in its current structure but I could have been looking at it for a bit > too long. My main concern is that you are trying to tie everything into one function, which is always a clear sign for crap. > >>> 2) irq watch > >> > >> The whole thing is executed iff unlikely(desc->status & > >> IRQ_CHECK_WATCHES) which the processor will be predicting correctly > >> most of the time. Do you think it would make any noticeable > >> difference? > > > > It's not about the !WATCH case. I don't like the extra work for those > > watched ones. > > WATCH shouldn't be used in normal paths because the polling code > doesn't have enough information to reduce overhead. It can only be > used in occassional events like IRQ enable, timeout detected by driver > kind of events, so WATCH case overhead doesn't really matter. Its > primary use case would be invoking it right after requesting IRQ as > USB does. That's nonsense. You add the watch in the first place to figure out whether that IRQ is reliably delivered or not. If it is not, then you add the whole magic to the irq disabled hotpath forever. That's not a problem for that particular IRQ, it's a problem for the overall system. > >>> 3) irq expect > > I'm not opposed to have a generic solution for this, but I want a > > simple and understandable one. The irq_poll() works for everything > > magic is definitely not in this category. > > It's just multiplexing timer. If the biggest issue is convolution in > irq_poll(), I can try to make it prettier for sure, but I guess that > discussion is for later time. It's _NOT_. You _CANNOT_ make it prettier because your "design" is sucky by "multiplexing the timer". The multiplexing is the main issue and there is no way to make this less convoluted as hard as you might try. > >> In usual operation conditions, the timer interval will quickly become > >> 3 seconds w/ 1 sec slack. I doubt it can cause much problem. Right? > > > > And those 4 seconds are probably larger than the device timeouts in > > most cases. So what's the point of running that timer at all ? > > Well, it's fairly common to have tens of secs for timeouts. Even > going into minute range isn't too uncommon. No. You just look at it from ATA or whatever, but there are tons of drivers which have timeouts below 100ms. You _CANNOT_ generalize that assumption. And if you want this for ATA where it possibly fits, then you just make your whole argument of generalizing the approach moot. > >> The main problem here is that expect/unexpect_irq() needs to be low > >> overhead so that drivers can easily call in w/o worrying too much > >> about performance overhead and I would definitely want to avoid > >> locking in fast paths. > > > > In general you only want to use this expect thing when you detected > > that the device has a problem. So why not use the watch mechanism for > > this ? > > No, the goal is to use it for normal cases, so that we can provide > reasonable level of protection against occassional hiccups at fairly > low overhead. Otherwise, it wouldn't be different at all from IRQ > watching. WTF ? Once you switched to IRQ_EXP_VERIFIED then you run with 3sec + slack timeout. You only switch back when a timeout in the device driver happens. That's at least how I understand the code. Correct me if I'm wrong, but then you just deliver another proof for complete non understandble horror. > > When you detect the first timeout in the device driver, set the watch > > on that action and let it figure out whether it goes back to > > normal. If it goes back to normal the watch disappears. If it happens > > again, repeat. > > Yeah, that's how irq_watch is supposed to be used. irq_expect is for > cases where drivers can provide better information about when to start > and finish expecting interrupts. For these cases, we can provide > protection against occassional IRQ hiccups too. A few secs of hiccup > would be an order of magnitude better and will actually make such > failures mostly bearable. Err. That's complete nonsense. The device driver has a timeout for the occasional hickup. If it happens you want to watch out for the device to settle back to normal. > > Though we need a reference to irqaction.watch of that device, but > > that's not a big deal. With that we can call > > > > unexpect_irq(watch, false/true); > > > > which would be an inline function: > > > > static inline void unexpect_irq(watch, timeout) > > { > > if (unlikely(watch->active ^ timeout)) > > __unexpect_irq(watch, timeout); > > } > > > > Simple, right ? > > Well, now we're talking about different problems. If it were not for > using it in normal cases, it would be better to just rip off expecting > and use watching. Oh man. You seem to be lost in some different universe. The normal case - and that's what it is according to your code is running the expect thing at low frequency (>= 3 seconds) and just swicthes back to quick polling when the driver code detected an error. So how is that fcking different from my approach ? > >> To me, the more interesting question is where to put the complexity. > >> In many cases, the most we can do is pushing complexity around, and if > >> we can make things easier for drivers by making the core IRQ code > >> somewhat more complex, I'll definitely go for that every time. > > > > No objections, but I doubt, that we need all the heuristics and > > complex code to deal with it. > > Cool, I'm happy as long as you agree on that. So, here's where I'm > coming from: There are two classes of ATA bugs which have been > bothering me for quite some time now. Both are pretty cumbersome to > solve from libata proper. > > The first is some ATAPI devices locking up because we use different > probing sequence than windows and other media presence polling related > issues. I think we'll need in-kernel media presence detection and > with cmwq it shouldn't be too hard to implement. That's an unrelated issue. > The other, as you might have expected, is these frigging IRQ issues. > There are several different patterns of failures. One is misrouting > or other persistent delivery failure. irq_watch alone should be able > to solve most part of this. Another common one is stuck IRQ line. > Cases where this condition is sporadic and transient aren't too rare, > so the updates to spurious handling. The last is occassional delivery > failures which unnecessarily lead to full blown command timeouts. > This is what irq_expect tries to work around. As I said, you can cope with transient failures with the modified watch scheme as well. > After spending some time w/ ATA, what I've learned is that shit > happens no matter what and os/driver better be ready to handle them > and, for a lot of cases, the driver has enough information to be able > to work around and survive most IRQ problems. > > I don't think that IRQ expect/unexpect is pushing it too far. It sure > adds some level of complexity but I don't think it is an inherently > complex thing - it could be just that my code sucks. Anyways, so > there are two things to discuss here, I guess. First, irq_expect > itself and second the implementation deatils. If you can think of > something which is simpler and achieves about the same thing, it would > great. I pointed out the simple and acceptable solution already. Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-02 22:28 ` Thomas Gleixner @ 2010-08-03 8:49 ` Tejun Heo 2010-08-03 11:43 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2010-08-03 8:49 UTC (permalink / raw) To: Thomas Gleixner; +Cc: lkml, Jeff Garzik, Greg KH Hello, Thomas. On 08/03/2010 12:28 AM, Thomas Gleixner wrote: >> The current logic isn't that different except that it always considers >> periods instead of consecutive ones and uses exponential backoff for >> re-enable timing. > > There is no need for that exponential backoff. Either the thing works > or it does not. Where is the fcking point? If you switch the thing to > polling mode, then it does not matter at all whether you try once a > second, once a minute or once an hour to restore the thing. It only > matters when you insist on collecting another 10k spurious ones before > deciding another time that the thing is hosed. The intention was to use the same detection code for retrials as the initial detection. But it sure can be done that way too. >> WATCH shouldn't be used in normal paths because the polling code >> doesn't have enough information to reduce overhead. It can only be >> used in occassional events like IRQ enable, timeout detected by driver >> kind of events, so WATCH case overhead doesn't really matter. Its >> primary use case would be invoking it right after requesting IRQ as >> USB does. > > That's nonsense. You add the watch in the first place to figure out > whether that IRQ is reliably delivered or not. If it is not, then you > add the whole magic to the irq disabled hotpath forever. That's not a > problem for that particular IRQ, it's a problem for the overall > system. For the overall system? The thing is enabled/disabled per IRQ or are you talking about something else? >>>>> 3) irq expect >>> I'm not opposed to have a generic solution for this, but I want a >>> simple and understandable one. The irq_poll() works for everything >>> magic is definitely not in this category. >> >> It's just multiplexing timer. If the biggest issue is convolution in >> irq_poll(), I can try to make it prettier for sure, but I guess that >> discussion is for later time. > > It's _NOT_. You _CANNOT_ make it prettier because your "design" is sucky > by "multiplexing the timer". The multiplexing is the main issue and > there is no way to make this less convoluted as hard as you might try. Yeah, there's certain level of complexity w/ multiplexing timer. It would be nice to be able to do away with that. >> Well, it's fairly common to have tens of secs for timeouts. Even >> going into minute range isn't too uncommon. > > No. You just look at it from ATA or whatever, but there are tons of > drivers which have timeouts below 100ms. You _CANNOT_ generalize that > assumption. And if you want this for ATA where it possibly fits, then > you just make your whole argument of generalizing the approach moot. For most IO devices, having long timeouts is pretty common. irq_expect doesn't make sense for cases where the usual timeouts are pretty low. It's for cases where the device timeout is much longer. It probably is most useful for ATA where long timeouts and flaky IRQs are commonplace but it's generally useful for IO initiators. >> No, the goal is to use it for normal cases, so that we can provide >> reasonable level of protection against occassional hiccups at fairly >> low overhead. Otherwise, it wouldn't be different at all from IRQ >> watching. > > WTF ? Once you switched to IRQ_EXP_VERIFIED then you run with 3sec + > slack timeout. You only switch back when a timeout in the device > driver happens. That's at least how I understand the code. Correct > me if I'm wrong, No, it also switches on when 3sec timeout polls successfully. The IRQ_IN_POLLING test in unexpect_irq(). > but then you just deliver another proof for complete non > understandble horror. I understand the concern about and partially agree with the ugliness of multiplexing the timer but, come on, you misunderstanding that test can't be all my fault. There is even a comment saying /* succesful completion from IRQ? */ on top of that test. >> Yeah, that's how irq_watch is supposed to be used. irq_expect is for >> cases where drivers can provide better information about when to start >> and finish expecting interrupts. For these cases, we can provide >> protection against occassional IRQ hiccups too. A few secs of hiccup >> would be an order of magnitude better and will actually make such >> failures mostly bearable. > > Err. That's complete nonsense. The device driver has a timeout for the > occasional hickup. If it happens you want to watch out for the device > to settle back to normal. In all or most IO drivers, the timeout only kicks in after tens of seconds to basically clean things up and retry. irq_expect is a generic mechanism to detect and handle IRQ delivery hiccups which is a specific subclass of the different timeouts. >> Well, now we're talking about different problems. If it were not for >> using it in normal cases, it would be better to just rip off expecting >> and use watching. > > Oh man. You seem to be lost in some different universe. The normal > case - and that's what it is according to your code is running the > expect thing at low frequency (>= 3 seconds) and just swicthes back to > quick polling when the driver code detected an error. So how is that > fcking different from my approach ? Gees, enough with fucks already. As I wrote above, shorter interval polling kicks in when an IRQ event is delivered through polling. Why would there be irq_expect at all otherwise? It would be basically identical to irq_watch. I agree that timer multiplexing is a rather ugly thing and it would be great to remove it. You're right that it doesn't make whole lot of difference whether the timer is global or local if it's low frequency and in fast paths expect/unexpect would be able to just test its list entry and set/clear currently expecting status without messing with the global timer or lock. Then, we can have a single low freq timer for expect/unexpect and the other for actual polling. How does that sound to you? Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL tip/genirq] Please pull from lost-spurious-irq 2010-08-03 8:49 ` Tejun Heo @ 2010-08-03 11:43 ` Thomas Gleixner 0 siblings, 0 replies; 18+ messages in thread From: Thomas Gleixner @ 2010-08-03 11:43 UTC (permalink / raw) To: Tejun Heo; +Cc: lkml, Jeff Garzik, Greg KH Tejun, On Tue, 3 Aug 2010, Tejun Heo wrote: > I agree that timer multiplexing is a rather ugly thing and it would be > great to remove it. You're right that it doesn't make whole lot of > difference whether the timer is global or local if it's low frequency > and in fast paths expect/unexpect would be able to just test its list > entry and set/clear currently expecting status without messing with > the global timer or lock. Then, we can have a single low freq timer > for expect/unexpect and the other for actual polling. And the third one for watch, right ? That would give us separate timer functions which each serve a particular purpose. When you go for it, can you please simplify all the heuristics? spurious poll: One fixed poll interval is enough. The retry logic can be made simple, just set it back to interrupt delivery once per minute and limit the storm to 10. watch: Get rid of the interrupt context work and do all the work in the timer. Use a fixed interval and either keep it forever or remove it. expect: Use a slow fixed interval and just do the expect/unexpect fast marking. A nice thing would be to have a counter of expect calls so we can switch off the timer when there is no activity within 10 seconds. > How does that sound to you? Way better. :) Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-08-03 11:43 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-07-28 13:42 [GIT PULL tip/genirq] Please pull from lost-spurious-irq Tejun Heo 2010-07-28 13:46 ` Tejun Heo 2010-07-29 8:44 ` Thomas Gleixner 2010-07-30 9:46 ` Tejun Heo 2010-08-02 14:07 ` Thomas Gleixner 2010-08-02 15:28 ` Tejun Heo 2010-08-02 15:35 ` Tejun Heo 2010-08-02 18:52 ` Thomas Gleixner 2010-08-02 19:57 ` Tejun Heo 2010-08-03 10:06 ` Thomas Gleixner 2010-08-03 10:15 ` Tejun Heo 2010-08-02 21:06 ` Tejun Heo 2010-08-02 21:51 ` Thomas Gleixner 2010-08-02 17:10 ` Thomas Gleixner 2010-08-02 20:48 ` Tejun Heo 2010-08-02 22:28 ` Thomas Gleixner 2010-08-03 8:49 ` Tejun Heo 2010-08-03 11:43 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox