* Re: question about softirqs
From: Chris Friesen @ 2009-05-12 0:43 UTC
To: David Miller; +Cc: linuxppc-dev, Ingo Molnar, paulus, netdev

This started out as a thread on the ppc list, but on the suggestion of
DaveM and Paul Mackerras I'm expanding the receiver list a bit.

Currently, if a softirq is raised in process context the
TIF_RESCHED_PENDING flag gets set and on return to userspace we run the
scheduler, expecting it to switch to ksoftirqd to handle the softirq
processing.

I think I see a possible problem with this.  Suppose I have a SCHED_FIFO
task spinning on recvmsg() with MSG_DONTWAIT set.  Under the scenario
above, schedule() would re-run the spinning task rather than ksoftirqd,
thus preventing any incoming packets from being sent up the stack until
we get a real hardware interrupt--which could be a whole jiffy if
interrupt mitigation is enabled in the net device.

DaveM pointed out that if we're doing transmits we're likely to hit
local_bh_enable(), which would process the softirq work.  However, I
think we may still have a problem in the above rx-only scenario--or is
it too contrived to matter?

Thanks,

Chris
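For anyone who wants to reproduce the rx-only scenario, the userspace
side might look roughly like the sketch below: a task that switches
itself to SCHED_FIFO and then busy-polls a socket with recvmsg() and
MSG_DONTWAIT, never sleeping. The helper names (become_fifo, poll_once,
spin_recv) and the spin limit are assumptions made for illustration;
nothing here is taken from the poster's actual application.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: move the calling thread to SCHED_FIFO.
 * Normally needs CAP_SYS_NICE (typically root).
 * Returns 0 on success, -1 with errno set on failure. */
int become_fifo(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* One non-blocking poll of the socket.
 * Returns the number of bytes received, 0 if nothing was queued
 * (EAGAIN/EWOULDBLOCK), or -1 on any other error.  A zero-length
 * datagram is indistinguishable from "nothing queued" in this sketch. */
ssize_t poll_once(int fd, void *buf, size_t len)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && len > 0);

    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}

/* Busy-poll until data arrives, an error occurs, or max_spins polls
 * have been made.  The loop never blocks, which is what keeps a
 * lower-priority ksoftirqd off the CPU in the scenario described. */
ssize_t spin_recv(int fd, void *buf, size_t len, unsigned long max_spins)
{
    unsigned long i;

    for (i = 0; i < max_spins; i++) {
        ssize_t n = poll_once(fd, buf, len);
        if (n != 0)
            return n;
    }
    return 0;
}
```

A real test would call become_fifo() first and then spin_recv() on a
network socket; the bounded spin count is only there so the sketch
cannot hang a machine by accident.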
* Re: question about softirqs
From: Ingo Molnar @ 2009-05-12 8:12 UTC
To: Chris Friesen, Peter Zijlstra, Thomas Gleixner, Steven Rostedt
Cc: David Miller, linuxppc-dev, paulus, netdev

* Chris Friesen <cfriesen@nortel.com> wrote:

> This started out as a thread on the ppc list, but on the
> suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> list a bit.
>
> Currently, if a softirq is raised in process context the
> TIF_RESCHED_PENDING flag gets set and on return to userspace we
> run the scheduler, expecting it to switch to ksoftirqd to handle
> the softirq processing.
>
> I think I see a possible problem with this.  Suppose I have a
> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> the scenario above, schedule() would re-run the spinning task
> rather than ksoftirqd, thus preventing any incoming packets from
> being sent up the stack until we get a real hardware
> interrupt--which could be a whole jiffy if interrupt mitigation is
> enabled in the net device.

TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
will occur.

> DaveM pointed out that if we're doing transmits we're likely to
> hit local_bh_enable(), which would process the softirq work.
> However, I think we may still have a problem in the above rx-only
> scenario--or is it too contrived to matter?

This could occur, and the problem is really that task priorities do
not extend across softirq work processing.

This could occur in ordinary SCHED_OTHER tasks as well, if the
softirq is bounced to ksoftirqd - which it only should be if there's
serious softirq overload - or, as you describe it above, if the
softirq is raised in process context:

    if (!in_interrupt())
        wakeup_softirqd();

that's not really clean. We should look into eliminating process
context use of raise_softirq_irqsoff(). Such code sequence:

    local_irq_save(flags);
    ...
    raise_softirq_irqsoff(nr);
    ...
    local_irq_restore(flags);

should be converted to something like:

    local_irq_save(flags);
    ...
    raise_softirq_irqsoff(nr);
    ...
    local_irq_restore(flags);
    recheck_softirqs();

If someone does not do proper local_bh_disable()/enable() sequences
for micro-optimization reasons, then push the check to after the
critical section - and don't cause extra reschedules by waking up
ksoftirqd. raise_softirq_irqsoff() will also be faster.

    Ingo
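The difference between the two sequences in the message above can be
illustrated with a small userspace toy model. This is only a sketch
under stated assumptions: the struct, the toy_* names and the body of
the recheck function are invented here, and recheck_softirqs() itself
is a proposal in this thread, not an existing kernel function. The
model tracks a pending bitmask per CPU and counts how often ksoftirqd
would be woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy softirq numbers; these do not match the kernel's enum. */
enum { TOY_NET_RX = 0, TOY_NET_TX = 1, TOY_NR_SOFTIRQS = 2 };

/* Toy per-CPU state (hypothetical, for illustration only). */
struct toy_cpu {
    unsigned int pending;                  /* one bit per raised softirq */
    bool in_interrupt;                     /* stand-in for in_interrupt() */
    unsigned int ksoftirqd_wakeups;        /* times ksoftirqd was woken */
    unsigned int handled[TOY_NR_SOFTIRQS]; /* times each softirq ran */
};

/* Behaviour described in the thread: mark the softirq pending and,
 * when raised outside interrupt context, wake ksoftirqd. */
void toy_raise_current(struct toy_cpu *cpu, unsigned int nr)
{
    assert(nr < TOY_NR_SOFTIRQS);
    cpu->pending |= 1u << nr;
    if (!cpu->in_interrupt)
        cpu->ksoftirqd_wakeups++;
}

/* Proposed variant: raising only marks the softirq pending. */
void toy_raise_proposed(struct toy_cpu *cpu, unsigned int nr)
{
    assert(nr < TOY_NR_SOFTIRQS);
    cpu->pending |= 1u << nr;
}

/* Assumed semantics for the proposed recheck: run whatever is pending
 * in the caller's own context, after the critical section has ended. */
void toy_recheck_softirqs(struct toy_cpu *cpu)
{
    unsigned int pending = cpu->pending;
    unsigned int nr;

    cpu->pending = 0;
    for (nr = 0; nr < TOY_NR_SOFTIRQS; nr++)
        if (pending & (1u << nr))
            cpu->handled[nr]++;
}
```

In the first variant the work stays pending until some other context
picks it up, so its latency depends on when ksoftirqd gets to run. In
the second it is handled by the raising task itself, with no wakeup.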
* Re: question about softirqs
From: Peter Zijlstra @ 2009-05-12 9:12 UTC
To: Ingo Molnar
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner, David Miller

On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>
> > This started out as a thread on the ppc list, but on the
> > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > list a bit.
> >
> > Currently, if a softirq is raised in process context the
> > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > run the scheduler, expecting it to switch to ksoftirqd to handle
> > the softirq processing.
> >
> > I think I see a possible problem with this.  Suppose I have a
> > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > the scenario above, schedule() would re-run the spinning task
> > rather than ksoftirqd, thus preventing any incoming packets from
> > being sent up the stack until we get a real hardware
> > interrupt--which could be a whole jiffy if interrupt mitigation is
> > enabled in the net device.
>
> TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> will occur.
>
> > DaveM pointed out that if we're doing transmits we're likely to
> > hit local_bh_enable(), which would process the softirq work.
> > However, I think we may still have a problem in the above rx-only
> > scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:
>
>     if (!in_interrupt())
>         wakeup_softirqd();
>
> that's not really clean. We should look into eliminating process
> context use of raise_softirq_irqsoff(). Such code sequence:
>
>     local_irq_save(flags);
>     ...
>     raise_softirq_irqsoff(nr);
>     ...
>     local_irq_restore(flags);
>
> should be converted to something like:
>
>     local_irq_save(flags);
>     ...
>     raise_softirq_irqsoff(nr);
>     ...
>     local_irq_restore(flags);
>     recheck_softirqs();
>
> If someone does not do proper local_bh_disable()/enable() sequences
> for micro-optimization reasons, then push the check to after the
> critical section - and don't cause extra reschedules by waking up
> ksoftirqd. raise_softirq_irqsoff() will also be faster.

Wouldn't the even better solution be to get rid of softirqs
altogether?

I see the recent work by Thomas to get threaded interrupts upstream as
a good first step towards that goal. Once the RX processing is moved to
a thread (or multiple threads), one can prioritize them in the regular
sys_sched_setscheduler() way, and it's obvious that a FIFO task above
the priority of the network tasks will have network starvation issues.
* Re: question about softirqs
From: Ingo Molnar @ 2009-05-12 9:23 UTC
To: Peter Zijlstra
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner, David Miller

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen <cfriesen@nortel.com> wrote:
> >
> > > This started out as a thread on the ppc list, but on the
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > > list a bit.
> > >
> > > Currently, if a softirq is raised in process context the
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > > run the scheduler, expecting it to switch to ksoftirqd to handle
> > > the softirq processing.
> > >
> > > I think I see a possible problem with this.  Suppose I have a
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > > the scenario above, schedule() would re-run the spinning task
> > > rather than ksoftirqd, thus preventing any incoming packets from
> > > being sent up the stack until we get a real hardware
> > > interrupt--which could be a whole jiffy if interrupt mitigation is
> > > enabled in the net device.
> >
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> > will occur.
> >
> > > DaveM pointed out that if we're doing transmits we're likely to
> > > hit local_bh_enable(), which would process the softirq work.
> > > However, I think we may still have a problem in the above rx-only
> > > scenario--or is it too contrived to matter?
> >
> > This could occur, and the problem is really that task priorities do
> > not extend across softirq work processing.
> >
> > This could occur in ordinary SCHED_OTHER tasks as well, if the
> > softirq is bounced to ksoftirqd - which it only should be if there's
> > serious softirq overload - or, as you describe it above, if the
> > softirq is raised in process context:
> >
> >     if (!in_interrupt())
> >         wakeup_softirqd();
> >
> > that's not really clean. We should look into eliminating process
> > context use of raise_softirq_irqsoff(). Such code sequence:
> >
> >     local_irq_save(flags);
> >     ...
> >     raise_softirq_irqsoff(nr);
> >     ...
> >     local_irq_restore(flags);
> >
> > should be converted to something like:
> >
> >     local_irq_save(flags);
> >     ...
> >     raise_softirq_irqsoff(nr);
> >     ...
> >     local_irq_restore(flags);
> >     recheck_softirqs();
> >
> > If someone does not do proper local_bh_disable()/enable() sequences
> > for micro-optimization reasons, then push the check to after the
> > critical section - and don't cause extra reschedules by waking up
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
>
> Wouldn't the even better solution be to get rid of softirqs
> altogether?
>
> I see the recent work by Thomas to get threaded interrupts
> upstream as a good first step towards that goal. Once the RX
> processing is moved to a thread (or multiple threads), one can
> prioritize them in the regular sys_sched_setscheduler() way, and
> it's obvious that a FIFO task above the priority of the network
> tasks will have network starvation issues.

Yeah, that would be "nice". A single IRQ thread plus the process
context(s) doing networking might perform well.

Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
sure about - it's extra context-switching cost.

Btw, i noticed that using scheduling for work (packet, etc.) flow
distribution standardizes and evens out the behavior of workloads.
Softirq scheduling is really quite random currently. We have a
random processing loop-limit in the core code and various batching
and work-limit controls at individual usage sites. We sometimes
piggyback to ksoftirqd. It's far easier to keep performance in check
when things are more predictable.

But this is not an easy endeavour, and performance regressions have
to be expected and addressed if they occur. There can be random
packet queuing details in networking drivers that just happen to
work fine now, and might work worse with a kernel thread in place.

So there has to be broad buy-in for the concept, and a concerted
effort to eliminate softirq processing and most of hardirq
processing by pushing those two elements into a single hardirq
thread (and the rest into process context).

Not for the faint hearted. Nor is it recommended to be done without
a good layer of asbestos.

    Ingo
* Re: question about softirqs
From: Peter Zijlstra @ 2009-05-12 9:32 UTC
To: Ingo Molnar
Cc: linuxppc-dev, netdev, Steven Rostedt, paulus, Thomas Gleixner, David Miller

On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.
>
> Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> sure about - it's extra context-switching cost.

Sure, that was implied by the getting rid of softirqs ;-), on -rt we
currently suffer this hardirq/softirq thread ping-pong, it sucks.
* Re: question about softirqs
From: Steven Rostedt @ 2009-05-12 12:20 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, David Miller, linuxppc-dev, paulus, netdev

On Tue, 12 May 2009, Peter Zijlstra wrote:

> On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> >
> > Yeah, that would be "nice". A single IRQ thread plus the process
> > context(s) doing networking might perform well.
> >
> > Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> > sure about - it's extra context-switching cost.
>
> Sure, that was implied by the getting rid of softirqs ;-), on -rt we
> currently suffer this hardirq/softirq thread ping-pong, it sucks.

I'm going to be playing around with bypassing the net-rx/tx with my
network drivers. I'm going to add threaded irqs for my network cards and
have the driver threads do the work to get through the tcp/ip stack.

I'll still keep the softirqs for other cards, but I want to see how fast
it speeds things up if I have the driver thread do it.

-- Steve
* Re: question about softirqs
From: David Miller @ 2009-05-13 4:45 UTC
To: rostedt; +Cc: a.p.zijlstra, linuxppc-dev, netdev, paulus, mingo, tglx

From: Steven Rostedt <rostedt@goodmis.org>
Date: Tue, 12 May 2009 08:20:51 -0400 (EDT)

> I'm going to be playing around with bypassing the net-rx/tx with my
> network drivers. I'm going to add threaded irqs for my network cards and
> have the driver threads do the work to get through the tcp/ip stack.
>
> I'll still keep the softirqs for other cards, but I want to see how fast
> it speeds things up if I have the driver thread do it.

I think your latency is going to be dreadful.
* Re: question about softirqs
From: David Miller @ 2009-05-13 4:44 UTC
To: mingo; +Cc: a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, paulus, netdev

From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 12 May 2009 11:23:48 +0200

>> Wouldn't the even better solution be to get rid of softirqs
>> altogether?
>>
>> I see the recent work by Thomas to get threaded interrupts
>> upstream as a good first step towards that goal. Once the RX
>> processing is moved to a thread (or multiple threads), one can
>> prioritize them in the regular sys_sched_setscheduler() way, and
>> it's obvious that a FIFO task above the priority of the network
>> tasks will have network starvation issues.
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.

Nice for -rt goals, but not for latency.

So we're going to regress in this area again?  I can't see how that's
so desirable, to be honest with you.

The fact that this discussion started about a task with a certain
priority not being able to make forward progress, even though it was
correctly coded, just because softirqs are being processed in a thread
context, should be a big red flag that this is a buggered up design.

I fully expected us to be, at this point, talking about putting the
pending softirq check back into the trap return path :-/
* Re: question about softirqs
From: Paul Mackerras @ 2009-05-13 5:15 UTC
To: David Miller
Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev

David Miller writes:

> I fully expected us to be, at this point, talking about putting the
> pending softirq check back into the trap return path :-/

Would that actually do any good, in the case where the system has
decided that ksoftirqd is handling soft irqs at the moment?

Paul.
* Re: question about softirqs
From: David Miller @ 2009-05-13 5:28 UTC
To: paulus; +Cc: mingo, a.p.zijlstra, cfriesen, tglx, rostedt, linuxppc-dev, netdev

From: Paul Mackerras <paulus@samba.org>
Date: Wed, 13 May 2009 15:15:34 +1000

> David Miller writes:
>
>> I fully expected us to be, at this point, talking about putting the
>> pending softirq check back into the trap return path :-/
>
> Would that actually do any good, in the case where the system has
> decided that ksoftirqd is handling soft irqs at the moment?

Even if ksoftirqd is running, we check and run pending softirqs from
trap return.

Sure, I imagine we could re-enter this "ksoftirq blocked by highprio
thread" situation if we get flooded every single time over and over
again.
* Re: question about softirqs
From: Evgeniy Polyakov @ 2009-05-13 5:55 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Chris Friesen, Thomas Gleixner, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

Hi.

On Tue, May 12, 2009 at 11:12:58AM +0200, Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> Wouldn't the even better solution be to get rid of softirqs
> altogether?

And move tasklets into some thread context? Only if we are ready to fix
7x rescheduling regressions compared to kernel threads (work queue,
actually).

At least that's how tasklets behaved compared to work queues 1.5 years
ago in the simplest and quite naive test, where the tasklet/work
rescheduled itself a number of times:
http://marc.info/?l=linux-crypto-vger&m=119462472517405&w=2

--
	Evgeniy Polyakov
* Re: question about softirqs
From: Chris Friesen @ 2009-05-12 15:18 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Thomas Gleixner, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:

>>I think I see a possible problem with this.  Suppose I have a
>>SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
>>the scenario above, schedule() would re-run the spinning task
>>rather than ksoftirqd, thus preventing any incoming packets from
>>being sent up the stack until we get a real hardware
>>interrupt--which could be a whole jiffy if interrupt mitigation is
>>enabled in the net device.

>>DaveM pointed out that if we're doing transmits we're likely to
>>hit local_bh_enable(), which would process the softirq work.
>>However, I think we may still have a problem in the above rx-only
>>scenario--or is it too contrived to matter?

> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:

One of the reasons I brought up this issue is that there is a lot of
documentation out there that says "softirqs will be processed on return
from a syscall".  The fact that it actually depends on the scheduler
parameters of the task issuing the syscall isn't ever mentioned.

In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
source still has the following:

    Whenever a system call is about to return to userspace, or a
    hardware interrupt handler exits, any 'software interrupts'
    which are marked pending (usually by hardware interrupts) are
    run (<filename>kernel/softirq.c</filename>).

If anyone is looking at changing this code, it might be good to ensure
that at least the kernel docs are updated.

Chris
* Re: question about softirqs
From: Andi Kleen @ 2009-05-13 8:34 UTC
To: Chris Friesen
Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

"Chris Friesen" <cfriesen@nortel.com> writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall".  The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever mentioned.

It's not mentioned because it is not currently. However some network
TCP RX processing can happen in process context, which gives you most
of the benefit anyways.

> In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
> source still has the following:
>
>     Whenever a system call is about to return to userspace, or a
>     hardware interrupt handler exits, any 'software interrupts'
>     which are marked pending (usually by hardware interrupts) are
>     run (<filename>kernel/softirq.c</filename>).
>
> If anyone is looking at changing this code, it might be good to ensure
> that at least the kernel docs are updated.

So far the code is not changed in mainline. There have been some
proposals only.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: question about softirqs
From: Chris Friesen @ 2009-05-13 13:23 UTC
To: Andi Kleen
Cc: Peter Zijlstra, netdev, Steven Rostedt, David Miller, linuxppc-dev, paulus, Ingo Molnar, Thomas Gleixner

Andi Kleen wrote:
> "Chris Friesen" <cfriesen@nortel.com> writes:
>
>>One of the reasons I brought up this issue is that there is a lot of
>>documentation out there that says "softirqs will be processed on return
>>from a syscall".  The fact that it actually depends on the scheduler
>>parameters of the task issuing the syscall isn't ever mentioned.

> It's not mentioned because it is not currently.

Paul Mackerras explained the current behaviour earlier in the thread
(when it was still on the ppc list).  His explanation agrees with my
exploration of the code.

"If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq.  So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage)."

If the current task is of higher priority, ksoftirqd doesn't get a
chance to run and we don't process softirqs on return from a syscall.

Chris
* Re: question about softirqs
From: Andi Kleen @ 2009-05-13 14:15 UTC
To: Chris Friesen
Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd

softirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.

In normal situations (not excessive time in softirq) they don't
do anything.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: question about softirqs
From: Thomas Gleixner @ 2009-05-13 14:17 UTC
To: Andi Kleen
Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Andi Kleen wrote:

> > "If a soft irq is raised in process context, raise_softirq() in
> > kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
>
> softirqd is only used when the softirq runs for too long or when
> there are no suitable irq exits for a long time.
>
> In normal situations (not excessive time in softirq) they don't
> do anything.

Err, no. Chris is completely correct:

    if (!in_interrupt())
        wakeup_softirqd();

We cannot rely on irqs coming in when the softirq is raised from
thread context. An irq_exit might be faster to process it than the
scheduler can schedule ksoftirqd in, but ksoftirqd is woken and runs
nevertheless. If it finds a softirq pending then it processes them in
its context and irq_exit calls to softirq are returning right away.

Thanks,

	tglx
* Re: question about softirqs
From: Andi Kleen @ 2009-05-13 14:24 UTC
To: Thomas Gleixner
Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

Thomas Gleixner <tglx@linutronix.de> writes:

> Err, no. Chris is completely correct:
>
>     if (!in_interrupt())
>         wakeup_softirqd();

Yes you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy.

You can check this by checking the accumulated CPU time on your
ksoftirqs. Mine are all 0 even on long running systems.

The reason Andrea originally added the softirqds was just that
if you have very softirq intensive workloads they would tie
up too much CPU time or not make enough progress with the default
"don't loop too often" heuristics.

> We cannot rely on irqs coming in when the softirq is raised from

You can't rely on it, but it happens in nearly all cases.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: question about softirqs
From: Eric Dumazet @ 2009-05-13 14:54 UTC
To: Andi Kleen
Cc: Thomas Gleixner, Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
>
>
>> Err, no. Chris is completely correct:
>>
>>     if (!in_interrupt())
>>         wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.
>
> You can check this by checking the accumulated CPU time on your
> ksoftirqs. Mine are all 0 even on long running systems.
>

Then it's a bug, Andi. It's quite easy to trigger ksoftirqd with a Gb
ethernet link.

commit f5f293a4e3d0a0c52cec31de6762c95050156516 corrected something
(making mpstat and top correctly display softirq on cpu stats), but
apparently we still have a problem to report correct time on processes,
particularly on ksoftirq/x

I have one machine SMP flooded by network frames, CPU0 handling all the
work, inside ksoftirq/0 (napi processing : almost no more hard
interrupts delivered)

Still, top or ps reports no more than 30% of cpu time used by ksoftirqd,
while this cpu only runs ksoftirqd/0 (100% in sirq), and has no idle
time.

$ ps -fp 4 ; mpstat -P 0 1 10 ; ps -fp 4
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:46 [ksoftirqd/0]
Linux 2.6.30-rc5-tip-01595-g6f75dad-dirty (svivoipvnx001)  05/13/2009  _i686_

04:45:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
04:45:02 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:03 PM    0    0.00    0.00    0.00    0.00    0.00   99.01    0.00    0.00    0.99
04:45:04 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:05 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:06 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:07 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:08 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:09 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:10 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
04:45:11 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
Average:       0    0.00    0.00    0.00    0.00    0.00   99.90    0.00    0.00    0.10
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:49 [ksoftirqd/0]

You can see here that the time consumed by ksoftirqd/0 during this
10 second time frame is *only* 3 seconds.

Therefore, we cannot trust ps, not with current kernel.

# cat /proc/4/stat ; sleep 10 ; cat /proc/4/stat
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15347 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15670 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0

> The reason Andrea originally added the softirqds was just that
> if you have very softirq intensive workloads they would tie
> up too much CPU time or not make enough progress with the default
> "don't loop too often" heuristics.
>
>> We cannot rely on irqs coming in when the softirq is raised from
>
> You can't rely on it, but it happens in nearly all cases.
>
> -Andi
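The "3 seconds" figure can be checked directly from the two
/proc/4/stat samples in the message above. Fields 14 and 15 of
/proc/[pid]/stat are utime and stime, counted in clock ticks. The
sketch below is a small C helper that pulls those two fields out of a
stat line; the function names are made up for this illustration, and
the conversion assumes 100 ticks per second, the usual value of
sysconf(_SC_CLK_TCK) on Linux, which the message does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract utime (field 14) and stime (field 15) from one line of
 * /proc/[pid]/stat.  Parsing starts after the last ')' because the
 * comm field may itself contain spaces or parentheses.
 * Returns 0 on success, -1 if the line cannot be parsed. */
int parse_stat_times(const char *line, unsigned long *utime,
                     unsigned long *stime)
{
    const char *p;
    int n;

    assert(line != NULL && utime != NULL && stime != NULL);

    p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* Skipped: state, ppid, pgrp, session, tty_nr, tpgid, flags,
     * minflt, cminflt, majflt, cmajflt.  Then utime and stime. */
    n = sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
               utime, stime);
    return n == 2 ? 0 : -1;
}

/* Convert clock ticks to seconds for a given tick rate. */
double ticks_to_seconds(unsigned long ticks, long ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return (double)ticks / (double)ticks_per_sec;
}
```

Applied to the two samples, stime goes from 15347 to 15670, a delta of
323 ticks. Under the 100 ticks per second assumption that is 3.23
seconds of accounted CPU time over the 10 second sleep, which matches
the 00:00:46 to 00:00:49 change shown by ps.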
* Re: question about softirqs
From: Andi Kleen @ 2009-05-13 15:02 UTC
To: Eric Dumazet
Cc: Andi Kleen, Thomas Gleixner, Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

> I have one machine SMP flooded by network frames, CPU0 handling all

Yes that's the case softirqd is supposed to handle. When you spend a
significant part of your CPU time in softirq context it kicks in to
provide somewhat fair additional CPU time.

But most systems (like mine) don't do that.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: question about softirqs
From: Chris Friesen @ 2009-05-13 15:05 UTC
To: Andi Kleen
Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt, linuxppc-dev, paulus, Thomas Gleixner, David Miller

Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:

>>Err, no. Chris is completely correct:
>>
>>    if (!in_interrupt())
>>        wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.

What about the scenario I raised earlier, where we have incoming network
packets, no hardware interrupts coming in other than the timer tick, and
a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
set?

As far as I can tell, in this scenario softirqs may not get processed on
return from a syscall (contradicting the documentation).  In the worst
case, they may not get processed until the next timer tick.

Chris
* Re: question about softirqs
  2009-05-13 15:05 ` Chris Friesen
@ 2009-05-13 15:54 ` Thomas Gleixner
  2009-05-13 16:10 ` Chris Friesen
  2009-05-13 17:01 ` Andi Kleen
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2009-05-13 15:54 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
      David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
>
> >>Err, no. Chris is completely correct:
> >>
> >>        if (!in_interrupt())
> >>                wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets, no hardware interrupts coming in other than the timer tick, and
> a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
> set?
>
> As far as I can tell, in this scenario softirqs may not get processed on
> return from a syscall (contradicting the documentation). In the worst
> case, they may not get processed until the next timer tick.

Right because your high prio tasks prevents that ksoftirqd runs,
because it can not preempt the high priority task.

Thanks,

        tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 15:54 ` Thomas Gleixner
@ 2009-05-13 16:10 ` Chris Friesen
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Friesen @ 2009-05-13 16:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, netdev, Steven Rostedt, linuxppc-dev,
      Andi Kleen, paulus, Ingo Molnar, David Miller

Thomas Gleixner wrote:
> On Wed, 13 May 2009, Chris Friesen wrote:

>> As far as I can tell, in this scenario softirqs may not get processed on
>> return from a syscall (contradicting the documentation). In the worst
>> case, they may not get processed until the next timer tick.
>
> Right because your high prio tasks prevents that ksoftirqd runs,
> because it can not preempt the high priority task.

Exactly. I'm suggesting that this point (the idea that softirqs may or
may not get processed on return from syscall depending on relative task
priority) should probably be documented somewhere, because the current
documentation (in the kernel and on the web) doesn't mention it at all.

Maybe I should just submit a patch to
Documentation/DocBook/kernel-hacking.tmpl.

Chris

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 15:05 ` Chris Friesen
  2009-05-13 15:54 ` Thomas Gleixner
@ 2009-05-13 17:01 ` Andi Kleen
  2009-05-13 19:04 ` Chris Friesen
  1 sibling, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-05-13 17:01 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
      linuxppc-dev, Andi Kleen, paulus, Thomas Gleixner, David Miller

On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
>
> >>Err, no. Chris is completely correct:
> >>
> >>        if (!in_interrupt())
> >>                wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets,

network packets are normally processed by the network packet interrupt's
softirq or alternatively in the NAPI poll loop.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 17:01 ` Andi Kleen
@ 2009-05-13 19:04 ` Chris Friesen
  2009-05-13 19:13 ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Friesen @ 2009-05-13 19:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
      David Miller, linuxppc-dev, paulus, netdev

Andi Kleen wrote:

> network packets are normally processed by the network packet interrupt's
> softirq or alternatively in the NAPI poll loop.

If we have a high priority task, ksoftirqd may not get a chance to run.

My point is simply that the documentation says that softirqs are
processed on return from a syscall, and this is not necessarily the case.

Chris

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 19:04 ` Chris Friesen
@ 2009-05-13 19:13 ` Andi Kleen
  2009-05-13 19:44 ` Chris Friesen
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-05-13 19:13 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
      Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
>
> If we have a high priority task, ksoftirqd may not get a chance to run.

In this case the next interrupt will also process them. It will just
go more slowly because interrupts limit the work compared to ksoftirqd.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 19:13 ` Andi Kleen
@ 2009-05-13 19:44 ` Chris Friesen
  2009-05-13 19:53 ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Friesen @ 2009-05-13 19:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, netdev, Ingo Molnar, Steven Rostedt,
      linuxppc-dev, paulus, Thomas Gleixner, David Miller

Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
>> Andi Kleen wrote:
>>
>>> network packets are normally processed by the network packet interrupt's
>>> softirq or alternatively in the NAPI poll loop.
>> If we have a high priority task, ksoftirqd may not get a chance to run.
>
> In this case the next interrupt will also process them. It will just
> go more slowly because interrupts limit the work compared to ksoftirqd.

I realize that they will eventually get processed. My point is that the
documentation (in-kernel, online, and in various books) says that
softirqs will be processed _on the return from a syscall_.
As we all agree, this is not necessarily the case.

Chris

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 19:44 ` Chris Friesen
@ 2009-05-13 19:53 ` Andi Kleen
  2009-05-13 20:55 ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-05-13 19:53 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
      Steven Rostedt, David Miller, linuxppc-dev, paulus, netdev

On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
> >>> softirq or alternatively in the NAPI poll loop.
> >> If we have a high priority task, ksoftirqd may not get a chance to run.
> >
> > In this case the next interrupt will also process them. It will just
> > go more slowly because interrupts limit the work compared to ksoftirqd.
>
> I realize that they will eventually get processed. My point is that the
> documentation (in-kernel, online, and in various books) says that
> softirqs will be processed _on the return from a syscall_.

They are. The documentation is correct.

What might not be all processed is all packets that are in the per CPU
backlog queue when the network softirq runs (for non NAPI, for NAPI
that's obsolete anyways). That's because there are limits.

Or when new work comes in in parallel it doesn't process it all.

But that's always the case -- no queue is infinite, so you have always
situations where it can drop or delay items.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: question about softirqs
  2009-05-13 19:53 ` Andi Kleen
@ 2009-05-13 20:55 ` Thomas Gleixner
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2009-05-13 20:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chris Friesen, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
      David Miller, linuxppc-dev, paulus, netdev

On Wed, 13 May 2009, Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> > Andi Kleen wrote:
> > > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> > >> Andi Kleen wrote:
> > >>
> > >>> network packets are normally processed by the network packet interrupt's
> > >>> softirq or alternatively in the NAPI poll loop.
> > >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > >
> > > In this case the next interrupt will also process them. It will just
> > > go more slowly because interrupts limit the work compared to ksoftirqd.
> >
> > I realize that they will eventually get processed. My point is that the
> > documentation (in-kernel, online, and in various books) says that
> > softirqs will be processed _on the return from a syscall_.
>
> They are. The documentation is correct.

No, the documentation is wrong for the case that the task, which
raised the softirq and therefore woke up ksoftirqd, has a higher
priority than ksoftirqd. In that case the kernel does _NOT_ schedule
ksoftirqd in the return from syscall path.

And that's all Chris is pointing out.

Thanks,

        tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2009-05-13 20:56 UTC | newest]
Thread overview: 28+ messages
[not found] <18948.63755.279732.294842@cargo.ozlabs.ibm.com>
[not found] ` <20090508.234815.127227651.davem@davemloft.net>
[not found] ` <4A086DB2.8040703@nortel.com>
[not found] ` <20090511.162436.193717082.davem@davemloft.net>
2009-05-12 0:43 ` question about softirqs Chris Friesen
2009-05-12 8:12 ` Ingo Molnar
2009-05-12 9:12 ` Peter Zijlstra
2009-05-12 9:23 ` Ingo Molnar
2009-05-12 9:32 ` Peter Zijlstra
2009-05-12 12:20 ` Steven Rostedt
2009-05-13 4:45 ` David Miller
2009-05-13 4:44 ` David Miller
2009-05-13 5:15 ` Paul Mackerras
2009-05-13 5:28 ` David Miller
2009-05-13 5:55 ` Evgeniy Polyakov
2009-05-12 15:18 ` Chris Friesen
2009-05-13 8:34 ` Andi Kleen
2009-05-13 13:23 ` Chris Friesen
2009-05-13 14:15 ` Andi Kleen
2009-05-13 14:17 ` Thomas Gleixner
2009-05-13 14:24 ` Andi Kleen
2009-05-13 14:54 ` Eric Dumazet
2009-05-13 15:02 ` Andi Kleen
2009-05-13 15:05 ` Chris Friesen
2009-05-13 15:54 ` Thomas Gleixner
2009-05-13 16:10 ` Chris Friesen
2009-05-13 17:01 ` Andi Kleen
2009-05-13 19:04 ` Chris Friesen
2009-05-13 19:13 ` Andi Kleen
2009-05-13 19:44 ` Chris Friesen
2009-05-13 19:53 ` Andi Kleen
2009-05-13 20:55 ` Thomas Gleixner