* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Frederic Weisbecker @ 2013-12-11 13:22 UTC
To: Viresh Kumar
Cc: Kevin Hilman, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

On Tue, Dec 03, 2013 at 01:57:37PM +0530, Viresh Kumar wrote:
> Hi Frederic/Kevin,
>
> I was doing some work where I was required to use NO_HZ_FULL
> on core 1 on a dual core ARM machine.
>
> I observed that I was able to isolate the second core using cpusets
> but whenever the tick occurs, it occurs twice. i.e. Timer count
> gets updated by two every time my core is disturbed.
>
> I tried to trace it (output attached) and found this sequence (Talking
> only about core 1 here):
> - Single task was running on Core 1 (using cpusets)
> - got an arch_timer interrupt
> - started servicing vmstat stuff
> - so came out of NO_HZ_FULL domain as there is more than
>   one task on Core
> - queued work again and went to the existing single task (stress)
> - again got arch_timer interrupt after 5 ms (HZ=200)

Right, looking at the details, the 2nd interrupt is caused by workqueue delayed
work bdi writeback.

> - got "tick_stop" event and went into NO_HZ_FULL domain again..
> - Got isolated again for long duration..
>
> So the query is: why don't we check that at the end of servicing vmstat
> stuff and migrating back to "stress" ??

I fear I don't understand your question. Do you mean why don't we prevent from
that bdi writeback work to run when we are in full dynticks mode?

We can't just ignore workqueues and timers callback when they are scheduled
otherwise the kernel is going to behave randomly.

OTOH what we can do is to work on these per cpu workqueues and timers and do
what's necessary to avoid them to fire, as explained in detail there
Documentation/kernel-per-CPU-kthreads.txt

There is also the problem of unbound workqueues for which we don't have a
solution yet. But the idea is that we could tweak their affinity from sysfs.

> Thanks.
>
> --
> viresh

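The report quoted above says the timer count on the isolated core goes up by two each time the core is disturbed. A minimal sketch of how that count could be sampled from user space follows. It assumes the per-CPU counters in /proc/interrupts and an interrupt row whose description contains `arch_timer` (the name used in the report); the helper name `irq_count` is made up for illustration.

```shell
#!/bin/sh
# irq_count NAME CPU [FILE]
# Print the interrupt count for CPU on every row of FILE (default
# /proc/interrupts) whose text contains NAME. Field 1 of such a row is the
# IRQ label ("29:"), so the counter for CPU n is field n + 2.
irq_count() {
    awk -v name="$1" -v col="$(( $2 + 2 ))" '
        index($0, name) { sum += $col }
        END { print sum + 0 }
    ' "${3:-/proc/interrupts}"
}
```

Sampling before and after an interval, for example `a=$(irq_count arch_timer 1); sleep 10; b=$(irq_count arch_timer 1); echo $((b - a))`, would show how many timer interrupts CPU1 took in that window.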
* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Tejun Heo @ 2013-12-11 21:14 UTC
To: Frederic Weisbecker
Cc: Viresh Kumar, Kevin Hilman, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, bsd, laijs

Hey, guys.

On Wed, Dec 11, 2013 at 02:22:14PM +0100, Frederic Weisbecker wrote:
> I fear I don't understand your question. Do you mean why don't we prevent from
> that bdi writeback work to run when we are in full dynticks mode?
>
> We can't just ignore workqueues and timers callback when they are scheduled
> otherwise the kernel is going to behave randomly.
>
> OTOH what we can do is to work on these per cpu workqueues and timers and do
> what's necessary to avoid them to fire, as explained in detail there
> Documentation/kernel-per-CPU-kthreads.txt

Hmmm... some per-cpu workqueues can be turned into unbound ones and
the writeback is one of those. Currently, this is used for
powersaving on mobile but could also be useful for jitter control. In
the long term, it could be beneficial to strictly distinguish the
workqueues which really need per-cpu behavior and the ones which are
per-cpu just for optimization.

> There is also the problem of unbound workqueues for which we don't
> have a solution yet. But the idea is that we could tweak their
> affinity from sysfs.

Yes, this is a long term todo item but I'm currently a bit too swamped
to tackle it myself. cc'ing Lai, who has pretty good knowledge of
workqueue internals, and Bandan, who seemed interested in working on
implementing default attrs.

Thanks.

--
tejun

* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Kevin Hilman @ 2013-12-13 0:32 UTC
To: Tejun Heo
Cc: Frederic Weisbecker, Viresh Kumar, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, bsd, laijs

Tejun Heo <tj@kernel.org> writes:

> Hey, guys.
>
> On Wed, Dec 11, 2013 at 02:22:14PM +0100, Frederic Weisbecker wrote:
>> I fear I don't understand your question. Do you mean why don't we prevent from
>> that bdi writeback work to run when we are in full dynticks mode?
>>
>> We can't just ignore workqueues and timers callback when they are scheduled
>> otherwise the kernel is going to behave randomly.
>>
>> OTOH what we can do is to work on these per cpu workqueues and timers and do
>> what's necessary to avoid them to fire, as explained in detail there
>> Documentation/kernel-per-CPU-kthreads.txt
>
> Hmmm... some per-cpu workqueues can be turned into unbound ones and
> the writeback is one of those.

Ah, looks like the writeback one is already unbound, and configurable
from sysfs.

Viresh, add this to your test script, and it should get this workqueue
out of the way:

   # pin the writeback workqueue to CPU0
   echo 1 > /sys/bus/workqueue/devices/writeback/cpumask

Kevin

> Currently, this is used for
> powersaving on mobile but could also be useful for jitter control. In
> the long term, it could be beneficial to strictly distinguish the
> workqueues which really need per-cpu behavior and the ones which are
> per-cpu just for optimization.
>
>> There is also the problem of unbound workqueues for which we don't
>> have a solution yet. But the idea is that we could tweak their
>> affinity from sysfs.
>
> Yes, this is a long term todo item but I'm currently a bit too swamped
> to tackle it myself. cc'ing Lai, who has pretty good knowledge of
> workqueue internals, and Bandan, who seemed interested in working on
> implementing default attrs.
>
> Thanks.

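The value written to `cpumask` in the message above is a hexadecimal CPU bitmask, so `1` selects CPU0 only. The following is a small sketch, not part of the thread, that builds such a mask from a list of CPU numbers and writes it to every workqueue exposing a writable `cpumask` file. It assumes fewer than 63 CPUs so the mask fits in shell arithmetic; `cpus_to_mask` and `pin_workqueues` are hypothetical helper names, and only workqueues that are visible in sysfs appear under /sys/bus/workqueue/devices.

```shell
#!/bin/sh
# cpus_to_mask CPU...
# Print the hex bitmask with one bit set per listed CPU number.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# pin_workqueues MASK [DIR]
# Write MASK to each writable <workqueue>/cpumask file under DIR
# (default: /sys/bus/workqueue/devices). Needs root on a real system.
pin_workqueues() {
    mask=$1
    dir=${2:-/sys/bus/workqueue/devices}
    for f in "$dir"/*/cpumask; do
        [ -w "$f" ] || continue   # skip missing or read-only entries
        echo "$mask" > "$f"
    done
}
```

On the dual-core machine from the thread, `pin_workqueues "$(cpus_to_mask 0)"` would be the generalised form of the single `echo 1` line.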
* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2013-12-17 10:35 UTC
To: Frederic Weisbecker
Cc: Kevin Hilman, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

Sorry for the delay, was on holidays..

On 11 December 2013 18:52, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Dec 03, 2013 at 01:57:37PM +0530, Viresh Kumar wrote:
>> - again got arch_timer interrupt after 5 ms (HZ=200)
>
> Right, looking at the details, the 2nd interrupt is caused by workqueue delayed
> work bdi writeback.

I am not that great at reading traces or kernelshark output, but I
still feel I haven't seen anything wrong. And I wasn't talking about
the delayed workqueue here..

I am looking at the trace I attached with kernelshark after filtering
out CPU0 events:
- Event 41, timestamp: 159.891973
- it ends at event 56, timestamp: 159.892043

And after that the next event comes after 5 Seconds.

And so I was talking for the Event 41.

>> So the query is: why don't we check that at the end of servicing vmstat
>> stuff and migrating back to "stress" ??
>
> I fear I don't understand your question. Do you mean why don't we prevent from
> that bdi writeback work to run when we are in full dynticks mode?

No..

* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Kevin Hilman @ 2013-12-17 16:35 UTC
To: Viresh Kumar
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

Viresh Kumar <viresh.kumar@linaro.org> writes:

> Sorry for the delay, was on holidays..
>
> On 11 December 2013 18:52, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> On Tue, Dec 03, 2013 at 01:57:37PM +0530, Viresh Kumar wrote:
>>> - again got arch_timer interrupt after 5 ms (HZ=200)
>>
>> Right, looking at the details, the 2nd interrupt is caused by workqueue delayed
>> work bdi writeback.
>
> I am not that great at reading traces or kernelshark output, but I
> still feel I haven't
> seen anything wrong. And I wasn't talking about the delayed workqueue here..
>
> I am looking at the trace I attached with kernelshark after filtering
> out CPU0 events:
> - Event 41, timestamp: 159.891973
> - it ends at event 56, timestamp: 159.892043

For future reference, for generating email friendly trace output for
discussion like this, you can use something like:

   trace-cmd report --cpu=1 trace.dat

> And after that the next event comes after 5 Seconds.
>
> And so I was talking for the Event 41.

That first event (Event 41) is an interrupt, and comes from the
scheduler tick. The tick is happening because the writeback workqueue
just ran and we're not in NO_HZ mode.

However, as soon as that IRQ (and resulting softirqs) are finished, we
enter NO_HZ mode again. But as you mention, it only lasts for ~5 sec
when the timer fires again. Once again, it fires because of the
writeback workqueue, and soon thereafter it switches back to NO_HZ mode
again.

So the solution to avoid this jitter on the NO_HZ CPU is to set the
affinity of the writeback workqueue to CPU0:

   # pin the writeback workqueue to CPU0
   echo 1 > /sys/bus/workqueue/devices/writeback/cpumask

I suspect by doing that, you will no longer see the jitter.

Kevin

* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Frederic Weisbecker @ 2013-12-17 16:57 UTC
To: Kevin Hilman
Cc: Viresh Kumar, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

On Tue, Dec 17, 2013 at 08:35:39AM -0800, Kevin Hilman wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
> > Sorry for the delay, was on holidays..
> >
> > On 11 December 2013 18:52, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >> On Tue, Dec 03, 2013 at 01:57:37PM +0530, Viresh Kumar wrote:
> >>> - again got arch_timer interrupt after 5 ms (HZ=200)
> >>
> >> Right, looking at the details, the 2nd interrupt is caused by workqueue delayed
> >> work bdi writeback.
> >
> > I am not that great at reading traces or kernelshark output, but I
> > still feel I haven't
> > seen anything wrong. And I wasn't talking about the delayed workqueue here..
> >
> > I am looking at the trace I attached with kernelshark after filtering
> > out CPU0 events:
> > - Event 41, timestamp: 159.891973
> > - it ends at event 56, timestamp: 159.892043
>
> For future reference, for generating email friendly trace output for
> discussion like this, you can use something like:
>
>    trace-cmd report --cpu=1 trace.dat
>
> > And after that the next event comes after 5 Seconds.
> >
> > And so I was talking for the Event 41.
>
> That first event (Event 41) is an interrupt, and comes from the
> scheduler tick. The tick is happening because the writeback workqueue
> just ran and we're not in NO_HZ mode.
>
> However, as soon as that IRQ (and resulting softirqs) are finished, we
> enter NO_HZ mode again. But as you mention, it only lasts for ~5 sec
> when the timer fires again. Once again, it fires because of the
> writeback workqueue, and soon thereafter it switches back to NO_HZ mode
> again.
>
> So the solution to avoid this jitter on the NO_HZ CPU is to set the
> affinity of the writeback workqueue to CPU0:
>
>    # pin the writeback workqueue to CPU0
>    echo 1 > /sys/bus/workqueue/devices/writeback/cpumask

Very interesting trick, I'm going to add it to my dyntick-testing suite.

Thanks!

* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2013-12-18 4:38 UTC
To: Kevin Hilman
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

On 17 December 2013 22:05, Kevin Hilman <khilman@linaro.org> wrote:
> For future reference, for generating email friendly trace output for
> discussion like this, you can use something like:
>
>    trace-cmd report --cpu=1 trace.dat

Okay..

>> And after that the next event comes after 5 Seconds.
>>
>> And so I was talking for the Event 41.
>
> That first event (Event 41) is an interrupt, and comes from the
> scheduler tick. The tick is happening because the writeback workqueue
> just ran and we're not in NO_HZ mode.

This is what I was trying to ask. Why can't we enter NO_HZ_FULL mode
as soon as the writeback workqueue has run? That way we can go into NOHZ
mode earlier..

> However, as soon as that IRQ (and resulting softirqs) are finished, we
> enter NO_HZ mode again. But as you mention, it only lasts for ~5 sec
> when the timer fires again. Once again, it fires because of the
> writeback workqueue, and soon thereafter it switches back to NO_HZ mode
> again.

That's fine.. It wasn't part of my query :) ..
But yes your trick would be useful for my usecase :)

* Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Kevin Hilman @ 2013-12-18 13:51 UTC
To: Viresh Kumar
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

Viresh Kumar <viresh.kumar@linaro.org> writes:

> On 17 December 2013 22:05, Kevin Hilman <khilman@linaro.org> wrote:
>> For future reference, for generating email friendly trace output for
>> discussion like this, you can use something like:
>>
>>    trace-cmd report --cpu=1 trace.dat
>
> Okay..
>
>>> And after that the next event comes after 5 Seconds.
>>>
>>> And so I was talking for the Event 41.
>>
>> That first event (Event 41) is an interrupt, and comes from the
>> scheduler tick. The tick is happening because the writeback workqueue
>> just ran and we're not in NO_HZ mode.
>
> This is what I was trying to ask. Why can't we enter in NO_HZ_FULL mode
> as soon as writeback workqueue just ran? That way we can go into NOHZ
> mode earlier..

Ah, I see. So you're basically asking why we can't evaluate whether to
turn off the tick more often, for example right after the workqueues are
done. I suppose Frederic may have some views on that, but there's
likely additional overhead from those checks as well as that workqueues
may not be the only thing keeping us out of NO_HZ.

Kevin

* Re: [LNG] Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2013-12-18 14:33 UTC
To: Kevin Hilman
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List, Tejun Heo

On 18 December 2013 19:21, Kevin Hilman <khilman@linaro.org> wrote:
> Ah, I see. So you're basically asking why we can't evaluate whether to
> turn off the tick more often, for example right after the workqueues are
> done. I suppose Frederic may have some views on that, but there's
> likely additional overhead from those checks as well as that workqueues
> may not be the only thing keeping us out of NO_HZ.

I see that sched_switch is called at the end most of the time, so a check
there might be useful?

* Re: [LNG] Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2013-12-23 8:18 UTC
To: Kevin Hilman, Ingo Molnar, Peter Zijlstra, Tejun Heo
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List

Adding Ingo/Peter..

On 18 December 2013 20:03, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 18 December 2013 19:21, Kevin Hilman <khilman@linaro.org> wrote:
>> Ah, I see. So you're basically asking why we can't evaluate whether to
>> turn off the tick more often, for example right after the workqueues are
>> done. I suppose Frederic may have some views on that, but there's
>> likely additional overhead from those checks as well as that workqueues
>> may not be the only thing keeping us out of NO_HZ.
>
> I see that sched_switch is called at the end most of the times so an check
> there might be useful ?

Wrong time, probably many people on vacation now. But I am working, so
will continue reporting my problems, in case somebody is around :)

My usecase: I am working on making ARM better for Networking servers. In our
usecase we need to isolate a few of the cores in our SoC, so that they run a
single user space task per CPU. And userspace will take care of the data plane
side of things for them.

Now, we want to use NO_HZ_FULL with CPUSets (and this is what I have been
trying for some time), so that we don't get any, any interruption on those
cores. They should keep running that task unless that task tries to switch to
kernel space.

I am getting interrupted by a few of the workqueues (other than per-cpu ones).
One of them was the bdi writeback one, that we discussed earlier.

I have done some work in the past on power efficient workqueues
(mentioned by Tejun a few mails back), which used to switch those works to
UNBOUND workqueues so that the scheduler would decide on the CPU it wants to
queue those works on. With an idle CPU, it works fine as the scheduler doesn't
wake up an idle CPU for servicing that work.

*But wouldn't it make sense if we can tell scheduler that don't queue
these works on a CPU that is running in NO_HZ_FULL mode?*

Also any suggestions on how to get rid of __prandom_timer events on
such CPUs?

Thanks in Advance..

--
viresh

* Re: [LNG] Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2014-01-07 7:49 UTC
To: Kevin Hilman, Ingo Molnar, Peter Zijlstra, Tejun Heo
Cc: Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List

On 23 December 2013 13:48, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Wrong time, probably many people on vacation now. But I am working, so
> will continue reporting my problems, in case somebody is around :)

Ping!! (Probably many people would be back from their vacations.)

* Re: [LNG] Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Peter Zijlstra @ 2014-01-07 8:47 UTC
To: Viresh Kumar
Cc: Kevin Hilman, Ingo Molnar, Tejun Heo, Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List

On Mon, Dec 23, 2013 at 01:48:02PM +0530, Viresh Kumar wrote:
> *But wouldn't it make sense if we can tell scheduler that don't queue
> these works on a CPU that is running in NO_HZ_FULL mode?*

No,.. that's the wrong way around.

> Also any suggestions on how to get rid of __prandom_timer events on
> such CPUs?

That looks to be a normal unpinned timer, it should migrate to a 'busy'
cpu once the one it's running on is going idle.

ISTR people trying to make that active and also migrating on nohz full
or somesuch, just like the workqueues. Forgot what happened with that;
if it got dropped it should probably be resurrected.

* Re: [LNG] Re: [Query] Ticks happen in pair for NO_HZ_FULL cores ?
From: Viresh Kumar @ 2014-01-07 8:55 UTC
To: Peter Zijlstra
Cc: Kevin Hilman, Ingo Molnar, Tejun Heo, Frederic Weisbecker, Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List

On 7 January 2014 14:17, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Dec 23, 2013 at 01:48:02PM +0530, Viresh Kumar wrote:
>> *But wouldn't it make sense if we can tell scheduler that don't queue
>> these works on a CPU that is running in NO_HZ_FULL mode?*
>
> No,.. that's the wrong way around.

Hmm.. Just to make it clear, I didn't mean that any input from the workqueue
code should go to the scheduler, but something like this:

The scheduler will check the following before pushing a task to any CPU:
- Is that CPU part of the NO_HZ_FULL cpu list?
- If yes, is that CPU running only one task for now? i.e. running a single
  task for the best performance case?
- If yes, then don't queue a new task to that CPU; whether the task belongs to
  a workqueue or not doesn't matter.

> That looks to be a normal unpinned timer, it should migrate to a 'busy'
> cpu once the one it's running on is going idle.
>
> ISTR people trying to make that active and also migrating on nohz full
> or somesuch, just like the workqueues. Forgot what happened with that;
> if it got dropped it should probably be resurrected.

I will search for that in archives..

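The first check in the list above (is the CPU part of the NO_HZ_FULL list?) has a rough user-space analogue: testing a CPU number against a kernel-style CPU list such as the value given to the `nohz_full=` boot parameter. The sketch below shows only that analogue and is not scheduler code. `cpu_in_list` is a hypothetical helper, and reading the list from /sys/devices/system/cpu/nohz_full assumes a kernel newer than the one discussed in this thread.

```shell
#!/bin/sh
# cpu_in_list CPU LIST
# Return 0 if CPU is covered by LIST, a comma-separated list of CPU numbers
# and inclusive ranges such as "1-3,5"; return 1 otherwise.
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    set -- $2            # split LIST on commas into positional parameters
    IFS=$old_ifs
    for range in "$@"; do
        # Ignore empty or non-numeric entries, e.g. an unset list.
        case $range in
            ''|*[!0-9-]*) continue ;;
        esac
        lo=${range%-*}   # "1-3" -> 1, "5" -> 5
        hi=${range#*-}   # "1-3" -> 3, "5" -> 5
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}
```

A possible use, under the assumption stated above, is `cpu_in_list 1 "$(cat /sys/devices/system/cpu/nohz_full)" && echo "CPU1 is a full dynticks CPU"`.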
Thread overview: 13+ messages
[not found] <CAKohponch=o3nBKTmakA87OiN=HbgnEwJUL23mGkjQiNoJWjWw@mail.gmail.com>
2013-12-11 13:22 ` [Query] Ticks happen in pair for NO_HZ_FULL cores ? Frederic Weisbecker
2013-12-11 21:14 ` Tejun Heo
2013-12-13 0:32 ` Kevin Hilman
2013-12-17 10:35 ` Viresh Kumar
2013-12-17 16:35 ` Kevin Hilman
2013-12-17 16:57 ` Frederic Weisbecker
2013-12-18 4:38 ` Viresh Kumar
2013-12-18 13:51 ` Kevin Hilman
2013-12-18 14:33 ` [LNG] " Viresh Kumar
2013-12-23 8:18 ` Viresh Kumar
2014-01-07 7:49 ` Viresh Kumar
2014-01-07 8:47 ` Peter Zijlstra
2014-01-07 8:55 ` Viresh Kumar