* [QUERY]: Is using CPU hotplug right for isolating CPUs?
@ 2014-01-15 9:27 Viresh Kumar
2014-01-15 10:38 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Viresh Kumar @ 2014-01-15 9:27 UTC (permalink / raw)
To: Frédéric Weisbecker, Kevin Hilman
Cc: Vincent Guittot, Amit Kucheria, Peter Zijlstra,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt
Hi Again,
I have now succeeded in isolating a CPU completely using CPUsets,
NO_HZ_FULL and CPU hotplug.
My setup and requirements for those who weren't following the
earlier mails:
Networking machines need to run data-plane threads on some CPUs
(i.e. one thread per CPU), and those CPUs shouldn't be interrupted
by the kernel at all.
Earlier I tried CPUsets with NO_HZ by creating two groups with
load balancing disabled between them, and manually moved all tasks
out to the CPU0 group. But even then interruptions kept arriving on
CPU1 (the CPU I am trying to isolate): some workqueue events, some
timers (like prandom), and timer overflow events (NO_HZ_FULL pushes
the hrtimer far into the future, 450 seconds, rather than disabling
it completely, and the hardware timer on the Samsung Exynos board
overflows its counter after 90 seconds).
So after creating the CPUsets I hot-unplugged CPU1 and added it back
immediately. That moved all these interruptions away, and now CPU1
runs my single thread ("stress") forever.
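The procedure above can be sketched as a shell session. This is a
minimal sketch, not the exact commands used here: the `/dev/cpuset`
mount point, the two-CPU layout, and the `cpuset.`-prefixed control
file names are assumptions (on some kernels the files are named
`cpus`/`mems`/`sched_load_balance` without the prefix):

```shell
# Mount the cpuset filesystem (mount point is an assumption)
mkdir -p /dev/cpuset
mount -t cpuset none /dev/cpuset

# Housekeeping group on CPU0 for everything else
mkdir /dev/cpuset/housekeeping
echo 0 > /dev/cpuset/housekeeping/cpuset.cpus
echo 0 > /dev/cpuset/housekeeping/cpuset.mems

# Isolated group on CPU1, with load balancing disabled
mkdir /dev/cpuset/isolated
echo 1 > /dev/cpuset/isolated/cpuset.cpus
echo 0 > /dev/cpuset/isolated/cpuset.mems
echo 0 > /dev/cpuset/isolated/cpuset.sched_load_balance
echo 0 > /dev/cpuset/cpuset.sched_load_balance

# Move every movable task into the housekeeping group
for pid in $(cat /dev/cpuset/tasks); do
    echo "$pid" > /dev/cpuset/housekeeping/tasks 2>/dev/null
done

# Bounce CPU1 through hotplug to shake off already-pinned
# timers and workqueue items
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online

# Run the data-plane thread (here: "stress") in the isolated set
echo $$ > /dev/cpuset/isolated/tasks
taskset -c 1 stress --cpu 1 &
```

The hotplug bounce is the workaround being asked about: offlining CPU1
migrates its pending timers and work items away, and nothing migrates
back because all tasks now live in the CPU0 group.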
Now my question is: is there anything particularly wrong with using
hotplug here? Will it lead to a disaster? :)
Thanks in advance.
--
viresh
^ permalink raw reply [flat|nested] 28+ messages in thread

* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Peter Zijlstra @ 2014-01-15 10:38 UTC (permalink / raw)
To: Viresh Kumar
Cc: Frédéric Weisbecker, Kevin Hilman, Vincent Guittot, Amit Kucheria,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt

On Wed, Jan 15, 2014 at 02:57:36PM +0530, Viresh Kumar wrote:
> Hi Again,
>
> I have now succeeded in isolating a CPU completely using CPUsets,
> NO_HZ_FULL and CPU hotplug.
>
> My setup and requirements for those who weren't following the
> earlier mails:
>
> Networking machines need to run data-plane threads on some CPUs
> (i.e. one thread per CPU), and those CPUs shouldn't be interrupted
> by the kernel at all.
>
> Earlier I tried CPUsets with NO_HZ by creating two groups with
> load balancing disabled between them, and manually moved all tasks
> out to the CPU0 group. But even then interruptions kept arriving on
> CPU1 (the CPU I am trying to isolate): some workqueue events, some
> timers (like prandom), and timer overflow events (NO_HZ_FULL pushes
> the hrtimer far into the future, 450 seconds, rather than disabling
> it completely, and the hardware timer on the Samsung Exynos board
> overflows its counter after 90 seconds).
>
> So after creating the CPUsets I hot-unplugged CPU1 and added it back
> immediately. That moved all these interruptions away, and now CPU1
> runs my single thread ("stress") forever.
>
> Now my question is: is there anything particularly wrong with using
> hotplug here? Will it lead to a disaster? :)

Nah, it's just ugly and we should fix it.
You need to be careful not to place tasks in a cpuset you're going to
unplug though; that'll give funny results.
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Viresh Kumar @ 2014-01-15 10:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frédéric Weisbecker, Kevin Hilman, Vincent Guittot, Amit Kucheria,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt

On 15 January 2014 16:08, Peter Zijlstra <peterz@infradead.org> wrote:
> Nah, it's just ugly and we should fix it. You need to be careful not
> to place tasks in a cpuset you're going to unplug though; that'll
> give funny results.

Okay. So how do you suggest getting rid of cases like a work item that
is queued on CPU1 initially and, because it gets requeued from its own
work handler, stays on the same CPU forever?

And then there were the timer overflow events that occur because an
hrtimer is started by the tick-sched code for 450 seconds later in
time.

--
viresh
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Peter Zijlstra @ 2014-01-15 11:34 UTC (permalink / raw)
To: Viresh Kumar
Cc: Frédéric Weisbecker, Kevin Hilman, Vincent Guittot, Amit Kucheria,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt

On Wed, Jan 15, 2014 at 04:17:26PM +0530, Viresh Kumar wrote:
> On 15 January 2014 16:08, Peter Zijlstra <peterz@infradead.org> wrote:
> > Nah, it's just ugly and we should fix it. You need to be careful not
> > to place tasks in a cpuset you're going to unplug though; that'll
> > give funny results.
>
> Okay. So how do you suggest getting rid of cases like a work item that
> is queued on CPU1 initially and, because it gets requeued from its own
> work handler, stays on the same CPU forever?

We should have a cpuset.quiesce control or something that moves all
timers out.

> And then there were the timer overflow events that occur because an
> hrtimer is started by the tick-sched code for 450 seconds later in
> time.

-ENOPARSE
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Viresh Kumar @ 2014-02-28 9:04 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner
Cc: Frédéric Weisbecker, Kevin Hilman, Vincent Guittot, Amit Kucheria,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt

On 15 January 2014 17:04, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jan 15, 2014 at 04:17:26PM +0530, Viresh Kumar wrote:
>> Okay. So how do you suggest getting rid of cases like a work item
>> that is queued on CPU1 initially and, because it gets requeued from
>> its own work handler, stays on the same CPU forever?
>
> We should have a cpuset.quiesce control or something that moves all
> timers out.

What should we do here if we have a valid base->running_timer for the
CPU requesting the quiesce?
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Frederic Weisbecker @ 2014-01-15 17:17 UTC (permalink / raw)
To: Viresh Kumar
Cc: Kevin Hilman, Vincent Guittot, Amit Kucheria, Peter Zijlstra,
Lists linaro-kernel, Linaro Networking, Linux Kernel Mailing List,
Steven Rostedt

On Wed, Jan 15, 2014 at 02:57:36PM +0530, Viresh Kumar wrote:
> Hi Again,
>
> I have now succeeded in isolating a CPU completely using CPUsets,
> NO_HZ_FULL and CPU hotplug.
>
> My setup and requirements for those who weren't following the
> earlier mails:
>
> Networking machines need to run data-plane threads on some CPUs
> (i.e. one thread per CPU), and those CPUs shouldn't be interrupted
> by the kernel at all.
>
> Earlier I tried CPUsets with NO_HZ by creating two groups with
> load balancing disabled between them, and manually moved all tasks
> out to the CPU0 group. But even then interruptions kept arriving on
> CPU1 (the CPU I am trying to isolate): some workqueue events, some
> timers (like prandom), and timer overflow events (NO_HZ_FULL pushes
> the hrtimer far into the future, 450 seconds, rather than disabling
> it completely, and the hardware timer on the Samsung Exynos board
> overflows its counter after 90 seconds).

Are you sure about that? NO_HZ_FULL shouldn't touch hrtimers much.
Those are independent from the tick. Although some of them seem to rely
on the softirq, that seems to concern the tick hrtimer only.
[parent not found: <CAKohponEZydR1OmP2xziA9bc3OJPgP3bFmuWFQmrmeQFZccMVQ@mail.gmail.com>]
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Thomas Gleixner @ 2014-01-16 9:46 UTC (permalink / raw)
To: Viresh Kumar
Cc: Frederic Weisbecker, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking

On Thu, 16 Jan 2014, Viresh Kumar wrote:
> On 15 January 2014 22:47, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > Are you sure about that? NO_HZ_FULL shouldn't touch hrtimers much.
> > Those are independent from the tick.
> >
> > Although some of them seem to rely on the softirq, that seems to
> > concern the tick hrtimer only.
>
> To make it clear, I was talking about the hrtimer used by
> tick_sched_timer. I have cross-checked which timers are active on the
> isolated CPU from /proc/timer_list, and it showed only
> tick_sched_timer's hrtimer.
>
> In the attached trace (dft.txt), see these locations:
> - Line 252, time 302.573881: we scheduled the hrtimer for 300 seconds
>   ahead of the current time.
> - Lines 254, 258, 262, 330, 334: we kept getting interruptions after
>   ~90 seconds, and this looked like a case of the timer's counter
>   overflowing. Isn't it? (I have removed some lines towards the end
>   of this file to make it shorter, though dft.dat is untouched.)

Just do the math.

max reload value / timer freq = max time span

So:

0x7fffffff / 24MHz = 89.478485 sec

Nothing to do here except to get rid of the requirement to arm the
timer at all.

Thanks,

tglx
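Thomas's arithmetic can be checked directly. A quick sketch (the 24 MHz
input clock and 31-bit reload value match the figures in his mail; the
attribution of the counter to the Exynos board's timer block is an
assumption from the thread context):

```python
import math

# Maximum programmable delta of a 31-bit down-counter at 24 MHz
max_reload = 0x7fffffff        # 2^31 - 1 ticks
timer_freq = 24_000_000        # 24 MHz input clock
max_span = max_reload / timer_freq
print(f"max timer span: {max_span:.6f} s")   # ~89.478485 s

# A 450 s deferment therefore needs ceil(450 / ~89.478) = 6 hardware
# programmings, i.e. 5 "spurious" wrap interrupts before the real expiry.
wraps = math.ceil(450 / max_span) - 1
print(f"wrap interrupts before expiry: {wraps}")   # 5
```

This matches the observed behaviour: the hrtimer is armed 450 seconds
out, but the hardware fires roughly every 90 seconds anyway just to
ride out the counter wraps.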
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Viresh Kumar @ 2014-01-20 11:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Frederic Weisbecker, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking

On 16 January 2014 15:16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Just do the math.
>
> max reload value / timer freq = max time span

Thanks.

> So:
>
> 0x7fffffff / 24MHz = 89.478485 sec
>
> Nothing to do here except to get rid of the requirement to arm the
> timer at all.

@Frederic: Any inputs on how to get rid of this timer here?
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Frederic Weisbecker @ 2014-01-20 15:51 UTC (permalink / raw)
To: Viresh Kumar
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking

On Mon, Jan 20, 2014 at 05:00:20PM +0530, Viresh Kumar wrote:
> On 16 January 2014 15:16, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Just do the math.
> >
> > max reload value / timer freq = max time span
>
> Thanks.
>
> > So:
> >
> > 0x7fffffff / 24MHz = 89.478485 sec
> >
> > Nothing to do here except to get rid of the requirement to arm the
> > timer at all.
>
> @Frederic: Any inputs on how to get rid of this timer here?

I fear you can't. If you schedule a timer 4 seconds away and your
clockdevice can only count up to 2 seconds, you can't avoid the
interrupt in the middle that copes with the overflow.

So you need to act on the source of the timer:

* identify what causes this timer
* try to turn that feature off
* if you can't, then move the timer to the housekeeping CPU

I'll have a look into the latter point, to affine global timers to the
housekeeping CPU. Per-CPU timers need more inspection though. Either we
rework them to be possibly handled by remote/housekeeping CPUs, or we
let the associated feature be turned off. All in all it's case-by-case
work.
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Viresh Kumar @ 2014-01-21 10:33 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking

On 20 January 2014 21:21, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> I fear you can't. If you schedule a timer 4 seconds away and your
> clockdevice can only count up to 2 seconds, you can't avoid the
> interrupt in the middle that copes with the overflow.
>
> So you need to act on the source of the timer:
>
> * identify what causes this timer
> * try to turn that feature off
> * if you can't, then move the timer to the housekeeping CPU

So, the main problem in my case was caused by this:

<...>-2147 [001] d..2 302.573881: hrtimer_start:
hrtimer=c172aa50 function=tick_sched_timer expires=602075000000
softexpires=602075000000

I mentioned this earlier when I sent you the attachments. I think this
is somehow tied to the NO_HZ_FULL stuff, as the timer is queued for 300
seconds after the current time.

How do I get this out?

> I'll have a look into the latter point, to affine global timers to the
> housekeeping CPU. Per-CPU timers need more inspection though. Either
> we rework them to be possibly handled by remote/housekeeping CPUs, or
> we let the associated feature be turned off. All in all it's
> case-by-case work.

Which CPUs are housekeeping CPUs? How do we declare them?
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Frederic Weisbecker @ 2014-01-23 14:58 UTC (permalink / raw)
To: Viresh Kumar
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking, Kevin Hilman

On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote:
> So, the main problem in my case was caused by this:
>
> <...>-2147 [001] d..2 302.573881: hrtimer_start:
> hrtimer=c172aa50 function=tick_sched_timer expires=602075000000
> softexpires=602075000000
>
> I mentioned this earlier when I sent you the attachments. I think this
> is somehow tied to the NO_HZ_FULL stuff, as the timer is queued for
> 300 seconds after the current time.
>
> How do I get this out?

So it's scheduled 300 seconds later. It might be a pending timer_list.
Enabling the timer tracepoints may give you some clues.

> Which CPUs are housekeeping CPUs? How do we declare them?

It's not yet implemented, but it's an idea (partly from Thomas) of
something we can do to define a general policy on various
periodic/async work affinity to enforce isolation.

The basic idea is to define the CPU handling the timekeeping duty as
the housekeeping CPU. Given that this CPU must keep a periodic tick,
let's move all the unbound timers and workqueues there, and also try to
move some CPU-affine work as well. For example, we could handle the
scheduler tick of the full dynticks CPUs on that housekeeping CPU, at a
low frequency. This way we could remove the 1-second scheduler tick max
deferment per CPU. It may be overkill, though, to run all the scheduler
ticks on a single CPU, so there may be other ways to cope with that.

And I would like to keep that housekeeping notion flexible enough to be
extendable to more than one CPU, as I heard that some people plan to
reserve one CPU per node on big NUMA machines for such a purpose. So
that could be a cpumask, augmented with an infrastructure.

Of course, if some people help contributing in this area, things may
eventually move forward on the support of CPU isolation. I can't do all
that alone, at least not quickly, given all the things already pending
in my queue (fix buggy nohz iowait accounting, support RCU full sysidle
detection, apply AMD range breakpoints patches, further clean up posix
cpu timers, etc...).

Thanks.
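For reference, the closest existing knobs are the boot parameters under
which the boot CPU already acts as the de facto timekeeping/housekeeping
CPU. A sketch of a command line for a 4-CPU box (the exact CPU ranges
are illustrative, and `nohz_full=` requires a kernel built with
CONFIG_NO_HZ_FULL):

```
# CPUs 1-3 full dynticks and isolated from the scheduler;
# CPU0 is left as the implicit timekeeping/housekeeping CPU.
# rcu_nocbs= additionally offloads RCU callbacks off those CPUs.
linux ... nohz_full=1-3 rcu_nocbs=1-3 isolcpus=1-3
```

What Frederic describes above would generalize this single implicit CPU
into an explicit, possibly multi-CPU housekeeping mask.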
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Viresh Kumar @ 2014-01-24 5:21 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking, Kevin Hilman

On 23 January 2014 20:28, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote:
>> So, the main problem in my case was caused by this:
>>
>> <...>-2147 [001] d..2 302.573881: hrtimer_start:
>> hrtimer=c172aa50 function=tick_sched_timer expires=602075000000
>> softexpires=602075000000
>>
>> How do I get this out?
>
> So it's scheduled 300 seconds later. It might be a pending timer_list.
> Enabling the timer tracepoints may give you some clues.

The trace was done with those enabled. /proc/timer_list confirms that a
hrtimer is queued 300 seconds later for tick_sched_timer, and so I
assumed this is part of the current NO_HZ_FULL implementation.

Just to confirm: when we decide that a CPU is running a single task and
so can enter tickless mode, do we queue this tick_sched_timer 300
seconds ahead of time? If not, then who is doing this? :)

>> Which CPUs are housekeeping CPUs? How do we declare them?
>
> It's not yet implemented, but it's an idea (partly from Thomas) of
> something we can do to define a general policy on various
> periodic/async work affinity to enforce isolation.
>
> The basic idea is to define the CPU handling the timekeeping duty as
> the housekeeping CPU. Given that this CPU must keep a periodic tick,
> let's move all the unbound timers and workqueues there.
>
> Of course, if some people help contributing in this area, things may
> eventually move forward on the support of CPU isolation. I can't do
> all that alone, at least not quickly, given all the things already
> pending in my queue.

I see. As I am currently working on the isolation stuff, which is very
much required for my use case, I will try to do that as the second step
of my work. The first one stays something like the cpuset.quiesce
option that PeterZ suggested.

Any pointers to earlier discussion on this topic would be helpful to
start working on this.
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Mike Galbraith @ 2014-01-24 8:29 UTC (permalink / raw)
To: Viresh Kumar
Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
Linux Kernel Mailing List, Lists linaro-kernel, Steven Rostedt,
Linaro Networking, Kevin Hilman

On Fri, 2014-01-24 at 10:51 +0530, Viresh Kumar wrote:
> I see. As I am currently working on the isolation stuff, which is very
> much required for my use case, I will try to do that as the second
> step of my work. The first one stays something like the cpuset.quiesce
> option that PeterZ suggested.
>
> Any pointers to earlier discussion on this topic would be helpful to
> start working on this.

All of that nohz_full stuff would be a lot more usable if it were
dynamic via cpusets. As the thing sits, if you need a small group of
tickless cores once in a while, you have to eat a truckload of overhead
and a zillion threads always. The price is high.

I have a little hack for my -rt kernel that allows the user to turn the
tick on/off (and cpupri) on a per fully-isolated-set basis, because
jitter is lower with the tick than with nohz doing its thing. With
that, you can set up whatever portion of the box is needed to meet your
needs on the fly. When you need very low jitter, turn all load
balancing off in your critical set, turn nohz off, turn rt load
balancing off, and 80-core boxen become usable for cool zillion-dollar
realtime video games.. the box becomes a militarized playstation.

Doing the same with nohz_full would be a _lot_ harder (my hacks are
trivial), but would be a lot more attractive to users than always
eating the high nohz_full cost whether using it or not. Poke buttons,
threads are born or die, patch in/out expensive accounting goop and
whatnot, play evil high-speed stock-market bandit, or whatever else, at
the poke of a couple of buttons.

-Mike
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
From: Frederic Weisbecker @ 2014-01-28 13:23 UTC (permalink / raw)
To: Viresh Kumar
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
Lists linaro-kernel, Steven Rostedt, Linaro Networking, Kevin Hilman

On Fri, Jan 24, 2014 at 10:51:14AM +0530, Viresh Kumar wrote:
> The trace was done with those enabled. /proc/timer_list confirms that
> a hrtimer is queued 300 seconds later for tick_sched_timer, and so I
> assumed this is part of the current NO_HZ_FULL implementation.
>
> Just to confirm: when we decide that a CPU is running a single task
> and so can enter tickless mode, do we queue this tick_sched_timer 300
> seconds ahead of time? If not, then who is doing this? :)

No, when a single task is running on a full dynticks CPU, the tick is
supposed to run every second. I'm actually surprised it doesn't happen
in your traces; did you tweak something specific?

The 300-seconds timer is probably due to a timer_list; just enable the
timer_start and timer_expire_entry events to get the name of the
culprit.

> I see. As I am currently working on the isolation stuff, which is
> very much required for my use case, I will try to do that as the
> second step of my work. The first one stays something like the
> cpuset.quiesce option that PeterZ suggested.

Cool!

> Any pointers to earlier discussion on this topic would be helpful to
> start working on this.

I think that being able to control the UNBOUND workqueue affinity may
be a nice first step.

Thanks.
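Frederic's tracepoint suggestion above can be carried out through the
ftrace interface, roughly as follows (the debugfs tracing mount point
and CPU number are assumptions; adjust for the system at hand):

```shell
cd /sys/kernel/debug/tracing

# Enable the events Frederic mentions, plus hrtimer_start for good measure
echo 1 > events/timer/timer_start/enable
echo 1 > events/timer/timer_expire_entry/enable
echo 1 > events/timer/hrtimer_start/enable

# Clear the buffer, then watch only the CPU being isolated (CPU1 here);
# the function= field of each event names the culprit callback
echo > trace
cat per_cpu/cpu1/trace
```

Matching the `function=` values against /proc/timer_list is how the
tick_sched_timer hrtimer was identified earlier in this thread.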
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-28 13:23 ` Frederic Weisbecker @ 2014-01-28 16:11 ` Kevin Hilman 2014-02-03 8:26 ` Viresh Kumar 2014-02-11 8:52 ` Viresh Kumar 1 sibling, 1 reply; 28+ messages in thread From: Kevin Hilman @ 2014-01-28 16:11 UTC (permalink / raw) To: Frederic Weisbecker Cc: Viresh Kumar, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List, Lists linaro-kernel, Steven Rostedt, Linaro Networking On Tue, Jan 28, 2014 at 5:23 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Fri, Jan 24, 2014 at 10:51:14AM +0530, Viresh Kumar wrote: >> On 23 January 2014 20:28, Frederic Weisbecker <fweisbec@gmail.com> wrote: >> > On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote: >> >> >> So, the main problem in my case was caused by this: >> >> >> >> <...>-2147 [001] d..2 302.573881: hrtimer_start: >> >> hrtimer=c172aa50 function=tick_sched_timer expires=602075000000 >> >> softexpires=602075000000 >> >> >> >> I have mentioned this earlier when I sent you attachments. I think >> >> this is somehow >> >> tied with the NO_HZ_FULL stuff? As the timer is queued for 300 seconds after >> >> current time. >> >> >> >> How to get this out? >> > >> > So it's scheduled away 300 seconds later. It might be a pending timer_list. Enabling the >> > timer tracepoints may give you some clues. >> >> Trace was done with that enabled. /proc/timer_list confirms that a hrtimer >> is queued for 300 seconds later for tick_sched_timer. And so I assumed >> this is part of the current NO_HZ_FULL implementation. >> >> Just to confirm, when we decide that a CPU is running a single task and so >> can enter tickless mode, do we queue this tick_sched_timer for 300 seconds >> ahead of time? If not, then who is doing this :) > > No, when a single task is running on a full dynticks CPU, the tick is supposed to run > every seconds. I'm actually suprised it doesn't happen in your traces, did you tweak > something specific? 
I think Viresh is using my patch/hack to configure/disable the 1Hz residual tick. Kevin ^ permalink raw reply [flat|nested] 28+ messages in thread
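The patch/hack Kevin refers to is not mainline; on a kernel carrying it, the residual 1 Hz tick can reportedly be disabled through the sched_tick_max_deferment knob mentioned below. The debugfs path here is an assumption inferred from the discussion, not a mainline interface, so this is only a sketch of how such a patched kernel would be driven:

```shell
# Disable the 1Hz residual tick on full-dynticks CPUs (out-of-tree hack).
# -1 means "never force the deferred tick"; the path is assumed from
# Kevin Hilman's non-mainline patch and will not exist on a stock kernel.
mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo -1 > /sys/kernel/debug/sched_tick_max_deferment
```

As the rest of the thread explains, doing this trades away the scheduler accounting that the 1 Hz tick keeps alive, so behavior after this write is unpredictable.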
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-28 16:11 ` Kevin Hilman @ 2014-02-03 8:26 ` Viresh Kumar 0 siblings, 0 replies; 28+ messages in thread From: Viresh Kumar @ 2014-02-03 8:26 UTC (permalink / raw) To: Kevin Hilman Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List, Lists linaro-kernel, Steven Rostedt, Linaro Networking On 28 January 2014 21:41, Kevin Hilman <khilman@linaro.org> wrote: > I think Viresh is using my patch/hack to configure/disable the 1Hz > residual tick. Yeah. I am using sched_tick_max_deferment by setting it to -1. Why do we need a timer every second for NO_HZ_FULL currently? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-28 13:23 ` Frederic Weisbecker 2014-01-28 16:11 ` Kevin Hilman @ 2014-02-11 8:52 ` Viresh Kumar 2014-02-13 14:20 ` Frederic Weisbecker 1 sibling, 1 reply; 28+ messages in thread From: Viresh Kumar @ 2014-02-11 8:52 UTC (permalink / raw) To: Frederic Weisbecker Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List, Lists linaro-kernel, Steven Rostedt, Linaro Networking, Kevin Hilman On 28 January 2014 18:53, Frederic Weisbecker <fweisbec@gmail.com> wrote: > No, when a single task is running on a full dynticks CPU, the tick is supposed to run > every second. I'm actually surprised it doesn't happen in your traces; did you tweak > something specific? Why do we need this 1 second tick currently? And what will happen if I hotunplug that CPU and get it back? Would the timer for the tick move away from the CPU in question? I see that it does when I change this 1 sec period to 300 seconds. But what would be the impact of that? Will things still work normally? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-02-11 8:52 ` Viresh Kumar @ 2014-02-13 14:20 ` Frederic Weisbecker 0 siblings, 0 replies; 28+ messages in thread From: Frederic Weisbecker @ 2014-02-13 14:20 UTC (permalink / raw) To: Viresh Kumar Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List, Lists linaro-kernel, Steven Rostedt, Linaro Networking, Kevin Hilman On Tue, Feb 11, 2014 at 02:22:43PM +0530, Viresh Kumar wrote: > On 28 January 2014 18:53, Frederic Weisbecker <fweisbec@gmail.com> wrote: > > No, when a single task is running on a full dynticks CPU, the tick is supposed to run > > every second. I'm actually surprised it doesn't happen in your traces; did you tweak > > something specific? > > Why do we need this 1 second tick currently? And what will happen if I > hotunplug that > CPU and get it back? Would the timer for the tick move away from the CPU in > question? I see > that it does when I change this 1 sec period to 300 seconds. But what > would be the impact > of that? Will things still work normally? So the problem resides in the vast amount of accounting maintained in scheduler_tick() and current->sched_class->task_tick(). The scheduler's correctness depends on these being updated regularly. If you deactivate them, or increase the delay to very high values, the result is unpredictable. Just expect that at least some scheduler features will behave randomly, like load balancing for example, or simply local fairness issues. So we have that 1 Hz max that makes sure things keep moving forward while keeping a rate that should still be nice for HPC workloads. But we certainly want to find a way to remove the need for any tick altogether for extreme real time workloads, which need guarantees rather than just optimizations. I see two potential solutions for that: 1) Rework the scheduler accounting such that it is safe against full dynticks. That was the initial plan but it's scary. The scheduler accounting is a huge maze. 
And I'm not sure it's actually worth the complication. 2) Offload the accounting. For example we could imagine that the timekeeping CPU could handle the task_tick() calls on behalf of the full dynticks CPUs, at a small rate like 1 Hz. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-15 9:27 [QUERY]: Is using CPU hotplug right for isolating CPUs? Viresh Kumar 2014-01-15 10:38 ` Peter Zijlstra 2014-01-15 17:17 ` Frederic Weisbecker @ 2014-01-20 13:59 ` Lei Wen 2014-01-20 15:00 ` Viresh Kumar 2 siblings, 1 reply; 28+ messages in thread From: Lei Wen @ 2014-01-20 13:59 UTC (permalink / raw) To: Viresh Kumar Cc: Frédéric Weisbecker, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking Hi Viresh, On Wed, Jan 15, 2014 at 5:27 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > Hi Again, > > I am now successful in isolating a CPU completely using CPUsets, > NO_HZ_FULL and CPU hotplug.. > > My setup and requirements for those who weren't following the > earlier mails: > > For networking machines it is required to run data plane threads on > some CPUs (i.e. one thread per CPU) and these CPUs shouldn't be > interrupted by kernel at all. > > Earlier I tried CPUSets with NO_HZ by creating two groups with > load_balancing disabled between them and manually tried to move > all tasks out to CPU0 group. But even then there were interruptions > which were continuously coming on CPU1 (which I am trying to > isolate). These were some workqueue events, some timers (like > prandom), timer overflow events (As NO_HZ_FULL pushes hrtimer > to long ahead in future, 450 seconds, rather than disabling them > completely, and these hardware timers were overflowing their > counters after 90 seconds on Samsung Exynos board). > > So after creating CPUsets I hotunplugged CPU1 and added it back > immediately. This moved all these interruptions away and now > CPU1 is running my single thread ("stress") for ever. I have one question regarding unbounded workqueue migration in your case. You use hotplug to migrate the unbounded work to other cpus, but its cpu mask would still be 0xf, since cannot be changed by cpuset. 
My question is: how could you prevent this unbounded work from migrating back to your isolated cpu? It seems to me there is no such mechanism in the kernel; do I understand it wrong? Thanks, Lei > > Now my question is: Is there anything particularly wrong about using > hotplugging here ? Will that lead to a disaster :) > > Thanks in Advance. > > -- > viresh > > _______________________________________________ > linaro-kernel mailing list > linaro-kernel@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/linaro-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
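The 0xf Lei mentions is the hex cpumask notation that sysfs cpumask files use: bit n set means CPU n is allowed, so 0xf covers all four cores of a quad-core board like the Exynos discussed here. A tiny helper for computing such masks (cpulist_to_mask is a hypothetical convenience function, not a standard tool):

```shell
# Build the hex cpumask for a list of CPU ids: bit n represents CPU n.
cpulist_to_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

cpulist_to_mask 0 1 2 3   # all four CPUs -> 0xf
cpulist_to_mask 0 2       # CPUs 0 and 2 -> 0x5
```

Writing such a mask into a workqueue's cpumask file (where one exists) is what "changing the affinity" amounts to at the sysfs level.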
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-20 13:59 ` Lei Wen @ 2014-01-20 15:00 ` Viresh Kumar 2014-01-20 15:41 ` Frederic Weisbecker 0 siblings, 1 reply; 28+ messages in thread From: Viresh Kumar @ 2014-01-20 15:00 UTC (permalink / raw) To: Lei Wen Cc: Frédéric Weisbecker, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking On 20 January 2014 19:29, Lei Wen <adrian.wenl@gmail.com> wrote: > Hi Viresh, Hi Lei, > I have one question regarding unbounded workqueue migration in your case. > You use hotplug to migrate the unbounded work to other cpus, but its cpu mask > would still be 0xf, since cannot be changed by cpuset. > > My question is how you could prevent this unbounded work migrate back > to your isolated cpu? > Seems to me there is no such mechanism in kernel, am I understand wrong? These works are normally queued back from the work handler itself. And we normally queue them on the local cpu; that's the default behavior of the workqueue subsystem. And so they land on the same CPU again and again. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-20 15:00 ` Viresh Kumar @ 2014-01-20 15:41 ` Frederic Weisbecker 2014-01-21 2:07 ` Lei Wen 2014-01-21 9:49 ` Viresh Kumar 0 siblings, 2 replies; 28+ messages in thread From: Frederic Weisbecker @ 2014-01-20 15:41 UTC (permalink / raw) To: Viresh Kumar Cc: Lei Wen, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On Mon, Jan 20, 2014 at 08:30:10PM +0530, Viresh Kumar wrote: > On 20 January 2014 19:29, Lei Wen <adrian.wenl@gmail.com> wrote: > > Hi Viresh, > > Hi Lei, > > > I have one question regarding unbounded workqueue migration in your case. > > You use hotplug to migrate the unbounded work to other cpus, but its cpu mask > > would still be 0xf, since cannot be changed by cpuset. > > > > My question is how you could prevent this unbounded work migrate back > > to your isolated cpu? > > Seems to me there is no such mechanism in kernel, am I understand wrong? > > These workqueues are normally queued back from workqueue handler. And we > normally queue them on the local cpu, that's the default behavior of workqueue > subsystem. And so they land up on the same CPU again and again. But for workqueues having a global affinity, I think they can be rescheduled later on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. Also, one of the plans is to extend the sysfs interface of workqueues to override their affinity. If any of you guys want to try something there, that would be welcome. Also we want to work on the timer affinity. Perhaps we don't need a user interface for that, or maybe something on top of full dynticks to indicate that we want the unbound timers to run on housekeeping CPUs only. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-20 15:41 ` Frederic Weisbecker @ 2014-01-21 2:07 ` Lei Wen 2014-01-21 9:50 ` Viresh Kumar 2014-01-23 13:54 ` Frederic Weisbecker 1 sibling, 2 replies; 28+ messages in thread From: Lei Wen @ 2014-01-21 2:07 UTC (permalink / raw) To: Frederic Weisbecker Cc: Viresh Kumar, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On Mon, Jan 20, 2014 at 11:41 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Mon, Jan 20, 2014 at 08:30:10PM +0530, Viresh Kumar wrote: >> On 20 January 2014 19:29, Lei Wen <adrian.wenl@gmail.com> wrote: >> > Hi Viresh, >> >> Hi Lei, >> >> > I have one question regarding unbounded workqueue migration in your case. >> > You use hotplug to migrate the unbounded work to other cpus, but its cpu mask >> > would still be 0xf, since cannot be changed by cpuset. >> > >> > My question is how you could prevent this unbounded work migrate back >> > to your isolated cpu? >> > Seems to me there is no such mechanism in kernel, am I understand wrong? >> >> These workqueues are normally queued back from workqueue handler. And we >> normally queue them on the local cpu, that's the default behavior of workqueue >> subsystem. And so they land up on the same CPU again and again. > > But for workqueues having a global affinity, I think they can be rescheduled later > on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. Agree: since the worker thread is allowed to enter all cpus, it cannot prevent the scheduler from doing the migration. But here is one point: I see Viresh already set up two cpusets with scheduler load balancing disabled, so shouldn't that stop task migration between those two groups, since the sched_domain changed? 
What is more, I also did a similar test, and found that when I set up two such cpuset groups, say cores 0-2 in cpuset1 and core 3 in cpuset2, and then hotunplug core 3, the cpuset's cpus member becomes NULL even after I hotplug core 3 back again. So is it a bug? Thanks, Lei > > Also, one of the plan is to extend the sysfs interface of workqueues to override > their affinity. If any of you guys want to try something there, that would be welcome. > Also we want to work on the timer affinity. Perhaps we don't need a user interface > for that, or maybe something on top of full dynticks to outline that we want the unbound > timers to run on housekeeping CPUs only. ^ permalink raw reply [flat|nested] 28+ messages in thread
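Lei's reproduction can be written out with the cgroup-v1 cpuset interface. The paths assume cpuset is mounted at /sys/fs/cgroup/cpuset and a 4-CPU board; this is a sketch of the reported behavior (cpuset.cpus going empty after a hotplug cycle), not a guaranteed outcome on every kernel:

```shell
# Reproduce: put core 3 alone in a cpuset, disable load balancing between
# the sets, then cycle core 3 through hotplug and inspect cpuset.cpus.
cd /sys/fs/cgroup/cpuset
mkdir -p cpuset1 cpuset2
echo 0 > cpuset1/cpuset.mems          # cpuset v1 requires mems before cpus
echo 0 > cpuset2/cpuset.mems
echo 0-2 > cpuset1/cpuset.cpus
echo 3 > cpuset2/cpuset.cpus
echo 0 > cpuset.sched_load_balance    # no balancing across the two sets

echo 0 > /sys/devices/system/cpu/cpu3/online   # hotunplug core 3
echo 1 > /sys/devices/system/cpu/cpu3/online   # bring it back
cat cpuset2/cpuset.cpus               # reported empty at this point
echo 3 > cpuset2/cpuset.cpus          # userspace must re-add the CPU itself
```

As Viresh argues later in the thread, the kernel deliberately leaves the re-add step to userspace, since the cpuset configuration may have changed between the hotplug events.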
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-21 2:07 ` Lei Wen @ 2014-01-21 9:50 ` Viresh Kumar 2014-01-23 13:54 ` Frederic Weisbecker 1 sibling, 0 replies; 28+ messages in thread From: Viresh Kumar @ 2014-01-21 9:50 UTC (permalink / raw) To: Lei Wen Cc: Frederic Weisbecker, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On 21 January 2014 07:37, Lei Wen <adrian.wenl@gmail.com> wrote: > What is more, I also did similiar test, and find when I set two such > cpuset group, > like core 0-2 to cpuset1, core 3 to cpuset2, while hotunplug the core3 > afterwise. > I find the cpuset's cpus member becomes NULL even I hotplug the core3 > back again. > So is it a bug? I confirm the same :) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-21 2:07 ` Lei Wen 2014-01-21 9:50 ` Viresh Kumar @ 2014-01-23 13:54 ` Frederic Weisbecker 2014-01-23 14:27 ` Viresh Kumar 1 sibling, 1 reply; 28+ messages in thread From: Frederic Weisbecker @ 2014-01-23 13:54 UTC (permalink / raw) To: Lei Wen Cc: Viresh Kumar, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On Tue, Jan 21, 2014 at 10:07:58AM +0800, Lei Wen wrote: > On Mon, Jan 20, 2014 at 11:41 PM, Frederic Weisbecker > <fweisbec@gmail.com> wrote: > > On Mon, Jan 20, 2014 at 08:30:10PM +0530, Viresh Kumar wrote: > >> On 20 January 2014 19:29, Lei Wen <adrian.wenl@gmail.com> wrote: > >> > Hi Viresh, > >> > >> Hi Lei, > >> > >> > I have one question regarding unbounded workqueue migration in your case. > >> > You use hotplug to migrate the unbounded work to other cpus, but its cpu mask > >> > would still be 0xf, since cannot be changed by cpuset. > >> > > >> > My question is how you could prevent this unbounded work migrate back > >> > to your isolated cpu? > >> > Seems to me there is no such mechanism in kernel, am I understand wrong? > >> > >> These workqueues are normally queued back from workqueue handler. And we > >> normally queue them on the local cpu, that's the default behavior of workqueue > >> subsystem. And so they land up on the same CPU again and again. > > > > But for workqueues having a global affinity, I think they can be rescheduled later > > on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. > > Agree, since worker thread is made as enterring into all cpus, it > cannot prevent scheduler > do the migration. > > But here is one point, that I see Viresh alredy set up two cpuset with > scheduler load balance > disabled, so it should stop the task migration between those two groups? Since > the sched_domain changed? 
> > What is more, I also did similiar test, and find when I set two such > cpuset group, > like core 0-2 to cpuset1, core 3 to cpuset2, while hotunplug the core3 > afterwise. > I find the cpuset's cpus member becomes NULL even I hotplug the core3 > back again. > So is it a bug? Not sure, you may need to check cpuset internals. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-23 13:54 ` Frederic Weisbecker @ 2014-01-23 14:27 ` Viresh Kumar 0 siblings, 0 replies; 28+ messages in thread From: Viresh Kumar @ 2014-01-23 14:27 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lei Wen, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On 23 January 2014 19:24, Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Tue, Jan 21, 2014 at 10:07:58AM +0800, Lei Wen wrote: >> I find the cpuset's cpus member becomes NULL even I hotplug the core3 >> back again. >> So is it a bug? > > Not sure, you may need to check cpuset internals. I think this is the correct behavior. Userspace must decide what to do with that CPU once it is back. Simply reverting to earlier cpusets configuration might not be the right approach. Also, what if cpusets have been rewritten in-between hotplug events. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-20 15:41 ` Frederic Weisbecker 2014-01-21 2:07 ` Lei Wen @ 2014-01-21 9:49 ` Viresh Kumar 2014-01-23 14:01 ` Frederic Weisbecker 1 sibling, 1 reply; 28+ messages in thread From: Viresh Kumar @ 2014-01-21 9:49 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lei Wen, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On 20 January 2014 21:11, Frederic Weisbecker <fweisbec@gmail.com> wrote: > But for workqueues having a global affinity, I think they can be rescheduled later > on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. Works queued on workqueues with the WQ_UNBOUND flag set can run on any cpu, as decided by the scheduler, whereas works queued on workqueues without this flag, and without a cpu number mentioned while queuing the work, always run on the local CPU. > Also, one of the plan is to extend the sysfs interface of workqueues to override > their affinity. If any of you guys want to try something there, that would be welcome. > Also we want to work on the timer affinity. Perhaps we don't need a user interface > for that, or maybe something on top of full dynticks to outline that we want the unbound > timers to run on housekeeping CPUs only. What about a quiesce option as mentioned by PeterZ? With that we can move all UNBOUND timers and workqueues away. But to guarantee that we don't get them queued again later, we need to make similar updates in the workqueue/timer subsystems to disallow queuing any such stuff on such cpusets. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-21 9:49 ` Viresh Kumar @ 2014-01-23 14:01 ` Frederic Weisbecker 2014-01-24 8:53 ` Viresh Kumar 0 siblings, 1 reply; 28+ messages in thread From: Frederic Weisbecker @ 2014-01-23 14:01 UTC (permalink / raw) To: Viresh Kumar Cc: Lei Wen, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On Tue, Jan 21, 2014 at 03:19:36PM +0530, Viresh Kumar wrote: > On 20 January 2014 21:11, Frederic Weisbecker <fweisbec@gmail.com> wrote: > > But for workqueues having a global affinity, I think they can be rescheduled later > > on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. > > Works queued on workqueues with WQ_UNBOUND flag set are run on any cpu > and is decided by scheduler, whereas works queued on workqueues with this > flag not set and without a cpu number mentioned while queuing work, > runs on local > CPU always. Ok, so it is fine to migrate the latter kind I guess? > > > Also, one of the plan is to extend the sysfs interface of workqueues to override > > their affinity. If any of you guys want to try something there, that would be welcome. > > Also we want to work on the timer affinity. Perhaps we don't need a user interface > > for that, or maybe something on top of full dynticks to outline that we want the unbound > > timers to run on housekeeping CPUs only. > > What about a quiesce option as mentioned by PeterZ? With that we can move > all UNBOUND timers and workqueues away. But to guarantee that we don't get > them queued again later we need to make similar updates in workqueue/timer > subsystem to disallow queuing any such stuff on such cpusets. I haven't checked the details but then this quiesce option would involve a dependency on cpuset for any workload involving workqueues affinity. I'm not sure we really want this. Besides, workqueues have an existing sysfs interface that can be easily extended. 
Now indeed we may also want to enforce some policy to make sure that further created and queued workqueues are affine to a specific subset of CPUs. And then cpuset sounds like a good idea :) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [QUERY]: Is using CPU hotplug right for isolating CPUs? 2014-01-23 14:01 ` Frederic Weisbecker @ 2014-01-24 8:53 ` Viresh Kumar 0 siblings, 0 replies; 28+ messages in thread From: Viresh Kumar @ 2014-01-24 8:53 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lei Wen, Kevin Hilman, Lists linaro-kernel, Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt, Linaro Networking, Tejun Heo On 23 January 2014 19:31, Frederic Weisbecker <fweisbec@gmail.com> wrote: > Ok, so it is fine to migrate the latter kind I guess? Unless somebody has abused the API and used bound workqueues where he should have used unbound ones. > I haven't checked the details but then this quiesce option would involve > a dependency on cpuset for any workload involving workqueues affinity. I'm > not sure we really want this. Besides, workqueues have an existing sysfs interface > that can be easily extended. > > Now indeed we may also want to enforce some policy to make sure that further > created and queued workqueues are affine to a specific subset of CPUs. And then > cpuset sounds like a good idea :) Exactly. Cpuset would be more useful here. Probably we can keep both the cpuset and workqueue sysfs interfaces. I will try to add this option under cpuset, which will initially move timers and workqueues away from the cpuset in question. ^ permalink raw reply [flat|nested] 28+ messages in thread
Thread overview: 28+ messages
2014-01-15 9:27 [QUERY]: Is using CPU hotplug right for isolating CPUs? Viresh Kumar
2014-01-15 10:38 ` Peter Zijlstra
2014-01-15 10:47 ` Viresh Kumar
2014-01-15 11:34 ` Peter Zijlstra
2014-02-28 9:04 ` Viresh Kumar
2014-01-15 17:17 ` Frederic Weisbecker
[not found] ` <CAKohponEZydR1OmP2xziA9bc3OJPgP3bFmuWFQmrmeQFZccMVQ@mail.gmail.com>
2014-01-16 9:46 ` Thomas Gleixner
2014-01-20 11:30 ` Viresh Kumar
2014-01-20 15:51 ` Frederic Weisbecker
2014-01-21 10:33 ` Viresh Kumar
2014-01-23 14:58 ` Frederic Weisbecker
2014-01-24 5:21 ` Viresh Kumar
2014-01-24 8:29 ` Mike Galbraith
2014-01-28 13:23 ` Frederic Weisbecker
2014-01-28 16:11 ` Kevin Hilman
2014-02-03 8:26 ` Viresh Kumar
2014-02-11 8:52 ` Viresh Kumar
2014-02-13 14:20 ` Frederic Weisbecker
2014-01-20 13:59 ` Lei Wen
2014-01-20 15:00 ` Viresh Kumar
2014-01-20 15:41 ` Frederic Weisbecker
2014-01-21 2:07 ` Lei Wen
2014-01-21 9:50 ` Viresh Kumar
2014-01-23 13:54 ` Frederic Weisbecker
2014-01-23 14:27 ` Viresh Kumar
2014-01-21 9:49 ` Viresh Kumar
2014-01-23 14:01 ` Frederic Weisbecker
2014-01-24 8:53 ` Viresh Kumar