* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
       [not found] ` <bug-16417-10286-3bo0kxnWaOQUvHkbgXJLS5sdmw4N0Rt+2LY78lusg7I@public.gmane.org/>
@ 2010-07-22 22:52   ` Andrew Morton
       [not found]     ` <20100722155222.f0fdc50a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2010-07-22 22:52 UTC
  To: Peter Zijlstra, Ingo Molnar, Dhaval Giani, Srivatsa Vaddagiri, Th
  Cc: pbourdon-SxHCd5+OuqTrt3ojHgZu+w, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

sched suckage! Do we have a linear search in there?

On Mon, 19 Jul 2010 14:38:09 GMT
bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16417
>
>            Summary: Slow context switches with SMP and
>                     CONFIG_FAIR_GROUP_SCHED
>            Product: Process Management
>            Version: 2.5
>     Kernel Version: 2.6.34.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Scheduler
>         AssignedTo: mingo-X9Un+BFzKDI@public.gmane.org
>         ReportedBy: pbourdon-SxHCd5+OuqTrt3ojHgZu+w@public.gmane.org
>         Regression: No
>
>
> Hello,
>
> We have been experiencing slow context switches using a large number of cgroups
> (around 600 groups) and CONFIG_FAIR_GROUP_SCHED. This causes a system time
> usage increase on context switching heavy processes (measured with pidstat -w)
> and a drop in timer interrupts handling.
>
> This problem only appears on SMP : when booting with nosmp, the issue does not
> appear. From maxprocs=2 to maxprocs=8 we were able to reproduce it accurately.
>
> Steps to reproduce :
> - mount the cgroup filesystem in /dev/cgroup
> - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> - launch lat_ctx from lmbench, for instance ./lat_ctx -N 200 100
>
> The results from lat_ctx were the following :
> - SMP enabled, no cgroups : 2.65
> - SMP enabled, 1000 cgroups : 3.40
> - SMP enabled, 6000 cgroups : 3957.36
> - SMP disabled, 6000 cgroups : 1.58
>
> We can see that from a certain amount of cgroups, the context switching starts
> taking a lot of time. Another way to reproduce this problem :
> - launch cat /dev/zero | pv -L 1G > /dev/null
> - look at the CPU usage (about 40% here)
> - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> - look at the CPU usage (about 80% here)
>
> Also note that when a lot of cgroups are present, the system is spending a lot
> of time in softirqs, and there are less timer interrupts handled than normally
> (according to our graphs).
>
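To put the lat_ctx figures from the report side by side, here is a small sketch that expresses each one relative to the "SMP enabled, no cgroups" baseline. The four numbers are copied from the report; the unit is not stated there (lat_ctx normally prints microseconds, which is only an assumption here), and the `slowdown` helper is a made-up name for illustration.

```python
# lat_ctx figures copied from the bug report; the unit is not given there
# (microseconds is assumed, as that is what lat_ctx normally prints).
results = {
    "SMP enabled, no cgroups": 2.65,
    "SMP enabled, 1000 cgroups": 3.40,
    "SMP enabled, 6000 cgroups": 3957.36,
    "SMP disabled, 6000 cgroups": 1.58,
}

baseline = results["SMP enabled, no cgroups"]


def slowdown(value, base=baseline):
    """Return how many times larger `value` is than the baseline figure."""
    return value / base


# Print each configuration with its raw figure and its ratio to the baseline.
for label, value in results.items():
    print(f"{label:28s} {value:9.2f}  x{slowdown(value):.2f}")
```

Going from 0 to 1000 groups costs well under a factor of two, while 6000 groups on SMP is about three orders of magnitude slower than the baseline, and the same 6000 groups without SMP are not slower at all.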
[parent not found: <20100722155222.f0fdc50a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]
* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
       [not found]     ` <20100722155222.f0fdc50a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-08-02  8:58       ` Peter Zijlstra
  2010-08-02 10:52         ` Pierre Bourdon
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2010-08-02 8:58 UTC
  To: Andrew Morton
  Cc: pbourdon-SxHCd5+OuqTrt3ojHgZu+w, Dhaval Giani, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Srivatsa Vaddagiri, bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Ingo Molnar, Thomas Gleixner

On Thu, 2010-07-22 at 15:52 -0700, Andrew Morton wrote:
> > We have been experiencing slow context switches using a large number of cgroups
> > (around 600 groups) and CONFIG_FAIR_GROUP_SCHED. This causes a system time
> > usage increase on context switching heavy processes (measured with pidstat -w)
> > and a drop in timer interrupts handling.
> >
> > This problem only appears on SMP : when booting with nosmp, the issue does not
> > appear. From maxprocs=2 to maxprocs=8 we were able to reproduce it accurately.
> >
> > Steps to reproduce :
> > - mount the cgroup filesystem in /dev/cgroup
> > - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> > - launch lat_ctx from lmbench, for instance ./lat_ctx -N 200 100
> >
> > The results from lat_ctx were the following :
> > - SMP enabled, no cgroups : 2.65
> > - SMP enabled, 1000 cgroups : 3.40
> > - SMP enabled, 6000 cgroups : 3957.36
> > - SMP disabled, 6000 cgroups : 1.58
> >
> > We can see that from a certain amount of cgroups, the context switching starts
> > taking a lot of time. Another way to reproduce this problem :
> > - launch cat /dev/zero | pv -L 1G > /dev/null
> > - look at the CPU usage (about 40% here)
> > - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> > - look at the CPU usage (about 80% here)
> >

Does: echo NO_LB_SHARES_UPDATE > /debug/sched_features
(or wherever you mounted debugfs) help things?

It will make the thing less fair but should cut out a lot of overhead in
the wakeup path.

The wakeup redistribution is throttled somewhat, but if you're looking
for the worst latency you'll see the spikes for sure.

The problem is that the whole group fairness mess involves equations
covering all groups and all cpus. It's a frigging nightmare I wish someone
would take away from me.

I've tried several times to come up with some statistical approach, but
every time I try that I end up with unstable stuff that has feed-forward
loops that cause unfairness to blow out instead of dampening it.

> > Also note that when a lot of cgroups are present, the system is spending a lot
> > of time in softirqs, and there are less timer interrupts handled than normally
> > (according to our graphs).

Right, so load-balancing is O(n) in the number of tasks and groups, it
does try to break out once it has moved enough, but if you have tons of
empty groups..

I guess the alternative would be to keep a per-cpu list of non-empty
groups, except that that would add more overhead to wakeup/sleep and
would need stronger serialization than the current RCU bits.
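The O(n) walk described in the message above, and the per-cpu list of non-empty groups floated as an alternative, can be pictured with a toy model. This is not kernel code and does not claim to mirror the scheduler's data structures: `balance_scan_all`, `balance_scan_nonempty`, `group_load` and `nonempty` are hypothetical names invented for illustration, and a plain Python list stands in for the per-cpu group list.

```python
def balance_scan_all(group_load, wanted):
    """Walk every group, empty or not, until `wanted` tasks were moved.

    Returns (tasks_moved, groups_visited).
    """
    moved = visited = 0
    for load in group_load:
        visited += 1
        moved += min(load, wanted - moved)
        if moved >= wanted:  # break out once enough has been moved
            break
    return moved, visited


def balance_scan_nonempty(group_load, nonempty, wanted):
    """Same walk, but only over the indices of groups known to be non-empty."""
    moved = visited = 0
    for idx in nonempty:
        visited += 1
        moved += min(group_load[idx], wanted - moved)
        if moved >= wanted:
            break
    return moved, visited


# 6000 groups, of which only the last one has runnable tasks (assumed layout).
group_load = [0] * 5999 + [4]
nonempty = [i for i, load in enumerate(group_load) if load]

print(balance_scan_all(group_load, 2))                 # (2, 6000)
print(balance_scan_nonempty(group_load, nonempty, 2))  # (2, 1)
```

The early break only helps once loaded groups are found, so a long run of empty groups is walked in full. The second variant avoids that, but the `nonempty` list would have to be updated on every wakeup and sleep, which is the extra overhead and serialization cost mentioned in the message.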
* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
  2010-08-02  8:58       ` Peter Zijlstra
@ 2010-08-02 10:52         ` Pierre Bourdon
  0 siblings, 0 replies; 3+ messages in thread
From: Pierre Bourdon @ 2010-08-02 10:52 UTC
  To: Peter Zijlstra
  Cc: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Dhaval Giani, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ingo Molnar, Srivatsa Vaddagiri, bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Andrew Morton, Thomas Gleixner

On Mon, 02 Aug 2010 10:58:41 +0200, Peter Zijlstra
<peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> Does: echo NO_LB_SHARES_UPDATE > /debug/sched_features
> (or wherever you mounted debugfs) help things?

It does not, sorry. Latency with lat_ctx is still high, and CPU usage with
cat | pv is still high too.

Regards,

-- 
Pierre Bourdon
end of thread, other threads:[~2010-08-02 10:52 UTC | newest]