Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP

* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
       [not found] ` <bug-16417-10286-3bo0kxnWaOQUvHkbgXJLS5sdmw4N0Rt+2LY78lusg7I@public.gmane.org/>
@ 2010-07-22 22:52   ` Andrew Morton
       [not found]     ` <20100722155222.f0fdc50a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2010-07-22 22:52 UTC
  To: Peter Zijlstra, Ingo Molnar, Dhaval Giani, Srivatsa Vaddagiri, Th
  Cc: pbourdon-SxHCd5+OuqTrt3ojHgZu+w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

sched suckage!  Do we have a linear search in there?


On Mon, 19 Jul 2010 14:38:09 GMT
bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16417
> 
>            Summary: Slow context switches with SMP and
>                     CONFIG_FAIR_GROUP_SCHED
>            Product: Process Management
>            Version: 2.5
>     Kernel Version: 2.6.34.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Scheduler
>         AssignedTo: mingo-X9Un+BFzKDI@public.gmane.org
>         ReportedBy: pbourdon-SxHCd5+OuqTrt3ojHgZu+w@public.gmane.org
>         Regression: No
> 
> 
> Hello,
> 
> We have been experiencing slow context switches using a large number of cgroups
> (around 600 groups) and CONFIG_FAIR_GROUP_SCHED. This causes a system time
> usage increase on context-switch-heavy processes (measured with pidstat -w)
> and a drop in timer interrupt handling.
> 
> This problem only appears on SMP: when booting with nosmp, the issue does not
> appear. From maxprocs=2 to maxprocs=8 we were able to reproduce it reliably.
> 
> Steps to reproduce:
> - mount the cgroup filesystem in /dev/cgroup
> - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> - launch lat_ctx from lmbench, for instance ./lat_ctx -N 200 100
> 
> lat_ctx reported the following context-switch times (in microseconds):
> - SMP enabled, no cgroups: 2.65
> - SMP enabled, 1000 cgroups: 3.40
> - SMP enabled, 6000 cgroups: 3957.36
> - SMP disabled, 6000 cgroups: 1.58
> 
> Beyond a certain number of cgroups, context switching starts taking a lot of
> time. Another way to reproduce this problem:
> - launch cat /dev/zero | pv -L 1G > /dev/null
> - look at the CPU usage (about 40% here)
> - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> - look at the CPU usage (about 80% here)
> 
> Also note that when a lot of cgroups are present, the system spends a lot
> of time in softirqs, and fewer timer interrupts are handled than normal
> (according to our graphs).
> 
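
For anyone who wants to repeat the test, the group creation step quoted
above can be wrapped in two small helpers. This is only a sketch: the
function names make_groups and remove_groups are made up for
illustration, and it assumes a cgroup hierarchy is already mounted at the
directory passed in (the report uses /dev/cgroup). Nothing runs until a
function is called.

```shell
#!/bin/sh
# Sketch of the "create many empty groups" step from the report.
# make_groups / remove_groups are illustrative names, not existing tools.

# Create N empty groups under an already-mounted cgroup hierarchy.
# $1: cgroup mount point (the report uses /dev/cgroup), $2: group count.
make_groups() {
    root=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir "$root/test_group_$i" || return 1
        i=$((i + 1))
    done
}

# Remove the same groups again; empty cgroup directories go away with rmdir.
remove_groups() {
    root=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        rmdir "$root/test_group_$i" || return 1
        i=$((i + 1))
    done
}
```

A run on a disposable SMP test machine would then be: make_groups
/dev/cgroup 5000, followed by ./lat_ctx -N 200 100 as in the report, and
finally remove_groups /dev/cgroup 5000 to check that the latency drops
back once the groups are gone.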

* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
       [not found]     ` <20100722155222.f0fdc50a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-08-02  8:58       ` Peter Zijlstra
  2010-08-02 10:52         ` Pierre Bourdon
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2010-08-02  8:58 UTC
  To: Andrew Morton
  Cc: pbourdon-SxHCd5+OuqTrt3ojHgZu+w, Dhaval Giani,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	Srivatsa Vaddagiri, bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	Ingo Molnar, Thomas Gleixner

On Thu, 2010-07-22 at 15:52 -0700, Andrew Morton wrote:

> > We have been experiencing slow context switches using a large number of cgroups
> > (around 600 groups) and CONFIG_FAIR_GROUP_SCHED. This causes a system time
> > usage increase on context-switch-heavy processes (measured with pidstat -w)
> > and a drop in timer interrupt handling.
> > 
> > This problem only appears on SMP: when booting with nosmp, the issue does not
> > appear. From maxprocs=2 to maxprocs=8 we were able to reproduce it reliably.
> > 
> > Steps to reproduce:
> > - mount the cgroup filesystem in /dev/cgroup
> > - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> > - launch lat_ctx from lmbench, for instance ./lat_ctx -N 200 100
> > 
> > lat_ctx reported the following context-switch times (in microseconds):
> > - SMP enabled, no cgroups: 2.65
> > - SMP enabled, 1000 cgroups: 3.40
> > - SMP enabled, 6000 cgroups: 3957.36
> > - SMP disabled, 6000 cgroups: 1.58
> > 
> > Beyond a certain number of cgroups, context switching starts taking a lot of
> > time. Another way to reproduce this problem:
> > - launch cat /dev/zero | pv -L 1G > /dev/null
> > - look at the CPU usage (about 40% here)
> > - cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
> > - look at the CPU usage (about 80% here)
> > 

Does: echo NO_LB_SHARES_UPDATE > /debug/sched_features
(or wherever you mounted debugfs) help things?
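
Since the debugfs mount point varies between systems, the write can be
done without hard-coding /debug. The following is a rough sketch, not
part of the original suggestion: find_debugfs and set_sched_feature are
hypothetical helper names, and the optional mounts-file argument exists
only so the lookup can be tried against a copy of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helpers for toggling a scheduler feature flag.

# Print the mount point of the first debugfs entry in a mounts table.
# $1: mounts file to read (defaults to /proc/mounts).
find_debugfs() {
    awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
}

# Write a feature flag to sched_features under the detected debugfs mount.
# $1: flag, e.g. NO_LB_SHARES_UPDATE; $2: optional mounts file.
set_sched_feature() {
    dir=$(find_debugfs "$2")
    [ -n "$dir" ] || { echo "debugfs is not mounted" >&2; return 1; }
    echo "$1" > "$dir/sched_features"
}
```

As root, set_sched_feature NO_LB_SHARES_UPDATE is then equivalent to the
echo above, and set_sched_feature LB_SHARES_UPDATE turns the feature
back on.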

It will make the thing less fair but should cut out a lot of overhead in
the wakeup path. The wakeup redistribution is throttled somewhat, but if
you're looking for the worst latency you'll see the spikes for sure.

The problem is that the whole group fairness mess involves equations
covering all groups and all cpus. It's a frigging nightmare I wish
someone would take away from me.

I've tried several times to come up with some statistical approach, but
every time I try that I end up with unstable stuff that has feed-forward
loops that cause unfairness to blow up instead of damping it.

> > Also note that when a lot of cgroups are present, the system spends a lot
> > of time in softirqs, and fewer timer interrupts are handled than normal
> > (according to our graphs).

Right, so load-balancing is O(n) in the number of tasks and groups. It
does try to break out once it has moved enough, but if you have tons of
empty groups...

I guess the alternative would be to keep a per-cpu list of non-empty
groups, except that that would add more overhead to wakeup/sleep and
would need stronger serialization than the current RCU bits.
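
One way to put numbers on the softirq observation quoted above is to
sum the per-CPU counters in /proc/softirqs before and after creating the
groups. The helper below is a sketch under that assumption; the name
softirq_total is made up here, and the second argument only exists so a
saved copy of the file can be read instead of the live one.

```shell
#!/bin/sh
# Sum one row of a /proc/softirqs-style table across all CPUs.
# $1: row name without the colon (e.g. SCHED or TIMER)
# $2: file to read (defaults to /proc/softirqs)
softirq_total() {
    awk -v name="$1:" '
        $1 == name { for (i = 2; i <= NF; i++) sum += $i; print sum + 0 }
    ' "${2:-/proc/softirqs}"
}
```

Sampling softirq_total SCHED and softirq_total TIMER over the same fixed
interval, once without the groups and once with them, should show whether
the load-balancing softirq count rises while the timer count falls, as
the report's graphs suggest.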

* Re: [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
  2010-08-02  8:58       ` Peter Zijlstra
@ 2010-08-02 10:52         ` Pierre Bourdon
  0 siblings, 0 replies; 3+ messages in thread
From: Pierre Bourdon @ 2010-08-02 10:52 UTC
  To: Peter Zijlstra
  Cc: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Dhaval Giani,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Ingo Molnar, Srivatsa Vaddagiri,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Andrew Morton,
	Thomas Gleixner

On Mon, 02 Aug 2010 10:58:41 +0200, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
wrote:
> Does: echo NO_LB_SHARES_UPDATE > /debug/sched_features
> (or wherever you mounted debugfs) help things?

It does not, sorry. Latency with lat_ctx is still high, and CPU usage with
cat | pv is still high too.

Regards,
-- 
Pierre Bourdon
