[PATCH RFC 0/9] cgroups: add res_counter_write_u64() API

* [PATCH RFC 0/9] cgroups: add res_counter_write_u64() API
@ 2013-12-12 21:35 Dwight Engen
From: Dwight Engen @ 2013-12-12 21:35 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Frederic Weisbecker, Max Kellermann

Hello,

I've seen that some sort of fork/task limiter has been proposed and
discussed several times before. Despite the existence of kmem in memcg, a
fork limiter is still often asked for by container users. Perhaps this is
due to the current granularity of kmem accounting (i.e. stack/struct task
are not split out from other slab allocations), but I believe it is simply
more natural for users to express a limit in terms of tasks.

So I have updated Frederic Weisbecker's task counter patchset and tried to
address the concerns that people had raised. This involved the following
changes:

- merged into the cpuacct controller: there seems to be a desire not to add
  new controllers, this controller is already hierarchical, and I feel
  limiting the number of tasks/forks fits best here
- included a fork_limit similar to the one Max Kellermann posted
  (https://lkml.org/lkml/2011/2/17/116) which is a use case not addressable
  with memcg
- à la memcg kmem, for performance reasons don't account unless a limit is
  set
- à la memcg, allow usage to roll up to root (prevents warnings on
  uncharge), but still don't allow setting limits in root
- changed the interface at fork()/exit(), adding
  can_fork()/cancel_can_fork() modeled on can_attach(). cgroup_fork()
  can now return failure to fork().
- ran Frederic's selftests, and added a couple more
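
To illustrate the can_fork()/cancel_can_fork() idea from the list above,
here is a minimal user-space sketch of a hierarchical task counter. It is
not code from the patchset: struct task_counter, model_can_fork(),
model_cancel_can_fork() and TASK_LIMIT_NONE are hypothetical names invented
for this example, and the kernel side (locking, res_counter, the "don't
account unless a limit is set" optimization) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_LIMIT_NONE ((unsigned long)-1)  /* hypothetical "no limit" marker */

/* Hypothetical stand-in for a per-cgroup task counter. */
struct task_counter {
    struct task_counter *parent;  /* NULL for the root group */
    unsigned long usage;          /* tasks charged to this group and below */
    unsigned long limit;          /* TASK_LIMIT_NONE means unlimited */
};

/* Drop one charge from every group from `from` up to, but excluding, `stop`. */
void uncharge_until(struct task_counter *from, struct task_counter *stop)
{
    for (struct task_counter *c = from; c != stop; c = c->parent) {
        assert(c->usage > 0);
        c->usage--;
    }
}

/*
 * Charge one task to `cg` and every ancestor, so usage rolls up to the
 * root. If any group on the path is at its limit, undo the charges taken
 * so far and report failure, which the caller would turn into a failed
 * fork().
 */
bool model_can_fork(struct task_counter *cg)
{
    for (struct task_counter *c = cg; c != NULL; c = c->parent) {
        if (c->limit != TASK_LIMIT_NONE && c->usage >= c->limit) {
            uncharge_until(cg, c);
            return false;
        }
        c->usage++;
    }
    return true;
}

/* Undo a successful model_can_fork(), e.g. when fork() fails later on. */
void model_cancel_can_fork(struct task_counter *cg)
{
    uncharge_until(cg, NULL);
}
```

In this model the root is created with TASK_LIMIT_NONE, matching the rule
that usage rolls up to root but no limit can be set there.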

I also wrote a small fork micro benchmark to see how this change affects
performance. I did 20 runs of 100000 fork/exit/waitpid cycles and took the
average. Times are in seconds: base is without the change, cpuacct1 is with
the change but no accounting being done (i.e. no limit set), and cpuacct2
is with the test running in a cgroup 1 level deep.

base  cpuacct1  cpuacct2
5.59  5.59      5.64

So I believe this change has minimal performance impact, especially when no
limit is set.
