Re: [PATCH] new cgroup controller "fork"

From: "Brian K. White" <brian-goxB3+SAe6wAvxtiuMwx3w@public.gmane.org>
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] new cgroup controller "fork"
Date: Fri, 04 Nov 2011 12:43:38 -0400
Message-ID: <4EB4163A.3060305@aljex.com>
In-Reply-To: <4EB3E47E.2080003-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

On 11/4/2011 9:11 AM, Glauber Costa wrote:
> On 11/04/2011 01:03 AM, Li Zefan wrote:
>> On 2011-11-04 05:54, Glauber Costa wrote:
>>> On 11/03/2011 06:13 PM, Brian K. White wrote:
>>>> On 11/3/2011 3:25 PM, Glauber Costa wrote:
>>>>> On 11/03/2011 05:20 PM, Max Kellermann wrote:
>>>>>> On 2011/11/03 20:03, Alan Cox<alan@lxorguk.ukuu.org.uk> wrote:
>>>>>>> Sure - I'm just not seeing that a whole separate cgroup for it is
>>>>>>> appropriate or a good plan. Anyone doing real resource management
>>>>>>> needs
>>>>>>> the rest of the stuff anyway.
>>>>>>
>>>>>> Right. When I saw Frederic's controller today, my first thought was
>>>>>> that one could move the fork limit code over into that controller. If
>>>>>> we reach a consensus that this would be a good idea, and would have
>>>>>> chances to get merged, I could probably take some time to refactor my
>>>>>> code.
>>>>>>
>>>>>> Max
>>>>> I'd advise you to take a step back and think if this is really needed.
>>>>> As Alan pointed out, the really expensive resource here is already
>>>>> being
>>>>> constrained by Frederic's controller.
>>>>
>>>> I think this really is a different knob that is nice to have as long as
>>>> it doesn't cost much. It's a way to set a max lifespan in a way that
>>>> isn't really addressed by the other controls. (I could absolutely be
>>>> missing something.)
>>>>
>>>> I think Max explained the issue clearly enough.
>>>
>>> He did, indeed.
>>>
>>>> It doesn't matter that the fork itself is supposedly so cheap.
>>>>
>>>> It's still nice to have a way to say, you may not
>>>> fork/die/fork/die/fork
>>>> in a race.
>>>>
>>>> What's so unimaginable about having a process that you know needs a lot
>>>> of cpu and ram or other resources to do it's job, and you expressly
>>>> want
>>>> to allow it to take as much of those resources as it can, but you know
>>>> it has no need to fork, so if it forks, _that_ is the only
>>>> indication of
>>>> a problem, so you may only want to block it based on that.
>>>>
>>>> Sure many other processes would legitimately fork/die/fork/die a lot
>>>> while never exceeding a few total concurrent tasks, and for them you
>>>> would not want to set any such fork limit. So what?
>>>>
>>> As I said previously, he knows his use cases better than anyone else.
>>> If a use case can be found in which the summation of cpu+task
>>> controllers is not enough, and if this is implemented as an option to
>>> the task controller, and does not make it:
>>> 1) confusing,
>>> 2) more expensive,
>>>
>>> then I don't see why not we shouldn't take it.
>>
>> Quoted from Lennart's reply in another mail thread:
>>
>> "Given that shutting down some services might involve forking off a few
>> things (think: a shell script handling shutdown which forks off a couple
>> of shell utilities) we'd want something that is between "from now on no
>> forking at all" and "unlimited forking". This could be done in many
>> different ways: we'd be happy if we could do time-based rate limiting,
>> but we'd also be fine with defining a certain budget of additional forks
>> a cgroup can do (i.e. "from now on you can do 50 more forks, then you'll
>> get EPERM)."
>>
>> (http://lkml.org/lkml/2011/10/19/468)
>>
>> The last sentence suggests he might like this fork controller.
>
> Well, If I understand Frederic's work well enough, this can be achieved
> by setting the task limit to 0 in his controller. No?

Not that I can see. Not without changes.

> Because being lower than your limit won't kick tasks out, the practical
> effect is that no forks will be allowed in the group with this setting.

Setting task limit to 0 (more properly named "new/additional task limit" 
if it works the way you say) will do only one part of the fork limiter. 
The part that prevents new tasks from spawning. But that is only 
supposed to happen after a counter has reached a target value, and no 
such counter currently exists. The task counter does not count forks. It 
could, but it currently doesn't.
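
To make the distinction concrete, here is a minimal Python sketch of the
two pieces a fork limiter would need: a counter that is decremented on
every fork, and a refusal once that counter reaches zero. The names
ForkBudget, remaining and task_exit are hypothetical and are not taken
from the patch or from Frederic's controller. The EPERM return follows
the wording in Lennart's quoted mail. It is a toy model, not kernel code.

```python
import errno


class ForkBudget:
    """Toy model of a per-group fork budget (hypothetical, not the patch)."""

    def __init__(self, remaining):
        self.remaining = remaining  # forks still allowed in this group
        self.tasks = 1              # the group starts with one live task

    def fork(self):
        # Part 2 of the limiter: refuse once the budget is used up.
        if self.remaining <= 0:
            return -errno.EPERM
        # Part 1 of the limiter: count every fork, even if the child
        # exits right away and the task count drops back down.
        self.remaining -= 1
        self.tasks += 1
        return 0

    def task_exit(self):
        self.tasks -= 1


budget = ForkBudget(remaining=3)
results = []
peak_tasks = budget.tasks
for _ in range(4):
    rc = budget.fork()
    results.append(rc)
    peak_tasks = max(peak_tasks, budget.tasks)
    if rc == 0:
        budget.task_exit()  # the child dies before the next fork
```

In this run the fourth fork is refused although the group never held
more than two tasks at once, which is the part that a task limit alone
does not provide.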

The task count only reflects the forks whose children are still alive.

> So for time-based rate limiting, it is trivial to just set it to 0 after
> x seconds.

What? No, no, no. Maybe there are uses for that too, but they are
probably even less common than the proposed fork counter/decrementer.
Most of unix explicitly avoids ever promising to perform tasks within
any arbitrary time window. The system is mostly asynchronous by design,
and operations block, buffer, spool, wait, and eventually resume and
succeed, by design, instead of failing after arbitrary time limits.
It's not that a forks-per-time-unit limit would never be useful, but I
think it would be useful less often and less generically. It would be a
nice option for those unknown, unknowable situations where someone
might know that they do want that.

> For other uses, we can watch the task counter increase until a certain
> value, and then set the limit to 0.

What? again:

time  action  forks  tasks
0000  init    0      1
0001  fork    1      2
0002  die     1      1
0003  fork    2      2
0004  die     2      1
0005  fork    3      2
0006  die     3      1
0007  fork    4      2
0008  die     4      1
0009  fork    5      2
0010  die     5      1
0011  fork    6      2
0012  die     6      1

Tasks never exceeds 2.
Forks may climb to infinity.
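
The table above can be reproduced with a short Python sketch. It is only
an illustration of the two counters diverging; the function name
fork_die_trace is made up for this example.

```python
def fork_die_trace(cycles):
    """Return (time, action, forks, tasks) rows for a fork/die loop."""
    forks, tasks, t = 0, 1, 0
    rows = [(t, "init", forks, tasks)]
    for _ in range(cycles):
        t += 1
        forks += 1   # cumulative: never goes back down
        tasks += 1   # concurrent: goes back down when the child dies
        rows.append((t, "fork", forks, tasks))
        t += 1
        tasks -= 1
        rows.append((t, "die", forks, tasks))
    return rows


rows = fork_die_trace(6)
peak_tasks = max(row[3] for row in rows)
total_forks = rows[-1][2]
print(peak_tasks, total_forks)  # prints "2 6"
```

However large cycles gets, peak_tasks stays at 2 while total_forks grows
with it, so a watcher that waits for the task counter to reach some
higher value never fires.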

You can't say that cpu or ram or any other resource can solve the same 
problem because the ultimate purpose of cpu and ram and the other 
resources is to be used, to do jobs, not preserve idle cpu time. You may 
expressly want the task to consume all available cpu, ram, disk i/o, net 
i/o, etc, it depends on the job.

Likewise wall-clock time means nothing either, or at least you can't
say that it's useful generically. Maybe in some odd cases someone may
know that for their job anything over X seconds is wrong, but I can't
say that about almost anything, since unix is an asynchronous system
that explicitly avoids making any promises about timing in most
operations. Almost any operation may take almost any amount of time
depending on countless other transient factors, and most things block,
buffer, spool, wait, and eventually resume and succeed instead of
failing after arbitrary time limits.

The task counter only covers some cases.

As for rate limiting, that's a nice additional feature and there are a
few ways to handle that. One crude way might be a cron job that
periodically resets a long-running task's remaining-forks limit. That
would allow, say, X forks per minute, and allow normal ops to work,
while blocking a runaway. But most long-running tasks on my boxes have
no such intrinsic sane-max-forks or max-forks-per-time-unit limit.
When all my users try to log in at once after a network blip, and the
box gets 300 new ssh connections at once from about 10 different IPs
(so it's a lot per IP even: buildings full of end users behind one IP),
I specifically need to accept and handle that surge to the limit of the
hardware's ability to do it, even though normally the rate of new ssh
connections is a tiny fraction of that. The same goes for practically
every service.
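
A rough Python sketch of the cron-style reset, and of why a fixed
per-minute number is hard to pick for a service like sshd. The class
name, the budget of 100 and the per-minute fork counts are invented for
illustration. Only the 300-connection surge comes from the example
above, and the sketch assumes one fork per connection.

```python
import errno


class RefillableForkBudget:
    """Toy model: a fork budget that a periodic job resets (hypothetical)."""

    def __init__(self, per_period):
        self.per_period = per_period
        self.remaining = per_period

    def refill(self):
        # What the cron job would do once per period.
        self.remaining = self.per_period

    def fork(self):
        if self.remaining <= 0:
            return -errno.EPERM
        self.remaining -= 1
        return 0


def refused_per_minute(budget, forks_wanted):
    """Return how many forks were refused in each simulated minute."""
    refused = []
    for wanted in forks_wanted:
        budget.refill()
        refused.append(sum(1 for _ in range(wanted) if budget.fork() != 0))
    return refused


# Three quiet minutes and one minute with a 300-connection surge.
refused = refused_per_minute(RefillableForkBudget(per_period=100),
                             [5, 8, 300, 6])
print(refused)  # prints "[0, 0, 200, 0]"
```

With these made-up numbers the limit is invisible in normal operation
but turns away 200 legitimate logins during the surge, which is exactly
the case where the service is supposed to run up to the limit of the
hardware.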

But I think Max was talking more about discrete jobs than long-running
tasks, where the entire start-work-exit life cycle of the job is well
known. You could set the known fork limit for such a job: under normal
circumstances it can do its entire job within that limit and has no
excuse for exceeding it, so it's a good safety measure to set and
enforce. In doing so you can prevent things you might not be able to
prevent any other way, at least not reliably and/or without doing more
harm than good by breaking things you don't want to break.

If someone gets a cgi process to start dictionary attacking the local
box or some other box, it will fork-login-attempt-fail repeatedly and
use up the fork budget immediately. But it wouldn't necessarily have
hit any reasonable typical limit on tasks, cpu, ram, network, or disk
i/o.

I think cgi is just an example and by no means a special case. It sounds
like a generically useful control to me, like any of the other controls.
It may be needed in relatively few situations, but those situations are
not predictable or limited to any particular kind. Like cgroups itself
or any other control, it might be used anywhere for anything, and many
important uses may only occur to people after the facility exists and
has been considered for a while, and then it may even be seen as
indispensable.

Am I missing something? How does the task counter provide this kind of 
boxing-in without actually counting forks along the way, distinctly from 
tasks?

-- 
bkw