Re: threads-max observe limits

public inbox for linux-kernel@vger.kernel.org

From: ebiederm@xmission.com (Eric W. Biederman)
To: Michal Hocko <mhocko@kernel.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: threads-max observe limits
Date: Tue, 17 Sep 2019 12:26:18 -0500
Message-ID: <87ftku96md.fsf@x220.int.ebiederm.org>
In-Reply-To: <20190917153830.GE1872@dhcp22.suse.cz> (Michal Hocko's message of "Tue, 17 Sep 2019 17:38:30 +0200")

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
>> 
>> On 9/17/19 12:03 PM, Michal Hocko wrote:
>> > Hi,
>> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
>> > observe limits") and I am really wondering what is the motivation behind
>> > the patch. We've had a customer noticing the threads_max autoscaling
>> > differences between 3.12 and 4.4 kernels and wanted to override the auto
>> > tuning from the userspace, just to find out that this is not possible.
>> 
>> set_max_threads() sets the upper limit (max_threads_suggested) for
>> threads such that at a maximum 1/8th of the total memory can be occupied
>> by the thread's administrative data (of size THREAD_SIZE). On my 32 GiB
>> system this results in 254313 threads.
>
> This is quite arbitrary, isn't it? What would happen if the limit was
> twice as large?
>
>> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
>> a user cannot set an arbitrarily high number for
>> /proc/sys/kernel/threads-max which could lead to a system stalling
>> because the thread headers occupy all the memory.
>
> This is still a decision of the admin to make.  You can consume the
> memory by other means and that is why we have measures in place. E.g.
> memcg accounting.
>
>> When developing the patch I remarked that on a system where memory is
>> installed dynamically it might be a good idea to recalculate this limit.
>> If you have a system that boots with let's say 8 GiB and then
>> dynamically installs a few TiB of RAM this might make sense. But such a
>> dynamic update of max_threads_suggested was left out for the sake of
>> simplicity.
>> 
>> Anyway if more than 100,000 threads are used on a system, I would wonder
>> if the software should not be changed to use thread-pools instead.
>
> You do not change the software to overcome artificial bounds based on
> guessing.
>
> So can we get back to the justification of the patch. What kind of
> real life problem does it solve and why is it ok to override an admin
> decision?
> If there is no strong justification then the patch should be reverted
> because from what I have heard it has been noticed and it has broken
> a certain deployment. I am not really clear about technical details yet
> but it seems that there are workloads that believe they need to touch
> this tuning and complain if that is not possible.

Taking a quick look myself.

I am completely mystified by both sides of this conversation.

a) The logic to set the default number of threads in a system
   has not changed since 2.6.12-rc2 (the start of the git history).

The implementation has changed but we should still get the same
value.  So anyone seeing threads_max autoscaling differences
between kernels is either seeing a bug in the rewritten formula
or something else weird is going on.
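
For reference, the default described in the quoted text (at most 1/8th of
total memory occupied by per-thread data of size THREAD_SIZE) can be written
as a minimal sketch. This is not the kernel code: the function name is
hypothetical, and the 16 KiB thread size used in the note below is an
assumption for x86_64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: the default thread limit is the
 * number of per-thread structures that fit into 1/8th of total RAM.
 */
uint64_t default_max_threads(uint64_t total_ram_bytes, uint64_t thread_size)
{
    assert(thread_size > 0);
    return total_ram_bytes / (thread_size * 8);
}
```

With an assumed thread size of 16384 bytes, a full 32 GiB gives 262144.
The 254313 quoted above would correspond to slightly less counted RAM,
presumably because not all installed memory is included in the total.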

Michal, is it a very small effect your customers are seeing?
Is it another bug somewhere else?

b) Not being able to bump threads_max to the physical limit of
   the machine is very clearly a regression.

 Limiting threads_max to MIN_THREADS on the low end and MAX_THREADS on
 the high end is reasonable, because Linux can't cope with values
 outside of that range.  Limiting threads_max to the auto-scaling value
 is a regression.
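
To make the difference concrete, here is a rough sketch of the two
policies. It is not the kernel implementation: the function names are
hypothetical, and the bound values (20 and 0x3fffffff) are assumptions
used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hard bounds, for illustration only. */
#define SKETCH_MIN_THREADS 20ULL
#define SKETCH_MAX_THREADS 0x3fffffffULL

/* Policy argued for here: accept any admin value inside the hard bounds. */
uint64_t clamp_to_hard_bounds(uint64_t requested)
{
    if (requested < SKETCH_MIN_THREADS)
        return SKETCH_MIN_THREADS;
    if (requested > SKETCH_MAX_THREADS)
        return SKETCH_MAX_THREADS;
    return requested;
}

/* Policy described as a regression: also cap at the auto-scaled default. */
uint64_t clamp_to_autoscaled(uint64_t requested, uint64_t autoscaled)
{
    /* The auto-scaled default is itself expected to lie within the bounds. */
    assert(autoscaled >= SKETCH_MIN_THREADS &&
           autoscaled <= SKETCH_MAX_THREADS);
    uint64_t capped = requested < autoscaled ? requested : autoscaled;
    return clamp_to_hard_bounds(capped);
}
```

Under the first policy an admin asking for 500000 threads on the 32 GiB
machine mentioned above gets 500000. Under the second the request is
silently cut back to the auto-scaled 254313.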

The point of limits like threads_max is to have something that 99%
of people won't hit and if they do it will indicate a bug in their
application.  And to generally keep the kernel working when an
application bug happens.

But there are always cases where heuristics fail so it is completely
reasonable to allow these values to be manually tuned.

Eric

