From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>, cgroups <cgroups@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [regression] cpuset: offlined CPUs removed from affinity masks
Date: Mon, 30 Mar 2020 15:53:02 -0400 (EDT)
Message-ID: <266054305.17171.1585597982690.JavaMail.zimbra@efficios.com>
In-Reply-To: <195391080.10219.1585078246788.JavaMail.zimbra@efficios.com>

----- On Mar 24, 2020, at 3:30 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Mar 24, 2020, at 2:01 PM, Tejun Heo tj@kernel.org wrote:
> 
>> On Thu, Mar 12, 2020 at 03:47:50PM -0400, Mathieu Desnoyers wrote:
>>> The basic idea is to allow applications to pin to every possible cpu, but
>>> not allow them to use this to consume a lot of cpu time on CPUs they
>>> are not allowed to run.
>>> 
>>> Thoughts ?
>> 
>> One thing that we learned is that priority alone isn't enough in isolating cpu
>> consumptions no matter how low the priority may be if the workload is latency
>> sensitive. The actual computation capacity of cpus gets saturated way before cpu
>> time is saturated and latency impact from lowered mips becomes noticeable. So,
>> depending on workloads, allowing threads to run at the lowest priority on
>> disallowed cpus might not lead to behaviors that users expect but I have no idea
>> what kind of usage models you have on mind for the new system call.
> 
[...]

One possibility would be to use the SCHED_IDLE scheduling policy rather than SCHED_OTHER
with nice +19. The unfortunate side effect, AFAIU, shows up when a thread requests to
be pinned on a CPU which is continuously overcommitted: it may never run. This could
come as a surprise for the user. The only cases where this would happen are:

- A thread is pinned on CPU N, and
  - CPU N is not part of the allowed mask for the task's cpuset (and is overcommitted), or
  - CPU N is offline, and the fallback CPU is not part of the allowed mask for the
    task's cpuset (and is overcommitted).

Is this acceptable behavior? How is userspace supposed to detect this kind of situation
and mitigate it?
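
To make the detection question concrete, here is a minimal userspace sketch of
how the two preconditions listed above could be checked. It is only an
illustration under assumptions: the helper names (cpu_in_list, classify_pin)
and the enum are hypothetical, and the caller is assumed to have already read
the online CPU list (for example from /sys/devices/system/cpu/online) and the
cpuset's effective CPU list (for example cpuset.cpus.effective in cgroup v2)
into strings. It does not detect whether the target CPU is overcommitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Return true if `cpu` appears in a kernel-style CPU list string such as
 * "0-3,8,10-11". Parsing stops at the end of the string or at a newline.
 * Malformed input yields false.
 */
bool cpu_in_list(const char *list, int cpu)
{
	const char *p = list;

	assert(list != NULL);
	while (*p && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)
			return false;	/* no number where one was expected */
		if (*end == '-') {	/* range "lo-hi" */
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)
				return false;
		}
		if (cpu >= lo && cpu <= hi)
			return true;
		p = (*end == ',') ? end + 1 : end;
	}
	return false;
}

/* Hypothetical classification of a pin request on CPU `cpu`. */
enum pin_risk {
	PIN_OK,			/* online and inside the cpuset */
	PIN_OUTSIDE_CPUSET,	/* online, but not in the allowed mask */
	PIN_CPU_OFFLINE,	/* offline: a fallback CPU would be used */
};

enum pin_risk classify_pin(const char *online, const char *effective, int cpu)
{
	if (!cpu_in_list(online, cpu))
		return PIN_CPU_OFFLINE;
	if (!cpu_in_list(effective, cpu))
		return PIN_OUTSIDE_CPUSET;
	return PIN_OK;
}
```

With online = "0-7" and effective = "0-3", classify_pin() reports
PIN_OUTSIDE_CPUSET for CPU 5, which is the first case in the list above. A
check like this is inherently racy against hotplug and cpuset updates, so at
best it gives userspace a hint that a pinned thread might be starved.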

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


