* per-NUMA memory limits in mem cgroup?
@ 2018-04-20 17:43 Chris Friesen
  2018-04-22 12:46 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Friesen @ 2018-04-20 17:43 UTC (permalink / raw)
  To: cgroups@vger.kernel.org, linux-mm, Johannes Weiner, Michal Hocko,
	Vladimir Davydov

Hi,

I'm aware of the ability to use the memory controller to limit how much memory a 
group of tasks can consume.

Is there any way to limit how much memory a group of tasks can consume *per NUMA 
node*?
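
For concreteness, here is roughly what I do today for the overall cap (a
minimal sketch, assuming cgroup v1 with the memory controller mounted at
/sys/fs/cgroup/memory and a "host" cgroup for the management tasks):

  # sketch only: the existing system-wide knob; I know of no per-node equivalent
  from pathlib import Path

  host = Path("/sys/fs/cgroup/memory/host")
  host.mkdir(exist_ok=True)

  # hard cap on the group's total memory, regardless of which node it comes from
  (host / "memory.limit_in_bytes").write_text(str(4 * 1024**3))  # 4 GiB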

The specific scenario I'm considering is that of a hypervisor host.  I have 
system management stuff running on the host that may need more than one core, 
and currently these host tasks might be affined to cores from multiple NUMA 
nodes.  I'd like to put a cap on how much memory the host tasks can allocate 
from each NUMA node in order to ensure that there is a guaranteed amount of 
memory available for VMs on each NUMA node.

Is this possible, or are the knobs just not there?

Thanks,
Chris


* Re: per-NUMA memory limits in mem cgroup?
  2018-04-20 17:43 per-NUMA memory limits in mem cgroup? Chris Friesen
@ 2018-04-22 12:46 ` Michal Hocko
  2018-04-23 15:29   ` Chris Friesen
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2018-04-22 12:46 UTC (permalink / raw)
  To: Chris Friesen
  Cc: cgroups@vger.kernel.org, linux-mm, Johannes Weiner,
	Vladimir Davydov

On Fri 20-04-18 11:43:07, Chris Friesen wrote:
> Hi,
> 
> I'm aware of the ability to use the memory controller to limit how much
> memory a group of tasks can consume.
> 
> Is there any way to limit how much memory a group of tasks can consume *per
> NUMA node*?

Not really. We have all or nothing via cpusets but nothing really
fine-grained for the amount of memory.
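
To illustrate the all-or-nothing part, a quick sketch (assuming cgroup v1
with the cpuset controller mounted at /sys/fs/cgroup/cpuset and an existing
"host" cpuset):

  from pathlib import Path

  host = Path("/sys/fs/cgroup/cpuset/host")

  # cpuset restricts *which* nodes the group may allocate from...
  (host / "cpuset.mems").write_text("0")    # node 0 only
  (host / "cpuset.cpus").write_text("0-3")
  # ...but there is no knob for "at most N bytes from node 0, M from node 1"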

> The specific scenario I'm considering is that of a hypervisor host.  I have
> system management stuff running on the host that may need more than one
> core, and currently these host tasks might be affined to cores from multiple
> NUMA nodes.  I'd like to put a cap on how much memory the host tasks can
> allocate from each NUMA node in order to ensure that there is a guaranteed
> amount of memory available for VMs on each NUMA node.
> 
> Is this possible, or are the knobs just not there?

Not possible right now. What would be the policy when you reach the
limit on one node? Fallback to other nodes? What if those hit the limit
as well? OOM killer or an allocation failure?
-- 
Michal Hocko
SUSE Labs


* Re: per-NUMA memory limits in mem cgroup?
  2018-04-22 12:46 ` Michal Hocko
@ 2018-04-23 15:29   ` Chris Friesen
  2018-04-24 13:27     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Friesen @ 2018-04-23 15:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups@vger.kernel.org, linux-mm, Johannes Weiner,
	Vladimir Davydov

On 04/22/2018 08:46 AM, Michal Hocko wrote:
> On Fri 20-04-18 11:43:07, Chris Friesen wrote:

>> The specific scenario I'm considering is that of a hypervisor host.  I have
>> system management stuff running on the host that may need more than one
>> core, and currently these host tasks might be affined to cores from multiple
>> NUMA nodes.  I'd like to put a cap on how much memory the host tasks can
>> allocate from each NUMA node in order to ensure that there is a guaranteed
>> amount of memory available for VMs on each NUMA node.
>>
>> Is this possible, or are the knobs just not there?
>
> Not possible right now. What would be the policy when you reach the
> limit on one node? Fallback to other nodes? What if those hit the limit
> as well? OOM killer or an allocation failure?

I'd envision it working exactly the same as the current memory cgroup, but with 
the ability to specify optional per-NUMA-node limits in addition to system-wide.
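
Something along these lines, say (the per-node file names below are purely
made up to illustrate the idea; nothing like them exists today):

  from pathlib import Path

  host = Path("/sys/fs/cgroup/memory/host")

  # existing overall hard limit
  (host / "memory.limit_in_bytes").write_text(str(6 * 1024**3))        # 6 GiB total

  # hypothetical per-node limits -- these knobs do not exist
  (host / "memory.node0.limit_in_bytes").write_text(str(4 * 1024**3))  # <= 4 GiB from node 0
  (host / "memory.node1.limit_in_bytes").write_text(str(2 * 1024**3))  # <= 2 GiB from node 1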

Chris


* Re: per-NUMA memory limits in mem cgroup?
  2018-04-23 15:29   ` Chris Friesen
@ 2018-04-24 13:27     ` Michal Hocko
  2018-04-24 15:13       ` Chris Friesen
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2018-04-24 13:27 UTC (permalink / raw)
  To: Chris Friesen
  Cc: cgroups@vger.kernel.org, linux-mm, Johannes Weiner,
	Vladimir Davydov

On Mon 23-04-18 11:29:21, Chris Friesen wrote:
> On 04/22/2018 08:46 AM, Michal Hocko wrote:
> > On Fri 20-04-18 11:43:07, Chris Friesen wrote:
> 
> > > The specific scenario I'm considering is that of a hypervisor host.  I have
> > > system management stuff running on the host that may need more than one
> > > core, and currently these host tasks might be affined to cores from multiple
> > > NUMA nodes.  I'd like to put a cap on how much memory the host tasks can
> > > allocate from each NUMA node in order to ensure that there is a guaranteed
> > > amount of memory available for VMs on each NUMA node.
> > > 
> > > Is this possible, or are the knobs just not there?
> > 
> > Not possible right now. What would be the policy when you reach the
> > limit on one node? Fallback to other nodes? What if those hit the limit
> > as well? OOM killer or an allocation failure?
> 
> I'd envision it working exactly the same as the current memory cgroup, but
> with the ability to specify optional per-NUMA-node limits in addition to
> system-wide.

OK, so you would have a per-NUMA percentage of the hard limit? But more
importantly, note that the page allocation is done well before the charge,
so we do not have any control over which node the memory gets allocated
from; we would have to play nasty tricks in the reclaim path to somehow
balance the NUMA charge pools. And I can easily imagine we would go OOM
before we saturate all the NUMA pools.
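
As a toy model of that last point (purely illustrative, nothing to do with
the actual allocator or charge code):

  # placement is decided by the allocator before any memcg charge happens, so a
  # group whose tasks run on node 0 exhausts its node-0 pool while its node-1
  # pool sits unused
  node_limit = {0: 4, 1: 4}   # per-node charge limits, in pages
  charged = {0: 0, 1: 0}

  def fault(preferred_node):
      node = preferred_node                   # placement decided first
      if charged[node] >= node_limit[node]:   # charge checked only afterwards
          other = 1 - node
          return ("reclaim/OOM on node %d although node %d still has %d pages free"
                  % (node, other, node_limit[other] - charged[other]))
      charged[node] += 1
      return "charged page to node %d" % node

  for _ in range(5):
      print(fault(preferred_node=0))          # tasks affined to node 0 keep faulting there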
That said, I haven't had a chance to think this whole thing through as I am
at a conference these days, and I am not even sure the whole thing is a
good idea in the first place. It sounds easier than it would turn out to
be, I suspect.
-- 
Michal Hocko
SUSE Labs


* Re: per-NUMA memory limits in mem cgroup?
  2018-04-24 13:27     ` Michal Hocko
@ 2018-04-24 15:13       ` Chris Friesen
  2018-04-24 15:23         ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Friesen @ 2018-04-24 15:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups@vger.kernel.org, linux-mm, Johannes Weiner,
	Vladimir Davydov

On 04/24/2018 09:27 AM, Michal Hocko wrote:
> On Mon 23-04-18 11:29:21, Chris Friesen wrote:
>> On 04/22/2018 08:46 AM, Michal Hocko wrote:
>>> On Fri 20-04-18 11:43:07, Chris Friesen wrote:
>>
>>>> The specific scenario I'm considering is that of a hypervisor host.  I have
>>>> system management stuff running on the host that may need more than one
>>>> core, and currently these host tasks might be affined to cores from multiple
>>>> NUMA nodes.  I'd like to put a cap on how much memory the host tasks can
>>>> allocate from each NUMA node in order to ensure that there is a guaranteed
>>>> amount of memory available for VMs on each NUMA node.
>>>>
>>>> Is this possible, or are the knobs just not there?
>>>
>>> Not possible right now. What would be the policy when you reach the
>>> limit on one node? Fallback to other nodes? What if those hit the limit
>>> as well? OOM killer or an allocation failure?
>>
>> I'd envision it working exactly the same as the current memory cgroup, but
>> with the ability to specify optional per-NUMA-node limits in addition to
>> system-wide.
>
> OK, so you would have a per-NUMA percentage of the hard limit?

I think it'd make more sense as a hard limit per NUMA node.

> But more
> importantly, note that the page allocation is done well before the charge,
> so we do not have any control over which node the memory gets allocated
> from; we would have to play nasty tricks in the reclaim path to somehow
> balance the NUMA charge pools.

Reading the docs on the memory controller, it does seem a bit tricky.  I had 
envisioned some sort of "is there memory left in this group" check before 
"approving" the memory allocation, but it seems it doesn't really work that way.

Chris


* Re: per-NUMA memory limits in mem cgroup?
  2018-04-24 15:13       ` Chris Friesen
@ 2018-04-24 15:23         ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2018-04-24 15:23 UTC (permalink / raw)
  To: Chris Friesen
  Cc: cgroups@vger.kernel.org, linux-mm, Johannes Weiner,
	Vladimir Davydov

On Tue 24-04-18 11:13:16, Chris Friesen wrote:
[...]
> Reading the docs on the memory controller, it does seem a bit tricky.  I had
> envisioned some sort of "is there memory left in this group" check before
> "approving" the memory allocation, but it seems it doesn't really work that
> way.

No, your memory will usually be eaten by the page cache.

-- 
Michal Hocko
SUSE Labs

