From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2
Date: Wed, 16 Dec 2015 11:44:10 +0900
Message-ID: <5670CFFA.3060309@jp.fujitsu.com>
In-Reply-To: <20151215110219.GJ28521@esperanza>

On 2015/12/15 20:02, Vladimir Davydov wrote:
> On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote:
>> On 2015/12/15 4:42, Vladimir Davydov wrote:
>>> On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote:
>>>> On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
>>>>> In the legacy hierarchy we charge memsw, which is dubious, because:
>>>>>
>>>>>   - memsw.limit must be >= memory.limit, so it is impossible to limit
>>>>>     swap usage less than memory usage. Taking into account the fact that
>>>>>     the primary limiting mechanism in the unified hierarchy is
>>>>>     memory.high while memory.limit is either left unset or set to a very
>>>>>     large value, moving memsw.limit knob to the unified hierarchy would
>>>>>     effectively make it impossible to limit swap usage according to the
>>>>>     user preference.
>>>>>
>>>>>   - memsw.usage != memory.usage + swap.usage, because a page occupying
>>>>>     both swap entry and a swap cache page is charged only once to memsw
>>>>>     counter. As a result, it is possible to effectively eat up to
>>>>>     memory.limit of memory pages *and* memsw.limit of swap entries, which
>>>>>     looks unexpected.
>>>>>
>>>>> That said, we should provide a different swap limiting mechanism for
>>>>> cgroup2.
>>>>> This patch adds mem_cgroup->swap counter, which charges the actual
>>>>> number of swap entries used by a cgroup. It is only charged in the
>>>>> unified hierarchy, while the legacy hierarchy memsw logic is left
>>>>> intact.
>>>>
>>>> I agree that the previous semantic was awkward. The problem I can see
>>>> with this approach is that once the swap limit is reached the anon
>>>> memory pressure might spill over to other and unrelated memcgs during
>>>> the global memory pressure. I guess this is what Kame referred to as
>>>> anon would become mlocked basically. This would be even more of an issue
>>>> with resource delegation to sub-hierarchies because nobody will prevent
>>>> setting the swap amount to a small value and use that as an anon memory
>>>> protection.
>>>
>>> AFAICS such anon memory protection has a side-effect: real-life
>>> workloads need page cache to run smoothly (at least for mapping
>>> executables). Disabling swapping would switch pressure to page caches,
>>> resulting in performance degradation. So, I don't think per memcg swap
>>> limit can be abused to boost your workload on an overcommitted system.
>>>
>>> If you mean malicious users, well, they already have plenty ways to eat
>>> all available memory up to the hard limit by creating unreclaimable
>>> kernel objects.
>>>
>> "protect anon" user's malicious degree is far lower than such cracker like users.
>
> What do you mean by "malicious degree"? What is such a user trying to
> achieve? Killing the system? Well, there are much more effective ways to
> do so. Or does it want to exploit a system specific feature to get
> benefit for itself? If so, it will hardly win by mlocking all anonymous
> memory, because this will result in higher pressure exerted upon its
> page cache and dcache, which normal workloads just can't get along
> without.
>

What I meant is that almost all application developers would set
swap.limit=0 if they were allowed to. So it is ordinary users, not
malicious ones, who can kill the system if swap imbalance is allowed.
  
>>
>>> Anyway, if you don't trust a container you'd better set the hard memory
>>> limit so that it can't hurt others no matter what it runs and how it
>>> tweaks its sub-tree knobs.
>>>
>>
>> Limiting swap can easily cause "OOM-Killer even while there are
>> available swap" with easy mistake.
>
> What do you mean by "easy mistake"? Misconfiguration? If so, it's a lame
> excuse IMO. Admin should take system configuration seriously. If the
> host is not overcommitted, it's trivial. Otherwise, there's always a
> chance that things will go south, so it's not going to be easy. It's up
> to admin to analyze risks and set limits accordingly. Exporting knobs
> with clear meaning is the best we can do here. swap.max is one such knob.
> It defines maximal usage of swap resource. Allowing to breach it just
> does not add up.
>
>> Can't you add "swap excess" switch to sysctl to allow global memory
>> reclaim can ignore swap limitation ?
>
> I'd be opposed to it, because this would obscure the user API. OTOH, a
> kind of swap soft limit (swap.high?) might be considered. I'm not sure
> if it's really necessary though, because all arguments for it do not
> look convincing to me for now. So, personally, I would refrain from
> implementing it until it is really called for by users of cgroup v2.
>

From my customers' point of view, the OOM killer running while there is
still free swap space is a system error rather than their misconfiguration.
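
As a toy illustration of this scenario (a simplified model in Python, not
kernel code; the `Cgroup` class, `reclaim()` and `would_oom()` are
hypothetical names introduced only for this sketch), consider a cgroup whose
swap limit is already reached and which has no page cache left. Reclaim
cannot free anything from it, so an OOM kill would follow even though the
system-wide swap device still has free space:

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    anon: int      # anonymous pages resident in memory
    file: int      # page cache pages resident in memory
    swap: int      # swap entries currently charged to the cgroup
    swap_max: int  # per-cgroup swap limit (toy analogue of swap.max)


def reclaim(cg, need, free_swap):
    """Try to free `need` pages from `cg`; return (freed, remaining free swap)."""
    # Assumption of this model: page cache can always be dropped.
    freed = min(cg.file, need)
    cg.file -= freed

    # Anon pages can be swapped out only while both the per-cgroup limit
    # and the global swap space leave room.
    room = max(min(cg.swap_max - cg.swap, free_swap), 0)
    swapped = min(cg.anon, need - freed, room)
    cg.anon -= swapped
    cg.swap += swapped
    return freed + swapped, free_swap - swapped


def would_oom(cg, need, free_swap):
    """True if reclaim cannot satisfy the request (note: mutates `cg`)."""
    freed, _ = reclaim(cg, need, free_swap)
    return freed < need


# Swap limit of 0, no page cache, plenty of global swap: reclaim still fails.
pinned = Cgroup(anon=100, file=0, swap=0, swap_max=0)
print(would_oom(pinned, need=10, free_swap=1000))
```

With `swap_max=0` the anon pages behave as if they were mlocked, which is
the imbalance discussed above.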

BTW, mlock() requires CAP_IPC_LOCK.
Please at least make the default unlimited and check the capability when
a swap limit is set.
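
A minimal sketch of that suggestion, again as a Python model rather than
kernel code: the `SwapLimit` class, its `set_max()` method and the way
capabilities are passed in as a set of strings are all assumptions made for
illustration. The limit defaults to unlimited, and setting any finite value
is refused unless the writer holds CAP_IPC_LOCK:

```python
CAP_IPC_LOCK = "CAP_IPC_LOCK"
UNLIMITED = float("inf")  # stands in for the "no limit" default


class SwapLimit:
    """Toy model of a per-cgroup swap limit knob."""

    def __init__(self):
        # Default is unlimited, so an untouched cgroup can always swap.
        self.max = UNLIMITED

    def set_max(self, value, caps):
        # Assumption of this sketch: any finite limit can pin anon memory
        # much like mlock(), so it needs the same capability.
        if value != UNLIMITED and CAP_IPC_LOCK not in caps:
            raise PermissionError("setting a swap limit requires CAP_IPC_LOCK")
        self.max = value


limit = SwapLimit()
try:
    limit.set_max(0, caps=set())
except PermissionError as err:
    print("refused:", err)
limit.set_max(0, caps={CAP_IPC_LOCK})
```

Whether the check should apply to every finite value or only to values
below current usage is a design choice this sketch does not settle.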

Thanks,
-Kame


> Thanks,
> Vladimir


