public inbox for cgroups@vger.kernel.org
From: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
To: Vasily Averin <vvs-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Vladimir Davydov
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>,
	Uladzislau Rezki <urezki-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Mel Gorman
	<mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org>,
	Tetsuo Handa
	<penguin-kernel-1yMVhJb1mP/7nzcFbJAaVXf5DAMn2ifp@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org
Subject: Re: [PATCH memcg 3/3] memcg: handle memcg oom failures
Date: Thu, 21 Oct 2021 13:49:29 +0200
Message-ID: <YXFPSvGFV539OcEk@dhcp22.suse.cz>
In-Reply-To: <d3b32c72-6375-f755-7599-ab804719e1f6-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>

On Wed 20-10-21 18:46:56, Vasily Averin wrote:
> On 20.10.2021 16:02, Michal Hocko wrote:
> > On Wed 20-10-21 15:14:27, Vasily Averin wrote:
> >> mem_cgroup_oom() can fail if current task was marked unkillable
> >> and oom killer cannot find any victim.
> >>
> >> Currently we force memcg charge for such allocations,
> >> however it allows a memcg-limited userspace task to overuse its assigned limits
> >> and potentially trigger a global memory shortage.
> > 
> > You should really go into more details whether that is a practical
> > problem to handle. OOM_FAILED means that the memcg oom killer couldn't
> > find any oom victim so it cannot help with a forward progress. There are
> > not that many situations when that can happen. Naming that would be
> > really useful.
> 
> I've pointed it out above:
> "if current task was marked unkillable and oom killer cannot find any victim."
> This may happen when the current task cannot be oom-killed because it was marked
> unkillable, i.e. it has p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN,
> and the other processes in the memcg are either dying, or are kernel threads, or are
> marked unkillable in the same way. Or when the memcg contains only this process.
> 
> If we always approve this kind of allocation, it can be misused.
> A process can mmap a lot of memory,
> and then touch it, generate page faults and make overcharged memory allocations.
> Finally it can consume all node memory and trigger a global memory shortage on the host.

Yes, this is true but a) OOM_SCORE_ADJ_MIN tasks are excluded from the
OOM handling so they have to be careful with the memory consumption and
b) is this a theoretical or a practical concern?

This is mostly what I wanted to make sure you describe in the changelog.
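
For illustration only, the failure condition quoted above can be modelled
in userspace roughly as follows. This is a sketch, not kernel code:
struct model_task, model_task_killable() and model_memcg_has_victim() are
hypothetical names, and the only thing taken from the kernel is the value
of OOM_SCORE_ADJ_MIN. It assumes the three exclusions named in the quoted
text (unkillable, kernel thread, dying) are the only ones that matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000) /* same value as the kernel's uapi constant */

/* Hypothetical userspace stand-in for the few task properties that matter here. */
struct model_task {
	int oom_score_adj;
	bool is_kthread;
	bool is_dying;
};

/* A task can be picked as a victim only if none of the exclusions apply. */
static bool model_task_killable(const struct model_task *t)
{
	return !t->is_kthread && !t->is_dying &&
	       t->oom_score_adj != OOM_SCORE_ADJ_MIN;
}

/* Returns true if at least one task in the modelled memcg is a possible victim. */
static bool model_memcg_has_victim(const struct model_task *tasks, size_t n)
{
	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (model_task_killable(&tasks[i]))
			return true;
	return false;
}
```

In this model a memcg holding a single OOM_SCORE_ADJ_MIN task has no
victim, which corresponds to the OOM_FAILED case under discussion.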

> >> Let's fail the memory charge in such cases.
> >>
> >> This failure should be somehow recognised in #PF context,
> > 
> > explain why
> 
> When #PF cannot allocate memory (for the reason described above), handle_mm_fault returns VM_FAULT_OOM,
> then its caller executes pagefault_out_of_memory(). If the latter cannot recognize the real reason for this failure,
> it assumes a global memory shortage and executes the global out_of_memory(), which can kill a random process
> or just crash the node if sysctl vm.panic_on_oom is set to 1.
>
> Currently pagefault_out_of_memory() knows about a possible async memcg OOM and handles it correctly.
> However it is not aware that memcg can reject some other allocations, does not recognize the fault
> as memcg-related and allows the global OOM to run.

Again something to be added to the changelog.
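
A rough userspace model of the decision described in the quoted text may
help when writing that up. It is only a sketch under stated assumptions:
the enum, the struct and model_pagefault_oom() are hypothetical names, and
the two booleans merely stand in for "an async memcg OOM is pending" and
"vm.panic_on_oom is set to 1".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the model; these are not kernel symbols. */
enum pf_oom_result {
	PF_OOM_MEMCG_HANDLED, /* recognised as a memcg OOM, no global action */
	PF_OOM_GLOBAL_KILL,   /* global OOM killer picks some process */
	PF_OOM_PANIC,         /* node crashes because panic_on_oom is set */
};

/* Hypothetical inputs describing what the #PF OOM path can see. */
struct pf_oom_state {
	bool memcg_async_oom_pending; /* stands in for a pending async memcg OOM */
	bool panic_on_oom;            /* stands in for sysctl vm.panic_on_oom=1 */
};

/* Models the behaviour described above after VM_FAULT_OOM reaches the #PF path. */
static enum pf_oom_result model_pagefault_oom(const struct pf_oom_state *s)
{
	assert(s != NULL);
	if (s->memcg_async_oom_pending)
		return PF_OOM_MEMCG_HANDLED;
	/* Anything else is treated as a global shortage, even a rejected memcg charge. */
	return s->panic_on_oom ? PF_OOM_PANIC : PF_OOM_GLOBAL_KILL;
}
```

A rejected memcg charge that leaves no marker behind falls through to the
last line of the model, which is the false global OOM being discussed.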

> >> so let's use current->memcg_in_oom == (struct mem_cgroup *)OOM_FAILED
> >>
> >> ToDo: what is the best way to notify pagefault_out_of_memory() about 
> >>     mem_cgroup_out_of_memory failure ?
> > 
> > why don't you simply remove out_of_memory from pagefault_out_of_memory
> > and leave it only with the blocking memcg OOM handling? Wouldn't that be a
> > more generic solution? Your first patch already goes that way partially.
> 
> I clearly understand that global out_of_memory should not be triggered by memcg restrictions.
> I clearly understand that a dying task will release some memory soon, so we need not run the global oom if the current task is dying.
>
> However I'm not sure that I can remove out_of_memory altogether. At least I do not have good arguments to do it.

I do understand that handling a very specific case sounds easier but it
would be better to have a robust fix even if that requires some more
head scratching. So far we have collected several reasons why it is
bad to trigger the oom killer from the #PF path. There is no single argument
to keep it so it sounds like a viable path to pursue. Maybe there are
some very well hidden reasons but those should be documented and this is
a great opportunity to do either of those.

Moreover if it turns out that there is a regression then this can be
easily reverted and a different, maybe memcg specific, solution can be
implemented.
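
To make the suggestion concrete, here is a minimal userspace sketch of a
#PF OOM path that keeps only the memcg handling. The names are
hypothetical and this is not a proposed patch; it assumes that returning
without any action simply lets the fault be retried.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the model; these are not kernel symbols. */
enum pf_action {
	PF_ACTION_MEMCG_OOM, /* blocking memcg OOM handling takes over */
	PF_ACTION_RETRY,     /* nothing to do here, the fault is retried */
};

/*
 * Models a #PF OOM path with the global out_of_memory() call removed:
 * the only remaining decision is whether a memcg OOM is pending.
 */
static enum pf_action model_pagefault_oom_no_global(bool memcg_oom_pending)
{
	return memcg_oom_pending ? PF_ACTION_MEMCG_OOM : PF_ACTION_RETRY;
}
```

Note that there is no outcome in this model that kills an unrelated
process or panics the node.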
-- 
Michal Hocko
SUSE Labs
