From: Oleg Nesterov <oleg@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
Michal Hocko <mhocko@suse.cz>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-api@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Roman Gushchin <klamm@yandex-team.ru>,
Nikita Vetoshkin <nekto0n@yandex-team.ru>,
Pavel Emelyanov <xemul@parallels.com>
Subject: Re: memcg && uaccess (Was: [PATCH 1/2] kernel/fork: handle put_user errors for CLONE_CHILD_SETTID/CLEARTID)
Date: Tue, 10 Feb 2015 20:47:43 +0100
Message-ID: <20150210194743.GA17333@redhat.com>
In-Reply-To: <20150210161941.GB11212@phnom.home.cmpxchg.org>
On 02/10, Johannes Weiner wrote:
>
> We had reports of systems deadlocking because
Yes, yes, to some degree I understand why it was done this way. Not
that I understand the details of course. Thanks for your explanations.
> > How can a system call know it should return -ENOMEM if put_user() can only
> > return -EFAULT ?
>
> I see the problem, but allocations can not be guaranteed to succeed,
> not even the OOM killer can reliably make progress,
Yes sure,
> So what
> can we do if that allocation fails? Even if we go the route that
> Linus proposes and make OOM situations more generic and check them on
> *every* return to userspace, the OOM handler at that point might still
> kill a task more suited to free memory than the faulting one, and so
> we still have to communicate the proper error value to the syscall.
Yes. To me this means that if a page fault from kernel space fails because
of VM_FAULT_OOM, the task should be killed in any case, except that we should
obviously exclude gup/kthreads.
We can't retry in this case, and (say) schedule_tail() simply can't report
or handle the failure. Imho it would be better to kill the task loudly,
perhaps with a warning, to avoid the confusion. Of course, I am not trying
to simply add send_sig(SIGKILL) to the failure paths. My only point is that,
whatever we do, a "silent" or misleading failure is worse than SIGKILL.
The application can't really "handle an out of memory situation gracefully"
as the changelog says. Even if put_user() (and thus the syscall) could return
-ENOMEM, I don't think this really matters.
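A minimal userspace C sketch of the decision proposed above, for a fault taken while the kernel accesses user memory (e.g. inside put_user()): an OOM failure kills the task instead of being reported as -EFAULT, except on the gup and kthread paths. Every type and function name here (fault_result, fault_ctx, decide_fault_action, and the enum values) is hypothetical and only models the policy; none of it is the kernel's real fault-handling code.

```c
#include <stdbool.h>

/* Hypothetical outcome of a page fault taken from kernel space during a
 * user-memory access. Illustrative names only, not kernel API. */
enum fault_result {
    FAULT_OK,          /* page is now mapped, the access can be redone */
    FAULT_BAD_ADDRESS, /* genuinely bad user pointer */
    FAULT_OOM          /* allocation failed, analogous to VM_FAULT_OOM */
};

/* Hypothetical action taken by the uaccess path. */
enum fault_action {
    ACTION_RETRY,        /* fault resolved, redo the access */
    ACTION_EFAULT,       /* report -EFAULT to the caller (e.g. put_user()) */
    ACTION_KILL_TASK,    /* kill the current task loudly (SIGKILL + warning) */
    ACTION_RETURN_ERROR  /* let the caller handle the error itself */
};

/* Hypothetical description of who triggered the kernel-mode fault. */
struct fault_ctx {
    bool is_gup;     /* get_user_pages()-style path that reports errors */
    bool is_kthread; /* kernel thread: no user task to kill */
};

/* Model of the policy: a kernel-mode fault that fails with OOM kills the
 * task rather than silently turning into a misleading -EFAULT, except for
 * gup and kthreads, which keep receiving an error. */
enum fault_action decide_fault_action(enum fault_result res,
                                      const struct fault_ctx *ctx)
{
    switch (res) {
    case FAULT_OK:
        return ACTION_RETRY;
    case FAULT_BAD_ADDRESS:
        return ACTION_EFAULT;
    case FAULT_OOM:
        if (ctx->is_gup || ctx->is_kthread)
            return ACTION_RETURN_ERROR;
        return ACTION_KILL_TASK;
    }
    /* Unreachable for valid enum values; keeps compilers quiet. */
    return ACTION_EFAULT;
}
```

In this model the OOM case never falls through to ACTION_EFAULT for an ordinary task: since put_user() can only return -EFAULT, that fallthrough is exactly the misleading failure argued against above.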
> However, I think we could go back to invoking OOM from all allocation
> contexts again as long as we change allocator and OOM killer to not
> wait for individual OOM victims to exit indefinitely (unless it's a
> __GFP_NOFAIL context). Maybe wait for some time on the first victim
> before moving on to the next one.
perhaps... can't really comment, at least right now.
> What do you think?
So far I only think that this problem is not trivial ;)
Oleg.