From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: <linux-mm@kvack.org>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>,
Oleg Nesterov <oleg@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
Date: Mon, 30 May 2016 15:19:32 +0300 [thread overview]
Message-ID: <20160530121932.GC8293@esperanza> (raw)
In-Reply-To: <20160530111148.GQ22928@dhcp22.suse.cz>
On Mon, May 30, 2016 at 01:11:48PM +0200, Michal Hocko wrote:
> On Mon 30-05-16 13:26:44, Vladimir Davydov wrote:
> > On Mon, May 30, 2016 at 11:39:50AM +0200, Michal Hocko wrote:
> [...]
> > > Yes and that leads me to a suspicion that we can do that. Maybe I should
> > > just add a note into the log that we are doing that so that people can
> > > complain? Something like the following
> > > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > > index fa0b3ca94dfb..7f3495415719 100644
> > > --- a/fs/proc/base.c
> > > +++ b/fs/proc/base.c
> > > @@ -1104,7 +1104,6 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
> > > err_sighand:
> > > unlock_task_sighand(task, &flags);
> > > err_put_task:
> > > - put_task_struct(task);
> > >
> > > if (mm) {
> > > struct task_struct *p;
> > > @@ -1113,6 +1112,10 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
> > > for_each_process(p) {
> > > task_lock(p);
> > > if (!p->vfork_done && process_shares_mm(p, mm)) {
> > > + pr_info("updating oom_score_adj for %d (%s) from %d to %d because it shares mm with %d (%s). Report if this is unexpected.\n",
> > > + task_pid_nr(p), p->comm,
> > > + p->signal->oom_score_adj, oom_adj,
> > > + task_pid_nr(task), task->comm);
> >
> > IMO this could be acceptable from userspace pov, but I don't very much
> > like how vfork is special-cased here and in oom killer code.
>
> Well, the vfork has to be special cased here. We definitely have to
> support
> vfork()
> set_oom_score_adj()
> exec()
>
> use case. And I do not see other way without adding something to the
> clone hot paths which sounds like not justifiable considering we are
> talking about a really rare usecase that basically nobody cares about.
I don't think that vfork->exec use case is rare. Quite the contrary, I'm
pretty sure it's used often, because in contrast to fork->exec it avoids
copying page tables, which can be very expensive for fat processes.
Frankly, I don't understand why you are so determined not to add
anything to the fork path. Of course, if the overhead were that
dramatic, we would have to forget the idea, but if it were say <= 0.1 %
for a contrived test that calls fork in a loop, IMHO the modification
would be justified.
>
> [...]
> > > so one process would want to be always selected while the other one
> > > doesn't want to get killed. All they can see is that everything is
> > > put in place until the oom killer comes over and ignores that.
> >
> > If we stored minimal oom_score_adj in mm struct, oom killer wouldn't
> > kill any of these processes, and it looks fine to me as long as we want
> > oom killer to be mm based, not task or signal_struct based.
> >
> > Come to think of it, it'd be difficult to keep mm->oom_score_adj in sync
> > with p->signal->oom_score_adj, because we would need to update
> > mm->oom_score_adj not only on /proc write, but also on fork. May be, we
> > could keep all signal_structs sharing mm linked in per mm list so that
> > we could quickly update mm->oom_score_adj on fork? That way we wouldn't
> > need to special case vfork.
>
> Yes the current approach is slightly racy but I do not see that would
> matter all that much. What you are suggesting might work but I am not
> really sure we want to complicate the whole thing now. Sure if we see
> that those races are real we can try to find a better solution, but I
> would like to start as easy as possible and placing all the logic into
> the oom_score_adj proc handler sounds like a good spot to me.
IMHO with all those p->vfork_done and p->signal_struct->oom_score_adj
checks the oom code is becoming more difficult to understand. Linking
signal_struct in a per mm list would probably require more code, but
IMHO it would be easier to follow.
next prev parent reply other threads:[~2016-05-30 12:19 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-26 12:40 [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-26 12:40 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external tasks sharing mm Michal Hocko
2016-05-26 14:30 ` Tetsuo Handa
2016-05-26 14:59 ` Michal Hocko
2016-05-26 15:25 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are noexternal " Tetsuo Handa
2016-05-26 15:35 ` Michal Hocko
2016-05-26 16:14 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external " Tetsuo Handa
2016-05-27 6:45 ` Michal Hocko
2016-05-27 7:15 ` Michal Hocko
2016-05-27 8:03 ` Michal Hocko
2016-05-26 12:40 ` [PATCH 2/6] proc, oom_adj: extract oom_score_adj setting into a helper Michal Hocko
2016-05-26 12:40 ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-27 11:18 ` Michal Hocko
2016-05-27 16:18 ` Vladimir Davydov
2016-05-30 7:07 ` Michal Hocko
2016-05-30 8:47 ` Vladimir Davydov
2016-05-30 9:39 ` Michal Hocko
2016-05-30 10:26 ` Vladimir Davydov
2016-05-30 11:11 ` Michal Hocko
2016-05-30 12:19 ` Vladimir Davydov [this message]
2016-05-30 12:28 ` Michal Hocko
2016-05-26 12:40 ` [PATCH 4/6] mm, oom: skip over vforked tasks Michal Hocko
2016-05-27 16:48 ` Vladimir Davydov
2016-05-30 7:13 ` Michal Hocko
2016-05-30 9:52 ` Michal Hocko
2016-05-30 10:40 ` Vladimir Davydov
2016-05-30 10:53 ` Michal Hocko
2016-05-30 12:03 ` Michal Hocko
2016-05-26 12:40 ` [PATCH 5/6] mm, oom: kill all tasks sharing the mm Michal Hocko
2016-05-26 12:40 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-26 14:11 ` Tetsuo Handa
2016-05-26 14:23 ` Michal Hocko
2016-05-26 14:41 ` Tetsuo Handa
2016-05-26 14:56 ` Michal Hocko
2016-05-27 11:07 ` Michal Hocko
2016-05-27 16:00 ` [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
-- strict thread matches above, loose matches on Subject: below --
2016-05-30 13:05 [PATCH 0/6 -v2] " Michal Hocko
2016-05-30 13:05 ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-31 7:41 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160530121932.GC8293@esperanza \
--to=vdavydov@virtuozzo.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).