From: minchan@kernel.org
To: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>,
linux-api@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Matthew Wilcox <willy@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
Christian Brauner <christian@brauner.io>,
Oleg Nesterov <oleg@redhat.com>,
Tim Murray <timmurray@google.com>,
kernel-team <kernel-team@android.com>,
LKML <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [RFC]: userspace memory reaping
Date: Wed, 14 Oct 2020 11:39:43 -0700
Message-ID: <20201014183943.GA1489464@google.com>
In-Reply-To: <CAJuCfpEQ_ADYsMrF_zjfAeQ3d-FALSP+CeYsvgH2H1-FSoGGqg@mail.gmail.com>
On Wed, Oct 14, 2020 at 09:57:20AM -0700, Suren Baghdasaryan wrote:
> On Wed, Oct 14, 2020 at 5:09 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > [Sorry for a late reply]
> >
> > On Mon 14-09-20 17:45:44, Suren Baghdasaryan wrote:
> > > + linux-kernel@vger.kernel.org
> > >
> > > On Mon, Sep 14, 2020 at 5:43 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > Last year I sent an RFC about using oom-reaper while killing a
> > > > process: https://patchwork.kernel.org/cover/10894999. During LSFMM2019
> > > > discussion https://lwn.net/Articles/787217 a couple of alternative
> > > > options were discussed with the most promising one (outlined in the
> > > > last paragraph of https://lwn.net/Articles/787217) suggesting to use a
> > > > remote version of madvise(MADV_DONTNEED) operation to force memory
> > > > reclaim of a killed process. With process_madvise() making its way
> > > > through reviews (https://patchwork.kernel.org/patch/11747133/), I
> > > > would like to revive this discussion and get feedback on several
> > > > possible options, their pros and cons.
> >
> > Thanks for reviving this!
>
> Thanks for your feedback!
>
> >
> > > > The need is similar to why oom-reaper was introduced - when a process
> > > > is being killed to free memory we want to make sure memory is freed
> > > > even if the victim is in uninterruptible sleep or is busy and its reaction
> > > > to SIGKILL is delayed by an unpredictable amount of time. I
> > > > experimented with enabling process_madvise(MADV_DONTNEED) operation
> > > > and using it to force memory reclaim of the target process after
> > > > sending SIGKILL. Unfortunately this approach requires the caller to
> > > > read /proc/pid/maps to extract the list of VMAs to pass as an input to
> > > > process_madvise().
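For context, the userspace step described above roughly amounts to the following sketch: read /proc/<pid>/maps line by line and turn the "start-end" field of each VMA into a struct iovec that could then be passed to process_madvise(). The helper names parse_maps_line() and collect_vmas() are hypothetical, and the sketch ignores the permission and path fields a real caller would use to skip ranges it should not touch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: parse the "start-end" field at the beginning of one
 * /proc/<pid>/maps line into an iovec. Returns 0 on success, -1 if the line
 * does not start with a valid address range. */
static int parse_maps_line(const char *line, struct iovec *out)
{
    unsigned long start, end;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%lx-%lx", &start, &end) != 2 || end < start)
        return -1;
    out->iov_base = (void *)start;
    out->iov_len = end - start;
    return 0;
}

/* Hypothetical helper: collect up to max VMAs of process pid into iov.
 * Returns the number of entries filled, or -1 if the file cannot be opened. */
static int collect_vmas(pid_t pid, struct iovec *iov, int max)
{
    char path[64], line[512];
    int n = 0, at_line_start = 1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (n < max && fgets(line, sizeof(line), f)) {
        /* Only parse the first chunk of each line: fgets() may split long
         * lines (e.g. long file paths) into several chunks. */
        if (at_line_start && parse_maps_line(line, &iov[n]) == 0)
            n++;
        at_line_start = strchr(line, '\n') != NULL;
    }
    fclose(f);
    return n;
}
```

Every VMA costs a formatted text line that the kernel has to generate and userspace has to parse, which is the overhead the thread is trying to avoid.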
> >
> > Well I would argue that this is not really necessary. You can simply
> > call process_madvise with the full address range and let the kernel
> > operate only on ranges which are safe to tear down asynchronously.
> > Sure that would require some changes to the existing code to not fail
> > on those ranges if they contain incompatible vmas but that should be
> > possible. If we are worried about backward compatibility then a
> > dedicated flag could override.
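A minimal sketch of the single-call, full-range variant suggested above, assuming raw syscall numbers 434 (pidfd_open) and 440 (process_madvise), which most architectures use. The helpers full_range_iov() and advise_full_range() are hypothetical. Note that mainline process_madvise() accepts only a restricted set of advice values, and MADV_DONTNEED was not among them at the time of this thread. Skipping incompatible VMAs instead of failing is the kernel change being proposed, so the sketch does not assume it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: syscall numbers from the common table; older libc headers
 * may not define them. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Hypothetical helper: build one iovec spanning [lo, hi). */
static struct iovec full_range_iov(uintptr_t lo, uintptr_t hi)
{
    assert(hi >= lo);
    struct iovec v = { .iov_base = (void *)lo, .iov_len = hi - lo };
    return v;
}

/* Hypothetical helper: a single process_madvise() call over [lo, hi) of
 * process pid. Returns the syscall result (bytes advised) or -1 with errno
 * set, e.g. EINVAL for an unsupported advice value. */
static ssize_t advise_full_range(pid_t pid, uintptr_t lo, uintptr_t hi, int advice)
{
    struct iovec v = full_range_iov(lo, hi);
    long pidfd = syscall(SYS_pidfd_open, (int)pid, 0U);
    long ret;
    int saved;

    if (pidfd < 0)
        return -1;
    /* Cast vlen explicitly: syscall() is variadic and reads longs. */
    ret = syscall(SYS_process_madvise, pidfd, &v, (size_t)1, advice, 0U);
    saved = errno;          /* keep the syscall's errno across close() */
    close((int)pidfd);
    errno = saved;
    return (ssize_t)ret;
}
```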
> >
>
> IIUC this is very similar to the last option I proposed. I think this
> is doable if we treat it as a special case. The fact that the process_madvise()
> return value cannot represent a very large range would still be a problem.
> Maybe we can return MAX_INT in those cases?
Or, maybe we could just return 0 if the operation succeeds without any
error.
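To put numbers on the return-value concern: process_madvise() reports the number of bytes advised, and the kernel's iovec import caps the total length handled by one call at MAX_RW_COUNT (INT_MAX rounded down to a page boundary). The sketch below only does that arithmetic. It assumes 4 KiB pages and a 47-bit (128 TiB) x86-64 user address space, and the names SKETCH_MAX_RW_COUNT and calls_needed() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Mirrors the kernel's MAX_RW_COUNT, i.e. INT_MAX
 * rounded down to a page boundary (0x7ffff000 here). */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_MAX_RW_COUNT ((uint64_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: number of calls needed to cover len bytes when each
 * call can handle at most cap bytes (ceiling division without overflow). */
static uint64_t calls_needed(uint64_t len, uint64_t cap)
{
    assert(cap > 0);
    return len / cap + (len % cap != 0);
}
```

With these assumptions, covering 1ULL << 47 bytes takes 65537 calls. Either return convention floated above (INT_MAX or 0) only changes what gets reported; a special-cased full-range mode would presumably also have to bypass this per-call cap.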
>
> > [...]
> >
> > > > While the objective is to guarantee forward progress even when the
> > > > victim cannot terminate, we still want this mechanism to be efficient
> > > > because we perform these operations to relieve memory pressure before
> > > > it affects user experience.
> > > >
> > > > Alternative options I would like your feedback on are:
> > > > 1. Introduce a dedicated process_madvise(MADV_DONTNEED_MM)
> > > > specifically for this case to indicate that the whole mm can be freed.
> >
> > This shouldn't be any different from madvise on the full address range,
> > right?
> >
>
> Yep, just a matter of choosing the most appropriate API.
I agree that passing the full range, or just NULL to indicate the entire
address space, would be better than introducing a new advice, since it
would let us avoid adding MADV_PAGEOUT_MM and MADV_COLD_MM.
>
> > > > 2. A new syscall to efficiently obtain a vector of VMAs (start,
> > > > length, flags) of the process instead of reading /proc/pid/maps. The
> > > > size of the vector is still limited by UIO_MAXIOV (1024), so several
> > > > calls might be needed to query a larger number of VMAs; however, it will
> > > > still be an order of magnitude more efficient than reading
> > > > /proc/pid/maps file in 4K or smaller chunks.
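Whatever produces the VMA list (parsing /proc/pid/maps today, or the proposed syscall), the "several calls" mentioned above amount to a batching loop that hands at most UIO_MAXIOV entries to each process_madvise() call. A minimal sketch; for_each_iov_batch(), batch_fn and sum_batch_sizes() are hypothetical names, and in real use the callback would issue process_madvise(pidfd, batch, n, advice, 0).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#ifndef UIO_MAXIOV
#define UIO_MAXIOV 1024   /* Linux limit on iovecs per call */
#endif

/* Called once per batch; returns 0 to continue, nonzero to stop. */
typedef int (*batch_fn)(const struct iovec *batch, size_t n, void *ctx);

/* Walk iov[0..count) in batches of at most UIO_MAXIOV entries.
 * Returns the number of batches issued, or -1 if a callback failed. */
static long for_each_iov_batch(const struct iovec *iov, size_t count,
                               batch_fn fn, void *ctx)
{
    const size_t max = UIO_MAXIOV;
    size_t off = 0;
    long batches = 0;

    assert(fn != NULL);
    while (off < count) {
        size_t n = count - off < max ? count - off : max;
        if (fn(iov + off, n, ctx) != 0)
            return -1;
        off += n;
        batches++;
    }
    return batches;
}

/* Example callback for illustration: accumulates the number of entries
 * seen and checks that no batch exceeds UIO_MAXIOV. */
static int sum_batch_sizes(const struct iovec *batch, size_t n, void *ctx)
{
    (void)batch;
    assert(n > 0 && n <= (size_t)UIO_MAXIOV);
    *(size_t *)ctx += n;
    return 0;
}
```

For example, 2500 VMAs are handled in three calls of 1024, 1024 and 452 entries.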
> >
> > While this might be interesting for other use cases - userspace memory
> > management in general - I do not think it is directly related to this
> > particular feature.
> >
>
> True but such a syscall would be useful for other use cases, like
> MADV_COLD/MADV_PAGEOUT that Minchan was working on. Maybe we can kill
> more than one bird here? Minchan, any thoughts?
Generally, it could be helpful, but I don't see it as urgent at the
moment.