From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Lin Feng <linfeng@cn.fujitsu.com>,
viro@zeniv.linux.org.uk, bcrl@kvack.org,
kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com,
cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
laijs@cn.fujitsu.com, wency@cn.fujitsu.com,
tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
linux-aio@kvack.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 11:00:59 +0000
Message-ID: <20121130110059.GD8218@suse.de>
In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org>

On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
> >
> >
> > On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> > >
> > >> hi Andrew,
> > >>
> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > >>> Tricky.
> > >>>
> > >>> I expect the same problem would occur with pages which are under
> > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > >>> periods, but the durations could still be lengthy (seconds).
> > >> The offline retry timeout is 2 minutes, so O_DIRECT pages do not
> > >> seem to be a problem for the moment.
> > >>>
> > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>>
> > >>> The best I can think of is to make changes in or around
> > >>> get_user_pages(), to steal the pages from userspace and replace them
> > >>> with non-movable ones before pinning them. The performance cost of
> > >>> something like this would surely be unacceptable for direct-io, but
> > >>> maybe OK for the aio ring and futexes.
> > >> Thanks for your advice.
> > >> I want to keep the impact as small as possible. As mentioned above,
> > >> direct-io does not seem to be a problem, so we needn't touch it. Maybe
> > >> we can just change how get_user_pages() is used (in or around it) for
> > >> cases such as the aio ring pages. I will try to find a way to do this.
> > >
> > > What about futexes?
> > hi Andrew,
> >
> > Yes, better to find an approach to solve them all.
> >
> > But I'm worried that if we simply confine get_user_pages() to using
> > non-movable pages, it will drain the non-movable pages quickly, because
> > there are many callers of get_user_pages(), such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
> > IMHO in most cases get_user_pages() callers should release the pages soon,
> > so pages allocated from the movable zone should be OK. But I'm not sure
> > whether such a rule exists for get_user_pages().
> > In the other cases we could tell get_user_pages() to allocate pages from
> > a non-movable zone.
> >
> > So could we add a zone-alloc flag when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
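To make the wrapper arrangement quoted above concrete, here is a rough userspace C toy. It is not kernel code: the toy_ names, the struct layout, the shortened argument list and the "retag the page" stand-in for replacing it are all assumptions made only for illustration, and the real get_user_pages() signature is different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	int pin_count;
};

/*
 * Variant with one additional argument. When non_movable is true, a page
 * that sits in the movable zone is "replaced" before it is pinned; the
 * replacement is modelled here by simply retagging the page.
 * Returns the number of pages pinned.
 */
size_t toy_get_user_pages_non_movable(struct toy_page **pages, size_t nr,
				      bool non_movable)
{
	size_t i;

	assert(pages != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (non_movable && pages[i]->zone == TOY_ZONE_MOVABLE)
			pages[i]->zone = TOY_ZONE_NORMAL;
		pages[i]->pin_count++;
	}
	return nr;
}

/* The existing entry point becomes a one-line wrapper. */
size_t toy_get_user_pages(struct toy_page **pages, size_t nr)
{
	return toy_get_user_pages_non_movable(pages, nr, false);
}
```

In this toy, existing callers keep calling toy_get_user_pages() and see no change in behaviour; only long-term pinners such as the aio ring setup would opt in to the extra argument.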
That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
allocations. That will impact THP availability by increasing the number
of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
not just those that care about ZONE_MOVABLE.

I'm likely to NAK such a patch if it's only about node hot-remove because
it's much more of a corner case than wanting to use THP.

I would prefer if get_user_pages() checked if the page it was about to
pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
if that's what they want. The CMA people might also want to take
advantage of this if the page happened to be in the MIGRATE_CMA
pageblock.
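
The check-then-migrate-then-pin order suggested above can be sketched as a small userspace C toy. This is not kernel code and not a proposed patch: the toy_ names, the fixed four-page pool standing in for non-movable memory, and the single mapping "slot" standing in for a page table entry are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_POOL_PAGES 4

/* Hypothetical stand-ins; none of these are the kernel's real types. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	int pin_count;
	int in_use;
	unsigned char data[TOY_PAGE_SIZE];
};

/* Small fixed pool standing in for memory outside the movable zone. */
static struct toy_page toy_normal_pool[TOY_POOL_PAGES];

static struct toy_page *toy_alloc_normal(void)
{
	size_t i;

	for (i = 0; i < TOY_POOL_PAGES; i++) {
		if (!toy_normal_pool[i].in_use) {
			toy_normal_pool[i].in_use = 1;
			toy_normal_pool[i].zone = TOY_ZONE_NORMAL;
			toy_normal_pool[i].pin_count = 0;
			return &toy_normal_pool[i];
		}
	}
	return NULL;
}

/*
 * Copy the page into non-movable memory and repoint the mapping slot at
 * the copy. A page that is already pinned cannot be migrated.
 * Returns 0 on success, -1 on failure.
 */
static int toy_migrate_out_of_movable(struct toy_page **slot)
{
	struct toy_page *old = *slot;
	struct toy_page *new_page;

	if (old->pin_count > 0)
		return -1;
	new_page = toy_alloc_normal();
	if (new_page == NULL)
		return -1;
	memcpy(new_page->data, old->data, TOY_PAGE_SIZE);
	*slot = new_page;
	old->in_use = 0;
	return 0;
}

/*
 * Pin the page behind a mapping slot. If it currently lives in the
 * movable zone, migrate it first so the pin never lands there.
 * Returns the pinned page, or NULL if migration failed.
 */
struct toy_page *toy_pin_page(struct toy_page **slot)
{
	assert(slot != NULL && *slot != NULL);

	if ((*slot)->zone == TOY_ZONE_MOVABLE &&
	    toy_migrate_out_of_movable(slot) != 0)
		return NULL;
	(*slot)->pin_count++;
	return *slot;
}
```

The toy only shows the ordering: the zone check and the copy happen before the pin count is raised, so the original movable page is left unpinned and free to be offlined. The cost of the copy on every such pin is what makes this expensive.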
--
Mel Gorman
SUSE Labs