From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Lin Feng <linfeng@cn.fujitsu.com>,
viro@zeniv.linux.org.uk, bcrl@kvack.org,
kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com,
cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
laijs@cn.fujitsu.com, wency@cn.fujitsu.com,
tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
linux-aio@kvack.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 11:00:59 +0000
Message-ID: <20121130110059.GD8218@suse.de>
In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org>
On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
>
> >
> >
> > On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <linfeng@cn.fujitsu.com> wrote:
> > >
> > >> hi Andrew,
> > >>
> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > >>> Tricky.
> > >>>
> > >>> I expect the same problem would occur with pages which are under
> > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > >>> periods, but the durations could still be lengthy (seconds).
> > >> The offline retry timeout is 2 minutes, so O_DIRECT pages are
> > >> probably not a problem for the moment.
> > >>>
> > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>>
> > >>> The best I can think of is to make changes in or around
> > >>> get_user_pages(), to steal the pages from userspace and replace them
> > >>> with non-movable ones before pinning them. The performance cost of
> > >>> something like this would surely be unacceptable for direct-io, but
> > >>> maybe OK for the aio ring and futexes.
> > >> Thanks for your advice.
> > >> I want to limit the impact as much as possible. As mentioned above,
> > >> direct-io does not seem to be a problem, so we needn't touch it. Maybe
> > >> we can just change how get_user_pages() is used (in or around it) for
> > >> cases such as the aio ring pages. I will try to find a way to do this.
> > >
> > > What about futexes?
> > hi Andrew,
> >
> > Yes, it would be better to find an approach that solves them all.
> >
> > But I'm worried that if we just confine get_user_pages() to
> > non-movable pages, it will soon drain the non-movable pages, because
> > many places use get_user_pages(), such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
> > IMHO in most cases get_user_pages() callers should release the pages soon,
> > so pages allocated from the movable zone should be OK. But I'm not sure
> > whether there is any such rule for get_user_pages().
> > In the other cases we could tell get_user_pages() to allocate pages from
> > a non-movable zone.
> >
> > So could we add a zone-alloc flag when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
allocations. That will impact THP availability by increasing the number
of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
not just those that care about ZONE_MOVABLE.

I'm likely to NAK such a patch if it's only about node hot-remove because
it's much more of a corner case than wanting to use THP.

I would prefer if get_user_pages() checked if the page it was about to
pin was in ZONE_MOVABLE and, if so, migrated it at that point before it's
pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
if that's what they want. The CMA people might also want to take
advantage of this if the page happened to be in a MIGRATE_CMA
pageblock.
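
To make the check-and-migrate-before-pin idea concrete, here is a rough
user-space toy model. It is only a sketch and not kernel code: every name
in it (toy_page, toy_mapping, toy_pin_page() and so on) is hypothetical,
and it ignores locking, reference counting and the real page migration
machinery entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real kernel APIs. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	int pin_count;	/* number of long-term pins on this page */
	int data;	/* stands in for the page contents */
};

/* Stands in for the page table entry that maps the user address. */
struct toy_mapping {
	struct toy_page *page;
};

/* Hypothetical fixed pool of pages outside the movable zone. */
#define TOY_POOL_SIZE 4
static struct toy_page toy_normal_pool[TOY_POOL_SIZE];
static size_t toy_pool_used;

static struct toy_page *toy_alloc_normal(void)
{
	struct toy_page *p;

	if (toy_pool_used == TOY_POOL_SIZE)
		return NULL;
	p = &toy_normal_pool[toy_pool_used++];
	p->zone = TOY_ZONE_NORMAL;
	p->pin_count = 0;
	p->data = 0;
	return p;
}

/*
 * Copy the contents to a page outside the movable zone and repoint the
 * mapping at it. Fails if the old page is already pinned or if no
 * replacement page is available.
 */
static bool toy_migrate_out_of_movable(struct toy_mapping *m)
{
	struct toy_page *old = m->page;
	struct toy_page *dst;

	if (old->pin_count > 0)
		return false;
	dst = toy_alloc_normal();
	if (!dst)
		return false;
	dst->data = old->data;
	m->page = dst;
	return true;
}

/*
 * Pin the page behind a mapping. If it currently sits in the movable
 * zone, migrate it first so the pin never lands on a movable-zone page.
 * Returns NULL if the migration was needed but failed.
 */
static struct toy_page *toy_pin_page(struct toy_mapping *m)
{
	if (m->page->zone == TOY_ZONE_MOVABLE &&
	    !toy_migrate_out_of_movable(m))
		return NULL;
	m->page->pin_count++;
	return m->page;
}
```

In this model the cost is paid once at pin time, and only by callers whose
page happens to be in the movable zone. Pages that already live elsewhere
are pinned in place with no extra work.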
--
Mel Gorman
SUSE Labs