linux-fsdevel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Lin Feng <linfeng@cn.fujitsu.com>,
	viro@zeniv.linux.org.uk, bcrl@kvack.org,
	kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com,
	cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
	laijs@cn.fujitsu.com, wency@cn.fujitsu.com,
	tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 10:57:15 +0000
Message-ID: <20121130105715.GC8218@suse.de>
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>

On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote:
> On Thu, 29 Nov 2012 14:54:58 +0800
> Lin Feng <linfeng@cn.fujitsu.com> wrote:
> 
> > Hi all,
> > 
> > We encountered a "Resource temporarily unavailable" failure while trying
> > to offline a memory section in a movable zone, and found that some pages
> > can't be migrated. The offline operation fails because
> > migrate_page_move_mapping() keeps returning -EAGAIN until the timeout,
> > since the check 'page_count(page) != 1' is true.
> > I wonder whether, in the case 'page_count(page) != 1', we should always
> > wait (return -EAGAIN)? In other words, can we do something here for
> > migration if we know where the pages come from?
> > 
> > We finally found that such pages are used by /sbin/multipathd in the form
> > of aio ring_pages. Besides the one reference taken by the offline calling
> > chain, another is taken by aio_setup_ring() via get_user_pages(), and it
> > is not dropped until aio_free_ring() is called.
> > 
> > The dump_page() output in the offline context is as follows:
> > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > 
> > multipathd seems never to release the ring_pages until we reboot the box.
> > Furthermore, someone may write an application that calls io_setup() but
> > never calls io_destroy(), because it needs the context for a long time,
> > or simply forgets to, or even does so on purpose; we can't rule that out.
> > So I think the mm-hotplug framework should be able to deal with such
> > situations. Should we consider adding migration support for such pages?
> > 
> > However, I don't know whether there are other kinds of such special pages
> > in the current kernel. If there are many, it will be hard to handle them
> > all, and adding migration support for aio ring_pages alone is insufficient.
> > 
> > If there are only a few, could we use the private field of struct page to
> > track the ring_pages[] pointer, so that we can find the user during
> > migration? That raises another problem: how do we distinguish such special
> > pages? Using a page flag may affect the current page flag layout, and
> > adding a new page flag also seems to be impossible.
> > 
> > I'm not sure which approach is right and am seeking help.
> > Any comments are much appreciated, thanks :)
> 
> Tricky.
> 
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
> 
> Worse is a futex page, which could easily remain pinned indefinitely.
> 
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them.  The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
> 

If this happens, it would be preferable if it only happened for
ZONE_MOVABLE. If it happens generally, every replacement page is an
unmovable allocation, so we're going to have a lot more MIGRATE_UNMOVABLE
pageblocks and a lot more fragmentation, leading to lower THP availability.
For THP, we're OK if some pageblocks are temporarily unavailable, or even
unavailable for long periods of time; we can cope with that. But we (or I
at least) do not want to lower THP availability on systems that do not
care about ZONE_MOVABLE or node hot-plug.

-- 
Mel Gorman
SUSE Labs

Thread overview: 19+ messages
2012-11-29  6:54 [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Lin Feng
2012-11-29 23:39 ` Andrew Morton
2012-11-30  0:04   ` Zach Brown
2012-11-30  3:39     ` Lin Feng
2012-11-30  3:42   ` Lin Feng
2012-11-30  5:57     ` Andrew Morton
2012-11-30  7:01       ` Lin Feng
2012-11-30  7:55         ` Andrew Morton
2012-11-30 10:29           ` Lin Feng
2012-11-30 10:47             ` Andrew Morton
2012-12-03  3:00               ` Lin Feng
2012-11-30 11:00           ` Mel Gorman
2012-12-03  2:52             ` Lin Feng
2012-12-03 11:37               ` Mel Gorman
2012-11-30  7:13       ` Kamezawa Hiroyuki
2012-11-30  8:00         ` Andrew Morton
2012-11-30 10:57   ` Mel Gorman [this message]
2012-11-30 15:24 ` Domenico Andreoli
2012-12-03  2:05   ` Lin Feng
