From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lin Feng
Subject: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Thu, 29 Nov 2012 14:54:58 +0800
Message-ID: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com>
Cc: mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng
To: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com
Return-path:
Sender: owner-linux-aio@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

Hi all,

We encountered a "Resource temporarily unavailable" failure while trying to offline a memory section in a movable zone: some pages can't be migrated. The offline operation fails because migrate_page_move_mapping() keeps returning -EAGAIN until the timeout expires, as the condition 'page_count(page) != 1' holds for these pages.
I wonder whether we should always wait (return -EAGAIN) in the case 'page_count(page) != 1'. In other words, can we do something here for migration if we know where the pages come from?

We finally found that these pages are used by /sbin/multipathd as aio ring_pages. Besides the one increment introduced by the offline calling chain, another increment is added by aio_setup_ring() by calling get_user_pages(), and it won't be dropped until aio_free_ring() is called.
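A minimal userspace model of the refcount check described above may help. It is not kernel code: struct model_page and the model_* helpers are hypothetical names made up for illustration. It assumes, as the dump_page output suggests, that the ring pages are anonymous pages with no address_space, so after unmapping the only reference migration expects is its own, and any extra get_user_pages() pin makes the check return -EAGAIN.

```c
#include <assert.h>
#include <errno.h>

/*
 * Toy userspace model (not kernel code) of the reference-count test that
 * makes migration of an anonymous page give up. Only the count is modelled.
 */
struct model_page {
	int count; /* stands in for page_count(page) */
};

static void model_get_page(struct model_page *page)
{
	page->count++;
}

static void model_put_page(struct model_page *page)
{
	assert(page->count > 0);
	page->count--;
}

/*
 * Simplified version of the check in migrate_page_move_mapping() for a page
 * without an address_space: once the page has been unmapped, the migration
 * caller's own reference must be the only one left.
 */
static int model_move_mapping(const struct model_page *page)
{
	if (page->count != 1)
		return -EAGAIN; /* another reference is held: retry later */
	return 0;
}
```

Taking two references, one standing in for the offline path and one for the long-lived aio ring pin, reproduces the count:2 state from the report; the check only passes again once the pin is dropped, which in the real kernel happens in aio_free_ring().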
The dump_page info in the offline context is shown below:
page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)

multipathd seems never to release the ring_pages until we reboot the box. Furthermore, someone could write an application that calls io_setup() but never calls io_destroy(), because it needs the context for a long time, because the author simply forgets to, or even on purpose, and we can't rule that out. So I think the mm-hotplug framework should be able to deal with such a situation. Should we consider adding migration support for such pages?

However, I don't know whether there are other kinds of such special pages in the current kernel. If, unluckily, there are many, it will be hard to handle them all, and just adding migration support for aio ring_pages is insufficient.

But if, luckily, there are only a few, could we use the private field of struct page to track the ring_pages[] pointer so that we can find the user during migration? That raises another problem: how do we distinguish such special pages? Using a page flag may affect the current page flag layout, and adding a new page flag also seems impossible.

I'm not sure which approach is right and am seeking help.
Any comments are very welcome, thanks :)

Thanks,
linfeng
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org.
For more info on Linux AIO, see: http://www.kvack.org/aio/
Don't email: aart@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Thu, 29 Nov 2012 15:39:30 -0800
Message-ID: <20121129153930.477e9709.akpm@linux-foundation.org>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Lin Feng
Return-path:
In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com>
Sender: owner-linux-aio@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, 29 Nov 2012 14:54:58 +0800 Lin Feng wrote:

> Hi all,
>
> We encountered a "Resource temporarily unavailable" failure while trying to offline a memory section in a movable zone: some pages can't be migrated. The offline operation fails because migrate_page_move_mapping() keeps returning -EAGAIN until the timeout expires, as the condition 'page_count(page) != 1' holds for these pages.
> I wonder whether we should always wait (return -EAGAIN) in the case 'page_count(page) != 1'. In other words, can we do something here for migration if we know where the pages come from?
>
> We finally found that these pages are used by /sbin/multipathd as aio ring_pages. Besides the one increment introduced by the offline calling chain, another increment is added by aio_setup_ring() by calling get_user_pages(), and it won't be dropped until aio_free_ring() is called.
>
> The dump_page info in the offline context is shown below:
> page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
>
> multipathd seems never to release the ring_pages until we reboot the box. Furthermore, someone could write an application that calls io_setup() but never calls io_destroy(), because it needs the context for a long time, because the author simply forgets to, or even on purpose, and we can't rule that out. So I think the mm-hotplug framework should be able to deal with such a situation. Should we consider adding migration support for such pages?
>
> However, I don't know whether there are other kinds of such special pages in the current kernel. If, unluckily, there are many, it will be hard to handle them all, and just adding migration support for aio ring_pages is insufficient.
>
> But if, luckily, there are only a few, could we use the private field of struct page to track the ring_pages[] pointer so that we can find the user during migration? That raises another problem: how do we distinguish such special pages? Using a page flag may affect the current page flag layout, and adding a new page flag also seems impossible.
>
> I'm not sure which approach is right and am seeking help.
> Any comments are very welcome, thanks :)

Tricky.
I expect the same problem would occur with pages which are under
O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
periods, but the durations could still be lengthy (seconds).

Worse is a futex page, which could easily remain pinned indefinitely.

The best I can think of is to make changes in or around
get_user_pages(), to steal the pages from userspace and replace them
with non-movable ones before pinning them. The performance cost of
something like this would surely be unacceptable for direct-io, but
maybe OK for the aio ring and futexes.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zach Brown
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Thu, 29 Nov 2012 16:04:43 -0800
Message-ID: <20121130000443.GK18574@lenny.home.zabbo.net>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
Content-Disposition: inline
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>
Sender: owner-linux-aio@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them.
> The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.

In the aio case it seems like it could be taught to populate the mapping with non-movable pages to begin with. It's calling get_user_pages() a few lines after instantiating the mapping itself with do_mmap_pgoff().

- z

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 11:42:05 +0800
Message-ID: <50B82B0D.8010206@cn.fujitsu.com>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

hi Andrew,

On 11/30/2012 07:39 AM, Andrew Morton wrote:
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
The offline retry timeout is 2 minutes, so O_DIRECT pages do not seem to be a problem for the moment.
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
Thanks for your advice.
I want to limit the impact as much as possible. As mentioned above, direct-io does not seem to be a problem, so we needn't touch it. Maybe we can change only the get_user_pages() users (in or around the call), such as the aio ring pages. I will try to find a way to do this.

Thanks,
linfeng

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 15:01:26 +0800
Message-ID: <50B859C6.3020707@cn.fujitsu.com>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:6540 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1756379Ab2K3HBn (ORCPT ); Fri, 30 Nov 2012 02:01:43 -0500
In-Reply-To: <20121129215749.acfd872a.akpm@linux-foundation.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 11/30/2012 01:57 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> The offline retry timeout is 2 minutes, so O_DIRECT pages do not seem to be a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> Thanks for your advice.
>> I want to limit the impact as much as possible. As mentioned above, direct-io does not seem to be a problem, so we needn't touch it. Maybe we can change only the get_user_pages() users (in or around the call), such as the aio ring pages. I will try to find a way to do this.
>
> What about futexes?
hi Andrew,

Yes, it would be better to find an approach that solves them all.

But I'm worried that if we simply confine get_user_pages() to non-movable pages, it will drain the non-movable pages quickly, because many places use get_user_pages(), such as some drivers.

IMHO in most cases get_user_pages() callers should release the pages soon, so pages allocated from the movable zone should be OK. But I'm not sure whether get_user_pages() has such a rule.
In the other cases, we could ask get_user_pages() to allocate pages from a non-movable zone.

So could we add a zone-allocation flag when we call get_user_pages()?
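The zone-allocation-flag idea can be sketched as a toy userspace model under loudly stated assumptions: model_get_user_pages_hint(), the MODEL_GUP_* hints and the two-zone layout are hypothetical names for illustration, not an existing kernel API. A caller that expects to hold a long-lived pin (such as the aio ring) asks for non-movable backing, while every other caller keeps today's behaviour through a one-line wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not kernel code): the zone a user page was faulted in from. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

struct model_page {
	bool present;         /* already faulted in? */
	enum model_zone zone; /* meaningful only when present */
	int pins;             /* references taken by the pinning call */
};

/* Hypothetical allocation hint passed by the caller of the pinning call. */
enum model_gup_hint {
	MODEL_GUP_MOVABLE_OK,  /* short-lived pin, e.g. O_DIRECT */
	MODEL_GUP_NON_MOVABLE, /* long-lived pin, e.g. an aio ring */
};

/* Hypothetical get_user_pages() variant that takes the hint. */
static size_t model_get_user_pages_hint(struct model_page *pages, size_t nr,
					enum model_gup_hint hint)
{
	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!pages[i].present) {
			/* Fault the page in from the zone the hint asks for. */
			pages[i].zone = (hint == MODEL_GUP_NON_MOVABLE) ?
					MODEL_ZONE_NORMAL : MODEL_ZONE_MOVABLE;
			pages[i].present = true;
		}
		/*
		 * A page that was already present keeps its zone, even if it
		 * sits in ZONE_MOVABLE: the hint alone does not cover that case.
		 */
		pages[i].pins++;
	}
	return nr;
}

/* Existing callers keep today's behaviour through a one-line wrapper. */
static size_t model_get_user_pages(struct model_page *pages, size_t nr)
{
	return model_get_user_pages_hint(pages, nr, MODEL_GUP_MOVABLE_OK);
}
```

The model also makes one limitation visible: a hint only affects pages faulted in by the call itself, so a page already resident in ZONE_MOVABLE stays there and is still pinned. That fits the aio case, where the mapping is created just before it is pinned, better than it fits arbitrary callers.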
Thanks,
linfeng
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kamezawa Hiroyuki
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 16:13:16 +0900
Message-ID: <50B85C8C.2030702@jp.fujitsu.com>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:50957 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752040Ab2K3HNp (ORCPT ); Fri, 30 Nov 2012 02:13:45 -0500
In-Reply-To: <20121129215749.acfd872a.akpm@linux-foundation.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

(2012/11/30 14:57), Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> The offline retry timeout is 2 minutes, so O_DIRECT pages do not seem to be a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> Thanks for your advice.
>> I want to limit the impact as much as possible. As mentioned above, direct-io does not seem to be a problem, so we needn't touch it. Maybe we can change only the get_user_pages() users (in or around the call), such as the aio ring pages. I will try to find a way to do this.
>
> What about futexes?
>

IIUC, a futex's key is now a pair of (mm, address) or (inode, pgoff).
Then, the page taken by get_user_page() in futex.c is released by put_page().
'struct page' is only touched by get_futex_key() to obtain the page->mapping info.

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 11:39:40 +0800
Message-ID: <50B82A7C.8020202@cn.fujitsu.com>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <20121130000443.GK18574@lenny.home.zabbo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Andrew Morton, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Zach Brown
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:60187 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753986Ab2K3Dj7 convert rfc822-to-8bit (ORCPT ); Thu, 29 Nov 2012 22:39:59 -0500
In-Reply-To:
 <20121130000443.GK18574@lenny.home.zabbo.net>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hi Zach,

Thanks for your advice. I agree, and I will look into making aio use non-movable pages.

Thanks,
linfeng

On 11/30/2012 08:04 AM, Zach Brown wrote:
>> The best I can think of is to make changes in or around
>> get_user_pages(), to steal the pages from userspace and replace them
>> with non-movable ones before pinning them. The performance cost of
>> something like this would surely be unacceptable for direct-io, but
>> maybe OK for the aio ring and futexes.
>
> In the aio case it seems like it could be taught to populate the mapping
> with non-movable pages to begin with. It's calling get_user_pages() a
> few lines after instantiating the mapping itself with do_mmap_pgoff().
>
> - z
>

-- 
--------------------------------------------------
Lin Feng
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
PHONE：+86-25-86630566-8557
COINS：7998-8557
FAX：+86-25-83317685
MAIL：linfeng@cn.fujitsu.com
--------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 10:57:15 +0000
Message-ID: <20121130105715.GC8218@suse.de>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
 wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
Content-Disposition: inline
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>
Sender: owner-linux-aio@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote:
> On Thu, 29 Nov 2012 14:54:58 +0800
> Lin Feng wrote:
>
> > Hi all,
> >
> > We encountered a "Resource temporarily unavailable" failure while trying to offline a memory section in a movable zone: some pages can't be migrated. The offline operation fails because migrate_page_move_mapping() keeps returning -EAGAIN until the timeout expires, as the condition 'page_count(page) != 1' holds for these pages.
> > I wonder whether we should always wait (return -EAGAIN) in the case 'page_count(page) != 1'. In other words, can we do something here for migration if we know where the pages come from?
> >
> > We finally found that these pages are used by /sbin/multipathd as aio ring_pages. Besides the one increment introduced by the offline calling chain, another increment is added by aio_setup_ring() by calling get_user_pages(), and it won't be dropped until aio_free_ring() is called.
> >
> > The dump_page info in the offline context is shown below:
> > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> >
> > multipathd seems never to release the ring_pages until we reboot the box. Furthermore, someone could write an application that calls io_setup() but never calls io_destroy(), because it needs the context for a long time, because the author simply forgets to, or even on purpose, and we can't rule that out. So I think the mm-hotplug framework should be able to deal with such a situation. Should we consider adding migration support for such pages?
> >
> > However, I don't know whether there are other kinds of such special pages in the current kernel. If, unluckily, there are many, it will be hard to handle them all, and just adding migration support for aio ring_pages is insufficient.
> >
> > But if, luckily, there are only a few, could we use the private field of struct page to track the ring_pages[] pointer so that we can find the user during migration? That raises another problem: how do we distinguish such special pages? Using a page flag may affect the current page flag layout, and adding a new page flag also seems impossible.
> >
> > I'm not sure which approach is right and am seeking help.
> > Any comments are very welcome, thanks :)
>
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
>

If this happens, then it would be preferable if it only happened for ZONE_MOVABLE. If it happens generally, we're going to have a lot more MIGRATE_UNMOVABLE pageblocks and a lot more fragmentation, leading to lower THP availability. For THP, we're OK if some pageblocks are temporarily unavailable, or even unavailable for long periods of time; we can cope with that. But we (or I at least) do not want to lower THP availability on systems that do not care about ZONE_MOVABLE or node hot-plug.

-- 
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 00:00:43 -0800
Message-ID: <20121130000043.cf356676.akpm@linux-foundation.org>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B85C8C.2030702@jp.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Kamezawa Hiroyuki
Return-path:
In-Reply-To: <50B85C8C.2030702@jp.fujitsu.com>
Sender: owner-linux-aio@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Fri, 30 Nov 2012 16:13:16 +0900 Kamezawa Hiroyuki wrote:

> > What about futexes?
> >
>
> IIUC, a futex's key is now a pair of (mm, address) or (inode, pgoff).
> Then, the page taken by get_user_page() in futex.c is released by put_page().
> 'struct page' is only touched by get_futex_key() to obtain the page->mapping info.

Ah yes, that page is unpinned before syscall return.

grep -rl get_user_pages .

Gad. These should be audited. The great majority will be simple and OK, but drivers/media, drivers/infiniband and net/rds could be problematic.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 18:29:30 +0800
Message-ID: <50B88A8A.9020802@cn.fujitsu.com>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng
To: Andrew Morton
Return-path:
In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 11/30/2012 03:55 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote:
>
>>
>>
>> On 11/30/2012 01:57 PM, Andrew Morton wrote:
>>> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote:
>>>
>>>> hi Andrew,
>>>>
>>>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>>>> Tricky.
>>>>>
>>>>> I expect the same problem would occur with pages which are under
>>>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>>>> periods, but the durations could still be lengthy (seconds).
>>>> The offline retry timeout is 2 minutes, so O_DIRECT pages do not seem to be a problem for the moment.
>>>>>
>>>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>>>
>>>>> The best I can think of is to make changes in or around
>>>>> get_user_pages(), to steal the pages from userspace and replace them
>>>>> with non-movable ones before pinning them. The performance cost of
>>>>> something like this would surely be unacceptable for direct-io, but
>>>>> maybe OK for the aio ring and futexes.
>>>> Thanks for your advice.
>>>> I want to limit the impact as much as possible. As mentioned above, direct-io does not seem to be a problem, so we needn't touch it. Maybe we can change only the get_user_pages() users (in or around the call), such as the aio ring pages. I will try to find a way to do this.
>>>
>>> What about futexes?
>> hi Andrew,
>>
>> Yes, it would be better to find an approach that solves them all.
>>
>> But I'm worried that if we simply confine get_user_pages() to non-movable pages, it will drain the non-movable pages quickly, because many places use get_user_pages(), such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
>> IMHO in most cases get_user_pages() callers should release the pages soon, so pages allocated from the movable zone should be OK. But I'm not sure whether get_user_pages() has such a rule.
>> In the other cases, we could ask get_user_pages() to allocate pages from a non-movable zone.
>>
>> So could we add a zone-allocation flag when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
> But that means we'd also have to add get_user_pages_fast_non_movable()
> and things might become a bit stupid. A better approach might be to
hi Andrew,
Thanks for your patient reply.
What I can think of is something like the following:

inline int generic_get_user_pages(..., int movable_flag)
{
	if (0 == movable_flag)
		return get_user_pages();
	else if (1 == movable_flag)
		return get_user_pages_non_movable();
}

Yes, that seems to add a lot of duplicated code.
> add a new library function which callers can use before (or after?)
> calling get_user_pages[_fast]().
Sorry, I don't quite understand what "library function" means.. Does it mean a function that aids get_user_pages(), one that totally wraps/replaces get_user_pages(), or neither?

Thanks,
linfeng
>
> Unsure. It's the sort of thing where one has to dive in and try a few
> things.
ah, it may be more complicated than I expected..
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Date: Fri, 30 Nov 2012 11:00:59 +0000
Message-ID: <20121130110059.GD8218@suse.de>
References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
To: Andrew Morton
Return-path:
Content-Disposition: inline
In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote:
>
> >
> >
> > On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote:
> > >
> > >> hi Andrew,
> > >>
> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > >>> Tricky.
> > >>>
> > >>> I expect the same problem would occur with pages which are under
> > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > >>> periods, but the durations could still be lengthy (seconds).
> > >> The offline retry timeout is 2 minutes, so O_DIRECT pages do not seem to be a problem for the moment.
> > >>>
> > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>>
> > >>> The best I can think of is to make changes in or around
> > >>> get_user_pages(), to steal the pages from userspace and replace them
> > >>> with non-movable ones before pinning them. The performance cost of
> > >>> something like this would surely be unacceptable for direct-io, but
> > >>> maybe OK for the aio ring and futexes.
> > >> Thanks for your advice.
> > >> I want to limit the impact as much as possible. As mentioned above, direct-io does not seem to be a problem, so we needn't touch it. Maybe we can change only the get_user_pages() users (in or around the call), such as the aio ring pages. I will try to find a way to do this.
> > >
> > > What about futexes?
> > hi Andrew,
> >
> > Yes, it would be better to find an approach that solves them all.
> >
> > But I'm worried that if we simply confine get_user_pages() to non-movable pages, it will drain the non-movable pages quickly, because many places use get_user_pages(), such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
> > IMHO in most cases get_user_pages() callers should release the pages soon, so pages allocated from the movable zone should be OK. But I'm not sure whether get_user_pages() has such a rule.
> > In the other cases, we could ask get_user_pages() to allocate pages from a non-movable zone.
> > > > So could we add a zone-alloc flags when we call get_user_pages()? > > Well, that's a fairly low-level implementation detail. A more typical > approach would be to add a new get_user_pages_non_movable() or such. > That would probably have the same signature as get_user_pages(), with > one additional argument. Then get_user_pages() becomes a one-line > wrapper which passes in a particular value of that argument. > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE allocations. That will impact THP availability by increasing the number of MIGRATE_UNMOVABLE blocks that exist and it would hit every user -- not just those that care about ZONE_MOVABLE. I'm likely to NAK such a patch if it's only about node hot-remove because it's much more of a corner case than wanting to use THP. I would prefer if get_user_pages() checked if the page it was about to pin was in ZONE_MOVABLE and if so, migrate it at that point before it's pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability if that's what they want. The CMA people might also want to take advantage of this if the page happened to be in the MIGRATE_CMA pageblock. 
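A minimal userspace sketch of the "migrate before pin" idea above, assuming toy stand-ins throughout: struct toy_page, TOY_ZONE_*, toy_alloc_normal_page(), toy_pin_page_outside_movable() and toy_try_offline() are hypothetical names, not the real get_user_pages() signature or the kernel's migration code. The pins counter plays the role of the extra reference that made 'page_count(page) != 1' in the original report, and the copy plus pointer swap stands in for a real migrate-and-remap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_BYTES 64   /* toy page size, not the kernel's PAGE_SIZE */
#define TOY_POOL_PAGES 8    /* free pages available outside ZONE_MOVABLE */

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	bool in_use;
	int pins;                    /* long-lived references, like a gup pin */
	char data[TOY_PAGE_BYTES];
};

/* Hypothetical pool standing in for free non-movable memory. */
static struct toy_page toy_normal_pool[TOY_POOL_PAGES];

static struct toy_page *toy_alloc_normal_page(void)
{
	for (size_t i = 0; i < TOY_POOL_PAGES; i++) {
		if (!toy_normal_pool[i].in_use) {
			toy_normal_pool[i].in_use = true;
			toy_normal_pool[i].zone = TOY_ZONE_NORMAL;
			toy_normal_pool[i].pins = 0;
			return &toy_normal_pool[i];
		}
	}
	return NULL;
}

/*
 * Pin the page that *slot (the user's "mapping") points to. If it sits in
 * ZONE_MOVABLE, first copy it to a non-movable page and repoint *slot, so
 * the movable original never carries the pin and stays offlinable.
 */
static struct toy_page *toy_pin_page_outside_movable(struct toy_page **slot)
{
	struct toy_page *page = *slot;

	if (page->zone == TOY_ZONE_MOVABLE) {
		struct toy_page *dst = toy_alloc_normal_page();

		if (dst == NULL)
			return NULL;     /* no non-movable memory left */
		memcpy(dst->data, page->data, TOY_PAGE_BYTES);
		*slot = dst;             /* "remap" the user address */
		page->in_use = false;    /* old page can now be freed/offlined */
		page = dst;
	}
	page->pins++;
	return page;
}

/* Offline succeeds only when no extra pin holds the page. */
static int toy_try_offline(const struct toy_page *page)
{
	return page->pins == 0 ? 0 : -EAGAIN;
}
```

The cost is one copy per pinned page that happens to live in ZONE_MOVABLE, which is the expense Mel accepts in exchange for keeping ZONE_MOVABLE offlinable.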
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Thu, 29 Nov 2012 21:57:49 -0800 Message-ID: <20121129215749.acfd872a.akpm@linux-foundation.org> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Lin Feng Return-path: In-Reply-To: <50B82B0D.8010206@cn.fujitsu.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > hi Andrew, > > On 11/30/2012 07:39 AM, Andrew Morton wrote: > > Tricky. > > > > I expect the same problem would occur with pages which are under > > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > > periods, but the durations could still be lengthy (seconds). > the offline retry timeout duration is 2 minutes, so to O_DIRECT pages > seem maybe not a problem for the moment. > > > > Worse is a futex page, which could easily remain pinned indefinitely. > > > > The best I can think of is to make changes in or around > > get_user_pages(), to steal the pages from userspace and replace them > > with non-movable ones before pinning them. The performance cost of > > something like this would surely be unacceptable for direct-io, but > > maybe OK for the aio ring and futexes. > thanks for your advice. 
> I want to limit the impact as little as possible, as mentioned above, > direct-io seems not a problem, we needn't touch them. Maybe we can > just change the use of get_user_pages()(in or around) such as aio > ring pages. I will try to find a way to do this. What about futexes? From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Fri, 30 Nov 2012 02:47:55 -0800 Message-ID: <20121130024755.b5dae17e.akpm@linux-foundation.org> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> <50B88A8A.9020802@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Lin Feng Return-path: In-Reply-To: <50B88A8A.9020802@cn.fujitsu.com> Sender: owner-linux-aio@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote: > > add a new library function which callers can use before (or after?) > > calling get_user_pages[_fast](). > Sorry, I'm not quite understand what "library function" function means.. > Does it means a function aids get_user_pages() or totally wraps/replaces > get_user_pages(), or none of above? "library function" is terminology for a general facility which the core kernel makes available to other parts of the kernel. get_user_pages() is a library function, as are the functions in lib/, etc. 
"grep EXPORT_SYMBOL ./*/*.c" -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: aart@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Thu, 29 Nov 2012 23:55:02 -0800 Message-ID: <20121129235502.05223586.akpm@linux-foundation.org> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Lin Feng Return-path: In-Reply-To: <50B859C6.3020707@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote: > > > On 11/30/2012 01:57 PM, Andrew Morton wrote: > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > > > >> hi Andrew, > >> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote: > >>> Tricky. > >>> > >>> I expect the same problem would occur with pages which are under > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > >>> periods, but the durations could still be lengthy (seconds). > >> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages > >> seem maybe not a problem for the moment. > >>> > >>> Worse is a futex page, which could easily remain pinned indefinitely. 
> >>> > >>> The best I can think of is to make changes in or around > >>> get_user_pages(), to steal the pages from userspace and replace them > >>> with non-movable ones before pinning them. The performance cost of > >>> something like this would surely be unacceptable for direct-io, but > >>> maybe OK for the aio ring and futexes. > >> thanks for your advice. > >> I want to limit the impact as little as possible, as mentioned above, > >> direct-io seems not a problem, we needn't touch them. Maybe we can > >> just change the use of get_user_pages()(in or around) such as aio > >> ring pages. I will try to find a way to do this. > > > > What about futexes? > hi Andrew, > > Yes, better to find an approach to solve them all. > > But I'm worried about that if we just confine get_user_pages() to use > none-movable pages, it will drain the none-movable pages soon. Because > there are many places using get_user_pages() such as some drivers. Obviously we shouldn't change get_user_pages() for all callers. > IMHO in most cases get_user_pages() callers should release the pages soon, > so pages allocated from movable zone should be OK. But I'm not sure if > we get such rule upon get_user_pages(). > And in other cases we specify get_user_pages() to allocate pages from > none-movable zone. > > So could we add a zone-alloc flags when we call get_user_pages()? Well, that's a fairly low-level implementation detail. A more typical approach would be to add a new get_user_pages_non_movable() or such. That would probably have the same signature as get_user_pages(), with one additional argument. Then get_user_pages() becomes a one-line wrapper which passes in a particular value of that argument. But that means we'd also have to add get_user_pages_fast_non_movable() and things might become a bit stupid. A better approach might be to add a new library function which callers can use before (or after?) calling get_user_pages[_fast](). Unsure. 
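The wrapper shape Andrew describes can be sketched as follows. Everything here is a hypothetical userspace stand-in (toy_get_user_pages(), toy_get_user_pages_non_movable(), struct toy_page), not the kernel's get_user_pages() signature, and "migration" is reduced to relabelling the zone. The point is that the extra argument lives in one worker, so the existing entry point stays a one-liner instead of duplicating the page-walking code as a two-way dispatcher would.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	int pins;
};

/* Stand-in for "copy to a freshly allocated non-movable page and remap". */
static void toy_move_out_of_movable(struct toy_page *page)
{
	page->zone = TOY_ZONE_NORMAL;
}

/*
 * Hypothetical worker: the plain call's arguments plus one flag.
 * Returns the number of pages pinned.
 */
static int toy_get_user_pages_non_movable(struct toy_page *pages, int nr_pages,
					  bool avoid_movable)
{
	assert(nr_pages >= 0);
	for (int i = 0; i < nr_pages; i++) {
		if (avoid_movable && pages[i].zone == TOY_ZONE_MOVABLE)
			toy_move_out_of_movable(&pages[i]);
		pages[i].pins++;
	}
	return nr_pages;
}

/* Existing entry point becomes a one-line wrapper; callers are unchanged. */
static int toy_get_user_pages(struct toy_page *pages, int nr_pages)
{
	return toy_get_user_pages_non_movable(pages, nr_pages, false);
}
```

The same pairing would be needed again for get_user_pages_fast(), which is the duplication Andrew flags; a separate helper called before the pin would avoid it.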
It's the sort of thing where one has to dive in and try a few things. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Domenico Andreoli Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Fri, 30 Nov 2012 16:24:21 +0100 Message-ID: <20121130152421.GA19849@glitch> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Lin Feng Return-path: Content-Disposition: inline In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> Sender: owner-linux-aio@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote: > Hi all, Hi Lin, > We encounter a "Resource temporarily unavailable" fail while trying > to offline a memory section in a movable zone. We found that there are > some pages can't be migrated. The offline operation fails in function > migrate_page_move_mapping() returning -EAGAIN till timeout because > the if assertion 'page_count(page) != 1' fails. is this something that worked before? if yes (then it's a regression) do you know with which kernel? Thanks, Domenico
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lin Feng Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Mon, 03 Dec 2012 10:05:04 +0800 Message-ID: <50BC08D0.5070006@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121130152421.GA19849@glitch> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: cavokz@gmail.com Return-path: In-Reply-To: <20121130152421.GA19849@glitch> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org hi Domenico, Sorry for my late reply and thanks for your attention, see below :) On 11/30/2012 11:24 PM, Domenico Andreoli wrote: > On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote: >> Hi all, > > Hi Lin, > >> We encounter a "Resource temporarily unavailable" fail while trying >> to offline a memory section in a movable zone. We found that there are >> some pages can't be migrated. The offline operation fails in function >> migrate_page_move_mapping() returning -EAGAIN till timeout because >> the if assertion 'page_count(page) != 1' fails. > > is this something that worked before? if yes (then it's a regression) > do you know with which kernel? I think this problem has existed ever since we got the offline feature, though I'm not sure from which version :) It can only be reproduced on a system configured with a movable zone that holds pages pinned by get_user_pages() for a long time.
Maybe we could also reproduce it by writing an app that just calls the io_setup() syscall and never releases the context until it dies. Then locate the memory section from which the ring pages were allocated and try to offline it. In fact, for anyone who doesn't use the memory offline/hotplug feature, it's not a bug :) Thanks, linfeng > > Thanks, > Domenico > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lin Feng Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Mon, 03 Dec 2012 10:52:27 +0800 Message-ID: <50BC13EB.1050009@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> <20121130110059.GD8218@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Cc: Andrew Morton , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Mel Gorman Return-path: In-Reply-To: <20121130110059.GD8218@suse.de> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On 11/30/2012 07:00 PM, Mel Gorman wrote: >> >> Well, that's a fairly low-level implementation detail. A more typical >> approach would be to add a new get_user_pages_non_movable() or such. >> That would probably have the same signature as get_user_pages(), with >> one additional argument. Then get_user_pages() becomes a one-line >> wrapper which passes in a particular value of that argument. >> > > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE > allocations.
That will impact THP availability by increasing the number > of MIGRATE_UNMOVABLE blocks that exist and it would hit every user -- > not just those that care about ZONE_MOVABLE. > > I'm likely to NAK such a patch if it's only about node hot-remove because > it's much more of a corner case than wanting to use THP. > > I would prefer if get_user_pages() checked if the page it was about to > pin was in ZONE_MOVABLE and if so, migrate it at that point before it's > pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability > if that's what they want. The CMA people might also want to take > advantage of this if the page happened to be in the MIGRATE_CMA > pageblock. > hi Mel, Thanks for your suggestion. My initial idea was also to keep the impact as small as possible, and only migrate such pages as needed. But even among such "about to be pinned" pages, most are going to be released soon, so dealing with them all in the same way is really *expensive*. Maybe we do have to find another way that makes everybody happy :) Thanks, linfeng
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lin Feng Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Mon, 03 Dec 2012 11:00:49 +0800 Message-ID: <50BC15E1.8060806@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> <50B88A8A.9020802@cn.fujitsu.com> <20121130024755.b5dae17e.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng To: Andrew Morton Return-path: In-Reply-To: <20121130024755.b5dae17e.akpm@linux-foundation.org> Sender: owner-linux-aio@kvack.org List-Id: linux-fsdevel.vger.kernel.org On 11/30/2012 06:47 PM, Andrew Morton wrote: > On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote: > >>> add a new library function which callers can use before (or after?) >>> calling get_user_pages[_fast](). >> Sorry, I'm not quite understand what "library function" function means.. >> Does it means a function aids get_user_pages() or totally wraps/replaces >> get_user_pages(), or none of above? > > "library function" is terminology for a general facility which > the core kernel makes available to other parts of the kernel. > get_user_pages() is a library function, as are the functions in lib/, > etc.
"grep EXPORT_SYMBOL ./*/*.c" hi Andrew, Thanks for your explanation and sorry for my ignorant question :) As Mel said... Still, I can't find a way to make everybody happy. Thanks, linfeng > > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mel Gorman Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Date: Mon, 3 Dec 2012 11:37:04 +0000 Message-ID: <20121203113704.GK8218@suse.de> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> <20121130110059.GD8218@suse.de> <50BC13EB.1050009@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Cc: Andrew Morton , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Lin Feng Return-path: Content-Disposition: inline In-Reply-To: <50BC13EB.1050009@cn.fujitsu.com> Sender: owner-linux-aio@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Mon, Dec 03, 2012 at 10:52:27AM +0800, Lin Feng wrote: > > > On 11/30/2012 07:00 PM, Mel Gorman wrote: > >> > >> Well, that's a fairly low-level implementation detail. A more typical > >> approach would be to add a new get_user_pages_non_movable() or such. > >> That would probably have the same signature as get_user_pages(), with > >> one additional argument.
Then get_user_pages() becomes a one-line > >> wrapper which passes in a particular value of that argument. > >> > > > > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE > > allocations. That will impact THP availability by increasing the number > > of MIGRATE_UNMOVABLE blocks that exist and it would hit every user -- > > not just those that care about ZONE_MOVABLE. > > > > I'm likely to NAK such a patch if it's only about node hot-remove because > > it's much more of a corner case than wanting to use THP. > > > > I would prefer if get_user_pages() checked if the page it was about to > > pin was in ZONE_MOVABLE and if so, migrate it at that point before it's > > pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability > > if that's what they want. The CMA people might also want to take > > advantage of this if the page happened to be in the MIGRATE_CMA > > pageblock. > > > hi Mel, > > Thanks for your suggestion. > My initial idea is also to restrict the impact as little as possible so > migrate such pages as we need. > But even to such "going to pin pages", most of them are going to be released > soon, so deal with them all in the same way is really *expensive*. > Then you need to somehow distinguish between short-lived pins and long-lived pins and only migrate the long-lived pins. I didn't research how this could be implemented. -- Mel Gorman SUSE Labs
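One way to read Mel's short-lived versus long-lived distinction, sketched with hypothetical names (enum toy_pin_kind, toy_pin(), toy_unpin(), toy_try_offline()): the caller declares up front whether a pin is expected to be long-lived (the aio ring or a futex page in this thread) or short-lived (O_DIRECT), and only long-lived pins pay for moving the page out of ZONE_MOVABLE. The thread does not settle how such a hint would be expressed; this is an assumption for illustration, and relabelling the zone again stands in for a real migration.

```c
#include <assert.h>
#include <errno.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };
enum toy_pin_kind { TOY_PIN_SHORT_TERM, TOY_PIN_LONG_TERM };

struct toy_page {
	enum toy_zone zone;
	int pins;
};

static void toy_pin(struct toy_page *page, enum toy_pin_kind kind)
{
	/* Only long-lived pins pay the migration cost. */
	if (kind == TOY_PIN_LONG_TERM && page->zone == TOY_ZONE_MOVABLE)
		page->zone = TOY_ZONE_NORMAL;   /* stand-in for migrate + remap */
	page->pins++;
}

static void toy_unpin(struct toy_page *page)
{
	assert(page->pins > 0);
	page->pins--;
}

/*
 * A page outside ZONE_MOVABLE no longer blocks offlining the section.
 * A pinned movable page makes offline retry (-EAGAIN) until the pin drops.
 */
static int toy_try_offline(const struct toy_page *page)
{
	if (page->zone != TOY_ZONE_MOVABLE)
		return 0;
	return page->pins == 0 ? 0 : -EAGAIN;
}
```

Short-lived pins still cause -EAGAIN retries, which is tolerable only if they drop within the offline timeout; long-lived pins never reach the offline path in the first place.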
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 29 Nov 2012 15:39:30 -0800 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-Id: <20121129153930.477e9709.akpm@linux-foundation.org> In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Lin Feng Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 29 Nov 2012 14:54:58 +0800 Lin Feng wrote: > Hi all, > > We encounter a "Resource temporarily unavailable" fail while trying > to offline a memory section in a movable zone. We found that there are > some pages can't be migrated. The offline operation fails in function > migrate_page_move_mapping() returning -EAGAIN till timeout because > the if assertion 'page_count(page) != 1' fails. > I wonder in the case 'page_count(page) != 1', should we always wait > (return -EAGAING)? Or in other words, can we do something here for > migration if we know where the pages from? > > And finally found that such pages are used by /sbin/multipathd in the form > of aio ring_pages. Besides once increment introduced by the offline calling > chain, another increment is added by aio_setup_ring() via callling > get_userpages(), it won't decrease until we call aio_free_ring().
> > The dump_page info in the offline context is showed as following: > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > > The multipathd seems never going to release the ring_pages until we reboot the box. > Furthermore, if some guy makes app which only calls io_setup() but never calls > io_destroy() for the reason that he has to keep the io_setup() for a long time > or just forgets to or even on purpose that we can't expect. > So I think the mm-hotplug framwork should get the capability to deal with such > situation. And should we consider adding migration support for such pages? > > However I don't know if there are any other kinds of such particular pages in > current kernel/Linux system. If unluckily there are many apparently it's hard to > handle them all, just adding migrate support for aio ring_pages is insufficient. > > But if luckily can we use the private field of page struct to track the > ring_pages[] pointer so that we can retrieve the user when migrate? > Doing so another problem occurs, how to distinguish such special pages? > Use pageflag may cause an impact on current pageflag layout, add new pageflag > item also seems to be impossible. > > I'm not sure what way is the right approach, seeking for help. > Any comments are extremely needed, thanks :) Tricky. 
I expect the same problem would occur with pages which are under O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long periods, but the durations could still be lengthy (seconds). Worse is a futex page, which could easily remain pinned indefinitely. The best I can think of is to make changes in or around get_user_pages(), to steal the pages from userspace and replace them with non-movable ones before pinning them. The performance cost of something like this would surely be unacceptable for direct-io, but maybe OK for the aio ring and futexes. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 29 Nov 2012 16:04:43 -0800 From: Zach Brown Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-ID: <20121130000443.GK18574@lenny.home.zabbo.net> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Lin Feng , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org > The best I can think of is to make changes in or around > get_user_pages(), to steal the pages from userspace and replace them > with non-movable ones before pinning them.
The performance cost of > something like this would surely be unacceptable for direct-io, but > maybe OK for the aio ring and futexes. In the aio case it seems like it could be taught to populate the mapping with non-movable pages to begin with. It's calling get_user_pages() a few lines after instantiating the mapping itself with do_mmap_pgoff(). - z From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <50B82A7C.8020202@cn.fujitsu.com> Date: Fri, 30 Nov 2012 11:39:40 +0800 From: Lin Feng MIME-Version: 1.0 Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <20121130000443.GK18574@lenny.home.zabbo.net> In-Reply-To: <20121130000443.GK18574@lenny.home.zabbo.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Zach Brown Cc: Andrew Morton , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Hi Zach, Thanks for your advice. Agreed; I will look into making aio use non-movable pages. Thanks, linfeng On 11/30/2012 08:04 AM, Zach Brown wrote: >> The best I can think of is to make changes in or around >> get_user_pages(), to steal the pages from userspace and replace them >> with non-movable ones before pinning them.
The performance cost of >> something like this would surely be unacceptable for direct-io, but >> maybe OK for the aio ring and futexes. > > In the aio case it seems like it could be taught to populate the mapping > with non-movable pages to begin with. It's calling get_user_pages() a > few lines after instantiating the mapping itself with do_mmap_pgoff(). > > - z > -- -------------------------------------------------- Lin Feng Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China PHONE：+86-25-86630566-8557 COINS：7998-8557 FAX：+86-25-83317685 MAIL：linfeng@cn.fujitsu.com -------------------------------------------------- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <50B82B0D.8010206@cn.fujitsu.com> Date: Fri, 30 Nov 2012 11:42:05 +0800 From: Lin Feng MIME-Version: 1.0 Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org hi Andrew, On 11/30/2012 07:39 AM, Andrew Morton wrote: > Tricky.
> > I expect the same problem would occur with pages which are under > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > periods, but the durations could still be lengthy (seconds). The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem to be a problem for the moment. > > Worse is a futex page, which could easily remain pinned indefinitely. > > The best I can think of is to make changes in or around > get_user_pages(), to steal the pages from userspace and replace them > with non-movable ones before pinning them. The performance cost of > something like this would surely be unacceptable for direct-io, but > maybe OK for the aio ring and futexes. Thanks for your advice. I want to keep the impact as small as possible. As mentioned above, direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can just change how get_user_pages() is used (in or around it) for cases such as the aio ring pages. I will try to find a way to do this. Thanks, linfeng
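The timing argument in this exchange (O_DIRECT pins that last seconds, a 2-minute offline retry window, and aio ring pins that last until io_destroy()) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct pinned_page_model`, `offline_succeeds()` and `OFFLINE_RETRY_SECS` are hypothetical names, and the model only encodes the rule described in the report, namely that migration keeps returning -EAGAIN while a page holds references beyond the one migration itself took.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Retry window used by the model; mirrors the 2-minute timeout quoted above. */
#define OFFLINE_RETRY_SECS 120.0

/* Hypothetical model of one page that offlining needs to migrate. */
struct pinned_page_model {
	int extra_refs;      /* references beyond the one migration itself holds */
	double release_secs; /* when the holder drops them; negative means never */
};

/*
 * Migration of a page keeps failing with -EAGAIN while it still has extra
 * references, so offlining succeeds only if every such page is released
 * before the retry window closes.
 */
static bool offline_succeeds(const struct pinned_page_model *pages, size_t n,
			     double window_secs)
{
	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pages[i].extra_refs == 0)
			continue;     /* migratable right away */
		if (pages[i].release_secs < 0.0)
			return false; /* held until teardown, like an aio ring page */
		if (pages[i].release_secs > window_secs)
			return false; /* pin outlives the retry window */
	}
	return true;
}
```

In this model an O_DIRECT-style page released after 5 seconds still lets offlining succeed, while a ring page whose context is never destroyed makes it fail no matter how long the window is.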
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 29 Nov 2012 21:57:49 -0800 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-Id: <20121129215749.acfd872a.akpm@linux-foundation.org> In-Reply-To: <50B82B0D.8010206@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Lin Feng Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > hi Andrew, > > On 11/30/2012 07:39 AM, Andrew Morton wrote: > > Tricky. > > > > I expect the same problem would occur with pages which are under > > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > > periods, but the durations could still be lengthy (seconds). > The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem > to be a problem for the moment. > > > > Worse is a futex page, which could easily remain pinned indefinitely. > > > > The best I can think of is to make changes in or around > > get_user_pages(), to steal the pages from userspace and replace them > > with non-movable ones before pinning them. The performance cost of > > something like this would surely be unacceptable for direct-io, but > > maybe OK for the aio ring and futexes. > Thanks for your advice. > I want to keep the impact as small as possible. As mentioned above, > direct-io doesn't seem to be a problem, so we needn't touch it.
Maybe we can > just change how get_user_pages() is used (in or around it) for cases > such as the aio ring pages. I will try to find a way to do this. What about futexes? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <50B859C6.3020707@cn.fujitsu.com> Date: Fri, 30 Nov 2012 15:01:26 +0800 From: Lin Feng MIME-Version: 1.0 Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> In-Reply-To: <20121129215749.acfd872a.akpm@linux-foundation.org> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/30/2012 01:57 PM, Andrew Morton wrote: > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > >> hi Andrew, >> >> On 11/30/2012 07:39 AM, Andrew Morton wrote: >>> Tricky. >>> >>> I expect the same problem would occur with pages which are under >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long >>> periods, but the durations could still be lengthy (seconds). >> The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem >> to be a problem for the moment. >>> >>> Worse is a futex page, which could easily remain pinned indefinitely.
>>> >>> The best I can think of is to make changes in or around >>> get_user_pages(), to steal the pages from userspace and replace them >>> with non-movable ones before pinning them. The performance cost of >>> something like this would surely be unacceptable for direct-io, but >>> maybe OK for the aio ring and futexes. >> Thanks for your advice. >> I want to keep the impact as small as possible. As mentioned above, >> direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can >> just change how get_user_pages() is used (in or around it) for cases >> such as the aio ring pages. I will try to find a way to do this. > > What about futexes? hi Andrew, Yes, it would be better to find an approach that solves them all. But I'm worried that if we just make get_user_pages() use non-movable pages, the non-movable pages will soon be drained, because many places use get_user_pages(), such as some drivers. IMHO in most cases get_user_pages() callers should release the pages soon, so pages allocated from the movable zone should be OK. But I'm not sure whether there is such a rule for get_user_pages(). For the other cases, we would tell get_user_pages() to allocate pages from a non-movable zone. So could we add a zone-allocation flag when we call get_user_pages()? Thanks, linfeng >
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <50B85C8C.2030702@jp.fujitsu.com> Date: Fri, 30 Nov 2012 16:13:16 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> In-Reply-To: <20121129215749.acfd872a.akpm@linux-foundation.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Lin Feng , viro@zeniv.linux.org.uk, bcrl@kvack.org, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org (2012/11/30 14:57), Andrew Morton wrote: > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > >> hi Andrew, >> >> On 11/30/2012 07:39 AM, Andrew Morton wrote: >>> Tricky. >>> >>> I expect the same problem would occur with pages which are under >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long >>> periods, but the durations could still be lengthy (seconds). >> The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem >> to be a problem for the moment. >>> >>> Worse is a futex page, which could easily remain pinned indefinitely. >>> >>> The best I can think of is to make changes in or around >>> get_user_pages(), to steal the pages from userspace and replace them >>> with non-movable ones before pinning them. The performance cost of >>> something like this would surely be unacceptable for direct-io, but >>> maybe OK for the aio ring and futexes. >> Thanks for your advice.
>> I want to keep the impact as small as possible. As mentioned above, >> direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can >> just change how get_user_pages() is used (in or around it) for cases >> such as the aio ring pages. I will try to find a way to do this. > > What about futexes? > IIUC, the futex key is now a pair of (mm, address) or (inode, pgoff). So the page that futex.c takes with get_user_pages_fast() is released with put_page(). 'struct page' is only touched by get_futex_key() to obtain the page->mapping info. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2012 00:00:43 -0800 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-Id: <20121130000043.cf356676.akpm@linux-foundation.org> In-Reply-To: <50B85C8C.2030702@jp.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B85C8C.2030702@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: Lin Feng , viro@zeniv.linux.org.uk, bcrl@kvack.org, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Fri, 30 Nov 2012 16:13:16 +0900 Kamezawa Hiroyuki wrote: > > What about futexes? > > > > IIUC, the futex key is now a pair of (mm, address) or (inode, pgoff). > So the page that futex.c takes with get_user_pages_fast() is released with put_page().
> 'struct page' is only touched by get_futex_key() to obtain the page->mapping info. Ah yes, that page is unpinned before syscall return. grep -rl get_user_pages . Gad. These should be audited. The great majority will be simple and OK, but drivers/media, drivers/infiniband and net/rds could be problematic. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <50B88A8A.9020802@cn.fujitsu.com> Date: Fri, 30 Nov 2012 18:29:30 +0800 From: Lin Feng MIME-Version: 1.0 Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng On 11/30/2012 03:55 PM, Andrew Morton wrote: > On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote: > >> >> >> On 11/30/2012 01:57 PM, Andrew Morton wrote: >>> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: >>> >>>> hi Andrew, >>>> >>>> On 11/30/2012 07:39 AM, Andrew Morton wrote: >>>>> Tricky. >>>>> >>>>> I expect the same problem would occur with pages which are under >>>>> O_DIRECT I/O.
Obviously O_DIRECT pages won't be pinned for such long >>>>> periods, but the durations could still be lengthy (seconds). >>>> The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem >>>> to be a problem for the moment. >>>>> >>>>> Worse is a futex page, which could easily remain pinned indefinitely. >>>>> >>>>> The best I can think of is to make changes in or around >>>>> get_user_pages(), to steal the pages from userspace and replace them >>>>> with non-movable ones before pinning them. The performance cost of >>>>> something like this would surely be unacceptable for direct-io, but >>>>> maybe OK for the aio ring and futexes. >>>> Thanks for your advice. >>>> I want to keep the impact as small as possible. As mentioned above, >>>> direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can >>>> just change how get_user_pages() is used (in or around it) for cases >>>> such as the aio ring pages. I will try to find a way to do this. >>> >>> What about futexes? >> hi Andrew, >> >> Yes, it would be better to find an approach that solves them all. >> >> But I'm worried that if we just make get_user_pages() use >> non-movable pages, the non-movable pages will soon be drained, because >> many places use get_user_pages(), such as some drivers. > > Obviously we shouldn't change get_user_pages() for all callers. > >> IMHO in most cases get_user_pages() callers should release the pages soon, >> so pages allocated from the movable zone should be OK. But I'm not sure whether >> there is such a rule for get_user_pages(). >> For the other cases, we would tell get_user_pages() to allocate pages from >> a non-movable zone. >> >> So could we add a zone-allocation flag when we call get_user_pages()? > > Well, that's a fairly low-level implementation detail. A more typical > approach would be to add a new get_user_pages_non_movable() or such. > That would probably have the same signature as get_user_pages(), with > one additional argument.
Then get_user_pages() becomes a one-line > wrapper which passes in a particular value of that argument. > > But that means we'd also have to add get_user_pages_fast_non_movable() > and things might become a bit stupid. A better approach might be to hi Andrew, Thanks for your patient reply. What I can come up with is something like the following: inline int generic_get_user_pages(..., int movable_flag) { if (0 == movable_flag) return get_user_pages(); else if (1 == movable_flag) return get_user_pages_non_movable(); } Yes, that seems to add a lot of duplicated code. > add a new library function which callers can use before (or after?) > calling get_user_pages[_fast](). Sorry, I don't quite understand what "library function" means. Does it mean a function that assists get_user_pages(), one that totally wraps/replaces get_user_pages(), or neither? Thanks, linfeng > > Unsure. It's the sort of thing where one has to dive in and try a few > things. Ah, maybe it's more complicated than I expected. >
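Andrew's suggestion quoted above (one implementation that takes an extra argument, with get_user_pages() reduced to a one-line wrapper) avoids the duplication Lin worries about, because both entry points share a single body. The sketch below is a userspace model of that shape only. `gup_placed()`, `enum gup_placement`, `struct model_page` and the `model_` wrappers are hypothetical names, not kernel APIs, and "migrate, then pin" is reduced to clearing a flag and bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placement policy passed as the one additional argument. */
enum gup_placement {
	GUP_PLACE_ANY,         /* current behaviour: pin the page where it is */
	GUP_PLACE_NON_MOVABLE, /* move out of the movable zone before pinning */
};

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool movable; /* true if the page sits in ZONE_MOVABLE */
	int refcount;
};

/* The single shared implementation; returns the number of pages pinned. */
static int gup_placed(struct model_page *pages, int nr, enum gup_placement where)
{
	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (where == GUP_PLACE_NON_MOVABLE && pages[i].movable)
			pages[i].movable = false; /* stands in for "migrate, then pin" */
		pages[i].refcount++;              /* the pin itself */
	}
	return nr;
}

/* Existing callers keep their behaviour through one-line wrappers. */
static int model_get_user_pages(struct model_page *pages, int nr)
{
	return gup_placed(pages, nr, GUP_PLACE_ANY);
}

static int model_get_user_pages_non_movable(struct model_page *pages, int nr)
{
	return gup_placed(pages, nr, GUP_PLACE_NON_MOVABLE);
}
```

Adding a get_user_pages_fast() counterpart would double the set of wrappers again, which is the cost Andrew points out.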
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2012 02:47:55 -0800 From: Andrew Morton Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-Id: <20121130024755.b5dae17e.akpm@linux-foundation.org> In-Reply-To: <50B88A8A.9020802@cn.fujitsu.com> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> <50B88A8A.9020802@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Lin Feng Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote: > > add a new library function which callers can use before (or after?) > > calling get_user_pages[_fast](). > Sorry, I don't quite understand what "library function" means. > Does it mean a function that assists get_user_pages(), one that totally > wraps/replaces get_user_pages(), or neither? "library function" is terminology for a general facility which the core kernel makes available to other parts of the kernel. get_user_pages() is a library function, as are the functions in lib/, etc. "grep EXPORT_SYMBOL ./*/*.c"
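Read that way, the helper Andrew floated would be a separate facility that a caller runs over its user range before (or after) get_user_pages[_fast](), leaving get_user_pages() itself unchanged. Below is a rough userspace model of the "before" variant. `prepare_range_non_movable()`, `model_pin()`, `pin_long_lived()` and `struct model_page` are hypothetical names, not existing kernel functions, and moving a page is modelled by clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool movable; /* true if the page sits in ZONE_MOVABLE */
	int refcount;
};

/*
 * Hypothetical helper a caller would run on a range it intends to pin for
 * a long time. Returns how many pages it had to move out of the movable
 * zone; "moving" is modelled by clearing the flag.
 */
static size_t prepare_range_non_movable(struct model_page *pages, size_t nr)
{
	size_t moved = 0;

	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (pages[i].movable) {
			pages[i].movable = false;
			moved++;
		}
	}
	return moved;
}

/* Stand-in for the unchanged get_user_pages(): it only takes references. */
static void model_pin(struct model_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		pages[i].refcount++;
}

/* A long-lived user such as the aio ring would call the helper first. */
static size_t pin_long_lived(struct model_page *pages, size_t nr)
{
	size_t moved = prepare_range_non_movable(pages, nr);

	model_pin(pages, nr);
	return moved;
}
```

The appeal of this shape is that only callers that know their pins are long-lived pay for the move, while short-lived users such as O_DIRECT keep calling get_user_pages() exactly as before.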
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2012 10:57:15 +0000 From: Mel Gorman Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-ID: <20121130105715.GC8218@suse.de> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Lin Feng , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote: > On Thu, 29 Nov 2012 14:54:58 +0800 > Lin Feng wrote: > > > Hi all, > > > > We encountered a "Resource temporarily unavailable" failure while trying > > to offline a memory section in a movable zone. We found that > > some pages can't be migrated. The offline operation fails in > > migrate_page_move_mapping(), which keeps returning -EAGAIN until the timeout > > because the condition 'page_count(page) != 1' is true. > > I wonder: in the case 'page_count(page) != 1', should we always wait > > (return -EAGAIN)? In other words, can we do something here for > > migration if we know where the pages come from? > > > > We finally found that such pages are used by /sbin/multipathd as > > aio ring_pages. Besides the one reference taken by the offline call > > chain, another is taken by aio_setup_ring() via get_user_pages(), and > > it won't be dropped until aio_free_ring() is called.
> > > > The dump_page info in the offline context is as follows: > > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d > > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c > > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a > > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b > > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable) > > > > multipathd seems never to release the ring_pages until we reboot the box. > > Furthermore, someone may write an app that calls io_setup() but never calls > > io_destroy(), because it needs the context for a long time, because the author > > forgets to, or even on purpose; we can't rule that out. > > So I think the mm-hotplug framework should be able to deal with such > > situations. Should we consider adding migration support for such pages? > > > > However, I don't know whether there are other kinds of such pages in the > > current kernel. If, unluckily, there are many, it will be hard to handle them > > all, and adding migration support for aio ring_pages alone is insufficient. > > > > But if, luckily, there are few, could we use the private field of struct page > > to track the ring_pages[] pointer so that we can find the user during migration? > > That raises another problem: how do we distinguish such special pages? > > Using a page flag may affect the current page-flag layout, and adding a new > > page flag also seems impossible. > > > > I'm not sure which approach is right and am seeking help.
> > Any comments are very welcome, thanks :) > > Tricky. > > I expect the same problem would occur with pages which are under > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > periods, but the durations could still be lengthy (seconds). > > Worse is a futex page, which could easily remain pinned indefinitely. > > The best I can think of is to make changes in or around > get_user_pages(), to steal the pages from userspace and replace them > with non-movable ones before pinning them. The performance cost of > something like this would surely be unacceptable for direct-io, but > maybe OK for the aio ring and futexes. > If this happens, it would be preferable if it only happened for ZONE_MOVABLE. If it happens generally, it means we're going to have a lot more MIGRATE_UNMOVABLE pageblocks and a lot more fragmentation, leading to lower THP availability. For THP, we're OK if some pageblocks are temporarily unavailable, or even unavailable for long periods of time; we can cope with that. But we (or I at least) do not want to lower THP availability on systems that do not care about ZONE_MOVABLE or node hot-plug. -- Mel Gorman SUSE Labs
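Mel's condition can be modelled as a pin path that only pays the migration cost for pages that are actually in ZONE_MOVABLE, so a system with no ZONE_MOVABLE sees no extra unmovable allocations at all. This is a userspace sketch with hypothetical names (`enum model_zone`, `struct model_page`, `pin_pages_movable_aware()`). It does not correspond to any real kernel interface, and "migration" is just a change of zone tag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone tags for the model; not the kernel's enum zone_type. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

/* Hypothetical stand-in for struct page. */
struct model_page {
	enum model_zone zone;
	int refcount;
};

/*
 * Pin each page, first relocating only those that live in ZONE_MOVABLE.
 * Returns how many pages had to be relocated. On a system without
 * ZONE_MOVABLE this is always zero, so the extra unmovable allocations
 * (and the THP fragmentation cost they bring) are never incurred there.
 */
static size_t pin_pages_movable_aware(struct model_page *pages, size_t nr)
{
	size_t relocated = 0;

	assert(pages != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (pages[i].zone == MODEL_ZONE_MOVABLE) {
			pages[i].zone = MODEL_ZONE_NORMAL; /* stands in for migration */
			relocated++;
		}
		pages[i].refcount++;                       /* the long-term pin */
	}
	return relocated;
}
```

Counting relocations makes the trade-off visible: the cost scales with the number of pinned pages that sit in ZONE_MOVABLE and is zero everywhere else.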
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2012 11:00:59 +0000 From: Mel Gorman Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-ID: <20121130110059.GD8218@suse.de> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> <20121129153930.477e9709.akpm@linux-foundation.org> <50B82B0D.8010206@cn.fujitsu.com> <20121129215749.acfd872a.akpm@linux-foundation.org> <50B859C6.3020707@cn.fujitsu.com> <20121129235502.05223586.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121129235502.05223586.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Lin Feng , viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote: > On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote: > > > > > > > On 11/30/2012 01:57 PM, Andrew Morton wrote: > > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote: > > > > > >> hi Andrew, > > >> > > >> On 11/30/2012 07:39 AM, Andrew Morton wrote: > > >>> Tricky. > > >>> > > >>> I expect the same problem would occur with pages which are under > > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long > > >>> periods, but the durations could still be lengthy (seconds). > > >> The offline retry timeout is 2 minutes, so O_DIRECT pages don't seem > > >> to be a problem for the moment. > > >>> > > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>> > > >>> The best I can think of is to make changes in or around > > >>> get_user_pages(), to steal the pages from userspace and replace them > > >>> with non-movable ones before pinning them. The performance cost of > > >>> something like this would surely be unacceptable for direct-io, but > > >>> maybe OK for the aio ring and futexes. > > >> Thanks for your advice. > > >> I want to keep the impact as small as possible. As mentioned above, > > >> direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can > > >> just change how get_user_pages() is used (in or around it) for cases > > >> such as the aio ring pages. I will try to find a way to do this. > > > > > > What about futexes? > > hi Andrew, > > > > Yes, it would be better to find an approach that solves them all. > > > > But I'm worried that if we just make get_user_pages() use > > non-movable pages, the non-movable pages will soon be drained, because > > many places use get_user_pages(), such as some drivers. > > Obviously we shouldn't change get_user_pages() for all callers. > > > IMHO in most cases get_user_pages() callers should release the pages soon, > > so pages allocated from the movable zone should be OK. But I'm not sure whether > > there is such a rule for get_user_pages(). > > For the other cases, we would tell get_user_pages() to allocate pages from > > a non-movable zone. > > > > So could we add a zone-allocation flag when we call get_user_pages()? > > Well, that's a fairly low-level implementation detail. A more typical > approach would be to add a new get_user_pages_non_movable() or such. > That would probably have the same signature as get_user_pages(), with > one additional argument. Then get_user_pages() becomes a one-line > wrapper which passes in a particular value of that argument. > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE allocations.
That will impact THP availability by increasing the number of MIGRATE_UNMOVABLE blocks that exist, and it would hit every user -- not just those that care about ZONE_MOVABLE. I'm likely to NAK such a patch if it's only about node hot-remove, because that's much more of a corner case than wanting to use THP. I would prefer if get_user_pages() checked whether the page it was about to pin was in ZONE_MOVABLE and, if so, migrated it at that point before pinning it. It'll be expensive, but it will guarantee ZONE_MOVABLE availability if that's what they want. The CMA people might also want to take advantage of this if the page happened to be in a MIGRATE_CMA pageblock. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2012 16:24:21 +0100 From: Domenico Andreoli Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined Message-ID: <20121130152421.GA19849@glitch> References: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: Lin Feng Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote: > Hi all, Hi Lin, > We encountered a "Resource temporarily unavailable" failure while trying > to offline a memory section in a movable zone.
> We found that there are some pages that can't be migrated. The offline
> operation fails in function migrate_page_move_mapping() returning -EAGAIN
> till timeout because the if assertion 'page_count(page) != 1' fails.

Is this something that worked before? If yes (then it's a regression),
do you know with which kernel?

Thanks,
Domenico

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50BC08D0.5070006@cn.fujitsu.com>
Date: Mon, 03 Dec 2012 10:05:04 +0800
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
In-Reply-To: <20121130152421.GA19849@glitch>
To: cavokz@gmail.com
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org,
    kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com,
    mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
    laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com,
    linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

hi Domenico,

Sorry for my late reply and thanks for your attention, see below :)

On 11/30/2012 11:24 PM, Domenico Andreoli wrote:
> On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote:
>> Hi all,
>
> Hi Lin,
>
>> We encountered a "Resource temporarily unavailable" failure while trying
>> to offline a memory section in a movable zone. We found that there are
>> some pages that can't be migrated. The offline operation fails in function
>> migrate_page_move_mapping() returning -EAGAIN till timeout because
>> the if assertion 'page_count(page) != 1' fails.
>
> Is this something that worked before? If yes (then it's a regression),
> do you know with which kernel?

I think the problem has existed for a long time, ever since we got the
offline feature, though I'm not sure from which version :)
It can only be reproduced on a system configured with ZONE_MOVABLE that
holds pages allocated by get_user_pages() for a long time.
Maybe we could also reproduce it by writing an app that just calls the
io_setup() syscall and never releases the context until it dies. Then
locate the memory section from which the pages were allocated and try
to offline it.

In fact, for anyone who doesn't want to use the memory offline/hotplug
feature, it's not a bug :)

Thanks,
linfeng
>
> Thanks,
> Domenico
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50BC15E1.8060806@cn.fujitsu.com>
Date: Mon, 03 Dec 2012 11:00:49 +0800
From: Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
In-Reply-To: <20121130024755.b5dae17e.akpm@linux-foundation.org>
To: Andrew Morton
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com,
    mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng

On 11/30/2012 06:47 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote:
>
>>> add a new library function which callers can use before (or after?)
>>> calling get_user_pages[_fast]().
>> Sorry, I don't quite understand what "library function" means..
>> Does it mean a function that aids get_user_pages(), or one that totally
>> wraps/replaces get_user_pages(), or none of the above?
>
> "library function" is terminology for a general facility which
> the core kernel makes available to other parts of the kernel.
> get_user_pages() is a library function, as are the functions in lib/,
> etc. "grep EXPORT_SYMBOL ./*/*.c"

hi Andrew,

Thanks for your explanation and sorry for my ignorant question :)

As Mel said
Still I can't find a way to make everyone happy..

Thanks,
linfeng
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 3 Dec 2012 11:37:04 +0000
From: Mel Gorman
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-ID: <20121203113704.GK8218@suse.de>
In-Reply-To: <50BC13EB.1050009@cn.fujitsu.com>
To: Lin Feng
Cc: Andrew Morton, viro@zeniv.linux.org.uk, bcrl@kvack.org,
    kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Dec 03, 2012 at 10:52:27AM +0800, Lin Feng wrote:
>
>
> On 11/30/2012 07:00 PM, Mel Gorman wrote:
> >>
> >> Well, that's a fairly low-level implementation detail. A more typical
> >> approach would be to add a new get_user_pages_non_movable() or such.
> >> That would probably have the same signature as get_user_pages(), with
> >> one additional argument. Then get_user_pages() becomes a one-line
> >> wrapper which passes in a particular value of that argument.
> >>
> >
> > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
> > allocations. That will impact THP availability by increasing the number
> > of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
> > not just those that care about ZONE_MOVABLE.
> >
> > I'm likely to NAK such a patch if it's only about node hot-remove because
> > it's much more of a corner case than wanting to use THP.
> >
> > I would prefer if get_user_pages() checked if the page it was about to
> > pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
> > pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
> > if that's what they want. The CMA people might also want to take
> > advantage of this if the page happened to be in the MIGRATE_CMA
> > pageblock.
> >
> hi Mel,
>
> Thanks for your suggestion.
> My initial idea is also to restrict the impact as much as possible, so
> we migrate such pages only as needed.
> But even among such "going to be pinned" pages, most are going to be
> released soon, so dealing with them all in the same way is really *expensive*.
>

Then you need to somehow distinguish between short-lived pins and
long-lived pins and only migrate the long-lived pins. I didn't research
how this could be implemented.

--
Mel Gorman
SUSE Labs
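Mel's two suggestions taken together (migrate a ZONE_MOVABLE page before pinning it, but only for long-lived pins) can be sketched as a userspace toy model: the caller's "page table" slot is repointed at a freshly allocated non-movable copy before the pin is taken. Everything here is hypothetical: `toy_pin_user_page()`, its `long_lived` flag and the two-zone enum are illustrations, not kernel APIs, and the thread leaves open how a caller would tell short-lived and long-lived pins apart, so here the caller simply says so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64 /* tiny "page" so the example stays small */

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	enum toy_zone zone;
	int count;                /* pin/reference count */
	char data[TOY_PAGE_SIZE];
};

/* Allocate a zeroed toy page in the requested zone. */
static struct toy_page *toy_alloc_page(enum toy_zone zone)
{
	struct toy_page *p = calloc(1, sizeof(*p));

	if (p)
		p->zone = zone;
	return p;
}

/*
 * Hypothetical pin helper. *slot stands for the user's mapping of the page.
 * A long-lived pin on a ZONE_MOVABLE page first copies the contents to a
 * non-movable page and repoints *slot, so the movable page stays offlinable.
 * Short-lived pins are taken in place. Returns NULL if allocation fails.
 */
static struct toy_page *toy_pin_user_page(struct toy_page **slot, bool long_lived)
{
	struct toy_page *page = *slot;

	if (long_lived && page->zone == TOY_ZONE_MOVABLE) {
		struct toy_page *newpage = toy_alloc_page(TOY_ZONE_NORMAL);

		if (!newpage)
			return NULL;
		memcpy(newpage->data, page->data, sizeof(page->data));
		*slot = newpage; /* the user now sees the copy */
		free(page);      /* toy assumption: nobody else references the old page */
		page = newpage;
	}
	page->count++;
	return page;
}
```

A short-lived pin leaves the page where it is; only the long-lived pin pays for the copy, which is the cost trade-off Lin raises above.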
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 29 Nov 2012 15:39:30 -0800
From: Andrew Morton
To: Lin Feng
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com,
    mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-Id: <20121129153930.477e9709.akpm@linux-foundation.org>
In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com>

On Thu, 29 Nov 2012 14:54:58 +0800
Lin Feng wrote:

> Hi all,
>
> We encountered a "Resource temporarily unavailable" failure while trying
> to offline a memory section in a movable zone. We found that there are
> some pages that can't be migrated. The offline operation fails in function
> migrate_page_move_mapping() returning -EAGAIN till timeout because
> the if assertion 'page_count(page) != 1' fails.
> I wonder in the case 'page_count(page) != 1', should we always wait
> (return -EAGAIN)? Or in other words, can we do something here for
> migration if we know where the pages come from?
>
> And finally we found that such pages are used by /sbin/multipathd in the form
> of aio ring_pages. Besides the one increment introduced by the offline calling
> chain, another increment is added by aio_setup_ring() via calling
> get_user_pages(), and it won't decrease until we call aio_free_ring().
>
> The dump_page info in the offline context is shown as follows:
> page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
>
> multipathd seems never to release the ring_pages until we reboot the box.
> Furthermore, someone may write an app which only calls io_setup() but never
> calls io_destroy(), because it has to keep the io_setup() context for a long
> time, or it just forgets to, or even does so on purpose; we can't predict that.
> So I think the mm-hotplug framework should get the capability to deal with such
> situations. And should we consider adding migration support for such pages?
>
> However, I don't know if there are any other kinds of such special pages in
> the current kernel. If, unluckily, there are many, it's apparently hard to
> handle them all, and just adding migration support for aio ring_pages is insufficient.
>
> But if, luckily, there are not, can we use the private field of struct page
> to track the ring_pages[] pointer so that we can find the user when migrating?
> Doing so raises another problem: how do we distinguish such special pages?
> Using a page flag may impact the current page-flag layout, and adding a new
> page flag also seems to be impossible.
>
> I'm not sure which way is the right approach, so I'm seeking help.
> Any comments are very welcome, thanks :)

Tricky.

I expect the same problem would occur with pages which are under
O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
periods, but the durations could still be lengthy (seconds).

Worse is a futex page, which could easily remain pinned indefinitely.

The best I can think of is to make changes in or around
get_user_pages(), to steal the pages from userspace and replace them
with non-movable ones before pinning them. The performance cost of
something like this would surely be unacceptable for direct-io, but
maybe OK for the aio ring and futexes.
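The refcount arithmetic behind the reported -EAGAIN can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: `struct toy_page` and the `toy_*()` helpers are hypothetical, the count starts at zero to stand for a page whose mappings have already been torn down, and `toy_migrate()` only mirrors the quoted `page_count(page) != 1` test rather than the full logic of migrate_page_move_mapping().

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
	int count;
};

/* The offline path takes one reference when it isolates the page. */
static void toy_isolate(struct toy_page *p)
{
	p->count++;
}

/* A long-term get_user_pages() pin, such as the aio ring, adds another. */
static void toy_pin(struct toy_page *p)
{
	p->count++;
}

/* Dropping the pin, as aio_free_ring() eventually would. */
static void toy_unpin(struct toy_page *p)
{
	assert(p->count > 0);
	p->count--;
}

/*
 * Mirrors the quoted check: migration proceeds only when the migrating
 * caller holds the sole reference; otherwise it reports -EAGAIN.
 */
static int toy_migrate(const struct toy_page *p)
{
	return p->count != 1 ? -EAGAIN : 0;
}

/* Retry the way the offline path does, giving up after max_tries. */
static int toy_offline(const struct toy_page *p, int max_tries)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
		ret = toy_migrate(p);
	return ret;
}
```

An isolated, unpinned page migrates on the first try. With the extra aio-style pin the count is 2, matching the count:2 in the dump_page output, and every retry returns -EAGAIN until the pin is dropped.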
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 29 Nov 2012 16:04:43 -0800
From: Zach Brown
To: Andrew Morton
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org,
    kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com,
    mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
    laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com,
    linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-ID: <20121130000443.GK18574@lenny.home.zabbo.net>
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>

> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.

In the aio case it seems like it could be taught to populate the mapping
with non-movable pages to begin with. It's calling get_user_pages() a
few lines after instantiating the mapping itself with do_mmap_pgoff().
- z

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50B82A7C.8020202@cn.fujitsu.com>
Date: Fri, 30 Nov 2012 11:39:40 +0800
From: Lin Feng
To: Zach Brown
CC: Andrew Morton, viro@zeniv.linux.org.uk, bcrl@kvack.org,
    kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com,
    mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com,
    laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com,
    linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
In-Reply-To: <20121130000443.GK18574@lenny.home.zabbo.net>

Hi Zach,

Thanks for your advice. I fully agree; I will look into making aio use
non-movable pages.
Thanks,
linfeng

On 11/30/2012 08:04 AM, Zach Brown wrote:
>> The best I can think of is to make changes in or around
>> get_user_pages(), to steal the pages from userspace and replace them
>> with non-movable ones before pinning them. The performance cost of
>> something like this would surely be unacceptable for direct-io, but
>> maybe OK for the aio ring and futexes.
>
> In the aio case it seems like it could be taught to populate the mapping
> with non-movable pages to begin with. It's calling get_user_pages() a
> few lines after instantiating the mapping itself with do_mmap_pgoff().
>
> - z
>

--
--------------------------------------------------
Lin Feng
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
PHONE:+86-25-86630566-8557
COINS:7998-8557
FAX:+86-25-83317685
MAIL:linfeng@cn.fujitsu.com
--------------------------------------------------

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 29 Nov 2012 23:55:02 -0800
From: Andrew Morton
To: Lin Feng
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com,
    mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-Id: <20121129235502.05223586.akpm@linux-foundation.org>
In-Reply-To: <50B859C6.3020707@cn.fujitsu.com>

On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng wrote:

>
>
> On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng wrote:
> >
> >> hi Andrew,
> >>
> >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> >>> Tricky.
> >>>
> >>> I expect the same problem would occur with pages which are under
> >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> >>> periods, but the durations could still be lengthy (seconds).
> >> the offline retry timeout duration is 2 minutes, so O_DIRECT pages
> >> don't seem to be a problem for the moment.
> >>>
> >>> Worse is a futex page, which could easily remain pinned indefinitely.
> >>>
> >>> The best I can think of is to make changes in or around
> >>> get_user_pages(), to steal the pages from userspace and replace them
> >>> with non-movable ones before pinning them. The performance cost of
> >>> something like this would surely be unacceptable for direct-io, but
> >>> maybe OK for the aio ring and futexes.
> >> thanks for your advice.
> >> I want to limit the impact as much as possible. As mentioned above,
> >> direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can
> >> just change the use of get_user_pages() (in or around it) for cases such as aio
> >> ring pages. I will try to find a way to do this.
> >
> > What about futexes?
> hi Andrew,
>
> Yes, better to find an approach to solve them all.
>
> But I'm worried that if we just confine get_user_pages() to use
> non-movable pages, it will drain the non-movable pages soon, because
> there are many places using get_user_pages(), such as some drivers.

Obviously we shouldn't change get_user_pages() for all callers.

> IMHO in most cases get_user_pages() callers should release the pages soon,
> so pages allocated from the movable zone should be OK. But I'm not sure if
> there is such a rule for get_user_pages().
> And in other cases we could specify that get_user_pages() allocate pages from
> the non-movable zone.
>
> So could we add a zone-alloc flag when we call get_user_pages()?

Well, that's a fairly low-level implementation detail. A more typical
approach would be to add a new get_user_pages_non_movable() or such.
That would probably have the same signature as get_user_pages(), with
one additional argument. Then get_user_pages() becomes a one-line
wrapper which passes in a particular value of that argument.

But that means we'd also have to add get_user_pages_fast_non_movable()
and things might become a bit stupid.

A better approach might be to add a new library function which callers
can use before (or after?) calling get_user_pages[_fast]().

Unsure. It's the sort of thing where one has to dive in and try a few
things.
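The API shape described here, a variant with one extra argument plus a one-line wrapper that preserves the old entry point, might look like the following userspace sketch. `toy_get_user_pages()` and `toy_get_user_pages_non_movable()` are hypothetical stand-ins with a deliberately simplified signature: the real get_user_pages() of that era also takes the task, mm, write/force flags and a vmas array, all omitted here, and "allocation" is reduced to recording which zone each page came from.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
	unsigned long addr; /* user address this page backs */
	enum toy_zone zone; /* zone the backing page came from */
};

/*
 * Hypothetical variant with one additional argument: when non_movable is
 * true, the pages backing [start, start + nr_pages pages) are taken from
 * a non-movable zone instead of ZONE_MOVABLE. Returns pages filled in.
 */
static int toy_get_user_pages_non_movable(unsigned long start, int nr_pages,
					  struct toy_page *pages, bool non_movable)
{
	const unsigned long page_size = 4096;

	for (int i = 0; i < nr_pages; i++) {
		pages[i].addr = start + (unsigned long)i * page_size;
		pages[i].zone = non_movable ? TOY_ZONE_NORMAL : TOY_ZONE_MOVABLE;
	}
	return nr_pages;
}

/* The existing entry point becomes a one-line wrapper keeping current behaviour. */
static int toy_get_user_pages(unsigned long start, int nr_pages,
			      struct toy_page *pages)
{
	return toy_get_user_pages_non_movable(start, nr_pages, pages, false);
}
```

Every other gup entry point (get_user_pages_fast() in particular) would need the same pair, which is the duplication Andrew calls "a bit stupid" and why he floats a separate helper called around get_user_pages[_fast]() instead.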
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 30 Nov 2012 00:00:43 -0800
From: Andrew Morton
To: Kamezawa Hiroyuki
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org, mhocko@suse.cz,
    hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org,
    isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com,
    tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-Id: <20121130000043.cf356676.akpm@linux-foundation.org>
In-Reply-To: <50B85C8C.2030702@jp.fujitsu.com>

On Fri, 30 Nov 2012 16:13:16 +0900 Kamezawa Hiroyuki wrote:

> > What about futexes?
> >
>
> IIUC, futex's key is now a pair of (mm,address) or (inode, pgoff).
> Then, get_user_page() in futex.c will release the page by put_page().
> 'struct page' is just touched by get_futex_key() to obtain page->mapping info.

Ah yes, that page is unpinned before syscall return.

grep -rl get_user_pages .

Gad. These should be audited.
The great majority will be simple and OK, but drivers/media,
drivers/infiniband and net/rds could be problematic.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 30 Nov 2012 02:47:55 -0800
From: Andrew Morton
To: Lin Feng
Cc: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com,
    mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-Id: <20121130024755.b5dae17e.akpm@linux-foundation.org>
In-Reply-To: <50B88A8A.9020802@cn.fujitsu.com>

On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote:

> > add a new library function which callers can use before (or after?)
> > calling get_user_pages[_fast]().
> Sorry, I don't quite understand what "library function" means..
> Does it mean a function that aids get_user_pages(), or one that totally
> wraps/replaces get_user_pages(), or none of the above?

"library function" is terminology for a general facility which
the core kernel makes available to other parts of the kernel.
get_user_pages() is a library function, as are the functions in lib/,
etc. "grep EXPORT_SYMBOL ./*/*.c"

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 30 Nov 2012 10:57:15 +0000
From: Mel Gorman
To: Andrew Morton
Cc: Lin Feng, viro@zeniv.linux.org.uk, bcrl@kvack.org,
    kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com,
    minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
    wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-ID: <20121130105715.GC8218@suse.de>
In-Reply-To: <20121129153930.477e9709.akpm@linux-foundation.org>

On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote:
> On Thu, 29 Nov 2012 14:54:58 +0800
> Lin Feng wrote:
>
> > Hi all,
> >
> > We encountered a "Resource temporarily unavailable" failure while trying
> > to offline a memory section in a movable zone. We found that there are
> > some pages that can't be migrated.
> > The offline operation fails in function
> > migrate_page_move_mapping() returning -EAGAIN till timeout because
> > the if assertion 'page_count(page) != 1' fails.
> > I wonder in the case 'page_count(page) != 1', should we always wait
> > (return -EAGAIN)? Or in other words, can we do something here for
> > migration if we know where the pages come from?
> >
> > And finally we found that such pages are used by /sbin/multipathd in the form
> > of aio ring_pages. Besides the one increment introduced by the offline calling
> > chain, another increment is added by aio_setup_ring() via calling
> > get_user_pages(), and it won't decrease until we call aio_free_ring().
> >
> > The dump_page info in the offline context is shown as follows:
> > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> >
> > multipathd seems never to release the ring_pages until we reboot the box.
> > Furthermore, someone may write an app which only calls io_setup() but never
> > calls io_destroy(), because it has to keep the io_setup() context for a long
> > time, or it just forgets to, or even does so on purpose; we can't predict that.
> > So I think the mm-hotplug framework should get the capability to deal with such
> > situations. And should we consider adding migration support for such pages?
> >
> > However, I don't know whether there are other kinds of such pages in
> > the current kernel. If, unluckily, there are many, it will be hard to
> > handle them all, and adding migration support only for aio ring_pages
> > is insufficient.
> >
> > But if we are lucky, could we use the private field of struct page to
> > track the ring_pages[] pointer so that we can find the user during
> > migration? That raises another problem: how do we identify such special
> > pages? Using a page flag may affect the current page-flag layout, and
> > adding a new page flag also seems impossible.
> >
> > I'm not sure which approach is right, so I'm seeking help.
> > Any comments are much appreciated, thanks :)
>
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
>

If this happens, then it would be preferable if it only happened for
ZONE_MOVABLE. If it happens generally, it means we're going to have a lot
more MIGRATE_UNMOVABLE pageblocks and a lot more fragmentation, leading to
lower THP availability. For THP, we're OK if some pageblocks are
temporarily unavailable, or even unavailable for long periods of time; we
can cope with that. But we (or I, at least) do not want to lower THP
availability on systems that do not care about ZONE_MOVABLE or node
hot-plug.
-- 
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 30 Nov 2012 16:24:21 +0100
From: Domenico Andreoli
To: Lin Feng
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-ID: <20121130152421.GA19849@glitch>
In-Reply-To: <1354172098-5691-1-git-send-email-linfeng@cn.fujitsu.com>

On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote:
> Hi all,

Hi Lin,

> We encountered a "Resource temporarily unavailable" failure while trying
> to offline a memory section in a movable zone.
> We found that some pages
> could not be migrated. The offline operation fails in
> migrate_page_move_mapping(), which keeps returning -EAGAIN until the
> timeout because the check 'page_count(page) != 1' is true.

Is this something that worked before? If so (in which case it's a
regression), do you know with which kernel?

Thanks,
Domenico

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50BC13EB.1050009@cn.fujitsu.com>
Date: Mon, 03 Dec 2012 10:52:27 +0800
From: Lin Feng
To: Mel Gorman
CC: Andrew Morton, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
In-Reply-To: <20121130110059.GD8218@suse.de>

On 11/30/2012 07:00 PM, Mel Gorman wrote:
>>
>> Well, that's a fairly low-level implementation detail. A more typical
>> approach would be to add a new get_user_pages_non_movable() or such.
>> That would probably have the same signature as get_user_pages(), with
>> one additional argument. Then get_user_pages() becomes a one-line
>> wrapper which passes in a particular value of that argument.
>>
>
> That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
> allocations. That will impact THP availability by increasing the number
> of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
> not just those that care about ZONE_MOVABLE.
>
> I'm likely to NAK such a patch if it's only about node hot-remove because
> it's much more of a corner case than wanting to use THP.
>
> I would prefer if get_user_pages() checked if the page it was about to
> pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
> pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
> if that's what they want. The CMA people might also want to take
> advantage of this if the page happened to be in the MIGRATE_CMA
> pageblock.
>

Hi Mel,

Thanks for your suggestion.
My initial idea was also to keep the impact as small as possible and to
migrate such pages only as needed. But even among pages that are about to
be pinned, most will be released soon, so treating them all the same way
is really *expensive*.
Maybe we do have to find another way that makes everybody happy :)

Thanks,
linfeng

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50BC15E1.8060806@cn.fujitsu.com>
Date: Mon, 03 Dec 2012 11:00:49 +0800
From: Lin Feng
To: Andrew Morton
CC: viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, mgorman@suse.de, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lin Feng
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
In-Reply-To: <20121130024755.b5dae17e.akpm@linux-foundation.org>

On 11/30/2012 06:47 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng wrote:
>
>>> add a new library function which callers can use before (or after?)
>>> calling get_user_pages[_fast]().
>> Sorry, I don't quite understand what "library function" means.
>> Does it mean a function that aids get_user_pages(), one that wraps or
>> replaces get_user_pages() entirely, or neither?
>
> "library function" is terminology for a general facility which
> the core kernel makes available to other parts of the kernel.
> get_user_pages() is a library function, as are the functions in lib/,
> etc. "grep EXPORT_SYMBOL ./*/*.c"

Hi Andrew,

Thanks for your explanation, and sorry for my ignorant question :)
As Mel said... Still, I can't find a way to make everyone happy.

Thanks,
linfeng

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 3 Dec 2012 11:37:04 +0000
From: Mel Gorman
To: Lin Feng
Cc: Andrew Morton, viro@zeniv.linux.org.uk, bcrl@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, hughd@google.com, cl@linux.com, minchan@kernel.org, isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com, wency@cn.fujitsu.com, tangchen@cn.fujitsu.com, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined
Message-ID: <20121203113704.GK8218@suse.de>
In-Reply-To: <50BC13EB.1050009@cn.fujitsu.com>

On Mon, Dec 03, 2012 at 10:52:27AM +0800, Lin Feng wrote:
>
>
> On 11/30/2012 07:00 PM, Mel Gorman wrote:
> >>
> >> Well, that's a fairly low-level implementation detail. A more typical
> >> approach would be to add a new get_user_pages_non_movable() or such.
> >> That would probably have the same signature as get_user_pages(), with
> >> one additional argument. Then get_user_pages() becomes a one-line
> >> wrapper which passes in a particular value of that argument.
> >>
> >
> > That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
> > allocations. That will impact THP availability by increasing the number
> > of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
> > not just those that care about ZONE_MOVABLE.
> >
> > I'm likely to NAK such a patch if it's only about node hot-remove because
> > it's much more of a corner case than wanting to use THP.
> >
> > I would prefer if get_user_pages() checked if the page it was about to
> > pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
> > pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
> > if that's what they want. The CMA people might also want to take
> > advantage of this if the page happened to be in the MIGRATE_CMA
> > pageblock.
> >
> Hi Mel,
>
> Thanks for your suggestion.
> My initial idea was also to keep the impact as small as possible and to
> migrate such pages only as needed.
> But even among pages that are about to be pinned, most will be released
> soon, so treating them all the same way is really *expensive*.
>

Then you need to somehow distinguish between short-lived pins and
long-lived pins and only migrate the long-lived pins. I didn't research
how this could be implemented.

-- 
Mel Gorman
SUSE Labs