dri-devel.lists.freedesktop.org archive mirror
* [PATCH v6 00/12] Support non-lru page migration
@ 2016-05-20 14:23 Minchan Kim
  2016-05-20 14:23 ` [PATCH v6 02/12] mm: migrate: support non-lru movable " Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-05-20 14:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Minchan Kim, Vlastimil Babka, dri-devel,
	Hugh Dickins, John Einar Reitan, Jonathan Corbet, Joonsoo Kim,
	Konstantin Khlebnikov, Mel Gorman, Naoya Horiguchi, Rafael Aquini,
	Rik van Riel, Sergey Senozhatsky, virtualization, Gioh Kim,
	Chan Gyun Jeong, Sangseok Lee, Kyeongdon Kim, Chulmin Kim

Recently, I have received many reports about performance degradation
on embedded systems (Android mobile phones, webOS TVs and so on) and
about fork failing easily.

The problem was fragmentation, caused mainly by zram and GPU drivers.
Under memory pressure, their pages were spread across all pageblocks
and could not be migrated by the current compaction algorithm, which
supports only LRU pages. As a result, compaction could not work well,
so the reclaimer shrank all of the working set pages. That made the
system very slow and even caused fork, which requires order-2 or
order-3 allocations, to fail easily.

Another pain point is that such pages cannot use CMA memory space, so
when an OOM kill happens I can see many free pages in the CMA area,
which is not memory efficient. In our product, which has a big CMA
area, zones are reclaimed excessively in order to allocate GPU and zram
pages even though there is lots of free space in CMA, so the system
easily becomes very slow.

To solve these problems, this patchset adds a facility to migrate
non-LRU pages, introducing new functions and page flags to help
migration.


struct address_space_operations {
	..
	..
	bool (*isolate_page)(struct page *, isolate_mode_t);
	void (*putback_page)(struct page *);
	..
};

new page flags

	PG_movable
	PG_isolated

For details, please read description in "mm: migrate: support non-lru
movable page migration".
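
As a rough illustration of how the pieces fit together, a driver
exposing movable pages might wire the callbacks up as below. This is a
hypothetical sketch, not code from the patchset; the mydrv_* names are
made up and error handling is elided.

```c
/* Hypothetical driver glue for the non-lru movable page API. */
static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode);
static int mydrv_migratepage(struct address_space *mapping,
		struct page *newpage, struct page *page,
		enum migrate_mode mode);
static void mydrv_putback_page(struct page *page);

static const struct address_space_operations mydrv_aops = {
	.isolate_page	= mydrv_isolate_page,
	.migratepage	= mydrv_migratepage,
	.putback_page	= mydrv_putback_page,
};

/* On allocating a page, mark it movable under the page lock. */
static void mydrv_make_movable(struct page *page,
			       struct address_space *mapping)
{
	lock_page(page);
	__SetPageMovable(page, mapping);
	unlock_page(page);
}
```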

Originally, Gioh Kim tried to support this feature, but he moved on,
so I took over the work. I took much of the code from his work and
changed it a little, and Konstantin Khlebnikov helped Gioh a lot, so he
deserves much of the credit, too.

I should also mention Chulmin, who tested this patchset heavily so
that I could find many bugs. :)

Thanks, Gioh, Konstantin and Chulmin!

This patchset consists of five parts.

1. clean up migration
  mm: use put_page to free page instead of putback_lru_page

2. add non-lru page migration feature
  mm: migrate: support non-lru movable page migration

3. rework KVM memory-ballooning
  mm: balloon: use general non-lru movable page feature

4. zsmalloc refactoring for preparing page migration
  zsmalloc: keep max_object in size_class
  zsmalloc: use bit_spin_lock
  zsmalloc: use accessor
  zsmalloc: factor page chain functionality out
  zsmalloc: introduce zspage structure
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: use freeobj for index

5. zsmalloc page migration
  zsmalloc: page migration support
  zram: use __GFP_MOVABLE for memory allocation

* From v5
  * rebase on next-20160520
  * move utility functions to compaction.c and export - Sergey
  * zsmalloc double free fix - Sergey
  * add additional Reviewed-by for zsmalloc - Sergey

* From v4
  * rebase on mmotm-2016-05-05-17-19
  * fix huge object migration - Chulmin
  * !CONFIG_COMPACTION support for zsmalloc

* From v3
  * rebase on mmotm-2016-04-06-20-40
  * fix swap_info deadlock - Chulmin
  * race without page_lock - Vlastimil
  * no use page._mapcount for potential user-mapped page driver - Vlastimil
  * fix and enhance doc/description - Vlastimil
  * use page->mapping lower bits to represent PG_movable
  * make driver side's rule simple.

* From v2
  * rebase on mmotm-2016-03-29-15-54-16
  * check PageMovable before lock_page - Joonsoo
  * check PageMovable before PageIsolated checking - Joonsoo
  * add more description about rule

* From v1
  * rebase on v4.5-mmotm-2016-03-17-15-04
  * reordering patches to merge clean-up patches first
  * add Acked-by/Reviewed-by from Vlastimil and Sergey
  * use each own mount model instead of reusing anon_inode_fs - Al Viro
  * small changes - YiPing, Gioh

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: dri-devel@lists.freedesktop.org
Cc: Hugh Dickins <hughd@google.com>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Chan Gyun Jeong <chan.jeong@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Chulmin Kim <cmlaika.kim@samsung.com>

Minchan Kim (12):
  mm: use put_page to free page instead of putback_lru_page
  mm: migrate: support non-lru movable page migration
  mm: balloon: use general non-lru movable page feature
  zsmalloc: keep max_object in size_class
  zsmalloc: use bit_spin_lock
  zsmalloc: use accessor
  zsmalloc: factor page chain functionality out
  zsmalloc: introduce zspage structure
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: use freeobj for index
  zsmalloc: page migration support
  zram: use __GFP_MOVABLE for memory allocation

 Documentation/filesystems/Locking  |    4 +
 Documentation/filesystems/vfs.txt  |   11 +
 Documentation/vm/page_migration    |  107 ++-
 drivers/block/zram/zram_drv.c      |    6 +-
 drivers/virtio/virtio_balloon.c    |   54 +-
 include/linux/balloon_compaction.h |   53 +-
 include/linux/compaction.h         |   17 +
 include/linux/fs.h                 |    2 +
 include/linux/ksm.h                |    3 +-
 include/linux/migrate.h            |    2 +
 include/linux/mm.h                 |    1 +
 include/linux/page-flags.h         |   33 +-
 include/uapi/linux/magic.h         |    2 +
 mm/balloon_compaction.c            |   94 +--
 mm/compaction.c                    |   76 +-
 mm/ksm.c                           |    4 +-
 mm/migrate.c                       |  256 +++++--
 mm/page_alloc.c                    |    2 +-
 mm/util.c                          |    6 +-
 mm/vmscan.c                        |    2 +-
 mm/zsmalloc.c                      | 1351 +++++++++++++++++++++++++-----------
 21 files changed, 1475 insertions(+), 611 deletions(-)

-- 
1.9.1


* [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
  2016-05-20 14:23 [PATCH v6 00/12] Support non-lru page migration Minchan Kim
@ 2016-05-20 14:23 ` Minchan Kim
  2016-05-27 14:26   ` Vlastimil Babka
  2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
  0 siblings, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2016-05-20 14:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Minchan Kim,
	Jonathan Corbet, Hugh Dickins, linux-kernel, dri-devel,
	virtualization, John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman,
	Joonsoo Kim, Vlastimil Babka

Until now we have allowed migration only for LRU pages, and that was
enough for making high-order pages. But recently, embedded systems
(e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU
memory), so we have seen several reports about trouble with small
high-order allocations. There have been several efforts to fix the
problem (e.g., enhancing the compaction algorithm, SLUB fallback to
order-0 pages, reserved memory, vmalloc and so on), but if there are
lots of non-movable pages in the system, those solutions are void in
the long run.

So, this patch supports a facility to turn non-movable pages into
movable ones. For this feature, it introduces migration-related
functions in address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three
functions, which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What the VM expects of a driver's isolate_page function is to return
*true* if the driver isolates the page successfully. On returning true,
the VM marks the page PG_isolated so that concurrent isolation on other
CPUs skips the page. If a driver cannot isolate the page, it should
return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields,
so the driver shouldn't expect the values in those fields to be preserved.
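
For instance, an isolate_page implementation might look roughly like
the sketch below. This is hypothetical: mydrv_from_page, drv->lock and
drv->teardown are made-up driver internals.

```c
static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode)
{
	struct mydrv *drv = mydrv_from_page(page);	/* made-up helper */
	bool ret = false;

	spin_lock(&drv->lock);
	if (!drv->teardown) {
		/*
		 * Unhook the page from the driver's own list: after we
		 * return true, the VM owns page->lru.
		 */
		list_del_init(&page->lru);
		ret = true;
	}
	spin_unlock(&drv->lock);
	return ret;	/* on true, the VM sets PG_isolated */
}
```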

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the
isolated page. The job of migratepage is to move the content of the old
page to the new page and to set up the struct page fields of newpage.
Keep in mind that you should clear PG_movable of oldpage via
__ClearPageMovable under page_lock if you migrated the oldpage
successfully, and return 0. If the driver cannot migrate the page at
the moment, it can return -EAGAIN. On -EAGAIN, the VM will retry page
migration a short time later, because it interprets -EAGAIN as
"temporary migration failure". On any error other than -EAGAIN, the VM
will give up on migrating the page this time, without retrying.

The driver shouldn't touch the page.lru field, which the VM uses, in
these functions.
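
A migratepage callback following these rules might be sketched as
below. Again hypothetical: mydrv_from_page and drv->lock are made up,
and a real driver must also transfer its own per-page metadata.

```c
static int mydrv_migratepage(struct address_space *mapping,
		struct page *newpage, struct page *page,
		enum migrate_mode mode)
{
	struct mydrv *drv = mydrv_from_page(page);	/* made-up helper */

	if (!spin_trylock(&drv->lock))
		return -EAGAIN;	/* temporary failure: the VM will retry */

	/* Move the payload; repoint driver bookkeeping at newpage. */
	copy_highpage(newpage, page);
	set_page_private(newpage, page_private(page));
	set_page_private(page, 0);

	/* Old page migrated: drop PG_movable (the VM holds its page lock). */
	__ClearPageMovable(page);

	spin_unlock(&drv->lock);
	return 0;
}
```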

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, the VM should return the
isolated page to the driver, so the VM calls the driver's putback_page
with the page whose migration failed. In this function, the driver
should put the isolated page back into its own data structure.
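
putback_page is typically the inverse of isolate_page, for example (a
hypothetical sketch; mydrv_from_page and drv->pages are made-up driver
internals):

```c
static void mydrv_putback_page(struct page *page)
{
	struct mydrv *drv = mydrv_from_page(page);	/* made-up helper */

	/* Migration failed: re-insert the page into driver bookkeeping. */
	spin_lock(&drv->lock);
	list_add_tail(&page->lru, &drv->pages);
	spin_unlock(&drv->lock);
}
```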

4. non-lru movable page flags

There are two page flags for supporting non-lru movable pages.

* PG_movable

The driver should use the function below, under page_lock, to make a
page movable.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument in order to register the migration
family of functions that the VM will call. Strictly speaking,
PG_movable is not a real flag of struct page. Rather, the VM reuses the
lower bits of page->mapping to represent it:

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the
driver should use page_mapping, which masks off the low two bits of
page->mapping to get the correct struct address_space.

For testing whether a page is non-lru movable, the VM provides the
__PageMovable function. However, it doesn't guarantee identifying a
non-lru movable page, because the page->mapping field is unified with
other variables in struct page. Also, if a driver releases the page
after the VM has isolated it, page->mapping doesn't have a stable value
even though PAGE_MAPPING_MOVABLE is set (look at __ClearPageMovable).
But __PageMovable is a cheap way to tell whether a page is LRU or
non-lru movable once the page has been isolated, because LRU pages can
never have PAGE_MAPPING_MOVABLE in page->mapping. It is also good for
just peeking at pages before the more expensive check with lock_page
during pfn scanning to select a victim.

To identify a non-lru movable page with a guarantee, the VM provides
the PageMovable function. Unlike __PageMovable, PageMovable validates
page->mapping and mapping->a_ops->isolate_page under lock_page. The
lock_page prevents page->mapping from being destroyed underneath us.

A driver using __SetPageMovable should clear the flag via
__ClearPageMovable under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an
isolated page PG_isolated under lock_page, so a CPU that encounters a
PG_isolated non-lru movable page can skip it. The driver doesn't need
to manipulate the flag, because the VM sets and clears it
automatically. Keep in mind that if the driver sees a PG_isolated page,
the page has been isolated by the VM, so the driver shouldn't touch the
page.lru field.
PG_isolated aliases the PG_reclaim flag, so the driver shouldn't use
that flag for its own purposes.

Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/Locking |   4 +
 Documentation/filesystems/vfs.txt |  11 +++
 Documentation/vm/page_migration   | 107 ++++++++++++++++++++-
 include/linux/compaction.h        |  17 ++++
 include/linux/fs.h                |   2 +
 include/linux/ksm.h               |   3 +-
 include/linux/migrate.h           |   2 +
 include/linux/mm.h                |   1 +
 include/linux/page-flags.h        |  33 +++++--
 mm/compaction.c                   |  82 ++++++++++++----
 mm/ksm.c                          |   4 +-
 mm/migrate.c                      | 191 ++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                   |   2 +-
 mm/util.c                         |   6 +-
 14 files changed, 411 insertions(+), 54 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75eea7ce3d7c..dda6e3f8e203 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index c61a223ef3ff..900360cbcdae 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -747,6 +752,10 @@ struct address_space_operations {
         and transfer data directly between the storage and the
         application's address space.
 
+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page:  This is used to compact the physical memory usage.
         If the VM wants to relocate a page (maybe off a memory card
         that is signalling imminent failure) it will pass a new page
@@ -754,6 +763,8 @@ struct address_space_operations {
 	transfer any private data across and update any references
         that it has to the page.
 
+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
   	prevent redirtying the page, it is kept locked during the whole
 	operation.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index fea5c0864170..80e98af46e95 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -142,5 +142,110 @@ is increased so that the page cannot be freed while page migration occurs.
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although the original page migration aimed at reducing memory access latency
+for NUMA, compaction, which wants to create high-order pages, is also a main customer.
+
+Current problem of the implementation is that it is designed to migrate only
+*LRU* pages. However, there are potential non-lru pages which can be migrated
+in drivers, for example, zsmalloc, virtio-balloon pages.
+
+For virtio-balloon pages, some parts of migration code path have been hooked
+up and added virtio-balloon specific functions to intercept migration logics.
+It's too specific to a driver so other drivers who want to make their pages
+movable would have to add own specific hooks in migration path.
+
+To overcome the problem, VM supports non-LRU page migration which provides
+generic functions for non-LRU movable pages without driver specific hooks
+migration path.
+
+If a driver want to make own pages movable, it should define three functions
+which are function pointers of struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What VM expects on isolate_page function of driver is to return *true*
+if driver isolates page successfully. On returning true, VM marks the page
+as PG_isolated so concurrent isolation in several CPUs skip the page
+for isolation. If a driver cannot isolate the page, it should return *false*.
+
+Once page is successfully isolated, VM uses page.lru fields so driver
+shouldn't expect to preserve values in that fields.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, VM calls migratepage of driver with isolated page.
+The function of migratepage is to move content of the old page to new page
+and set up fields of struct page newpage. Keep in mind that you should
+clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
+migrated the oldpage successfully and returns 0.
+If driver cannot migrate the page at the moment, driver can return -EAGAIN.
+On -EAGAIN, VM will retry page migration in a short time because VM interprets
+-EAGAIN as "temporary migration failure". On returning any error except -EAGAIN,
+VM will give up the page migration without retrying in this time.
+
+Driver shouldn't touch page.lru field VM using in the functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on isolated page, VM should return the isolated page
+to the driver so VM calls driver's putback_page with migration failed page.
+In this function, driver should put the isolated page back to the own data
+structure.
 
+4. non-lru movable page flags
+
+There are two page flags for supporting non-lru movable page.
+
+* PG_movable
+
+Driver should use the below function to make page movable under page_lock.
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It needs argument of address_space for registering migration family functions
+which will be called by VM. Exactly speaking, PG_movable is not a real flag of
+struct page. Rather than, VM reuses page->mapping's lower bits to represent it.
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so driver shouldn't access page->mapping directly. Instead, driver should
+use page_mapping which mask off the low two bits of page->mapping so it can get
+right struct address_space.
+
+For testing of non-lru movable page, VM supports __PageMovable function.
+However, it doesn't guarantee to identify non-lru movable page because
+page->mapping field is unified with other variables in struct page.
+As well, if driver releases the page after isolation by VM, page->mapping
+doesn't have stable value although it has PAGE_MAPPING_MOVABLE
+(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
+page is LRU or non-lru movable once the page has been isolated. Because
+LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
+good for just peeking to test non-lru movable pages before more expensive
+checking with lock_page in pfn scanning to select victim.
+
+For guaranteeing non-lru movable page, VM provides PageMovable function.
+Unlike __PageMovable, PageMovable functions validates page->mapping and
+mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
+destroying of page->mapping.
+
+Driver using __SetPageMovable should clear the flag via __ClearPageMovable
+under page_lock before the releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, VM marks isolated page
+as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
+movable page, it can skip it. Driver doesn't need to manipulate the flag
+because VM will set/clear it automatically. Keep in mind that if driver
+sees PG_isolated page, it means the page have been isolated by VM so it
+shouldn't touch page.lru field.
+PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
+for own purpose.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..c6b47c861cea 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+			struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c9cc1f699dc1..6a2ce439ea42 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 7ae216a39c9e..481c8c4627ca 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }
 
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00ec816233a..33eaec57e997 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a32445f930..f8a2c4881608 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,9 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page.  See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlag(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }
 
 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366ad673..2d6862d0df60 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -81,6 +81,41 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
+int PageMovable(struct page *page)
+{
+	struct address_space *mapping;
+
+	WARN_ON(!PageLocked(page));
+	if (!__PageMovable(page))
+		goto out;
+
+	mapping = page_mapping(page);
+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
+		return 1;
+out:
+	return 0;
+}
+EXPORT_SYMBOL(PageMovable);
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__SetPageMovable);
+
+void __ClearPageMovable(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!((unsigned long)page->mapping & PAGE_MAPPING_MOVABLE),
+				page);
+	page->mapping = (void *)((unsigned long)page->mapping &
+				PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__ClearPageMovable);
+
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -735,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
-		 * Skip any other type of page
-		 */
-		is_lru = PageLRU(page);
-		if (!is_lru) {
-			if (unlikely(balloon_page_movable(page))) {
-				if (balloon_page_isolate(page)) {
-					/* Successfully isolated */
-					goto isolate_success;
-				}
-			}
-		}
-
-		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
 		 * a lot of iterations if we skip them at once. The check is
@@ -765,8 +785,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
-		if (!is_lru)
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU and non-lru movable pages.
+		 * Skip any other type of page
+		 */
+		is_lru = PageLRU(page);
+		if (!is_lru) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					goto isolate_success;
+				}
+			}
+
+			/*
+			 * __PageMovable can return false positive so we need
+			 * to verify it under page_lock.
+			 */
+			if (unlikely(__PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
+
 			goto isolate_fail;
+		}
 
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
diff --git a/mm/ksm.c b/mm/ksm.c
index 4786b4150f62..35b8aef867a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -532,8 +532,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	void *expected_mapping;
 	unsigned long kpfn;
 
-	expected_mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	expected_mapping = (void *)((unsigned long)stable_node |
+					PAGE_MAPPING_KSM);
 again:
 	kpfn = READ_ONCE(stable_node->kpfn);
 	page = pfn_to_page(kpfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index 2666f28b5236..57559ca7c904 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 #include <linux/security.h>
 #include <linux/backing-dev.h>
+#include <linux/compaction.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -73,6 +74,79 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job
+	 * the put_page() at the end of this block will take care of
+	 * releasing this page, thus avoiding a nasty leak.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PageMovable before holding a PG_lock because page's owner
+	 * assumes nobody touches the PG_lock of a newly allocated page,
+	 * so unconditionally grabbing the lock ruins the owner's side.
+	 */
+	if (unlikely(!__PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as against the release of a page.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * let's be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	mapping = page_mapping(page);
+	if (!mapping->a_ops->isolate_page(page, mode))
+		goto out_no_isolated;
+
+	/* Driver shouldn't use PG_isolated bit of page->flags */
+	WARN_ON_ONCE(PageIsolated(page));
+	__SetPageIsolated(page);
+	unlock_page(page);
+
+	return true;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return false;
+}
+
+/* Should be called on a page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+	mapping = page_mapping(page);
+	mapping->a_ops->putback_page(page);
+	__ClearPageIsolated(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +168,25 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		/*
+		 * We isolated a non-lru movable page, so here we can use
+		 * __PageMovable because LRU page's mapping cannot have
+		 * PAGE_MAPPING_MOVABLE.
+		 */
+		} else if (unlikely(__PageMovable(page))) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -592,7 +681,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 				enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool is_lru = !__PageMovable(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
-		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
-		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+	/*
+	 * In case of a non-lru page, it could be released after the
+	 * isolation step. In that case, we shouldn't try fallback
+	 * migration, which is designed for LRU pages.
+	 */
+	if (unlikely(!is_lru)) {
+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
+		if (!PageMovable(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageIsolated(page);
+			goto out;
+		}
+	}
+
+	if (likely(is_lru)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
+							page, mode);
+	} else {
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+			!PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
 	 * page is freed; but stats require that PageAnon be left as PageAnon.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (!PageAnon(page))
+		if (__PageMovable(page)) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+			/*
+			 * We clear PG_movable under page_lock so any compactor
+			 * cannot try to migrate this page.
+			 */
+			__ClearPageIsolated(page);
+		}
+
+		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -791,6 +916,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
@@ -871,6 +997,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto out_unlock_both;
 	}
 
+	if (unlikely(!is_lru)) {
+		rc = move_to_new_page(newpage, page, mode);
+		goto out_unlock_both;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -920,7 +1051,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * list in here.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__is_movable_balloon_page(newpage)))
+		if (unlikely(__is_movable_balloon_page(newpage) ||
+				__PageMovable(newpage)))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
@@ -961,6 +1093,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
+		if (unlikely(__PageMovable(page))) {
+			lock_page(page);
+			if (!PageMovable(page))
+				__ClearPageIsolated(page);
+			unlock_page(page);
+		}
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
@@ -1010,8 +1148,21 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 				num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			if (likely(!__PageMovable(page))) {
+				putback_lru_page(page);
+				goto put_new;
+			}
+
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		}
+put_new:
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8f3bfc435ee..26868bbaecce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1008,7 +1008,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageAnonHead(page))
+	if (PageMappingFlag(page))
 		page->mapping = NULL;
 	if (check_free)
 		bad += free_pages_check(page);
diff --git a/mm/util.c b/mm/util.c
index 224d36e43a94..a04ccff7cc17 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
 	}
 
 	mapping = page->mapping;
-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
-	return mapping;
+
+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
+EXPORT_SYMBOL(page_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
  2016-05-20 14:23 ` [PATCH v6 02/12] mm: migrate: support non-lru movable " Minchan Kim
@ 2016-05-27 14:26   ` Vlastimil Babka
  2016-05-30  1:33     ` Minchan Kim
  2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
  1 sibling, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2016-05-27 14:26 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Joonsoo Kim, Gioh Kim

On 05/20/2016 04:23 PM, Minchan Kim wrote:
> We have allowed migration for only LRU pages until now, and it was
> enough to make high-order pages. But recently, embedded systems (e.g.,
> webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
> so we have seen several reports about troubles with small high-order
> allocations. To fix the problem, there were several efforts
> (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
> reserved memory, vmalloc and so on), but if there are lots of
> non-movable pages in the system, those solutions fall short in the long run.
>
> So, this patch adds a facility to turn non-movable pages into movable
> ones. For this feature, it introduces migration-related functions in
> address_space_operations as well as some page flags.
>
> If a driver wants to make its own pages movable, it should define three functions
> which are function pointers of struct address_space_operations.
>
> 1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
>
> What the VM expects from the driver's isolate_page function is to return *true*
> if the driver isolates the page successfully. On returning true, the VM marks
> the page as PG_isolated so concurrent isolation attempts on other CPUs skip the
> page. If the driver cannot isolate the page, it should return *false*.
>
> Once the page is successfully isolated, the VM uses the page.lru fields, so the
> driver shouldn't expect the values in those fields to be preserved.
>
> 2. int (*migratepage) (struct address_space *mapping,
> 		struct page *newpage, struct page *oldpage, enum migrate_mode);
>
> After isolation, the VM calls the driver's migratepage with the isolated page.
> The job of migratepage is to move the content of the old page to the new page
> and set up the fields of struct page newpage. Keep in mind that you should
> clear PG_movable of the oldpage via __ClearPageMovable under page_lock if you
> migrated the oldpage successfully, and return 0.
> If the driver cannot migrate the page at the moment, it can return -EAGAIN.
> On -EAGAIN, the VM will retry the page migration shortly, because it interprets
> -EAGAIN as a "temporary migration failure". On any error other than -EAGAIN,
> the VM gives up migrating the page without further retries.
>
> The driver shouldn't touch the page.lru field, which the VM uses, in these functions.
>
> 3. void (*putback_page)(struct page *);
>
> If migration fails on an isolated page, the VM should return the isolated page
> to the driver, so it calls the driver's putback_page with the page whose
> migration failed. In this function, the driver should put the isolated page
> back into its own data structure.
>
> 4. non-lru movable page flags
>
> There are two page flags for supporting non-lru movable page.
>
> * PG_movable
>
> The driver should use the function below to make a page movable, under page_lock.
>
> 	void __SetPageMovable(struct page *page, struct address_space *mapping)
>
> It takes an address_space argument to register the family of migration
> functions that will be called by the VM. Strictly speaking, PG_movable is not a
> real flag of struct page. Rather, the VM reuses the lower bits of page->mapping
> to represent it.
>
> 	#define PAGE_MAPPING_MOVABLE 0x2
> 	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

Interesting, let's see how that works out...

Overall this looks much better than the last version I checked!

[...]

> @@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
>   * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
>   *
>   * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> - * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> - * and then page->mapping points, not to an anon_vma, but to a private
> + * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
> + * bit; and then page->mapping points, not to an anon_vma, but to a private
>   * structure which KSM associates with that merged page.  See ksm.h.
>   *
> - * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> + * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
> + * page and then page->mapping points a struct address_space.
>   *
>   * Please note that, confusingly, "page_mapping" refers to the inode
>   * address_space which maps the page from disk; whereas "page_mapped"
>   * refers to user virtual address space into which the page is mapped.
>   */
> -#define PAGE_MAPPING_ANON	1
> -#define PAGE_MAPPING_KSM	2
> -#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> +#define PAGE_MAPPING_ANON	0x1
> +#define PAGE_MAPPING_MOVABLE	0x2
> +#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> +#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
>
> -static __always_inline int PageAnonHead(struct page *page)
> +static __always_inline int PageMappingFlag(struct page *page)

PageMappingFlags()?

[...]

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1427366ad673..2d6862d0df60 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -81,6 +81,41 @@ static inline bool migrate_async_suitable(int migratetype)
>
>  #ifdef CONFIG_COMPACTION
>
> +int PageMovable(struct page *page)
> +{
> +	struct address_space *mapping;
> +
> +	WARN_ON(!PageLocked(page));

Why not VM_BUG_ON_PAGE as elsewhere?

> +	if (!__PageMovable(page))
> +		goto out;

Just return 0.

> +
> +	mapping = page_mapping(page);
> +	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> +		return 1;
> +out:
> +	return 0;
> +}
> +EXPORT_SYMBOL(PageMovable);
> +
> +void __SetPageMovable(struct page *page, struct address_space *mapping)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> +	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__SetPageMovable);
> +
> +void __ClearPageMovable(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageMovable(page), page);
> +	VM_BUG_ON_PAGE(!((unsigned long)page->mapping & PAGE_MAPPING_MOVABLE),
> +				page);

The last line sounds redundant, PageMovable() already checked this via 
__PageMovable()


> +	page->mapping = (void *)((unsigned long)page->mapping &
> +				PAGE_MAPPING_MOVABLE);

This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?

> +}
> +EXPORT_SYMBOL(__ClearPageMovable);
> +
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_DEFER_SHIFT 6
>
> @@ -735,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		}
>
>  		/*
> -		 * Check may be lockless but that's ok as we recheck later.
> -		 * It's possible to migrate LRU pages and balloon pages
> -		 * Skip any other type of page
> -		 */
> -		is_lru = PageLRU(page);
> -		if (!is_lru) {
> -			if (unlikely(balloon_page_movable(page))) {
> -				if (balloon_page_isolate(page)) {
> -					/* Successfully isolated */
> -					goto isolate_success;
> -				}
> -			}
> -		}

So this effectively prevents movable compound pages from being migrated. Are you 
sure no users of this functionality are going to have compound pages? I assumed 
that they could, and so made the code like this, with the is_lru variable (which 
is redundant after your change).

> -		/*
>  		 * Regardless of being on LRU, compound pages such as THP and
>  		 * hugetlbfs are not to be compacted. We can potentially save
>  		 * a lot of iterations if we skip them at once. The check is
> @@ -765,8 +785,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  		}
>
> -		if (!is_lru)
> +		/*
> +		 * Check may be lockless but that's ok as we recheck later.
> +		 * It's possible to migrate LRU and non-lru movable pages.
> +		 * Skip any other type of page
> +		 */
> +		is_lru = PageLRU(page);
> +		if (!is_lru) {
> +			if (unlikely(balloon_page_movable(page))) {
> +				if (balloon_page_isolate(page)) {
> +					/* Successfully isolated */
> +					goto isolate_success;
> +				}
> +			}

[...]

> +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> +{
> +	struct address_space *mapping;
> +
> +	/*
> +	 * Avoid burning cycles with pages that are yet under __free_pages(),
> +	 * or just got freed under us.
> +	 *
> +	 * In case we 'win' a race for a movable page being freed under us and
> +	 * raise its refcount preventing __free_pages() from doing its job
> +	 * the put_page() at the end of this block will take care of
> +	 * releasing this page, thus avoiding a nasty leak.
> +	 */
> +	if (unlikely(!get_page_unless_zero(page)))
> +		goto out;
> +
> +	/*
> +	 * Check PageMovable before holding a PG_lock because page's owner
> +	 * assumes nobody touches the PG_lock of a newly allocated page,
> +	 * so unconditionally grabbing the lock ruins the owner's side.
> +	 */
> +	if (unlikely(!__PageMovable(page)))
> +		goto out_putpage;
> +	/*
> +	 * As movable pages are not isolated from LRU lists, concurrent
> +	 * compaction threads can race against page migration functions
> +	 * as well as race against the releasing a page.
> +	 *
> +	 * In order to avoid having an already isolated movable page
> +	 * being (wrongly) re-isolated while it is under migration,
> +	 * or to avoid attempting to isolate pages being released,
> +	 * let's be sure we have the page lock
> +	 * before proceeding with the movable page isolation steps.
> +	 */
> +	if (unlikely(!trylock_page(page)))
> +		goto out_putpage;
> +
> +	if (!PageMovable(page) || PageIsolated(page))
> +		goto out_no_isolated;
> +
> +	mapping = page_mapping(page);

Hmm so on first tail page of a THP compound page, page->mapping will alias with 
compound_mapcount. That can easily have a value matching PageMovable flags and 
we'll proceed and start inspecting the compound head in page_mapping()... maybe 
it's not a big deal, or we better check and skip PageTail first, must think 
about it more...

[...]

> @@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  				enum migrate_mode mode)
>  {
>  	struct address_space *mapping;
> -	int rc;
> +	int rc = -EAGAIN;
> +	bool is_lru = !__PageMovable(page);
>
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>
>  	mapping = page_mapping(page);
> -	if (!mapping)
> -		rc = migrate_page(mapping, newpage, page, mode);
> -	else if (mapping->a_ops->migratepage)
> -		/*
> -		 * Most pages have a mapping and most filesystems provide a
> -		 * migratepage callback. Anonymous pages are part of swap
> -		 * space which also has its own migratepage callback. This
> -		 * is the most common path for page migration.
> -		 */
> -		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> -	else
> -		rc = fallback_migrate_page(mapping, newpage, page, mode);
> +	/*
> +	 * In case of a non-lru page, it could be released after the
> +	 * isolation step. In that case, we shouldn't try fallback
> +	 * migration, which is designed for LRU pages.
> +	 */

Hmm but is_lru was determined from !__PageMovable() above, also well after the 
isolation step. So if the driver already released it, we wouldn't detect it? And 
this function is all under same page lock, so if __PageMovable was true above, 
so will be PageMovable below?

> +	if (unlikely(!is_lru)) {
> +		VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +		if (!PageMovable(page)) {
> +			rc = MIGRATEPAGE_SUCCESS;
> +			__ClearPageIsolated(page);
> +			goto out;
> +		}
> +	}
> +
> +	if (likely(is_lru)) {
> +		if (!mapping)
> +			rc = migrate_page(mapping, newpage, page, mode);
> +		else if (mapping->a_ops->migratepage)
> +			/*
> +			 * Most pages have a mapping and most filesystems
> +			 * provide a migratepage callback. Anonymous pages
> +			 * are part of swap space which also has its own
> +			 * migratepage callback. This is the most common path
> +			 * for page migration.
> +			 */
> +			rc = mapping->a_ops->migratepage(mapping, newpage,
> +							page, mode);
> +		else
> +			rc = fallback_migrate_page(mapping, newpage,
> +							page, mode);
> +	} else {
> +		rc = mapping->a_ops->migratepage(mapping, newpage,
> +						page, mode);
> +		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> +			!PageIsolated(page));
> +	}

Why split the !is_lru handling in two places?

>
>  	/*
>  	 * When successful, old pagecache page->mapping must be cleared before
>  	 * page is freed; but stats require that PageAnon be left as PageAnon.
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
> -		if (!PageAnon(page))
> +		if (__PageMovable(page)) {
> +			VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +
> +			/*
> +			 * We clear PG_movable under page_lock so any compactor
> +			 * cannot try to migrate this page.
> +			 */
> +			__ClearPageIsolated(page);
> +		}
> +
> +		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
>  			page->mapping = NULL;

The two lines above make little sense to me without a comment. Should the 
condition be negated, even?




* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
  2016-05-27 14:26   ` Vlastimil Babka
@ 2016-05-30  1:33     ` Minchan Kim
  2016-05-30  9:01       ` Vlastimil Babka
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-05-30  1:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Gioh Kim

On Fri, May 27, 2016 at 04:26:21PM +0200, Vlastimil Babka wrote:
> On 05/20/2016 04:23 PM, Minchan Kim wrote:
> >We have allowed migration for only LRU pages until now, and it was
> >enough to make high-order pages. But recently, embedded systems (e.g.,
> >webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
> >so we have seen several reports about troubles with small high-order
> >allocations. To fix the problem, there were several efforts
> >(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
> >reserved memory, vmalloc and so on), but if there are lots of
> >non-movable pages in the system, those solutions fall short in the long run.
> >
> >So, this patch adds a facility to turn non-movable pages into movable
> >ones. For this feature, it introduces migration-related functions in
> >address_space_operations as well as some page flags.
> >
> >If a driver wants to make its own pages movable, it should define three functions
> >which are function pointers of struct address_space_operations.
> >
> >1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
> >
> >What the VM expects from the driver's isolate_page function is to return *true*
> >if the driver isolates the page successfully. On returning true, the VM marks
> >the page as PG_isolated so concurrent isolation attempts on other CPUs skip the
> >page. If the driver cannot isolate the page, it should return *false*.
> >
> >Once the page is successfully isolated, the VM uses the page.lru fields, so the
> >driver shouldn't expect the values in those fields to be preserved.
> >
> >2. int (*migratepage) (struct address_space *mapping,
> >		struct page *newpage, struct page *oldpage, enum migrate_mode);
> >
> >After isolation, the VM calls the driver's migratepage with the isolated page.
> >The job of migratepage is to move the content of the old page to the new page
> >and set up the fields of struct page newpage. Keep in mind that you should
> >clear PG_movable of the oldpage via __ClearPageMovable under page_lock if you
> >migrated the oldpage successfully, and return 0.
> >If the driver cannot migrate the page at the moment, it can return -EAGAIN.
> >On -EAGAIN, the VM will retry the page migration shortly, because it interprets
> >-EAGAIN as a "temporary migration failure". On any error other than -EAGAIN,
> >the VM gives up migrating the page without further retries.
> >
> >The driver shouldn't touch the page.lru field, which the VM uses, in these functions.
> >
> >3. void (*putback_page)(struct page *);
> >
> >If migration fails on an isolated page, the VM should return the isolated page
> >to the driver, so it calls the driver's putback_page with the page whose
> >migration failed. In this function, the driver should put the isolated page
> >back into its own data structure.
> >
> >4. non-lru movable page flags
> >
> >There are two page flags for supporting non-lru movable page.
> >
> >* PG_movable
> >
> >The driver should use the function below to make a page movable, under page_lock.
> >
> >	void __SetPageMovable(struct page *page, struct address_space *mapping)
> >
> >It takes an address_space argument to register the family of migration
> >functions that will be called by the VM. Strictly speaking, PG_movable is not a
> >real flag of struct page. Rather, the VM reuses the lower bits of page->mapping
> >to represent it.
> >
> >	#define PAGE_MAPPING_MOVABLE 0x2
> >	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
> 
> Interesting, let's see how that works out...
> 
> Overall this looks much better than the last version I checked!

Thanks.

> 
> [...]
> 
> >@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
> >  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
> >  *
> >  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> >- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> >- * and then page->mapping points, not to an anon_vma, but to a private
> >+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
> >+ * bit; and then page->mapping points, not to an anon_vma, but to a private
> >  * structure which KSM associates with that merged page.  See ksm.h.
> >  *
> >- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> >+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
> >+ * page and then page->mapping points a struct address_space.
> >  *
> >  * Please note that, confusingly, "page_mapping" refers to the inode
> >  * address_space which maps the page from disk; whereas "page_mapped"
> >  * refers to user virtual address space into which the page is mapped.
> >  */
> >-#define PAGE_MAPPING_ANON	1
> >-#define PAGE_MAPPING_KSM	2
> >-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> >+#define PAGE_MAPPING_ANON	0x1
> >+#define PAGE_MAPPING_MOVABLE	0x2
> >+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> >+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> >
> >-static __always_inline int PageAnonHead(struct page *page)
> >+static __always_inline int PageMappingFlag(struct page *page)
> 
> PageMappingFlags()?

Yeb.

> 
> [...]
> 
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 1427366ad673..2d6862d0df60 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -81,6 +81,41 @@ static inline bool migrate_async_suitable(int migratetype)
> >
> > #ifdef CONFIG_COMPACTION
> >
> >+int PageMovable(struct page *page)
> >+{
> >+	struct address_space *mapping;
> >+
> >+	WARN_ON(!PageLocked(page));
> 
> Why not VM_BUG_ON_PAGE as elsewhere?

I backported this patchset to a very old kernel which doesn't have
VM_BUG_ON_PAGE and forgot to correct it before sending it to mainline.

Thanks.

> 
> >+	if (!__PageMovable(page))
> >+		goto out;
> 
> Just return 0.

Maybe I love goto too much. You caught me. I will clean it up.

> 
> >+
> >+	mapping = page_mapping(page);
> >+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> >+		return 1;
> >+out:
> >+	return 0;
> >+}
> >+EXPORT_SYMBOL(PageMovable);
> >+
> >+void __SetPageMovable(struct page *page, struct address_space *mapping)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> >+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> >+}
> >+EXPORT_SYMBOL(__SetPageMovable);
> >+
> >+void __ClearPageMovable(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE(!PageMovable(page), page);
> >+	VM_BUG_ON_PAGE(!((unsigned long)page->mapping & PAGE_MAPPING_MOVABLE),
> >+				page);
> 
> The last line sounds redundant, PageMovable() already checked this
> via __PageMovable()

Yeb.

> 
> 
> >+	page->mapping = (void *)((unsigned long)page->mapping &
> >+				PAGE_MAPPING_MOVABLE);
> 
> This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?

No.

The intention is to clear only the mapping value while keeping the
PAGE_MAPPING_MOVABLE flag. That way, any new migration attempt will fail
because PageMovable checks the page's mapping value, but ongoing migration
handling can still tell via the type bit whether it's a movable page.

For example, we need to keep the type bit to handle putting the page back.
With a parallel free (e.g., __ClearPageMovable) from the owner after we
isolated the page, migration will fail and we put the page back. Then we
need to identify it as movable so that we clear PG_isolated but don't call
mapping->a_ops->putback_page. For that, we need to keep the type bit until
the page is returned to the buddy allocator.

> 
> >+}
> >+EXPORT_SYMBOL(__ClearPageMovable);
> >+
> > /* Do not skip compaction more than 64 times */
> > #define COMPACT_MAX_DEFER_SHIFT 6
> >
> >@@ -735,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > 		}
> >
> > 		/*
> >-		 * Check may be lockless but that's ok as we recheck later.
> >-		 * It's possible to migrate LRU pages and balloon pages
> >-		 * Skip any other type of page
> >-		 */
> >-		is_lru = PageLRU(page);
> >-		if (!is_lru) {
> >-			if (unlikely(balloon_page_movable(page))) {
> >-				if (balloon_page_isolate(page)) {
> >-					/* Successfully isolated */
> >-					goto isolate_success;
> >-				}
> >-			}
> >-		}
> 
> So this effectively prevents movable compound pages from being
> migrated. Are you sure no users of this functionality are going to
> have compound pages? I assumed that they could, and so made the code
> like this, with the is_lru variable (which is redundant after your
> change).

At the moment this implementation effectively disables non-lru compound
page migration, but I can't be sure no one will want it in the future.
If someone wants it, we can support it then, because this work doesn't
prevent it by design.

> 
> >-		/*
> > 		 * Regardless of being on LRU, compound pages such as THP and
> > 		 * hugetlbfs are not to be compacted. We can potentially save
> > 		 * a lot of iterations if we skip them at once. The check is
> >@@ -765,8 +785,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > 			goto isolate_fail;
> > 		}
> >
> >-		if (!is_lru)
> >+		/*
> >+		 * Check may be lockless but that's ok as we recheck later.
> >+		 * It's possible to migrate LRU and non-lru movable pages.
> >+		 * Skip any other type of page
> >+		 */
> >+		is_lru = PageLRU(page);
> >+		if (!is_lru) {
> >+			if (unlikely(balloon_page_movable(page))) {
> >+				if (balloon_page_isolate(page)) {
> >+					/* Successfully isolated */
> >+					goto isolate_success;
> >+				}
> >+			}
> 
> [...]
> 
> >+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> >+{
> >+	struct address_space *mapping;
> >+
> >+	/*
> >+	 * Avoid burning cycles with pages that are yet under __free_pages(),
> >+	 * or just got freed under us.
> >+	 *
> >+	 * In case we 'win' a race for a movable page being freed under us and
> >+	 * raise its refcount preventing __free_pages() from doing its job
> >+	 * the put_page() at the end of this block will take care of
> >+	 * release this page, thus avoiding a nasty leakage.
> >+	 */
> >+	if (unlikely(!get_page_unless_zero(page)))
> >+		goto out;
> >+
> >+	/*
> >+	 * Check PageMovable before holding a PG_lock because page's owner
> >+	 * assumes anybody doesn't touch PG_lock of newly allocated page
> >+	 * so unconditionally grapping the lock ruins page's owner side.
> >+	 */
> >+	if (unlikely(!__PageMovable(page)))
> >+		goto out_putpage;
> >+	/*
> >+	 * As movable pages are not isolated from LRU lists, concurrent
> >+	 * compaction threads can race against page migration functions
> >+	 * as well as race against the releasing a page.
> >+	 *
> >+	 * In order to avoid having an already isolated movable page
> >+	 * being (wrongly) re-isolated while it is under migration,
> >+	 * or to avoid attempting to isolate pages being released,
> >+	 * lets be sure we have the page lock
> >+	 * before proceeding with the movable page isolation steps.
> >+	 */
> >+	if (unlikely(!trylock_page(page)))
> >+		goto out_putpage;
> >+
> >+	if (!PageMovable(page) || PageIsolated(page))
> >+		goto out_no_isolated;
> >+
> >+	mapping = page_mapping(page);
> 
> Hmm so on first tail page of a THP compound page, page->mapping will
> alias with compound_mapcount. That can easily have a value matching
> PageMovable flags and we'll proceed and start inspecting the
> compound head in page_mapping()... maybe it's not a big deal, or we
> better check and skip PageTail first, must think about it more...

I thought the PageCompound check right before isolate_movable_page in
isolate_migratepages_block would mostly filter it out, but yeah, it is
racy without zone->lru_lock, so the page could reach isolate_movable_page.
However, the PageMovable check there inspects mapping, mapping->a_ops,
and a_ops->isolate_page to verify whether it's a movable page or not.

I thought that was sufficient to filter out THP pages.

> 
> [...]
> 
> >@@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
> > 				enum migrate_mode mode)
> > {
> > 	struct address_space *mapping;
> >-	int rc;
> >+	int rc = -EAGAIN;
> >+	bool is_lru = !__PageMovable(page);
> >
> > 	VM_BUG_ON_PAGE(!PageLocked(page), page);
> > 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
> >
> > 	mapping = page_mapping(page);
> >-	if (!mapping)
> >-		rc = migrate_page(mapping, newpage, page, mode);
> >-	else if (mapping->a_ops->migratepage)
> >-		/*
> >-		 * Most pages have a mapping and most filesystems provide a
> >-		 * migratepage callback. Anonymous pages are part of swap
> >-		 * space which also has its own migratepage callback. This
> >-		 * is the most common path for page migration.
> >-		 */
> >-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> >-	else
> >-		rc = fallback_migrate_page(mapping, newpage, page, mode);
> >+	/*
> >+	 * In case of non-lru page, it could be released after
> >+	 * isolation step. In that case, we shouldn't try
> >+	 * fallback migration which is designed for LRU pages.
> >+	 */
> 
> Hmm but is_lru was determined from !__PageMovable() above, also well
> after the isolation step. So if the driver already released it, we
> wouldn't detect it? And this function is all under same page lock,
> so if __PageMovable was true above, so will be PageMovable below?

You are missing what I mentioned above.
We keep the type bit exactly to catch what you are saying (i.e., the
driver already released the page).

__PageMovable just checks the PAGE_MAPPING_MOVABLE flag, while PageMovable
checks that page->mapping is valid; __ClearPageMovable resets only the
valid mapping value, not the PAGE_MAPPING_MOVABLE flag.

I wrote it down in Documentation/vm/page_migration.

"For testing of non-lru movable pages, the VM supports the __PageMovable
function. However, it doesn't guarantee to identify a non-lru movable page
because the page->mapping field is unified with other variables in struct
page. Also, if the driver releases the page after isolation by the VM,
page->mapping doesn't have a stable value although it has
PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable
is a cheap way to tell whether a page is LRU or non-lru movable once the
page has been isolated, because LRU pages can never have
PAGE_MAPPING_MOVABLE in page->mapping. It is also good for just peeking
to test for non-lru movable pages before the more expensive check with
lock_page during pfn scanning to select a victim.

To guarantee a non-lru movable page, the VM provides the PageMovable
function. Unlike __PageMovable, PageMovable validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
sudden destruction of page->mapping.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable
under page_lock before releasing the page."

> 
> >+	if (unlikely(!is_lru)) {
> >+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+		if (!PageMovable(page)) {
> >+			rc = MIGRATEPAGE_SUCCESS;
> >+			__ClearPageIsolated(page);
> >+			goto out;
> >+		}
> >+	}
> >+
> >+	if (likely(is_lru)) {
> >+		if (!mapping)
> >+			rc = migrate_page(mapping, newpage, page, mode);
> >+		else if (mapping->a_ops->migratepage)
> >+			/*
> >+			 * Most pages have a mapping and most filesystems
> >+			 * provide a migratepage callback. Anonymous pages
> >+			 * are part of swap space which also has its own
> >+			 * migratepage callback. This is the most common path
> >+			 * for page migration.
> >+			 */
> >+			rc = mapping->a_ops->migratepage(mapping, newpage,
> >+							page, mode);
> >+		else
> >+			rc = fallback_migrate_page(mapping, newpage,
> >+							page, mode);
> >+	} else {
> >+		rc = mapping->a_ops->migratepage(mapping, newpage,
> >+						page, mode);
> >+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> >+			!PageIsolated(page));
> >+	}
> 
> Why split the !is_lru handling in two places?

Fixed.

> 
> >
> > 	/*
> > 	 * When successful, old pagecache page->mapping must be cleared before
> > 	 * page is freed; but stats require that PageAnon be left as PageAnon.
> > 	 */
> > 	if (rc == MIGRATEPAGE_SUCCESS) {
> >-		if (!PageAnon(page))
> >+		if (__PageMovable(page)) {
> >+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+
> >+			/*
> >+			 * We clear PG_movable under page_lock so any compactor
> >+			 * cannot try to migrate this page.
> >+			 */
> >+			__ClearPageIsolated(page);
> >+		}
> >+
> >+		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
> > 			page->mapping = NULL;
> 
> The two lines above make little sense to me without a comment.

I folded this.

@@ -901,7 +901,12 @@ static int move_to_new_page(struct page *newpage, struct page *page,
                        __ClearPageIsolated(page);
                }
 
-               if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
+               /*
+                * Anonymous and movable page->mapping will be cleared by
+                * free_pages_prepare, so don't reset it here; keep it so
+                * type checks such as PageAnon still work.
+                */
+               if (!PageMappingFlags(page))
                        page->mapping = NULL;
        }
 out:


> Should the condition be negated, even?

No.

> 
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
  2016-05-20 14:23 ` [PATCH v6 02/12] mm: migrate: support non-lru movable " Minchan Kim
  2016-05-27 14:26   ` Vlastimil Babka
@ 2016-05-30  1:39   ` Minchan Kim
  2016-05-30  9:36     ` Vlastimil Babka
  2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
  1 sibling, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2016-05-30  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Joonsoo Kim,
	Vlastimil Babka

Vlastimil, I updated the patch based on your review comments. Please review this.
If everything is settled, I will send v7 rebased on the recent mmotm.

Thanks for the review!

From ad4157e98651a2d18fd0a4ae90d1d9f609aab314 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 8 Apr 2016 10:34:49 +0900
Subject: [PATCH v6r2] mm: migrate: support non-lru movable page migration

Until now we have allowed migration only for LRU pages, and it was
enough to make high-order pages. But recently, embedded systems (e.g.,
webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
so we have seen several reports of trouble with small high-order
allocations. To fix the problem, there have been several efforts
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order
pages, reserved memory, vmalloc and so on), but if there are lots of
non-movable pages in the system, those solutions are void in the long run.

So, this patch adds a facility to make non-movable pages movable.
For this feature, it introduces migration-related functions in
address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three
functions, which are function pointers in struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What the VM expects from the driver's isolate_page function is to return *true*
if the driver isolates the page successfully. On returning true, the VM marks
the page as PG_isolated so concurrent isolation on several CPUs skips the page.
If the driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields, so the
driver shouldn't expect the values in those fields to be preserved.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the isolated page.
The job of migratepage is to move the content of the old page to the new page
and set up the fields of struct page for newpage. Keep in mind that you should
clear PG_movable of the oldpage via __ClearPageMovable under page_lock if you
migrated the oldpage successfully and return 0.
If the driver cannot migrate the page at the moment, it can return -EAGAIN.
On -EAGAIN, the VM will retry the migration shortly because the VM interprets
-EAGAIN as "temporary migration failure". On any error other than -EAGAIN,
the VM will give up migrating the page without retrying this time.

The driver shouldn't touch the page.lru field, which the VM uses in these functions.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, the VM should return the isolated page
to the driver, so the VM calls the driver's putback_page with the page whose
migration failed. In this function, the driver should put the isolated page
back into its own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable page.

* PG_movable

The driver should use the function below to make a page movable under page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument to register the migration family of
functions which will be called by the VM. Strictly speaking, PG_movable is not
a real flag of struct page. Rather, the VM reuses the lower bits of page->mapping to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver
should use page_mapping, which masks off the low two bits of page->mapping so
it can get the right struct address_space.

For testing of non-lru movable pages, the VM supports the __PageMovable
function. However, it doesn't guarantee to identify a non-lru movable page
because the page->mapping field is unified with other variables in struct
page. Also, if the driver releases the page after isolation by the VM,
page->mapping doesn't have a stable value although it has
PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable
is a cheap way to tell whether a page is LRU or non-lru movable once the
page has been isolated, because LRU pages can never have
PAGE_MAPPING_MOVABLE in page->mapping. It is also good for just peeking
to test for non-lru movable pages before the more expensive check with
lock_page during pfn scanning to select a victim.

To guarantee a non-lru movable page, the VM provides the PageMovable
function. Unlike __PageMovable, PageMovable validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
sudden destruction of page->mapping.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable
under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated
page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
non-lru movable page, it can skip it. The driver doesn't need to manipulate
the flag because the VM will set/clear it automatically. Keep in mind that if
the driver sees a PG_isolated page, the page has been isolated by the VM, so
the driver shouldn't touch the page.lru field.
PG_isolated is aliased with the PG_reclaim flag, so the driver shouldn't use
the flag for its own purpose.

Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/Locking |   4 +
 Documentation/filesystems/vfs.txt |  11 +++
 Documentation/vm/page_migration   | 107 ++++++++++++++++++++-
 include/linux/compaction.h        |  17 ++++
 include/linux/fs.h                |   2 +
 include/linux/ksm.h               |   3 +-
 include/linux/migrate.h           |   2 +
 include/linux/mm.h                |   1 +
 include/linux/page-flags.h        |  33 +++++--
 mm/compaction.c                   |  80 ++++++++++++----
 mm/ksm.c                          |   4 +-
 mm/migrate.c                      | 192 ++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                   |   2 +-
 mm/util.c                         |   6 +-
 14 files changed, 412 insertions(+), 52 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index af7c030a0368..3991a976cf43 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 19366fef2652..9d4ae317fdcb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -591,9 +591,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -739,6 +744,10 @@ struct address_space_operations {
         and transfer data directly between the storage and the
         application's address space.
 
+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page:  This is used to compact the physical memory usage.
         If the VM wants to relocate a page (maybe off a memory card
         that is signalling imminent failure) it will pass a new page
@@ -746,6 +755,8 @@ struct address_space_operations {
 	transfer any private data across and update any references
         that it has to the page.
 
+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
   	prevent redirtying the page, it is kept locked during the whole
 	operation.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index fea5c0864170..80e98af46e95 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -142,5 +142,110 @@ is increased so that the page cannot be freed while page migration occurs.
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although migration originally aimed at reducing memory access latency
+for NUMA, compaction, which wants to create high-order pages, is also a main customer.
+
+The current problem with the implementation is that it is designed to migrate
+only *LRU* pages. However, there are potential non-lru pages which can be
+migrated in drivers, for example, zsmalloc and virtio-balloon pages.
+
+For virtio-balloon pages, some parts of the migration code path have been hooked
+up, and virtio-balloon specific functions were added to intercept the migration logic.
+That is too specific to one driver, so other drivers which want to make their pages
+movable would have to add their own specific hooks in the migration path.
+
+To overcome the problem, the VM supports non-LRU page migration, which provides
+generic functions for non-LRU movable pages without driver-specific hooks in the
+migration path.
+
+If a driver wants to make its own pages movable, it should define three
+functions, which are function pointers in struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What the VM expects from the driver's isolate_page function is to return *true*
+if the driver isolates the page successfully. On returning true, the VM marks
+the page as PG_isolated so concurrent isolation on several CPUs skips the page.
+If the driver cannot isolate the page, it should return *false*.
+
+Once a page is successfully isolated, the VM uses the page.lru fields, so the
+driver shouldn't expect the values in those fields to be preserved.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, the VM calls the driver's migratepage with the isolated page.
+The job of migratepage is to move the content of the old page to the new page
+and set up the fields of struct page for newpage. Keep in mind that you should
+clear PG_movable of the oldpage via __ClearPageMovable under page_lock if you
+migrated the oldpage successfully and return 0.
+If the driver cannot migrate the page at the moment, it can return -EAGAIN.
+On -EAGAIN, the VM will retry the migration shortly because the VM interprets
+-EAGAIN as "temporary migration failure". On any error other than -EAGAIN,
+the VM will give up migrating the page without retrying this time.
+
+The driver shouldn't touch the page.lru field, which the VM uses in these functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on an isolated page, the VM should return the isolated page
+to the driver, so the VM calls the driver's putback_page with the page whose
+migration failed. In this function, the driver should put the isolated page
+back into its own data structure.
 
+4. non-lru movable page flags
+
+There are two page flags for supporting non-lru movable page.
+
+* PG_movable
+
+The driver should use the function below to make a page movable under page_lock.
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It takes an address_space argument to register the migration family of
+functions which will be called by the VM. Strictly speaking, PG_movable is not
+a real flag of struct page. Rather, the VM reuses the lower bits of page->mapping to represent it.
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so the driver shouldn't access page->mapping directly. Instead, the driver
+should use page_mapping, which masks off the low two bits of page->mapping so
+it can get the right struct address_space.
+
+For testing of non-lru movable pages, the VM supports the __PageMovable
+function. However, it doesn't guarantee to identify a non-lru movable page
+because the page->mapping field is unified with other variables in struct
+page. Also, if the driver releases the page after isolation by the VM,
+page->mapping doesn't have a stable value although it has
+PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable
+is a cheap way to tell whether a page is LRU or non-lru movable once the
+page has been isolated, because LRU pages can never have
+PAGE_MAPPING_MOVABLE in page->mapping. It is also good for just peeking
+to test for non-lru movable pages before the more expensive check with
+lock_page during pfn scanning to select a victim.
+
+To guarantee a non-lru movable page, the VM provides the PageMovable
+function. Unlike __PageMovable, PageMovable validates page->mapping and
+mapping->a_ops->isolate_page under lock_page. The lock_page prevents
+sudden destruction of page->mapping.
+
+A driver using __SetPageMovable should clear the flag via __ClearPageMovable
+under page_lock before releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, the VM marks an isolated
+page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
+non-lru movable page, it can skip it. The driver doesn't need to manipulate
+the flag because the VM will set/clear it automatically. Keep in mind that if
+the driver sees a PG_isolated page, the page has been isolated by the VM, so
+the driver shouldn't touch the page.lru field.
+PG_isolated is aliased with the PG_reclaim flag, so the driver shouldn't use
+the flag for its own purpose.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..c6b47c861cea 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+			struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0cfdf2aec8f7..39ef97414033 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 7ae216a39c9e..481c8c4627ca 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }
 
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00ec816233a..33eaec57e997 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a32445f930..f36dbb3a3060 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,9 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page.  See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlags(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }
 
 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366ad673..c7e0cd4dda9d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
+int PageMovable(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (!__PageMovable(page))
+		return 0;
+
+	mapping = page_mapping(page);
+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(PageMovable);
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__SetPageMovable);
+
+void __ClearPageMovable(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	page->mapping = (void *)((unsigned long)page->mapping &
+				PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__ClearPageMovable);
+
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -735,21 +768,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
-		 * Skip any other type of page
-		 */
-		is_lru = PageLRU(page);
-		if (!is_lru) {
-			if (unlikely(balloon_page_movable(page))) {
-				if (balloon_page_isolate(page)) {
-					/* Successfully isolated */
-					goto isolate_success;
-				}
-			}
-		}
-
-		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
 		 * a lot of iterations if we skip them at once. The check is
@@ -765,8 +783,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
-		if (!is_lru)
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU and non-lru movable pages.
+		 * Skip any other type of page
+		 */
+		is_lru = PageLRU(page);
+		if (!is_lru) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					goto isolate_success;
+				}
+			}
+
+			/*
+			 * __PageMovable can return false positive so we need
+			 * to verify it under page_lock.
+			 */
+			if (unlikely(__PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
+
 			goto isolate_fail;
+		}
 
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
diff --git a/mm/ksm.c b/mm/ksm.c
index 4786b4150f62..35b8aef867a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -532,8 +532,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	void *expected_mapping;
 	unsigned long kpfn;
 
-	expected_mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	expected_mapping = (void *)((unsigned long)stable_node |
+					PAGE_MAPPING_KSM);
 again:
 	kpfn = READ_ONCE(stable_node->kpfn);
 	page = pfn_to_page(kpfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index 2666f28b5236..60abcf379b51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 #include <linux/security.h>
 #include <linux/backing-dev.h>
+#include <linux/compaction.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -73,6 +74,81 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job
+	 * the put_page() at the end of this block will take care of
+	 * releasing this page, thus avoiding a nasty leakage.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PageMovable before holding a PG_lock because page's owner
+	 * assumes anybody doesn't touch PG_lock of newly allocated page
+	 * so unconditionally grabbing the lock ruins page's owner side.
+	 */
+	if (unlikely(!__PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as race against a page being released.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * lets be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	mapping = page_mapping(page);
+	VM_BUG_ON_PAGE(!mapping, page);
+
+	if (!mapping->a_ops->isolate_page(page, mode))
+		goto out_no_isolated;
+
+	/* Driver shouldn't use PG_isolated bit of page->flags */
+	WARN_ON_ONCE(PageIsolated(page));
+	__SetPageIsolated(page);
+	unlock_page(page);
+
+	return true;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return false;
+}
+
+/* It should be called on page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+	mapping = page_mapping(page);
+	mapping->a_ops->putback_page(page);
+	__ClearPageIsolated(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +170,25 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		/*
+		 * We isolated non-lru movable page so here we can use
+		 * __PageMovable because LRU page's mapping cannot have
+		 * PAGE_MAPPING_MOVABLE.
+		 */
+		} else if (unlikely(__PageMovable(page))) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -592,7 +683,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -755,33 +846,72 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 				enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool is_lru = !__PageMovable(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
+
+	if (likely(is_lru)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
+							page, mode);
+	} else {
 		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
+		 * In case of non-lru page, it could be released after
+		 * isolation step. In that case, we shouldn't try migration.
 		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
+		if (!PageMovable(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageIsolated(page);
+			goto out;
+		}
+
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+			!PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
 	 * page is freed; but stats require that PageAnon be left as PageAnon.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (!PageAnon(page))
+		if (__PageMovable(page)) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+			/*
+			 * We clear PG_movable under page_lock so any compactor
+			 * cannot try to migrate this page.
+			 */
+			__ClearPageIsolated(page);
+		}
+
+		/*
+		 * Anonymous and movable page->mapping will be cleared by
+		 * free_pages_prepare so don't reset it here for keeping
+		 * the type to work PageAnon, for example.
+		 */
+		if (!PageMappingFlags(page))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
@@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto out_unlock_both;
 	}
 
+	if (unlikely(!is_lru)) {
+		rc = move_to_new_page(newpage, page, mode);
+		goto out_unlock_both;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -920,7 +1056,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * list in here.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__is_movable_balloon_page(newpage)))
+		if (unlikely(__is_movable_balloon_page(newpage) ||
+				__PageMovable(newpage)))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
@@ -961,6 +1098,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
+		if (unlikely(__PageMovable(page))) {
+			lock_page(page);
+			if (!PageMovable(page))
+				__ClearPageIsolated(page);
+			unlock_page(page);
+		}
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
@@ -1010,8 +1153,21 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 				num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			if (likely(!__PageMovable(page))) {
+				putback_lru_page(page);
+				goto put_new;
+			}
+
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		}
+put_new:
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d27e8b968ac3..4d0ba8f6cfee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1014,7 +1014,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageAnonHead(page))
+	if (PageMappingFlags(page))
 		page->mapping = NULL;
 	if (check_free)
 		bad += free_pages_check(page);
diff --git a/mm/util.c b/mm/util.c
index 917e0e3d0f8e..b756ee36f7f0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
 	}
 
 	mapping = page->mapping;
-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
-	return mapping;
+
+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
+EXPORT_SYMBOL(page_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
  2016-05-30  1:33     ` Minchan Kim
@ 2016-05-30  9:01       ` Vlastimil Babka
  0 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2016-05-30  9:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Gioh Kim

On 05/30/2016 03:33 AM, Minchan Kim wrote:
>>
>>
>>> +	page->mapping = (void *)((unsigned long)page->mapping &
>>> +				PAGE_MAPPING_MOVABLE);
>>
>> This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?
>
> No.
>
> The intention is to clear only the mapping value but keep the
> PAGE_MAPPING_MOVABLE flag. So any new migration attempt will fail because
> PageMovable checks the page's mapping value, but ongoing migration handling
> can still catch whether it's a movable page or not via the type bit.

Oh, OK, I got that wrong. I'll point out in the reply to the v6v2 what 
misled me :)

>>
>> So this effectively prevents movable compound pages from being
>> migrated. Are you sure no users of this functionality are going to
>> have compound pages? I assumed that they could, and so made the code
>> like this, with the is_lru variable (which is redundant after your
>> change).
>
> This implementation effectively disables non-lru compound page migration
> at the moment, but I can't be sure no one will want it in the future.
> If someone wants it, we can support it then because this work doesn't
> prevent it by design.

Oh well. As long as the balloon pages or zsmalloc don't already use 
compound pages...

>
> I thought the PageCompound check right before isolate_movable_page in
> isolate_migratepages_block would mostly filter it out, but yeah, it is racy
> without zone->lru_lock so such a page could reach isolate_movable_page.
> However, the PageMovable check there investigates mapping, mapping->a_ops,
> and a_ops->isolate_page to verify whether it's a movable page or not.
>
> I thought that was sufficient to filter out THP pages.

I guess, yeah.

>>
>> [...]
>>
>>> @@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>>> 				enum migrate_mode mode)
>>> {
>>> 	struct address_space *mapping;
>>> -	int rc;
>>> +	int rc = -EAGAIN;
>>> +	bool is_lru = !__PageMovable(page);
>>>
>>> 	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>> 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>>>
>>> 	mapping = page_mapping(page);
>>> -	if (!mapping)
>>> -		rc = migrate_page(mapping, newpage, page, mode);
>>> -	else if (mapping->a_ops->migratepage)
>>> -		/*
>>> -		 * Most pages have a mapping and most filesystems provide a
>>> -		 * migratepage callback. Anonymous pages are part of swap
>>> -		 * space which also has its own migratepage callback. This
>>> -		 * is the most common path for page migration.
>>> -		 */
>>> -		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
>>> -	else
>>> -		rc = fallback_migrate_page(mapping, newpage, page, mode);
>>> +	/*
>>> +	 * In case of non-lru page, it could be released after
>>> +	 * isolation step. In that case, we shouldn't try
>>> +	 * fallback migration which is designed for LRU pages.
>>> +	 */
>>
>> Hmm but is_lru was determined from !__PageMovable() above, also well
>> after the isolation step. So if the driver already released it, we
>> wouldn't detect it? And this function is all under same page lock,
>> so if __PageMovable was true above, so will be PageMovable below?
>
> You are missing what I mentioned above.
> We should keep the type bit to catch what you are saying (i.e., the driver
> already released the page).
>
> __PageMovable just checks the PAGE_MAPPING_MOVABLE flag and PageMovable
> checks that page->mapping is valid, while __ClearPageMovable resets only
> the valid value of mapping, not the PAGE_MAPPING_MOVABLE flag.
>
> I wrote it down in Documentation/vm/page_migration.
>
> "For testing of non-lru movable page, VM supports __PageMovable function.
> However, it doesn't guarantee to identify non-lru movable page because
> page->mapping field is unified with other variables in struct page.
> As well, if driver releases the page after isolation by VM, page->mapping
> doesn't have stable value although it has PAGE_MAPPING_MOVABLE
> (Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
> page is LRU or non-lru movable once the page has been isolated. Because
> LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
> good for just peeking to test non-lru movable pages before more expensive
> checking with lock_page in pfn scanning to select victim.
>
> For guaranteeing non-lru movable page, VM provides PageMovable function.
> Unlike __PageMovable, PageMovable functions validates page->mapping and
> mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
> destroying of page->mapping.
>
> Driver using __SetPageMovable should clear the flag via __ClearPageMovable
> under page_lock before releasing the page.

Right, I get it now.


>>> +		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
>>> 			page->mapping = NULL;
>>
>> The two lines above make little sense to me without a comment.
>
> I folded this.
>
> @@ -901,7 +901,12 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>                         __ClearPageIsolated(page);
>                 }
>
> -               if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
> +               /*
> +                * Anonymous and movable page->mapping will be cleared by
> +                * free_pages_prepare so don't reset it here for keeping
> +                * the type to work PageAnon, for example.
> +                */
> +               if (!PageMappingFlags(page))
>                         page->mapping = NULL;
>         }

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
  2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
@ 2016-05-30  9:36     ` Vlastimil Babka
  2016-05-30 16:25       ` Minchan Kim
  2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
  1 sibling, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2016-05-30  9:36 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Joonsoo Kim, Mel Gorman,
	Hugh Dickins, Rafael Aquini, virtualization, Jonathan Corbet,
	John Einar Reitan, dri-devel, Sergey Senozhatsky, Gioh Kim

On 05/30/2016 03:39 AM, Minchan Kim wrote:
> After isolation, VM calls migratepage of driver with isolated page.
> The function of migratepage is to move content of the old page to new page
> and set up fields of struct page newpage. Keep in mind that you should
> clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
> migrated the oldpage successfully and returns 0.

This "clear PG_movable" is one of the reasons I was confused about what 
__ClearPageMovable() really does. There's no actual "PG_movable" page 
flag and the function doesn't clear even the actual mapping flag :) Also 
same thing in the Documentation/ part.

Something like "... you should indicate to the VM that the oldpage is no 
longer movable via __ClearPageMovable() ..."?

> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
>
>  #ifdef CONFIG_COMPACTION
>
> +int PageMovable(struct page *page)
> +{
> +	struct address_space *mapping;
> +
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	if (!__PageMovable(page))
> +		return 0;
> +
> +	mapping = page_mapping(page);
> +	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> +		return 1;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(PageMovable);
> +
> +void __SetPageMovable(struct page *page, struct address_space *mapping)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> +	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__SetPageMovable);
> +
> +void __ClearPageMovable(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageMovable(page), page);
> +	page->mapping = (void *)((unsigned long)page->mapping &
> +				PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__ClearPageMovable);

The second confusing thing is that the function is named 
__ClearPageMovable(), but what it really clears is the mapping pointer,
which is not at all the opposite of what __SetPageMovable() does.

I know it's explained in the documentation, but it also deserves a 
comment here so it doesn't confuse everyone who looks at it.
Even better would be a less confusing name for the function, but I can't 
offer one right now.

> diff --git a/mm/util.c b/mm/util.c
> index 917e0e3d0f8e..b756ee36f7f0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
>  	}
>
>  	mapping = page->mapping;

I'd probably use READ_ONCE() here to be safe. Not all callers are under 
page lock?

> -	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
> +	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
> -	return mapping;
> +
> +	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>  }
> +EXPORT_SYMBOL(page_mapping);
>
>  /* Slow path of page_mapcount() for compound pages */
>  int __page_mapcount(struct page *page)
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
  2016-05-30  9:36     ` Vlastimil Babka
@ 2016-05-30 16:25       ` Minchan Kim
  2016-05-31  7:51         ` Vlastimil Babka
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-05-30 16:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Gioh Kim

On Mon, May 30, 2016 at 11:36:07AM +0200, Vlastimil Babka wrote:
> On 05/30/2016 03:39 AM, Minchan Kim wrote:
> >After isolation, VM calls migratepage of driver with isolated page.
> >The function of migratepage is to move content of the old page to new page
> >and set up fields of struct page newpage. Keep in mind that you should
> >clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
> >migrated the oldpage successfully and returns 0.
> 
> This "clear PG_movable" is one of the reasons I was confused about
> what __ClearPageMovable() really does. There's no actual
> "PG_movable" page flag and the function doesn't clear even the
> actual mapping flag :) Also same thing in the Documentation/ part.
> 
> Something like "... you should indicate to the VM that the oldpage
> is no longer movable via __ClearPageMovable() ..."?

It's better. I will change it.

> 
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
> >
> > #ifdef CONFIG_COMPACTION
> >
> >+int PageMovable(struct page *page)
> >+{
> >+	struct address_space *mapping;
> >+
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	if (!__PageMovable(page))
> >+		return 0;
> >+
> >+	mapping = page_mapping(page);
> >+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> >+		return 1;
> >+
> >+	return 0;
> >+}
> >+EXPORT_SYMBOL(PageMovable);
> >+
> >+void __SetPageMovable(struct page *page, struct address_space *mapping)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> >+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> >+}
> >+EXPORT_SYMBOL(__SetPageMovable);
> >+
> >+void __ClearPageMovable(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE(!PageMovable(page), page);
> >+	page->mapping = (void *)((unsigned long)page->mapping &
> >+				PAGE_MAPPING_MOVABLE);
> >+}
> >+EXPORT_SYMBOL(__ClearPageMovable);
> 
> The second confusing thing is that the function is named
> __ClearPageMovable(), but what it really clears is the mapping
> pointer,
> which is not at all the opposite of what __SetPageMovable() does.
> 
> I know it's explained in the documentation, but it also deserves a
> comment here so it doesn't confuse everyone who looks at it.
> Even better would be a less confusing name for the function, but I
> can't offer one right now.

To me, the __ClearPageMovable naming is suitable from the user's POV.
It effectively makes the page unmovable. The confusion is just caused
by the implementation, and I don't want the exported API to depend on the
implementation. So I want to just add a comment.

I didn't add the comment above the function because I don't want to expose
the internal implementation to the user. I think they don't need to know it.

index a7df2ae71f2a..d1d2063b4fd9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -108,6 +108,11 @@ void __ClearPageMovable(struct page *page)
 {
        VM_BUG_ON_PAGE(!PageLocked(page), page);
        VM_BUG_ON_PAGE(!PageMovable(page), page);
+       /*
+        * Clear registered address_space val while keeping PAGE_MAPPING_MOVABLE
+        * flag so that VM can catch a page released by the driver after isolation.
+        * With it, VM migration doesn't try to put it back.
+        */
        page->mapping = (void *)((unsigned long)page->mapping &
                                PAGE_MAPPING_MOVABLE);

> 
> >diff --git a/mm/util.c b/mm/util.c
> >index 917e0e3d0f8e..b756ee36f7f0 100644
> >--- a/mm/util.c
> >+++ b/mm/util.c
> >@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
> > 	}
> >
> > 	mapping = page->mapping;
> 
> I'd probably use READ_ONCE() here to be safe. Not all callers are
> under page lock?

I don't understand. Yeah, not all callers are under the page lock, but at
least new users of movable pages should call it under page_lock.
I will write the rule down in the documentation.
In this case, what kind of problem do you see?

> 
> >-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
> >+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
> > 		return NULL;
> >-	return mapping;
> >+
> >+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
> > }
> >+EXPORT_SYMBOL(page_mapping);
> >
> > /* Slow path of page_mapcount() for compound pages */
> > int __page_mapcount(struct page *page)
> >
> 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
  2016-05-30  9:36     ` Vlastimil Babka
@ 2016-05-31  0:01     ` Minchan Kim
  2016-05-31  7:52       ` Vlastimil Babka
  2016-06-13  9:38       ` Anshuman Khandual
  1 sibling, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2016-05-31  0:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Vlastimil Babka,
	Joonsoo Kim, Mel Gorman, Hugh Dickins, Rafael Aquini,
	virtualization, Jonathan Corbet, John Einar Reitan, dri-devel,
	Sergey Senozhatsky, Gioh Kim

Per Vlastimil's review comments.

Thanks for the detailed review, Vlastimil!
If you have any other concerns, feel free to say so.
After I resolve everything, I will send v7 rebased on recent mmotm.

From b14aaeeeeb2d3ac0702c7b2eec36409d74406d43 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 8 Apr 2016 10:34:49 +0900
Subject: [PATCH] mm: migrate: support non-lru movable page migration

We have allowed migration for only LRU pages until now, and it was
enough to make high-order pages. But recently, embedded systems (e.g.,
webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
so we have seen several reports about trouble with small high-order
allocations. To fix the problem, there have been several efforts
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
reserved memory, vmalloc and so on), but if there are lots of
non-movable pages in the system, those solutions are void in the long run.

So, this patch adds a facility to make non-movable pages movable. For
the feature, it introduces migration-related functions in
address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three
functions, which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What the VM expects of a driver's isolate_page function is to return *true*
if the driver isolates the page successfully. On returning true, the VM marks
the page as PG_isolated so that concurrent isolation on several CPUs skips
the page. If a driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields, so the
driver shouldn't expect the values in those fields to be preserved.
2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the isolated page.
The job of migratepage is to move the content of the old page to the new page
and set up the fields of struct page newpage. Keep in mind that you should
indicate to the VM that the oldpage is no longer movable via __ClearPageMovable()
under page_lock if you migrated the oldpage successfully, and return 0.
If the driver cannot migrate the page at the moment, it can return -EAGAIN.
On -EAGAIN, the VM will retry page migration in a short time because the VM
interprets -EAGAIN as "temporary migration failure". On returning any error
except -EAGAIN, the VM will give up the page migration without retrying.

The driver shouldn't touch the page.lru field, which the VM uses, in these functions.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, the VM should return the isolated page
to the driver, so the VM calls the driver's putback_page with the page whose
migration failed. In this function, the driver should put the isolated page
back into its own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable pages.

* PG_movable

The driver should use the function below to make a page movable, under page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument for registering the migration family of
functions, which will be called by the VM. Strictly speaking, PG_movable is not
a real flag of struct page. Rather, the VM reuses the lower bits of
page->mapping to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver
should use page_mapping, which masks off the low two bits of page->mapping,
so it can get the right struct address_space.

For testing non-lru movable pages, the VM supports the __PageMovable function.
However, it isn't guaranteed to identify a non-lru movable page because the
page->mapping field is unified with other variables in struct page. Also,
if the driver releases the page after isolation by the VM, page->mapping
doesn't have a stable value although it has PAGE_MAPPING_MOVABLE
(look at __ClearPageMovable). But __PageMovable is a cheap way to tell whether
a page is LRU or non-lru movable once the page has been isolated, because
LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking at non-lru movable pages before the more expensive
check with lock_page during pfn scanning to select a victim.

To guarantee a non-lru movable page, the VM provides the PageMovable function.
Unlike __PageMovable, PageMovable validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
destruction of page->mapping.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable
under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated
page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
non-lru movable page, it can skip it. The driver doesn't need to manipulate
the flag because the VM will set/clear it automatically. Keep in mind that
if the driver sees a PG_isolated page, it means the page has been isolated
by the VM, so it shouldn't touch the page.lru field.
PG_isolated is an alias of the PG_reclaim flag, so the driver shouldn't use
the flag for its own purposes.

Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/Locking |   4 +
 Documentation/filesystems/vfs.txt |  11 +++
 Documentation/vm/page_migration   | 107 ++++++++++++++++++++-
 include/linux/compaction.h        |  17 ++++
 include/linux/fs.h                |   2 +
 include/linux/ksm.h               |   3 +-
 include/linux/migrate.h           |   2 +
 include/linux/mm.h                |   1 +
 include/linux/page-flags.h        |  33 +++++--
 mm/compaction.c                   |  85 +++++++++++++----
 mm/ksm.c                          |   4 +-
 mm/migrate.c                      | 192 ++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                   |   2 +-
 mm/util.c                         |   6 +-
 14 files changed, 417 insertions(+), 52 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index af7c030a0368..3991a976cf43 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 19366fef2652..9d4ae317fdcb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -591,9 +591,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -739,6 +744,10 @@ struct address_space_operations {
         and transfer data directly between the storage and the
         application's address space.
 
+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page:  This is used to compact the physical memory usage.
         If the VM wants to relocate a page (maybe off a memory card
         that is signalling imminent failure) it will pass a new page
@@ -746,6 +755,8 @@ struct address_space_operations {
 	transfer any private data across and update any references
         that it has to the page.
 
+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
   	prevent redirtying the page, it is kept locked during the whole
 	operation.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index fea5c0864170..18d37c7ac50b 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -142,5 +142,110 @@ is increased so that the page cannot be freed while page migration occurs.
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although page migration originally aimed to reduce memory access latency on
+NUMA systems, compaction, which creates high-order pages, is also a main customer.
+
+The current problem with the implementation is that it is designed to migrate
+only *LRU* pages. However, there are potential non-lru pages in drivers that
+could be migrated, for example, zsmalloc and virtio-balloon pages.
+
+For virtio-balloon pages, some parts of the migration code path were hooked
+up and virtio-balloon specific functions were added to intercept the
+migration logic. This is too specific to one driver, so other drivers that
+want to make their pages movable would have to add their own hooks.
+
+To overcome this problem, the VM supports non-LRU page migration, which
+provides generic functions for non-LRU movable pages without driver
+specific hooks in the migration path.
+
+If a driver wants to make its own pages movable, it should define three
+functions, which are function pointers in struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What the VM expects of a driver's isolate_page function is to return *true*
+if the driver isolates the page successfully. On returning true, the VM
+marks the page as PG_isolated so concurrent isolation on other CPUs skips
+it. If the driver cannot isolate the page, it should return *false*.
+
+Once a page is successfully isolated, the VM uses the page.lru fields, so
+the driver shouldn't expect the values in those fields to be preserved.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, the VM calls the driver's migratepage with the isolated page.
+The job of migratepage is to move the content of the old page to the new page
+and set up the fields of struct page newpage. Keep in mind that you should
+tell the VM the oldpage is no longer movable, via __ClearPageMovable()
+under page_lock, if you migrated the oldpage successfully and return 0.
+If the driver cannot migrate the page at the moment, it can return -EAGAIN.
+On -EAGAIN, the VM will retry the migration shortly, because it interprets
+-EAGAIN as "temporary migration failure". On any error other than -EAGAIN,
+the VM gives up on migrating the page and does not retry this time.
+
+The driver shouldn't touch the page.lru field, which the VM uses in these functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on an isolated page, the VM should return the page to the
+driver, so the VM calls the driver's putback_page with the page whose
+migration failed. In this function, the driver should put the isolated page
+back into its own data structure.
 
+4. non-lru movable page flags
+
+There are two page flags that support non-lru movable pages.
+
+* PG_movable
+
+A driver should use the function below, under page_lock, to make a page movable.
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+The address_space argument registers the migration family of functions that
+the VM will call. Strictly speaking, PG_movable is not a real flag in
+struct page; instead, the VM reuses the lower bits of page->mapping:
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so a driver shouldn't access page->mapping directly. Instead, it should use
+page_mapping(), which masks off the low two bits of page->mapping under the
+page lock, to get the right struct address_space.
+
+To test for a non-lru movable page, the VM provides the __PageMovable()
+function. However, it cannot positively identify a non-lru movable page
+because the page->mapping field is a union with other variables in
+struct page. Also, if the driver releases the page after the VM has
+isolated it, page->mapping no longer has a stable value even though
+PAGE_MAPPING_MOVABLE is still set (see __ClearPageMovable). Still,
+__PageMovable() is a cheap way to tell LRU from non-lru movable pages
+once a page has been isolated, because LRU pages can never have
+PAGE_MAPPING_MOVABLE in page->mapping. It is also useful as a quick peek
+before the more expensive lock_page check when scanning pfns for a victim.
+
+To identify a non-lru movable page reliably, the VM provides the
+PageMovable() function. Unlike __PageMovable(), PageMovable() validates
+page->mapping and mapping->a_ops->isolate_page under lock_page, which
+prevents page->mapping from being destroyed underneath it.
+
+A driver that uses __SetPageMovable() should clear the flag via
+__ClearPageMovable() under page_lock before releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation by several CPUs, the VM marks an isolated
+page as PG_isolated under lock_page, so a CPU that encounters a PG_isolated
+non-lru movable page can skip it. The driver doesn't need to manipulate the
+flag, because the VM sets and clears it automatically. Keep in mind that if
+a driver sees a PG_isolated page, the page has been isolated by the VM, so
+the driver shouldn't touch the page.lru field.
+PG_isolated aliases the PG_reclaim flag, so a driver shouldn't use the flag
+for its own purposes.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..c6b47c861cea 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+			struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0cfdf2aec8f7..39ef97414033 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 7ae216a39c9e..481c8c4627ca 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }
 
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00ec816233a..33eaec57e997 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a32445f930..f36dbb3a3060 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,9 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page.  See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlags(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }
 
 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366ad673..a680b52e190b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -81,6 +81,44 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
+int PageMovable(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (!__PageMovable(page))
+		return 0;
+
+	mapping = page_mapping(page);
+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(PageMovable);
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__SetPageMovable);
+
+void __ClearPageMovable(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	/*
+	 * Clear registered address_space val with keeping PAGE_MAPPING_MOVABLE
+	 * flag so that VM can catch up released page by driver after isolation.
+	 * With it, VM migration doesn't try to put it back.
+	 */
+	page->mapping = (void *)((unsigned long)page->mapping &
+				PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__ClearPageMovable);
+
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -735,21 +773,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
-		 * Skip any other type of page
-		 */
-		is_lru = PageLRU(page);
-		if (!is_lru) {
-			if (unlikely(balloon_page_movable(page))) {
-				if (balloon_page_isolate(page)) {
-					/* Successfully isolated */
-					goto isolate_success;
-				}
-			}
-		}
-
-		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
 		 * a lot of iterations if we skip them at once. The check is
@@ -765,8 +788,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
-		if (!is_lru)
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU and non-lru movable pages.
+		 * Skip any other type of page
+		 */
+		is_lru = PageLRU(page);
+		if (!is_lru) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					goto isolate_success;
+				}
+			}
+
+			/*
+			 * __PageMovable can return false positive so we need
+			 * to verify it under page_lock.
+			 */
+			if (unlikely(__PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
+
 			goto isolate_fail;
+		}
 
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
diff --git a/mm/ksm.c b/mm/ksm.c
index 4786b4150f62..35b8aef867a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -532,8 +532,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	void *expected_mapping;
 	unsigned long kpfn;
 
-	expected_mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	expected_mapping = (void *)((unsigned long)stable_node |
+					PAGE_MAPPING_KSM);
 again:
 	kpfn = READ_ONCE(stable_node->kpfn);
 	page = pfn_to_page(kpfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index 2666f28b5236..60abcf379b51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 #include <linux/security.h>
 #include <linux/backing-dev.h>
+#include <linux/compaction.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -73,6 +74,81 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job
+	 * the put_page() at the end of this block will take care of
+	 * releasing this page, thus avoiding a nasty leakage.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PageMovable before taking the PG_lock, because the page's
+	 * owner assumes nobody touches the PG_lock of a newly allocated page,
+	 * so unconditionally grabbing the lock would upset the owner.
+	 */
+	if (unlikely(!__PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as race against the releasing a page.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * lets be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	mapping = page_mapping(page);
+	VM_BUG_ON_PAGE(!mapping, page);
+
+	if (!mapping->a_ops->isolate_page(page, mode))
+		goto out_no_isolated;
+
+	/* Driver shouldn't use PG_isolated bit of page->flags */
+	WARN_ON_ONCE(PageIsolated(page));
+	__SetPageIsolated(page);
+	unlock_page(page);
+
+	return true;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return false;
+}
+
+/* It should be called on page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+	mapping = page_mapping(page);
+	mapping->a_ops->putback_page(page);
+	__ClearPageIsolated(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +170,25 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		/*
+		 * We isolated non-lru movable page so here we can use
+		 * __PageMovable because LRU page's mapping cannot have
+		 * PAGE_MAPPING_MOVABLE.
+		 */
+		} else if (unlikely(__PageMovable(page))) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -592,7 +683,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -755,33 +846,72 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 				enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool is_lru = !__PageMovable(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
+
+	if (likely(is_lru)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
+							page, mode);
+	} else {
 		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
+		 * A non-lru page could have been released after the
+		 * isolation step. In that case, we shouldn't try migration.
 		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
+		if (!PageMovable(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageIsolated(page);
+			goto out;
+		}
+
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+			!PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
 	 * page is freed; but stats require that PageAnon be left as PageAnon.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (!PageAnon(page))
+		if (__PageMovable(page)) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+			/*
+			 * We clear PG_movable under page_lock so any compactor
+			 * cannot try to migrate this page.
+			 */
+			__ClearPageIsolated(page);
+		}
+
+		/*
+		 * Anonymous and movable page->mapping will be cleared by
+		 * free_pages_prepare(), so don't reset it here; keeping it
+		 * lets checks such as PageAnon() keep working.
+		 */
+		if (!PageMappingFlags(page))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
@@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto out_unlock_both;
 	}
 
+	if (unlikely(!is_lru)) {
+		rc = move_to_new_page(newpage, page, mode);
+		goto out_unlock_both;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -920,7 +1056,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * list in here.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__is_movable_balloon_page(newpage)))
+		if (unlikely(__is_movable_balloon_page(newpage) ||
+				__PageMovable(newpage)))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
@@ -961,6 +1098,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
+		if (unlikely(__PageMovable(page))) {
+			lock_page(page);
+			if (!PageMovable(page))
+				__ClearPageIsolated(page);
+			unlock_page(page);
+		}
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
@@ -1010,8 +1153,21 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 				num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			if (likely(!__PageMovable(page))) {
+				putback_lru_page(page);
+				goto put_new;
+			}
+
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		}
+put_new:
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d27e8b968ac3..4d0ba8f6cfee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1014,7 +1014,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageAnonHead(page))
+	if (PageMappingFlags(page))
 		page->mapping = NULL;
 	if (check_free)
 		bad += free_pages_check(page);
diff --git a/mm/util.c b/mm/util.c
index 917e0e3d0f8e..b756ee36f7f0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
 	}
 
 	mapping = page->mapping;
-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
-	return mapping;
+
+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
+EXPORT_SYMBOL(page_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
1.9.1



* Re: [PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
  2016-05-30 16:25       ` Minchan Kim
@ 2016-05-31  7:51         ` Vlastimil Babka
  0 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2016-05-31  7:51 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Joonsoo Kim,
	Mel Gorman, Hugh Dickins, Rafael Aquini, virtualization,
	Jonathan Corbet, John Einar Reitan, dri-devel, Sergey Senozhatsky,
	Gioh Kim

On 05/30/2016 06:25 PM, Minchan Kim wrote:
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
>>>
>>> #ifdef CONFIG_COMPACTION
>>>
>>> +int PageMovable(struct page *page)
>>> +{
>>> +	struct address_space *mapping;
>>> +
>>> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>> +	if (!__PageMovable(page))
>>> +		return 0;
>>> +
>>> +	mapping = page_mapping(page);
>>> +	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
>>> +		return 1;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(PageMovable);
>>> +
>>> +void __SetPageMovable(struct page *page, struct address_space *mapping)
>>> +{
>>> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>> +	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
>>> +	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
>>> +}
>>> +EXPORT_SYMBOL(__SetPageMovable);
>>> +
>>> +void __ClearPageMovable(struct page *page)
>>> +{
>>> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>> +	VM_BUG_ON_PAGE(!PageMovable(page), page);
>>> +	page->mapping = (void *)((unsigned long)page->mapping &
>>> +				PAGE_MAPPING_MOVABLE);
>>> +}
>>> +EXPORT_SYMBOL(__ClearPageMovable);
>>
>> The second confusing thing is that the function is named
>> __ClearPageMovable(), but what it really clears is the mapping
>> pointer,
>> which is not at all the opposite of what __SetPageMovable() does.
>>
>> I know it's explained in the documentation, but it also deserves a
>> comment here so it doesn't confuse everyone who looks at it.
>> Even better would be a less confusing name for the function, but I
>> can't offer one right now.
>
> To me, the __ClearPageMovable naming is suitable from the user's POV:
> it effectively makes the page unmovable. The confusion is caused only
> by the implementation, and I'd rather an exported API not depend on the
> implementation, so I just want to add a comment.
>
> I didn't add the comment above the function because I don't want to expose
> the internal implementation to users. I don't think they need to know it.
>
> index a7df2ae71f2a..d1d2063b4fd9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -108,6 +108,11 @@ void __ClearPageMovable(struct page *page)
>  {
>         VM_BUG_ON_PAGE(!PageLocked(page), page);
>         VM_BUG_ON_PAGE(!PageMovable(page), page);
> +       /*
> +        * Clear the registered address_space value but keep the
> +        * PAGE_MAPPING_MOVABLE flag so that the VM can detect a page
> +        * released by the driver after isolation. With that, migration
> +        * doesn't try to put the page back.
> +        */
>         page->mapping = (void *)((unsigned long)page->mapping &
>                                 PAGE_MAPPING_MOVABLE);

OK, that's fine!

>>
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 917e0e3d0f8e..b756ee36f7f0 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
>>> 	}
>>>
>>> 	mapping = page->mapping;
>>
>> I'd probably use READ_ONCE() here to be safe. Not all callers are
>> under page lock?
>
> I don't understand. True, not all callers are under the page lock, but at
> least new users of movable pages should call it under the page lock.
> I will write the rule down in the documentation.
> In this case, what kind of problem do you see?

After more thinking, probably none. It wouldn't prevent any extra races. 
Sorry for the noise.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
@ 2016-05-31  7:52       ` Vlastimil Babka
  2016-05-31 23:05         ` Minchan Kim
  2016-06-13  9:38       ` Anshuman Khandual
  1 sibling, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2016-05-31  7:52 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Joonsoo Kim, Mel Gorman,
	Hugh Dickins, Rafael Aquini, virtualization, Jonathan Corbet,
	John Einar Reitan, dri-devel, Sergey Senozhatsky, Gioh Kim

On 05/31/2016 02:01 AM, Minchan Kim wrote:
> Per Vlastimil's review comment.
>
> Thanks for the detailed review, Vlastimil!
> If you have another concern, feel free to say so.

I don't for now :)

[...]

> Cc: Rik van Riel <riel@redhat.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: John Einar Reitan <john.reitan@foss.arm.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-05-31  7:52       ` Vlastimil Babka
@ 2016-05-31 23:05         ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2016-05-31 23:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Joonsoo Kim,
	Mel Gorman, Hugh Dickins, Rafael Aquini, virtualization,
	Jonathan Corbet, John Einar Reitan, dri-devel, Sergey Senozhatsky,
	Gioh Kim

On Tue, May 31, 2016 at 09:52:48AM +0200, Vlastimil Babka wrote:
> On 05/31/2016 02:01 AM, Minchan Kim wrote:
> >Per Vlastimil's review comment.
> >
> >Thanks for the detailed review, Vlastimil!
> >If you have another concern, feel free to say so.
> 
> I don't for now :)
> 
> [...]
> 
> >Cc: Rik van Riel <riel@redhat.com>
> >Cc: Vlastimil Babka <vbabka@suse.cz>
> >Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >Cc: Mel Gorman <mgorman@suse.de>
> >Cc: Hugh Dickins <hughd@google.com>
> >Cc: Rafael Aquini <aquini@redhat.com>
> >Cc: virtualization@lists.linux-foundation.org
> >Cc: Jonathan Corbet <corbet@lwn.net>
> >Cc: John Einar Reitan <john.reitan@foss.arm.com>
> >Cc: dri-devel@lists.freedesktop.org
> >Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> >Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
> >Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks for the review, Vlastimil!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
  2016-05-31  7:52       ` Vlastimil Babka
@ 2016-06-13  9:38       ` Anshuman Khandual
  2016-06-15  2:32         ` Minchan Kim
  1 sibling, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2016-06-13  9:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Vlastimil Babka,
	Joonsoo Kim, Mel Gorman, Hugh Dickins, Rafael Aquini,
	virtualization, Jonathan Corbet, John Einar Reitan, dri-devel,
	Sergey Senozhatsky, Gioh Kim

On 05/31/2016 05:31 AM, Minchan Kim wrote:
> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	int rc = -EAGAIN;
>  	int page_was_mapped = 0;
>  	struct anon_vma *anon_vma = NULL;
> +	bool is_lru = !__PageMovable(page);
>  
>  	if (!trylock_page(page)) {
>  		if (!force || mode == MIGRATE_ASYNC)
> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  		goto out_unlock_both;
>  	}
>  
> +	if (unlikely(!is_lru)) {
> +		rc = move_to_new_page(newpage, page, mode);
> +		goto out_unlock_both;
> +	}
> +

Hello Minchan,

I might be missing something here, but does this implementation support the
scenario where these non-LRU pages owned by the driver are PTE-mapped into
a process page table? The "goto out_unlock_both" statement above skips all
of the PTE-unmap, migration-PTE installation and migration-PTE removal
steps.

Regards
Anshuman

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-13  9:38       ` Anshuman Khandual
@ 2016-06-15  2:32         ` Minchan Kim
  2016-06-15  6:45           ` Anshuman Khandual
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-06-15  2:32 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel,
	Vlastimil Babka, Joonsoo Kim, Mel Gorman, Hugh Dickins,
	Rafael Aquini, virtualization, Jonathan Corbet, John Einar Reitan,
	dri-devel, Sergey Senozhatsky, Gioh Kim

Hi,

On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
> On 05/31/2016 05:31 AM, Minchan Kim wrote:
> > @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >  	int rc = -EAGAIN;
> >  	int page_was_mapped = 0;
> >  	struct anon_vma *anon_vma = NULL;
> > +	bool is_lru = !__PageMovable(page);
> >  
> >  	if (!trylock_page(page)) {
> >  		if (!force || mode == MIGRATE_ASYNC)
> > @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >  		goto out_unlock_both;
> >  	}
> >  
> > +	if (unlikely(!is_lru)) {
> > +		rc = move_to_new_page(newpage, page, mode);
> > +		goto out_unlock_both;
> > +	}
> > +
> 
> Hello Minchan,
> 
> I might be missing something here but does this implementation support the
> scenario where these non LRU pages owned by the driver mapped as PTE into
> process page table ? Because the "goto out_unlock_both" statement above
> skips all the PTE unmap, putting a migration PTE and removing the migration
> PTE steps.

You're right. Unfortunately, it isn't supported right now, but it is
definitely on my TODO list after landing this work.

Could you share your use case?

It would be helpful for merging when I send the patch set.

Thanks!

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-15  2:32         ` Minchan Kim
@ 2016-06-15  6:45           ` Anshuman Khandual
  2016-06-16  0:26             ` Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2016-06-15  6:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Vlastimil Babka

On 06/15/2016 08:02 AM, Minchan Kim wrote:
> Hi,
> 
> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
>> > On 05/31/2016 05:31 AM, Minchan Kim wrote:
>>> > > @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>> > >  	int rc = -EAGAIN;
>>> > >  	int page_was_mapped = 0;
>>> > >  	struct anon_vma *anon_vma = NULL;
>>> > > +	bool is_lru = !__PageMovable(page);
>>> > >  
>>> > >  	if (!trylock_page(page)) {
>>> > >  		if (!force || mode == MIGRATE_ASYNC)
>>> > > @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>> > >  		goto out_unlock_both;
>>> > >  	}
>>> > >  
>>> > > +	if (unlikely(!is_lru)) {
>>> > > +		rc = move_to_new_page(newpage, page, mode);
>>> > > +		goto out_unlock_both;
>>> > > +	}
>>> > > +
>> > 
>> > Hello Minchan,
>> > 
>> > I might be missing something here but does this implementation support the
>> > scenario where these non LRU pages owned by the driver mapped as PTE into
>> > process page table ? Because the "goto out_unlock_both" statement above
>> > skips all the PTE unmap, putting a migration PTE and removing the migration
>> > PTE steps.
> You're right. Unfortunately, it doesn't support right now but surely,
> it's my TODO after landing this work.
> 
> Could you share your usecase?

Sure.

My driver has privately managed non-LRU pages which get mapped into a user space
process page table through f_ops->mmap() and vmops->fault(), which then updates
the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
to note here is that page->mapping eventually points to the struct address_space
(file->f_mapping) belonging to the character device file (created using mknod)
that we are using for establishing the mmap() regions in user space.

Now, as per this new framework, all the pages are to be marked with
__SetPageMovable() before passing the list down to migrate_pages().
__SetPageMovable() takes a *new* struct address_space as an argument and replaces
the existing page->mapping. That's the problem: we have lost all our connection
to the existing file RMAP information. This becomes a problem when we try to
migrate these non-LRU pages which are PTE-mapped: rmap_walk_file() never finds
them in the VMA, all the migration PTE steps are skipped, and the migration
eventually fails.

Assigning a new struct address_space to the page through __SetPageMovable()
seems to be the source of the problem. Can it take the existing
(file->f_mapping) as an argument instead? Sure, but then can we override the
file system's generic ->isolate(), ->putback() and ->migratepages() functions?
I don't think so. I am sure there must be some workaround to fix this problem
for the driver, but we need to rethink this framework from the point of view
of supporting these mapped non-LRU pages.

I might be missing something here, feel free to point out.

- Anshuman

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-15  6:45           ` Anshuman Khandual
@ 2016-06-16  0:26             ` Minchan Kim
  2016-06-16  3:42               ` Anshuman Khandual
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-06-16  0:26 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Vlastimil Babka

On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
> On 06/15/2016 08:02 AM, Minchan Kim wrote:
> > Hi,
> > 
> > On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
> >> > On 05/31/2016 05:31 AM, Minchan Kim wrote:
> >>> > > @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>> > >  	int rc = -EAGAIN;
> >>> > >  	int page_was_mapped = 0;
> >>> > >  	struct anon_vma *anon_vma = NULL;
> >>> > > +	bool is_lru = !__PageMovable(page);
> >>> > >  
> >>> > >  	if (!trylock_page(page)) {
> >>> > >  		if (!force || mode == MIGRATE_ASYNC)
> >>> > > @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>> > >  		goto out_unlock_both;
> >>> > >  	}
> >>> > >  
> >>> > > +	if (unlikely(!is_lru)) {
> >>> > > +		rc = move_to_new_page(newpage, page, mode);
> >>> > > +		goto out_unlock_both;
> >>> > > +	}
> >>> > > +
> >> > 
> >> > Hello Minchan,
> >> > 
> >> > I might be missing something here but does this implementation support the
> >> > scenario where these non LRU pages owned by the driver mapped as PTE into
> >> > process page table ? Because the "goto out_unlock_both" statement above
> >> > skips all the PTE unmap, putting a migration PTE and removing the migration
> >> > PTE steps.
> > You're right. Unfortunately, it doesn't support right now but surely,
> > it's my TODO after landing this work.
> > 
> > Could you share your usecase?
> 
> Sure.

Thanks a lot!

> 
> My driver has privately managed non LRU pages which gets mapped into user space
> process page table through f_ops->mmap() and vmops->fault() which then updates
> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing

Hmm, page_add_file_rmap is not an exported function. How can your driver use it?
Do you use vm_insert_pfn?
What type is your VMA? VM_PFNMAP or VM_MIXEDMAP?

I want to make a dummy driver to simulate your case.
It would be very helpful for implementing/testing the PTE-mapped non-LRU
page migration feature. That's why I'm asking now.

> to note here is that the page->mapping eventually points to struct address_space
> (file->f_mapping) which belongs to the character device file (created using mknod)
> which we are using for establishing the mmap() regions in the user space.
> 
> Now as per this new framework, all the page's are to be made __SetPageMovable before
> passing the list down to migrate_pages(). Now __SetPageMovable() takes *new* struct
> address_space as an argument and replaces the existing page->mapping. Now thats the
> problem, we have lost all our connection to the existing file RMAP information. This

We could change __SetPageMovable() so it doesn't need the mapping argument.
Instead, it would just mark PAGE_MAPPING_MOVABLE in page->mapping.
For that, the user should take care of setting page->mapping before
marking the flag.

> stands as a problem when we try to migrate these non LRU pages which are PTE mapped.
> The rmap_walk_file() never finds them in the VMA, skips all the migrate PTE steps and
> then the migration eventually fails.
> 
> Seems like assigning a new struct address_space to the page through __SetPageMovable()
> is the source of the problem. Can it take the existing (file->f_mapping) as an argument

We can set the existing file->f_mapping under the page lock.

> in there ? Sure, but then can we override file system generic ->isolate(), ->putback(),

I don't get it. Why would it override the file system's generic functions?

> ->migratepages() functions ? I dont think so. I am sure, there must be some work around
> to fix this problem for the driver. But we need to rethink this framework from supporting
> these mapped non LRU pages point of view.
> 
> I might be missing something here, feel free to point out.
> 
> - Anshuman
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-16  0:26             ` Minchan Kim
@ 2016-06-16  3:42               ` Anshuman Khandual
  2016-06-16  5:37                 ` Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2016-06-16  3:42 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Vlastimil Babka

On 06/16/2016 05:56 AM, Minchan Kim wrote:
> On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
>> On 06/15/2016 08:02 AM, Minchan Kim wrote:
>>> Hi,
>>>
>>> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
>>>>> On 05/31/2016 05:31 AM, Minchan Kim wrote:
>>>>>>> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>  	int rc = -EAGAIN;
>>>>>>>  	int page_was_mapped = 0;
>>>>>>>  	struct anon_vma *anon_vma = NULL;
>>>>>>> +	bool is_lru = !__PageMovable(page);
>>>>>>>  
>>>>>>>  	if (!trylock_page(page)) {
>>>>>>>  		if (!force || mode == MIGRATE_ASYNC)
>>>>>>> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>  		goto out_unlock_both;
>>>>>>>  	}
>>>>>>>  
>>>>>>> +	if (unlikely(!is_lru)) {
>>>>>>> +		rc = move_to_new_page(newpage, page, mode);
>>>>>>> +		goto out_unlock_both;
>>>>>>> +	}
>>>>>>> +
>>>>>
>>>>> Hello Minchan,
>>>>>
>>>>> I might be missing something here but does this implementation support the
>>>>> scenario where these non LRU pages owned by the driver mapped as PTE into
>>>>> process page table ? Because the "goto out_unlock_both" statement above
>>>>> skips all the PTE unmap, putting a migration PTE and removing the migration
>>>>> PTE steps.
>>> You're right. Unfortunately, it doesn't support right now but surely,
>>> it's my TODO after landing this work.
>>>
>>> Could you share your usecase?
>>
>> Sure.
> 
> Thanks a lot!
> 
>>
>> My driver has privately managed non LRU pages which gets mapped into user space
>> process page table through f_ops->mmap() and vmops->fault() which then updates
>> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
> 
> Hmm, page_add_file_rmap is not exported function. How does your driver can use it?

It's not using the function directly; I just reiterated the sequence of functions
above. (do_set_pte -> page_add_file_rmap) gets called after we grab the page from
the driver through (__do_fault -> vma->vm_ops->fault()).

> Do you use vm_insert_pfn?
> What type your vma is? VM_PFNMMAP or VM_MIXEDMAP?

I don't use vm_insert_pfn(). Here is the sequence of events through which the
user space VMA gets the non-LRU pages from the driver:

- Driver registers a character device with 'struct file_operations' binding
- Then the 'fops->mmap()' just binds the incoming 'struct vma' with a 'struct
  vm_operations_struct' which provides the 'vmops->fault()' routine which
  basically traps all page faults on the VMA and provides one page at a time
  through a driver specific allocation routine which hands over non LRU pages

The VMA is nothing special as such. It's what we get when we do a simple
mmap() on a file descriptor pointing to a character device. I can figure
out all the VM_* flags it holds after creation.

> 
> I want to make dummy driver to simulate your case.

Sure. I hope the above-mentioned steps will help you, but in case you need
more information, please do let me know.

> It would be very helpful to implement/test pte-mapped non-lru page
> migration feature. That's why I ask now.
> 
>> to note here is that the page->mapping eventually points to struct address_space
>> (file->f_mapping) which belongs to the character device file (created using mknod)
>> which we are using for establishing the mmap() regions in the user space.
>>
>> Now as per this new framework, all the page's are to be made __SetPageMovable before
>> passing the list down to migrate_pages(). Now __SetPageMovable() takes *new* struct
>> address_space as an argument and replaces the existing page->mapping. Now thats the
>> problem, we have lost all our connection to the existing file RMAP information. This
> 
> We could change __SetPageMovable doesn't need mapping argument.
> Instead, it just marks PAGE_MAPPING_MOVABLE into page->mapping.
> For that, user should take care of setting page->mapping earlier than
> marking the flag.

Sounds like a good idea; that way we don't lose the reverse mapping information.

> 
>> stands as a problem when we try to migrate these non LRU pages which are PTE mapped.
>> The rmap_walk_file() never finds them in the VMA, skips all the migrate PTE steps and
>> then the migration eventually fails.
>>
>> Seems like assigning a new struct address_space to the page through __SetPageMovable()
>> is the source of the problem. Can it take the existing (file->f_mapping) as an argument

> We can set existing file->f_mapping under the page_lock.

That's another option, along with what you mentioned above.

> 
>> in there ? Sure, but then can we override file system generic ->isolate(), ->putback(),
> 
> I don't get it. Why does it override file system generic functions?

Sure it does not; it was just a wild idea to overcome the problem.

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-16  3:42               ` Anshuman Khandual
@ 2016-06-16  5:37                 ` Minchan Kim
  2016-06-27  5:51                   ` Anshuman Khandual
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-06-16  5:37 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel,
	Vlastimil Babka, Joonsoo Kim, Mel Gorman, Hugh Dickins,
	Rafael Aquini, virtualization, Jonathan Corbet, John Einar Reitan,
	dri-devel, Sergey Senozhatsky, Gioh Kim

On Thu, Jun 16, 2016 at 09:12:07AM +0530, Anshuman Khandual wrote:
> On 06/16/2016 05:56 AM, Minchan Kim wrote:
> > On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
> >> On 06/15/2016 08:02 AM, Minchan Kim wrote:
> >>> Hi,
> >>>
> >>> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
> >>>>> On 05/31/2016 05:31 AM, Minchan Kim wrote:
> >>>>>>> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>>>>>>  	int rc = -EAGAIN;
> >>>>>>>  	int page_was_mapped = 0;
> >>>>>>>  	struct anon_vma *anon_vma = NULL;
> >>>>>>> +	bool is_lru = !__PageMovable(page);
> >>>>>>>  
> >>>>>>>  	if (!trylock_page(page)) {
> >>>>>>>  		if (!force || mode == MIGRATE_ASYNC)
> >>>>>>> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>>>>>>  		goto out_unlock_both;
> >>>>>>>  	}
> >>>>>>>  
> >>>>>>> +	if (unlikely(!is_lru)) {
> >>>>>>> +		rc = move_to_new_page(newpage, page, mode);
> >>>>>>> +		goto out_unlock_both;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>
> >>>>> Hello Minchan,
> >>>>>
> >>>>> I might be missing something here but does this implementation support the
> >>>>> scenario where these non LRU pages owned by the driver mapped as PTE into
> >>>>> process page table ? Because the "goto out_unlock_both" statement above
> >>>>> skips all the PTE unmap, putting a migration PTE and removing the migration
> >>>>> PTE steps.
> >>> You're right. Unfortunately, it doesn't support right now but surely,
> >>> it's my TODO after landing this work.
> >>>
> >>> Could you share your usecase?
> >>
> >> Sure.
> > 
> > Thanks a lot!
> > 
> >>
> >> My driver has privately managed non LRU pages which gets mapped into user space
> >> process page table through f_ops->mmap() and vmops->fault() which then updates
> >> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
> > 
> > Hmm, page_add_file_rmap is not exported function. How does your driver can use it?
> 
> Its not using the function directly, I just re-iterated the sequence of functions
> above. (do_set_pte -> page_add_file_rmap) gets called after we grab the page from
> driver through (__do_fault->vma->vm_ops->fault()).
> 
> > Do you use vm_insert_pfn?
> > What type your vma is? VM_PFNMMAP or VM_MIXEDMAP?
> 
> I dont use vm_insert_pfn(). Here is the sequence of events how the user space
> VMA gets the non LRU pages from the driver.
> 
> - Driver registers a character device with 'struct file_operations' binding
> - Then the 'fops->mmap()' just binds the incoming 'struct vma' with a 'struct
>   vm_operations_struct' which provides the 'vmops->fault()' routine which
>   basically traps all page faults on the VMA and provides one page at a time
>   through a driver specific allocation routine which hands over non LRU pages
> 
> The VMA is not anything special as such. Its what we get when we try to do a
> simple mmap() on a file descriptor pointing to a character device. I can
> figure out all the VM_* flags it holds after creation.
> 
> > 
> > I want to make dummy driver to simulate your case.
> 
> Sure. I hope the above mentioned steps will help you but in case you need more
> information, please do let me know.

I understand now. :)
I will test it with a dummy driver and will Cc you when I send a patch.

Thanks.

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-16  5:37                 ` Minchan Kim
@ 2016-06-27  5:51                   ` Anshuman Khandual
  2016-06-28  6:39                     ` Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2016-06-27  5:51 UTC (permalink / raw)
  To: Minchan Kim, Anshuman Khandual
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel,
	Vlastimil Babka, Joonsoo Kim, Mel Gorman, Hugh Dickins,
	Rafael Aquini, virtualization, Jonathan Corbet, John Einar Reitan,
	dri-devel, Sergey Senozhatsky, Gioh Kim

On 06/16/2016 11:07 AM, Minchan Kim wrote:
> On Thu, Jun 16, 2016 at 09:12:07AM +0530, Anshuman Khandual wrote:
>> On 06/16/2016 05:56 AM, Minchan Kim wrote:
>>> On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
>>>> On 06/15/2016 08:02 AM, Minchan Kim wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
>>>>>>> On 05/31/2016 05:31 AM, Minchan Kim wrote:
>>>>>>>>> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>>>  	int rc = -EAGAIN;
>>>>>>>>>  	int page_was_mapped = 0;
>>>>>>>>>  	struct anon_vma *anon_vma = NULL;
>>>>>>>>> +	bool is_lru = !__PageMovable(page);
>>>>>>>>>  
>>>>>>>>>  	if (!trylock_page(page)) {
>>>>>>>>>  		if (!force || mode == MIGRATE_ASYNC)
>>>>>>>>> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>>>  		goto out_unlock_both;
>>>>>>>>>  	}
>>>>>>>>>  
>>>>>>>>> +	if (unlikely(!is_lru)) {
>>>>>>>>> +		rc = move_to_new_page(newpage, page, mode);
>>>>>>>>> +		goto out_unlock_both;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>
>>>>>>> Hello Minchan,
>>>>>>>
>>>>>>> I might be missing something here but does this implementation support the
>>>>>>> scenario where these non LRU pages owned by the driver mapped as PTE into
>>>>>>> process page table ? Because the "goto out_unlock_both" statement above
>>>>>>> skips all the PTE unmap, putting a migration PTE and removing the migration
>>>>>>> PTE steps.
>>>>> You're right. Unfortunately, it doesn't support right now but surely,
>>>>> it's my TODO after landing this work.
>>>>>
>>>>> Could you share your usecase?
>>>>
>>>> Sure.
>>>
>>> Thanks a lot!
>>>
>>>>
>>>> My driver has privately managed non LRU pages which gets mapped into user space
>>>> process page table through f_ops->mmap() and vmops->fault() which then updates
>>>> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
>>>
>>> Hmm, page_add_file_rmap is not exported function. How does your driver can use it?
>>
>> Its not using the function directly, I just re-iterated the sequence of functions
>> above. (do_set_pte -> page_add_file_rmap) gets called after we grab the page from
>> driver through (__do_fault->vma->vm_ops->fault()).
>>
>>> Do you use vm_insert_pfn?
>>> What type your vma is? VM_PFNMMAP or VM_MIXEDMAP?
>>
>> I dont use vm_insert_pfn(). Here is the sequence of events how the user space
>> VMA gets the non LRU pages from the driver.
>>
>> - Driver registers a character device with 'struct file_operations' binding
>> - Then the 'fops->mmap()' just binds the incoming 'struct vma' with a 'struct
>>   vm_operations_struct' which provides the 'vmops->fault()' routine which
>>   basically traps all page faults on the VMA and provides one page at a time
>>   through a driver specific allocation routine which hands over non LRU pages
>>
>> The VMA is not anything special as such. Its what we get when we try to do a
>> simple mmap() on a file descriptor pointing to a character device. I can
>> figure out all the VM_* flags it holds after creation.
>>
>>>
>>> I want to make dummy driver to simulate your case.
>>
>> Sure. I hope the above mentioned steps will help you but in case you need more
>> information, please do let me know.
> 
> I got understood now. :)
> I will test it with dummy driver and will Cc'ed when I send a patch.

Hello Minchan,

Do you have any updates on this? The v7 of the series still has this limitation.
Did you get a chance to test the driver out? I am still concerned about how to
handle the struct address_space override problem within struct page.

- Anshuman

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-27  5:51                   ` Anshuman Khandual
@ 2016-06-28  6:39                     ` Minchan Kim
  2016-06-30  5:56                       ` Anshuman Khandual
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2016-06-28  6:39 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel,
	Vlastimil Babka, Joonsoo Kim, Mel Gorman, Hugh Dickins,
	Rafael Aquini, virtualization, Jonathan Corbet, John Einar Reitan,
	dri-devel, Sergey Senozhatsky, Gioh Kim

On Mon, Jun 27, 2016 at 11:21:01AM +0530, Anshuman Khandual wrote:
> On 06/16/2016 11:07 AM, Minchan Kim wrote:
> > On Thu, Jun 16, 2016 at 09:12:07AM +0530, Anshuman Khandual wrote:
> >> On 06/16/2016 05:56 AM, Minchan Kim wrote:
> >>> On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
> >>>> On 06/15/2016 08:02 AM, Minchan Kim wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
> >>>>>>> On 05/31/2016 05:31 AM, Minchan Kim wrote:
> >>>>>>>>> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>>>>>>>>  	int rc = -EAGAIN;
> >>>>>>>>>  	int page_was_mapped = 0;
> >>>>>>>>>  	struct anon_vma *anon_vma = NULL;
> >>>>>>>>> +	bool is_lru = !__PageMovable(page);
> >>>>>>>>>  
> >>>>>>>>>  	if (!trylock_page(page)) {
> >>>>>>>>>  		if (!force || mode == MIGRATE_ASYNC)
> >>>>>>>>> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>>>>>>>>  		goto out_unlock_both;
> >>>>>>>>>  	}
> >>>>>>>>>  
> >>>>>>>>> +	if (unlikely(!is_lru)) {
> >>>>>>>>> +		rc = move_to_new_page(newpage, page, mode);
> >>>>>>>>> +		goto out_unlock_both;
> >>>>>>>>> +	}
> >>>>>>>>> +
> >>>>>>>
> >>>>>>> Hello Minchan,
> >>>>>>>
> >>>>>>> I might be missing something here but does this implementation support the
> >>>>>>> scenario where these non LRU pages owned by the driver mapped as PTE into
> >>>>>>> process page table ? Because the "goto out_unlock_both" statement above
> >>>>>>> skips all the PTE unmap, putting a migration PTE and removing the migration
> >>>>>>> PTE steps.
> >>>>> You're right. Unfortunately, it doesn't support right now but surely,
> >>>>> it's my TODO after landing this work.
> >>>>>
> >>>>> Could you share your usecase?
> >>>>
> >>>> Sure.
> >>>
> >>> Thanks a lot!
> >>>
> >>>>
> >>>> My driver has privately managed non LRU pages which gets mapped into user space
> >>>> process page table through f_ops->mmap() and vmops->fault() which then updates
> >>>> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
> >>>
> >>> Hmm, page_add_file_rmap is not exported function. How does your driver can use it?
> >>
> >> Its not using the function directly, I just re-iterated the sequence of functions
> >> above. (do_set_pte -> page_add_file_rmap) gets called after we grab the page from
> >> driver through (__do_fault->vma->vm_ops->fault()).
> >>
> >>> Do you use vm_insert_pfn?
> >>> What type your vma is? VM_PFNMMAP or VM_MIXEDMAP?
> >>
> >> I dont use vm_insert_pfn(). Here is the sequence of events how the user space
> >> VMA gets the non LRU pages from the driver.
> >>
> >> - Driver registers a character device with 'struct file_operations' binding
> >> - Then the 'fops->mmap()' just binds the incoming 'struct vma' with a 'struct
> >>   vm_operations_struct' which provides the 'vmops->fault()' routine which
> >>   basically traps all page faults on the VMA and provides one page at a time
> >>   through a driver specific allocation routine which hands over non LRU pages
> >>
> >> The VMA is not anything special as such. Its what we get when we try to do a
> >> simple mmap() on a file descriptor pointing to a character device. I can
> >> figure out all the VM_* flags it holds after creation.
> >>
> >>>
> >>> I want to make dummy driver to simulate your case.
> >>
> >> Sure. I hope the above mentioned steps will help you but in case you need more
> >> information, please do let me know.
> > 
> > I understand now. :)
> > I will test it with a dummy driver and will Cc you when I send a patch.
> 
> Hello Minchan,
> 
>> Do you have any updates on this? The v7 of the series still has this limitation.
>> Did you get a chance to test the driver out? I am still concerned about how to
>> handle the struct address_space override problem within struct page.

Hi Anshuman,

It is slow going, but I am working on it. :) However, as I said, I want to do
it after the current non-lru, non-mapped page migration has soft-landed, to
solve the real field issues first.

About the address_space overriding problem for non-lru mapped pages, I
implemented a dummy driver as a miscellaneous device, and in test_mmap
(file_operations.mmap) I replaced the a_ops with my own
address_space_operations:

int test_mmap(struct file *filp, struct vm_area_struct *vma)
{
        filp->f_mapping->a_ops = &test_aops;
        vma->vm_ops = &test_vm_ops;
        vma->vm_private_data = filp->private_data;
        return 0;
}

test_aops should override *set_page_dirty*:

static int test_set_page_dirty(struct page *page)
{
        if (!PageDirty(page))
                SetPageDirty(page);
        return 0;
}

Otherwise, it hits a BUG_ON during the radix tree operation, because
try_to_unmap is currently designed for file-LRU pages that live in the page
cache: it propagates the page table dirty bit to the PG_dirty flag of struct
page via set_page_dirty, and set_page_dirty then wants to set the dirty tag
in the radix tree node. But this is a character driver, so the page is not
in the page cache and no such node exists; that is why we hit the BUG_ON.
Anyway, for testing, I implemented set_page_dirty in my dummy driver.

That alone is not enough, though: I also had to modify migrate.c to handle
non-lru mapped pages, and to adjust the PG_isolated flag, which overrides
PG_reclaim and is therefore cleared by set_page_dirty.

With that, it seems to work. But I'm not saying it's the right model for
device drivers yet. Replacing filp->f_mapping->a_ops at runtime with a
driver's custom a_ops seems hacky to me. So I'm now considering a new
pseudo-fs, "movable_inode", which would support:

struct file *movable_inode_getfile(const char *name,
                        const struct file_operations *f_op,
                        const struct address_space_operations *a_ops)
{
        struct path path;
        struct qstr this;
        struct inode *inode;
        struct super_block *sb;

        this.name = name;
        this.len = strlen(name);
        this.hash = 0;
        sb = movable_inode_mnt->mnt_sb;
        path.dentry = d_alloc_pseudo(movable_inode_mnt->mnt_sb, &this);
        path.mnt = mntget(movable_inode_mnt);

        inode = new_inode(sb);
        ..
        ..
        inode->i_mapping->a_ops = a_ops;
        d_instantiate(path.dentry, inode);

        return alloc_file(&path, FMODE_WRITE | FMODE_READ, f_op);
}

And in our driver, we can replace vma->vm_file with the new one:

int test_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct file *newfile = movable_inode_getfile("[test]",
                                filp->f_op, &test_aops);
        vma->vm_file = newfile;
        ..
        ..
}

Reading mmap_region in mm/mmap.c, it looks like a reasonable use case for a
driver's mmap to replace vma->vm_file with its own file.

Anyway, this needs many subtle changes on the mm/vfs/driver side, which will
need review from the maintainers of each related subsystem, so I don't want
to hurry it.

Thanks.

> 
> - Anshuman
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-28  6:39                     ` Minchan Kim
@ 2016-06-30  5:56                       ` Anshuman Khandual
  2016-06-30  6:18                         ` Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2016-06-30  5:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel,
	Vlastimil Babka, Joonsoo Kim, Mel Gorman, Hugh Dickins,
	Rafael Aquini, virtualization, Jonathan Corbet, John Einar Reitan,
	dri-devel, Sergey Senozhatsky, Gioh Kim

On 06/28/2016 12:09 PM, Minchan Kim wrote:
> On Mon, Jun 27, 2016 at 11:21:01AM +0530, Anshuman Khandual wrote:
>> On 06/16/2016 11:07 AM, Minchan Kim wrote:
>>> On Thu, Jun 16, 2016 at 09:12:07AM +0530, Anshuman Khandual wrote:
>>>> On 06/16/2016 05:56 AM, Minchan Kim wrote:
>>>>> On Wed, Jun 15, 2016 at 12:15:04PM +0530, Anshuman Khandual wrote:
>>>>>> On 06/15/2016 08:02 AM, Minchan Kim wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, Jun 13, 2016 at 03:08:19PM +0530, Anshuman Khandual wrote:
>>>>>>>>> On 05/31/2016 05:31 AM, Minchan Kim wrote:
>>>>>>>>>>> @@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>>>>>  	int rc = -EAGAIN;
>>>>>>>>>>>  	int page_was_mapped = 0;
>>>>>>>>>>>  	struct anon_vma *anon_vma = NULL;
>>>>>>>>>>> +	bool is_lru = !__PageMovable(page);
>>>>>>>>>>>  
>>>>>>>>>>>  	if (!trylock_page(page)) {
>>>>>>>>>>>  		if (!force || mode == MIGRATE_ASYNC)
>>>>>>>>>>> @@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>>>>>>>>>  		goto out_unlock_both;
>>>>>>>>>>>  	}
>>>>>>>>>>>  
>>>>>>>>>>> +	if (unlikely(!is_lru)) {
>>>>>>>>>>> +		rc = move_to_new_page(newpage, page, mode);
>>>>>>>>>>> +		goto out_unlock_both;
>>>>>>>>>>> +	}
>>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Hello Minchan,
>>>>>>>>>
>>>>>>>>> I might be missing something here but does this implementation support the
>>>>>>>>> scenario where these non LRU pages owned by the driver mapped as PTE into
>>>>>>>>> process page table ? Because the "goto out_unlock_both" statement above
>>>>>>>>> skips all the PTE unmap, putting a migration PTE and removing the migration
>>>>>>>>> PTE steps.
>>>>>>> You're right. Unfortunately, it doesn't support right now but surely,
>>>>>>> it's my TODO after landing this work.
>>>>>>>
>>>>>>> Could you share your usecase?
>>>>>>
>>>>>> Sure.
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>>
>>>>>> My driver has privately managed non LRU pages which gets mapped into user space
>>>>>> process page table through f_ops->mmap() and vmops->fault() which then updates
>>>>>> the file RMAP (page->mapping->i_mmap) through page_add_file_rmap(page). One thing
>>>>>
>>>>> Hmm, page_add_file_rmap is not exported function. How does your driver can use it?
>>>>
>>>> Its not using the function directly, I just re-iterated the sequence of functions
>>>> above. (do_set_pte -> page_add_file_rmap) gets called after we grab the page from
>>>> driver through (__do_fault->vma->vm_ops->fault()).
>>>>
>>>>> Do you use vm_insert_pfn?
>>>>> What type your vma is? VM_PFNMMAP or VM_MIXEDMAP?
>>>>
>>>> I dont use vm_insert_pfn(). Here is the sequence of events how the user space
>>>> VMA gets the non LRU pages from the driver.
>>>>
>>>> - Driver registers a character device with 'struct file_operations' binding
>>>> - Then the 'fops->mmap()' just binds the incoming 'struct vma' with a 'struct
>>>>   vm_operations_struct' which provides the 'vmops->fault()' routine which
>>>>   basically traps all page faults on the VMA and provides one page at a time
>>>>   through a driver specific allocation routine which hands over non LRU pages
>>>>
>>>> The VMA is not anything special as such. Its what we get when we try to do a
>>>> simple mmap() on a file descriptor pointing to a character device. I can
>>>> figure out all the VM_* flags it holds after creation.
>>>>
>>>>>
>>>>> I want to make dummy driver to simulate your case.
>>>>
>>>> Sure. I hope the above mentioned steps will help you but in case you need more
>>>> information, please do let me know.
>>>
>>> I got understood now. :)
>>> I will test it with dummy driver and will Cc'ed when I send a patch.
>>
>> Hello Minchan,
>>
>> Do you have any updates on this ? The V7 of the series still has this limitation.
>> Did you get a chance to test the driver out ? I am still concerned about how to
>> handle the struct address_space override problem within the struct page.
> 
> Hi Anshuman,
> 
> Slow but I am working on that. :) However, as I said, I want to do it

I really appreciate it. I was just curious about the problem and any
potential solutions we could look into.

> after soft landing of current non-lru-no-mapped page migration to solve
> current real field issues.

Yeah, it makes sense.

> 
> About the overriding problem of non-lru-mapped-page, I implemented dummy
> driver as miscellaneous device and in test_mmap(file_operations.mmap),
> I changed a_ops with my address_space_operations.
> 
> int test_mmap(struct file *filp, struct vm_area_struct *vma)
> {
>         filp->f_mapping->a_ops = &test_aops;
>         vma->vm_ops = &test_vm_ops;
>         vma->vm_private_data = filp->private_data;
>         return 0;
> }
> 

Okay.

> test_aops should override *set_page_dirty*:
> 
> static int test_set_page_dirty(struct page *page)
> {
>         if (!PageDirty(page))
>                 SetPageDirty(page);
>         return 0;
> }
> 
> Otherwise, it hits a BUG_ON during the radix tree operation, because
> try_to_unmap is currently designed for file-LRU pages that live in the page
> cache: it propagates the page table dirty bit to the PG_dirty flag of struct
> page via set_page_dirty, and set_page_dirty then wants to set the dirty tag
> in the radix tree node. But this is a character driver, so the page is not
> in the page cache and no such node exists; that is why we hit the BUG_ON.
> Anyway, for testing, I implemented set_page_dirty in my dummy driver.

Okay, and is the above test_set_page_dirty() example sufficient?

> 
> That alone is not enough, though: I also had to modify migrate.c to handle
> non-lru mapped pages, and to adjust the PG_isolated flag, which overrides
> PG_reclaim and is therefore cleared by set_page_dirty.

Got it, so what changes did you make? Did you implement PG_isolated
differently, not by overriding PG_reclaim, or something else? Yes,
set_page_dirty indeed clears the PG_reclaim flag.

> 
> With that, it seems to work. But I'm not saying it's right model now

So the mapped pages' migration was successful? Even after overriding
filp->f_mapping->a_ops = &test_aops, we still have the RMAP information
intact in the interval tree pointed to by filp->f_mapping. But I would
really like to see the code changes.

> for device drivers. In runtime, replacing filp->f_mapping->a_ops with
> custom a_ops of own driver seems to be hacky to me.

Yeah I thought so.

> So, I'm now considering a new pseudo-fs, "movable_inode", which would
> support:
> 
> struct file *movable_inode_getfile(const char *name,
>                         const struct file_operations *f_op,
>                         const struct address_space_operations *a_ops)
> {
>         struct path path;
>         struct qstr this;
>         struct inode *inode;
>         struct super_block *sb;
> 
>         this.name = name;
>         this.len = strlen(name);
>         this.hash = 0;
>         sb = movable_inode_mnt->mnt_sb;
>         path.dentry = d_alloc_pseudo(movable_inode_mnt->mnt_sb, &this);
>         path.mnt = mntget(movable_inode_mnt);
> 
>         inode = new_inode(sb);
>         ..
>         ..
>         inode->i_mapping->a_ops = a_ops;
>         d_instantiate(path.dentry, inode);
> 
>         return alloc_file(&path, FMODE_WRITE | FMODE_READ, f_op);
> }
> 
> And in our driver, we can replace vma->vm_file with the new one:
> 
> int test_mmap(struct file *filp, struct vm_area_struct *vma)
> {
>         struct file *newfile = movable_inode_getfile("[test]",
>                                 filp->f_op, &test_aops);
>         vma->vm_file = newfile;
>         ..
>         ..
> }
> 
> Reading mmap_region in mm/mmap.c, it looks like a reasonable use case for
> a driver's mmap to replace vma->vm_file with its own file.

I will look into these details.

> Anyway, this needs many subtle changes on the mm/vfs/driver side, which
> will need review from the maintainers of each related subsystem, so I
> don't want to hurry it.

Sure, makes sense. Meanwhile, it would be really great if you could share
the code changes described above, so that I can try them out.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6v3 02/12] mm: migrate: support non-lru movable page migration
  2016-06-30  5:56                       ` Anshuman Khandual
@ 2016-06-30  6:18                         ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2016-06-30  6:18 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Vlastimil Babka

On Thu, Jun 30, 2016 at 11:26:45AM +0530, Anshuman Khandual wrote:

<snip>

> >> Did you get a chance to test the driver out ? I am still concerned about how to
> >> handle the struct address_space override problem within the struct page.
> > 
> > Hi Anshuman,
> > 
> > Slow but I am working on that. :) However, as I said, I want to do it
> 
> I really appreciate it. I was just curious about the problem and any
> potential solutions we could look into.
> 
> > after soft landing of current non-lru-no-mapped page migration to solve
> > current real field issues.
> 
> yeah it makes sense.
> 
> > 
> > About the overriding problem of non-lru-mapped-page, I implemented dummy
> > driver as miscellaneous device and in test_mmap(file_operations.mmap),
> > I changed a_ops with my address_space_operations.
> > 
> > int test_mmap(struct file *filp, struct vm_area_struct *vma)
> > {
> >         filp->f_mapping->a_ops = &test_aops;
> >         vma->vm_ops = &test_vm_ops;
> >         vma->vm_private_data = filp->private_data;
> >         return 0;
> > }
> > 
> 
> Okay.
> 
> > test_aops should override *set_page_dirty*:
> > 
> > static int test_set_page_dirty(struct page *page)
> > {
> >         if (!PageDirty(page))
> >                 SetPageDirty(page);
> >         return 0;
> > }
> > 
> > Otherwise, it hits a BUG_ON during the radix tree operation, because
> > try_to_unmap is currently designed for file-LRU pages that live in the page
> > cache: it propagates the page table dirty bit to the PG_dirty flag of struct
> > page via set_page_dirty, and set_page_dirty then wants to set the dirty tag
> > in the radix tree node. But this is a character driver, so the page is not
> > in the page cache and no such node exists; that is why we hit the BUG_ON.
> > Anyway, for testing, I implemented set_page_dirty in my dummy driver.
> 
> Okay, and is the above test_set_page_dirty() example sufficient?

I guess just returning 0 is sufficient, without dirtying the page.

> 
> > 
> > That alone is not enough, though: I also had to modify migrate.c to handle
> > non-lru mapped pages, and to adjust the PG_isolated flag, which overrides
> > PG_reclaim and is therefore cleared by set_page_dirty.
> 
> Got it, so what changes did you make? Did you implement PG_isolated
> differently, not by overriding PG_reclaim, or something else? Yes,
> set_page_dirty indeed clears the PG_reclaim flag.
> 
> > 
> > With that, it seems to work. But I'm not saying it's right model now
> 
> So the mapped pages' migration was successful? Even after overriding
> filp->f_mapping->a_ops = &test_aops, we still have the RMAP information
> intact in the interval tree pointed to by filp->f_mapping. But I would
> really like to see the code changes.
> 
> > for device drivers. In runtime, replacing filp->f_mapping->a_ops with
> > custom a_ops of own driver seems to be hacky to me.
> 
> Yeah I thought so.
> 
> > So, I'm now considering a new pseudo-fs, "movable_inode", which would
> > support:
> > 
> > struct file *movable_inode_getfile(const char *name,
> >                         const struct file_operations *f_op,
> >                         const struct address_space_operations *a_ops)
> > {
> >         struct path path;
> >         struct qstr this;
> >         struct inode *inode;
> >         struct super_block *sb;
> > 
> >         this.name = name;
> >         this.len = strlen(name);
> >         this.hash = 0;
> >         sb = movable_inode_mnt->mnt_sb;
> >         path.dentry = d_alloc_pseudo(movable_inode_mnt->mnt_sb, &this);
> >         path.mnt = mntget(movable_inode_mnt);
> > 
> >         inode = new_inode(sb);
> >         ..
> >         ..
> >         inode->i_mapping->a_ops = a_ops;
> >         d_instantiate(path.dentry, inode);
> > 
> >         return alloc_file(&path, FMODE_WRITE | FMODE_READ, f_op);
> > }
> > 
> > And in our driver, we can replace vma->vm_file with the new one:
> > 
> > int test_mmap(struct file *filp, struct vm_area_struct *vma)
> > {
> >         struct file *newfile = movable_inode_getfile("[test]",
> >                                 filp->f_op, &test_aops);
> >         vma->vm_file = newfile;
> >         ..
> >         ..
> > }
> > 
> > Reading mmap_region in mm/mmap.c, it looks like a reasonable use case for
> > a driver's mmap to replace vma->vm_file with its own file.
> 
> I will look into these details.
> 
> > Anyway, this needs many subtle changes on the mm/vfs/driver side, which
> > will need review from the maintainers of each related subsystem, so I
> > don't want to hurry it.
> 
> Sure, makes sense. Meanwhile, it would be really great if you could share
> the code changes described above, so that I can try them out.
> 

The draft version is almost done, and I'm running stress tests now;
fortunately, I haven't seen any problems so far.

I will send you when I'm ready.

Thanks.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-06-30  6:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20 14:23 [PATCH v6 00/12] Support non-lru page migration Minchan Kim
2016-05-20 14:23 ` [PATCH v6 02/12] mm: migrate: support non-lru movable " Minchan Kim
2016-05-27 14:26   ` Vlastimil Babka
2016-05-30  1:33     ` Minchan Kim
2016-05-30  9:01       ` Vlastimil Babka
2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
2016-05-30  9:36     ` Vlastimil Babka
2016-05-30 16:25       ` Minchan Kim
2016-05-31  7:51         ` Vlastimil Babka
2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
2016-05-31  7:52       ` Vlastimil Babka
2016-05-31 23:05         ` Minchan Kim
2016-06-13  9:38       ` Anshuman Khandual
2016-06-15  2:32         ` Minchan Kim
2016-06-15  6:45           ` Anshuman Khandual
2016-06-16  0:26             ` Minchan Kim
2016-06-16  3:42               ` Anshuman Khandual
2016-06-16  5:37                 ` Minchan Kim
2016-06-27  5:51                   ` Anshuman Khandual
2016-06-28  6:39                     ` Minchan Kim
2016-06-30  5:56                       ` Anshuman Khandual
2016-06-30  6:18                         ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).