linux-mm.kvack.org archive mirror
* [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
@ 2016-04-04 13:18 Chris Wilson
  2016-04-04 13:18 ` [PATCH v4 2/2] drm/i915: Make GPU pages movable Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Chris Wilson @ 2016-04-04 13:18 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Hugh Dickins, linux-mm, linux-kernel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

This provides support for drivers or shmem file owners to register a
set of callbacks, which can be invoked from the address space
operations methods implemented by shmem.  This allows the file owners
to hook into the shmem address space operations and perform some
extra/custom operations in addition to the default ones.

The private_data field of the address_space struct is used to store the
pointer to the driver-specific ops.  Currently only one ops field is
defined, migratepage, but it can be extended on an as-needed basis.

The need for driver-specific operations arises because some operations
(like migratepage) cannot be handled completely within shmem to be
effective, and need some driver-specific handling as well.
Specifically, i915.ko would like to participate in migratepage().
i915.ko uses shmemfs to provide swappable backing storage for its user
objects, but when those objects are in use by the GPU it must pin the
entire object until the GPU is idle.  As a result, large chunks of memory
can be arbitrarily withdrawn from page migration, resulting in premature
out-of-memory due to fragmentation.  However, if i915.ko can receive the
migratepage() request, it can then flush the object from the GPU, remove
its pin and thus enable the migration.

Since gfx allocations are one of the major consumers of system memory,
it is imperative to have such a mechanism to deal effectively with
fragmentation; hence the need for such a provision for initiating
driver-specific actions during address space operations.
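
For illustration, a minimal sketch of how a file owner would use the
proposed hook.  The struct and helper come from the diff below; the
my_* names, the NULL private-data cookie and the registration helper
are purely hypothetical:

/* Assumes <linux/fs.h>, <linux/migrate.h> and <linux/shmem_fs.h>. */
static int my_migratepage(struct address_space *mapping,
			  struct page *newpage, struct page *page,
			  enum migrate_mode mode, void *dev_priv_data)
{
	/* Drop any driver-side pin on 'page' here (driver specific),
	 * then fall back to the default migration helper.
	 */
	return migrate_page(mapping, newpage, page, mode);
}

static struct shmem_dev_info my_shmem_info = {
	.dev_private_data = NULL,		/* driver-private cookie */
	.dev_migratepage  = my_migratepage,
};

/* Called once per file, right after the shmem file backing the
 * object has been created.
 */
static void my_register(struct file *filp)
{
	shmem_set_device_ops(file_inode(filp)->i_mapping, &my_shmem_info);
}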

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/linux/shmem_fs.h | 17 +++++++++++++++++
 mm/shmem.c               | 17 ++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d4780c00d34..d7925b66c240 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -34,11 +34,28 @@ struct shmem_sb_info {
 	struct mempolicy *mpol;     /* default memory policy for mappings */
 };
 
+struct shmem_dev_info {
+	void *dev_private_data;
+	int (*dev_migratepage)(struct address_space *mapping,
+			       struct page *newpage, struct page *page,
+			       enum migrate_mode mode, void *dev_priv_data);
+};
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+static inline int shmem_set_device_ops(struct address_space *mapping,
+				       struct shmem_dev_info *info)
+{
+	if (mapping->private_data != NULL)
+		return -EEXIST;
+
+	mapping->private_data = info;
+	return 0;
+}
+
 /*
  * Functions in mm/shmem.c called directly from elsewhere:
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index 9428c51ab2d6..6ed953193883 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -947,6 +947,21 @@ redirty:
 	return 0;
 }
 
+#ifdef CONFIG_MIGRATION
+static int shmem_migratepage(struct address_space *mapping,
+			     struct page *newpage, struct page *page,
+			     enum migrate_mode mode)
+{
+	struct shmem_dev_info *dev_info = mapping->private_data;
+
+	if (dev_info && dev_info->dev_migratepage)
+		return dev_info->dev_migratepage(mapping, newpage, page,
+				mode, dev_info->dev_private_data);
+
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 #ifdef CONFIG_NUMA
 #ifdef CONFIG_TMPFS
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
@@ -3161,7 +3176,7 @@ static const struct address_space_operations shmem_aops = {
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migratepage,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };
-- 
2.8.0.rc3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/2] drm/i915: Make GPU pages movable
  2016-04-04 13:18 [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
@ 2016-04-04 13:18 ` Chris Wilson
  2016-10-18 12:05   ` [Intel-gfx] " Joonas Lahtinen
  2016-04-15 16:09 ` [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
  2016-04-24 23:42 ` Kirill A. Shutemov
  2 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2016-04-04 13:18 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Hugh Dickins, linux-mm, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

On a long run of more than 2-3 days, physical memory tends to get
fragmented severely, which considerably slows down the system. In such a
scenario, the shrinker is also unable to help, as lack of memory is not
the actual problem: it has been observed that there are enough free
pages of order 0. This also manifests itself when an individual zone in
the mm runs out of pages; if we cannot migrate pages between zones, the
kernel hits an out-of-memory condition even though there are free pages
(and often all of swap) available.

To address the issue of external fragmentation, the kernel does
compaction (which involves migration of pages), but its efficacy depends
upon how many pages are marked as MOVABLE, as only those pages can be
migrated.

Currently the backing pages for GFX buffers are allocated from shmemfs
with the GFP_RECLAIMABLE flag, in units of 4KB pages.  With limited swap
space, it may not always be possible to reclaim or swap out pages of all
the inactive objects to make way for free space, allowing the formation
of higher-order groups of physically contiguous pages on compaction.

Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has
to pin the pages while they are in use by the GPU, which prevents their
migration. So the migratepage callback in shmem is also hooked up to get
a notification when the kernel initiates page migration. On that
notification, i915.ko unpins the pages as appropriate.  With this we can
effectively mark the GPU pages as MOVABLE and hence mitigate the
fragmentation problem.
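
In essence, the driver-side callback added here (implemented in full in
the i915_gem_shrinker.c hunk below) follows the contract sketched in
this simplified illustration, which omits the i915-specific runtime-PM
and struct_mutex handling:

static int example_migratepage(struct address_space *mapping,
			       struct page *newpage, struct page *page,
			       enum migrate_mode mode, void *dev_priv_data)
{
	/* The owning GEM object was stashed in page_private() when the
	 * backing pages were acquired (see the i915_gem.c hunks below).
	 */
	struct drm_i915_gem_object *obj =
		(struct drm_i915_gem_object *)page_private(page);

	/* If the object is still pinned/active and its pages cannot be
	 * dropped, ask the migration core to back off for now.
	 */
	if (obj && unsafe_drop_pages(obj))
		return -EBUSY;

	/* The page is no longer pinned; let the generic helper move it. */
	return migrate_page(mapping, newpage, page, mode);
}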

v2:
 - Rename the migration routine to gem_shrink_migratepage, move it to the
   shrinker file, and use the existing constructs (Chris)
 - To cleanup, add a new helper function to encapsulate all page migration
   skip conditions (Chris)
 - Add a new local helper function in shrinker file, for dropping the
   backing pages, and call the same from gem_shrink() also (Chris)

v3:
 - Fix/invert the check on the return value of unsafe_drop_pages (Chris)

v4:
 - Minor tidy

Testcase: igt/gem_shrink
Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |   3 +
 drivers/gpu/drm/i915/i915_gem.c          |  13 ++-
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 174 ++++++++++++++++++++++++++++---
 3 files changed, 175 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd187727c813..2c1d5c3af780 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -52,6 +52,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/shmem_fs.h>
 #include "intel_guc.h"
 #include "intel_dpll_mgr.h"
 
@@ -1234,6 +1235,8 @@ struct intel_l3_parity {
 };
 
 struct i915_gem_mm {
+	struct shmem_dev_info shmem_info;
+
 	/** Memory allocator for GTT stolen memory */
 	struct drm_mm stolen;
 	/** Protects the usage of the GTT stolen memory allocator. This is
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ca96fc12cdf4..7cef03efb539 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2206,6 +2206,7 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 		if (obj->madv == I915_MADV_WILLNEED)
 			mark_page_accessed(page);
 
+		set_page_private(page, 0);
 		page_cache_release(page);
 	}
 	obj->dirty = 0;
@@ -2320,6 +2321,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			sg->length += PAGE_SIZE;
 		}
 		last_pfn = page_to_pfn(page);
+		set_page_private(page, (unsigned long)obj);
 
 		/* Check that the i965g/gm workaround works. */
 		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
@@ -2345,8 +2347,11 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 
 err_pages:
 	sg_mark_end(sg);
-	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
-		page_cache_release(sg_page_iter_page(&sg_iter));
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
+		page = sg_page_iter_page(&sg_iter);
+		set_page_private(page, 0);
+		page_cache_release(page);
+	}
 	sg_free_table(st);
 	kfree(st);
 
@@ -4468,6 +4473,7 @@ static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size)
 {
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
 	struct address_space *mapping;
 	gfp_t mask;
@@ -4481,7 +4487,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 		return NULL;
 	}
 
-	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
+	mask = GFP_HIGHUSER_MOVABLE;
 	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
 		/* 965gm cannot relocate objects above 4GiB. */
 		mask &= ~__GFP_HIGHMEM;
@@ -4490,6 +4496,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 
 	mapping = file_inode(obj->base.filp)->i_mapping;
 	mapping_set_gfp_mask(mapping, mask);
+	shmem_set_device_ops(mapping, &dev_priv->mm.shmem_info);
 
 	i915_gem_object_init(obj, &i915_gem_object_ops);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index d3c473ffb90a..b29584149bdf 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -24,6 +24,7 @@
 
 #include <linux/oom.h>
 #include <linux/shmem_fs.h>
+#include <linux/migrate.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/pci.h>
@@ -87,6 +88,70 @@ static bool can_release_pages(struct drm_i915_gem_object *obj)
 	return swap_available() || obj->madv == I915_MADV_DONTNEED;
 }
 
+static bool can_migrate_page(struct drm_i915_gem_object *obj)
+{
+	/* Avoid the migration of page if being actively used by GPU */
+	if (obj->active)
+		return false;
+
+	/* Skip the migration for purgeable objects otherwise there
+	 * will be a deadlock when shmem will try to lock the page for
+	 * truncation, which is already locked by the caller before
+	 * migration.
+	 */
+	if (obj->madv == I915_MADV_DONTNEED)
+		return false;
+
+	/* Skip the migration for a pinned object */
+	if (obj->pages_pin_count != num_vma_bound(obj))
+		return false;
+
+	return true;
+}
+
+static int
+unsafe_drop_pages(struct drm_i915_gem_object *obj)
+{
+	struct i915_vma *vma, *next;
+	int ret;
+
+	drm_gem_object_reference(&obj->base);
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link)
+		if (i915_vma_unbind(vma))
+			break;
+
+	ret = i915_gem_object_put_pages(obj);
+	drm_gem_object_unreference(&obj->base);
+
+	return ret;
+}
+
+static int
+do_migrate_page(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	int ret = 0;
+
+	if (!can_migrate_page(obj))
+		return -EBUSY;
+
+	/* HW access would be required for a GGTT-bound object, for which
+	 * the device has to be kept awake. But a deadlock can arise if we
+	 * attempt to resume the device while a suspend or resume operation
+	 * is already happening concurrently on some other path, and that
+	 * path is itself the one triggering compaction. So only unbind if
+	 * the device is currently awake.
+	 */
+	if (!intel_runtime_pm_get_if_in_use(dev_priv))
+		return -EBUSY;
+
+	if (unsafe_drop_pages(obj))
+		ret = -EBUSY;
+
+	intel_runtime_pm_put(dev_priv);
+	return ret;
+}
+
 /**
  * i915_gem_shrink - Shrink buffer object caches
  * @dev_priv: i915 device
@@ -156,7 +221,6 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 		INIT_LIST_HEAD(&still_in_list);
 		while (count < target && !list_empty(phase->list)) {
 			struct drm_i915_gem_object *obj;
-			struct i915_vma *vma, *v;
 
 			obj = list_first_entry(phase->list,
 					       typeof(*obj), global_list);
@@ -172,18 +236,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 			if (!can_release_pages(obj))
 				continue;
 
-			drm_gem_object_reference(&obj->base);
-
-			/* For the unbound phase, this should be a no-op! */
-			list_for_each_entry_safe(vma, v,
-						 &obj->vma_list, obj_link)
-				if (i915_vma_unbind(vma))
-					break;
-
-			if (i915_gem_object_put_pages(obj) == 0)
+			if (unsafe_drop_pages(obj) == 0)
 				count += obj->base.size >> PAGE_SHIFT;
-
-			drm_gem_object_unreference(&obj->base);
 		}
 		list_splice(&still_in_list, phase->list);
 	}
@@ -356,6 +410,97 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr)
 	return NOTIFY_DONE;
 }
 
+#ifdef CONFIG_MIGRATION
+static int i915_gem_shrinker_migratepage(struct address_space *mapping,
+					 struct page *newpage,
+					 struct page *page,
+					 enum migrate_mode mode,
+					 void *dev_priv_data)
+{
+	struct drm_i915_private *dev_priv = dev_priv_data;
+	struct drm_device *dev = dev_priv->dev;
+	unsigned long timeout = msecs_to_jiffies(10) + 1;
+	bool unlock;
+	int ret;
+
+	WARN((page_count(newpage) != 1), "Unexpected ref count for newpage\n");
+
+	/*
+	 * Clear the private field of the new target page as it could have a
+	 * stale value in the private field. Otherwise, if this page itself
+	 * later gets migrated without being referenced by the driver in
+	 * between, the stale value would confuse the i915 migratepage
+	 * callback, as the object pointer is derived from it.
+	 * This should be safe since at the time of migration, private field
+	 * of the new page (which is actually an independent free 4KB page now)
+	 * should be like a don't care for the kernel.
+	 */
+	set_page_private(newpage, 0);
+
+	if (!page_private(page))
+		goto migrate;
+
+	/*
+	 * Check the page count: if the driver also holds a reference then it
+	 * should be more than 2, as shmem holds one reference and one more is
+	 * taken by the migration path itself. So if the count is <= 2, we can
+	 * directly invoke the migration function.
+	 */
+	if (page_count(page) <= 2)
+		goto migrate;
+
+	/*
+	 * Use trylock here, with a timeout, for struct_mutex as
+	 * otherwise there is a possibility of deadlock due to lock
+	 * inversion. This path, which tries to migrate a particular
+	 * page after locking that page, can race with a path which
+	 * truncate/purge pages of the corresponding object (after
+	 * acquiring struct_mutex). Since page truncation will also
+	 * try to lock the page, a scenario of deadlock can arise.
+	 */
+	while (!i915_gem_shrinker_lock(dev, &unlock) && --timeout)
+		schedule_timeout_killable(1);
+	if (timeout == 0) {
+		DRM_DEBUG_DRIVER("Unable to acquire device mutex.\n");
+		return -EBUSY;
+	}
+
+	ret = 0;
+	if (!PageSwapCache(page) && page_private(page)) {
+		struct drm_i915_gem_object *obj =
+			(struct drm_i915_gem_object *)page_private(page);
+
+		ret = do_migrate_page(obj);
+	}
+
+	if (unlock)
+		mutex_unlock(&dev->struct_mutex);
+
+	if (ret)
+		return ret;
+
+	/*
+	 * Ideally we don't expect the page count to be > 2 here, as the driver
+	 * would have dropped its reference, but occasionally it has been seen
+	 * coming as 3 & 4. This leads to a situation of unexpected page count,
+	 * causing migration failure with an -EAGAIN error. This then leads to
+	 * multiple attempts by the kernel to migrate the same set of pages,
+	 * and sometimes the repeated attempts prove detrimental to stability.
+	 * Also, since we don't know who the other owner is, or for how long it
+	 * is going to keep the reference, it's better to return -EBUSY.
+	 */
+	if (page_count(page) > 2)
+		return -EBUSY;
+
+migrate:
+	ret = migrate_page(mapping, newpage, page, mode);
+	if (ret)
+		DRM_DEBUG_DRIVER("page=%p migration returned %d\n", page, ret);
+
+	return ret;
+}
+#endif
+
 /**
  * i915_gem_shrinker_init - Initialize i915 shrinker
  * @dev_priv: i915 device
@@ -371,6 +516,11 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
 
 	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	WARN_ON(register_oom_notifier(&dev_priv->mm.oom_notifier));
+
+	dev_priv->mm.shmem_info.dev_private_data = dev_priv;
+#ifdef CONFIG_MIGRATION
+	dev_priv->mm.shmem_info.dev_migratepage = i915_gem_shrinker_migratepage;
+#endif
 }
 
 /**
-- 
2.8.0.rc3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-04 13:18 [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
  2016-04-04 13:18 ` [PATCH v4 2/2] drm/i915: Make GPU pages movable Chris Wilson
@ 2016-04-15 16:09 ` Chris Wilson
  2016-04-20 12:38   ` Daniel Vetter
  2016-04-24 23:42 ` Kirill A. Shutemov
  2 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2016-04-15 16:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-mm, Akash Goel, linux-kernel, Hugh Dickins, Sourab Gupta

On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This provides support for the drivers or shmem file owners to register
> a set of callbacks, which can be invoked from the address space
> operations methods implemented by shmem.  This allow the file owners to
> hook into the shmem address space operations to do some extra/custom
> operations in addition to the default ones.
> 
> The private_data field of address_space struct is used to store the
> pointer to driver specific ops.  Currently only one ops field is defined,
> which is migratepage, but can be extended on an as-needed basis.
> 
> The need for driver specific operations arises since some of the
> operations (like migratepage) may not be handled completely within shmem,
> so as to be effective, and would need some driver specific handling also.
> Specifically, i915.ko would like to participate in migratepage().
> i915.ko uses shmemfs to provide swappable backing storage for its user
> objects, but when those objects are in use by the GPU it must pin the
> entire object until the GPU is idle.  As a result, large chunks of memory
> can be arbitrarily withdrawn from page migration, resulting in premature
> out-of-memory due to fragmentation.  However, if i915.ko can receive the
> migratepage() request, it can then flush the object from the GPU, remove
> its pin and thus enable the migration.
> 
> Since gfx allocations are one of the major consumer of system memory, its
> imperative to have such a mechanism to effectively deal with
> fragmentation.  And therefore the need for such a provision for initiating
> driver specific actions during address space operations.
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.linux.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Ping?

> ---
>  include/linux/shmem_fs.h | 17 +++++++++++++++++
>  mm/shmem.c               | 17 ++++++++++++++++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 4d4780c00d34..d7925b66c240 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -34,11 +34,28 @@ struct shmem_sb_info {
>  	struct mempolicy *mpol;     /* default memory policy for mappings */
>  };
>  
> +struct shmem_dev_info {
> +	void *dev_private_data;
> +	int (*dev_migratepage)(struct address_space *mapping,
> +			       struct page *newpage, struct page *page,
> +			       enum migrate_mode mode, void *dev_priv_data);
> +};
> +
>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>  {
>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>  }
>  
> +static inline int shmem_set_device_ops(struct address_space *mapping,
> +				       struct shmem_dev_info *info)
> +{
> +	if (mapping->private_data != NULL)
> +		return -EEXIST;
> +
> +	mapping->private_data = info;
> +	return 0;
> +}
> +
>  /*
>   * Functions in mm/shmem.c called directly from elsewhere:
>   */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 9428c51ab2d6..6ed953193883 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -947,6 +947,21 @@ redirty:
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MIGRATION
> +static int shmem_migratepage(struct address_space *mapping,
> +			     struct page *newpage, struct page *page,
> +			     enum migrate_mode mode)
> +{
> +	struct shmem_dev_info *dev_info = mapping->private_data;
> +
> +	if (dev_info && dev_info->dev_migratepage)
> +		return dev_info->dev_migratepage(mapping, newpage, page,
> +				mode, dev_info->dev_private_data);
> +
> +	return migrate_page(mapping, newpage, page, mode);
> +}
> +#endif
> +
>  #ifdef CONFIG_NUMA
>  #ifdef CONFIG_TMPFS
>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
> @@ -3161,7 +3176,7 @@ static const struct address_space_operations shmem_aops = {
>  	.write_end	= shmem_write_end,
>  #endif
>  #ifdef CONFIG_MIGRATION
> -	.migratepage	= migrate_page,
> +	.migratepage	= shmem_migratepage,
>  #endif
>  	.error_remove_page = generic_error_remove_page,
>  };

-- 
Chris Wilson, Intel Open Source Technology Centre


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-15 16:09 ` [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
@ 2016-04-20 12:38   ` Daniel Vetter
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2016-04-20 12:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, linux-mm, Akash Goel, linux-kernel,
	Hugh Dickins, Sourab Gupta

On Fri, Apr 15, 2016 at 05:09:24PM +0100, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > This provides support for the drivers or shmem file owners to register
> > a set of callbacks, which can be invoked from the address space
> > operations methods implemented by shmem.  This allow the file owners to
> > hook into the shmem address space operations to do some extra/custom
> > operations in addition to the default ones.
> > 
> > The private_data field of address_space struct is used to store the
> > pointer to driver specific ops.  Currently only one ops field is defined,
> > which is migratepage, but can be extended on an as-needed basis.
> > 
> > The need for driver specific operations arises since some of the
> > operations (like migratepage) may not be handled completely within shmem,
> > so as to be effective, and would need some driver specific handling also.
> > Specifically, i915.ko would like to participate in migratepage().
> > i915.ko uses shmemfs to provide swappable backing storage for its user
> > objects, but when those objects are in use by the GPU it must pin the
> > entire object until the GPU is idle.  As a result, large chunks of memory
> > can be arbitrarily withdrawn from page migration, resulting in premature
> > out-of-memory due to fragmentation.  However, if i915.ko can receive the
> > migratepage() request, it can then flush the object from the GPU, remove
> > its pin and thus enable the migration.
> > 
> > Since gfx allocations are one of the major consumer of system memory, its
> > imperative to have such a mechanism to effectively deal with
> > fragmentation.  And therefore the need for such a provision for initiating
> > driver specific actions during address space operations.
> > 
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-kernel@vger.linux.org
> > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Ping?

This addresses a long-standing issue we have in i915 gem of essentially
pinning way too much memory as unmoveable. Would be nice to get this
merged. It also does seem to fix some real-world issues we're seeing
(linked from the i915 patch).

Can I have acks/reviews from -mm folks to get this in through drm trees
please?

Thanks, Daniel

> 
> > ---
> >  include/linux/shmem_fs.h | 17 +++++++++++++++++
> >  mm/shmem.c               | 17 ++++++++++++++++-
> >  2 files changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index 4d4780c00d34..d7925b66c240 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -34,11 +34,28 @@ struct shmem_sb_info {
> >  	struct mempolicy *mpol;     /* default memory policy for mappings */
> >  };
> >  
> > +struct shmem_dev_info {
> > +	void *dev_private_data;
> > +	int (*dev_migratepage)(struct address_space *mapping,
> > +			       struct page *newpage, struct page *page,
> > +			       enum migrate_mode mode, void *dev_priv_data);
> > +};
> > +
> >  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
> >  {
> >  	return container_of(inode, struct shmem_inode_info, vfs_inode);
> >  }
> >  
> > +static inline int shmem_set_device_ops(struct address_space *mapping,
> > +				       struct shmem_dev_info *info)
> > +{
> > +	if (mapping->private_data != NULL)
> > +		return -EEXIST;
> > +
> > +	mapping->private_data = info;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Functions in mm/shmem.c called directly from elsewhere:
> >   */
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 9428c51ab2d6..6ed953193883 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -947,6 +947,21 @@ redirty:
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_MIGRATION
> > +static int shmem_migratepage(struct address_space *mapping,
> > +			     struct page *newpage, struct page *page,
> > +			     enum migrate_mode mode)
> > +{
> > +	struct shmem_dev_info *dev_info = mapping->private_data;
> > +
> > +	if (dev_info && dev_info->dev_migratepage)
> > +		return dev_info->dev_migratepage(mapping, newpage, page,
> > +				mode, dev_info->dev_private_data);
> > +
> > +	return migrate_page(mapping, newpage, page, mode);
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_NUMA
> >  #ifdef CONFIG_TMPFS
> >  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
> > @@ -3161,7 +3176,7 @@ static const struct address_space_operations shmem_aops = {
> >  	.write_end	= shmem_write_end,
> >  #endif
> >  #ifdef CONFIG_MIGRATION
> > -	.migratepage	= migrate_page,
> > +	.migratepage	= shmem_migratepage,
> >  #endif
> >  	.error_remove_page = generic_error_remove_page,
> >  };
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-04 13:18 [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
  2016-04-04 13:18 ` [PATCH v4 2/2] drm/i915: Make GPU pages movable Chris Wilson
  2016-04-15 16:09 ` [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
@ 2016-04-24 23:42 ` Kirill A. Shutemov
  2016-04-26 12:53   ` Daniel Vetter
  2 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-04-24 23:42 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, Akash Goel, Hugh Dickins, linux-mm, linux-kernel,
	Sourab Gupta

On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This provides support for the drivers or shmem file owners to register
> a set of callbacks, which can be invoked from the address space
> operations methods implemented by shmem.  This allow the file owners to
> hook into the shmem address space operations to do some extra/custom
> operations in addition to the default ones.
> 
> The private_data field of address_space struct is used to store the
> pointer to driver specific ops.  Currently only one ops field is defined,
> which is migratepage, but can be extended on an as-needed basis.
> 
> The need for driver specific operations arises since some of the
> operations (like migratepage) may not be handled completely within shmem,
> so as to be effective, and would need some driver specific handling also.
> Specifically, i915.ko would like to participate in migratepage().
> i915.ko uses shmemfs to provide swappable backing storage for its user
> objects, but when those objects are in use by the GPU it must pin the
> entire object until the GPU is idle.  As a result, large chunks of memory
> can be arbitrarily withdrawn from page migration, resulting in premature
> out-of-memory due to fragmentation.  However, if i915.ko can receive the
> migratepage() request, it can then flush the object from the GPU, remove
> its pin and thus enable the migration.
> 
> Since gfx allocations are one of the major consumer of system memory, its
> imperative to have such a mechanism to effectively deal with
> fragmentation.  And therefore the need for such a provision for initiating
> driver specific actions during address space operations.

Hm. Sorry, my ignorance, but shouldn't this kind of flushing be done in
response to mmu_notifier's ->invalidate_page?

I'm not aware of how i915 works and what its expectations are wrt shmem.
Do you have some userspace VMA which is mirrored on GPU side?
If yes, migration would cause unmapping of these pages and trigger the
mmu_notifier's hook.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-24 23:42 ` Kirill A. Shutemov
@ 2016-04-26 12:53   ` Daniel Vetter
  2016-04-26 23:33     ` Kirill A. Shutemov
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2016-04-26 12:53 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Chris Wilson, intel-gfx, Akash Goel, Hugh Dickins, linux-mm,
	linux-kernel, Sourab Gupta

On Mon, Apr 25, 2016 at 02:42:50AM +0300, Kirill A. Shutemov wrote:
> On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > This provides support for the drivers or shmem file owners to register
> > a set of callbacks, which can be invoked from the address space
> > operations methods implemented by shmem.  This allow the file owners to
> > hook into the shmem address space operations to do some extra/custom
> > operations in addition to the default ones.
> > 
> > The private_data field of address_space struct is used to store the
> > pointer to driver specific ops.  Currently only one ops field is defined,
> > which is migratepage, but can be extended on an as-needed basis.
> > 
> > The need for driver specific operations arises since some of the
> > operations (like migratepage) may not be handled completely within shmem,
> > so as to be effective, and would need some driver specific handling also.
> > Specifically, i915.ko would like to participate in migratepage().
> > i915.ko uses shmemfs to provide swappable backing storage for its user
> > objects, but when those objects are in use by the GPU it must pin the
> > entire object until the GPU is idle.  As a result, large chunks of memory
> > can be arbitrarily withdrawn from page migration, resulting in premature
> > out-of-memory due to fragmentation.  However, if i915.ko can receive the
> > migratepage() request, it can then flush the object from the GPU, remove
> > its pin and thus enable the migration.
> > 
> > Since gfx allocations are one of the major consumer of system memory, its
> > imperative to have such a mechanism to effectively deal with
> > fragmentation.  And therefore the need for such a provision for initiating
> > driver specific actions during address space operations.
> 
> Hm. Sorry, my ignorance, but shouldn't this kind of flushing be done in
> response to mmu_notifier's ->invalidate_page?
> 
> I'm not aware about how i915 works and what's its expectation wrt shmem.
> Do you have some userspace VMA which is mirrored on GPU side?
> If yes, migration would cause unmapping of these pages and trigger the
> mmu_notifier's hook.

We do that for userptr pages (i.e. stuff we steal from userspace address
spaces). But we also have native gfx buffer objects based on shmem files,
and thus far we need to allocate them as !GFP_MOVEABLE. And we allocate a
_lot_ of those. And those files aren't mapped into any cpu address space
(ofc they're mapped on the gpu side, but that's driver private); from the
core mm's point of view they are pure pagecache. And afaiui for that we
need to wire up the migratepage hooks through shmem to i915_gem.c
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-26 12:53   ` Daniel Vetter
@ 2016-04-26 23:33     ` Kirill A. Shutemov
  2016-04-27  7:16       ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-04-26 23:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Akash Goel, Hugh Dickins, linux-mm,
	linux-kernel, Sourab Gupta

On Tue, Apr 26, 2016 at 02:53:41PM +0200, Daniel Vetter wrote:
> On Mon, Apr 25, 2016 at 02:42:50AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > This provides support for the drivers or shmem file owners to register
> > > a set of callbacks, which can be invoked from the address space
> > > operations methods implemented by shmem.  This allow the file owners to
> > > hook into the shmem address space operations to do some extra/custom
> > > operations in addition to the default ones.
> > > 
> > > The private_data field of address_space struct is used to store the
> > > pointer to driver specific ops.  Currently only one ops field is defined,
> > > which is migratepage, but can be extended on an as-needed basis.
> > > 
> > > The need for driver specific operations arises since some of the
> > > operations (like migratepage) may not be handled completely within shmem,
> > > so as to be effective, and would need some driver specific handling also.
> > > Specifically, i915.ko would like to participate in migratepage().
> > > i915.ko uses shmemfs to provide swappable backing storage for its user
> > > objects, but when those objects are in use by the GPU it must pin the
> > > entire object until the GPU is idle.  As a result, large chunks of memory
> > > can be arbitrarily withdrawn from page migration, resulting in premature
> > > out-of-memory due to fragmentation.  However, if i915.ko can receive the
> > > migratepage() request, it can then flush the object from the GPU, remove
> > > its pin and thus enable the migration.
> > > 
> > > Since gfx allocations are one of the major consumer of system memory, its
> > > imperative to have such a mechanism to effectively deal with
> > > fragmentation.  And therefore the need for such a provision for initiating
> > > driver specific actions during address space operations.
> > 
> > Hm. Sorry, my ignorance, but shouldn't this kind of flushing be done in
> > response to mmu_notifier's ->invalidate_page?
> > 
> > I'm not aware about how i915 works and what's its expectation wrt shmem.
> > Do you have some userspace VMA which is mirrored on GPU side?
> > If yes, migration would cause unmapping of these pages and trigger the
> > mmu_notifier's hook.
> 
> We do that for userptr pages (i.e. stuff we steal from userspace address
> spaces). But we also have native gfx buffer objects based on shmem files,
> and thus far we need to allocate them as !GFP_MOVEABLE. And we allocate a
> _lot_ of those. And those files aren't mapped into any cpu address space
> (ofc they're mapped on the gpu side, but that's driver private), from the
> core mm they are pure pagecache. And afaiui for that we need to wire up
> the migratepage hooks through shmem to i915_gem.c

I see.

I don't particularly like the way the patch hooks into migrate, but I
don't have a good idea how to implement this better.

This approach allows hooking up to any shmem file, which could be abused
by drivers later.

I wonder if it would be better for i915 to have its own in-kernel mount
with a variant of tmpfs which provides different mapping->a_ops? Or is
that overkill? I don't know.

Hugh?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-26 23:33     ` Kirill A. Shutemov
@ 2016-04-27  7:16       ` Hugh Dickins
  2016-04-27  7:38         ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2016-04-27  7:16 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Chris Wilson, intel-gfx, Akash Goel, Hugh Dickins, linux-mm,
	linux-kernel, Sourab Gupta

On Wed, 27 Apr 2016, Kirill A. Shutemov wrote:
> On Tue, Apr 26, 2016 at 02:53:41PM +0200, Daniel Vetter wrote:
> > On Mon, Apr 25, 2016 at 02:42:50AM +0300, Kirill A. Shutemov wrote:
> > > On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > > 
> > > > This provides support for the drivers or shmem file owners to register
> > > > a set of callbacks, which can be invoked from the address space
> > > > operations methods implemented by shmem.  This allow the file owners to
> > > > hook into the shmem address space operations to do some extra/custom
> > > > operations in addition to the default ones.
> > > > 
> > > > The private_data field of address_space struct is used to store the
> > > > pointer to driver specific ops.  Currently only one ops field is defined,
> > > > which is migratepage, but can be extended on an as-needed basis.
> > > > 
> > > > The need for driver specific operations arises since some of the
> > > > operations (like migratepage) may not be handled completely within shmem,
> > > > so as to be effective, and would need some driver specific handling also.
> > > > Specifically, i915.ko would like to participate in migratepage().
> > > > i915.ko uses shmemfs to provide swappable backing storage for its user
> > > > objects, but when those objects are in use by the GPU it must pin the
> > > > entire object until the GPU is idle.  As a result, large chunks of memory
> > > > can be arbitrarily withdrawn from page migration, resulting in premature
> > > > out-of-memory due to fragmentation.  However, if i915.ko can receive the
> > > > migratepage() request, it can then flush the object from the GPU, remove
> > > > its pin and thus enable the migration.
> > > > 
> > > > Since gfx allocations are one of the major consumer of system memory, its
> > > > imperative to have such a mechanism to effectively deal with
> > > > fragmentation.  And therefore the need for such a provision for initiating
> > > > driver specific actions during address space operations.
> > > 
> > > Hm. Sorry, my ignorance, but shouldn't this kind of flushing be done in
> > > response to mmu_notifier's ->invalidate_page?
> > > 
> > > I'm not aware about how i915 works and what's its expectation wrt shmem.
> > > Do you have some userspace VMA which is mirrored on GPU side?
> > > If yes, migration would cause unmapping of these pages and trigger the
> > > mmu_notifier's hook.
> > 
> > We do that for userptr pages (i.e. stuff we steal from userspace address
> > spaces). But we also have native gfx buffer objects based on shmem files,
> > and thus far we need to allocate them as !GFP_MOVEABLE. And we allocate a
> > _lot_ of those. And those files aren't mapped into any cpu address space
> > (ofc they're mapped on the gpu side, but that's driver private), from the
> > core mm they are pure pagecache. And afaiui for that we need to wire up
> > the migratepage hooks through shmem to i915_gem.c
> 
> I see.
> 
> I don't particularly like the way patch hooks into migrate, but don't a
> good idea how to implement this better.
> 
> This way allows to hook up to any shmem file, which can be abused by
> drivers later.
> 
> I wounder if it would be better for i915 to have its own in-kernel mount
> with variant of tmpfs which provides different mapping->a_ops? Or is it
> overkill? I don't know.
> 
> Hugh?

This, and the 2/2, remain perpetually in my "needs more thought" box.
And won't get that thought today either, I'm afraid.  Tomorrow.

Like you, I don't particularly like these; but recognize that the i915
guys are doing all the rest of us a big favour by going to some trouble
to allow migration of their pinned pages.

Potential for abuse of migratepage by drivers is already there anyway:
we can be grateful that they're offering to use rather than abuse it;
but yes, it's a worry that such trickiness gets dispersed into drivers.

Amusing to see them grabbing shmem's page_private(), isn't it?

Hugh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-04-27  7:16       ` Hugh Dickins
@ 2016-04-27  7:38         ` Daniel Vetter
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2016-04-27  7:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Chris Wilson, intel-gfx, Akash Goel, linux-mm,
	linux-kernel, Sourab Gupta

On Wed, Apr 27, 2016 at 12:16:37AM -0700, Hugh Dickins wrote:
> On Wed, 27 Apr 2016, Kirill A. Shutemov wrote:
> > On Tue, Apr 26, 2016 at 02:53:41PM +0200, Daniel Vetter wrote:
> > > On Mon, Apr 25, 2016 at 02:42:50AM +0300, Kirill A. Shutemov wrote:
> > > > On Mon, Apr 04, 2016 at 02:18:10PM +0100, Chris Wilson wrote:
> > > > > From: Akash Goel <akash.goel@intel.com>
> > > > > 
> > > > > This provides support for the drivers or shmem file owners to register
> > > > > a set of callbacks, which can be invoked from the address space
> > > > > operations methods implemented by shmem.  This allow the file owners to
> > > > > hook into the shmem address space operations to do some extra/custom
> > > > > operations in addition to the default ones.
> > > > > 
> > > > > The private_data field of address_space struct is used to store the
> > > > > pointer to driver specific ops.  Currently only one ops field is defined,
> > > > > which is migratepage, but can be extended on an as-needed basis.
> > > > > 
> > > > > The need for driver specific operations arises since some of the
> > > > > operations (like migratepage) may not be handled completely within shmem,
> > > > > so as to be effective, and would need some driver specific handling also.
> > > > > Specifically, i915.ko would like to participate in migratepage().
> > > > > i915.ko uses shmemfs to provide swappable backing storage for its user
> > > > > objects, but when those objects are in use by the GPU it must pin the
> > > > > entire object until the GPU is idle.  As a result, large chunks of memory
> > > > > can be arbitrarily withdrawn from page migration, resulting in premature
> > > > > out-of-memory due to fragmentation.  However, if i915.ko can receive the
> > > > > migratepage() request, it can then flush the object from the GPU, remove
> > > > > its pin and thus enable the migration.
> > > > > 
> > > > > Since gfx allocations are one of the major consumer of system memory, its
> > > > > imperative to have such a mechanism to effectively deal with
> > > > > fragmentation.  And therefore the need for such a provision for initiating
> > > > > driver specific actions during address space operations.
> > > > 
> > > > Hm. Sorry, my ignorance, but shouldn't this kind of flushing be done in
> > > > response to mmu_notifier's ->invalidate_page?
> > > > 
> > > > I'm not aware about how i915 works and what's its expectation wrt shmem.
> > > > Do you have some userspace VMA which is mirrored on GPU side?
> > > > If yes, migration would cause unmapping of these pages and trigger the
> > > > mmu_notifier's hook.
> > > 
> > > We do that for userptr pages (i.e. stuff we steal from userspace address
> > > spaces). But we also have native gfx buffer objects based on shmem files,
> > > and thus far we need to allocate them as !GFP_MOVEABLE. And we allocate a
> > > _lot_ of those. And those files aren't mapped into any cpu address space
> > > (ofc they're mapped on the gpu side, but that's driver private), from the
> > > core mm they are pure pagecache. And afaiui for that we need to wire up
> > > the migratepage hooks through shmem to i915_gem.c
> > 
> > I see.
> > 
> > I don't particularly like the way patch hooks into migrate, but don't a
> > good idea how to implement this better.
> > 
> > This way allows to hook up to any shmem file, which can be abused by
> > drivers later.
> > 
> > I wounder if it would be better for i915 to have its own in-kernel mount
> > with variant of tmpfs which provides different mapping->a_ops? Or is it
> > overkill? I don't know.
> > 
> > Hugh?
> 
> This, and the 2/2, remain perpetually in my "needs more thought" box.
> And won't get that thought today either, I'm afraid.  Tomorrow.
> 
> Like you, I don't particularly like these; but recognize that the i915
> guys are doing all the rest of us a big favour by going to some trouble
> to allow migration of their pinned pages.
> 
> Potential for abuse of migratepage by drivers is already there anyway:
> we can be grateful that they're offering to use rather than abuse it;
> but yes, it's a worry that such trickiness gets dispersed into drivers.

Looking at our internal roadmap it'll likely get a lot worse, and in a few
years you'll have i915 asking the core mm politely to move around pages
for it because they're placed suboptimally for gpu access. It'll be fun.

We don't have a prototype yet at all even internally, but I think that's
another reason why a more cozy relationship between i915 and shmem would
be good. Not sure you want that, or whether we should resurrect the old
idea of a gemfs.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 2/2] drm/i915: Make GPU pages movable
  2016-04-04 13:18 ` [PATCH v4 2/2] drm/i915: Make GPU pages movable Chris Wilson
@ 2016-10-18 12:05   ` Joonas Lahtinen
  2016-10-18 13:25     ` Goel, Akash
  0 siblings, 1 reply; 13+ messages in thread
From: Joonas Lahtinen @ 2016-10-18 12:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: linux-mm, Akash Goel, Hugh Dickins, Sourab Gupta

On ma, 2016-04-04 at 14:18 +0100, Chris Wilson wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> On a long run of more than 2-3 days, physical memory tends to get
> fragmented severely, which considerably slows down the system. In such a
> scenario, the shrinker is also unable to help as lack of memory is not
> the actual problem, since it has been observed that there are enough free
> pages of 0 order. This also manifests itself when an indiviual zone in
> the mm runs out of pages and if we cannot migrate pages between zones,
> the kernel hits an out-of-memory even though there are free pages (and
> often all of swap) available.
> 
> To address the issue of external fragementation, kernel does a compaction
> (which involves migration of pages) but it's efficacy depends upon how
> many pages are marked as MOVABLE, as only those pages can be migrated.
> 
> Currently the backing pages for GFX buffers are allocated from shmemfs
> with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
> swap space, it may not be possible always to reclaim or swap-out pages of
> all the inactive objects, to make way for free space allowing formation
> of higher order groups of physically-contiguous pages on compaction.
> 
> Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
> pin the pages if they are in use by GPU, which will prevent their
> migration. So the migratepage callback in shmem is also hooked up to get
> a notification when kernel initiates the page migration. On the
> notification, i915.ko appropriately unpin the pages.  With this we can
> effectively mark the GPU pages as MOVABLE and hence mitigate the
> fragmentation problem.
> 
> v2:
>  - Rename the migration routine to gem_shrink_migratepage, move it to the
>    shrinker file, and use the existing constructs (Chris)
>  - To cleanup, add a new helper function to encapsulate all page migration
>    skip conditions (Chris)
>  - Add a new local helper function in shrinker file, for dropping the
>    backing pages, and call the same from gem_shrink() also (Chris)
> 
> v3:
>  - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
> 
> v4:
>  - Minor tidy
> 
> Testcase: igt/gem_shrink
> Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Could this patch be re-spun on top of current nightly?

After removing;

> WARN(page_count(newpage) != 1, "Unexpected ref count for newpage\n")

and

>	if (ret)
>		DRM_DEBUG_DRIVER("page=%p migration returned %d\n", page, ret);

This is;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 2/2] drm/i915: Make GPU pages movable
  2016-10-18 12:05   ` [Intel-gfx] " Joonas Lahtinen
@ 2016-10-18 13:25     ` Goel, Akash
  2016-10-18 13:39       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Goel, Akash @ 2016-10-18 13:25 UTC (permalink / raw)
  To: Joonas Lahtinen, Chris Wilson
  Cc: intel-gfx, akash.goel, linux-mm, Hugh Dickins, Sourab Gupta



On 10/18/2016 5:35 PM, Joonas Lahtinen wrote:
> On ma, 2016-04-04 at 14:18 +0100, Chris Wilson wrote:
>> From: Akash Goel <akash.goel@intel.com>
>>
>> On a long run of more than 2-3 days, physical memory tends to get
>> fragmented severely, which considerably slows down the system. In such a
>> scenario, the shrinker is also unable to help as lack of memory is not
>> the actual problem, since it has been observed that there are enough free
>> pages of 0 order. This also manifests itself when an indiviual zone in
>> the mm runs out of pages and if we cannot migrate pages between zones,
>> the kernel hits an out-of-memory even though there are free pages (and
>> often all of swap) available.
>>
>> To address the issue of external fragementation, kernel does a compaction
>> (which involves migration of pages) but it's efficacy depends upon how
>> many pages are marked as MOVABLE, as only those pages can be migrated.
>>
>> Currently the backing pages for GFX buffers are allocated from shmemfs
>> with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
>> swap space, it may not be possible always to reclaim or swap-out pages of
>> all the inactive objects, to make way for free space allowing formation
>> of higher order groups of physically-contiguous pages on compaction.
>>
>> Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
>> pin the pages if they are in use by GPU, which will prevent their
>> migration. So the migratepage callback in shmem is also hooked up to get
>> a notification when kernel initiates the page migration. On the
>> notification, i915.ko appropriately unpin the pages.  With this we can
>> effectively mark the GPU pages as MOVABLE and hence mitigate the
>> fragmentation problem.
>>
>> v2:
>>  - Rename the migration routine to gem_shrink_migratepage, move it to the
>>    shrinker file, and use the existing constructs (Chris)
>>  - To cleanup, add a new helper function to encapsulate all page migration
>>    skip conditions (Chris)
>>  - Add a new local helper function in shrinker file, for dropping the
>>    backing pages, and call the same from gem_shrink() also (Chris)
>>
>> v3:
>>  - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
>>
>> v4:
>>  - Minor tidy
>>
>> Testcase: igt/gem_shrink
>> Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> Could this patch be re-spinned on top of current nightly?
>
Sure will rebase it on top of nightly.

> After removing;
>
>> WARN(page_count(newpage) != 1, "Unexpected ref count for newpage\n")
>
> and
>
>> 	if (ret)
>> 		DRM_DEBUG_DRIVER("page=%p migration returned %d\n", page, ret);
>
> This is;
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Thanks much for the review.
But there is also a precursor patch; there has been no traction on that.
[1/2] shmem: Support for registration of Driver/file owner specific ops
https://patchwork.freedesktop.org/patch/77935/

Best regards
Akash

>
> Regards, Joonas
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 2/2] drm/i915: Make GPU pages movable
  2016-10-18 13:25     ` Goel, Akash
@ 2016-10-18 13:39       ` Chris Wilson
  2016-10-19 10:40         ` Joonas Lahtinen
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2016-10-18 13:39 UTC (permalink / raw)
  To: Goel, Akash
  Cc: Joonas Lahtinen, intel-gfx, linux-mm, Hugh Dickins, Sourab Gupta

On Tue, Oct 18, 2016 at 06:55:12PM +0530, Goel, Akash wrote:
> 
> 
> On 10/18/2016 5:35 PM, Joonas Lahtinen wrote:
> >On ma, 2016-04-04 at 14:18 +0100, Chris Wilson wrote:
> >>From: Akash Goel <akash.goel@intel.com>
> >>
> >>On a long run of more than 2-3 days, physical memory tends to get
> >>fragmented severely, which considerably slows down the system. In such a
> >>scenario, the shrinker is also unable to help as lack of memory is not
> >>the actual problem, since it has been observed that there are enough free
> >>pages of 0 order. This also manifests itself when an indiviual zone in
> >>the mm runs out of pages and if we cannot migrate pages between zones,
> >>the kernel hits an out-of-memory even though there are free pages (and
> >>often all of swap) available.
> >>
> >>To address the issue of external fragementation, kernel does a compaction
> >>(which involves migration of pages) but it's efficacy depends upon how
> >>many pages are marked as MOVABLE, as only those pages can be migrated.
> >>
> >>Currently the backing pages for GFX buffers are allocated from shmemfs
> >>with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
> >>swap space, it may not be possible always to reclaim or swap-out pages of
> >>all the inactive objects, to make way for free space allowing formation
> >>of higher order groups of physically-contiguous pages on compaction.
> >>
> >>Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
> >>pin the pages if they are in use by GPU, which will prevent their
> >>migration. So the migratepage callback in shmem is also hooked up to get
> >>a notification when kernel initiates the page migration. On the
> >>notification, i915.ko appropriately unpin the pages.  With this we can
> >>effectively mark the GPU pages as MOVABLE and hence mitigate the
> >>fragmentation problem.
> >>
> >>v2:
> >> - Rename the migration routine to gem_shrink_migratepage, move it to the
> >>   shrinker file, and use the existing constructs (Chris)
> >> - To cleanup, add a new helper function to encapsulate all page migration
> >>   skip conditions (Chris)
> >> - Add a new local helper function in shrinker file, for dropping the
> >>   backing pages, and call the same from gem_shrink() also (Chris)
> >>
> >>v3:
> >> - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
> >>
> >>v4:
> >> - Minor tidy
> >>
> >>Testcase: igt/gem_shrink
> >>Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
> >>Cc: Hugh Dickins <hughd@google.com>
> >>Cc: linux-mm@kvack.org
> >>Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> >>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >Could this patch be re-spinned on top of current nightly?
> >
> Sure will rebase it on top of nightly.

It's in my tree (on top of nightly) already with Joonas' r-b.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH v4 2/2] drm/i915: Make GPU pages movable
  2016-10-18 13:39       ` Chris Wilson
@ 2016-10-19 10:40         ` Joonas Lahtinen
  0 siblings, 0 replies; 13+ messages in thread
From: Joonas Lahtinen @ 2016-10-19 10:40 UTC (permalink / raw)
  To: Chris Wilson, Goel, Akash; +Cc: intel-gfx, linux-mm, Hugh Dickins, Sourab Gupta

On ti, 2016-10-18 at 14:39 +0100, Chris Wilson wrote:
> It's in my tree (on top of nightly) already with Joonas' r-b.

Patch 1/2 already has my comments; they could be addressed and it
re-spun too.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-10-19 10:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04 13:18 [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
2016-04-04 13:18 ` [PATCH v4 2/2] drm/i915: Make GPU pages movable Chris Wilson
2016-10-18 12:05   ` [Intel-gfx] " Joonas Lahtinen
2016-10-18 13:25     ` Goel, Akash
2016-10-18 13:39       ` Chris Wilson
2016-10-19 10:40         ` Joonas Lahtinen
2016-04-15 16:09 ` [Intel-gfx] [PATCH v4 1/2] shmem: Support for registration of driver/file owner specific ops Chris Wilson
2016-04-20 12:38   ` Daniel Vetter
2016-04-24 23:42 ` Kirill A. Shutemov
2016-04-26 12:53   ` Daniel Vetter
2016-04-26 23:33     ` Kirill A. Shutemov
2016-04-27  7:16       ` Hugh Dickins
2016-04-27  7:38         ` Daniel Vetter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).