* [PATCH] drm/radeon: add user pointer support v2
@ 2014-06-30 13:04 Christian König
2014-06-30 17:35 ` Jerome Glisse
0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2014-06-30 13:04 UTC (permalink / raw)
To: dri-devel
From: Christian König <christian.koenig@amd.com>
This patch adds an IOCTL for turning a pointer supplied by
userspace into a buffer object.
It works similar to the recently added userptr support for i915,
so it also imposes several restrictions upon the memory being mapped:
1. It must be page aligned (both start/end addresses, i.e ptr and size).
2. It must be normal system memory, not a pointer into another map of IO
space (e.g. it must not be a GTT mmapping of another object).
3. The BO is mapped into GTT, so the maximum amount of memory mapped at
all times is still the GTT limit.
Exporting and sharing as well as mapping of buffer objects created by
this function is forbidden and results in an -EPERM.
In difference to the i915 implementation we don't use an interval tree to
manage all the buffers objects created and instead just register a separate
MMU notifier callback for each BO so that buffer objects are invalidated
whenever any modification is made to the processes page table. This probably
needs further optimization.
Please review and / or comment.
v2: squash all previous changes into first public version
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/Kconfig | 1 +
drivers/gpu/drm/radeon/radeon.h | 8 +++
drivers/gpu/drm/radeon/radeon_cs.c | 16 ++++-
drivers/gpu/drm/radeon/radeon_drv.c | 5 +-
drivers/gpu/drm/radeon/radeon_gem.c | 58 ++++++++++++++++++
drivers/gpu/drm/radeon/radeon_kms.c | 1 +
drivers/gpu/drm/radeon/radeon_object.c | 51 ++++++++++++++++
drivers/gpu/drm/radeon/radeon_object.h | 1 +
drivers/gpu/drm/radeon/radeon_prime.c | 10 +++
drivers/gpu/drm/radeon/radeon_ttm.c | 108 +++++++++++++++++++++++++++++++++
include/uapi/drm/radeon_drm.h | 9 +++
11 files changed, 265 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index f512004..1b3c162 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -114,6 +114,7 @@ config DRM_RADEON
select POWER_SUPPLY
select HWMON
select BACKLIGHT_CLASS_DEVICE
+ select MMU_NOTIFIER
help
Choose this option if you have an ATI Radeon graphics card. There
are both PCI and AGP versions. You don't need to choose this to
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 4b0bbf8..09a7be1 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
#include <ttm/ttm_bo_api.h>
#include <ttm/ttm_bo_driver.h>
@@ -478,6 +479,9 @@ struct radeon_bo {
struct ttm_bo_kmap_obj dma_buf_vmap;
pid_t pid;
+
+ /* MM callback for user pointer support */
+ struct mmu_notifier mn;
};
#define gem_to_radeon_bo(gobj) container_of((gobj), struct radeon_bo, gem_base)
@@ -2115,6 +2119,8 @@ int radeon_gem_info_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp);
int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp);
+int radeon_gem_import_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *filp);
int radeon_gem_pin_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
int radeon_gem_unpin_ioctl(struct drm_device *dev, void *data,
@@ -2806,6 +2812,8 @@ extern void radeon_legacy_set_clock_gating(struct radeon_device *rdev, int enabl
extern void radeon_atom_set_clock_gating(struct radeon_device *rdev, int enable);
extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain);
extern bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo);
+extern int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr);
+extern bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm);
extern void radeon_vram_location(struct radeon_device *rdev, struct radeon_mc *mc, u64 base);
extern void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc);
extern int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon);
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 71a1434..0a195b9 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -78,7 +78,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
struct radeon_cs_chunk *chunk;
struct radeon_cs_buckets buckets;
unsigned i, j;
- bool duplicate;
+ bool duplicate, need_mmap_lock = false;
+ int r;
if (p->chunk_relocs_idx == -1) {
return 0;
@@ -100,6 +101,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
for (i = 0; i < p->nrelocs; i++) {
struct drm_radeon_cs_reloc *r;
+ struct ttm_tt *ttm;
unsigned priority;
duplicate = false;
@@ -126,6 +128,9 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
p->relocs_ptr[i] = &p->relocs[i];
p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
+ ttm = p->relocs[i].robj->tbo.ttm;
+ need_mmap_lock |= radeon_ttm_tt_has_userptr(ttm);
+
/* The userspace buffer priorities are from 0 to 15. A higher
* number means the buffer is more important.
* Also, the buffers used for write have a higher priority than
@@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
if (p->cs_flags & RADEON_CS_USE_VM)
p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
&p->validated);
+ if (need_mmap_lock)
+ down_read(¤t->mm->mmap_sem);
+
+ r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
- return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
+ if (need_mmap_lock)
+ up_read(¤t->mm->mmap_sem);
+
+ return r;
}
static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 6e30174..355aa67 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *obj,
struct drm_file *file_priv);
void radeon_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
+struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
+ struct drm_gem_object *gobj,
+ int flags);
extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc,
unsigned int flags,
int *vpos, int *hpos, ktime_t *stime,
@@ -558,7 +561,7 @@ static struct drm_driver kms_driver = {
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
- .gem_prime_export = drm_gem_prime_export,
+ .gem_prime_export = radeon_gem_prime_export,
.gem_prime_import = drm_gem_prime_import,
.gem_prime_pin = radeon_gem_prime_pin,
.gem_prime_unpin = radeon_gem_prime_unpin,
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d09650c..ac939a8 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -272,6 +272,60 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
return 0;
}
+int radeon_gem_import_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *filp)
+{
+ struct radeon_device *rdev = dev->dev_private;
+ struct drm_radeon_gem_import *args = data;
+ struct drm_gem_object *gobj;
+ struct radeon_bo *bo;
+ uint32_t handle;
+ int r;
+
+ if (offset_in_page(args->addr | args->size))
+ return -EINVAL;
+
+ if (!access_ok(VERIFY_WRITE, (char __user *)args->addr, args->size))
+ return -EFAULT;
+
+ down_read(&rdev->exclusive_lock);
+
+ /* create a gem object to contain this object in */
+ r = radeon_gem_object_create(rdev, args->size, 0,
+ RADEON_GEM_DOMAIN_CPU,
+ false, false, &gobj);
+ if (r)
+ goto handle_lockup;
+
+ bo = gem_to_radeon_bo(gobj);
+ r = radeon_ttm_tt_set_userptr(bo->tbo.ttm, args->addr);
+ if (r)
+ goto release_object;
+
+ r = radeon_bo_register_mm_callback(bo);
+ if (r)
+ goto release_object;
+
+ r = drm_gem_handle_create(filp, gobj, &handle);
+ /* drop reference from allocate - handle holds it now */
+ drm_gem_object_unreference_unlocked(gobj);
+ if (r)
+ goto handle_lockup;
+
+ args->handle = handle;
+ up_read(&rdev->exclusive_lock);
+ return 0;
+
+release_object:
+ drm_gem_object_unreference_unlocked(gobj);
+
+handle_lockup:
+ up_read(&rdev->exclusive_lock);
+ r = radeon_gem_handle_lockup(rdev, r);
+
+ return r;
+}
+
int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp)
{
@@ -315,6 +369,8 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
return -ENOENT;
}
robj = gem_to_radeon_bo(gobj);
+ if (radeon_ttm_tt_has_userptr(robj->tbo.ttm))
+ return -EPERM;
*offset_p = radeon_bo_mmap_offset(robj);
drm_gem_object_unreference_unlocked(gobj);
return 0;
@@ -535,6 +591,8 @@ int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
return -ENOENT;
}
robj = gem_to_radeon_bo(gobj);
+ if (radeon_ttm_tt_has_userptr(robj->tbo.ttm))
+ return -EPERM;
r = radeon_bo_reserve(robj, false);
if (unlikely(r))
goto out;
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 35d9318..39e8a5c 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -874,5 +874,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
+ DRM_IOCTL_DEF_DRV(RADEON_GEM_IMPORT, radeon_gem_import_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
};
int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 6c717b2..4bf5335 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -86,6 +86,9 @@ static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo)
radeon_update_memory_usage(bo, bo->tbo.mem.mem_type, -1);
+ if (bo->mn.ops)
+ mmu_notifier_unregister(&bo->mn, current->mm);
+
mutex_lock(&bo->rdev->gem.mutex);
list_del_init(&bo->list);
mutex_unlock(&bo->rdev->gem.mutex);
@@ -253,6 +256,9 @@ int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset,
{
int r, i;
+ if (radeon_ttm_tt_has_userptr(bo->tbo.ttm))
+ return -EPERM;
+
if (bo->pin_count) {
bo->pin_count++;
if (gpu_addr)
@@ -743,3 +749,48 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
ttm_bo_unreserve(&bo->tbo);
return r;
}
+
+static void radeon_bo_mn_release(struct mmu_notifier *mn,
+ struct mm_struct *mm)
+{
+ struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
+
+ bo->mn.ops = NULL;
+}
+
+static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn,
+ struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+ struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
+ int r;
+
+ /* TODO: optimize! */
+
+ r = radeon_bo_reserve(bo, true);
+ if (r) {
+ dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r);
+ return;
+ }
+
+ radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
+ r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
+ if (r) {
+ dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r);
+ return;
+ }
+
+ radeon_bo_unreserve(bo);
+}
+
+static const struct mmu_notifier_ops radeon_bo_notifier = {
+ .release = radeon_bo_mn_release,
+ .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
+};
+
+int radeon_bo_register_mm_callback(struct radeon_bo *bo)
+{
+ bo->mn.ops = &radeon_bo_notifier;
+ return mmu_notifier_register(&bo->mn, current->mm);
+}
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 5a873f3..b10368a 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -153,6 +153,7 @@ extern void radeon_bo_move_notify(struct ttm_buffer_object *bo,
struct ttm_mem_reg *new_mem);
extern int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
extern int radeon_bo_get_surface_reg(struct radeon_bo *bo);
+extern int radeon_bo_register_mm_callback(struct radeon_bo *bo);
/*
* sub allocation
diff --git a/drivers/gpu/drm/radeon/radeon_prime.c b/drivers/gpu/drm/radeon/radeon_prime.c
index 2007456..1f0d8f7 100644
--- a/drivers/gpu/drm/radeon/radeon_prime.c
+++ b/drivers/gpu/drm/radeon/radeon_prime.c
@@ -103,3 +103,13 @@ void radeon_gem_prime_unpin(struct drm_gem_object *obj)
radeon_bo_unpin(bo);
radeon_bo_unreserve(bo);
}
+
+struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
+ struct drm_gem_object *gobj,
+ int flags)
+{
+ struct radeon_bo *bo = gem_to_radeon_bo(gobj);
+ if (radeon_ttm_tt_has_userptr(bo->tbo.ttm))
+ return ERR_PTR(-EPERM);
+ return drm_gem_prime_export(dev, gobj, flags);
+}
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index c8a8a51..37c5e19 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -515,6 +515,9 @@ struct radeon_ttm_tt {
struct ttm_dma_tt ttm;
struct radeon_device *rdev;
u64 offset;
+
+ uint64_t userptr;
+ struct mm_struct *usermm;
};
static int radeon_ttm_backend_bind(struct ttm_tt *ttm,
@@ -588,6 +591,77 @@ static struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev,
return >t->ttm.ttm;
}
+static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
+{
+ struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
+ struct radeon_ttm_tt *gtt = (void *)ttm;
+ unsigned pinned = 0, nents;
+ int r;
+
+ /* prepare the sg table with the user pages */
+ if (current->mm != gtt->usermm)
+ return -EPERM;
+
+ do {
+ unsigned num_pages = ttm->num_pages - pinned;
+ uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
+ struct page **pages = ttm->pages + pinned;
+
+ r = get_user_pages(current, current->mm, userptr, num_pages,
+ 1, 0, pages, NULL);
+ if (r < 0) {
+ release_pages(ttm->pages, pinned, 0);
+ return r;
+ }
+ pinned += r;
+
+ } while (pinned < ttm->num_pages);
+
+ r = -ENOMEM;
+ ttm->sg = kcalloc(1, sizeof(struct sg_table), GFP_KERNEL);
+ if (!ttm->sg)
+ goto release_pages;
+
+ r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
+ ttm->num_pages << PAGE_SHIFT,
+ GFP_KERNEL);
+ if (r)
+ goto release_sg;
+
+ r = -ENOMEM;
+ nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents,
+ DMA_BIDIRECTIONAL);
+ if (nents != ttm->sg->nents)
+ goto release_sg;
+
+ return 0;
+
+release_sg:
+ kfree(ttm->sg);
+
+release_pages:
+ release_pages(ttm->pages, pinned, 0);
+ return r;
+}
+
+static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
+{
+ struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
+ unsigned i;
+
+ /* free the sg table and pages again */
+ dma_unmap_sg(rdev->dev, ttm->sg->sgl,
+ ttm->sg->nents, DMA_BIDIRECTIONAL);
+
+ sg_free_table(ttm->sg);
+ kfree(ttm->sg);
+
+ for (i = 0; i < ttm->num_pages; ++i)
+ set_page_dirty(ttm->pages[i]);
+
+ release_pages(ttm->pages, ttm->num_pages, 0);
+}
+
static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
{
struct radeon_device *rdev;
@@ -599,6 +673,13 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
if (ttm->state != tt_unpopulated)
return 0;
+ if (gtt->userptr) {
+ r = radeon_ttm_tt_pin_userptr(ttm);
+ if (r)
+ return r;
+ slave = true;
+ }
+
if (slave && ttm->sg) {
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
gtt->ttm.dma_address, ttm->num_pages);
@@ -648,6 +729,11 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
unsigned i;
bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
+ if (gtt->userptr) {
+ radeon_ttm_tt_unpin_userptr(ttm);
+ return;
+ }
+
if (slave)
return;
@@ -676,6 +762,28 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
}
+int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr)
+{
+ struct radeon_ttm_tt *gtt = (void *)ttm;
+
+ if (gtt == NULL)
+ return -EINVAL;
+
+ gtt->userptr = userptr;
+ gtt->usermm = current->mm;
+ return 0;
+}
+
+bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm)
+{
+ struct radeon_ttm_tt *gtt = (void *)ttm;
+
+ if (gtt == NULL)
+ return false;
+
+ return !!gtt->userptr;
+}
+
static struct ttm_bo_driver radeon_bo_driver = {
.ttm_tt_create = &radeon_ttm_tt_create,
.ttm_tt_populate = &radeon_ttm_tt_populate,
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 1cc0b61..54e7421 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -511,6 +511,7 @@ typedef struct {
#define DRM_RADEON_GEM_BUSY 0x2a
#define DRM_RADEON_GEM_VA 0x2b
#define DRM_RADEON_GEM_OP 0x2c
+#define DRM_RADEON_GEM_IMPORT 0x2d
#define DRM_IOCTL_RADEON_CP_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
#define DRM_IOCTL_RADEON_CP_START DRM_IO( DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -554,6 +555,7 @@ typedef struct {
#define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
#define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
#define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
+#define DRM_IOCTL_RADEON_GEM_IMPORT DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_IMPORT, struct drm_radeon_gem_import)
typedef struct drm_radeon_init {
enum {
@@ -806,6 +808,13 @@ struct drm_radeon_gem_create {
uint32_t flags;
};
+struct drm_radeon_gem_import {
+ uint64_t addr;
+ uint64_t size;
+ uint32_t flags;
+ uint32_t handle;
+};
+
#define RADEON_TILING_MACRO 0x1
#define RADEON_TILING_MICRO 0x2
#define RADEON_TILING_SWAP_16BIT 0x4
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-06-30 13:04 [PATCH] drm/radeon: add user pointer support v2 Christian König
@ 2014-06-30 17:35 ` Jerome Glisse
2014-07-01 9:49 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2014-06-30 17:35 UTC (permalink / raw)
To: Christian König; +Cc: dri-devel
On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
>
> This patch adds an IOCTL for turning a pointer supplied by
> userspace into a buffer object.
>
> It works similar to the recently added userptr support for i915,
> so it also imposes several restrictions upon the memory being mapped:
>
> 1. It must be page aligned (both start/end addresses, i.e ptr and size).
>
> 2. It must be normal system memory, not a pointer into another map of IO
> space (e.g. it must not be a GTT mmapping of another object).
>
> 3. The BO is mapped into GTT, so the maximum amount of memory mapped at
> all times is still the GTT limit.
>
> Exporting and sharing as well as mapping of buffer objects created by
> this function is forbidden and results in an -EPERM.
>
> In difference to the i915 implementation we don't use an interval tree to
> manage all the buffers objects created and instead just register a separate
> MMU notifier callback for each BO so that buffer objects are invalidated
> whenever any modification is made to the processes page table. This probably
> needs further optimization.
>
> Please review and / or comment.
NAK, this is kind of an horror show. I should probably take a look at
Intel code too.
First registering one notifier per bo is a bad idea, it is not what you
want to do.
>
> v2: squash all previous changes into first public version
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/Kconfig | 1 +
> drivers/gpu/drm/radeon/radeon.h | 8 +++
> drivers/gpu/drm/radeon/radeon_cs.c | 16 ++++-
> drivers/gpu/drm/radeon/radeon_drv.c | 5 +-
> drivers/gpu/drm/radeon/radeon_gem.c | 58 ++++++++++++++++++
> drivers/gpu/drm/radeon/radeon_kms.c | 1 +
> drivers/gpu/drm/radeon/radeon_object.c | 51 ++++++++++++++++
> drivers/gpu/drm/radeon/radeon_object.h | 1 +
> drivers/gpu/drm/radeon/radeon_prime.c | 10 +++
> drivers/gpu/drm/radeon/radeon_ttm.c | 108 +++++++++++++++++++++++++++++++++
> include/uapi/drm/radeon_drm.h | 9 +++
> 11 files changed, 265 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index f512004..1b3c162 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -114,6 +114,7 @@ config DRM_RADEON
> select POWER_SUPPLY
> select HWMON
> select BACKLIGHT_CLASS_DEVICE
> + select MMU_NOTIFIER
> help
> Choose this option if you have an ATI Radeon graphics card. There
> are both PCI and AGP versions. You don't need to choose this to
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 4b0bbf8..09a7be1 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
> #include <linux/wait.h>
> #include <linux/list.h>
> #include <linux/kref.h>
> +#include <linux/mmu_notifier.h>
>
> #include <ttm/ttm_bo_api.h>
> #include <ttm/ttm_bo_driver.h>
> @@ -478,6 +479,9 @@ struct radeon_bo {
>
> struct ttm_bo_kmap_obj dma_buf_vmap;
> pid_t pid;
> +
> + /* MM callback for user pointer support */
> + struct mmu_notifier mn;
> };
> #define gem_to_radeon_bo(gobj) container_of((gobj), struct radeon_bo, gem_base)
>
> @@ -2115,6 +2119,8 @@ int radeon_gem_info_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp);
> int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp);
> +int radeon_gem_import_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *filp);
> int radeon_gem_pin_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv);
> int radeon_gem_unpin_ioctl(struct drm_device *dev, void *data,
> @@ -2806,6 +2812,8 @@ extern void radeon_legacy_set_clock_gating(struct radeon_device *rdev, int enabl
> extern void radeon_atom_set_clock_gating(struct radeon_device *rdev, int enable);
> extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain);
> extern bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo);
> +extern int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr);
> +extern bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm);
> extern void radeon_vram_location(struct radeon_device *rdev, struct radeon_mc *mc, u64 base);
> extern void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc);
> extern int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon);
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index 71a1434..0a195b9 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -78,7 +78,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> struct radeon_cs_chunk *chunk;
> struct radeon_cs_buckets buckets;
> unsigned i, j;
> - bool duplicate;
> + bool duplicate, need_mmap_lock = false;
> + int r;
>
> if (p->chunk_relocs_idx == -1) {
> return 0;
> @@ -100,6 +101,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>
> for (i = 0; i < p->nrelocs; i++) {
> struct drm_radeon_cs_reloc *r;
> + struct ttm_tt *ttm;
> unsigned priority;
>
> duplicate = false;
> @@ -126,6 +128,9 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> p->relocs_ptr[i] = &p->relocs[i];
> p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
>
> + ttm = p->relocs[i].robj->tbo.ttm;
> + need_mmap_lock |= radeon_ttm_tt_has_userptr(ttm);
> +
> /* The userspace buffer priorities are from 0 to 15. A higher
> * number means the buffer is more important.
> * Also, the buffers used for write have a higher priority than
> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> if (p->cs_flags & RADEON_CS_USE_VM)
> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
> &p->validated);
> + if (need_mmap_lock)
> + down_read(¤t->mm->mmap_sem);
> +
> + r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>
> - return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
> + if (need_mmap_lock)
> + up_read(¤t->mm->mmap_sem);
> +
> + return r;
> }
So the mmap lock protect almost nothing here. Once things are validated
nothing forbid the userspace to unmap the range containing the user bo.
>
> static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index 6e30174..355aa67 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *obj,
> struct drm_file *file_priv);
> void radeon_gem_object_close(struct drm_gem_object *obj,
> struct drm_file *file_priv);
> +struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
> + struct drm_gem_object *gobj,
> + int flags);
Use tab not space there is already enough broken tab/space in radeon
kernel as it is. I always tried to keep it consistant.
> extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc,
> unsigned int flags,
> int *vpos, int *hpos, ktime_t *stime,
> @@ -558,7 +561,7 @@ static struct drm_driver kms_driver = {
>
> .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> - .gem_prime_export = drm_gem_prime_export,
> + .gem_prime_export = radeon_gem_prime_export,
> .gem_prime_import = drm_gem_prime_import,
> .gem_prime_pin = radeon_gem_prime_pin,
> .gem_prime_unpin = radeon_gem_prime_unpin,
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index d09650c..ac939a8 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -272,6 +272,60 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
> return 0;
> }
>
> +int radeon_gem_import_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *filp)
> +{
> + struct radeon_device *rdev = dev->dev_private;
> + struct drm_radeon_gem_import *args = data;
> + struct drm_gem_object *gobj;
> + struct radeon_bo *bo;
> + uint32_t handle;
> + int r;
> +
> + if (offset_in_page(args->addr | args->size))
> + return -EINVAL;
> +
> + if (!access_ok(VERIFY_WRITE, (char __user *)args->addr, args->size))
> + return -EFAULT;
> +
> + down_read(&rdev->exclusive_lock);
> +
> + /* create a gem object to contain this object in */
> + r = radeon_gem_object_create(rdev, args->size, 0,
> + RADEON_GEM_DOMAIN_CPU,
> + false, false, &gobj);
> + if (r)
> + goto handle_lockup;
> +
> + bo = gem_to_radeon_bo(gobj);
> + r = radeon_ttm_tt_set_userptr(bo->tbo.ttm, args->addr);
> + if (r)
> + goto release_object;
> +
> + r = radeon_bo_register_mm_callback(bo);
> + if (r)
> + goto release_object;
> +
> + r = drm_gem_handle_create(filp, gobj, &handle);
> + /* drop reference from allocate - handle holds it now */
> + drm_gem_object_unreference_unlocked(gobj);
> + if (r)
> + goto handle_lockup;
> +
> + args->handle = handle;
> + up_read(&rdev->exclusive_lock);
> + return 0;
> +
> +release_object:
> + drm_gem_object_unreference_unlocked(gobj);
> +
> +handle_lockup:
> + up_read(&rdev->exclusive_lock);
> + r = radeon_gem_handle_lockup(rdev, r);
> +
> + return r;
> +}
> +
> int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp)
> {
> @@ -315,6 +369,8 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
> return -ENOENT;
> }
> robj = gem_to_radeon_bo(gobj);
> + if (radeon_ttm_tt_has_userptr(robj->tbo.ttm))
> + return -EPERM;
> *offset_p = radeon_bo_mmap_offset(robj);
> drm_gem_object_unreference_unlocked(gobj);
> return 0;
> @@ -535,6 +591,8 @@ int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
> return -ENOENT;
> }
> robj = gem_to_radeon_bo(gobj);
> + if (radeon_ttm_tt_has_userptr(robj->tbo.ttm))
> + return -EPERM;
> r = radeon_bo_reserve(robj, false);
> if (unlikely(r))
> goto out;
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 35d9318..39e8a5c 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -874,5 +874,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
> DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> + DRM_IOCTL_DEF_DRV(RADEON_GEM_IMPORT, radeon_gem_import_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> };
> int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
> index 6c717b2..4bf5335 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -86,6 +86,9 @@ static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo)
>
> radeon_update_memory_usage(bo, bo->tbo.mem.mem_type, -1);
>
> + if (bo->mn.ops)
> + mmu_notifier_unregister(&bo->mn, current->mm);
> +
> mutex_lock(&bo->rdev->gem.mutex);
> list_del_init(&bo->list);
> mutex_unlock(&bo->rdev->gem.mutex);
> @@ -253,6 +256,9 @@ int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset,
> {
> int r, i;
>
> + if (radeon_ttm_tt_has_userptr(bo->tbo.ttm))
> + return -EPERM;
> +
> if (bo->pin_count) {
> bo->pin_count++;
> if (gpu_addr)
> @@ -743,3 +749,48 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
> ttm_bo_unreserve(&bo->tbo);
> return r;
> }
> +
> +static void radeon_bo_mn_release(struct mmu_notifier *mn,
> + struct mm_struct *mm)
> +{
> + struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
> +
> + bo->mn.ops = NULL;
> +}
> +
> +static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn,
> + struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
> + int r;
> +
> + /* TODO: optimize! */
> +
> + r = radeon_bo_reserve(bo, true);
> + if (r) {
> + dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r);
> + return;
> + }
> +
> + radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
> + r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
> + if (r) {
> + dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r);
> + return;
> + }
> +
> + radeon_bo_unreserve(bo);
> +}
So you most likely want ttm_bo_validate to be interruptible but you also
want to call something like io_schedule so you are not just wasting time
slice of the process for the own selfishness of the gpu driver.
Also the invalidate_range is callback that is a common call during process
lifetime.
Moreover nothing prevent a cs that is call after the range_start callback
to pin again the buffer before the range_end callback is made. Thus this
is extremly racy.
> +
> +static const struct mmu_notifier_ops radeon_bo_notifier = {
> + .release = radeon_bo_mn_release,
> + .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
> +};
What about invalidate_page callback ? You are breaking write back to
disk. Which is not a welcome move.
> +
> +int radeon_bo_register_mm_callback(struct radeon_bo *bo)
> +{
> + bo->mn.ops = &radeon_bo_notifier;
> + return mmu_notifier_register(&bo->mn, current->mm);
> +}
> diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
> index 5a873f3..b10368a 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.h
> +++ b/drivers/gpu/drm/radeon/radeon_object.h
> @@ -153,6 +153,7 @@ extern void radeon_bo_move_notify(struct ttm_buffer_object *bo,
> struct ttm_mem_reg *new_mem);
> extern int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
> extern int radeon_bo_get_surface_reg(struct radeon_bo *bo);
> +extern int radeon_bo_register_mm_callback(struct radeon_bo *bo);
>
> /*
> * sub allocation
> diff --git a/drivers/gpu/drm/radeon/radeon_prime.c b/drivers/gpu/drm/radeon/radeon_prime.c
> index 2007456..1f0d8f7 100644
> --- a/drivers/gpu/drm/radeon/radeon_prime.c
> +++ b/drivers/gpu/drm/radeon/radeon_prime.c
> @@ -103,3 +103,13 @@ void radeon_gem_prime_unpin(struct drm_gem_object *obj)
> radeon_bo_unpin(bo);
> radeon_bo_unreserve(bo);
> }
> +
> +struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
> + struct drm_gem_object *gobj,
> + int flags)
> +{
> + struct radeon_bo *bo = gem_to_radeon_bo(gobj);
> + if (radeon_ttm_tt_has_userptr(bo->tbo.ttm))
> + return ERR_PTR(-EPERM);
> + return drm_gem_prime_export(dev, gobj, flags);
> +}
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index c8a8a51..37c5e19 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -515,6 +515,9 @@ struct radeon_ttm_tt {
> struct ttm_dma_tt ttm;
> struct radeon_device *rdev;
> u64 offset;
> +
> + uint64_t userptr;
> + struct mm_struct *usermm;
> };
>
> static int radeon_ttm_backend_bind(struct ttm_tt *ttm,
> @@ -588,6 +591,77 @@ static struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev,
> return >t->ttm.ttm;
> }
>
> +static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> +{
> + struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
> + struct radeon_ttm_tt *gtt = (void *)ttm;
> + unsigned pinned = 0, nents;
> + int r;
> +
> + /* prepare the sg table with the user pages */
> + if (current->mm != gtt->usermm)
> + return -EPERM;
> +
> + do {
> + unsigned num_pages = ttm->num_pages - pinned;
> + uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
> + struct page **pages = ttm->pages + pinned;
> +
> + r = get_user_pages(current, current->mm, userptr, num_pages,
> + 1, 0, pages, NULL);
> + if (r < 0) {
> + release_pages(ttm->pages, pinned, 0);
> + return r;
> + }
> + pinned += r;
> +
> + } while (pinned < ttm->num_pages);
> +
> + r = -ENOMEM;
> + ttm->sg = kcalloc(1, sizeof(struct sg_table), GFP_KERNEL);
> + if (!ttm->sg)
> + goto release_pages;
> +
> + r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
> + ttm->num_pages << PAGE_SHIFT,
> + GFP_KERNEL);
> + if (r)
> + goto release_sg;
> +
> + r = -ENOMEM;
> + nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents,
> + DMA_BIDIRECTIONAL);
> + if (nents != ttm->sg->nents)
> + goto release_sg;
> +
> + return 0;
> +
> +release_sg:
> + kfree(ttm->sg);
> +
> +release_pages:
> + release_pages(ttm->pages, pinned, 0);
> + return r;
> +}
So as said above page back the user mapping might change. As the gup
comment explain there is no garanty that page returned by gup are the
one in use to back the address. Well for thing that will be use as
read source by the gpu this is not much of an issue as you can probably
assume that by the time you pin the bo the user mapping is done writting
to it.
But the case of GPU writing to it and then userspace expecting to read
back through its user space mapping is just a game of luck. You will be
lucky lot of time because the kernel does not migrate page just for the
kick of it. But in case it does you will have inconsistency and user of
that API will not understand why the hell they are having an issue.
This kind of API is realy a no go the way the mm code works does not allow
for such thing. I will go look at the intel code but i fear the worst.
Cheers,
Jérôme
> +
> +static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
> +{
> + struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
> + unsigned i;
> +
> + /* free the sg table and pages again */
> + dma_unmap_sg(rdev->dev, ttm->sg->sgl,
> + ttm->sg->nents, DMA_BIDIRECTIONAL);
> +
> + sg_free_table(ttm->sg);
> + kfree(ttm->sg);
> +
> + for (i = 0; i < ttm->num_pages; ++i)
> + set_page_dirty(ttm->pages[i]);
> +
> + release_pages(ttm->pages, ttm->num_pages, 0);
> +}
> +
> static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
> {
> struct radeon_device *rdev;
> @@ -599,6 +673,13 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
> if (ttm->state != tt_unpopulated)
> return 0;
>
> + if (gtt->userptr) {
> + r = radeon_ttm_tt_pin_userptr(ttm);
> + if (r)
> + return r;
> + slave = true;
> + }
> +
> if (slave && ttm->sg) {
> drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
> gtt->ttm.dma_address, ttm->num_pages);
> @@ -648,6 +729,11 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
> unsigned i;
> bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
>
> + if (gtt->userptr) {
> + radeon_ttm_tt_unpin_userptr(ttm);
> + return;
> + }
> +
> if (slave)
> return;
>
> @@ -676,6 +762,28 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
> ttm_pool_unpopulate(ttm);
> }
>
> +int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr)
> +{
> + struct radeon_ttm_tt *gtt = (void *)ttm;
> +
> + if (gtt == NULL)
> + return -EINVAL;
> +
> + gtt->userptr = userptr;
> + gtt->usermm = current->mm;
> + return 0;
> +}
> +
> +bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm)
> +{
> + struct radeon_ttm_tt *gtt = (void *)ttm;
> +
> + if (gtt == NULL)
> + return false;
> +
> + return !!gtt->userptr;
> +}
> +
> static struct ttm_bo_driver radeon_bo_driver = {
> .ttm_tt_create = &radeon_ttm_tt_create,
> .ttm_tt_populate = &radeon_ttm_tt_populate,
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 1cc0b61..54e7421 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -511,6 +511,7 @@ typedef struct {
> #define DRM_RADEON_GEM_BUSY 0x2a
> #define DRM_RADEON_GEM_VA 0x2b
> #define DRM_RADEON_GEM_OP 0x2c
> +#define DRM_RADEON_GEM_IMPORT 0x2d
>
> #define DRM_IOCTL_RADEON_CP_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
> #define DRM_IOCTL_RADEON_CP_START DRM_IO( DRM_COMMAND_BASE + DRM_RADEON_CP_START)
> @@ -554,6 +555,7 @@ typedef struct {
> #define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> #define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> #define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> +#define DRM_IOCTL_RADEON_GEM_IMPORT DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_IMPORT, struct drm_radeon_gem_import)
>
> typedef struct drm_radeon_init {
> enum {
> @@ -806,6 +808,13 @@ struct drm_radeon_gem_create {
> uint32_t flags;
> };
>
> +struct drm_radeon_gem_import {
> + uint64_t addr;
> + uint64_t size;
> + uint32_t flags;
> + uint32_t handle;
> +};
> +
> #define RADEON_TILING_MACRO 0x1
> #define RADEON_TILING_MICRO 0x2
> #define RADEON_TILING_SWAP_16BIT 0x4
> --
> 1.9.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-06-30 17:35 ` Jerome Glisse
@ 2014-07-01 9:49 ` Christian König
2014-07-01 15:35 ` Jerome Glisse
0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2014-07-01 9:49 UTC (permalink / raw)
To: Jerome Glisse; +Cc: dri-devel
Am 30.06.2014 19:35, schrieb Jerome Glisse:
> On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
>> From: Christian König <christian.koenig@amd.com>
>>
>> This patch adds an IOCTL for turning a pointer supplied by
>> userspace into a buffer object.
>>
>> It works similar to the recently added userptr support for i915,
>> so it also imposes several restrictions upon the memory being mapped:
>>
>> 1. It must be page aligned (both start/end addresses, i.e ptr and size).
>>
>> 2. It must be normal system memory, not a pointer into another map of IO
>> space (e.g. it must not be a GTT mmapping of another object).
>>
>> 3. The BO is mapped into GTT, so the maximum amount of memory mapped at
>> all times is still the GTT limit.
>>
>> Exporting and sharing as well as mapping of buffer objects created by
>> this function is forbidden and results in an -EPERM.
>>
>> In difference to the i915 implementation we don't use an interval tree to
>> manage all the buffers objects created and instead just register a separate
>> MMU notifier callback for each BO so that buffer objects are invalidated
>> whenever any modification is made to the processes page table. This probably
>> needs further optimization.
>>
>> Please review and / or comment.
> NAK, this is kind of an horror show. I should probably take a look at
> Intel code too.
Yeah, I'm not to keen about the approach either and would rather prefer
using the IOMMU, but unfortunately that isn't available under all use
cases we need to be able to handle.
BTW: Here is the link to the initial commit of the Intel implementation
as it is in Linus tree:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06
> First registering one notifier per bo is a bad idea, it is not what you
> want to do.
I know, the Intel guys use a single registration for each process and an
interval tree to lockup all the BOs affected.
I just wanted to keep the implementation simple and stupid for now and
first get feedback about the approach in general (to save time if the
whole approach won't be accepted at all and your reaction indeed
confirms that this was a good idea).
>> [SNIP]
>> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>> if (p->cs_flags & RADEON_CS_USE_VM)
>> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
>> &p->validated);
>> + if (need_mmap_lock)
>> + down_read(¤t->mm->mmap_sem);
>> +
>> + r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>>
>> - return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>> + if (need_mmap_lock)
>> + up_read(¤t->mm->mmap_sem);
>> +
>> + return r;
>> }
> So the mmap lock protect almost nothing here. Once things are validated
> nothing forbid the userspace to unmap the range containing the user bo.
As far as I understand it (and I'm probably not so deep into the MM code
as you, so correct me if I'm wrong) unmapping the range would result in
an invalidate_range_start callback and this callback in turn validates
the BO into the CPU domain again. So it actually blocks for the GPU
operation to be completed, marks the pages in question as dirty and then
unpins them.
This should protected us anything nasty to happen in the case of a
unmap(), fork() etc...
>>
>> static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
>> index 6e30174..355aa67 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *obj,
>> struct drm_file *file_priv);
>> void radeon_gem_object_close(struct drm_gem_object *obj,
>> struct drm_file *file_priv);
>> +struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
>> + struct drm_gem_object *gobj,
>> + int flags);
> Use tab not space there is already enough broken tab/space in radeon
> kernel as it is. I always tried to keep it consistant.
Fixed, thanks for the hint.
>> [SNIP]
>> +static void radeon_bo_mn_release(struct mmu_notifier *mn,
>> + struct mm_struct *mm)
>> +{
>> + struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
>> +
>> + bo->mn.ops = NULL;
>> +}
>> +
>> +static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn,
>> + struct mm_struct *mm,
>> + unsigned long start,
>> + unsigned long end)
>> +{
>> + struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
>> + int r;
>> +
>> + /* TODO: optimize! */
>> +
>> + r = radeon_bo_reserve(bo, true);
>> + if (r) {
>> + dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r);
>> + return;
>> + }
>> +
>> + radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
>> + r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
>> + if (r) {
>> + dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r);
>> + return;
>> + }
>> +
>> + radeon_bo_unreserve(bo);
>> +}
> So you most likely want ttm_bo_validate to be interruptible but you also
> want to call something like io_schedule so you are not just wasting time
> slice of the process for the own selfishness of the gpu driver.
>
> Also the invalidate_range is callback that is a common call during process
> lifetime.
I know that this is horrible inefficient and error prone, but we can't
make ttm_bo_validate interruptible because invalidate_range_start
doesn't allows us to return an error number and so we can't rely on
userspace repeating the syscall causing the mapping change in case of a
signal.
The Intel code does it the same way and it actually send a cold shiver
down my back that this got upstream, but hey I'm not the one who
reviewed it and actually don't have a better idea either.
> Moreover nothing prevent a cs that is call after the range_start callback
> to pin again the buffer before the range_end callback is made. Thus this
> is extremly racy.
Oh really? Crap, I thought for all the cases that we are interested in
the mmap_sem would be locked between range_start and range_end.
>> +
>> +static const struct mmu_notifier_ops radeon_bo_notifier = {
>> + .release = radeon_bo_mn_release,
>> + .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
>> +};
> What about invalidate_page callback ? You are breaking write back to
> disk. Which is not a welcome move.
Why? Keep in mind that the pages are pinned down as soon as we make the
first CS which uses them and marked as dirty when we are done with them.
In between those two events the page shouldn't be written back to disk.
>> [SNIP]
>> +static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
>> +{
>> + struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
>> + struct radeon_ttm_tt *gtt = (void *)ttm;
>> + unsigned pinned = 0, nents;
>> + int r;
>> +
>> + /* prepare the sg table with the user pages */
>> + if (current->mm != gtt->usermm)
>> + return -EPERM;
>> +
>> + do {
>> + unsigned num_pages = ttm->num_pages - pinned;
>> + uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
>> + struct page **pages = ttm->pages + pinned;
>> +
>> + r = get_user_pages(current, current->mm, userptr, num_pages,
>> + 1, 0, pages, NULL);
>> + if (r < 0) {
>> + release_pages(ttm->pages, pinned, 0);
>> + return r;
>> + }
>> + pinned += r;
>> +
>> + } while (pinned < ttm->num_pages);
>> +
>> + r = -ENOMEM;
>> + ttm->sg = kcalloc(1, sizeof(struct sg_table), GFP_KERNEL);
>> + if (!ttm->sg)
>> + goto release_pages;
>> +
>> + r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
>> + ttm->num_pages << PAGE_SHIFT,
>> + GFP_KERNEL);
>> + if (r)
>> + goto release_sg;
>> +
>> + r = -ENOMEM;
>> + nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents,
>> + DMA_BIDIRECTIONAL);
>> + if (nents != ttm->sg->nents)
>> + goto release_sg;
>> +
>> + return 0;
>> +
>> +release_sg:
>> + kfree(ttm->sg);
>> +
>> +release_pages:
>> + release_pages(ttm->pages, pinned, 0);
>> + return r;
>> +}
> So as said above page back the user mapping might change. As the gup
> comment explain there is no garanty that page returned by gup are the
> one in use to back the address. Well for thing that will be use as
> read source by the gpu this is not much of an issue as you can probably
> assume that by the time you pin the bo the user mapping is done writting
> to it.
>
> But the case of GPU writing to it and then userspace expecting to read
> back through its user space mapping is just a game of luck. You will be
> lucky lot of time because the kernel does not migrate page just for the
> kick of it. But in case it does you will have inconsistency and user of
> that API will not understand why the hell they are having an issue.
Yeah, completely agree. I was aware of those limitations even before
starting to work on it, but hoped that registering an
invalidate_range_start callback like the Intel codes does would solve
the issue (well this code indeed got upstream).
Unfortunately I'm not the one who decides if we do it or not, I'm just
the one who volunteered to figure out how it is possible to do it in a
clean way.
> This kind of API is realy a no go the way the mm code works does not allow
> for such thing. I will go look at the intel code but i fear the worst.
Please keep me updated on this, if we can't get a solution about this
upstream I really need to be able to explain why.
Thanks for the comments,
Christian.
>
> Cheers,
> Jérôme
>
>> +
>> +static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
>> +{
>> + struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
>> + unsigned i;
>> +
>> + /* free the sg table and pages again */
>> + dma_unmap_sg(rdev->dev, ttm->sg->sgl,
>> + ttm->sg->nents, DMA_BIDIRECTIONAL);
>> +
>> + sg_free_table(ttm->sg);
>> + kfree(ttm->sg);
>> +
>> + for (i = 0; i < ttm->num_pages; ++i)
>> + set_page_dirty(ttm->pages[i]);
>> +
>> + release_pages(ttm->pages, ttm->num_pages, 0);
>> +}
>> +
>> static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
>> {
>> struct radeon_device *rdev;
>> @@ -599,6 +673,13 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
>> if (ttm->state != tt_unpopulated)
>> return 0;
>>
>> + if (gtt->userptr) {
>> + r = radeon_ttm_tt_pin_userptr(ttm);
>> + if (r)
>> + return r;
>> + slave = true;
>> + }
>> +
>> if (slave && ttm->sg) {
>> drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
>> gtt->ttm.dma_address, ttm->num_pages);
>> @@ -648,6 +729,11 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
>> unsigned i;
>> bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
>>
>> + if (gtt->userptr) {
>> + radeon_ttm_tt_unpin_userptr(ttm);
>> + return;
>> + }
>> +
>> if (slave)
>> return;
>>
>> @@ -676,6 +762,28 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
>> ttm_pool_unpopulate(ttm);
>> }
>>
>> +int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr)
>> +{
>> + struct radeon_ttm_tt *gtt = (void *)ttm;
>> +
>> + if (gtt == NULL)
>> + return -EINVAL;
>> +
>> + gtt->userptr = userptr;
>> + gtt->usermm = current->mm;
>> + return 0;
>> +}
>> +
>> +bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm)
>> +{
>> + struct radeon_ttm_tt *gtt = (void *)ttm;
>> +
>> + if (gtt == NULL)
>> + return false;
>> +
>> + return !!gtt->userptr;
>> +}
>> +
>> static struct ttm_bo_driver radeon_bo_driver = {
>> .ttm_tt_create = &radeon_ttm_tt_create,
>> .ttm_tt_populate = &radeon_ttm_tt_populate,
>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
>> index 1cc0b61..54e7421 100644
>> --- a/include/uapi/drm/radeon_drm.h
>> +++ b/include/uapi/drm/radeon_drm.h
>> @@ -511,6 +511,7 @@ typedef struct {
>> #define DRM_RADEON_GEM_BUSY 0x2a
>> #define DRM_RADEON_GEM_VA 0x2b
>> #define DRM_RADEON_GEM_OP 0x2c
>> +#define DRM_RADEON_GEM_IMPORT 0x2d
>>
>> #define DRM_IOCTL_RADEON_CP_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
>> #define DRM_IOCTL_RADEON_CP_START DRM_IO( DRM_COMMAND_BASE + DRM_RADEON_CP_START)
>> @@ -554,6 +555,7 @@ typedef struct {
>> #define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
>> #define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
>> #define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
>> +#define DRM_IOCTL_RADEON_GEM_IMPORT DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_IMPORT, struct drm_radeon_gem_import)
>>
>> typedef struct drm_radeon_init {
>> enum {
>> @@ -806,6 +808,13 @@ struct drm_radeon_gem_create {
>> uint32_t flags;
>> };
>>
>> +struct drm_radeon_gem_import {
>> + uint64_t addr;
>> + uint64_t size;
>> + uint32_t flags;
>> + uint32_t handle;
>> +};
>> +
>> #define RADEON_TILING_MACRO 0x1
>> #define RADEON_TILING_MICRO 0x2
>> #define RADEON_TILING_SWAP_16BIT 0x4
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-07-01 9:49 ` Christian König
@ 2014-07-01 15:35 ` Jerome Glisse
2014-07-01 18:14 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2014-07-01 15:35 UTC (permalink / raw)
To: Christian König; +Cc: dri-devel
On Tue, Jul 01, 2014 at 11:49:18AM +0200, Christian König wrote:
> Am 30.06.2014 19:35, schrieb Jerome Glisse:
> >On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
> >>From: Christian König <christian.koenig@amd.com>
> >>
> >>This patch adds an IOCTL for turning a pointer supplied by
> >>userspace into a buffer object.
> >>
> >>It works similar to the recently added userptr support for i915,
> >>so it also imposes several restrictions upon the memory being mapped:
> >>
> >>1. It must be page aligned (both start/end addresses, i.e ptr and size).
> >>
> >>2. It must be normal system memory, not a pointer into another map of IO
> >>space (e.g. it must not be a GTT mmapping of another object).
> >>
> >>3. The BO is mapped into GTT, so the maximum amount of memory mapped at
> >>all times is still the GTT limit.
> >>
> >>Exporting and sharing as well as mapping of buffer objects created by
> >>this function is forbidden and results in an -EPERM.
> >>
> >>In difference to the i915 implementation we don't use an interval tree to
> >>manage all the buffers objects created and instead just register a separate
> >>MMU notifier callback for each BO so that buffer objects are invalidated
> >>whenever any modification is made to the processes page table. This probably
> >>needs further optimization.
> >>
> >>Please review and / or comment.
> >NAK, this is kind of an horror show. I should probably take a look at
> >Intel code too.
>
> Yeah, I'm not to keen about the approach either and would rather
> prefer using the IOMMU, but unfortunately that isn't available under
> all use cases we need to be able to handle.
>
> BTW: Here is the link to the initial commit of the Intel
> implementation as it is in Linus tree: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06
>
> >First registering one notifier per bo is a bad idea, it is not what you
> >want to do.
>
> I know, the Intel guys use a single registration for each process
> and an interval tree to lockup all the BOs affected.
>
> I just wanted to keep the implementation simple and stupid for now
> and first get feedback about the approach in general (to save time
> if the whole approach won't be accepted at all and your reaction
> indeed confirms that this was a good idea).
>
> >>[SNIP]
> >>@@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> >> if (p->cs_flags & RADEON_CS_USE_VM)
> >> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
> >> &p->validated);
> >>+ if (need_mmap_lock)
> >>+ down_read(¤t->mm->mmap_sem);
> >>+
> >>+ r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
> >>- return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
> >>+ if (need_mmap_lock)
> >>+ up_read(¤t->mm->mmap_sem);
> >>+
> >>+ return r;
> >> }
> >So the mmap lock protect almost nothing here. Once things are validated
> >nothing forbid the userspace to unmap the range containing the user bo.
>
> As far as I understand it (and I'm probably not so deep into the MM
> code as you, so correct me if I'm wrong) unmapping the range would
> result in an invalidate_range_start callback and this callback in
> turn validates the BO into the CPU domain again. So it actually
> blocks for the GPU operation to be completed, marks the pages in
> question as dirty and then unpins them.
>
> This should protected us anything nasty to happen in the case of a
> unmap(), fork() etc...
Thing is nothing forbid from a new cs ioctl to happen right after
the radeon range_start callback but before the core mm code that
was about to do something. The mmap_sem will protect you from fork
or munmap but not from otherthing.
>
> >> static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
> >>diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> >>index 6e30174..355aa67 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_drv.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_drv.c
> >>@@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *obj,
> >> struct drm_file *file_priv);
> >> void radeon_gem_object_close(struct drm_gem_object *obj,
> >> struct drm_file *file_priv);
> >>+struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
> >>+ struct drm_gem_object *gobj,
> >>+ int flags);
> >Use tab not space there is already enough broken tab/space in radeon
> >kernel as it is. I always tried to keep it consistant.
>
> Fixed, thanks for the hint.
>
> >>[SNIP]
> >>+static void radeon_bo_mn_release(struct mmu_notifier *mn,
> >>+ struct mm_struct *mm)
> >>+{
> >>+ struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
> >>+
> >>+ bo->mn.ops = NULL;
> >>+}
> >>+
> >>+static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn,
> >>+ struct mm_struct *mm,
> >>+ unsigned long start,
> >>+ unsigned long end)
> >>+{
> >>+ struct radeon_bo *bo = container_of(mn, struct radeon_bo, mn);
> >>+ int r;
> >>+
> >>+ /* TODO: optimize! */
> >>+
> >>+ r = radeon_bo_reserve(bo, true);
> >>+ if (r) {
> >>+ dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r);
> >>+ return;
> >>+ }
> >>+
> >>+ radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
> >>+ r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
> >>+ if (r) {
> >>+ dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r);
> >>+ return;
> >>+ }
> >>+
> >>+ radeon_bo_unreserve(bo);
> >>+}
> >So you most likely want ttm_bo_validate to be interruptible but you also
> >want to call something like io_schedule so you are not just wasting time
> >slice of the process for the own selfishness of the gpu driver.
> >
> >Also the invalidate_range is callback that is a common call during process
> >lifetime.
>
> I know that this is horrible inefficient and error prone, but we
> can't make ttm_bo_validate interruptible because
> invalidate_range_start doesn't allows us to return an error number
> and so we can't rely on userspace repeating the syscall causing the
> mapping change in case of a signal.
My thinking was to restart inside the notifier callback with an io_schedule
in btw.
> The Intel code does it the same way and it actually send a cold
> shiver down my back that this got upstream, but hey I'm not the one
> who reviewed it and actually don't have a better idea either.
>
> >Moreover nothing prevent a cs that is call after the range_start callback
> >to pin again the buffer before the range_end callback is made. Thus this
> >is extremly racy.
>
> Oh really? Crap, I thought for all the cases that we are interested
> in the mmap_sem would be locked between range_start and range_end.
It can be taken in read mode. Not all call to range_start/end happen
with mmap_sem in write mode.
> >>+
> >>+static const struct mmu_notifier_ops radeon_bo_notifier = {
> >>+ .release = radeon_bo_mn_release,
> >>+ .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
> >>+};
> >What about invalidate_page callback ? You are breaking write back to
> >disk. Which is not a welcome move.
>
> Why? Keep in mind that the pages are pinned down as soon as we make
> the first CS which uses them and marked as dirty when we are done
> with them. In between those two events the page shouldn't be written
> back to disk.
Page could be dirty prior to be pin, or dirtied after being (CPU access).
I am just pointing out that the GPU might be writting to the page while
the page is use as source for disk I/O.
>
> >>[SNIP]
> >>+static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> >>+{
> >>+ struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
> >>+ struct radeon_ttm_tt *gtt = (void *)ttm;
> >>+ unsigned pinned = 0, nents;
> >>+ int r;
> >>+
> >>+ /* prepare the sg table with the user pages */
> >>+ if (current->mm != gtt->usermm)
> >>+ return -EPERM;
> >>+
> >>+ do {
> >>+ unsigned num_pages = ttm->num_pages - pinned;
> >>+ uint64_t userptr = gtt->userptr + pinned * PAGE_SIZE;
> >>+ struct page **pages = ttm->pages + pinned;
> >>+
> >>+ r = get_user_pages(current, current->mm, userptr, num_pages,
> >>+ 1, 0, pages, NULL);
> >>+ if (r < 0) {
> >>+ release_pages(ttm->pages, pinned, 0);
> >>+ return r;
> >>+ }
> >>+ pinned += r;
> >>+
> >>+ } while (pinned < ttm->num_pages);
> >>+
> >>+ r = -ENOMEM;
> >>+ ttm->sg = kcalloc(1, sizeof(struct sg_table), GFP_KERNEL);
> >>+ if (!ttm->sg)
> >>+ goto release_pages;
> >>+
> >>+ r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
> >>+ ttm->num_pages << PAGE_SHIFT,
> >>+ GFP_KERNEL);
> >>+ if (r)
> >>+ goto release_sg;
> >>+
> >>+ r = -ENOMEM;
> >>+ nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents,
> >>+ DMA_BIDIRECTIONAL);
> >>+ if (nents != ttm->sg->nents)
> >>+ goto release_sg;
> >>+
> >>+ return 0;
> >>+
> >>+release_sg:
> >>+ kfree(ttm->sg);
> >>+
> >>+release_pages:
> >>+ release_pages(ttm->pages, pinned, 0);
> >>+ return r;
> >>+}
> >So as said above page back the user mapping might change. As the gup
> >comment explain there is no garanty that page returned by gup are the
> >one in use to back the address. Well for thing that will be use as
> >read source by the gpu this is not much of an issue as you can probably
> >assume that by the time you pin the bo the user mapping is done writting
> >to it.
> >
> >But the case of GPU writing to it and then userspace expecting to read
> >back through its user space mapping is just a game of luck. You will be
> >lucky lot of time because the kernel does not migrate page just for the
> >kick of it. But in case it does you will have inconsistency and user of
> >that API will not understand why the hell they are having an issue.
>
> Yeah, completely agree. I was aware of those limitations even before
> starting to work on it, but hoped that registering an
> invalidate_range_start callback like the Intel codes does would
> solve the issue (well this code indeed got upstream).
>
> Unfortunately I'm not the one who decides if we do it or not, I'm
> just the one who volunteered to figure out how it is possible to do
> it in a clean way.
>
> >This kind of API is realy a no go the way the mm code works does not allow
> >for such thing. I will go look at the intel code but i fear the worst.
>
> Please keep me updated on this, if we can't get a solution about
> this upstream I really need to be able to explain why.
>
The biggest issue with it is the pining of memory, this violate the memcg.
Direct-io pin memory but limit itself to a reasonable amount at a time.
A GPU buffer object might be several hundred mega byte which obviously
can be nasty from memory starvation point of view.
Most other issue can be worked out.
Cheers,
Jérôme
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-07-01 15:35 ` Jerome Glisse
@ 2014-07-01 18:14 ` Christian König
2014-07-01 19:05 ` Jerome Glisse
0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2014-07-01 18:14 UTC (permalink / raw)
To: Jerome Glisse; +Cc: dri-devel
Am 01.07.2014 17:35, schrieb Jerome Glisse:
> On Tue, Jul 01, 2014 at 11:49:18AM +0200, Christian König wrote:
>> Am 30.06.2014 19:35, schrieb Jerome Glisse:
>>> On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
>>>
>>> [SNIP]
>>> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>>> if (p->cs_flags & RADEON_CS_USE_VM)
>>> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
>>> &p->validated);
>>> + if (need_mmap_lock)
>>> + down_read(¤t->mm->mmap_sem);
>>> +
>>> + r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>>> - return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>>> + if (need_mmap_lock)
>>> + up_read(¤t->mm->mmap_sem);
>>> +
>>> + return r;
>>> }
>>> So the mmap lock protect almost nothing here. Once things are validated
>>> nothing forbid the userspace to unmap the range containing the user bo.
>> As far as I understand it (and I'm probably not so deep into the MM
>> code as you, so correct me if I'm wrong) unmapping the range would
>> result in an invalidate_range_start callback and this callback in
>> turn validates the BO into the CPU domain again. So it actually
>> blocks for the GPU operation to be completed, marks the pages in
>> question as dirty and then unpins them.
>>
>> This should protected us anything nasty to happen in the case of a
>> unmap(), fork() etc...
> Thing is nothing forbid from a new cs ioctl to happen right after
> the radeon range_start callback but before the core mm code that
> was about to do something. The mmap_sem will protect you from fork
> or munmap but not from otherthing.
Any suggestion on how to fix this?
I mean I could take the BO reservation in range_start and release it in
range_end, but I'm pretty sure the code calling the MMU notifier won't
like that.
[SNIP]
>>>> +
>>>> +static const struct mmu_notifier_ops radeon_bo_notifier = {
>>>> + .release = radeon_bo_mn_release,
>>>> + .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
>>>> +};
>>> What about invalidate_page callback ? You are breaking write back to
>>> disk. Which is not a welcome move.
>> Why? Keep in mind that the pages are pinned down as soon as we make
>> the first CS which uses them and marked as dirty when we are done
>> with them. In between those two events the page shouldn't be written
>> back to disk.
> Page could be dirty prior to be pin, or dirtied after being (CPU access).
> I am just pointing out that the GPU might be writting to the page while
> the page is use as source for disk I/O.
I still don't see the problem here, the worst thing that could happen is
that we write a halve filled page to disk. But since we set the dirty
bit again after the GPU is done with the pages it should be written back
to disk again if necessary.
[SNIP]
> The biggest issue with it is the pining of memory, this violate the memcg.
> Direct-io pin memory but limit itself to a reasonable amount at a time.
> A GPU buffer object might be several hundred mega byte which obviously
> can be nasty from memory starvation point of view.
I thought we would handle this gracefully by accounting the memory
pinned down to the GTT size as well. E.g. allocating GTT buffers pins
down memory in the same way we would pin it down through this interface.
In the end the maximum amount of pinned down memory is always the GTT size.
Regards,
Christian.
>
> Most other issue can be worked out.
>
> Cheers,
> Jérôme
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-07-01 18:14 ` Christian König
@ 2014-07-01 19:05 ` Jerome Glisse
2014-07-02 16:44 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2014-07-01 19:05 UTC (permalink / raw)
To: Christian König; +Cc: dri-devel
On Tue, Jul 01, 2014 at 08:14:47PM +0200, Christian König wrote:
> Am 01.07.2014 17:35, schrieb Jerome Glisse:
> >On Tue, Jul 01, 2014 at 11:49:18AM +0200, Christian König wrote:
> >>Am 30.06.2014 19:35, schrieb Jerome Glisse:
> >>>On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
> >>>
> >>>[SNIP]
> >>>@@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> >>> if (p->cs_flags & RADEON_CS_USE_VM)
> >>> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
> >>> &p->validated);
> >>>+ if (need_mmap_lock)
> >>>+ down_read(¤t->mm->mmap_sem);
> >>>+
> >>>+ r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
> >>>- return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
> >>>+ if (need_mmap_lock)
> >>>+ up_read(¤t->mm->mmap_sem);
> >>>+
> >>>+ return r;
> >>> }
> >>>So the mmap lock protect almost nothing here. Once things are validated
> >>>nothing forbid the userspace to unmap the range containing the user bo.
> >>As far as I understand it (and I'm probably not so deep into the MM
> >>code as you, so correct me if I'm wrong) unmapping the range would
> >>result in an invalidate_range_start callback and this callback in
> >>turn validates the BO into the CPU domain again. So it actually
> >>blocks for the GPU operation to be completed, marks the pages in
> >>question as dirty and then unpins them.
> >>
> >>This should protected us anything nasty to happen in the case of a
> >>unmap(), fork() etc...
> >Thing is nothing forbid from a new cs ioctl to happen right after
> >the radeon range_start callback but before the core mm code that
> >was about to do something. The mmap_sem will protect you from fork
> >or munmap but not from otherthing.
>
> Any suggestion on how to fix this?
>
> I mean I could take the BO reservation in range_start and release it
> in range_end, but I'm pretty sure the code calling the MMU notifier
> won't like that.
I need to think a bit more about all the case other than munmap/fork
to see what might happen and under what circumstances.
>
> [SNIP]
> >>>>+
> >>>>+static const struct mmu_notifier_ops radeon_bo_notifier = {
> >>>>+ .release = radeon_bo_mn_release,
> >>>>+ .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
> >>>>+};
> >>>What about invalidate_page callback ? You are breaking write back to
> >>>disk. Which is not a welcome move.
> >>Why? Keep in mind that the pages are pinned down as soon as we make
> >>the first CS which uses them and marked as dirty when we are done
> >>with them. In between those two events the page shouldn't be written
> >>back to disk.
> >Page could be dirty prior to be pin, or dirtied after being (CPU access).
> >I am just pointing out that the GPU might be writting to the page while
> >the page is use as source for disk I/O.
>
> I still don't see the problem here, the worst thing that could
> happen is that we write a halve filled page to disk. But since we
> set the dirty bit again after the GPU is done with the pages it
> should be written back to disk again if necessary.
This break the fsync semantic and expectation. This is the reason why the
cpu mapping is set to read only while disk i/o is in progress.
>
> [SNIP]
> >The biggest issue with it is the pining of memory, this violate the memcg.
> >Direct-io pin memory but limit itself to a reasonable amount at a time.
> >A GPU buffer object might be several hundred mega byte which obviously
> >can be nasty from memory starvation point of view.
>
> I thought we would handle this gracefully by accounting the memory
> pinned down to the GTT size as well. E.g. allocating GTT buffers
> pins down memory in the same way we would pin it down through this
> interface.
>
> In the end the maximum amount of pinned down memory is always the GTT size.
Yes this could be argued that way, that anyway user than can use the gpu
can already pin large amount of ram. But pining anonymous memory or file
backed memory (ie non regular bo memory) is different as pages are still
on the lru list and thus the vmscan code will still go over them which
might worsen the out of memory case.
>
> Regards,
> Christian.
>
> >
> >Most other issue can be worked out.
> >
> >Cheers,
> >Jérôme
> >
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/radeon: add user pointer support v2
2014-07-01 19:05 ` Jerome Glisse
@ 2014-07-02 16:44 ` Christian König
0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2014-07-02 16:44 UTC (permalink / raw)
To: Jerome Glisse; +Cc: dri-devel
> Yes this could be argued that way, that anyway user than can use the gpu
> can already pin large amount of ram. But pining anonymous memory or file
> backed memory (ie non regular bo memory) is different as pages are still
> on the lru list and thus the vmscan code will still go over them which
> might worsen the out of memory case.
Assuming we forbid file backed pages and only allow anonymous memory for
this feature, wouldn't it be possible to "steal" the pages in question
from the lru and add them as backing pages to an BO?
For userspace it should look like they just created a BO, copied the
content of the memory location in question into it and then mapped the
BO at the original location (obviously without the race condition that
such a thing would imply for doing it in userspace).
Just a thought,
Christian.
Am 01.07.2014 21:05, schrieb Jerome Glisse:
> On Tue, Jul 01, 2014 at 08:14:47PM +0200, Christian König wrote:
>> Am 01.07.2014 17:35, schrieb Jerome Glisse:
>>> On Tue, Jul 01, 2014 at 11:49:18AM +0200, Christian König wrote:
>>>> Am 30.06.2014 19:35, schrieb Jerome Glisse:
>>>>> On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian König wrote:
>>>>>
>>>>> [SNIP]
>>>>> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>>>>> if (p->cs_flags & RADEON_CS_USE_VM)
>>>>> p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
>>>>> &p->validated);
>>>>> + if (need_mmap_lock)
>>>>> + down_read(¤t->mm->mmap_sem);
>>>>> +
>>>>> + r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>>>>> - return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
>>>>> + if (need_mmap_lock)
>>>>> + up_read(¤t->mm->mmap_sem);
>>>>> +
>>>>> + return r;
>>>>> }
>>>>> So the mmap lock protect almost nothing here. Once things are validated
>>>>> nothing forbid the userspace to unmap the range containing the user bo.
>>>> As far as I understand it (and I'm probably not so deep into the MM
>>>> code as you, so correct me if I'm wrong) unmapping the range would
>>>> result in an invalidate_range_start callback and this callback in
>>>> turn validates the BO into the CPU domain again. So it actually
>>>> blocks for the GPU operation to be completed, marks the pages in
>>>> question as dirty and then unpins them.
>>>>
>>>> This should protected us anything nasty to happen in the case of a
>>>> unmap(), fork() etc...
>>> Thing is nothing forbid from a new cs ioctl to happen right after
>>> the radeon range_start callback but before the core mm code that
>>> was about to do something. The mmap_sem will protect you from fork
>>> or munmap but not from otherthing.
>> Any suggestion on how to fix this?
>>
>> I mean I could take the BO reservation in range_start and release it
>> in range_end, but I'm pretty sure the code calling the MMU notifier
>> won't like that.
> I need to think a bit more about all the case other than munmap/fork
> to see what might happen and under what circumstances.
>
>> [SNIP]
>>>>>> +
>>>>>> +static const struct mmu_notifier_ops radeon_bo_notifier = {
>>>>>> + .release = radeon_bo_mn_release,
>>>>>> + .invalidate_range_start = radeon_bo_mn_invalidate_range_start,
>>>>>> +};
>>>>> What about invalidate_page callback ? You are breaking write back to
>>>>> disk. Which is not a welcome move.
>>>> Why? Keep in mind that the pages are pinned down as soon as we make
>>>> the first CS which uses them and marked as dirty when we are done
>>>> with them. In between those two events the page shouldn't be written
>>>> back to disk.
>>> Page could be dirty prior to be pin, or dirtied after being (CPU access).
>>> I am just pointing out that the GPU might be writting to the page while
>>> the page is use as source for disk I/O.
>> I still don't see the problem here, the worst thing that could
>> happen is that we write a halve filled page to disk. But since we
>> set the dirty bit again after the GPU is done with the pages it
>> should be written back to disk again if necessary.
> This break the fsync semantic and expectation. This is the reason why the
> cpu mapping is set to read only while disk i/o is in progress.
>
>> [SNIP]
>>> The biggest issue with it is the pining of memory, this violate the memcg.
>>> Direct-io pin memory but limit itself to a reasonable amount at a time.
>>> A GPU buffer object might be several hundred mega byte which obviously
>>> can be nasty from memory starvation point of view.
>> I thought we would handle this gracefully by accounting the memory
>> pinned down to the GTT size as well. E.g. allocating GTT buffers
>> pins down memory in the same way we would pin it down through this
>> interface.
>>
>> In the end the maximum amount of pinned down memory is always the GTT size.
> Yes this could be argued that way, that anyway user than can use the gpu
> can already pin large amount of ram. But pining anonymous memory or file
> backed memory (ie non regular bo memory) is different as pages are still
> on the lru list and thus the vmscan code will still go over them which
> might worsen the out of memory case.
>
>> Regards,
>> Christian.
>>
>>> Most other issue can be worked out.
>>>
>>> Cheers,
>>> Jérôme
>>>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-07-02 16:44 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-30 13:04 [PATCH] drm/radeon: add user pointer support v2 Christian König
2014-06-30 17:35 ` Jerome Glisse
2014-07-01 9:49 ` Christian König
2014-07-01 15:35 ` Jerome Glisse
2014-07-01 18:14 ` Christian König
2014-07-01 19:05 ` Jerome Glisse
2014-07-02 16:44 ` Christian König
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.