From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?ISO-8859-1?Q?Christian_K=F6nig?= Subject: Re: [PATCH] drm/radeon: add user pointer support v2 Date: Tue, 01 Jul 2014 11:49:18 +0200 Message-ID: <53B2841E.1000701@vodafone.de> References: <1404133456-1790-1-git-send-email-deathsimple@vodafone.de> <20140630173526.GA3169@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38]) by gabe.freedesktop.org (Postfix) with ESMTP id 8CCD16E356 for ; Tue, 1 Jul 2014 02:49:33 -0700 (PDT) In-Reply-To: <20140630173526.GA3169@gmail.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Jerome Glisse Cc: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org Am 30.06.2014 19:35, schrieb Jerome Glisse: > On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian K=F6nig wrote: >> From: Christian K=F6nig >> >> This patch adds an IOCTL for turning a pointer supplied by >> userspace into a buffer object. >> >> It works similar to the recently added userptr support for i915, >> so it also imposes several restrictions upon the memory being mapped: >> >> 1. It must be page aligned (both start/end addresses, i.e ptr and size). >> >> 2. It must be normal system memory, not a pointer into another map of IO >> space (e.g. it must not be a GTT mmapping of another object). >> >> 3. The BO is mapped into GTT, so the maximum amount of memory mapped at >> all times is still the GTT limit. >> >> Exporting and sharing as well as mapping of buffer objects created by >> this function is forbidden and results in an -EPERM. >> >> In difference to the i915 implementation we don't use an interval tree to >> manage all the buffers objects created and instead just register a separ= ate >> MMU notifier callback for each BO so that buffer objects are invalidated >> whenever any modification is made to the processes page table. This prob= ably >> needs further optimization. >> >> Please review and / or comment. > NAK, this is kind of an horror show. I should probably take a look at > Intel code too. Yeah, I'm not to keen about the approach either and would rather prefer = using the IOMMU, but unfortunately that isn't available under all use = cases we need to be able to handle. BTW: Here is the link to the initial commit of the Intel implementation = as it is in Linus tree: = https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id= =3D5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06 > First registering one notifier per bo is a bad idea, it is not what you > want to do. I know, the Intel guys use a single registration for each process and an = interval tree to lockup all the BOs affected. I just wanted to keep the implementation simple and stupid for now and = first get feedback about the approach in general (to save time if the = whole approach won't be accepted at all and your reaction indeed = confirms that this was a good idea). >> [SNIP] >> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs= _parser *p) >> if (p->cs_flags & RADEON_CS_USE_VM) >> p->vm_bos =3D radeon_vm_get_bos(p->rdev, p->ib.vm, >> &p->validated); >> + if (need_mmap_lock) >> + down_read(¤t->mm->mmap_sem); >> + >> + r =3D radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->r= ing); >> = >> - return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->= ring); >> + if (need_mmap_lock) >> + up_read(¤t->mm->mmap_sem); >> + >> + return r; >> } > So the mmap lock protect almost nothing here. Once things are validated > nothing forbid the userspace to unmap the range containing the user bo. As far as I understand it (and I'm probably not so deep into the MM code = as you, so correct me if I'm wrong) unmapping the range would result in = an invalidate_range_start callback and this callback in turn validates = the BO into the CPU domain again. So it actually blocks for the GPU = operation to be completed, marks the pages in question as dirty and then = unpins them. This should protected us anything nasty to happen in the case of a = unmap(), fork() etc... >> = >> static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s3= 2 priority) >> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeo= n/radeon_drv.c >> index 6e30174..355aa67 100644 >> --- a/drivers/gpu/drm/radeon/radeon_drv.c >> +++ b/drivers/gpu/drm/radeon/radeon_drv.c >> @@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *ob= j, >> struct drm_file *file_priv); >> void radeon_gem_object_close(struct drm_gem_object *obj, >> struct drm_file *file_priv); >> +struct dma_buf *radeon_gem_prime_export(struct drm_device *dev, >> + struct drm_gem_object *gobj, >> + int flags); > Use tab not space there is already enough broken tab/space in radeon > kernel as it is. I always tried to keep it consistant. Fixed, thanks for the hint. >> [SNIP] >> +static void radeon_bo_mn_release(struct mmu_notifier *mn, >> + struct mm_struct *mm) >> +{ >> + struct radeon_bo *bo =3D container_of(mn, struct radeon_bo, mn); >> + >> + bo->mn.ops =3D NULL; >> +} >> + >> +static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn, >> + struct mm_struct *mm, >> + unsigned long start, >> + unsigned long end) >> +{ >> + struct radeon_bo *bo =3D container_of(mn, struct radeon_bo, mn); >> + int r; >> + >> + /* TODO: optimize! */ >> + >> + r =3D radeon_bo_reserve(bo, true); >> + if (r) { >> + dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r); >> + return; >> + } >> + >> + radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU); >> + r =3D ttm_bo_validate(&bo->tbo, &bo->placement, false, false); >> + if (r) { >> + dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r); >> + return; >> + } >> + >> + radeon_bo_unreserve(bo); >> +} > So you most likely want ttm_bo_validate to be interruptible but you also > want to call something like io_schedule so you are not just wasting time > slice of the process for the own selfishness of the gpu driver. > > Also the invalidate_range is callback that is a common call during process > lifetime. I know that this is horrible inefficient and error prone, but we can't = make ttm_bo_validate interruptible because invalidate_range_start = doesn't allows us to return an error number and so we can't rely on = userspace repeating the syscall causing the mapping change in case of a = signal. The Intel code does it the same way and it actually send a cold shiver = down my back that this got upstream, but hey I'm not the one who = reviewed it and actually don't have a better idea either. > Moreover nothing prevent a cs that is call after the range_start callback > to pin again the buffer before the range_end callback is made. Thus this > is extremly racy. Oh really? Crap, I thought for all the cases that we are interested in = the mmap_sem would be locked between range_start and range_end. >> + >> +static const struct mmu_notifier_ops radeon_bo_notifier =3D { >> + .release =3D radeon_bo_mn_release, >> + .invalidate_range_start =3D radeon_bo_mn_invalidate_range_start, >> +}; > What about invalidate_page callback ? You are breaking write back to > disk. Which is not a welcome move. Why? Keep in mind that the pages are pinned down as soon as we make the = first CS which uses them and marked as dirty when we are done with them. = In between those two events the page shouldn't be written back to disk. >> [SNIP] >> +static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) >> +{ >> + struct radeon_device *rdev =3D radeon_get_rdev(ttm->bdev); >> + struct radeon_ttm_tt *gtt =3D (void *)ttm; >> + unsigned pinned =3D 0, nents; >> + int r; >> + >> + /* prepare the sg table with the user pages */ >> + if (current->mm !=3D gtt->usermm) >> + return -EPERM; >> + >> + do { >> + unsigned num_pages =3D ttm->num_pages - pinned; >> + uint64_t userptr =3D gtt->userptr + pinned * PAGE_SIZE; >> + struct page **pages =3D ttm->pages + pinned; >> + >> + r =3D get_user_pages(current, current->mm, userptr, num_pages, >> + 1, 0, pages, NULL); >> + if (r < 0) { >> + release_pages(ttm->pages, pinned, 0); >> + return r; >> + } >> + pinned +=3D r; >> + >> + } while (pinned < ttm->num_pages); >> + >> + r =3D -ENOMEM; >> + ttm->sg =3D kcalloc(1, sizeof(struct sg_table), GFP_KERNEL); >> + if (!ttm->sg) >> + goto release_pages; >> + >> + r =3D sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0, >> + ttm->num_pages << PAGE_SHIFT, >> + GFP_KERNEL); >> + if (r) >> + goto release_sg; >> + >> + r =3D -ENOMEM; >> + nents =3D dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, >> + DMA_BIDIRECTIONAL); >> + if (nents !=3D ttm->sg->nents) >> + goto release_sg; >> + >> + return 0; >> + >> +release_sg: >> + kfree(ttm->sg); >> + >> +release_pages: >> + release_pages(ttm->pages, pinned, 0); >> + return r; >> +} > So as said above page back the user mapping might change. As the gup > comment explain there is no garanty that page returned by gup are the > one in use to back the address. Well for thing that will be use as > read source by the gpu this is not much of an issue as you can probably > assume that by the time you pin the bo the user mapping is done writting > to it. > > But the case of GPU writing to it and then userspace expecting to read > back through its user space mapping is just a game of luck. You will be > lucky lot of time because the kernel does not migrate page just for the > kick of it. But in case it does you will have inconsistency and user of > that API will not understand why the hell they are having an issue. Yeah, completely agree. I was aware of those limitations even before = starting to work on it, but hoped that registering an = invalidate_range_start callback like the Intel codes does would solve = the issue (well this code indeed got upstream). Unfortunately I'm not the one who decides if we do it or not, I'm just = the one who volunteered to figure out how it is possible to do it in a = clean way. > This kind of API is realy a no go the way the mm code works does not allow > for such thing. I will go look at the intel code but i fear the worst. Please keep me updated on this, if we can't get a solution about this = upstream I really need to be able to explain why. Thanks for the comments, Christian. > > Cheers, > J=E9r=F4me > >> + >> +static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm) >> +{ >> + struct radeon_device *rdev =3D radeon_get_rdev(ttm->bdev); >> + unsigned i; >> + >> + /* free the sg table and pages again */ >> + dma_unmap_sg(rdev->dev, ttm->sg->sgl, >> + ttm->sg->nents, DMA_BIDIRECTIONAL); >> + >> + sg_free_table(ttm->sg); >> + kfree(ttm->sg); >> + >> + for (i =3D 0; i < ttm->num_pages; ++i) >> + set_page_dirty(ttm->pages[i]); >> + >> + release_pages(ttm->pages, ttm->num_pages, 0); >> +} >> + >> static int radeon_ttm_tt_populate(struct ttm_tt *ttm) >> { >> struct radeon_device *rdev; >> @@ -599,6 +673,13 @@ static int radeon_ttm_tt_populate(struct ttm_tt *tt= m) >> if (ttm->state !=3D tt_unpopulated) >> return 0; >> = >> + if (gtt->userptr) { >> + r =3D radeon_ttm_tt_pin_userptr(ttm); >> + if (r) >> + return r; >> + slave =3D true; >> + } >> + >> if (slave && ttm->sg) { >> drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages, >> gtt->ttm.dma_address, ttm->num_pages); >> @@ -648,6 +729,11 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt = *ttm) >> unsigned i; >> bool slave =3D !!(ttm->page_flags & TTM_PAGE_FLAG_SG); >> = >> + if (gtt->userptr) { >> + radeon_ttm_tt_unpin_userptr(ttm); >> + return; >> + } >> + >> if (slave) >> return; >> = >> @@ -676,6 +762,28 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt = *ttm) >> ttm_pool_unpopulate(ttm); >> } >> = >> +int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr) >> +{ >> + struct radeon_ttm_tt *gtt =3D (void *)ttm; >> + >> + if (gtt =3D=3D NULL) >> + return -EINVAL; >> + >> + gtt->userptr =3D userptr; >> + gtt->usermm =3D current->mm; >> + return 0; >> +} >> + >> +bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm) >> +{ >> + struct radeon_ttm_tt *gtt =3D (void *)ttm; >> + >> + if (gtt =3D=3D NULL) >> + return false; >> + >> + return !!gtt->userptr; >> +} >> + >> static struct ttm_bo_driver radeon_bo_driver =3D { >> .ttm_tt_create =3D &radeon_ttm_tt_create, >> .ttm_tt_populate =3D &radeon_ttm_tt_populate, >> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm= .h >> index 1cc0b61..54e7421 100644 >> --- a/include/uapi/drm/radeon_drm.h >> +++ b/include/uapi/drm/radeon_drm.h >> @@ -511,6 +511,7 @@ typedef struct { >> #define DRM_RADEON_GEM_BUSY 0x2a >> #define DRM_RADEON_GEM_VA 0x2b >> #define DRM_RADEON_GEM_OP 0x2c >> +#define DRM_RADEON_GEM_IMPORT 0x2d >> = >> #define DRM_IOCTL_RADEON_CP_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_RA= DEON_CP_INIT, drm_radeon_init_t) >> #define DRM_IOCTL_RADEON_CP_START DRM_IO( DRM_COMMAND_BASE + DRM_RA= DEON_CP_START) >> @@ -554,6 +555,7 @@ typedef struct { >> #define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE= ON_GEM_BUSY, struct drm_radeon_gem_busy) >> #define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEO= N_GEM_VA, struct drm_radeon_gem_va) >> #define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEO= N_GEM_OP, struct drm_radeon_gem_op) >> +#define DRM_IOCTL_RADEON_GEM_IMPORT DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD= EON_GEM_IMPORT, struct drm_radeon_gem_import) >> = >> typedef struct drm_radeon_init { >> enum { >> @@ -806,6 +808,13 @@ struct drm_radeon_gem_create { >> uint32_t flags; >> }; >> = >> +struct drm_radeon_gem_import { >> + uint64_t addr; >> + uint64_t size; >> + uint32_t flags; >> + uint32_t handle; >> +}; >> + >> #define RADEON_TILING_MACRO 0x1 >> #define RADEON_TILING_MICRO 0x2 >> #define RADEON_TILING_SWAP_16BIT 0x4 >> -- = >> 1.9.1 >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel