From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?ISO-8859-1?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>
Subject: Re: [PATCH] drm/radeon: add user pointer support v2
Date: Tue, 01 Jul 2014 11:49:18 +0200
Message-ID: <53B2841E.1000701@vodafone.de>
References: <1404133456-1790-1-git-send-email-deathsimple@vodafone.de>
 <20140630173526.GA3169@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38])
 by gabe.freedesktop.org (Postfix) with ESMTP id 8CCD16E356
 for <dri-devel@lists.freedesktop.org>; Tue,  1 Jul 2014 02:49:33 -0700 (PDT)
In-Reply-To: <20140630173526.GA3169@gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

Am 30.06.2014 19:35, schrieb Jerome Glisse:
> On Mon, Jun 30, 2014 at 03:04:16PM +0200, Christian K=F6nig wrote:
>> From: Christian K=F6nig <christian.koenig@amd.com>
>>
>> This patch adds an IOCTL for turning a pointer supplied by
>> userspace into a buffer object.
>>
>> It works similar to the recently added userptr support for i915,
>> so it also imposes several restrictions upon the memory being mapped:
>>
>> 1. It must be page aligned (both start/end addresses, i.e ptr and size).
>>
>> 2. It must be normal system memory, not a pointer into another map of IO
>> space (e.g. it must not be a GTT mmapping of another object).
>>
>> 3. The BO is mapped into GTT, so the maximum amount of memory mapped at
>> all times is still the GTT limit.
>>
>> Exporting and sharing as well as mapping of buffer objects created by
>> this function is forbidden and results in an -EPERM.
>>
>> In difference to the i915 implementation we don't use an interval tree to
>> manage all the buffers objects created and instead just register a separ=
ate
>> MMU notifier callback for each BO so that buffer objects are invalidated
>> whenever any modification is made to the processes page table. This prob=
ably
>> needs further optimization.
>>
>> Please review and / or comment.
> NAK, this is kind of an horror show. I should probably take a look at
> Intel code too.

Yeah, I'm not to keen about the approach either and would rather prefer =

using the IOMMU, but unfortunately that isn't available under all use =

cases we need to be able to handle.

BTW: Here is the link to the initial commit of the Intel implementation =

as it is in Linus tree: =

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=
=3D5cc9ed4b9a7ac579362ccebac67f7a4cdb36de06

> First registering one notifier per bo is a bad idea, it is not what you
> want to do.

I know, the Intel guys use a single registration for each process and an =

interval tree to lockup all the BOs affected.

I just wanted to keep the implementation simple and stupid for now and =

first get feedback about the approach in general (to save time if the =

whole approach won't be accepted at all and your reaction indeed =

confirms that this was a good idea).

>> [SNIP]
>> @@ -176,8 +181,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs=
_parser *p)
>>   	if (p->cs_flags & RADEON_CS_USE_VM)
>>   		p->vm_bos =3D radeon_vm_get_bos(p->rdev, p->ib.vm,
>>   					      &p->validated);
>> +	if (need_mmap_lock)
>> +		down_read(&current->mm->mmap_sem);
>> +
>> +	r =3D radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->r=
ing);
>>   =

>> -	return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->=
ring);
>> +	if (need_mmap_lock)
>> +		up_read(&current->mm->mmap_sem);
>> +
>> +	return r;
>>   }
> So the mmap lock protect almost nothing here. Once things are validated
> nothing forbid the userspace to unmap the range containing the user bo.

As far as I understand it (and I'm probably not so deep into the MM code =

as you, so correct me if I'm wrong) unmapping the range would result in =

an invalidate_range_start callback and this callback in turn validates =

the BO into the CPU domain again. So it actually blocks for the GPU =

operation to be completed, marks the pages in question as dirty and then =

unpins them.

This should protected us anything nasty to happen in the case of a =

unmap(), fork() etc...

>>   =

>>   static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s3=
2 priority)
>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeo=
n/radeon_drv.c
>> index 6e30174..355aa67 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -112,6 +112,9 @@ int radeon_gem_object_open(struct drm_gem_object *ob=
j,
>>   				struct drm_file *file_priv);
>>   void radeon_gem_object_close(struct drm_gem_object *obj,
>>   				struct drm_file *file_priv);
>> +struct dma_buf *radeon_gem_prime_export(struct drm_device *dev,
>> +                                        struct drm_gem_object *gobj,
>> +                                        int flags);
> Use tab not space there is already enough broken tab/space in radeon
> kernel as it is. I always tried to keep it consistant.

Fixed, thanks for the hint.

>> [SNIP]
>> +static void radeon_bo_mn_release(struct mmu_notifier *mn,
>> +				 struct mm_struct *mm)
>> +{
>> +	struct radeon_bo *bo =3D container_of(mn, struct radeon_bo, mn);
>> +
>> +	bo->mn.ops =3D NULL;
>> +}
>> +
>> +static void radeon_bo_mn_invalidate_range_start(struct mmu_notifier *mn,
>> +						struct mm_struct *mm,
>> +						unsigned long start,
>> +						unsigned long end)
>> +{
>> +	struct radeon_bo *bo =3D container_of(mn, struct radeon_bo, mn);
>> +	int r;
>> +
>> +	/* TODO: optimize! */
>> +
>> +	r =3D radeon_bo_reserve(bo, true);
>> +	if (r) {
>> +		dev_err(bo->rdev->dev, "(%d) failed to reserve user bo\n", r);
>> +		return;
>> +	}
>> +
>> +	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
>> +	r =3D ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
>> +	if (r) {
>> +		dev_err(bo->rdev->dev, "(%d) failed to validate user bo\n", r);
>> +		return;
>> +	}
>> +
>> +	radeon_bo_unreserve(bo);
>> +}
> So you most likely want ttm_bo_validate to be interruptible but you also
> want to call something like io_schedule so you are not just wasting time
> slice of the process for the own selfishness of the gpu driver.
>
> Also the invalidate_range is callback that is a common call during process
> lifetime.

I know that this is horrible inefficient and error prone, but we can't =

make ttm_bo_validate interruptible because invalidate_range_start =

doesn't allows us to return an error number and so we can't rely on =

userspace repeating the syscall causing the mapping change in case of a =

signal.

The Intel code does it the same way and it actually send a cold shiver =

down my back that this got upstream, but hey I'm not the one who =

reviewed it and actually don't have a better idea either.

> Moreover nothing prevent a cs that is call after the range_start callback
> to pin again the buffer before the range_end callback is made. Thus this
> is extremly racy.

Oh really? Crap, I thought for all the cases that we are interested in =

the mmap_sem would be locked between range_start and range_end.

>> +
>> +static const struct mmu_notifier_ops radeon_bo_notifier =3D {
>> +	.release =3D radeon_bo_mn_release,
>> +	.invalidate_range_start =3D radeon_bo_mn_invalidate_range_start,
>> +};
> What about invalidate_page callback ? You are breaking write back to
> disk. Which is not a welcome move.

Why? Keep in mind that the pages are pinned down as soon as we make the =

first CS which uses them and marked as dirty when we are done with them. =

In between those two events the page shouldn't be written back to disk.

>> [SNIP]
>> +static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
>> +{
>> +	struct radeon_device *rdev =3D radeon_get_rdev(ttm->bdev);
>> +	struct radeon_ttm_tt *gtt =3D (void *)ttm;
>> +	unsigned pinned =3D 0, nents;
>> +	int r;
>> +
>> +	/* prepare the sg table with the user pages */
>> +	if (current->mm !=3D gtt->usermm)
>> +		return -EPERM;
>> +
>> +	do {
>> +		unsigned num_pages =3D ttm->num_pages - pinned;
>> +		uint64_t userptr =3D gtt->userptr + pinned * PAGE_SIZE;
>> +		struct page **pages =3D ttm->pages + pinned;
>> +
>> +		r =3D get_user_pages(current, current->mm, userptr, num_pages,
>> +				   1, 0, pages, NULL);
>> +		if (r < 0) {
>> +			release_pages(ttm->pages, pinned, 0);
>> +			return r;
>> +		}
>> +		pinned +=3D r;
>> +
>> +	} while (pinned < ttm->num_pages);
>> +
>> +	r =3D -ENOMEM;
>> +	ttm->sg =3D kcalloc(1, sizeof(struct sg_table), GFP_KERNEL);
>> +	if (!ttm->sg)
>> +		goto release_pages;
>> +
>> +	r =3D sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
>> +				      ttm->num_pages << PAGE_SHIFT,
>> +				      GFP_KERNEL);
>> +	if (r)
>> +		goto release_sg;
>> +
>> +	r =3D -ENOMEM;
>> +	nents =3D dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents,
>> +			   DMA_BIDIRECTIONAL);
>> +	if (nents !=3D ttm->sg->nents)
>> +		goto release_sg;
>> +
>> +	return 0;
>> +
>> +release_sg:
>> +	kfree(ttm->sg);
>> +
>> +release_pages:
>> +	release_pages(ttm->pages, pinned, 0);
>> +	return r;
>> +}
> So as said above page back the user mapping might change. As the gup
> comment explain there is no garanty that page returned by gup are the
> one in use to back the address. Well for thing that will be use as
> read source by the gpu this is not much of an issue as you can probably
> assume that by the time you pin the bo the user mapping is done writting
> to it.
>
> But the case of GPU writing to it and then userspace expecting to read
> back through its user space mapping is just a game of luck. You will be
> lucky lot of time because the kernel does not migrate page just for the
> kick of it. But in case it does you will have inconsistency and user of
> that API will not understand why the hell they are having an issue.

Yeah, completely agree. I was aware of those limitations even before =

starting to work on it, but hoped that registering an =

invalidate_range_start callback like the Intel codes does would solve =

the issue (well this code indeed got upstream).

Unfortunately I'm not the one who decides if we do it or not, I'm just =

the one who volunteered to figure out how it is possible to do it in a =

clean way.

> This kind of API is realy a no go the way the mm code works does not allow
> for such thing. I will go look at the intel code but i fear the worst.

Please keep me updated on this, if we can't get a solution about this =

upstream I really need to be able to explain why.

Thanks for the comments,
Christian.

>
> Cheers,
> J=E9r=F4me
>
>> +
>> +static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
>> +{
>> +	struct radeon_device *rdev =3D radeon_get_rdev(ttm->bdev);
>> +	unsigned i;
>> +
>> +	/* free the sg table and pages again */
>> +	dma_unmap_sg(rdev->dev, ttm->sg->sgl,
>> +		     ttm->sg->nents, DMA_BIDIRECTIONAL);
>> +
>> +	sg_free_table(ttm->sg);
>> +	kfree(ttm->sg);
>> +
>> +	for (i =3D 0; i < ttm->num_pages; ++i)
>> +		set_page_dirty(ttm->pages[i]);
>> +
>> +	release_pages(ttm->pages, ttm->num_pages, 0);
>> +}
>> +
>>   static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
>>   {
>>   	struct radeon_device *rdev;
>> @@ -599,6 +673,13 @@ static int radeon_ttm_tt_populate(struct ttm_tt *tt=
m)
>>   	if (ttm->state !=3D tt_unpopulated)
>>   		return 0;
>>   =

>> +	if (gtt->userptr) {
>> +		r =3D radeon_ttm_tt_pin_userptr(ttm);
>> +		if (r)
>> +			return r;
>> +		slave =3D true;
>> +	}
>> +
>>   	if (slave && ttm->sg) {
>>   		drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
>>   						 gtt->ttm.dma_address, ttm->num_pages);
>> @@ -648,6 +729,11 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt =
*ttm)
>>   	unsigned i;
>>   	bool slave =3D !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
>>   =

>> +	if (gtt->userptr) {
>> +		radeon_ttm_tt_unpin_userptr(ttm);
>> +		return;
>> +	}
>> +
>>   	if (slave)
>>   		return;
>>   =

>> @@ -676,6 +762,28 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt =
*ttm)
>>   	ttm_pool_unpopulate(ttm);
>>   }
>>   =

>> +int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t userptr)
>> +{
>> +	struct radeon_ttm_tt *gtt =3D (void *)ttm;
>> +
>> +	if (gtt =3D=3D NULL)
>> +		return -EINVAL;
>> +
>> +	gtt->userptr =3D userptr;
>> +	gtt->usermm =3D current->mm;
>> +	return 0;
>> +}
>> +
>> +bool radeon_ttm_tt_has_userptr(struct ttm_tt *ttm)
>> +{
>> +	struct radeon_ttm_tt *gtt =3D (void *)ttm;
>> +
>> +	if (gtt =3D=3D NULL)
>> +		return false;
>> +
>> +	return !!gtt->userptr;
>> +}
>> +
>>   static struct ttm_bo_driver radeon_bo_driver =3D {
>>   	.ttm_tt_create =3D &radeon_ttm_tt_create,
>>   	.ttm_tt_populate =3D &radeon_ttm_tt_populate,
>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm=
.h
>> index 1cc0b61..54e7421 100644
>> --- a/include/uapi/drm/radeon_drm.h
>> +++ b/include/uapi/drm/radeon_drm.h
>> @@ -511,6 +511,7 @@ typedef struct {
>>   #define DRM_RADEON_GEM_BUSY		0x2a
>>   #define DRM_RADEON_GEM_VA		0x2b
>>   #define DRM_RADEON_GEM_OP		0x2c
>> +#define DRM_RADEON_GEM_IMPORT		0x2d
>>   =

>>   #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RA=
DEON_CP_INIT, drm_radeon_init_t)
>>   #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RA=
DEON_CP_START)
>> @@ -554,6 +555,7 @@ typedef struct {
>>   #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE=
ON_GEM_BUSY, struct drm_radeon_gem_busy)
>>   #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEO=
N_GEM_VA, struct drm_radeon_gem_va)
>>   #define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEO=
N_GEM_OP, struct drm_radeon_gem_op)
>> +#define DRM_IOCTL_RADEON_GEM_IMPORT	DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD=
EON_GEM_IMPORT, struct drm_radeon_gem_import)
>>   =

>>   typedef struct drm_radeon_init {
>>   	enum {
>> @@ -806,6 +808,13 @@ struct drm_radeon_gem_create {
>>   	uint32_t	flags;
>>   };
>>   =

>> +struct drm_radeon_gem_import {
>> +	uint64_t		addr;
>> +	uint64_t		size;
>> +	uint32_t		flags;
>> +	uint32_t		handle;
>> +};
>> +
>>   #define RADEON_TILING_MACRO				0x1
>>   #define RADEON_TILING_MICRO				0x2
>>   #define RADEON_TILING_SWAP_16BIT			0x4
>> -- =

>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel