AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig-5C7GfCeVMHo@public.gmane.org>
To: "Kuehling, Felix" <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
Cc: Jerome Glisse <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"Yang, Philip" <Philip.Yang-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Date: Thu, 27 Sep 2018 15:21:33 +0200	[thread overview]
Message-ID: <9b427976-f2ff-8ba1-6ebf-588ca95aef80@amd.com> (raw)
In-Reply-To: <DM5PR12MB17072AE77EB0DE3AD22B8D7892140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 14005 bytes --]

Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
>
> > The problem is here:
> >
>
> > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>
> > amdgpu_mn_unlock(p->mn);
>
> >
> > We need to hold the lock until the fence is added to the reservation 
> object.
> >
> > Otherwise somebody could have changed the page tables just in the 
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and 
> adding the fence to the reservation object.
>
> I’m not planning to change that. I don’t think there is any need to 
> change it.
>

Yeah, but when HMM doesn't provide both the start and the end hock of 
the invalidation this way won't work any more.

So we need to find a solution for this,
Christian.

> Regards,
>
>   Felix
>
> *From:*Koenig, Christian
> *Sent:* Thursday, September 27, 2018 7:24 AM
> *To:* Kuehling, Felix <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
> *Cc:* Yang, Philip <Philip.Yang-5C7GfCeVMHo@public.gmane.org>; 
> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org; Jerome Glisse <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace 
> mmu notifier v4
>
> Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
>
>     > We double check that there wasn't any page table modification
>     while we prepared the submission and restart the whole process
>     when there actually was some update.
>     >
>     > The reason why we need to do this is here:
>     >
>
>     > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>     >        amdgpu_mn_unlock(p->mn);
>
>     >
>     > Only after the new fence is added to the buffer object we can
>     release the lock so that any invalidation will now block on our
>     command submission to finish before it modifies the page table.
>
>
>     I don’t see why this requires holding the read-lock until
>     invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets called
>     while the mn read-lock is held in invalidate_range_start notifier.
>
>
> That's not related to amdgpu_ttm_tt_affect_userptr(), this function 
> could actually be called outside the lock.
>
> The problem is here:
>
>     ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>
>     amdgpu_mn_unlock(p->mn);
>
>
> We need to hold the lock until the fence is added to the reservation 
> object.
>
> Otherwise somebody could have changed the page tables just in the 
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and 
> adding the fence to the reservation object.
>
> Regards,
> Christian.
>
>
>
>     Regards,
>
>       Felix
>
>     *From:*Koenig, Christian
>     *Sent:* Thursday, September 27, 2018 5:27 AM
>     *To:* Kuehling, Felix <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>     <mailto:Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>     *Cc:* Yang, Philip <Philip.Yang-5C7GfCeVMHo@public.gmane.org>
>     <mailto:Philip.Yang-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>     <mailto:amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>; Jerome Glisse
>     <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> <mailto:j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>     *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>     replace mmu notifier v4
>
>     That is correct, but take a look what we do when after calling the
>     amdgpu_mn_read_lock():
>
>
>
>                 /* No memory allocation is allowed while holding the
>         mn lock */
>                 amdgpu_mn_lock(p->mn);
>                 amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>                         struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>
>                         if
>         (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
>                                 r = -ERESTARTSYS;
>                                 goto error_abort;
>                         }
>                 }
>
>
>     We double check that there wasn't any page table modification
>     while we prepared the submission and restart the whole process
>     when there actually was some update.
>
>     The reason why we need to do this is here:
>
>
>         ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>                 amdgpu_mn_unlock(p->mn);
>
>
>     Only after the new fence is added to the buffer object we can
>     release the lock so that any invalidation will now block on our
>     command submission to finish before it modifies the page table.
>
>     The only other option would be to add the fence first and then
>     check if there was any update to the page tables.
>
>     The issue with that approach is that adding a fence can't be made
>     undone, so if we find that there actually was an update to the
>     page tables we would need to somehow turn the CS into a dummy
>     (e.g. overwrite all IBs with NOPs or something like that) and
>     still submit it.
>
>     Not sure if that is actually possible.
>
>     Regards,
>     Christian.
>
>     Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
>
>         So back to my previous question:
>
>         >> But do we really need another lock for this? Wouldn't the
>
>         >> re-validation of userptr BOs (currently calling
>         get_user_pages) force
>
>         >> synchronization with the ongoing page table invalidation
>         through the
>
>         >> mmap_sem or other MM locks?
>
>         >
>
>         > No and yes. We don't hold any other locks while doing
>         command submission, but I expect that HMM has its own
>         mechanism to prevent that.
>
>         >
>
>         > Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we
>         are certainly not using this mechanism correctly.
>
>         The existing amdgpu_mn_lock/unlock should block the MMU
>         notifier while a command submission is in progress. It should
>         also block command submission while an MMU notifier is in
>         progress.
>
>         What we lose with HMM is the ability to hold a read-lock for
>         the entire duration of the invalidate_range_start until
>         invalidate_range_end. As I understand it, that lock is meant
>         to prevent new command submissions while the page tables are
>         being updated by the kernel. But my point is, that
>         get_user_pages or hmm_vma_fault should do the same kind of
>         thing. Before the command submission can go ahead, it needs to
>         update the userptr addresses. If the page tables are still
>         being updated, it will block there even without holding the
>         amdgpu_mn_read_lock.
>
>         Regards,
>
>           Felix
>
>         *From:* Koenig, Christian
>         *Sent:* Thursday, September 27, 2018 3:00 AM
>         *To:* Kuehling, Felix <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>         <mailto:Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>         *Cc:* Yang, Philip <Philip.Yang-5C7GfCeVMHo@public.gmane.org>
>         <mailto:Philip.Yang-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>         <mailto:amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>; Jerome Glisse
>         <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> <mailto:j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>         *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to
>         replace mmu notifier v4
>
>         No, that won't work. We would still run into lock inversion
>         problems.
>
>         What we could do with the scheduler is to turn submissions
>         into dummies if we find that the page tables are now outdated.
>
>         But that would be really hacky and I'm not sure if that would
>         really work in all cases.
>
>         Christian.
>
>         Am 27.09.2018 08:53 schrieb "Kuehling, Felix"
>         <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org <mailto:Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>>:
>
>         I had a chat with Jerome yesterday. He pointed out that the
>         new blockable parameter can be used to infer whether the MMU
>         notifier is being called in a reclaim operation. So if
>         blockable==true, it should even be safe to take the BO
>         reservation lock without problems. I think with that we should
>         be able to remove the read-write locking completely and go
>         back to locking (or try-locking for blockable==false) the
>         reservation locks in the MMU notifier?
>
>         Regards,
>           Felix
>
>         -----Original Message-----
>         From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>         <mailto:amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>> On Behalf Of
>         Christian König
>         Sent: Saturday, September 15, 2018 3:47 AM
>         To: Kuehling, Felix <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org
>         <mailto:Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>>; Yang, Philip
>         <Philip.Yang-5C7GfCeVMHo@public.gmane.org <mailto:Philip.Yang-5C7GfCeVMHo@public.gmane.org>>;
>         amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>         <mailto:amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>; Jerome Glisse
>         <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <mailto:j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>>
>         Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>         replace mmu notifier v4
>
>         Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
>         > On 2018-09-14 01:52 PM, Christian König wrote:
>         >> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>         >>> On 2018-09-14 03:51 AM, Christian König wrote:
>         >>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>         >>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>         >>>>> [SNIP]
>         >>>>>>    + amdgpu_mn_read_unlock(amn);
>         >>>>>> +
>         >>>>> amdgpu_mn_read_lock/unlock support recursive locking for
>         multiple
>         >>>>> overlapping or nested invalidation ranges. But if you'r
>         locking
>         >>>>> and unlocking in the same function. Is that still a concern?
>         >>> I don't understand the possible recursive case, but
>         >>> amdgpu_mn_read_lock() still support recursive locking.
>         >>>> Well the real problem is that unlocking them here won't work.
>         >>>>
>         >>>> We need to hold the lock until we are sure that the
>         operation which
>         >>>> updates the page tables is completed.
>         >>>>
>         >>> The reason for this change is because hmm mirror has
>         >>> invalidate_start callback, no invalidate_end callback
>         >>>
>         >>> Check mmu_notifier.c and hmm.c again, below is entire
>         logic to
>         >>> update CPU page tables and callback:
>         >>>
>         >>> mn lock amn->lock is used to protect interval tree access
>         because
>         >>> user may submit/register new userptr anytime.
>         >>> This is same for old and new way.
>         >>>
>         >>> step 2 guarantee the GPU operation is done before updating
>         CPU page
>         >>> table.
>         >>>
>         >>> So I think the change is safe. We don't need hold mn lock
>         until the
>         >>> CPU page tables update is completed.
>         >> No, that isn't even remotely correct. The lock doesn't
>         protects the
>         >> interval tree.
>         >>
>         >>> Old:
>         >>>     1. down_read_non_owner(&amn->lock)
>         >>>     2. loop to handle BOs from node->bos through interval tree
>         >>> amn->object nodes
>         >>>         gfx: wait for pending BOs fence operation done,
>         mark user
>         >>> pages dirty
>         >>>         kfd: evict user queues of the process, wait for queue
>         >>> unmap/map operation done
>         >>>     3. update CPU page tables
>         >>>     4. up_read(&amn->lock)
>         >>>
>         >>> New, switch step 3 and 4
>         >>>     1. down_read_non_owner(&amn->lock)
>         >>>     2. loop to handle BOs from node->bos through interval tree
>         >>> amn->object nodes
>         >>>         gfx: wait for pending BOs fence operation done,
>         mark user
>         >>> pages dirty
>         >>>         kfd: evict user queues of the process, wait for queue
>         >>> unmap/map operation done
>         >>>     3. up_read(&amn->lock)
>         >>>     4. update CPU page tables
>         >> The lock is there to make sure that we serialize page table
>         updates
>         >> with command submission.
>         > As I understand it, the idea is to prevent command
>         submission (adding
>         > new fences to BOs) while a page table invalidation is in
>         progress.
>
>         Yes, exactly.
>
>         > But do we really need another lock for this? Wouldn't the
>         > re-validation of userptr BOs (currently calling
>         get_user_pages) force
>         > synchronization with the ongoing page table invalidation
>         through the
>         > mmap_sem or other MM locks?
>
>         No and yes. We don't hold any other locks while doing command
>         submission, but I expect that HMM has its own mechanism to
>         prevent that.
>
>         Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we
>         are certainly not using this mechanism correctly.
>
>         Regards,
>         Christian.
>         _______________________________________________
>         amd-gfx mailing list
>         amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>         <mailto:amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
>         https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


[-- Attachment #1.2: Type: text/html, Size: 27992 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  parent reply	other threads:[~2018-09-27 13:21 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-13 20:52 [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4 Philip Yang
     [not found] ` <1536871954-8451-1-git-send-email-Philip.Yang-5C7GfCeVMHo@public.gmane.org>
2018-09-13 21:51   ` Felix Kuehling
     [not found]     ` <9d6717ac-23f0-7beb-6e41-58c6e32acdf8-5C7GfCeVMHo@public.gmane.org>
2018-09-14  7:51       ` Christian König
     [not found]         ` <58bc3bb9-b7b1-a32f-e355-c78a23d95215-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-14 17:47           ` Philip Yang
     [not found]             ` <383388c8-1bff-48d9-1044-f16e66bcbfa5-5C7GfCeVMHo@public.gmane.org>
2018-09-14 17:52               ` Christian König
     [not found]                 ` <3850fbeb-5d91-9c14-43c9-45d5d058e15b-5C7GfCeVMHo@public.gmane.org>
2018-09-14 20:21                   ` Felix Kuehling
     [not found]                     ` <de28cee0-3461-4f99-eeae-b793de00ca58-5C7GfCeVMHo@public.gmane.org>
2018-09-15  7:46                       ` Christian König
     [not found]                         ` <e4cf7212-4340-8639-c8c1-057e4d1083f0-5C7GfCeVMHo@public.gmane.org>
2018-09-27  6:53                           ` Kuehling, Felix
     [not found]                             ` <DM5PR12MB17078469EB6D3AF1D53B788992140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27  6:59                               ` Koenig, Christian
     [not found]                                 ` <a76b71ac-4b5b-45d7-b48b-6d0e4a7e7524-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2018-09-27  8:47                                   ` Kuehling, Felix
     [not found]                                     ` <DM5PR12MB1707D5E46617B2936F800F1992140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27  9:27                                       ` Christian König
     [not found]                                         ` <d752c19b-6d2e-c7c1-1cd7-651e25b8f708-5C7GfCeVMHo@public.gmane.org>
2018-09-27 11:08                                           ` Kuehling, Felix
     [not found]                                             ` <DM5PR12MB17077A78E0F95BFBA57F2E1A92140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27 11:23                                               ` Christian König
     [not found]                                                 ` <425fe859-c780-48a5-a2c6-c3bf2b9abb38-5C7GfCeVMHo@public.gmane.org>
2018-09-27 13:18                                                   ` Kuehling, Felix
     [not found]                                                     ` <DM5PR12MB17072AE77EB0DE3AD22B8D7892140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27 13:21                                                       ` Christian König [this message]
     [not found]                                                         ` <9b427976-f2ff-8ba1-6ebf-588ca95aef80-5C7GfCeVMHo@public.gmane.org>
2018-09-27 13:41                                                           ` Kuehling, Felix
     [not found]                                                             ` <DM5PR12MB17072879EC907150027D6F7892140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27 13:58                                                               ` Koenig, Christian
     [not found]                                                                 ` <58199419-e20f-4ab0-ac1d-a7eb79f5c6f7-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2018-09-27 14:27                                                                   ` Kuehling, Felix
     [not found]                                                                     ` <DM5PR12MB1707273AC4B0C03A3BEDF73092140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27 14:29                                                                       ` Koenig, Christian
     [not found]                                                                         ` <1068c389-56fc-4125-ac40-b1ef2d60eabd-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2018-09-27 14:50                                                                           ` Kuehling, Felix
     [not found]                                                                             ` <DM5PR12MB1707BCD7BFC10EDD594FE90B92140-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-27 15:36                                                                               ` Christian König
     [not found]                                                                                 ` <11ba3857-9bb0-648e-2806-0533090d9a0a-5C7GfCeVMHo@public.gmane.org>
2018-09-27 20:17                                                                                   ` Philip Yang
     [not found]                                                                                     ` <16d1faf6-80a4-dc46-bd2a-9cd475808e98-5C7GfCeVMHo@public.gmane.org>
2018-09-28  5:25                                                                                       ` Koenig, Christian
     [not found]                                                                                         ` <8f9f5703-214f-488d-9cfe-ccc64e8cd009-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2018-09-28 15:07                                                                                           ` Philip Yang
     [not found]                                                                                             ` <fe0d429b-5038-a297-e02e-423302544477-5C7GfCeVMHo@public.gmane.org>
2018-09-28 15:13                                                                                               ` Koenig, Christian
     [not found]                                                                                                 ` <b8686e6b-0c3e-4feb-afbd-80397aac31a0-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2018-10-02 15:05                                                                                                   ` Christian König
     [not found]                                                                                                     ` <09916f9a-3f5f-27ab-01e6-6d77303cf052-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-10-03 20:31                                                                                                       ` Philip Yang
     [not found]                                                                                                         ` <bf806477-06b3-61de-fea3-5ad260d92cdd-5C7GfCeVMHo@public.gmane.org>
2018-10-04  7:20                                                                                                           ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b427976-f2ff-8ba1-6ebf-588ca95aef80@amd.com \
    --to=christian.koenig-5c7gfcevmho@public.gmane.org \
    --cc=Felix.Kuehling-5C7GfCeVMHo@public.gmane.org \
    --cc=Philip.Yang-5C7GfCeVMHo@public.gmane.org \
    --cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    --cc=j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox