* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
[not found] ` <f46585ee-7a89-1d68-5698-b6de76152cde@amd.com>
@ 2017-08-31 13:59 ` Jerome Glisse
[not found] ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Jerome Glisse @ 2017-08-31 13:59 UTC (permalink / raw)
To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx
[Adding Intel folks as they might be interested in this discussion]
On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
> Hi Jérôme,
>
> I have some questions about the potential range-start-end race you
> mentioned.
>
> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> > Note that a lot of existing users feel broken with respect to range_start()/
> > range_end(). Many users only have a range_start() callback, but there is nothing
> > preventing them from undoing what was invalidated in their range_start() callback
> > after it returns but before any CPU page table update takes place.
> >
> > The code pattern used in KVM or umem ODP is an example of how to properly
> > avoid such a race. In a nutshell, use some kind of sequence number and an active
> > range invalidation counter to block anything that might undo what the
> > range_start() callback did.
> What happens when we start monitoring an address range after
> invalidate_range_start was called? Sounds like we have to keep track of
> all active invalidations just for such a case, even in address ranges
> that we don't currently care about.
>
> What are the things we cannot do between invalidate_range_start and
> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
> userptr mappings after the invalidate_range_start notifier invalidated
> them. Do we have to wait for invalidate_range_end before we can call
> get_user_pages safely?
Well, the whole userptr bo object is somewhat broken from the start.
You never defined its semantics, i.e. what is expected. I can
think of two different semantics:

  A) a userptr buffer object is a snapshot of memory at the time of
     the buffer object's creation

  B) a userptr buffer object allows the GPU to access a range of virtual
     addresses of a process and shares a coherent view of that range
     between CPU and GPU

As implemented it is more in line with B, but this is not defined
anywhere AFAICT.
Anyway, getting back to your questions: it kind of doesn't matter, as
you are using GUP, i.e. you are pinning pages, except for one scenario
(at least I can only think of one).

The problematic case is a race between a CPU write to a zero page or COW
page and the GPU driver doing a read-only GUP:
CPU thread 1                          | CPU thread 2
---------------------------------------------------------------------
                                      |
                                      | uptr covering addr A read only
                                      | ... do stuff with A
write fault to addr A                 |
invalidate_range_start([A, A+1])      | unbind_ttm -> unpin
                                      | validate bo -> GUP -> zero page
lock page table                       |
replace zero pfn/COW with new page    |
unlock page table                     |
invalidate_range_end([A, A+1])        |
So here the GPU would be using the wrong page for the address. How bad
that is, is undefined, as the semantics of userptr are undefined. Given how it
has been used so far, this race is unlikely (I don't think we have much
userspace that uses that feature and does fork).
So I would first define the semantics of the userptr bo and then fix
the code accordingly. Semantics A is easier to implement, and you could
just drop the whole mmu_notifier. Maybe it is better to create a userptr
buffer object every time you want to snapshot a range of addresses. I
don't think the overhead of buffer creation would matter.
If you want to close the race for COW and the zero page in the case of
read-only GUP, there is no other way than what KVM or ODP is doing. I had a
patchset to simplify all this, but I need to bring it back to life.
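For readers unfamiliar with the KVM/ODP pattern, here is a minimal userspace
model of the bookkeeping it relies on. All names are made up for illustration
(the real thing lives around KVM's mmu_notifier_seq / mmu_notifier_count
handling, with locking that is omitted here):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the KVM-style bookkeeping: a sequence number
 * bumped when an invalidation completes, plus a counter of range
 * invalidations currently in flight. */
struct mn_state {
    unsigned long seq;          /* incremented by range_end() */
    int active_invalidations;   /* ranges between range_start() and range_end() */
};

static void range_start(struct mn_state *s)
{
    s->active_invalidations++;  /* block consumers until range_end() */
}

static void range_end(struct mn_state *s)
{
    s->seq++;                   /* pages may have changed under us */
    s->active_invalidations--;
}

/* After a GUP-like lookup: the result is usable only if no invalidation
 * is in flight and none completed since the caller snapshotted seq. */
static bool pages_still_valid(const struct mn_state *s, unsigned long snapshot)
{
    return s->active_invalidations == 0 && s->seq == snapshot;
}
```

A consumer snapshots `seq` before its lookup and retries whenever
`pages_still_valid()` fails afterward, so nothing a racing range_start()
invalidated can be silently re-established.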
Note that other things might race, but as you pin the pages they do
not matter. It just means that if you GUP after range_start() but
before range_end() and before the CPU page table update, then you pinned
the same old page again and nothing will happen (migration will fail,
MADV_FREE will be a nop, ...). So you just did the range_start() callback
for nothing in those cases.
(Sorry for taking so long to answer; I forgot your mail yesterday with
all the other discussion going on.)
Cheers,
Jérôme
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
[not found] ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 14:14 ` Christian König
2017-08-31 18:39 ` Felix Kuehling
1 sibling, 0 replies; 6+ messages in thread
From: Christian König @ 2017-08-31 14:14 UTC (permalink / raw)
To: Jerome Glisse, Felix Kuehling
Cc: intel-gfx, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Am 31.08.2017 um 15:59 schrieb Jerome Glisse:
> [Adding Intel folks as they might be interested in this discussion]
>
> On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
>> Hi Jérôme,
>>
>> I have some questions about the potential range-start-end race you
>> mentioned.
>>
>> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
>>> Note that a lot of existing users feel broken with respect to range_start()/
>>> range_end(). Many users only have a range_start() callback, but there is nothing
>>> preventing them from undoing what was invalidated in their range_start() callback
>>> after it returns but before any CPU page table update takes place.
>>>
>>> The code pattern used in KVM or umem ODP is an example of how to properly
>>> avoid such a race. In a nutshell, use some kind of sequence number and an active
>>> range invalidation counter to block anything that might undo what the
>>> range_start() callback did.
>> What happens when we start monitoring an address range after
>> invalidate_range_start was called? Sounds like we have to keep track of
>> all active invalidations just for such a case, even in address ranges
>> that we don't currently care about.
>>
>> What are the things we cannot do between invalidate_range_start and
>> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
>> userptr mappings after the invalidate_range_start notifier invalidated
>> them. Do we have to wait for invalidate_range_end before we can call
>> get_user_pages safely?
> Well, the whole userptr bo object is somewhat broken from the start.
> You never defined its semantics, i.e. what is expected. I can
> think of two different semantics:
> A) a userptr buffer object is a snapshot of memory at the time of
>    the buffer object's creation
> B) a userptr buffer object allows the GPU to access a range of virtual
>    addresses of a process and shares a coherent view of that range
>    between CPU and GPU
>
> As implemented it is more in line with B, but this is not defined
> anywhere AFAICT.
Well, it is not documented, but the userspace APIs built on top of that
require semantics B.
Essentially you could have cases where the GPU or the CPU is waiting in
a busy loop for the other one to change some memory address.
> Anyway, getting back to your questions: it kind of doesn't matter, as
> you are using GUP, i.e. you are pinning pages, except for one scenario
> (at least I can only think of one).
>
> The problematic case is a race between a CPU write to a zero page or COW
> page and the GPU driver doing a read-only GUP:
>
> CPU thread 1                          | CPU thread 2
> ---------------------------------------------------------------------
>                                       |
>                                       | uptr covering addr A read only
>                                       | ... do stuff with A
> write fault to addr A                 |
> invalidate_range_start([A, A+1])      | unbind_ttm -> unpin
>                                       | validate bo -> GUP -> zero page
> lock page table                       |
> replace zero pfn/COW with new page    |
> unlock page table                     |
> invalidate_range_end([A, A+1])        |
>
> So here the GPU would be using the wrong page for the address. How bad
> that is, is undefined, as the semantics of userptr are undefined. Given how it
> has been used so far, this race is unlikely (I don't think we have much
> userspace that uses that feature and does fork).
>
>
> So I would first define the semantics of the userptr bo and then fix
> the code accordingly. Semantics A is easier to implement, and you could
> just drop the whole mmu_notifier. Maybe it is better to create a userptr
> buffer object every time you want to snapshot a range of addresses. I
> don't think the overhead of buffer creation would matter.
We do support creating userptrs without an mmu_notifier for exactly that
purpose, e.g. uploads of snapshots of what a user space address range looked
like at a certain moment.
Unfortunately we found that the overhead of buffer creation (and the
related GUP) is way too high to be useful for a throw-away object. Just
doing a memcpy into a BO has less latency overall.
And yeah, at least I'm perfectly aware of the problems with fork() and
COW. BTW: it becomes really, really ugly if you think about what happens
when the parent writes to a page first and the GPU then has the child's copy.
Regards,
Christian.
> If you want to close the race for COW and the zero page in the case of
> read-only GUP, there is no other way than what KVM or ODP is doing. I had a
> patchset to simplify all this, but I need to bring it back to life.
>
>
> Note that other things might race, but as you pin the pages they do
> not matter. It just means that if you GUP after range_start() but
> before range_end() and before the CPU page table update, then you pinned
> the same old page again and nothing will happen (migration will fail,
> MADV_FREE will be a nop, ...). So you just did the range_start() callback
> for nothing in those cases.
>
> (Sorry for taking so long to answer; I forgot your mail yesterday with
> all the other discussion going on.)
>
> Cheers,
> Jérôme
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
[not found] ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-31 14:14 ` Christian König
@ 2017-08-31 18:39 ` Felix Kuehling
2017-08-31 19:00 ` Jerome Glisse
1 sibling, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2017-08-31 18:39 UTC (permalink / raw)
To: Jerome Glisse
Cc: intel-gfx, Christian König,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On 2017-08-31 09:59 AM, Jerome Glisse wrote:
> [Adding Intel folks as they might be interested in this discussion]
>
> On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
>> Hi Jérôme,
>>
>> I have some questions about the potential range-start-end race you
>> mentioned.
>>
>> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
>>> Note that a lot of existing users feel broken with respect to range_start()/
>>> range_end(). Many users only have a range_start() callback, but there is nothing
>>> preventing them from undoing what was invalidated in their range_start() callback
>>> after it returns but before any CPU page table update takes place.
>>>
>>> The code pattern used in KVM or umem ODP is an example of how to properly
>>> avoid such a race. In a nutshell, use some kind of sequence number and an active
>>> range invalidation counter to block anything that might undo what the
>>> range_start() callback did.
>> What happens when we start monitoring an address range after
>> invalidate_range_start was called? Sounds like we have to keep track of
>> all active invalidations just for such a case, even in address ranges
>> that we don't currently care about.
>>
>> What are the things we cannot do between invalidate_range_start and
>> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
>> userptr mappings after the invalidate_range_start notifier invalidated
>> them. Do we have to wait for invalidate_range_end before we can call
>> get_user_pages safely?
> Well, the whole userptr bo object is somewhat broken from the start.
> You never defined its semantics, i.e. what is expected. I can
> think of two different semantics:
> A) a userptr buffer object is a snapshot of memory at the time of
>    the buffer object's creation
> B) a userptr buffer object allows the GPU to access a range of virtual
>    addresses of a process and shares a coherent view of that range
>    between CPU and GPU
>
> As implemented it is more in line with B, but this is not defined
> anywhere AFAICT.
Yes, we're trying to achieve B; that's why we have an MMU notifier in
the first place. But it's been a struggle getting it to work properly,
and we're still dealing with some locking issues, and now this one.
>
> Anyway, getting back to your questions: it kind of doesn't matter, as
> you are using GUP, i.e. you are pinning pages, except for one scenario
> (at least I can only think of one).
>
> The problematic case is a race between a CPU write to a zero page or COW
> page and the GPU driver doing a read-only GUP:
[...]
Thanks, I was aware of COW but not of the zero-page case. I believe in
most practical cases our userptr mappings are read-write, so this is
probably not causing us any real trouble at the moment.
> So I would first define the semantics of the userptr bo and then fix
> the code accordingly. Semantics A is easier to implement, and you could
> just drop the whole mmu_notifier. Maybe it is better to create a userptr
> buffer object every time you want to snapshot a range of addresses. I
> don't think the overhead of buffer creation would matter.
That doesn't work for KFD and our compute memory model where CPU and GPU
expect to share the same address space.
>
> If you want to close the race for COW and the zero page in the case of
> read-only GUP, there is no other way than what KVM or ODP is doing. I had a
> patchset to simplify all this, but I need to bring it back to life.
OK. I'll look at these to get an idea.
> Note that other things might race, but as you pin the pages they do
> not matter. It just means that if you GUP after range_start() but
> before range_end() and before the CPU page table update, then you pinned
> the same old page again and nothing will happen (migration will fail,
> MADV_FREE will be a nop, ...). So you just did the range_start() callback
> for nothing in those cases.
We pin the memory because the GPU wants to access it. So it is
probably OK if the migration fails. However, your statement that we
"just did the range_start() callback for nothing" implies that we could
just as well have ignored the range_start callback. But I don't think that's
true. That way we'd keep a page pinned that is no longer mapped in the
CPU address space, so the GPU mapping would be out of sync with the CPU
mapping.
>
> (Sorry for taking so long to answer; I forgot your mail yesterday with
> all the other discussion going on.)
Thanks. Your reply helps. I'll take a look at KVM or ODP and see how
this is applicable to our driver.
Regards,
Felix
>
> Cheers,
> Jérôme
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
2017-08-31 18:39 ` Felix Kuehling
@ 2017-08-31 19:00 ` Jerome Glisse
[not found] ` <20170831190021.GG9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Jerome Glisse @ 2017-08-31 19:00 UTC (permalink / raw)
To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx
On Thu, Aug 31, 2017 at 02:39:17PM -0400, Felix Kuehling wrote:
> On 2017-08-31 09:59 AM, Jerome Glisse wrote:
> > [Adding Intel folks as they might be interested in this discussion]
> >
> > On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
> >> Hi Jérôme,
> >>
> >> I have some questions about the potential range-start-end race you
> >> mentioned.
> >>
> >> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> >>> Note that a lot of existing users feel broken with respect to range_start()/
> >>> range_end(). Many users only have a range_start() callback, but there is nothing
> >>> preventing them from undoing what was invalidated in their range_start() callback
> >>> after it returns but before any CPU page table update takes place.
> >>>
> >>> The code pattern used in KVM or umem ODP is an example of how to properly
> >>> avoid such a race. In a nutshell, use some kind of sequence number and an active
> >>> range invalidation counter to block anything that might undo what the
> >>> range_start() callback did.
> >> What happens when we start monitoring an address range after
> >> invalidate_range_start was called? Sounds like we have to keep track of
> >> all active invalidations just for such a case, even in address ranges
> >> that we don't currently care about.
> >>
> >> What are the things we cannot do between invalidate_range_start and
> >> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
> >> userptr mappings after the invalidate_range_start notifier invalidated
> >> them. Do we have to wait for invalidate_range_end before we can call
> >> get_user_pages safely?
> > Well, the whole userptr bo object is somewhat broken from the start.
> > You never defined its semantics, i.e. what is expected. I can
> > think of two different semantics:
> > A) a userptr buffer object is a snapshot of memory at the time of
> >    the buffer object's creation
> > B) a userptr buffer object allows the GPU to access a range of virtual
> >    addresses of a process and shares a coherent view of that range
> >    between CPU and GPU
> >
> > As implemented it is more in line with B, but this is not defined
> > anywhere AFAICT.
>
> Yes, we're trying to achieve B, that's why we have an MMU notifier in
> the first place. But it's been a struggle getting it to work properly,
> and we're still dealing with some locking issues and now this one.
>
> >
> > Anyway, getting back to your questions: it kind of doesn't matter, as
> > you are using GUP, i.e. you are pinning pages, except for one scenario
> > (at least I can only think of one).
> >
> > The problematic case is a race between a CPU write to a zero page or COW
> > page and the GPU driver doing a read-only GUP:
> [...]
>
> Thanks, I was aware of COW but not of the zero-page case. I believe in
> most practical cases our userptr mappings are read-write, so this is
> probably not causing us any real trouble at the moment.
>
> > So I would first define the semantics of the userptr bo and then fix
> > the code accordingly. Semantics A is easier to implement, and you could
> > just drop the whole mmu_notifier. Maybe it is better to create a userptr
> > buffer object every time you want to snapshot a range of addresses. I
> > don't think the overhead of buffer creation would matter.
>
> That doesn't work for KFD and our compute memory model where CPU and GPU
> expect to share the same address space.
>
> >
> > If you want to close the race for COW and the zero page in the case of
> > read-only GUP, there is no other way than what KVM or ODP is doing. I had a
> > patchset to simplify all this, but I need to bring it back to life.
>
> OK. I'll look at these to get an idea.
>
> > Note that other things might race, but as you pin the pages they do
> > not matter. It just means that if you GUP after range_start() but
> > before range_end() and before the CPU page table update, then you pinned
> > the same old page again and nothing will happen (migration will fail,
> > MADV_FREE will be a nop, ...). So you just did the range_start() callback
> > for nothing in those cases.
>
> We pin the memory because the GPU wants to access it. So this is
> probably OK if the migration fails. However, your statement that we
> "just did the range_start() callback for nothing" implies that we could
> as well have ignored the range_start callback. But I don't think that's
> true. That way we'd keep a page pinned that is no longer mapped in the
> CPU address space. So the GPU mapping would be out of sync with the CPU
> mapping.
Here is an example of "for nothing":

  CPU0                                    CPU1
  > migrate page at addr A
  > invalidate_start addr A
    > unbind_ttm(for addr A)
                                          > use ttm object for addr A
                                          > GUP addr A
  > page table update
  > invalidate_end addr A
  > refcount check fails because
    of GUP
  > restore page table to A
This is what I meant by range_start() for nothing: on CPU0 you invalidated the
ttm object for nothing, but this is because of a race. Fixing the COW race
would also fix this race and avoid the migration failing because of the GPU GUP.
I was not saying you should not use mmu_notifier. For achieving B you need
mmu_notifier. Note that if you do like ODP/KVM, then you do not need to
pin pages.
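To sketch what "do like ODP/KVM" means on the driver side, here is an
illustrative userspace model (not the kernel API; `build_gpu_mapping` and
`lookup` are made-up stand-ins for the fault path and for get_user_pages
without a pin):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model: the driver holds no page reference; it snapshots a
 * notifier sequence number before the lookup and redoes the lookup if an
 * invalidation overlapped it. */
struct notifier_state {
    unsigned long seq;   /* bumped when an invalidation completes */
    int invalidating;    /* nonzero between range_start and range_end */
};

static int valid(const struct notifier_state *ns, unsigned long snapshot)
{
    return ns->invalidating == 0 && ns->seq == snapshot;
}

/* Fault path: retry the lookup until no invalidation raced with it.
 * lookup() stands in for get_user_pages() plus GPU page table setup. */
static int build_gpu_mapping(struct notifier_state *ns,
                             int (*lookup)(void *), void *arg)
{
    unsigned long snapshot;
    int ret;
    do {
        snapshot = ns->seq;
        ret = lookup(arg);      /* may observe about-to-change pages... */
    } while (ret == 0 && !valid(ns, snapshot));
    return ret;                 /* ...but then the loop simply redoes it */
}

/* Demo: a lookup that races with one invalidation on its first attempt. */
static struct notifier_state demo_ns;
static int demo_calls;
static int demo_lookup(void *arg)
{
    (void)arg;
    if (++demo_calls == 1)
        demo_ns.seq++;          /* simulate a start/end pair completing */
    return 0;
}
```

Because stale results are simply thrown away and redone, no pin is needed to
keep the GPU view coherent with the CPU page tables.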
Cheers,
Jérôme
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
[not found] ` <20170831190021.GG9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 23:19 ` Felix Kuehling
2017-08-31 23:29 ` Jerome Glisse
0 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2017-08-31 23:19 UTC (permalink / raw)
To: Jerome Glisse
Cc: intel-gfx, Christian König,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On 2017-08-31 03:00 PM, Jerome Glisse wrote:
> I was not saying you should not use mmu_notifier. For achieving B you need
> mmu_notifier. Note that if you do like ODP/KVM, then you do not need to
> pin pages.
I would like that. I've thought about it before. The one problem I
couldn't figure out is where to set the accessed and dirty bits for the
pages. Now we do it when we unpin. If we don't pin the pages in the
first place, we don't have a good place for this.
Our hardware doesn't give us notifications or accessed/dirty bits, so we
have to assume the worst case that the pages are continuously
accessed/dirty.
I'd appreciate any advice how to handle that. (Sorry, I realize this is
going a bit off topic.) A pointer to a document or source code would be
great. :)
Thanks,
Felix
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
2017-08-31 23:19 ` Felix Kuehling
@ 2017-08-31 23:29 ` Jerome Glisse
0 siblings, 0 replies; 6+ messages in thread
From: Jerome Glisse @ 2017-08-31 23:29 UTC (permalink / raw)
To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx
On Thu, Aug 31, 2017 at 07:19:19PM -0400, Felix Kuehling wrote:
> On 2017-08-31 03:00 PM, Jerome Glisse wrote:
> > I was not saying you should not use mmu_notifier. For achieving B you need
> > mmu_notifier. Note that if you do like ODP/KVM, then you do not need to
> > pin pages.
> I would like that. I've thought about it before. The one problem I
> couldn't figure out is, where to set the accessed and dirty bits for the
> pages. Now we do it when we unpin. If we don't pin the pages in the
> first place, we don't have a good place for this.
>
> Our hardware doesn't give us notifications or accessed/dirty bits, so we
> have to assume the worst case that the pages are continuously
> accessed/dirty.
>
> I'd appreciate any advice how to handle that. (Sorry, I realize this is
> going a bit off topic.) A pointer to a document or source code would be
> great. :)
In invalidate_range_start(), i.e. the same place where you unpin, but instead
of unpinning you would just call set_page_dirty().
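A rough sketch of the idea (userspace model; `fake_page` and the other names
are stand-ins for illustration, and the kernel side would walk the real pages
backing the userptr range):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct page; only the state we care about here. */
struct fake_page { bool dirty; };

struct userptr_range {
    struct fake_page **pages;  /* pages found by the earlier lookup */
    int npages;
    bool gpu_writable;         /* worst case: GPU may have written them all */
};

static void set_page_dirty_model(struct fake_page *p) { p->dirty = true; }

/* Model of the driver's invalidate_range_start() callback: the same spot
 * where a pinning driver unpins, but here we only transfer the (assumed)
 * dirty state while the pages are still mapped, then tear down the GPU
 * side of the mapping. */
static void invalidate_range_start_model(struct userptr_range *r)
{
    for (int i = 0; i < r->npages; i++)
        if (r->gpu_writable)
            set_page_dirty_model(r->pages[i]);
    /* ...followed by invalidating the GPU page tables for the range */
}
```

Since the hardware reports no accessed/dirty bits, the model pessimistically
dirties every page of a writable range, matching what the unpin path does today.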
Jérôme