* [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
@ 2023-07-11 7:21 Dan Carpenter
2023-07-11 21:55 ` Suren Baghdasaryan
0 siblings, 1 reply; 18+ messages in thread
From: Dan Carpenter @ 2023-07-11 7:21 UTC (permalink / raw)
To: surenb; +Cc: linux-mm
Hello Suren Baghdasaryan,
The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
modifications with modifier calls" from Jan 26, 2023, leads to the
following Smatch static checker warning:
./include/linux/mm.h:729 vma_start_write()
warn: sleeping in atomic context
include/linux/mm.h
722 static inline void vma_start_write(struct vm_area_struct *vma)
723 {
724 int mm_lock_seq;
725
726 if (__is_vma_write_locked(vma, &mm_lock_seq))
727 return;
728
--> 729 down_write(&vma->vm_lock->lock);
730 vma->vm_lock_seq = mm_lock_seq;
731 up_write(&vma->vm_lock->lock);
732 }
The call tree is:
gru_fault() <- disables preempt
-> remap_pfn_range()
-> track_pfn_remap()
-> remap_pfn_range_notrack()
-> vm_flags_set()
-> vma_start_write()
Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
to set the flags but now they use vm_flags_set() so there is a potential
they could sleep.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-11 7:21 [bug report] mm: replace vma->vm_flags direct modifications with modifier calls Dan Carpenter @ 2023-07-11 21:55 ` Suren Baghdasaryan 2023-07-11 22:21 ` Matthew Wilcox 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-11 21:55 UTC (permalink / raw) To: Dan Carpenter Cc: linux-mm, Andrew Morton, willy, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, David Hildenbrand On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > > Hello Suren Baghdasaryan, > > The patch 1c71222e5f23: "mm: replace vma->vm_flags direct > modifications with modifier calls" from Jan 26, 2023, leads to the > following Smatch static checker warning: > > ./include/linux/mm.h:729 vma_start_write() > warn: sleeping in atomic context > > include/linux/mm.h > 722 static inline void vma_start_write(struct vm_area_struct *vma) > 723 { > 724 int mm_lock_seq; > 725 > 726 if (__is_vma_write_locked(vma, &mm_lock_seq)) > 727 return; > 728 > --> 729 down_write(&vma->vm_lock->lock); > 730 vma->vm_lock_seq = mm_lock_seq; > 731 up_write(&vma->vm_lock->lock); > 732 } > > The call tree is: > > gru_fault() <- disables preempt > -> remap_pfn_range() > -> track_pfn_remap() > -> remap_pfn_range_notrack() > -> vm_flags_set() > -> vma_start_write() > > Before track_pfn_remap() and remap_pfn_range_notrack() would just do |= > to set the flags but now they use vm_flags_set() so there is a potential > they could sleep. Hi Dan, Thanks for reporting! Looks like the page fault handler is modifying the VMA flags, which has to be done under write-locked mmap_lock and I don't see that being done here... I wonder if that should be allowed. I'm CC'ing some MM folks to check if this is a valid VMA modification and should be allowed. Matthew, this might be especially interesting for you since gru_fault() handles file-backed page faults AFAIKT. Back to the issue at hand. If such modification should be indeed allowed then the simplest fix I think would be to add new remap_pfn_range_locked() function to be called from gru_fault() which would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod() does not lock the VMA, so would not have this issue. If the conclusion is that this is a valid scenario then I can post a fix I described. Thanks, Suren. > > regards, > dan carpenter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-11 21:55 ` Suren Baghdasaryan @ 2023-07-11 22:21 ` Matthew Wilcox 2023-07-11 23:45 ` Suren Baghdasaryan 0 siblings, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2023-07-11 22:21 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, David Hildenbrand, Dimitri Sivanich, Mike Travis, Steve Wahl On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote: > On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > > > > Hello Suren Baghdasaryan, > > > > The patch 1c71222e5f23: "mm: replace vma->vm_flags direct > > modifications with modifier calls" from Jan 26, 2023, leads to the > > following Smatch static checker warning: > > > > ./include/linux/mm.h:729 vma_start_write() > > warn: sleeping in atomic context > > > > include/linux/mm.h > > 722 static inline void vma_start_write(struct vm_area_struct *vma) > > 723 { > > 724 int mm_lock_seq; > > 725 > > 726 if (__is_vma_write_locked(vma, &mm_lock_seq)) > > 727 return; > > 728 > > --> 729 down_write(&vma->vm_lock->lock); > > 730 vma->vm_lock_seq = mm_lock_seq; > > 731 up_write(&vma->vm_lock->lock); > > 732 } > > > > The call tree is: > > > > gru_fault() <- disables preempt > > -> remap_pfn_range() > > -> track_pfn_remap() > > -> remap_pfn_range_notrack() > > -> vm_flags_set() > > -> vma_start_write() > > > > Before track_pfn_remap() and remap_pfn_range_notrack() would just do |= > > to set the flags but now they use vm_flags_set() so there is a potential > > they could sleep. > > Hi Dan, > Thanks for reporting! Looks like the page fault handler is modifying > the VMA flags, which has to be done under write-locked mmap_lock and I > don't see that being done here... I wonder if that should be allowed. > I'm CC'ing some MM folks to check if this is a valid VMA modification > and should be allowed. Matthew, this might be especially interesting > for you since gru_fault() handles file-backed page faults AFAIKT. I don't run the ->fault handler under RCU, only the ->map_pages() method. I don't intend to change that. > Back to the issue at hand. If such modification should be indeed > allowed then the simplest fix I think would be to add new > remap_pfn_range_locked() function to be called from gru_fault() which > would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod() > does not lock the VMA, so would not have this issue. If the conclusion > is that this is a valid scenario then I can post a fix I described. I'm not certain, but calling remap_pfn_range() in the fault handler is definitely weird. It's normally called _instead_ of having a fault handler. The fault handler usually calls set_pte_at() directly. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-11 22:21 ` Matthew Wilcox @ 2023-07-11 23:45 ` Suren Baghdasaryan 2023-07-12 7:35 ` David Hildenbrand 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-11 23:45 UTC (permalink / raw) To: Matthew Wilcox Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, David Hildenbrand, Dimitri Sivanich, Mike Travis, Steve Wahl On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote: > > On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > > > > > > Hello Suren Baghdasaryan, > > > > > > The patch 1c71222e5f23: "mm: replace vma->vm_flags direct > > > modifications with modifier calls" from Jan 26, 2023, leads to the > > > following Smatch static checker warning: > > > > > > ./include/linux/mm.h:729 vma_start_write() > > > warn: sleeping in atomic context > > > > > > include/linux/mm.h > > > 722 static inline void vma_start_write(struct vm_area_struct *vma) > > > 723 { > > > 724 int mm_lock_seq; > > > 725 > > > 726 if (__is_vma_write_locked(vma, &mm_lock_seq)) > > > 727 return; > > > 728 > > > --> 729 down_write(&vma->vm_lock->lock); > > > 730 vma->vm_lock_seq = mm_lock_seq; > > > 731 up_write(&vma->vm_lock->lock); > > > 732 } > > > > > > The call tree is: > > > > > > gru_fault() <- disables preempt > > > -> remap_pfn_range() > > > -> track_pfn_remap() > > > -> remap_pfn_range_notrack() > > > -> vm_flags_set() > > > -> vma_start_write() > > > > > > Before track_pfn_remap() and remap_pfn_range_notrack() would just do |= > > > to set the flags but now they use vm_flags_set() so there is a potential > > > they could sleep. > > > > Hi Dan, > > Thanks for reporting! Looks like the page fault handler is modifying > > the VMA flags, which has to be done under write-locked mmap_lock and I > > don't see that being done here... I wonder if that should be allowed. > > I'm CC'ing some MM folks to check if this is a valid VMA modification > > and should be allowed. Matthew, this might be especially interesting > > for you since gru_fault() handles file-backed page faults AFAIKT. > > I don't run the ->fault handler under RCU, only the ->map_pages() > method. I don't intend to change that. > > > Back to the issue at hand. If such modification should be indeed > > allowed then the simplest fix I think would be to add new > > remap_pfn_range_locked() function to be called from gru_fault() which > > would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod() > > does not lock the VMA, so would not have this issue. If the conclusion > > is that this is a valid scenario then I can post a fix I described. > > I'm not certain, but calling remap_pfn_range() in the fault handler > is definitely weird. It's normally called _instead_ of having a fault > handler. The fault handler usually calls set_pte_at() directly. Hmm. Is it weird enough to be considered invalid or weird but still ok? Also, is it ok to modify VMA flags here without write-locking the mmap_lock (and without write-locking the VMA)? The fault handler is done under read-locked mmap_lock but I thought VMA modifications require stronger locking... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-11 23:45 ` Suren Baghdasaryan @ 2023-07-12 7:35 ` David Hildenbrand 2023-07-12 15:01 ` Suren Baghdasaryan 0 siblings, 1 reply; 18+ messages in thread From: David Hildenbrand @ 2023-07-12 7:35 UTC (permalink / raw) To: Suren Baghdasaryan, Matthew Wilcox Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On 12.07.23 01:45, Suren Baghdasaryan wrote: > On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote: >> >> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote: >>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: >>>> >>>> Hello Suren Baghdasaryan, >>>> >>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct >>>> modifications with modifier calls" from Jan 26, 2023, leads to the >>>> following Smatch static checker warning: >>>> >>>> ./include/linux/mm.h:729 vma_start_write() >>>> warn: sleeping in atomic context >>>> >>>> include/linux/mm.h >>>> 722 static inline void vma_start_write(struct vm_area_struct *vma) >>>> 723 { >>>> 724 int mm_lock_seq; >>>> 725 >>>> 726 if (__is_vma_write_locked(vma, &mm_lock_seq)) >>>> 727 return; >>>> 728 >>>> --> 729 down_write(&vma->vm_lock->lock); >>>> 730 vma->vm_lock_seq = mm_lock_seq; >>>> 731 up_write(&vma->vm_lock->lock); >>>> 732 } >>>> >>>> The call tree is: >>>> >>>> gru_fault() <- disables preempt >>>> -> remap_pfn_range() >>>> -> track_pfn_remap() >>>> -> remap_pfn_range_notrack() >>>> -> vm_flags_set() >>>> -> vma_start_write() >>>> >>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |= >>>> to set the flags but now they use vm_flags_set() so there is a potential >>>> they could sleep. >>> >>> Hi Dan, >>> Thanks for reporting! Looks like the page fault handler is modifying >>> the VMA flags, which has to be done under write-locked mmap_lock and I >>> don't see that being done here... I wonder if that should be allowed. >>> I'm CC'ing some MM folks to check if this is a valid VMA modification >>> and should be allowed. Matthew, this might be especially interesting >>> for you since gru_fault() handles file-backed page faults AFAIKT. >> >> I don't run the ->fault handler under RCU, only the ->map_pages() >> method. I don't intend to change that. >> >>> Back to the issue at hand. If such modification should be indeed >>> allowed then the simplest fix I think would be to add new >>> remap_pfn_range_locked() function to be called from gru_fault() which >>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod() >>> does not lock the VMA, so would not have this issue. If the conclusion >>> is that this is a valid scenario then I can post a fix I described. >> >> I'm not certain, but calling remap_pfn_range() in the fault handler >> is definitely weird. It's normally called _instead_ of having a fault >> handler. The fault handler usually calls set_pte_at() directly. > > Hmm. Is it weird enough to be considered invalid or weird but still ok? > Also, is it ok to modify VMA flags here without write-locking the > mmap_lock (and without write-locking the VMA)? The fault handler is > done under read-locked mmap_lock but I thought VMA modifications > require stronger locking... > The "easy" fix would be to have something like "remap_pfn_range_prepare" that the remap_pfn_range() caller calls during mmap(). We can let that one set these flags, and then we can later let remap_pfn_range() fail if the relevant flags are not set. It's certainly one of these "always done like that and suboptimal but somehow it worked" thingies. I suspect, because only the first pagefault->remap_pfn_range() will actually modify the VMA flags, that this is ok. Otherwise, there usually wouldn't be any pagetables and nothing mapped, so who really cares about these VMA flags (e.g., GUP cannot pin anything if nothing is mapped). -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 7:35 ` David Hildenbrand @ 2023-07-12 15:01 ` Suren Baghdasaryan 2023-07-12 15:52 ` Matthew Wilcox 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-12 15:01 UTC (permalink / raw) To: David Hildenbrand Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 12:35 AM David Hildenbrand <david@redhat.com> wrote: > > On 12.07.23 01:45, Suren Baghdasaryan wrote: > > On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote: > >> > >> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote: > >>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > >>>> > >>>> Hello Suren Baghdasaryan, > >>>> > >>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct > >>>> modifications with modifier calls" from Jan 26, 2023, leads to the > >>>> following Smatch static checker warning: > >>>> > >>>> ./include/linux/mm.h:729 vma_start_write() > >>>> warn: sleeping in atomic context > >>>> > >>>> include/linux/mm.h > >>>> 722 static inline void vma_start_write(struct vm_area_struct *vma) > >>>> 723 { > >>>> 724 int mm_lock_seq; > >>>> 725 > >>>> 726 if (__is_vma_write_locked(vma, &mm_lock_seq)) > >>>> 727 return; > >>>> 728 > >>>> --> 729 down_write(&vma->vm_lock->lock); > >>>> 730 vma->vm_lock_seq = mm_lock_seq; > >>>> 731 up_write(&vma->vm_lock->lock); > >>>> 732 } > >>>> > >>>> The call tree is: > >>>> > >>>> gru_fault() <- disables preempt > >>>> -> remap_pfn_range() > >>>> -> track_pfn_remap() > >>>> -> remap_pfn_range_notrack() > >>>> -> vm_flags_set() > >>>> -> vma_start_write() > >>>> > >>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |= > >>>> to set the flags but now they use vm_flags_set() so there is a potential > >>>> they could sleep. > >>> > >>> Hi Dan, > >>> Thanks for reporting! Looks like the page fault handler is modifying > >>> the VMA flags, which has to be done under write-locked mmap_lock and I > >>> don't see that being done here... I wonder if that should be allowed. > >>> I'm CC'ing some MM folks to check if this is a valid VMA modification > >>> and should be allowed. Matthew, this might be especially interesting > >>> for you since gru_fault() handles file-backed page faults AFAIKT. > >> > >> I don't run the ->fault handler under RCU, only the ->map_pages() > >> method. I don't intend to change that. > >> > >>> Back to the issue at hand. If such modification should be indeed > >>> allowed then the simplest fix I think would be to add new > >>> remap_pfn_range_locked() function to be called from gru_fault() which > >>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod() > >>> does not lock the VMA, so would not have this issue. If the conclusion > >>> is that this is a valid scenario then I can post a fix I described. > >> > >> I'm not certain, but calling remap_pfn_range() in the fault handler > >> is definitely weird. It's normally called _instead_ of having a fault > >> handler. The fault handler usually calls set_pte_at() directly. > > > > Hmm. Is it weird enough to be considered invalid or weird but still ok? > > Also, is it ok to modify VMA flags here without write-locking the > > mmap_lock (and without write-locking the VMA)? The fault handler is > > done under read-locked mmap_lock but I thought VMA modifications > > require stronger locking... > > > > The "easy" fix would be to have something like "remap_pfn_range_prepare" > that the remap_pfn_range() caller calls during mmap(). Are you suggesting to break remap_pfn_range() into two stages (remap_pfn_range_prepare() then remap_pfn_range())? If so, there are many places remap_pfn_range() is called and IIUC all of them would need to use that 2-stage approach (lots of code churn). In addition, this is an exported function, so many more drivers might expect the current behavior. My suggestion was to have another function called smth like remap_pfn_range_nolock() and internally use the same code with an additional flag that would tell us whether we should lock the vma or not. Such change should be quite simple and small. > > We can let that one set these flags, and then we can later let > remap_pfn_range() fail if the relevant flags are not set. > > It's certainly one of these "always done like that and suboptimal but > somehow it worked" thingies. > > I suspect, because only the first pagefault->remap_pfn_range() will > actually modify the VMA flags, that this is ok. Otherwise, there usually > wouldn't be any pagetables and nothing mapped, so who really cares about > these VMA flags (e.g., GUP cannot pin anything if nothing is mapped). Is my understanding correct that the reasoning remap_pfn_range() is allowed here without write-locking the VMA (or write-locking mmap_lock) is because it's done only if there are no pre-existing patables? Thanks, Suren. > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 15:01 ` Suren Baghdasaryan @ 2023-07-12 15:52 ` Matthew Wilcox 2023-07-12 15:55 ` David Hildenbrand 0 siblings, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2023-07-12 15:52 UTC (permalink / raw) To: Suren Baghdasaryan Cc: David Hildenbrand, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: > Are you suggesting to break remap_pfn_range() into two stages > (remap_pfn_range_prepare() then remap_pfn_range())? > If so, there are many places remap_pfn_range() is called and IIUC all > of them would need to use that 2-stage approach (lots of code churn). > In addition, this is an exported function, so many more drivers might > expect the current behavior. You do not understand correctly. When somebody calls mmap, there are two reasonable implementations. Here's one: .mmap = snd_dma_iram_mmap, static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, struct vm_area_struct *area) { area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); return remap_pfn_range(area, area->vm_start, dmab->addr >> PAGE_SHIFT, area->vm_end - area->vm_start, area->vm_page_prot); } This is _fine_. It is not called from the fault path, it is called in process context. Few locks are held (which ones aren't even documented!) The other way is to set vma->vm_ops. The fault handler in vm_ops should not be calling remap_pfn_range(). It should be calling set_ptes(). I almost have this driver fixed up, but I have another meeting to go to now. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 15:52 ` Matthew Wilcox @ 2023-07-12 15:55 ` David Hildenbrand 2023-07-12 16:03 ` Suren Baghdasaryan 2023-07-12 18:49 ` Matthew Wilcox 0 siblings, 2 replies; 18+ messages in thread From: David Hildenbrand @ 2023-07-12 15:55 UTC (permalink / raw) To: Matthew Wilcox, Suren Baghdasaryan Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On 12.07.23 17:52, Matthew Wilcox wrote: > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: >> Are you suggesting to break remap_pfn_range() into two stages >> (remap_pfn_range_prepare() then remap_pfn_range())? >> If so, there are many places remap_pfn_range() is called and IIUC all >> of them would need to use that 2-stage approach (lots of code churn). >> In addition, this is an exported function, so many more drivers might >> expect the current behavior. > > You do not understand correctly. > > When somebody calls mmap, there are two reasonable implementations. > Here's one: > > .mmap = snd_dma_iram_mmap, > > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, > struct vm_area_struct *area) > { > area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); > return remap_pfn_range(area, area->vm_start, > dmab->addr >> PAGE_SHIFT, > area->vm_end - area->vm_start, > area->vm_page_prot); > } > > This is _fine_. It is not called from the fault path, it is called in > process context. Few locks are held (which ones aren't even > documented!) > > The other way is to set vma->vm_ops. The fault handler in vm_ops > should not be calling remap_pfn_range(). It should be calling > set_ptes(). I almost have this driver fixed up, but I have another > meeting to go to now. Just a note that we still have to make sure that the VMA flags will be set properly -- I guess at mmap time is the right time as I suggested above. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 15:55 ` David Hildenbrand @ 2023-07-12 16:03 ` Suren Baghdasaryan 2023-07-12 18:49 ` Matthew Wilcox 1 sibling, 0 replies; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-12 16:03 UTC (permalink / raw) To: David Hildenbrand Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 8:55 AM David Hildenbrand <david@redhat.com> wrote: > > On 12.07.23 17:52, Matthew Wilcox wrote: > > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: > >> Are you suggesting to break remap_pfn_range() into two stages > >> (remap_pfn_range_prepare() then remap_pfn_range())? > >> If so, there are many places remap_pfn_range() is called and IIUC all > >> of them would need to use that 2-stage approach (lots of code churn). > >> In addition, this is an exported function, so many more drivers might > >> expect the current behavior. > > > > You do not understand correctly. > > > > When somebody calls mmap, there are two reasonable implementations. > > Here's one: > > > > .mmap = snd_dma_iram_mmap, > > > > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, > > struct vm_area_struct *area) > > { > > area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); > > return remap_pfn_range(area, area->vm_start, > > dmab->addr >> PAGE_SHIFT, > > area->vm_end - area->vm_start, > > area->vm_page_prot); > > } > > > > This is _fine_. It is not called from the fault path, it is called in > > process context. Few locks are held (which ones aren't even > > documented!) > > > > The other way is to set vma->vm_ops. The fault handler in vm_ops > > should not be calling remap_pfn_range(). It should be calling > > set_ptes(). I almost have this driver fixed up, but I have another > > meeting to go to now. > > Just a note that we still have to make sure that the VMA flags will be > set properly -- I guess at mmap time is the right time as I suggested above. Ah, ok, now I understand what you meant. Ok, let's wait for Matthew to send the fix for the driver. Sounds like he has it almost ready. > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 15:55 ` David Hildenbrand 2023-07-12 16:03 ` Suren Baghdasaryan @ 2023-07-12 18:49 ` Matthew Wilcox 2023-07-12 18:52 ` David Hildenbrand 2023-07-12 19:34 ` Matthew Wilcox 1 sibling, 2 replies; 18+ messages in thread From: Matthew Wilcox @ 2023-07-12 18:49 UTC (permalink / raw) To: David Hildenbrand Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote: > On 12.07.23 17:52, Matthew Wilcox wrote: > > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: > > > Are you suggesting to break remap_pfn_range() into two stages > > > (remap_pfn_range_prepare() then remap_pfn_range())? > > > If so, there are many places remap_pfn_range() is called and IIUC all > > > of them would need to use that 2-stage approach (lots of code churn). > > > In addition, this is an exported function, so many more drivers might > > > expect the current behavior. > > > > You do not understand correctly. > > > > When somebody calls mmap, there are two reasonable implementations. > > Here's one: > > > > .mmap = snd_dma_iram_mmap, > > > > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, > > struct vm_area_struct *area) > > { > > area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); > > return remap_pfn_range(area, area->vm_start, > > dmab->addr >> PAGE_SHIFT, > > area->vm_end - area->vm_start, > > area->vm_page_prot); > > } > > > > This is _fine_. It is not called from the fault path, it is called in > > process context. Few locks are held (which ones aren't even > > documented!) > > > > The other way is to set vma->vm_ops. The fault handler in vm_ops > > should not be calling remap_pfn_range(). It should be calling > > set_ptes(). I almost have this driver fixed up, but I have another > > meeting to go to now. > > Just a note that we still have to make sure that the VMA flags will be set > properly -- I guess at mmap time is the right time as I suggested above. It actually does that already: static int gru_file_mmap(struct file *file, struct vm_area_struct *vma) { if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE)) return -EPERM; if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) || vma->vm_end & (GRU_GSEG_PAGESIZE - 1)) return -EINVAL; vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED | VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP); This compiles, but obviously I don't have a spare HP supercomputer lying around for me to test whether it works. Also set_ptes() was only just introduced to the mm tree, so doing something that needs backporting would take more effort (maybe having a private set_ptes() in the driver would be a good backport option that calls set_pte_at() in a loop). diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c index 4eb4b9455139..c21bcb528f12 100644 --- a/drivers/misc/sgi-gru/grumain.c +++ b/drivers/misc/sgi-gru/grumain.c @@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf) } if (!gts->ts_gru) { + pte_t *ptep, pte; + STAT(load_user_context); if (!gru_assign_gru_context(gts)) { preempt_enable(); @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf) } gru_load_context(gts); paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum); - remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1), - paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE, - vma->vm_page_prot); + + pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot); + ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE; + set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1), + ptep, pte_mkspecial(pte), + GRU_GSEG_PAGESIZE / PAGE_SIZE); } preempt_enable(); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 18:49 ` Matthew Wilcox @ 2023-07-12 18:52 ` David Hildenbrand 2023-07-12 19:48 ` Suren Baghdasaryan 2023-07-12 19:34 ` Matthew Wilcox 1 sibling, 1 reply; 18+ messages in thread From: David Hildenbrand @ 2023-07-12 18:52 UTC (permalink / raw) To: Matthew Wilcox Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On 12.07.23 20:49, Matthew Wilcox wrote: > On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote: >> On 12.07.23 17:52, Matthew Wilcox wrote: >>> On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: >>>> Are you suggesting to break remap_pfn_range() into two stages >>>> (remap_pfn_range_prepare() then remap_pfn_range())? >>>> If so, there are many places remap_pfn_range() is called and IIUC all >>>> of them would need to use that 2-stage approach (lots of code churn). >>>> In addition, this is an exported function, so many more drivers might >>>> expect the current behavior. >>> >>> You do not understand correctly. >>> >>> When somebody calls mmap, there are two reasonable implementations. >>> Here's one: >>> >>> .mmap = snd_dma_iram_mmap, >>> >>> static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, >>> struct vm_area_struct *area) >>> { >>> area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); >>> return remap_pfn_range(area, area->vm_start, >>> dmab->addr >> PAGE_SHIFT, >>> area->vm_end - area->vm_start, >>> area->vm_page_prot); >>> } >>> >>> This is _fine_. It is not called from the fault path, it is called in >>> process context. Few locks are held (which ones aren't even >>> documented!) >>> >>> The other way is to set vma->vm_ops. The fault handler in vm_ops >>> should not be calling remap_pfn_range(). It should be calling >>> set_ptes(). I almost have this driver fixed up, but I have another >>> meeting to go to now. >> >> Just a note that we still have to make sure that the VMA flags will be set >> properly -- I guess at mmap time is the right time as I suggested above. > > It actually does that already: > > static int gru_file_mmap(struct file *file, struct vm_area_struct *vma) > { > if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE)) > return -EPERM; > > if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) || > vma->vm_end & (GRU_GSEG_PAGESIZE - 1)) > return -EINVAL; > > vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED | > VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP); > > > This compiles, but obviously I don't have a spare HP supercomputer lying > around for me to test whether it works. Also set_ptes() was only just > introduced to the mm tree, so doing something that needs backporting > would take more effort (maybe having a private set_ptes() in the driver > would be a good backport option that calls set_pte_at() in a loop). > > > diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c > index 4eb4b9455139..c21bcb528f12 100644 > --- a/drivers/misc/sgi-gru/grumain.c > +++ b/drivers/misc/sgi-gru/grumain.c > @@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > } > > if (!gts->ts_gru) { > + pte_t *ptep, pte; > + > STAT(load_user_context); > if (!gru_assign_gru_context(gts)) { > preempt_enable(); > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > } > gru_load_context(gts); > paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum); > - remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > - paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE, > - vma->vm_page_prot); > + > + pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot); > + ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE; > + set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > + ptep, pte_mkspecial(pte), > + GRU_GSEG_PAGESIZE / PAGE_SIZE); > } > > preempt_enable(); > Would we be able to fix it in stable simply by not triggering the vm_flags_set() in case these flags are already set? -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 18:52 ` David Hildenbrand @ 2023-07-12 19:48 ` Suren Baghdasaryan 2023-07-17 6:13 ` Yan Zhao 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-12 19:48 UTC (permalink / raw) To: David Hildenbrand Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 6:52 PM David Hildenbrand <david@redhat.com> wrote: > > On 12.07.23 20:49, Matthew Wilcox wrote: > > On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote: > >> On 12.07.23 17:52, Matthew Wilcox wrote: > >>> On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote: > >>>> Are you suggesting to break remap_pfn_range() into two stages > >>>> (remap_pfn_range_prepare() then remap_pfn_range())? > >>>> If so, there are many places remap_pfn_range() is called and IIUC all > >>>> of them would need to use that 2-stage approach (lots of code churn). > >>>> In addition, this is an exported function, so many more drivers might > >>>> expect the current behavior. > >>> > >>> You do not understand correctly. > >>> > >>> When somebody calls mmap, there are two reasonable implementations. > >>> Here's one: > >>> > >>> .mmap = snd_dma_iram_mmap, > >>> > >>> static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab, > >>> struct vm_area_struct *area) > >>> { > >>> area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); > >>> return remap_pfn_range(area, area->vm_start, > >>> dmab->addr >> PAGE_SHIFT, > >>> area->vm_end - area->vm_start, > >>> area->vm_page_prot); > >>> } > >>> > >>> This is _fine_. It is not called from the fault path, it is called in > >>> process context. Few locks are held (which ones aren't even > >>> documented!) > >>> > >>> The other way is to set vma->vm_ops. The fault handler in vm_ops > >>> should not be calling remap_pfn_range(). It should be calling > >>> set_ptes(). I almost have this driver fixed up, but I have another > >>> meeting to go to now. > >> > >> Just a note that we still have to make sure that the VMA flags will be set > >> properly -- I guess at mmap time is the right time as I suggested above. > > > > It actually does that already: > > > > static int gru_file_mmap(struct file *file, struct vm_area_struct *vma) > > { > > if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE)) > > return -EPERM; > > > > if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) || > > vma->vm_end & (GRU_GSEG_PAGESIZE - 1)) > > return -EINVAL; > > > > vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED | > > VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP); > > > > > > This compiles, but obviously I don't have a spare HP supercomputer lying > > around for me to test whether it works. Also set_ptes() was only just > > introduced to the mm tree, so doing something that needs backporting > > would take more effort (maybe having a private set_ptes() in the driver > > would be a good backport option that calls set_pte_at() in a loop). vm_flags_set() was introduced in 6.3, so not too many versions to backport to if needed. Private set_ptes() seems reasonable to me. > > > > > > diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c > > index 4eb4b9455139..c21bcb528f12 100644 > > --- a/drivers/misc/sgi-gru/grumain.c > > +++ b/drivers/misc/sgi-gru/grumain.c > > @@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > > } > > > > if (!gts->ts_gru) { > > + pte_t *ptep, pte; > > + > > STAT(load_user_context); > > if (!gru_assign_gru_context(gts)) { > > preempt_enable(); > > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > > } > > gru_load_context(gts); > > paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum); > > - remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > > - paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE, > > - vma->vm_page_prot); > > + > > + pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot); > > + ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE; > > + set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > > + ptep, pte_mkspecial(pte), > > + GRU_GSEG_PAGESIZE / PAGE_SIZE); > > } > > > > preempt_enable(); > > > > Would we be able to fix it in stable simply by not triggering the > vm_flags_set() in case these flags are already set? I think we can do that. gru_file_mmap() sets all the flags that are set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP), so we can check if all bits are already present and skip the vm_flags_set() call. > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 19:48 ` Suren Baghdasaryan @ 2023-07-17 6:13 ` Yan Zhao 2023-07-17 16:18 ` Suren Baghdasaryan 0 siblings, 1 reply; 18+ messages in thread From: Yan Zhao @ 2023-07-17 6:13 UTC (permalink / raw) To: Suren Baghdasaryan Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote: > > Would we be able to fix it in stable simply by not triggering the > > vm_flags_set() in case these flags are already set? > > I think we can do that. gru_file_mmap() sets all the flags that are > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND | > VM_DONTDUMP), so we can check if all bits are already present and skip > the vm_flags_set() call. > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()). Is there any interface to allow device driver to pre-set this flag in .mmap() before .fault()? e.g. export track_pfn_remap() ? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-17 6:13 ` Yan Zhao @ 2023-07-17 16:18 ` Suren Baghdasaryan 2023-07-18 0:27 ` Yan Zhao 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-17 16:18 UTC (permalink / raw) To: Yan Zhao Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote: > > On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote: > > > Would we be able to fix it in stable simply by not triggering the > > > vm_flags_set() in case these flags are already set? > > > > I think we can do that. gru_file_mmap() sets all the flags that are > > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND | > > VM_DONTDUMP), so we can check if all bits are already present and skip > > the vm_flags_set() call. > > > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()). > > Is there any interface to allow device driver to pre-set this flag in .mmap() > before .fault()? e.g. export track_pfn_remap() ? Driver should be able to call vm_flags_set() in its .mmap(). > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-17 16:18 ` Suren Baghdasaryan @ 2023-07-18 0:27 ` Yan Zhao 2023-07-18 16:27 ` Suren Baghdasaryan 0 siblings, 1 reply; 18+ messages in thread From: Yan Zhao @ 2023-07-18 0:27 UTC (permalink / raw) To: Suren Baghdasaryan Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Mon, Jul 17, 2023 at 09:18:34AM -0700, Suren Baghdasaryan wrote: > On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote: > > > > On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote: > > > > Would we be able to fix it in stable simply by not triggering the > > > > vm_flags_set() in case these flags are already set? > > > > > > I think we can do that. gru_file_mmap() sets all the flags that are > > > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND | > > > VM_DONTDUMP), so we can check if all bits are already present and skip > > > the vm_flags_set() call. > > > > > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()). > > > > Is there any interface to allow device driver to pre-set this flag in .mmap() > > before .fault()? e.g. export track_pfn_remap() ? > > Driver should be able to call vm_flags_set() in its .mmap(). Do you mean do something like this in .mmap()? flags = VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; #ifdef CONFIG_X86 flags |= VM_PAT; #endif vm_flags_set(vma, flags); But VM_PAT cannot be set until after a successful reserve_pfn_range(), which function is again not exported. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-18 0:27 ` Yan Zhao @ 2023-07-18 16:27 ` Suren Baghdasaryan 0 siblings, 0 replies; 18+ messages in thread From: Suren Baghdasaryan @ 2023-07-18 16:27 UTC (permalink / raw) To: Yan Zhao Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Mon, Jul 17, 2023 at 5:54 PM Yan Zhao <yan.y.zhao@intel.com> wrote: > > On Mon, Jul 17, 2023 at 09:18:34AM -0700, Suren Baghdasaryan wrote: > > On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote: > > > > > > On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote: > > > > > Would we be able to fix it in stable simply by not triggering the > > > > > vm_flags_set() in case these flags are already set? > > > > > > > > I think we can do that. gru_file_mmap() sets all the flags that are > > > > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND | > > > > VM_DONTDUMP), so we can check if all bits are already present and skip > > > > the vm_flags_set() call. > > > > > > > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()). > > > > > > Is there any interface to allow device driver to pre-set this flag in .mmap() > > > before .fault()? e.g. export track_pfn_remap() ? > > > > Driver should be able to call vm_flags_set() in its .mmap(). > > Do you mean do something like this in .mmap()? > > flags = VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; > #ifdef CONFIG_X86 > flags |= VM_PAT; > #endif > vm_flags_set(vma, flags); > > But VM_PAT cannot be set until after a successful reserve_pfn_range(), > which function is again not exported. I see. Let's wait for Matthew's fix and see what it looks like first. Maybe we can backport it directly instead of using alternatives. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 18:49 ` Matthew Wilcox 2023-07-12 18:52 ` David Hildenbrand @ 2023-07-12 19:34 ` Matthew Wilcox 2023-07-17 19:54 ` Dimitri Sivanich 1 sibling, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2023-07-12 19:34 UTC (permalink / raw) To: David Hildenbrand Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl On Wed, Jul 12, 2023 at 07:49:53PM +0100, Matthew Wilcox wrote: > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > } > gru_load_context(gts); > paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum); > - remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > - paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE, > - vma->vm_page_prot); > + > + pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot); > + ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE; > + set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > + ptep, pte_mkspecial(pte), > + GRU_GSEG_PAGESIZE / PAGE_SIZE); > } Argh, no, this is wrong. The page table isn't mapped at this point. What we want is a cross between vmf_insert_pfn() and vm_insert_pages(). Let me add that ... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls 2023-07-12 19:34 ` Matthew Wilcox @ 2023-07-17 19:54 ` Dimitri Sivanich 0 siblings, 0 replies; 18+ messages in thread From: Dimitri Sivanich @ 2023-07-17 19:54 UTC (permalink / raw) To: Matthew Wilcox Cc: David Hildenbrand, Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett, Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich, Steve Wahl On Wed, Jul 12, 2023 at 08:34:10PM +0100, Matthew Wilcox wrote: > On Wed, Jul 12, 2023 at 07:49:53PM +0100, Matthew Wilcox wrote: > > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf) > > } > > gru_load_context(gts); > > paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum); > > - remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > > - paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE, > > - vma->vm_page_prot); > > + > > + pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot); > > + ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE; > > + set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1), > > + ptep, pte_mkspecial(pte), > > + GRU_GSEG_PAGESIZE / PAGE_SIZE); > > } > > Argh, no, this is wrong. The page table isn't mapped at this point. > What we want is a cross between vmf_insert_pfn() and vm_insert_pages(). > Let me add that ... I should be able to test this once you have something ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-07-18 16:27 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-11 7:21 [bug report] mm: replace vma->vm_flags direct modifications with modifier calls Dan Carpenter 2023-07-11 21:55 ` Suren Baghdasaryan 2023-07-11 22:21 ` Matthew Wilcox 2023-07-11 23:45 ` Suren Baghdasaryan 2023-07-12 7:35 ` David Hildenbrand 2023-07-12 15:01 ` Suren Baghdasaryan 2023-07-12 15:52 ` Matthew Wilcox 2023-07-12 15:55 ` David Hildenbrand 2023-07-12 16:03 ` Suren Baghdasaryan 2023-07-12 18:49 ` Matthew Wilcox 2023-07-12 18:52 ` David Hildenbrand 2023-07-12 19:48 ` Suren Baghdasaryan 2023-07-17 6:13 ` Yan Zhao 2023-07-17 16:18 ` Suren Baghdasaryan 2023-07-18 0:27 ` Yan Zhao 2023-07-18 16:27 ` Suren Baghdasaryan 2023-07-12 19:34 ` Matthew Wilcox 2023-07-17 19:54 ` Dimitri Sivanich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).