* [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Waiman Long @ 2020-12-13 18:08 UTC
To: Andrew Morton, Uladzislau Rezki (Sony)
Cc: linux-mm, linux-kernel, Waiman Long

When multiple locks are acquired, they should be released in reverse
order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
case.

s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);

This unlock sequence, though allowed, is not optimal. If a waiter is
present, mutex_unlock() will need to go through the slowpath of waking
up the waiter with preemption disabled. Fix that by releasing the
spinlock first before the mutex.

Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/vmalloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a8b210..75913f685c71 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&vmap_purge_lock)
 	__releases(&vmap_area_lock)
+	__releases(&vmap_purge_lock)
 {
-	mutex_unlock(&vmap_purge_lock);
 	spin_unlock(&vmap_area_lock);
+	mutex_unlock(&vmap_purge_lock);
 }
 
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)
--
2.18.1
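
The ordering argument in the patch can be sketched in userspace C. This is only an illustration, not the kernel code: `purge_lock`, `area_lock`, `walk_start()`, `walk_stop()` and the small `held[]` bookkeeping stack are hypothetical names, the mutex is a pthread mutex, and the spinlock is a simple C11 `atomic_flag` busy-wait lock standing in for the kernel's `spin_lock()`. The bookkeeping asserts that each release matches the most recently acquired lock, which is the reverse-order rule the patch restores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for vmap_purge_lock and vmap_area_lock. */
static pthread_mutex_t purge_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_flag area_lock = ATOMIC_FLAG_INIT;

/* Bookkeeping stack: records the order in which locks were taken. */
enum lock_id { PURGE_MUTEX, AREA_SPINLOCK };
static enum lock_id held[2];
static int nheld;

/* Minimal userspace spinlock; an assumption, not the kernel primitive. */
static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* busy-wait until the flag is clear */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void note_acquire(enum lock_id id)
{
	held[nheld++] = id;
}

/* Fails the assertion if `id` is not the most recently acquired lock. */
static void note_release(enum lock_id id)
{
	assert(nheld > 0 && held[nheld - 1] == id);
	nheld--;
}

/* Mirrors s_start(): sleeping lock first, then the spinning lock. */
void walk_start(void)
{
	pthread_mutex_lock(&purge_lock);
	note_acquire(PURGE_MUTEX);
	spin_lock(&area_lock);
	note_acquire(AREA_SPINLOCK);
}

/* Mirrors the patched s_stop(): spinlock released first, mutex last. */
void walk_stop(void)
{
	note_release(AREA_SPINLOCK);
	spin_unlock(&area_lock);
	note_release(PURGE_MUTEX);
	pthread_mutex_unlock(&purge_lock);
}

int locks_held(void)
{
	return nheld;
}
```

Swapping the two release pairs in `walk_stop()` would trip the assertion in `note_release()`, which is the userspace analogue of the pre-patch order. The sketch does not model the preemption-disabled wakeup cost described in the commit message.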

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Uladzislau Rezki @ 2020-12-13 18:39 UTC
To: Waiman Long
Cc: Andrew Morton, Uladzislau Rezki (Sony), linux-mm, linux-kernel

On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
> When multiple locks are acquired, they should be released in reverse
> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> case.
>
> s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
> s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>
> This unlock sequence, though allowed, is not optimal. If a waiter is
> present, mutex_unlock() will need to go through the slowpath of waking
> up the waiter with preemption disabled. Fix that by releasing the
> spinlock first before the mutex.
>
> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/vmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..75913f685c71 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>  }
>
>  static void s_stop(struct seq_file *m, void *p)
> -	__releases(&vmap_purge_lock)
>  	__releases(&vmap_area_lock)
> +	__releases(&vmap_purge_lock)
>  {
> -	mutex_unlock(&vmap_purge_lock);
>  	spin_unlock(&vmap_area_lock);
> +	mutex_unlock(&vmap_purge_lock);
>  }
>
>  static void show_numa_info(struct seq_file *m, struct vm_struct *v)

BTW, if navigation over both lists is an issue, for example when there
are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
to implement RCU-safe list iteration and get rid of both locks.

As for the patch: Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

--
Vlad Rezki

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Waiman Long @ 2020-12-13 19:42 UTC
To: Uladzislau Rezki
Cc: Andrew Morton, linux-mm, linux-kernel

On 12/13/20 1:39 PM, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
>> When multiple locks are acquired, they should be released in reverse
>> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
>> case.
>>
>> s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>> s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>>
>> This unlock sequence, though allowed, is not optimal. If a waiter is
>> present, mutex_unlock() will need to go through the slowpath of waking
>> up the waiter with preemption disabled. Fix that by releasing the
>> spinlock first before the mutex.
>>
>> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  mm/vmalloc.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 6ae491a8b210..75913f685c71 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>>  }
>>
>>  static void s_stop(struct seq_file *m, void *p)
>> -	__releases(&vmap_purge_lock)
>>  	__releases(&vmap_area_lock)
>> +	__releases(&vmap_purge_lock)
>>  {
>> -	mutex_unlock(&vmap_purge_lock);
>>  	spin_unlock(&vmap_area_lock);
>> +	mutex_unlock(&vmap_purge_lock);
>>  }
>>
>>  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> BTW, if navigation over both lists is an issue, for example when there
> are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
> to implement RCU-safe list iteration and get rid of both locks.

Making it lockless is certainly better, but doing lockless the right way
is tricky. I will probably keep it as it is unless there is a significant
advantage in doing so.

Cheers,
Longman

>
> As for the patch: Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Thanks!
>
> --
> Vlad Rezki
>

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Matthew Wilcox @ 2020-12-13 21:51 UTC
To: Uladzislau Rezki
Cc: Waiman Long, Andrew Morton, linux-mm, linux-kernel

On Sun, Dec 13, 2020 at 07:39:36PM +0100, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
> > When multiple locks are acquired, they should be released in reverse
> > order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> > case.
> >
> > s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
> > s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
> >
> > This unlock sequence, though allowed, is not optimal. If a waiter is
> > present, mutex_unlock() will need to go through the slowpath of waking
> > up the waiter with preemption disabled. Fix that by releasing the
> > spinlock first before the mutex.
> >
> > Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> > Signed-off-by: Waiman Long <longman@redhat.com>
> > ---
> >  mm/vmalloc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 6ae491a8b210..75913f685c71 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
> >  }
> >
> >  static void s_stop(struct seq_file *m, void *p)
> > -	__releases(&vmap_purge_lock)
> >  	__releases(&vmap_area_lock)
> > +	__releases(&vmap_purge_lock)
> >  {
> > -	mutex_unlock(&vmap_purge_lock);
> >  	spin_unlock(&vmap_area_lock);
> > +	mutex_unlock(&vmap_purge_lock);
> >  }
> >
> >  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> BTW, if navigation over both lists is an issue, for example when there
> are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
> to implement RCU-safe list iteration and get rid of both locks.

If we need to iterate the list efficiently, I'd suggest getting rid of
the list and using an xarray instead. Maybe a maple tree, once that code
is better exercised.

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Uladzislau Rezki @ 2020-12-14 15:11 UTC
To: Matthew Wilcox
Cc: Uladzislau Rezki, Waiman Long, Andrew Morton, linux-mm, linux-kernel

On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> On Sun, Dec 13, 2020 at 07:39:36PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
> > > When multiple locks are acquired, they should be released in reverse
> > > order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> > > case.
> > >
> > > s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
> > > s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
> > >
> > > This unlock sequence, though allowed, is not optimal. If a waiter is
> > > present, mutex_unlock() will need to go through the slowpath of waking
> > > up the waiter with preemption disabled. Fix that by releasing the
> > > spinlock first before the mutex.
> > >
> > > Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >  mm/vmalloc.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 6ae491a8b210..75913f685c71 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
> > >  }
> > >
> > >  static void s_stop(struct seq_file *m, void *p)
> > > -	__releases(&vmap_purge_lock)
> > >  	__releases(&vmap_area_lock)
> > > +	__releases(&vmap_purge_lock)
> > >  {
> > > -	mutex_unlock(&vmap_purge_lock);
> > >  	spin_unlock(&vmap_area_lock);
> > > +	mutex_unlock(&vmap_purge_lock);
> > >  }
> > >
> > >  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> > BTW, if navigation over both lists is an issue, for example when there
> > are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
> > to implement RCU-safe list iteration and get rid of both locks.
>
> If we need to iterate the list efficiently, I'd suggest getting rid of
> the list and using an xarray instead. Maybe a maple tree, once that code
> is better exercised.
>
Not really efficiently. We just need a full scan of it, propagating the
information about mapped and un-purged areas to user space applications.

For example, an RCU-safe list is what we need, IMHO. On the other hand, I
am not sure if xarray is RCU safe in the context of concurrently
removing/adding an element (xa_remove()/xa_insert()) and scanning like
xa_for_each_XXX().

--
Vlad Rezki

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Matthew Wilcox @ 2020-12-14 15:37 UTC
To: Uladzislau Rezki
Cc: Waiman Long, Andrew Morton, linux-mm, linux-kernel

On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > If we need to iterate the list efficiently, I'd suggest getting rid of
> > the list and using an xarray instead. Maybe a maple tree, once that code
> > is better exercised.
>
> Not really efficiently. We just need a full scan of it, propagating the
> information about mapped and un-purged areas to user space applications.
>
> For example, an RCU-safe list is what we need, IMHO. On the other hand, I
> am not sure if xarray is RCU safe in the context of concurrently
> removing/adding an element (xa_remove()/xa_insert()) and scanning like
> xa_for_each_XXX().

It's as RCU safe as an RCU-safe list. Specifically, it guarantees:

 - If an element is present at all times between the start and the
   end of the iteration, it will appear in the iteration.
 - No element will appear more than once.
 - No element will appear in the iteration that was never present.
 - The iteration will terminate.

If an element is added or removed between the start and end of the
iteration, it may or may not appear. Causality is not guaranteed (e.g.,
if modification A is made before modification B, modification B may
be reflected in the iteration while modification A is not).

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Uladzislau Rezki @ 2020-12-14 17:56 UTC
To: Matthew Wilcox
Cc: Uladzislau Rezki, Waiman Long, Andrew Morton, linux-mm, linux-kernel

On Mon, Dec 14, 2020 at 03:37:46PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > > If we need to iterate the list efficiently, I'd suggest getting rid of
> > > the list and using an xarray instead. Maybe a maple tree, once that code
> > > is better exercised.
> >
> > Not really efficiently. We just need a full scan of it, propagating the
> > information about mapped and un-purged areas to user space applications.
> >
> > For example, an RCU-safe list is what we need, IMHO. On the other hand, I
> > am not sure if xarray is RCU safe in the context of concurrently
> > removing/adding an element (xa_remove()/xa_insert()) and scanning like
> > xa_for_each_XXX().
>
> It's as RCU safe as an RCU-safe list. Specifically, it guarantees:
>
>  - If an element is present at all times between the start and the
>    end of the iteration, it will appear in the iteration.
>  - No element will appear more than once.
>  - No element will appear in the iteration that was never present.
>  - The iteration will terminate.
>
> If an element is added or removed between the start and end of the
> iteration, it may or may not appear. Causality is not guaranteed (e.g.,
> if modification A is made before modification B, modification B may
> be reflected in the iteration while modification A is not).
>
Thank you for the information! Making use of xarray would require a
migration from our current vmap_area_root RB-tree to xarray. It probably
makes sense if there are performance benefits from such migration work.

Apparently, running the vmalloc benchmark shows quite a big degradation:

# X-array
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real 0m18.928s
user 0m0.017s
sys 0m0.004s
urezki@pc638:~$
[ 90.103768] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1275773 usec
[ 90.103771] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1439371 usec
[ 90.103772] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 9138051 usec
[ 90.103773] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 4821400 usec
[ 90.103774] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 2181207 usec
[ 90.103775] All test took CPU0=69774784667 cycles

# RB-tree
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real 0m13.975s
user 0m0.013s
sys 0m0.010s
urezki@pc638:~$
[ 26.633372] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 429836 usec
[ 26.633375] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 566042 usec
[ 26.633377] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 7663974 usec
[ 26.633378] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 3853388 usec
[ 26.633379] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1370097 usec
[ 26.633380] All test took CPU0=51370095742 cycles

I suspect xa_load() does provide O(log(n)) search time?

--
Vlad Rezki
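
To put the two benchmark runs above side by side, the relative slowdown can be computed directly from the posted numbers. The following is a small sketch, not part of the thread: `slowdown_percent()`, `struct result` and `print_slowdowns()` are hypothetical helper names, and the figures in the table are copied from the two kernel ring buffer summaries quoted above. From the totals, the xarray run appears to take roughly 36% more cycles than the RB-tree run.

```c
#include <assert.h>
#include <stdio.h>

/* Relative slowdown of `candidate` versus `baseline`, in percent. */
double slowdown_percent(double candidate, double baseline)
{
	return (candidate / baseline - 1.0) * 100.0;
}

/* Average time per test in usec, as posted in the thread. */
struct result {
	const char *name;
	double xarray_usec;
	double rbtree_usec;
};

static const struct result results[] = {
	{ "fix_size_alloc_test",       1275773.0,  429836.0 },
	{ "full_fit_alloc_test",       1439371.0,  566042.0 },
	{ "long_busy_list_alloc_test", 9138051.0, 7663974.0 },
	{ "random_size_alloc_test",    4821400.0, 3853388.0 },
	{ "fix_align_alloc_test",      2181207.0, 1370097.0 },
};

/* Prints one line per test plus the total cycle comparison. */
void print_slowdowns(void)
{
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%-28s %+7.1f%%\n", results[i].name,
		       slowdown_percent(results[i].xarray_usec,
					results[i].rbtree_usec));

	/* Total cycles on CPU0: xarray run versus RB-tree run. */
	printf("%-28s %+7.1f%%\n", "total cycles",
	       slowdown_percent(69774784667.0, 51370095742.0));
}
```

Calling `print_slowdowns()` from a `main()` prints the per-test figures. These are single runs (repeat: 1) on one machine, so the percentages are indicative only.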

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: David Hildenbrand @ 2020-12-14 9:39 UTC
To: Waiman Long, Andrew Morton, Uladzislau Rezki (Sony)
Cc: linux-mm, linux-kernel

On 13.12.20 19:08, Waiman Long wrote:
> When multiple locks are acquired, they should be released in reverse
> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> case.
>
> s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
> s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>
> This unlock sequence, though allowed, is not optimal. If a waiter is
> present, mutex_unlock() will need to go through the slowpath of waking
> up the waiter with preemption disabled. Fix that by releasing the
> spinlock first before the mutex.
>
> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")

I'm not sure if this classifies as "Fixes". As you correctly state "is
not optimal". But yeah, releasing a spinlock after releasing a mutex
looks weird already.

> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/vmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..75913f685c71 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>  }
>
>  static void s_stop(struct seq_file *m, void *p)
> -	__releases(&vmap_purge_lock)
>  	__releases(&vmap_area_lock)
> +	__releases(&vmap_purge_lock)
>  {
> -	mutex_unlock(&vmap_purge_lock);
>  	spin_unlock(&vmap_area_lock);
> +	mutex_unlock(&vmap_purge_lock);
>  }
>
>  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
>

Reviewed-by: David Hildenbrand <david@redhat.com>

--
Thanks,

David / dhildenb

* Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()
From: Waiman Long @ 2020-12-14 15:05 UTC
To: David Hildenbrand, Andrew Morton, Uladzislau Rezki (Sony)
Cc: linux-mm, linux-kernel

On 12/14/20 4:39 AM, David Hildenbrand wrote:
> On 13.12.20 19:08, Waiman Long wrote:
>> When multiple locks are acquired, they should be released in reverse
>> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
>> case.
>>
>> s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>> s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>>
>> This unlock sequence, though allowed, is not optimal. If a waiter is
>> present, mutex_unlock() will need to go through the slowpath of waking
>> up the waiter with preemption disabled. Fix that by releasing the
>> spinlock first before the mutex.
>>
>> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> I'm not sure if this classifies as "Fixes". As you correctly state "is
> not optimal". But yeah, releasing a spinlock after releasing a mutex
> looks weird already.
>
Yes, it may not be technically a real bug fix. However, the order just
doesn't look right. That is why I sent out a patch to address that.

Cheers,
Longman