* [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-07-10 12:20 UTC
To: akpm
Cc: linux-mm, linux-kernel, jiaoxupo, zhang.zhengming,
zhang.zhansheng, shaohaojize, zhangchun
Using kmap_high and kmap_xxx or kunmap_xxx on different cores at the same
time may cause a deadlock. The issue looks like this:

CPU 0:                                  CPU 1:
kmap_high() {                           kmap_xxx() {
  ...                                     irq_disable();
  spin_lock(&kmap_lock)
  ...
  map_new_virtual                         ...
    flush_all_zero_pkmaps
      flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
        smp_call_function_many            spin_lock(&kmap_lock)
  ...                                     ...
  spin_unlock(&kmap_lock)
  ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can never
answer the IPI. Fix this by releasing the kmap_lock before calling
flush_tlb_kernel_range, avoiding the kmap_lock deadlock.
Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
mm/highmem.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/highmem.c b/mm/highmem.c
index bd48ba4..841b370 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }

 void __kmap_flush_unused(void)
--
1.8.3.1
* Re: [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: Andrew Morton @ 2024-07-10 17:36 UTC
To: zhangchun
Cc: linux-mm, linux-kernel, jiaoxupo, zhang.zhengming,
zhang.zhansheng, shaohaojize
On Wed, 10 Jul 2024 20:20:28 +0800 zhangchun <zhang.chuna@h3c.com> wrote:
> Using kmap_high and kmap_xxx or kunmap_xxx on different cores at the
> same time may cause a deadlock. The issue looks like this:
What is kmap_XXX?
> CPU 0:                                  CPU 1:
> kmap_high() {                           kmap_xxx() {
>   ...                                     irq_disable();
>   spin_lock(&kmap_lock)
>   ...
>   map_new_virtual                         ...
>     flush_all_zero_pkmaps
>       flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
>         smp_call_function_many            spin_lock(&kmap_lock)
>   ...                                     ...
>   spin_unlock(&kmap_lock)
>   ...
>
> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
> CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can never
> answer the IPI. Fix this by releasing the kmap_lock before calling
> flush_tlb_kernel_range, avoiding the kmap_lock deadlock.
>
> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Wow, that's 15 years old. Has the deadlock been observed?
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>  		set_page_address(page, NULL);
>  		need_flush = 1;
>  	}
> -	if (need_flush)
> +	if (need_flush) {
> +		unlock_kmap();
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +		lock_kmap();
> +	}
>  }
Why is dropping the lock like this safe? What data is it protecting
and why is it OK to leave that data unprotected here?
* [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-07-11 7:07 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
>> Using kmap_high and kmap_xxx or kunmap_xxx on different cores at the
>> same time may cause a deadlock. The issue looks like this:
>What is kmap_XXX?
kmap/kunmap.
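(For context, the kmap side of the race is just the classic highmem usage
pattern; a minimal sketch of that pattern, not code from the machine that
crashed:)

	struct page *page = alloc_page(GFP_HIGHUSER);
	void *vaddr = kmap(page);	/* a highmem page routes to kmap_high(),
					 * which takes kmap_lock */
	memset(vaddr, 0, PAGE_SIZE);
	kunmap(page);			/* kunmap_high() takes kmap_lock again */
	__free_page(page);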
>> CPU 0:                                  CPU 1:
>> kmap_high() {                           kmap_xxx() {
>>   ...                                     irq_disable();
>>   spin_lock(&kmap_lock)
>>   ...
>>   map_new_virtual                         ...
>>     flush_all_zero_pkmaps
>>       flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
>>         smp_call_function_many            spin_lock(&kmap_lock)
>>   ...                                     ...
>>   spin_unlock(&kmap_lock)
>>   ...
>>
>> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
>> CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can
>> never answer the IPI. Fix this by releasing the kmap_lock before calling
>> flush_tlb_kernel_range, avoiding the kmap_lock deadlock.
>>
>> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
>Wow, that's 15 years old. Has the deadlock been observed?
Yes. We observed a device crash caused by exactly this deadlock.
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>  		set_page_address(page, NULL);
>>  		need_flush = 1;
>>  	}
>> -	if (need_flush)
>> +	if (need_flush) {
>> +		unlock_kmap();
>>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +		lock_kmap();
>> +	}
>>  }
>Why is dropping the lock like this safe? What data is it protecting and why is it OK to
>leave that data unprotected here?
kmap_lock protects pkmap_count, pkmap_page_table and the static variable
last_pkmap_nr. flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP))
neither reads nor modifies any of these variables, so leaving them
unprotected across that call is safe.
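For reference, pkmap_count and the lock helpers the patch uses sit at the
top of mm/highmem.c; roughly (shape as in mainline, introduced by the
commit named in Fixes:, details trimmed):

	static int pkmap_count[LAST_PKMAP];	/* entry users, guarded by kmap_lock */

	#ifdef ARCH_NEEDS_KMAP_HIGH_GET
	#define lock_kmap()	spin_lock_irq(&kmap_lock)
	#define unlock_kmap()	spin_unlock_irq(&kmap_lock)
	#else
	#define lock_kmap()	spin_lock(&kmap_lock)
	#define unlock_kmap()	spin_unlock(&kmap_lock)
	#endif

Note that the ARCH_NEEDS_KMAP_HIGH_GET variant takes kmap_lock with irqs
disabled, which matches the irq-disabled acquirer on the CPU 1 side of
the diagram above.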
--
1.8.3.1
* Re: [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: Andrew Morton @ 2024-07-11 21:13 UTC
To: zhangchun
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng,
zhang.zhengming
On Thu, 11 Jul 2024 15:07:56 +0800 zhangchun <zhang.chuna@h3c.com> wrote:
> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
> >>  		set_page_address(page, NULL);
> >>  		need_flush = 1;
> >>  	}
> >> -	if (need_flush)
> >> +	if (need_flush) {
> >> +		unlock_kmap();
> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +		lock_kmap();
> >> +	}
> >>  }
>
> >Why is dropping the lock like this safe? What data is it protecting and why is it OK to
> >leave that data unprotected here?
>
> kmap_lock protects pkmap_count, pkmap_page_table and the static variable
> last_pkmap_nr. flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP))
> neither reads nor modifies any of these variables, so leaving them
> unprotected across that call is safe.
No, the risk here is that when the lock is dropped, other threads will
modify the data. And when this thread (the one running
flush_all_zero_pkmaps()) retakes the lock, that data may now be
unexpectedly altered.
* [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-07-12 7:54 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>> >>  		set_page_address(page, NULL);
>> >>  		need_flush = 1;
>> >>  	}
>> >> -	if (need_flush)
>> >> +	if (need_flush) {
>> >> +		unlock_kmap();
>> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +		lock_kmap();
>> >> +	}
>> >>  }
>>
>> >Why is dropping the lock like this safe? What data is it protecting
>> >and why is it OK to leave that data unprotected here?
>>
>> kmap_lock protects pkmap_count, pkmap_page_table and the static variable
>> last_pkmap_nr. flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP))
>> neither reads nor modifies any of these variables, so leaving them
>> unprotected across that call is safe.
>No, the risk here is that when the lock is dropped, other threads will modify the data. And when this thread (the one running
>flush_all_zero_pkmaps()) retakes the lock, that data may now be unexpectedly altered.
map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr].
The kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is being
read or modified. The check "if (!pkmap_count[last_pkmap_nr])" determines
whether pkmap_count[last_pkmap_nr] is usable; if it is not, the loop tries
the next entry.
Furthermore, the static variable last_pkmap_nr is snapshotted into a local
variable (also named last_pkmap_nr) while the kmap_lock is held, so the
index itself is thread-safe.
In the extreme case where Thread A and Thread B pick the same last_pkmap_nr:
Thread A calls flush_tlb_kernel_range and releases the kmap_lock, then
Thread B acquires the kmap_lock and modifies pkmap_count[last_pkmap_nr].
When Thread A finishes flush_tlb_kernel_range and retakes the lock, it
re-checks pkmap_count[last_pkmap_nr]; if pkmap_count[last_pkmap_nr] != 0,
Thread A simply calls get_next_pkmap_nr and moves on to the next entry.
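(For reference, get_next_pkmap_nr is what makes the local snapshot work;
shape as in mainline mm/highmem.c, comments added here:)

	static inline unsigned int get_next_pkmap_nr(unsigned int color)
	{
		static unsigned int last_pkmap_nr; /* only advanced under kmap_lock */

		last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
		return last_pkmap_nr; /* caller keeps this snapshot in its own local */
	}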
static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr;	/* local snapshot of the static counter */
	unsigned int color = get_pkmap_color(page);

start:
	count = get_pkmap_entries_count(color);
	/* Find an empty entry */
	for (;;) {
		last_pkmap_nr = get_next_pkmap_nr(color);
		if (no_more_pkmaps(last_pkmap_nr, color)) {
			/* drops kmap_lock around the TLB flush, then retakes it */
			flush_all_zero_pkmaps();
			count = get_pkmap_entries_count(color);
		}
		if (!pkmap_count[last_pkmap_nr])
			break;	/* Found a usable entry */
		if (--count)
			continue;
		...
	}
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}
--
1.8.3.1
* [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-07-18 16:18 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
Very sorry to disturb! Just a friendly ping to check on the status of the
patch "mm: Release kmap_lock before calling flush_tlb_kernel_range to
avoid kmap_high deadlock". Please let me know if any additional
information is needed from my side. I sincerely look forward to your
suggestions and guidance!
--
1.8.3.1
* Re: [PATCH v2] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: Andrew Morton @ 2024-07-24 0:26 UTC
To: zhangchun
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng,
zhang.zhengming
On Fri, 19 Jul 2024 00:18:32 +0800 zhangchun <zhang.chuna@h3c.com> wrote:
> Very sorry to disturb! Just a friendly ping to check on the status of the
> patch "mm: Release kmap_lock before calling flush_tlb_kernel_range to
> avoid kmap_high deadlock". Please let me know if any additional
> information is needed from my side.
>
Please update the patch to include a code comment which explains why
we're dropping kmap_lock at this point. And please update the
changelog to explain why holding kmap_lock was not necessary at this
point and why this is a safe change. Then resend.
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-07-29 12:13 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
CPU 0:                                  CPU 1:
kmap_high() {                           kmap_xxx() {
  ...                                     irq_disable();
  spin_lock(&kmap_lock)
  ...
  map_new_virtual                         ...
    flush_all_zero_pkmaps
      flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
        smp_call_function_many            spin_lock(&kmap_lock)
  ...                                     ...
  spin_unlock(&kmap_lock)
  ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can never
answer the IPI. Fix this by releasing the kmap_lock before calling
flush_tlb_kernel_range, avoiding the kmap_lock deadlock:
	if (need_flush) {
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}
Dropping the lock like this is safe. kmap_lock protects pkmap_count,
pkmap_page_table and the static variable last_pkmap_nr.
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither
reads nor modifies any of these variables, so leaving them unprotected
across that call is safe.
map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr].
The kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is being
read or modified. The check "if (!pkmap_count[last_pkmap_nr])" determines
whether pkmap_count[last_pkmap_nr] is usable; if it is not, the loop tries
the next entry. Furthermore, the static variable last_pkmap_nr is
snapshotted into a local variable (also named last_pkmap_nr) while the
kmap_lock is held, so the index itself is thread-safe.
In the extreme case where Thread A and Thread B pick the same
last_pkmap_nr: Thread A calls flush_tlb_kernel_range and releases the
kmap_lock, then Thread B acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. When Thread A finishes flush_tlb_kernel_range
and retakes the lock, it re-checks pkmap_count[last_pkmap_nr] and, if it
is nonzero, calls get_next_pkmap_nr to move on to the next entry.
static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr;	/* local snapshot of the static counter */
	unsigned int color = get_pkmap_color(page);

start:
	count = get_pkmap_entries_count(color);
	/* Find an empty entry */
	for (;;) {
		last_pkmap_nr = get_next_pkmap_nr(color);
		if (no_more_pkmaps(last_pkmap_nr, color)) {
			/* drops kmap_lock around the TLB flush, then retakes it */
			flush_all_zero_pkmaps();
			count = get_pkmap_entries_count(color);
		}
		if (!pkmap_count[last_pkmap_nr])
			break;	/* Found a usable entry */
		if (--count)
			continue;
		...
	}
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}
Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
mm/highmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On a multi-core system one CPU may hold the kmap_lock while
+		 * waiting for the other CPUs to respond to an IPI. If another
+		 * CPU has disabled irqs and is itself waiting for the
+		 * kmap_lock, it can never answer that IPI. Release the
+		 * kmap_lock around flush_tlb_kernel_range to avoid the deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }

 void __kmap_flush_unused(void)
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-08-08 10:17 UTC
To: zhang.chuna
Cc: akpm, jiaoxupo, linux-kernel, linux-mm, shaohaojize,
zhang.zhansheng, zhang.zhengming
Very sorry to disturb! Just a friendly ping!
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-08-08 10:32 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
Very sorry to disturb! Just a friendly ping!
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-08-19 16:10 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.chuna,
zhang.zhansheng, zhang.zhengming
CPU 0:                                  CPU 1:
kmap_high() {                           kmap_xxx() {
  ...                                     irq_disable();
  spin_lock(&kmap_lock)
  ...
  map_new_virtual                         ...
    flush_all_zero_pkmaps
      flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
        smp_call_function_many            spin_lock(&kmap_lock)
  ...                                     ...
  spin_unlock(&kmap_lock)
  ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can never
answer the IPI. Fix this by releasing the kmap_lock before calling
flush_tlb_kernel_range, avoiding the kmap_lock deadlock. Like this:
	if (need_flush) {
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}
Dropping the lock is safe. kmap_lock protects pkmap_count, pkmap_page_table
and the static variable last_pkmap_nr.
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither reads
nor modifies any of these variables, so leaving them unprotected across
that call is safe.
map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr].
The kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is being
read or modified. The check "if (!pkmap_count[last_pkmap_nr])" determines
whether pkmap_count[last_pkmap_nr] is usable; if it is not, the loop tries
the next entry. Furthermore, the static variable last_pkmap_nr is
snapshotted into a local variable (also named last_pkmap_nr) while the
kmap_lock is held, so the index itself is thread-safe.
In the extreme case where Thread A and Thread B pick the same
last_pkmap_nr: Thread A calls flush_tlb_kernel_range and releases the
kmap_lock, then Thread B acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. When Thread A finishes flush_tlb_kernel_range
and retakes the lock, it re-checks pkmap_count[last_pkmap_nr] and, if it
is nonzero, calls get_next_pkmap_nr to move on to the next entry.
static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr;	/* local snapshot of the static counter */
	unsigned int color = get_pkmap_color(page);

start:
	count = get_pkmap_entries_count(color);
	/* Find an empty entry */
	for (;;) {
		last_pkmap_nr = get_next_pkmap_nr(color);
		if (no_more_pkmaps(last_pkmap_nr, color)) {
			/* drops kmap_lock around the TLB flush, then retakes it */
			flush_all_zero_pkmaps();
			count = get_pkmap_entries_count(color);
		}
		if (!pkmap_count[last_pkmap_nr])
			break;	/* Found a usable entry */
		if (--count)
			continue;
		...
	}
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}
Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
mm/highmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On a multi-core system one CPU may hold the kmap_lock while
+		 * waiting for the other CPUs to respond to an IPI. If another
+		 * CPU has disabled irqs and is itself waiting for the
+		 * kmap_lock, it can never answer that IPI. Release the
+		 * kmap_lock around flush_tlb_kernel_range to avoid the deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }

 void __kmap_flush_unused(void)
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-09-03 11:52 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng,
zhang.zhengming, zhangchun
Very sorry to disturb! Just a friendly ping!
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-10-08 3:19 UTC
To: zhang.chuna
Cc: akpm, jiaoxupo, linux-kernel, linux-mm, shaohaojize,
zhang.zhansheng, zhang.zhengming
Very sorry to disturb! Just a friendly ping! This deadlock bug needs to be
fixed! If any additional info is needed, please contact me. Looking forward
to your reply!
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-10-08 3:23 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng,
zhang.zhengming, zhangchun
Very sorry to disturb! Just a friendly ping! This deadlock bug needs to be
fixed! If any additional info is needed, please contact me. Looking forward
to your reply!
--
1.8.3.1
* [PATCH v3] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-10-14 7:41 UTC
To: akpm
Cc: jiaoxupo, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng,
zhang.zhengming, zhangchun
Very sorry to disturb! Just a friendly ping! This deadlock bug really needs
to be fixed! If any additional info is needed, please contact me. Looking
forward to your reply!
--
1.8.3.1
* [PATCH v4] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2024-10-31 15:18 UTC
To: akpm
Cc: linux-kernel, linux-mm, shaohaojize, jiaoxupo, zhangchun,
zhangzhansheng
CPU 0:                                  CPU 1:
kmap_high() {                           kmap_xxx() {
  ...                                     irq_disable();
  spin_lock(&kmap_lock)
  ...
  map_new_virtual                         ...
    flush_all_zero_pkmaps
      flush_tlb_kernel_range              /* CPU 0 holds the kmap_lock */
        smp_call_function_many            spin_lock(&kmap_lock)
  ...                                     ...
  spin_unlock(&kmap_lock)
  ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to an IPI. But
CPU 1 has disabled irqs and is spinning on the kmap_lock, so it can never
answer the IPI.
Fix this by releasing the kmap_lock before calling flush_tlb_kernel_range,
avoiding the kmap_lock deadlock:
	if (need_flush) {
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}
Dropping the lock like this is safe. kmap_lock protects pkmap_count,
pkmap_page_table and the static variable last_pkmap_nr.
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) neither
reads nor modifies any of these variables, so leaving them unprotected
across that call is safe.
map_new_virtual aims to find a usable entry, pkmap_count[last_pkmap_nr].
The kmap_lock is never dropped while pkmap_count[last_pkmap_nr] is being
read or modified. The check "if (!pkmap_count[last_pkmap_nr])" determines
whether pkmap_count[last_pkmap_nr] is usable; if it is not, the loop tries
the next entry. Furthermore, the static variable last_pkmap_nr is
snapshotted into a local variable (also named last_pkmap_nr) while the
kmap_lock is held, so the index itself is thread-safe.
In the extreme case where Thread A and Thread B pick the same
last_pkmap_nr: Thread A calls flush_tlb_kernel_range and releases the
kmap_lock, then Thread B acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. When Thread A finishes flush_tlb_kernel_range
and retakes the lock, it re-checks pkmap_count[last_pkmap_nr] and, if it
is nonzero, calls get_next_pkmap_nr to move on to the next entry.
static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
	unsigned int last_pkmap_nr;	/* local snapshot of the static counter */
	unsigned int color = get_pkmap_color(page);

start:
	count = get_pkmap_entries_count(color);
	/* Find an empty entry */
	for (;;) {
		last_pkmap_nr = get_next_pkmap_nr(color);
		if (no_more_pkmaps(last_pkmap_nr, color)) {
			/* drops kmap_lock around the TLB flush, then retakes it */
			flush_all_zero_pkmaps();
			count = get_pkmap_entries_count(color);
		}
		if (!pkmap_count[last_pkmap_nr])
			break;	/* Found a usable entry */
		if (--count)
			continue;
		...
	}
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	...
	return vaddr;
}
Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
mm/highmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On a multi-core system one CPU may hold the kmap_lock while
+		 * waiting for the other CPUs to respond to an IPI. If another
+		 * CPU has disabled irqs and is itself waiting for the
+		 * kmap_lock, it can never answer that IPI. Release the
+		 * kmap_lock around flush_tlb_kernel_range to avoid the deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }

 void __kmap_flush_unused(void)
--
1.8.3.1
* [PATCH v4] mm: Release kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
From: zhangchun @ 2025-02-18 6:30 UTC
To: akpm
Cc: zhang.chuna, linux-kernel, linux-mm, shaohaojize, zhang.zhansheng
Very sorry to disturb! Just a friendly ping! Half a year has passed and
this deadlock bug still needs to be fixed! If any additional info is
needed, please contact me. Looking forward to your reply!
--
1.8.3.1