* [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
@ 2024-10-04 14:25 Jeongjun Park
2024-10-04 14:34 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Jeongjun Park @ 2024-10-04 14:25 UTC
To: akpm
Cc: kasong, linux-mm, linux-kernel, syzbot+fa43f1b63e3aa6f66329,
Jeongjun Park
A report [1] was uploaded from syzbot.
In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
from folio without folio_lock protection.
In the currently reported KCSAN log, it is assumed that the actual data-race
will not occur because the calltrace that does WRITE already obtains the
folio_lock and then writes.
However, the existing __try_to_reclaim_swap() function was already implemented
to perform reads under folio_lock protection [1], and there is a risk of a
data-race occurring through a function other than the one shown in the KCSAN
log.
Therefore, I think it is appropriate to change all read operations for
folio to be performed under folio_lock.
[1]
==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
__delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
free_swap_cache mm/swap_state.c:293 [inline]
free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
__tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
zap_pte_range mm/memory.c:1700 [inline]
zap_pmd_range mm/memory.c:1739 [inline]
zap_pud_range mm/memory.c:1768 [inline]
zap_p4d_range mm/memory.c:1789 [inline]
unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
exit_mmap+0x18a/0x690 mm/mmap.c:1864
__mmput+0x28/0x1b0 kernel/fork.c:1347
mmput+0x4c/0x60 kernel/fork.c:1369
exit_mm+0xe4/0x190 kernel/exit.c:571
do_exit+0x55e/0x17f0 kernel/exit.c:926
do_group_exit+0x102/0x150 kernel/exit.c:1088
get_signal+0xf2a/0x1070 kernel/signal.c:2917
arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
__try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
zap_pte_range mm/memory.c:1656 [inline]
zap_pmd_range mm/memory.c:1739 [inline]
zap_pud_range mm/memory.c:1768 [inline]
zap_p4d_range mm/memory.c:1789 [inline]
unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
exit_mmap+0x18a/0x690 mm/mmap.c:1864
__mmput+0x28/0x1b0 kernel/fork.c:1347
mmput+0x4c/0x60 kernel/fork.c:1369
exit_mm+0xe4/0x190 kernel/exit.c:571
do_exit+0x55e/0x17f0 kernel/exit.c:926
__do_sys_exit kernel/exit.c:1055 [inline]
__se_sys_exit kernel/exit.c:1053 [inline]
__x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x0000000000000242 -> 0x0000000000000000
Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
mm/swapfile.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..904c21256fc2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
folio = filemap_get_folio(address_space, swap_cache_index(entry));
if (IS_ERR(folio))
return 0;
-
- /* offset could point to the middle of a large folio */
- entry = folio->swap;
- offset = swp_offset(entry);
- nr_pages = folio_nr_pages(folio);
- ret = -nr_pages;
-
/*
* When this function is called from scan_swap_map_slots() and it's
* called by vmscan.c at reclaiming folios. So we hold a folio lock
@@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
if (!folio_trylock(folio))
goto out;
+ /* offset could point to the middle of a large folio */
+ entry = folio->swap;
+ offset = swp_offset(entry);
+ nr_pages = folio_nr_pages(folio);
+ ret = -nr_pages;
+
need_reclaim = ((flags & TTRS_ANYWAY) ||
((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
--
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-04 14:25 [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap Jeongjun Park
@ 2024-10-04 14:34 ` Matthew Wilcox
2024-10-04 14:50 ` Jeongjun Park
2024-10-06 20:15 ` Kairui Song
2024-10-07 5:06 ` kernel test robot
2 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2024-10-04 14:34 UTC
To: Jeongjun Park
Cc: akpm, kasong, linux-mm, linux-kernel, syzbot+fa43f1b63e3aa6f66329
On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> A report [1] was uploaded from syzbot.
>
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> from folio without folio_lock protection.
Umm. You don't need folio_lock to read nr_pages. Holding a refcount
is sufficient to stabilise nr_pages. I cannot speak to folio->swap
though (and the KCSAN report does appear to be pointing to folio->swap).
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-04 14:34 ` Matthew Wilcox
@ 2024-10-04 14:50 ` Jeongjun Park
0 siblings, 0 replies; 8+ messages in thread
From: Jeongjun Park @ 2024-10-04 14:50 UTC
To: Matthew Wilcox
Cc: akpm, kasong, linux-mm, linux-kernel, syzbot+fa43f1b63e3aa6f66329
Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> > A report [1] was uploaded from syzbot.
> >
> > In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> > slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> > from folio without folio_lock protection.
>
> Umm. You don't need folio_lock to read nr_pages. Holding a refcount
> is sufficient to stabilise nr_pages. I cannot speak to folio->swap
> though (and the KCSAN report does appear to be pointing to folio->swap).
>
That's right. It looks like KCSAN log occurs when reading folio->swap.
In fact, since most of the code reads folio->swap under the protection
of folio_lock, it is possible to modify only the part that reads folio->swap
and the code that reads offset to operate under the protection of
folio_lock.
However, even if reading nr_pages does not require folio_lock, I don't
think it is very desirable to modify only this code to not be protected
by folio_lock.
Regards,
Jeongjun Park
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-04 14:25 [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap Jeongjun Park
2024-10-04 14:34 ` Matthew Wilcox
@ 2024-10-06 20:15 ` Kairui Song
2024-10-07 0:49 ` Jeongjun Park
2024-10-07 5:06 ` kernel test robot
2 siblings, 1 reply; 8+ messages in thread
From: Kairui Song @ 2024-10-06 20:15 UTC
To: Jeongjun Park; +Cc: akpm, linux-mm, linux-kernel, syzbot+fa43f1b63e3aa6f66329
On Fri, Oct 4, 2024 at 10:26 PM Jeongjun Park <aha310510@gmail.com> wrote:
>
> A report [1] was uploaded from syzbot.
>
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> from folio without folio_lock protection.
>
> In the currently reported KCSAN log, it is assumed that the actual data-race
> will not occur because the calltrace that does WRITE already obtains the
> folio_lock and then writes.
>
> However, the existing __try_to_reclaim_swap() function was already implemented
> to perform reads under folio_lock protection [1], and there is a risk of a
> data-race occurring through a function other than the one shown in the KCSAN
> log.
>
> Therefore, I think it is appropriate to change all read operations for
> folio to be performed under folio_lock.
>
> [1]
>
> ==================================================================
> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
>
> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
> __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
> delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
> folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
> free_swap_cache mm/swap_state.c:293 [inline]
> free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
> __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
> tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
> tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
> tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
> zap_pte_range mm/memory.c:1700 [inline]
> zap_pmd_range mm/memory.c:1739 [inline]
> zap_pud_range mm/memory.c:1768 [inline]
> zap_p4d_range mm/memory.c:1789 [inline]
> unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
> exit_mmap+0x18a/0x690 mm/mmap.c:1864
> __mmput+0x28/0x1b0 kernel/fork.c:1347
> mmput+0x4c/0x60 kernel/fork.c:1369
> exit_mm+0xe4/0x190 kernel/exit.c:571
> do_exit+0x55e/0x17f0 kernel/exit.c:926
> do_group_exit+0x102/0x150 kernel/exit.c:1088
> get_signal+0xf2a/0x1070 kernel/signal.c:2917
> arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
> __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
> free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
> zap_pte_range mm/memory.c:1656 [inline]
> zap_pmd_range mm/memory.c:1739 [inline]
> zap_pud_range mm/memory.c:1768 [inline]
> zap_p4d_range mm/memory.c:1789 [inline]
> unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
> exit_mmap+0x18a/0x690 mm/mmap.c:1864
> __mmput+0x28/0x1b0 kernel/fork.c:1347
> mmput+0x4c/0x60 kernel/fork.c:1369
> exit_mm+0xe4/0x190 kernel/exit.c:571
> do_exit+0x55e/0x17f0 kernel/exit.c:926
> __do_sys_exit kernel/exit.c:1055 [inline]
> __se_sys_exit kernel/exit.c:1053 [inline]
> __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
> x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> value changed: 0x0000000000000242 -> 0x0000000000000000
>
> Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
> mm/swapfile.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0cded32414a1..904c21256fc2 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> folio = filemap_get_folio(address_space, swap_cache_index(entry));
> if (IS_ERR(folio))
> return 0;
> -
> - /* offset could point to the middle of a large folio */
> - entry = folio->swap;
> - offset = swp_offset(entry);
> - nr_pages = folio_nr_pages(folio);
> - ret = -nr_pages;
> -
> /*
> * When this function is called from scan_swap_map_slots() and it's
> * called by vmscan.c at reclaiming folios. So we hold a folio lock
> @@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> if (!folio_trylock(folio))
> goto out;
>
> + /* offset could point to the middle of a large folio */
> + entry = folio->swap;
> + offset = swp_offset(entry);
> + nr_pages = folio_nr_pages(folio);
> + ret = -nr_pages;
> +
> need_reclaim = ((flags & TTRS_ANYWAY) ||
> ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> --
>
Thanks for catching this!
This could lead to real problems: holding a reference is not enough to
protect folio->swap. There are several BUG_ONs later that will be
triggered if it changes.
But you still have to keep `nr_pages` and `ret` before the
`folio_trylock`, or `ret` will be uninitialized if folio_trylock
fails. This function should always return the number of pages even if
the trylock fails. And as Willy said, `folio_nr_pages` doesn't require
the folio lock.
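
The ordering described above can be modelled in userspace as a minimal sketch. Everything here is a hypothetical stand-in (mock_folio, mock_trylock, mock_unlock, mock_try_to_reclaim); it is not the kernel's folio API and not the v2 patch, which does not appear in this thread. It only illustrates the control flow: the page count and the default return value are set while holding just a reference, and the swap entry is read only once the lock is held.

```c
/*
 * Userspace model only. The struct and helpers are hypothetical
 * stand-ins, not the kernel's struct folio or its locking API.
 */
#include <assert.h>
#include <stdbool.h>

struct mock_folio {
	unsigned long swap_val;	/* stands in for folio->swap */
	int nr_pages;		/* assumed stable while a reference is held */
	bool locked;		/* stands in for the folio lock */
};

/* Non-blocking lock attempt, in the spirit of folio_trylock(). */
static bool mock_trylock(struct mock_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

static void mock_unlock(struct mock_folio *f)
{
	f->locked = false;
}

/*
 * Returns nr_pages when "reclaimed" and -nr_pages otherwise.
 * *entry_out is written only if the lock was taken.
 */
static int mock_try_to_reclaim(struct mock_folio *f, bool want_reclaim,
			       unsigned long *entry_out)
{
	int nr_pages = f->nr_pages;	/* read with only a reference held */
	int ret = -nr_pages;		/* initialized before any goto out */

	assert(nr_pages > 0);

	if (!mock_trylock(f))
		goto out;		/* ret is already -nr_pages here */

	*entry_out = f->swap_val;	/* read only under the lock */
	if (want_reclaim)
		ret = nr_pages;

	mock_unlock(f);
out:
	return ret;
}
```

With this shape, a failed trylock still returns a well-defined -nr_pages, and the lock-protected field is never read without the lock.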
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-06 20:15 ` Kairui Song
@ 2024-10-07 0:49 ` Jeongjun Park
0 siblings, 0 replies; 8+ messages in thread
From: Jeongjun Park @ 2024-10-07 0:49 UTC
To: Kairui Song; +Cc: akpm, linux-mm, linux-kernel, syzbot+fa43f1b63e3aa6f66329
Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Oct 4, 2024 at 10:26 PM Jeongjun Park <aha310510@gmail.com> wrote:
>>
>> A report [1] was uploaded from syzbot.
>>
>> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
>> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
>> from folio without folio_lock protection.
>>
>> In the currently reported KCSAN log, it is assumed that the actual data-race
>> will not occur because the calltrace that does WRITE already obtains the
>> folio_lock and then writes.
>>
>> However, the existing __try_to_reclaim_swap() function was already implemented
>> to perform reads under folio_lock protection [1], and there is a risk of a
>> data-race occurring through a function other than the one shown in the KCSAN
>> log.
>>
>> Therefore, I think it is appropriate to change all read operations for
>> folio to be performed under folio_lock.
>>
>> [1]
>>
>> ==================================================================
>> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
>>
>> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
>> __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
>> delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
>> folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
>> free_swap_cache mm/swap_state.c:293 [inline]
>> free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
>> __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
>> tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
>> tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
>> tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
>> zap_pte_range mm/memory.c:1700 [inline]
>> zap_pmd_range mm/memory.c:1739 [inline]
>> zap_pud_range mm/memory.c:1768 [inline]
>> zap_p4d_range mm/memory.c:1789 [inline]
>> unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
>> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>> exit_mmap+0x18a/0x690 mm/mmap.c:1864
>> __mmput+0x28/0x1b0 kernel/fork.c:1347
>> mmput+0x4c/0x60 kernel/fork.c:1369
>> exit_mm+0xe4/0x190 kernel/exit.c:571
>> do_exit+0x55e/0x17f0 kernel/exit.c:926
>> do_group_exit+0x102/0x150 kernel/exit.c:1088
>> get_signal+0xf2a/0x1070 kernel/signal.c:2917
>> arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>> syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
>> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>
>> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
>> __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
>> free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
>> zap_pte_range mm/memory.c:1656 [inline]
>> zap_pmd_range mm/memory.c:1739 [inline]
>> zap_pud_range mm/memory.c:1768 [inline]
>> zap_p4d_range mm/memory.c:1789 [inline]
>> unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
>> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>> exit_mmap+0x18a/0x690 mm/mmap.c:1864
>> __mmput+0x28/0x1b0 kernel/fork.c:1347
>> mmput+0x4c/0x60 kernel/fork.c:1369
>> exit_mm+0xe4/0x190 kernel/exit.c:571
>> do_exit+0x55e/0x17f0 kernel/exit.c:926
>> __do_sys_exit kernel/exit.c:1055 [inline]
>> __se_sys_exit kernel/exit.c:1053 [inline]
>> __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
>> x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>> do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>
>> value changed: 0x0000000000000242 -> 0x0000000000000000
>>
>> Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
>> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
>> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
>> ---
>> mm/swapfile.c | 13 ++++++-------
>> 1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 0cded32414a1..904c21256fc2 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>> folio = filemap_get_folio(address_space, swap_cache_index(entry));
>> if (IS_ERR(folio))
>> return 0;
>> -
>> - /* offset could point to the middle of a large folio */
>> - entry = folio->swap;
>> - offset = swp_offset(entry);
>> - nr_pages = folio_nr_pages(folio);
>> - ret = -nr_pages;
>> -
>> /*
>> * When this function is called from scan_swap_map_slots() and it's
>> * called by vmscan.c at reclaiming folios. So we hold a folio lock
>> @@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>> if (!folio_trylock(folio))
>> goto out;
>>
>> + /* offset could point to the middle of a large folio */
>> + entry = folio->swap;
>> + offset = swp_offset(entry);
>> + nr_pages = folio_nr_pages(folio);
>> + ret = -nr_pages;
>> +
>> need_reclaim = ((flags & TTRS_ANYWAY) ||
>> ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
>> ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
>> --
>>
>
> Thanks for catching this!
>
> This could lead to real problems: holding a reference is not enough to
> protect folio->swap. There are several BUG_ONs later that will be
> triggered if it changes.
>
> But you still have to keep `nr_pages` and `ret` before the
> `folio_trylock`, or `ret` will be uninitialized if folio_trylock
> fails. This function should always return the number of pages even if
> the trylock fails. And as Willy said, `folio_nr_pages` doesn't require
> the folio lock.
Oh, I see. After looking at the code again, I realized that
if we can get the folio, we should return a valid nr_pages
even if folio_trylock fails.
I'll send a v2 patch with that shortly.
Regards,
Jeongjun Park
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-04 14:25 [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap Jeongjun Park
2024-10-04 14:34 ` Matthew Wilcox
2024-10-06 20:15 ` Kairui Song
@ 2024-10-07 5:06 ` kernel test robot
2024-10-08 1:30 ` Andrew Morton
2 siblings, 1 reply; 8+ messages in thread
From: kernel test robot @ 2024-10-07 5:06 UTC
To: Jeongjun Park, akpm
Cc: llvm, oe-kbuild-all, kasong, linux-mm, linux-kernel,
syzbot+fa43f1b63e3aa6f66329, Jeongjun Park
Hi Jeongjun,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
203 | if (!folio_trylock(folio))
| ^~~~~~~~~~~~~~~~~~~~~
mm/swapfile.c:254:9: note: uninitialized use occurs here
254 | return ret;
| ^~~
mm/swapfile.c:203:2: note: remove the 'if' if its condition is always false
203 | if (!folio_trylock(folio))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
204 | goto out;
| ~~~~~~~~
mm/swapfile.c:190:9: note: initialize the variable 'ret' to silence this warning
190 | int ret, nr_pages;
| ^
| = 0
1 warning generated.
vim +203 mm/swapfile.c
bea67dcc5eea0f Barry Song 2024-08-08 177
a62fb92ac12ed3 Ryan Roberts 2024-04-08 178 /*
a62fb92ac12ed3 Ryan Roberts 2024-04-08 179 * returns number of pages in the folio that backs the swap entry. If positive,
a62fb92ac12ed3 Ryan Roberts 2024-04-08 180 * the folio was reclaimed. If negative, the folio was not reclaimed. If 0, no
a62fb92ac12ed3 Ryan Roberts 2024-04-08 181 * folio was associated with the swap entry.
a62fb92ac12ed3 Ryan Roberts 2024-04-08 182 */
bcd49e86710b42 Huang Ying 2018-10-26 183 static int __try_to_reclaim_swap(struct swap_info_struct *si,
bcd49e86710b42 Huang Ying 2018-10-26 184 unsigned long offset, unsigned long flags)
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 185 {
efa90a981bbc89 Hugh Dickins 2009-12-14 186 swp_entry_t entry = swp_entry(si->type, offset);
862590ac3708e1 Kairui Song 2024-07-30 187 struct address_space *address_space = swap_address_space(entry);
862590ac3708e1 Kairui Song 2024-07-30 188 struct swap_cluster_info *ci;
2c3f6194b008b2 Matthew Wilcox (Oracle 2022-09-02 189) struct folio *folio;
862590ac3708e1 Kairui Song 2024-07-30 190 int ret, nr_pages;
862590ac3708e1 Kairui Song 2024-07-30 191 bool need_reclaim;
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 192
862590ac3708e1 Kairui Song 2024-07-30 193 folio = filemap_get_folio(address_space, swap_cache_index(entry));
66dabbb65d673a Christoph Hellwig 2023-03-07 194 if (IS_ERR(folio))
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 195 return 0;
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 196 /*
bcd49e86710b42 Huang Ying 2018-10-26 197 * When this function is called from scan_swap_map_slots() and it's
2c3f6194b008b2 Matthew Wilcox (Oracle 2022-09-02 198) * called by vmscan.c at reclaiming folios. So we hold a folio lock
bcd49e86710b42 Huang Ying 2018-10-26 199 * here. We have to use trylock for avoiding deadlock. This is a special
2c3f6194b008b2 Matthew Wilcox (Oracle 2022-09-02 200) * case and you should use folio_free_swap() with explicit folio_lock()
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 201 * in usual operations.
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 202 */
862590ac3708e1 Kairui Song 2024-07-30 @203 if (!folio_trylock(folio))
862590ac3708e1 Kairui Song 2024-07-30 204 goto out;
862590ac3708e1 Kairui Song 2024-07-30 205
b2dbc30a2a909d Jeongjun Park 2024-10-04 206 /* offset could point to the middle of a large folio */
b2dbc30a2a909d Jeongjun Park 2024-10-04 207 entry = folio->swap;
b2dbc30a2a909d Jeongjun Park 2024-10-04 208 offset = swp_offset(entry);
b2dbc30a2a909d Jeongjun Park 2024-10-04 209 nr_pages = folio_nr_pages(folio);
b2dbc30a2a909d Jeongjun Park 2024-10-04 210 ret = -nr_pages;
b2dbc30a2a909d Jeongjun Park 2024-10-04 211
862590ac3708e1 Kairui Song 2024-07-30 212 need_reclaim = ((flags & TTRS_ANYWAY) ||
2c3f6194b008b2 Matthew Wilcox (Oracle 2022-09-02 213) ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
862590ac3708e1 Kairui Song 2024-07-30 214 ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
862590ac3708e1 Kairui Song 2024-07-30 215 if (!need_reclaim || !folio_swapcache_freeable(folio))
862590ac3708e1 Kairui Song 2024-07-30 216 goto out_unlock;
862590ac3708e1 Kairui Song 2024-07-30 217
862590ac3708e1 Kairui Song 2024-07-30 218 /*
862590ac3708e1 Kairui Song 2024-07-30 219 * It's safe to delete the folio from swap cache only if the folio's
862590ac3708e1 Kairui Song 2024-07-30 220 * swap_map is HAS_CACHE only, which means the slots have no page table
862590ac3708e1 Kairui Song 2024-07-30 221 * reference or pending writeback, and can't be allocated to others.
862590ac3708e1 Kairui Song 2024-07-30 222 */
862590ac3708e1 Kairui Song 2024-07-30 223 ci = lock_cluster_or_swap_info(si, offset);
862590ac3708e1 Kairui Song 2024-07-30 224 need_reclaim = swap_is_has_cache(si, offset, nr_pages);
862590ac3708e1 Kairui Song 2024-07-30 225 unlock_cluster_or_swap_info(si, ci);
862590ac3708e1 Kairui Song 2024-07-30 226 if (!need_reclaim)
862590ac3708e1 Kairui Song 2024-07-30 227 goto out_unlock;
862590ac3708e1 Kairui Song 2024-07-30 228
862590ac3708e1 Kairui Song 2024-07-30 229 if (!(flags & TTRS_DIRECT)) {
862590ac3708e1 Kairui Song 2024-07-30 230 /* Free through slot cache */
862590ac3708e1 Kairui Song 2024-07-30 231 delete_from_swap_cache(folio);
862590ac3708e1 Kairui Song 2024-07-30 232 folio_set_dirty(folio);
862590ac3708e1 Kairui Song 2024-07-30 233 ret = nr_pages;
862590ac3708e1 Kairui Song 2024-07-30 234 goto out_unlock;
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 235 }
862590ac3708e1 Kairui Song 2024-07-30 236
862590ac3708e1 Kairui Song 2024-07-30 237 xa_lock_irq(&address_space->i_pages);
862590ac3708e1 Kairui Song 2024-07-30 238 __delete_from_swap_cache(folio, entry, NULL);
862590ac3708e1 Kairui Song 2024-07-30 239 xa_unlock_irq(&address_space->i_pages);
862590ac3708e1 Kairui Song 2024-07-30 240 folio_ref_sub(folio, nr_pages);
862590ac3708e1 Kairui Song 2024-07-30 241 folio_set_dirty(folio);
862590ac3708e1 Kairui Song 2024-07-30 242
862590ac3708e1 Kairui Song 2024-07-30 243 spin_lock(&si->lock);
862590ac3708e1 Kairui Song 2024-07-30 244 /* Only sinple page folio can be backed by zswap */
862590ac3708e1 Kairui Song 2024-07-30 245 if (nr_pages == 1)
862590ac3708e1 Kairui Song 2024-07-30 246 zswap_invalidate(entry);
862590ac3708e1 Kairui Song 2024-07-30 247 swap_entry_range_free(si, entry, nr_pages);
862590ac3708e1 Kairui Song 2024-07-30 248 spin_unlock(&si->lock);
862590ac3708e1 Kairui Song 2024-07-30 249 ret = nr_pages;
862590ac3708e1 Kairui Song 2024-07-30 250 out_unlock:
862590ac3708e1 Kairui Song 2024-07-30 251 folio_unlock(folio);
862590ac3708e1 Kairui Song 2024-07-30 252 out:
2c3f6194b008b2 Matthew Wilcox (Oracle 2022-09-02 253) folio_put(folio);
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 254 return ret;
c9e444103b5e7a KAMEZAWA Hiroyuki 2009-06-16 255 }
355cfa73ddff2f KAMEZAWA Hiroyuki 2009-06-16 256
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-07 5:06 ` kernel test robot
@ 2024-10-08 1:30 ` Andrew Morton
2024-10-08 2:35 ` Jeongjun Park
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2024-10-08 1:30 UTC
To: kernel test robot
Cc: Jeongjun Park, llvm, oe-kbuild-all, kasong, linux-mm,
linux-kernel, syzbot+fa43f1b63e3aa6f66329
On Mon, 7 Oct 2024 13:06:49 +0800 kernel test robot <lkp@intel.com> wrote:
> Hi Jeongjun,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
> patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
> config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
> compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> 203 | if (!folio_trylock(folio))
> | ^~~~~~~~~~~~~~~~~~~~~
> mm/swapfile.c:254:9: note: uninitialized use occurs here
> 254 | return ret;
This warning can't be correct?
* Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
2024-10-08 1:30 ` Andrew Morton
@ 2024-10-08 2:35 ` Jeongjun Park
0 siblings, 0 replies; 8+ messages in thread
From: Jeongjun Park @ 2024-10-08 2:35 UTC
To: Andrew Morton
Cc: kernel test robot, llvm, oe-kbuild-all, kasong, linux-mm,
linux-kernel, syzbot+fa43f1b63e3aa6f66329
Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 7 Oct 2024 13:06:49 +0800 kernel test robot <lkp@intel.com> wrote:
>
> > Hi Jeongjun,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on akpm-mm/mm-everything]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link: https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
> > patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
> > config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
> > compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > >> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> > 203 | if (!folio_trylock(folio))
> > | ^~~~~~~~~~~~~~~~~~~~~
> > mm/swapfile.c:254:9: note: uninitialized use occurs here
> > 254 | return ret;
>
> This warning can't be correct?
I think it's correct. Even if folio_trylock fails, the return value
should be -nr_pages. Not initializing ret like in the v1 patch
goes against the design purpose of the function.
So I think it's right to apply the v2 patch that I sent you.
Regards,
Jeongjun Park
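
For reference, the return convention documented in the comment above __try_to_reclaim_swap() (positive: folio reclaimed; negative: folio not reclaimed; 0: no folio for the entry) can be decoded as in the sketch below. The names reclaim_outcome, reclaim_result and decode_reclaim_ret are hypothetical and exist only for illustration; they are not kernel symbols. The sketch shows why a failed trylock still needs a meaningful -nr_pages: the magnitude carries the folio size either way.

```c
/*
 * Illustrative decoder for the documented return convention.
 * All names here are hypothetical, not kernel symbols.
 */
#include <assert.h>
#include <limits.h>

enum reclaim_outcome {
	NO_FOLIO,	/* ret == 0: no folio backs the swap entry */
	RECLAIMED,	/* ret > 0: folio was reclaimed */
	NOT_RECLAIMED	/* ret < 0: folio exists but was not reclaimed */
};

struct reclaim_result {
	enum reclaim_outcome outcome;
	int nr_pages;	/* folio size in pages, 0 when there is no folio */
};

static struct reclaim_result decode_reclaim_ret(int ret)
{
	struct reclaim_result r;

	assert(ret != INT_MIN);	/* -ret below must not overflow */

	if (ret == 0) {
		r.outcome = NO_FOLIO;
		r.nr_pages = 0;
	} else if (ret > 0) {
		r.outcome = RECLAIMED;
		r.nr_pages = ret;
	} else {
		r.outcome = NOT_RECLAIMED;
		r.nr_pages = -ret;
	}
	return r;
}
```

An uninitialized `ret` on the trylock-failure path would make both fields of this result arbitrary, which is what the clang warning points at.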