* [PATCH] mm: cachestat: avoid bogus workingset test during swapping & invalidation races
@ 2024-03-14 16:49 Johannes Weiner
2024-03-15 3:16 ` Chengming Zhou
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2024-03-14 16:49 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nhat Pham, linux-mm, linux-kernel, Jann Horn

When cachestat against shmem races with swapping and invalidation, the
shadow entry might not exist: swapout IO is still in progress and
we're before __remove_mapping; or swapin/invalidation/swapoff has
removed the shadow from swapcache after we saw a shmem swap entry.

This will send a NULL to workingset_test_recent(). The latter purely
operates on pointer bits, so it won't crash - node 0, memcg ID 0,
eviction timestamp 0, etc. are all valid inputs - but it's a bogus
test. In theory that could result in a false "recently evicted" count.

Such a false positive wouldn't be the end of the world. But for code
clarity and (future) robustness, be explicit about this case.

Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/filemap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 222adac7c9c5..a07c27df7eab 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
 				swp_entry_t swp = radix_to_swp_entry(folio);
 
 				shadow = get_shadow_from_swap_cache(swp);
+				/* can race with swapping & invalidation */
+				if (!shadow)
+					goto resched;
 			}
 #endif
 			if (workingset_test_recent(shadow, true, &workingset))
--
2.44.0
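
Editorial note: the commit message's claim that a NULL shadow "won't crash" but yields a bogus test can be illustrated with a small userspace model. The sketch below is not the kernel's code. The `demo_*` names and the field widths are assumptions chosen for illustration; the real packing lives in mm/workingset.c and depends on the kernel configuration. It only shows that an all-zero pointer decodes to a perfectly well-formed tuple (node 0, memcg ID 0, eviction timestamp 0), which is why the caller has to reject NULL explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field widths, for illustration only. */
#define DEMO_WORKINGSET_BITS 1
#define DEMO_NODE_BITS       6
#define DEMO_MEMCG_BITS      16

/* Decoded view of a shadow entry in this simplified model. */
struct demo_shadow {
	unsigned int memcg_id;
	unsigned int node;
	unsigned long eviction;
	bool workingset;
};

/* Pack the fields into the bits of a pointer-sized value. */
static void *demo_pack_shadow(struct demo_shadow s)
{
	uintptr_t v = s.eviction;

	assert(s.node < (1u << DEMO_NODE_BITS));
	assert(s.memcg_id < (1u << DEMO_MEMCG_BITS));

	v = (v << DEMO_MEMCG_BITS) | s.memcg_id;
	v = (v << DEMO_NODE_BITS) | s.node;
	v = (v << DEMO_WORKINGSET_BITS) | (s.workingset ? 1u : 0u);
	return (void *)v;
}

/* Unpack purely from pointer bits; the pointer is never dereferenced. */
static struct demo_shadow demo_unpack_shadow(const void *shadow)
{
	uintptr_t v = (uintptr_t)shadow;
	struct demo_shadow s;

	s.workingset = (v & ((1u << DEMO_WORKINGSET_BITS) - 1)) != 0;
	v >>= DEMO_WORKINGSET_BITS;
	s.node = (unsigned int)(v & ((1u << DEMO_NODE_BITS) - 1));
	v >>= DEMO_NODE_BITS;
	s.memcg_id = (unsigned int)(v & ((1u << DEMO_MEMCG_BITS) - 1));
	v >>= DEMO_MEMCG_BITS;
	s.eviction = (unsigned long)v;
	return s;
}
```

In this model, `demo_unpack_shadow(NULL)` returns all-zero fields without faulting, so any "was this recently evicted?" comparison built on top of it would run on made-up inputs rather than fail loudly.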

* Re: [PATCH] mm: cachestat: avoid bogus workingset test during swapping & invalidation races
2024-03-14 16:49 [PATCH] mm: cachestat: avoid bogus workingset test during swapping & invalidation races Johannes Weiner
@ 2024-03-15 3:16 ` Chengming Zhou
2024-03-15 9:30 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Chengming Zhou @ 2024-03-15 3:16 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton; +Cc: Nhat Pham, linux-mm, linux-kernel, Jann Horn

On 2024/3/15 00:49, Johannes Weiner wrote:
> When cachestat against shmem races with swapping and invalidation, the
> shadow entry might not exist: swapout IO is still in progress and
> we're before __remove_mapping; or swapin/invalidation/swapoff has
> removed the shadow from swapcache after we saw a shmem swap entry.
>
> This will send a NULL to workingset_test_recent(). The latter purely
> operates on pointer bits, so it won't crash - node 0, memcg ID 0,
> eviction timestamp 0, etc. are all valid inputs - but it's a bogus
> test. In theory that could result in a false "recently evicted" count.

Good catch!

>
> Such a false positive wouldn't be the end of the world. But for code
> clarity and (future) robustness, be explicit about this case.
>
> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/filemap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 222adac7c9c5..a07c27df7eab 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
>  				swp_entry_t swp = radix_to_swp_entry(folio);
>

IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
Since there maybe other types of entries in shmem.

And need to get_swap_device() to prevent concurrent swapoff here,
get_shadow_from_swap_cache() won't do it for us.

Thanks.

>  				shadow = get_shadow_from_swap_cache(swp);
> +				/* can race with swapping & invalidation */
> +				if (!shadow)
> +					goto resched;
>  			}
>  #endif
>  			if (workingset_test_recent(shadow, true, &workingset))

* Re: [PATCH] mm: cachestat: avoid bogus workingset test during swapping & invalidation races
2024-03-15 3:16 ` Chengming Zhou
@ 2024-03-15 9:30 ` Johannes Weiner
2024-03-15 9:47 ` Chengming Zhou
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2024-03-15 9:30 UTC (permalink / raw)
To: Chengming Zhou; +Cc: Andrew Morton, Nhat Pham, linux-mm, linux-kernel, Jann Horn

On Fri, Mar 15, 2024 at 11:16:35AM +0800, Chengming Zhou wrote:
> On 2024/3/15 00:49, Johannes Weiner wrote:
> > When cachestat against shmem races with swapping and invalidation, the
> > shadow entry might not exist: swapout IO is still in progress and
> > we're before __remove_mapping; or swapin/invalidation/swapoff has
> > removed the shadow from swapcache after we saw a shmem swap entry.
> >
> > This will send a NULL to workingset_test_recent(). The latter purely
> > operates on pointer bits, so it won't crash - node 0, memcg ID 0,
> > eviction timestamp 0, etc. are all valid inputs - but it's a bogus
> > test. In theory that could result in a false "recently evicted" count.
>
> Good catch!
>
> >
> > Such a false positive wouldn't be the end of the world. But for code
> > clarity and (future) robustness, be explicit about this case.
> >
> > Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> > Reported-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> > mm/filemap.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 222adac7c9c5..a07c27df7eab 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
> >  				swp_entry_t swp = radix_to_swp_entry(folio);
> >
>
> IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
> Since there maybe other types of entries in shmem.

Good point, it could be a poisoned entry. I'll add the
non_swap_entry() check on swp.

> And need to get_swap_device() to prevent concurrent swapoff here,
> get_shadow_from_swap_cache() won't do it for us.

We're holding rcu_read_lock() for the xarray iteration, so if we see
the swap entry in the shmem mapping, it means we beat shmem_unuse()
and swapoff hasn't run synchronize_rcu() yet.

So it's safe. But I think it could use a comment. Maybe the
documentation of get_swap_device() should mention this option too?

* Re: [PATCH] mm: cachestat: avoid bogus workingset test during swapping & invalidation races
2024-03-15 9:30 ` Johannes Weiner
@ 2024-03-15 9:47 ` Chengming Zhou
2024-03-15 9:55 ` [PATCH] mm: cachestat: fix two shmem bugs Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Chengming Zhou @ 2024-03-15 9:47 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Nhat Pham, linux-mm, linux-kernel, Jann Horn

On 2024/3/15 17:30, Johannes Weiner wrote:
> On Fri, Mar 15, 2024 at 11:16:35AM +0800, Chengming Zhou wrote:
>> On 2024/3/15 00:49, Johannes Weiner wrote:
>>> When cachestat against shmem races with swapping and invalidation, the
>>> shadow entry might not exist: swapout IO is still in progress and
>>> we're before __remove_mapping; or swapin/invalidation/swapoff has
>>> removed the shadow from swapcache after we saw a shmem swap entry.
>>>
>>> This will send a NULL to workingset_test_recent(). The latter purely
>>> operates on pointer bits, so it won't crash - node 0, memcg ID 0,
>>> eviction timestamp 0, etc. are all valid inputs - but it's a bogus
>>> test. In theory that could result in a false "recently evicted" count.
>>
>> Good catch!
>>
>>>
>>> Such a false positive wouldn't be the end of the world. But for code
>>> clarity and (future) robustness, be explicit about this case.
>>>
>>> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
>>> Reported-by: Jann Horn <jannh@google.com>
>>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>>> ---
>>> mm/filemap.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 222adac7c9c5..a07c27df7eab 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
>>>  				swp_entry_t swp = radix_to_swp_entry(folio);
>>>
>>
>> IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
>> Since there maybe other types of entries in shmem.
>
> Good point, it could be a poisoned entry. I'll add the
> non_swap_entry() check on swp.
>
>> And need to get_swap_device() to prevent concurrent swapoff here,
>> get_shadow_from_swap_cache() won't do it for us.
>
> We're holding rcu_read_lock() for the xarray iteration, so if we see
> the swap entry in the shmem mapping, it means we beat shmem_unuse()
> and swapoff hasn't run synchronize_rcu() yet.

Ah, you are right, so it's safe.

>
> So it's safe. But I think it could use a comment. Maybe the
> documentation of get_swap_device() should mention this option too?

* [PATCH] mm: cachestat: fix two shmem bugs
2024-03-15 9:47 ` Chengming Zhou
@ 2024-03-15 9:55 ` Johannes Weiner
2024-03-15 10:43 ` Chengming Zhou
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Johannes Weiner @ 2024-03-15 9:55 UTC (permalink / raw)
To: Chengming Zhou; +Cc: Andrew Morton, Nhat Pham, linux-mm, linux-kernel, Jann Horn

When cachestat on shmem races with swapping and invalidation, there
are two possible bugs:

1) A swapin error can have resulted in a poisoned swap entry in the
   shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
   will result in an out-of-bounds access to swapper_spaces[].

   Validate the entry with non_swap_entry() before going further.

2) When we find a valid swap entry in the shmem's inode, the shadow
   entry in the swapcache might not exist yet: swap IO is still in
   progress and we're before __remove_mapping; swapin, invalidation,
   or swapoff have removed the shadow from swapcache after we saw the
   shmem swap entry.

   This will send a NULL to workingset_test_recent(). The latter
   purely operates on pointer bits, so it won't crash - node 0, memcg
   ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
   bogus test. In theory that could result in a false "recently
   evicted" count.

   Such a false positive wouldn't be the end of the world. But for
   code clarity and (future) robustness, be explicit about this case.

   Bail on get_shadow_from_swap_cache() returning NULL.

Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Cc: stable@vger.kernel.org [v6.5+]
Reported-by: Chengming Zhou <chengming.zhou@linux.dev> [Bug #1]
Reported-by: Jann Horn <jannh@google.com> [Bug #2]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/filemap.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 222adac7c9c5..0aa91bf6c1f7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4198,7 +4198,23 @@ static void filemap_cachestat(struct address_space *mapping,
 				/* shmem file - in swap cache */
 				swp_entry_t swp = radix_to_swp_entry(folio);
 
+				/* swapin error results in poisoned entry */
+				if (non_swap_entry(swp))
+					goto resched;
+
+				/*
+				 * Getting a swap entry from the shmem
+				 * inode means we beat
+				 * shmem_unuse(). rcu_read_lock()
+				 * ensures swapoff waits for us before
+				 * freeing the swapper space. However,
+				 * we can race with swapping and
+				 * invalidation, so there might not be
+				 * a shadow in the swapcache (yet).
+				 */
 				shadow = get_shadow_from_swap_cache(swp);
+				if (!shadow)
+					goto resched;
 			}
 #endif
 			if (workingset_test_recent(shadow, true, &workingset))
--
2.44.0
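
Editorial note: bug 1 in the commit message above (a poisoned entry indexing past the end of swapper_spaces[]) can be modeled in userspace. The sketch below is an illustration under stated assumptions, not kernel code: every `demo_*` / `DEMO_*` name is hypothetical, and the constants (5 type bits, 27 usable swap devices, marker type 27) were picked for the example rather than taken from any particular kernel configuration. The idea it shows is the one the patch relies on: the "type" field of an entry doubles as an array index, special marker entries use type values at or above the number of real swap devices, so the type must be range-checked before it is used as an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: the top bits of the value hold the "type". */
#define DEMO_TYPE_BITS      5
#define DEMO_TYPE_SHIFT     (sizeof(unsigned long) * 8 - DEMO_TYPE_BITS)
#define DEMO_OFFSET_MASK    ((1UL << DEMO_TYPE_SHIFT) - 1)

/* Types below this index real swap devices; the rest are markers. */
#define DEMO_MAX_SWAPFILES  27
/* Hypothetical marker type standing in for a swapin-error entry. */
#define DEMO_SWAPIN_ERROR   27

typedef struct { unsigned long val; } demo_swp_entry_t;

static demo_swp_entry_t demo_swp_entry(unsigned long type, unsigned long offset)
{
	demo_swp_entry_t e;

	assert(type < (1UL << DEMO_TYPE_BITS));
	e.val = (type << DEMO_TYPE_SHIFT) | (offset & DEMO_OFFSET_MASK);
	return e;
}

static unsigned long demo_swp_type(demo_swp_entry_t e)
{
	return e.val >> DEMO_TYPE_SHIFT;
}

/* True for marker entries that do not refer to a real swap device. */
static bool demo_non_swap_entry(demo_swp_entry_t e)
{
	return demo_swp_type(e) >= DEMO_MAX_SWAPFILES;
}

/* One slot per swap device; indexing it with a marker type is out of bounds. */
static int demo_swapper_spaces[DEMO_MAX_SWAPFILES];

/* Validate first, index second: returns NULL for marker entries. */
static int *demo_lookup_space(demo_swp_entry_t e)
{
	if (demo_non_swap_entry(e))
		return NULL;
	return &demo_swapper_spaces[demo_swp_type(e)];
}
```

Without the check in `demo_lookup_space()`, an entry built with `DEMO_SWAPIN_ERROR` would index one element past the end of `demo_swapper_spaces`, which is the shape of the out-of-bounds access the commit message describes.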

* Re: [PATCH] mm: cachestat: fix two shmem bugs
2024-03-15 9:55 ` [PATCH] mm: cachestat: fix two shmem bugs Johannes Weiner
@ 2024-03-15 10:43 ` Chengming Zhou
2024-03-16 2:41 ` Nhat Pham
2024-03-16 4:30 ` Nhat Pham
2 siblings, 0 replies; 8+ messages in thread
From: Chengming Zhou @ 2024-03-15 10:43 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Nhat Pham, linux-mm, linux-kernel, Jann Horn

On 2024/3/15 17:55, Johannes Weiner wrote:
> When cachestat on shmem races with swapping and invalidation, there
> are two possible bugs:
>
> 1) A swapin error can have resulted in a poisoned swap entry in the
>    shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
>    will result in an out-of-bounds access to swapper_spaces[].
>
>    Validate the entry with non_swap_entry() before going further.
>
> 2) When we find a valid swap entry in the shmem's inode, the shadow
>    entry in the swapcache might not exist yet: swap IO is still in
>    progress and we're before __remove_mapping; swapin, invalidation,
>    or swapoff have removed the shadow from swapcache after we saw the
>    shmem swap entry.
>
>    This will send a NULL to workingset_test_recent(). The latter
>    purely operates on pointer bits, so it won't crash - node 0, memcg
>    ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
>    bogus test. In theory that could result in a false "recently
>    evicted" count.
>
>    Such a false positive wouldn't be the end of the world. But for
>    code clarity and (future) robustness, be explicit about this case.
>
>    Bail on get_shadow_from_swap_cache() returning NULL.
>
> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> Cc: stable@vger.kernel.org [v6.5+]
> Reported-by: Chengming Zhou <chengming.zhou@linux.dev> [Bug #1]
> Reported-by: Jann Horn <jannh@google.com> [Bug #2]
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Looks good to me.

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

Thanks.

> ---
> mm/filemap.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 222adac7c9c5..0aa91bf6c1f7 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4198,7 +4198,23 @@ static void filemap_cachestat(struct address_space *mapping,
>  				/* shmem file - in swap cache */
>  				swp_entry_t swp = radix_to_swp_entry(folio);
>
> +				/* swapin error results in poisoned entry */
> +				if (non_swap_entry(swp))
> +					goto resched;
> +
> +				/*
> +				 * Getting a swap entry from the shmem
> +				 * inode means we beat
> +				 * shmem_unuse(). rcu_read_lock()
> +				 * ensures swapoff waits for us before
> +				 * freeing the swapper space. However,
> +				 * we can race with swapping and
> +				 * invalidation, so there might not be
> +				 * a shadow in the swapcache (yet).
> +				 */
>  				shadow = get_shadow_from_swap_cache(swp);
> +				if (!shadow)
> +					goto resched;
>  			}
>  #endif
>  			if (workingset_test_recent(shadow, true, &workingset))

* Re: [PATCH] mm: cachestat: fix two shmem bugs
2024-03-15 9:55 ` [PATCH] mm: cachestat: fix two shmem bugs Johannes Weiner
2024-03-15 10:43 ` Chengming Zhou
@ 2024-03-16 2:41 ` Nhat Pham
2024-03-16 4:30 ` Nhat Pham
2 siblings, 0 replies; 8+ messages in thread
From: Nhat Pham @ 2024-03-16 2:41 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Chengming Zhou, Andrew Morton, linux-mm, linux-kernel, Jann Horn

On Fri, Mar 15, 2024 at 4:55 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> When cachestat on shmem races with swapping and invalidation, there
> are two possible bugs:
>
> 1) A swapin error can have resulted in a poisoned swap entry in the
>    shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
>    will result in an out-of-bounds access to swapper_spaces[].
>
>    Validate the entry with non_swap_entry() before going further.
>
> 2) When we find a valid swap entry in the shmem's inode, the shadow
>    entry in the swapcache might not exist yet: swap IO is still in
>    progress and we're before __remove_mapping; swapin, invalidation,
>    or swapoff have removed the shadow from swapcache after we saw the
>    shmem swap entry.
>
>    This will send a NULL to workingset_test_recent(). The latter
>    purely operates on pointer bits, so it won't crash - node 0, memcg
>    ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
>    bogus test. In theory that could result in a false "recently
>    evicted" count.
>
>    Such a false positive wouldn't be the end of the world. But for
>    code clarity and (future) robustness, be explicit about this case.
>
>    Bail on get_shadow_from_swap_cache() returning NULL.
>
> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> Cc: stable@vger.kernel.org [v6.5+]
> Reported-by: Chengming Zhou <chengming.zhou@linux.dev> [Bug #1]
> Reported-by: Jann Horn <jannh@google.com> [Bug #2]
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Nice catch! Thanks for the report, Chengming and Jann, and thanks for
the fix, Johannes!

Reviewed-by: Nhat Pham <nphamcs@gmail.com>

> ---
> mm/filemap.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 222adac7c9c5..0aa91bf6c1f7 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4198,7 +4198,23 @@ static void filemap_cachestat(struct address_space *mapping,
>  				/* shmem file - in swap cache */
>  				swp_entry_t swp = radix_to_swp_entry(folio);
>
> +				/* swapin error results in poisoned entry */
> +				if (non_swap_entry(swp))
> +					goto resched;
> +
> +				/*
> +				 * Getting a swap entry from the shmem
> +				 * inode means we beat
> +				 * shmem_unuse(). rcu_read_lock()
> +				 * ensures swapoff waits for us before
> +				 * freeing the swapper space. However,
> +				 * we can race with swapping and
> +				 * invalidation, so there might not be
> +				 * a shadow in the swapcache (yet).
> +				 */
>  				shadow = get_shadow_from_swap_cache(swp);
> +				if (!shadow)
> +					goto resched;
>  			}
>  #endif
>  			if (workingset_test_recent(shadow, true, &workingset))
> --
> 2.44.0
>

* Re: [PATCH] mm: cachestat: fix two shmem bugs
2024-03-15 9:55 ` [PATCH] mm: cachestat: fix two shmem bugs Johannes Weiner
2024-03-15 10:43 ` Chengming Zhou
2024-03-16 2:41 ` Nhat Pham
@ 2024-03-16 4:30 ` Nhat Pham
2 siblings, 0 replies; 8+ messages in thread
From: Nhat Pham @ 2024-03-16 4:30 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Chengming Zhou, Andrew Morton, linux-mm, linux-kernel, Jann Horn

💖 Nhat Pham reacted via Gmail