* [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Chia-I Wu via B4 Relay @ 2026-05-20 4:31 UTC (permalink / raw)
To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins,
Baolin Wang
Cc: kasan-dev, linux-mm, linux-kernel, Boris Brezillon, Chia-I Wu
From: Chia-I Wu <olvaffe@gmail.com>
swap_cluster_readahead can allocate folios for other mappings. If the
gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
PROT_MTE, we can end up with false KASAN errors such as
BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
Pointer tag: [f5], memory tag: [f9]
In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
allocated the folio. But userspace had already set the memory tag to
0xf9 before the page was swapped out. arch_swap_restore then restored the
memory tag to 0xf9, leading to the mismatch.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
Changes in v2:
- set __GFP_SKIP_KASAN for shmem instead of drm/panthor
- Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
---
mm/shmem.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index 3b5dc21b323c2..db9130a8c5b76 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
pgoff_t ilx;
struct folio *folio;
+ /* swap_cluster_readahead might cross the mapping boundary and
+ * allocate pages for other mappings. We have to skip KASAN.
+ */
+ gfp |= __GFP_SKIP_KASAN;
+
mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
mpol_cond_put(mpol);
---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260512-panthor-kasan-10477239bad1
Best regards,
--
Chia-I Wu <olvaffe@gmail.com>
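The tag mismatch described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: the model_* names and struct model_page are hypothetical, and it assumes that a pointer tag of 0xff behaves as a match-all tag after a tag reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 0xff stands in for the match-all pointer tag. */
#define MODEL_TAG_MATCH_ALL 0xffu

struct model_page {
	uint8_t memory_tag;   /* tag held in (simulated) tag storage */
	uint8_t pointer_tag;  /* tag the kernel embeds in pointers to the page */
};

/* Allocation without __GFP_SKIP_KASAN: pointer and memory get the same tag. */
static void model_alloc_tagged(struct model_page *p, uint8_t tag)
{
	p->memory_tag = tag;
	p->pointer_tag = tag;
}

/* Allocation with __GFP_SKIP_KASAN: the pointer tag is reset to match-all. */
static void model_alloc_skip_kasan(struct model_page *p)
{
	p->memory_tag = 0;
	p->pointer_tag = MODEL_TAG_MATCH_ALL;
}

/* Swap-in puts back the memory tag that userspace set before swap-out. */
static void model_swap_restore(struct model_page *p, uint8_t user_tag)
{
	p->memory_tag = user_tag;
}

/* A kernel access is reported when the tags differ and the pointer is not match-all. */
static bool model_access_faults(const struct model_page *p)
{
	if (p->pointer_tag == MODEL_TAG_MATCH_ALL)
		return false;
	return p->pointer_tag != p->memory_tag;
}
```

Calling model_alloc_tagged(&p, 0xf5) followed by model_swap_restore(&p, 0xf9) reproduces the f5/f9 pair from the report; the same restore after model_alloc_skip_kasan(&p) does not fault in this model.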
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Baolin Wang @ 2026-05-20 10:04 UTC (permalink / raw)
To: olvaffe, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins,
Kairui Song
Cc: kasan-dev, linux-mm, linux-kernel, Boris Brezillon
CC Kairui,
On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> From: Chia-I Wu <olvaffe@gmail.com>
>
> swap_cluster_readahead can allocate folios for other mappings. If the
> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> PROT_MTE, we can end up with false KASAN errors such as
>
> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> Pointer tag: [f5], memory tag: [f9]
>
> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> allocated the folio. But the userspace had already set the memory tag to
> 0xf9 before swapped out. arch_swap_restore restored the memory tag back
> to 0xf9, leading to the mismatch.
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> ---
> Changes in v2:
> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
> ---
> mm/shmem.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 3b5dc21b323c2..db9130a8c5b76 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> pgoff_t ilx;
> struct folio *folio;
>
> + /* swap_cluster_readahead might cross the mapping boundary and
> + * allocate pages for other mappings. We have to skip KASAN.
> + */
> + gfp |= __GFP_SKIP_KASAN;
> +
> mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> mpol_cond_put(mpol);
If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
explicitly should NOT have the flag? Your v1 link already mentions
this scenario.
Additionally, I'm wondering if we could use shmem_should_replace_folio()
to detect cases where shmem is prematurely swapped in with
incorrect GFP flags (e.g. __GFP_SKIP_KASAN), and then handle them through
shmem_replace_folio()?
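To make the suggestion above concrete, here is a rough userspace sketch of what an extended "should replace" predicate might look at. Everything in it is a stand-in: struct model_folio, the model_* helpers and the bit value are hypothetical, and it assumes that a folio allocated with __GFP_SKIP_KASAN can be recognized by a reset (0xff) tag. Whether the kernel can reliably tell the two cases apart is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_GFP_SKIP_KASAN (1u << 0)  /* stand-in bit, not the kernel value */
#define MODEL_TAG_MATCH_ALL  0xffu      /* assumed to mean "tag was reset" */

struct model_folio {
	int zone;           /* zone the folio was allocated from */
	uint8_t kasan_tag;  /* 0xff if allocated on the SKIP_KASAN path */
};

/* Existing kind of check: folio sits in a zone above what gfp allows. */
static bool model_zone_mismatch(const struct model_folio *f, int max_zone)
{
	return f->zone > max_zone;
}

/* Hypothetical extra check: KASAN treatment differs from what gfp asks for. */
static bool model_kasan_mismatch(const struct model_folio *f, unsigned int gfp)
{
	bool want_skip = (gfp & MODEL_GFP_SKIP_KASAN) != 0;
	bool was_skipped = f->kasan_tag == MODEL_TAG_MATCH_ALL;

	return want_skip != was_skipped;
}

static bool model_should_replace(const struct model_folio *f,
				 unsigned int gfp, int max_zone)
{
	return model_zone_mismatch(f, max_zone) || model_kasan_mismatch(f, gfp);
}
```

In this model a folio with tag 0xf5 that is requested with MODEL_GFP_SKIP_KASAN would be replaced even when the zones agree, which is where the extra copy comes from.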
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Chia-I Wu @ 2026-05-20 17:06 UTC (permalink / raw)
To: Baolin Wang
Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins,
Kairui Song, kasan-dev, linux-mm, linux-kernel, Boris Brezillon
On Wed, May 20, 2026 at 3:04 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> CC Kairui,
>
> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> > From: Chia-I Wu <olvaffe@gmail.com>
> >
> > swap_cluster_readahead can allocate folios for other mappings. If the
> > gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> > PROT_MTE, we can end up with false KASAN errors such as
> >
> > BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> > Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> > Pointer tag: [f5], memory tag: [f9]
> >
> > In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> > both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> > allocated the folio. But the userspace had already set the memory tag to
> > 0xf9 before swapped out. arch_swap_restore restored the memory tag back
> > to 0xf9, leading to the mismatch.
> >
> > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> > Changes in v2:
> > - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> > - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
> > ---
> > mm/shmem.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 3b5dc21b323c2..db9130a8c5b76 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> > pgoff_t ilx;
> > struct folio *folio;
> >
> > + /* swap_cluster_readahead might cross the mapping boundary and
> > + * allocate pages for other mappings. We have to skip KASAN.
> > + */
> > + gfp |= __GFP_SKIP_KASAN;
> > +
> > mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> > mpol_cond_put(mpol);
>
> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
> explicitly should NOT have the flag? and your v1 link already mentions
> this scenario.
We lose the benefits of kasan hw tags (other modes are not affected)
by forcing the flag.
The other mappings swap_cluster_readahead can affect are anon
mappings, regular shmem mappings, or gpu shmem mappings. I think only
gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
intentional, because gpu shmem mappings pick GFP_HIGHUSER over
GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
__GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
I guess what I am trying to say is these are all user pages. We have
to skip kasan when user pages can be mapped PROT_MTE. The
justification for gpu shmem mappings is that they cannot be mapped
PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
>
> Additionally, I'm wondering if we could use shmem_should_replace_folio()
> to detect such cases where shmem is being prematurely swapped in with
> incorrect GFP flags (e.g.: __GFP_SKIP_KASAN), and then handle it through
> shmem_replace_folio()?
I don't know if we want to impose a copy for the benefits. More
importantly, this only helps shmem mappings but not anon mappings.
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Baolin Wang @ 2026-05-21 7:05 UTC (permalink / raw)
To: Chia-I Wu
Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins,
Kairui Song, kasan-dev, linux-mm, linux-kernel, Boris Brezillon
On 5/21/26 1:06 AM, Chia-I Wu wrote:
> On Wed, May 20, 2026 at 3:04 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> CC Kairui,
>>
>> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
>>> From: Chia-I Wu <olvaffe@gmail.com>
>>>
>>> swap_cluster_readahead can allocate folios for other mappings. If the
>>> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
>>> PROT_MTE, we can end up with false KASAN errors such as
>>>
>>> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
>>> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
>>> Pointer tag: [f5], memory tag: [f9]
>>>
>>> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
>>> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
>>> allocated the folio. But the userspace had already set the memory tag to
>>> 0xf9 before swapped out. arch_swap_restore restored the memory tag back
>>> to 0xf9, leading to the mismatch.
>>>
>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>>> ---
>>> Changes in v2:
>>> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
>>> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
>>> ---
>>> mm/shmem.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 3b5dc21b323c2..db9130a8c5b76 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
>>> pgoff_t ilx;
>>> struct folio *folio;
>>>
>>> + /* swap_cluster_readahead might cross the mapping boundary and
>>> + * allocate pages for other mappings. We have to skip KASAN.
>>> + */
>>> + gfp |= __GFP_SKIP_KASAN;
>>> +
>>> mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
>>> folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
>>> mpol_cond_put(mpol);
>>
>> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
>> explicitly should NOT have the flag? and your v1 link already mentions
>> this scenario.
> We lose the benefits of kasan hw tags (other modes are not affected)
> by forcing the flag.
>
> The other mappings swap_cluster_readahead can affect are anon
> mappings, regular shmem mappings, or gpu shmem mappings. I think only
> gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
> intentional, because gpu shmem mappings pick GFP_HIGHUSER over
> GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
> __GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
It sounds like the right approach would be to explicitly set
__GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
explicitly set __GFP_SKIP_KASAN makes the implications clearer than
having shmem core set it implicitly.
We could also consider adding a VM_WARN in shmem_swapin_cluster() to
detect any mappings missing the __GFP_SKIP_KASAN flag.
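As an illustration of the warning idea, the following userspace sketch mimics a warn-once check on the gfp flags passed to swap-in. It is not the kernel's VM_WARN machinery: model_check_swapin_gfp() and the bit value are hypothetical, and the message text is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_GFP_SKIP_KASAN (1u << 0)  /* stand-in bit, not the kernel value */

/*
 * Returns true when the flags carry SKIP_KASAN. Otherwise returns false
 * and prints a warning the first time only, similar in spirit to a
 * warn-once debug check.
 */
static bool model_check_swapin_gfp(unsigned int gfp)
{
	static bool warned;

	if (gfp & MODEL_GFP_SKIP_KASAN)
		return true;

	if (!warned) {
		warned = true;
		fprintf(stderr, "swapin: gfp %#x lacks SKIP_KASAN\n", gfp);
	}
	return false;
}
```

A check like this only reports the offending mapping; it does not repair folios that readahead already allocated with the wrong flags.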
> I guess what I am trying to say is these are all user pages. We have
> to skip kasan when user pages can be mapped PROT_MTE. The
Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
while GPU shmem mappings are a special case.
> justification for gpu shmem mappings is that they cannot be mapped
> PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Boris Brezillon @ 2026-05-21 8:51 UTC (permalink / raw)
To: Baolin Wang
Cc: Chia-I Wu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins,
Kairui Song, kasan-dev, linux-mm, linux-kernel
On Thu, 21 May 2026 15:05:21 +0800
Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
> On 5/21/26 1:06 AM, Chia-I Wu wrote:
> > On Wed, May 20, 2026 at 3:04 AM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >> CC Kairui,
> >>
> >> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> >>> From: Chia-I Wu <olvaffe@gmail.com>
> >>>
> >>> swap_cluster_readahead can allocate folios for other mappings. If the
> >>> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> >>> PROT_MTE, we can end up with false KASAN errors such as
> >>>
> >>> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> >>> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> >>> Pointer tag: [f5], memory tag: [f9]
> >>>
> >>> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> >>> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> >>> allocated the folio. But the userspace had already set the memory tag to
> >>> 0xf9 before swapped out. arch_swap_restore restored the memory tag back
> >>> to 0xf9, leading to the mismatch.
> >>>
> >>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >>> ---
> >>> Changes in v2:
> >>> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> >>> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
> >>> ---
> >>> mm/shmem.c | 5 +++++
> >>> 1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/mm/shmem.c b/mm/shmem.c
> >>> index 3b5dc21b323c2..db9130a8c5b76 100644
> >>> --- a/mm/shmem.c
> >>> +++ b/mm/shmem.c
> >>> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> >>> pgoff_t ilx;
> >>> struct folio *folio;
> >>>
> >>> + /* swap_cluster_readahead might cross the mapping boundary and
> >>> + * allocate pages for other mappings. We have to skip KASAN.
> >>> + */
> >>> + gfp |= __GFP_SKIP_KASAN;
> >>> +
> >>> mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> >>> folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> >>> mpol_cond_put(mpol);
> >>
> >> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
> >> explicitly should NOT have the flag? and your v1 link already mentions
> >> this scenario.
> > We lose the benefits of kasan hw tags (other modes are not affected)
> > by forcing the flag.
> >
> > The other mappings swap_cluster_readahead can affect are anon
> > mappings, regular shmem mappings, or gpu shmem mappings. I think only
> > gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
> > intentional, because gpu shmem mappings pick GFP_HIGHUSER over
> > GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
> > __GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
>
> It sounds like the right approach would be to explicitly set
> __GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
> explicitly set __GFP_SKIP_KASAN makes the implications clearer than
> having shmem core set it implicitly.
It's a bit of a shame that we have to explicitly set this
__GFP_SKIP_KASAN flag when we select GFP_HIGHUSER though (it means a lot
of patching in drivers/gpu/drm/, because basically every driver
relying on shmem for its buffer allocation uses this flag).
Also, it feels like KASAN poisoning for these pages would be interesting
to have since we know we won't allow PROT_MTE on userspace mappings
anyway. Oh, and some buffers might even be kernel only (no mmap()
allowed), which makes them even better candidates for poisoning.
>
> We could also consider adding a VM_WARN in shmem_swapin_cluster() to
> detect any mappings missing the __GFP_SKIP_KASAN flag.
If the general consensus is that all shmem-backed allocation must have
__GFP_SKIP_KASAN, yes, it'd make sense to add a VM_WARN.
>
> > I guess what I am trying to say is these are all user pages. We have
> > to skip kasan when user pages can be mapped PROT_MTE. The
>
> Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
> while GPU shmem mappings are a special case.
They are not that special, they are just not MOVABLE because the GPU
might also access the same pages under the hood. If it's assumed that
any page being exposed through mmap() must have __GFP_SKIP_KASAN, why
does GFP_HIGHUSER not have that flag too?
>
> > justification for gpu shmem mappings is that they cannot be mapped
> > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
I'm no MM expert, so it's probably me not understanding how this
swap-readahead logic is supposed to work, but the whole idea of using
different flags from those that were requested by the f_mapping seems
fragile. I mean, this comment [1] proves it's not the first time the
problem has been considered, and I'm wondering why __GFP_SKIP_KASAN should
be treated differently from zones. Yes, that's an extra copy if the
SKIP_KASAN flags don't match but the zones do, but in practice, won't
we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
the problem that, even with a copy, it's already too late to restore
the flags because they have been overwritten during KASAN unpoisoning?
[1] https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Chia-I Wu @ 2026-05-21 15:49 UTC (permalink / raw)
To: Boris Brezillon
Cc: Baolin Wang, Andrey Ryabinin, Alexander Potapenko,
Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
Hugh Dickins, Kairui Song, kasan-dev, linux-mm, linux-kernel
On Thu, May 21, 2026 at 1:51 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Thu, 21 May 2026 15:05:21 +0800
> Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> > On 5/21/26 1:06 AM, Chia-I Wu wrote:
> > > On Wed, May 20, 2026 at 3:04 AM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >> CC Kairui,
> > >>
> > >> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> > >>> From: Chia-I Wu <olvaffe@gmail.com>
> > >>>
> > >>> swap_cluster_readahead can allocate folios for other mappings. If the
> > >>> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> > >>> PROT_MTE, we can end up with false KASAN errors such as
> > >>>
> > >>> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> > >>> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> > >>> Pointer tag: [f5], memory tag: [f9]
> > >>>
> > >>> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> > >>> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> > >>> allocated the folio. But the userspace had already set the memory tag to
> > >>> 0xf9 before swapped out. arch_swap_restore restored the memory tag back
> > >>> to 0xf9, leading to the mismatch.
> > >>>
> > >>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > >>> ---
> > >>> Changes in v2:
> > >>> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> > >>> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
> > >>> ---
> > >>> mm/shmem.c | 5 +++++
> > >>> 1 file changed, 5 insertions(+)
> > >>>
> > >>> diff --git a/mm/shmem.c b/mm/shmem.c
> > >>> index 3b5dc21b323c2..db9130a8c5b76 100644
> > >>> --- a/mm/shmem.c
> > >>> +++ b/mm/shmem.c
> > >>> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> > >>> pgoff_t ilx;
> > >>> struct folio *folio;
> > >>>
> > >>> + /* swap_cluster_readahead might cross the mapping boundary and
> > >>> + * allocate pages for other mappings. We have to skip KASAN.
> > >>> + */
> > >>> + gfp |= __GFP_SKIP_KASAN;
> > >>> +
> > >>> mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > >>> folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> > >>> mpol_cond_put(mpol);
> > >>
> > >> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
> > >> explicitly should NOT have the flag? and your v1 link already mentions
> > >> this scenario.
> > > We lose the benefits of kasan hw tags (other modes are not affected)
> > > by forcing the flag.
> > >
> > > The other mappings swap_cluster_readahead can affect are anon
> > > mappings, regular shmem mappings, or gpu shmem mappings. I think only
> > > gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
> > > intentional, because gpu shmem mappings pick GFP_HIGHUSER over
> > > GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
> > > __GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
> >
> > It sounds like the right approach would be to explicitly set
> > __GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
> > explicitly set __GFP_SKIP_KASAN makes the implications clearer than
> > having shmem core set it implicitly.
>
> It's a bit of a shame that we have to explicitly set this
> __GFP_SKIP_KASAN flag when we select GFP_HIGHUSER though (means a lot
> of patching to do in drivers/gpu/drm/ basically, because basically
> every driver relying on shmem for its buffer allocation uses this flag).
>
> Also, it feels like KASAN poisoning for these pages would be interesting
> to have since we know we won't allow PROT_MTE on userspace mappings
> anyway. Oh, and some buffers might even be kernel only (no mmap()
> allowed), which makes them even better candidates for poisoning.
>
> >
> > We could also consider adding a VM_WARN in shmem_swapin_cluster() to
> > detect any mappings missing the __GFP_SKIP_KASAN flag.
>
> If the general consensus is that all shmem-backed allocation must have
> __GFP_SKIP_KASAN, yes, it'd make sense to add a VM_WARN.
>
> >
> > > I guess what I am trying to say is these are all user pages. We have
> > > to skip kasan when user pages can be mapped PROT_MTE. The
> >
> > Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
> > while GPU shmem mappings are a special case.
>
> They are not that special, they are just not MOVABLE because the GPU
> might also access the same pages under the hood. If it's assumed that
> any page being exposed through mmap() must have __GFP_SKIP_KASAN, why
> does GFP_HIGHUSER not have that flag too?
It is also about whether PROT_MTE is allowed. This becomes a problem
when both kernel and userspace want to modify the tags stored in MTE.
Another, more explicit way to achieve the same effect as this patch
is to have
#define GFP_HIGHUSER_SWAPPABLE (GFP_HIGHUSER | __GFP_SKIP_KASAN)
#define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER_SWAPPABLE | __GFP_MOVABLE)
GPU drivers that can swap should use GFP_HIGHUSER_SWAPPABLE. shmem
core can warn about missing __GFP_SKIP_KASAN.
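The flag layering proposed above can be checked with a small userspace model. The bit values and MODEL_* names are stand-ins chosen for the example (the real definitions live in the kernel's gfp headers), and GFP_HIGHUSER_SWAPPABLE itself is only a proposal in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bits; only the relationships between the masks matter here. */
#define MODEL_GFP_USER_BASE  (1u << 0)  /* represents the bits of GFP_HIGHUSER */
#define MODEL_GFP_MOVABLE    (1u << 1)
#define MODEL_GFP_SKIP_KASAN (1u << 2)

#define MODEL_GFP_HIGHUSER           (MODEL_GFP_USER_BASE)
#define MODEL_GFP_HIGHUSER_SWAPPABLE (MODEL_GFP_HIGHUSER | MODEL_GFP_SKIP_KASAN)
#define MODEL_GFP_HIGHUSER_MOVABLE   (MODEL_GFP_HIGHUSER_SWAPPABLE | MODEL_GFP_MOVABLE)

/* True when KASAN tagging would be skipped for an allocation with gfp. */
static bool model_skips_kasan(unsigned int gfp)
{
	return (gfp & MODEL_GFP_SKIP_KASAN) != 0;
}

/* True when the allocation may be placed in movable memory. */
static bool model_is_movable(unsigned int gfp)
{
	return (gfp & MODEL_GFP_MOVABLE) != 0;
}
```

With this layering a GPU driver could ask for swappable, non-movable pages in one mask, while MOVABLE keeps its current meaning as a superset.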
>
> >
> > > justification for gpu shmem mappings is that they cannot be mapped
> > > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
>
> I'm no MM expert, so it's probably me not understanding how this
> swap-readahead logic is supposed to work, but the whole idea of using
> different flags from those that were requested by the f_mapping seems
> fragile. I mean, this comment [1] proves it's not the first time the
> problem is considered, and I'm wondering why __GFP_SKIP_KASAN should be
> treated differently from zones. Yes, that's an extra copy if the
> SKIP_KASAN flags don't match but the zones do, but in practice, won't
> we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
> the problem that, even with a copy, it's already too late to restore
> the flags because they have been overwritten during KASAN unpoisoning?
arch_swap_restore is called just before shmem_replace_folio. It is a
bit too late right now but I guess it is fixable.
But shmem is not just a victim. It is also an offender to anon
mappings. We would need a similar replacement logic in do_swap_page
for anon mappings.
>
> [1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112
* Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead
From: Chia-I Wu @ 2026-05-21 21:12 UTC (permalink / raw)
To: Boris Brezillon
Cc: Baolin Wang, Andrey Ryabinin, Alexander Potapenko,
Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
Hugh Dickins, Kairui Song, kasan-dev, linux-mm, linux-kernel
On Thu, May 21, 2026 at 8:49 AM Chia-I Wu <olvaffe@gmail.com> wrote:
>
> On Thu, May 21, 2026 at 1:51 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Thu, 21 May 2026 15:05:21 +0800
> > Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
> >
> > > On 5/21/26 1:06 AM, Chia-I Wu wrote:
> > > > On Wed, May 20, 2026 at 3:04 AM Baolin Wang
> > > > <baolin.wang@linux.alibaba.com> wrote:
> > > >>
> > > >> CC Kairui,
> > > >>
> > > >> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> > > >>> From: Chia-I Wu <olvaffe@gmail.com>
> > > >>>
> > > >>> swap_cluster_readahead can allocate folios for other mappings. If the
> > > >>> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> > > >>> PROT_MTE, we can end up with false KASAN errors such as
> > > >>>
> > > >>> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> > > >>> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> > > >>> Pointer tag: [f5], memory tag: [f9]
> > > >>>
> > > >>> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> > > >>> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> > > >>> allocated the folio. But the userspace had already set the memory tag to
> > > >>> 0xf9 before swapped out. arch_swap_restore restored the memory tag back
> > > >>> to 0xf9, leading to the mismatch.
> > > >>>
> > > >>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > > >>> ---
> > > >>> Changes in v2:
> > > >>> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> > > >>> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@gmail.com
> > > >>> ---
> > > >>> mm/shmem.c | 5 +++++
> > > >>> 1 file changed, 5 insertions(+)
> > > >>>
> > > >>> diff --git a/mm/shmem.c b/mm/shmem.c
> > > >>> index 3b5dc21b323c2..db9130a8c5b76 100644
> > > >>> --- a/mm/shmem.c
> > > >>> +++ b/mm/shmem.c
> > > >>> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> > > >>> pgoff_t ilx;
> > > >>> struct folio *folio;
> > > >>>
> > > >>> + /* swap_cluster_readahead might cross the mapping boundary and
> > > >>> + * allocate pages for other mappings. We have to skip KASAN.
> > > >>> + */
> > > >>> + gfp |= __GFP_SKIP_KASAN;
> > > >>> +
> > > >>> mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > > >>> folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> > > >>> mpol_cond_put(mpol);
> > > >>
> > > >> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
> > > >> explicitly should NOT have the flag? and your v1 link already mentions
> > > >> this scenario.
> > > > We lose the benefits of kasan hw tags (other modes are not affected)
> > > > by forcing the flag.
> > > >
> > > > The other mappings swap_cluster_readahead can affect are anon
> > > > mappings, regular shmem mappings, or gpu shmem mappings. I think only
> > > > gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
> > > > intentional, because gpu shmem mappings pick GFP_HIGHUSER over
> > > > GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
> > > > __GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
> > >
> > > It sounds like the right approach would be to explicitly set
> > > __GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
> > > explicitly set __GFP_SKIP_KASAN makes the implications clearer than
> > > having shmem core set it implicitly.
> >
> > It's a bit of a shame that we have to explicitly set this
> > __GFP_SKIP_KASAN flag when we select GFP_HIGHUSER though (means a lot
> > of patching to do in drivers/gpu/drm/ basically, because basically
> > every driver relying on shmem for its buffer allocation uses this flag).
> >
> > Also, it feels like KASAN poisoning for these pages would be interesting
> > to have since we know we won't allow PROT_MTE on userspace mappings
> > anyway. Oh, and some buffers might even be kernel only (no mmap()
> > allowed), which makes them even better candidates for poisoning.
> >
> > >
> > > We could also consider adding a VM_WARN in shmem_swapin_cluster() to
> > > detect any mappings missing the __GFP_SKIP_KASAN flag.
> >
> > If the general consensus is that all shmem-backed allocation must have
> > __GFP_SKIP_KASAN, yes, it'd make sense to add a VM_WARN.
> >
> > >
> > > > I guess what I am trying to say is these are all user pages. We have
> > > > to skip kasan when user pages can be mapped PROT_MTE. The
> > >
> > > Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
> > > while GPU shmem mappings are a special case.
> >
> > They are not that special, they are just not MOVABLE because the GPU
> > might also access the same pages under the hood. If it's assumed that
> > any page being exposed through mmap() must have __GFP_SKIP_KASAN, why
> > does GFP_HIGHUSER not have that flag too?
> It is also about whether PROT_MTE is allowed. This becomes a problem
> when both kernel and userspace want to modify the tags stored in MTE.
>
> Another way to achieve the same effect as this patch, but is more
> explicit, is to have
>
> #define GFP_HIGHUSER_SWAPPABLE (GFP_HIGHUSER | __GFP_SKIP_KASAN)
> #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER_SWAPPABLE | __GFP_MOVABLE)
>
> GPU drivers that can swap should use GFP_HIGHUSER_SWAPPABLE. shmem
> core can warn about missing __GFP_SKIP_KASAN.
>
> >
> > >
> > > > justification for gpu shmem mappings is that they cannot be mapped
> > > > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > > > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
> >
> > I'm no MM expert, so it's probably me not understanding how this
> > swap-readahead logic is supposed to work, but the whole idea of using
> > different flags from those that were requested by the f_mapping seems
> > fragile. I mean, this comment [1] proves it's not the first time the
> > problem is considered, and I'm wondering why __GFP_SKIP_KASAN should be
> > treated differently from zones. Yes, that's an extra copy if the
> > SKIP_KASAN flags don't match but the zones do, but in practice, won't
> > we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
> > the problem that, even with a copy, it's already too late to restore
> > the flags because they have been overwritten during KASAN unpoisoning?
> arch_swap_restore is called just before shmem_replace_folio. It is a
> bit too late right now but I guess it is fixable.
>
> But shmem is not just a victim. It is also an offender to anon
> mappings. We would need a similar replacement logic in do_swap_page
> for anon mappings.
Come to think of it, that's not how things work.
Regular shmems and anon mappings set __GFP_SKIP_KASAN because they can
be mapped PROT_MTE. This calls page_kasan_tag_reset on the pages.
GPU shmems omit __GFP_SKIP_KASAN because they can't be mapped
PROT_MTE. This calls kasan_unpoison_pages on the pages.
With swap readahead, no one can expect the right function to be called
anymore. The question is whether we can detect the mismatch and call
page_kasan_tag_reset/kasan_unpoison_pages to make things right again
in places such as do_swap_page and shmem_swapin_folio.
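A rough userspace sketch of that fix-up idea follows. The model_* functions only record which of the two treatments a folio last received; they stand in for page_kasan_tag_reset and kasan_unpoison_pages and do not reproduce their behavior. It also assumes the mismatch is detectable at swap-in time, which is the open question.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_GFP_SKIP_KASAN (1u << 0)  /* stand-in bit, not the kernel value */

enum model_kasan_state {
	MODEL_TAG_RESET,   /* what the SKIP_KASAN path leaves behind */
	MODEL_UNPOISONED,  /* what the regular KASAN path leaves behind */
};

struct model_folio {
	enum model_kasan_state state;
};

static void model_tag_reset(struct model_folio *f)
{
	f->state = MODEL_TAG_RESET;
}

static void model_unpoison(struct model_folio *f)
{
	f->state = MODEL_UNPOISONED;
}

/*
 * Bring a readahead-allocated folio in line with what the owning
 * mapping's gfp mask asks for. Returns true if a fix-up was applied.
 */
static bool model_reconcile(struct model_folio *f, unsigned int mapping_gfp)
{
	enum model_kasan_state want = (mapping_gfp & MODEL_GFP_SKIP_KASAN) ?
				      MODEL_TAG_RESET : MODEL_UNPOISONED;

	if (f->state == want)
		return false;

	if (want == MODEL_TAG_RESET)
		model_tag_reset(f);
	else
		model_unpoison(f);
	return true;
}
```

In a real implementation the fix-up would presumably have to run before the saved user tags are restored, since an unpoison afterwards would overwrite them.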
>
> >
> > [1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112