From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Peter Collingbourne" <pcc@google.com>,
"Qun-wei Lin (林群崴)" <Qun-wei.Lin@mediatek.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
"surenb@google.com" <surenb@google.com>,
"Chinwen Chang (張錦文)" <chinwen.chang@mediatek.com>,
"kasan-dev@googlegroups.com" <kasan-dev@googlegroups.com>,
"Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>,
"Casper Li (李中榮)" <casper.li@mediatek.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
vincenzo.frascino@arm.com,
"Alexandru Elisei" <alexandru.elisei@arm.com>,
will@kernel.org, eugenis@google.com,
"Steven Price" <steven.price@arm.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 1/3] mm: Move arch_do_swap_page() call to before swap_free()
Date: Mon, 15 May 2023 18:34:30 +0100 [thread overview]
Message-ID: <ZGJtJobLrBg3PtHm@arm.com> (raw)
In-Reply-To: <7471013e-4afb-e445-5985-2441155fc82c@redhat.com>
On Sat, May 13, 2023 at 05:29:53AM +0200, David Hildenbrand wrote:
> On 13.05.23 01:57, Peter Collingbourne wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 01a23ad48a04..83268d287ff1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3914,19 +3914,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > }
> > }
> > - /*
> > - * Remove the swap entry and conditionally try to free up the swapcache.
> > - * We're already holding a reference on the page but haven't mapped it
> > - * yet.
> > - */
> > - swap_free(entry);
> > - if (should_try_to_free_swap(folio, vma, vmf->flags))
> > - folio_free_swap(folio);
> > -
> > - inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > - dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > pte = mk_pte(page, vma->vm_page_prot);
> > -
> > /*
> > * Same logic as in do_wp_page(); however, optimize for pages that are
> > * certainly not shared either because we just allocated them without
> > @@ -3946,8 +3934,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > pte = pte_mksoft_dirty(pte);
> > if (pte_swp_uffd_wp(vmf->orig_pte))
> > pte = pte_mkuffd_wp(pte);
> > + arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
> > vmf->orig_pte = pte;
> > + /*
> > + * Remove the swap entry and conditionally try to free up the swapcache.
> > + * We're already holding a reference on the page but haven't mapped it
> > + * yet.
> > + */
> > + swap_free(entry);
> > + if (should_try_to_free_swap(folio, vma, vmf->flags))
> > + folio_free_swap(folio);
> > +
> > + inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > + dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > +
> > /* ksm created a completely new copy */
> > if (unlikely(folio != swapcache && swapcache)) {
> > page_add_new_anon_rmap(page, vma, vmf->address);
> > @@ -3959,7 +3960,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > VM_BUG_ON(!folio_test_anon(folio) ||
> > (pte_write(pte) && !PageAnonExclusive(page)));
> > set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
> > - arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
> > folio_unlock(folio);
> > if (folio != swapcache && swapcache) {
>
>
> You are moving the folio_free_swap() call after the folio_ref_count(folio)
> == 1 check, which means that such (previously) swapped pages that are
> exclusive cannot be detected as exclusive.
>
> There must be a better way to handle MTE here.
>
> Where are the tags stored, how is the location identified, and when are they
> effectively restored right now?
I haven't gone through Peter's patches yet but a pretty good description
of the problem is here:
https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/.
I couldn't reproduce it with my swap setup but both Qun-wei and Peter
triggered it.
When a tagged page is swapped out, the arm64 code stores the metadata
(tags) in a local xarray indexed by the swap pte. When restoring from
swap, the arm64 set_pte_at() checks this xarray using the old swap pte
and spills the tags onto the new page. Apparently something changed in
the kernel recently that causes swap_range_free() to be called before
set_pte_at(). The arm64 arch_swap_invalidate_page() frees the metadata
from the xarray and the subsequent set_pte_at() won't find it.
If we have the page, the metadata can be restored before set_pte_at()
and I guess that's what Peter is trying to do (again, I haven't looked
at the details yet; leaving it for tomorrow).
Is there any other way of handling this? E.g. not release the metadata
in arch_swap_invalidate_page() but later in set_pte_at() once it was
restored. But then we may leak this metadata if there's no set_pte_at()
(the process mapping the swap entry died).
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Peter Collingbourne" <pcc@google.com>,
"Qun-wei Lin (林群崴)" <Qun-wei.Lin@mediatek.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
"surenb@google.com" <surenb@google.com>,
"Chinwen Chang (張錦文)" <chinwen.chang@mediatek.com>,
"kasan-dev@googlegroups.com" <kasan-dev@googlegroups.com>,
"Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>,
"Casper Li (李中榮)" <casper.li@mediatek.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
vincenzo.frascino@arm.com,
"Alexandru Elisei" <alexandru.elisei@arm.com>,
will@kernel.org, eugenis@google.com,
"Steven Price" <steven.price@arm.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 1/3] mm: Move arch_do_swap_page() call to before swap_free()
Date: Mon, 15 May 2023 18:34:30 +0100 [thread overview]
Message-ID: <ZGJtJobLrBg3PtHm@arm.com> (raw)
In-Reply-To: <7471013e-4afb-e445-5985-2441155fc82c@redhat.com>
On Sat, May 13, 2023 at 05:29:53AM +0200, David Hildenbrand wrote:
> On 13.05.23 01:57, Peter Collingbourne wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 01a23ad48a04..83268d287ff1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3914,19 +3914,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > }
> > }
> > - /*
> > - * Remove the swap entry and conditionally try to free up the swapcache.
> > - * We're already holding a reference on the page but haven't mapped it
> > - * yet.
> > - */
> > - swap_free(entry);
> > - if (should_try_to_free_swap(folio, vma, vmf->flags))
> > - folio_free_swap(folio);
> > -
> > - inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > - dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > pte = mk_pte(page, vma->vm_page_prot);
> > -
> > /*
> > * Same logic as in do_wp_page(); however, optimize for pages that are
> > * certainly not shared either because we just allocated them without
> > @@ -3946,8 +3934,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > pte = pte_mksoft_dirty(pte);
> > if (pte_swp_uffd_wp(vmf->orig_pte))
> > pte = pte_mkuffd_wp(pte);
> > + arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
> > vmf->orig_pte = pte;
> > + /*
> > + * Remove the swap entry and conditionally try to free up the swapcache.
> > + * We're already holding a reference on the page but haven't mapped it
> > + * yet.
> > + */
> > + swap_free(entry);
> > + if (should_try_to_free_swap(folio, vma, vmf->flags))
> > + folio_free_swap(folio);
> > +
> > + inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > + dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > +
> > /* ksm created a completely new copy */
> > if (unlikely(folio != swapcache && swapcache)) {
> > page_add_new_anon_rmap(page, vma, vmf->address);
> > @@ -3959,7 +3960,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > VM_BUG_ON(!folio_test_anon(folio) ||
> > (pte_write(pte) && !PageAnonExclusive(page)));
> > set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
> > - arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
> > folio_unlock(folio);
> > if (folio != swapcache && swapcache) {
>
>
> You are moving the folio_free_swap() call after the folio_ref_count(folio)
> == 1 check, which means that such (previously) swapped pages that are
> exclusive cannot be detected as exclusive.
>
> There must be a better way to handle MTE here.
>
> Where are the tags stored, how is the location identified, and when are they
> effectively restored right now?
I haven't gone through Peter's patches yet but a pretty good description
of the problem is here:
https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/.
I couldn't reproduce it with my swap setup but both Qun-wei and Peter
triggered it.
When a tagged page is swapped out, the arm64 code stores the metadata
(tags) in a local xarray indexed by the swap pte. When restoring from
swap, the arm64 set_pte_at() checks this xarray using the old swap pte
and spills the tags onto the new page. Apparently something changed in
the kernel recently that causes swap_range_free() to be called before
set_pte_at(). The arm64 arch_swap_invalidate_page() frees the metadata
from the xarray and the subsequent set_pte_at() won't find it.
If we have the page, the metadata can be restored before set_pte_at()
and I guess that's what Peter is trying to do (again, I haven't looked
at the details yet; leaving it for tomorrow).
Is there any other way of handling this? E.g. not release the metadata
in arch_swap_invalidate_page() but later in set_pte_at() once it was
restored. But then we may leak this metadata if there's no set_pte_at()
(the process mapping the swap entry died).
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-05-15 17:37 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-12 23:57 [PATCH 0/3] mm: Fix bug affecting swapping in MTE tagged pages Peter Collingbourne
2023-05-12 23:57 ` [PATCH 1/3] mm: Move arch_do_swap_page() call to before swap_free() Peter Collingbourne
2023-05-13 3:29 ` David Hildenbrand
2023-05-15 17:34 ` Catalin Marinas [this message]
2023-05-15 17:34 ` Catalin Marinas
2023-05-15 23:40 ` Peter Collingbourne
2023-05-15 23:40 ` Peter Collingbourne
2023-05-16 12:35 ` David Hildenbrand
2023-05-16 12:35 ` David Hildenbrand
2023-05-17 1:57 ` Peter Collingbourne
2023-05-17 1:57 ` Peter Collingbourne
2023-05-17 8:30 ` David Hildenbrand
2023-05-17 8:30 ` David Hildenbrand
2023-05-18 20:06 ` Peter Collingbourne
2023-05-18 20:06 ` Peter Collingbourne
2023-05-19 9:21 ` David Hildenbrand
2023-05-19 9:21 ` David Hildenbrand
2023-05-19 16:21 ` Catalin Marinas
2023-05-19 16:21 ` Catalin Marinas
2023-05-16 12:30 ` David Hildenbrand
2023-05-16 12:30 ` David Hildenbrand
2023-05-17 1:37 ` Peter Collingbourne
2023-05-17 1:37 ` Peter Collingbourne
2023-05-17 8:31 ` David Hildenbrand
2023-05-17 8:31 ` David Hildenbrand
2023-05-16 0:16 ` Peter Collingbourne
2023-05-16 0:16 ` Peter Collingbourne
2023-05-16 2:35 ` Peter Collingbourne
2023-05-16 2:35 ` Peter Collingbourne
2023-05-17 8:34 ` David Hildenbrand
2023-05-17 8:34 ` David Hildenbrand
2023-05-16 12:40 ` David Hildenbrand
2023-05-16 12:40 ` David Hildenbrand
2023-05-17 2:13 ` Peter Collingbourne
2023-05-17 2:13 ` Peter Collingbourne
2023-05-12 23:57 ` [PATCH 2/3] mm: Call arch_swap_restore() from arch_do_swap_page() and deprecate the latter Peter Collingbourne
2023-05-13 3:34 ` David Hildenbrand
2023-05-12 23:57 ` [PATCH 3/3] arm64: mte: Simplify swap tag restoration logic and fix uninitialized tag issue Peter Collingbourne
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZGJtJobLrBg3PtHm@arm.com \
--to=catalin.marinas@arm.com \
--cc=Kuan-Ying.Lee@mediatek.com \
--cc=Qun-wei.Lin@mediatek.com \
--cc=alexandru.elisei@arm.com \
--cc=casper.li@mediatek.com \
--cc=chinwen.chang@mediatek.com \
--cc=david@redhat.com \
--cc=eugenis@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=kasan-dev@googlegroups.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pcc@google.com \
--cc=stable@vger.kernel.org \
--cc=steven.price@arm.com \
--cc=surenb@google.com \
--cc=vincenzo.frascino@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.