linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/7] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages
@ 2022-03-15 14:18 David Hildenbrand
  2022-03-15 14:18 ` [PATCH v1 1/7] mm/swap: remember PG_anon_exclusive via a swp pte bit David Hildenbrand
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: David Hildenbrand @ 2022-03-15 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Jan Kara, David Hildenbrand, Catalin Marinas, Yang Shi,
	Dave Hansen, Peter Xu, Michal Hocko, linux-mm, Donald Dutile,
	Liang Zhang, Borislav Petkov, Alexander Gordeev, Will Deacon,
	Christoph Hellwig, Paul Mackerras, Andrea Arcangeli, linux-s390,
	Vasily Gorbik, Rik van Riel, Hugh Dickins, Matthew Wilcox,
	Mike Rapoport, Ingo Molnar, linux-arm-kernel, Jason Gunthorpe,
	David Rientjes, Pedro Gomes, Jann Horn, John Hubbard,
	Heiko Carstens, Shakeel Butt, Thomas Gleixner, Vlastimil Babka,
	Oded Gabbay, linuxppc-dev, Oleg Nesterov, Nadav Amit,
	Andrew Morton, Linus Torvalds, Roman Gushchin,
	Kirill A . Shutemov, Mike Kravetz

More information on the general COW issues can be found at [2]. This series
is based on v5.17-rc8, [1]:
	[PATCH v3 0/9] mm: COW fixes part 1: fix the COW security issue for
	THP and swap
and [4]:
	[PATCH v2 00/15] mm: COW fixes part 2: reliable GUP pins of
	anonymous pages

v1 is located at:
	https://github.com/davidhildenbrand/linux/tree/cow_fixes_part_3_v1


This series fixes memory corruptions when a GUP R/W reference
(FOLL_WRITE | FOLL_GET) was taken on an anonymous page and COW logic fails
to detect exclusivity of the page to then replacing the anonymous page by
a copy in the page table: The GUP reference lost synchronicity with the
pages mapped into the page tables. This series focuses on x86, arm64,
s390x and ppc64/book3s -- other architectures are fairly easy to support
by implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE.

This primarily fixes the O_DIRECT memory corruptions that can happen on
concurrent swapout, whereby we lose DMA reads to a page (modifying the user
page by writing to it).

O_DIRECT currently uses FOLL_GET for short-term (!FOLL_LONGTERM)
DMA from/to a user page. In the long run, we want to convert it to properly
use FOLL_PIN, and John is working on it, but that might take a while and
might not be easy to backport. In the meantime, let's restore what used to
work before we started modifying our COW logic: make R/W FOLL_GET
references reliable as long as there is no fork() after GUP involved.

This is just the natural follow-up of part 2, that will also further
reduce "wrong COW" on the swapin path, for example, when we cannot remove
a page from the swapcache due to concurrent writeback, or if we have two
threads faulting on the same swapped-out page. Fixing O_DIRECT is just a
nice side-product :)

This issue, including other related COW issues, has been summarized in [3]
under 2):
"
  2. Intra Process Memory Corruptions due to Wrong COW (FOLL_GET)

  It was discovered that we can create a memory corruption by reading a
  file via O_DIRECT to a part (e.g., first 512 bytes) of a page,
  concurrently writing to an unrelated part (e.g., last byte) of the same
  page, and concurrently write-protecting the page via clear_refs
  SOFTDIRTY tracking [6].

  For the reproducer, the issue is that O_DIRECT grabs a reference of the
  target page (via FOLL_GET) and clear_refs write-protects the relevant
  page table entry. On successive write access to the page from the
  process itself, we wrongly COW the page when resolving the write fault,
  resulting in a loss of synchronicity and consequently a memory corruption.

  While some people might think that using clear_refs in this combination
  is a corner cases, it turns out to be a more generic problem unfortunately.

  For example, it was just recently discovered that we can similarly
  create a memory corruption without clear_refs, simply by concurrently
  swapping out the buffer pages [7]. Note that we nowadays even use the
  swap infrastructure in Linux without an actual swap disk/partition: the
  prime example is zram which is enabled as default under Fedora [10].

  The root issue is that a write-fault on a page that has additional
  references results in a COW and thereby a loss of synchronicity
  and consequently a memory corruption if two parties believe they are
  referencing the same page.
"

We don't particularly care about R/O FOLL_GET references: they were never
reliable and O_DIRECT doesn't expect to observe modifications from a page
after DMA was started.

Note that:
* this only fixes the issue on x86, arm64, s390x and ppc64/book3s
  ("enterprise architectures"). Other architectures have to implement
  __HAVE_ARCH_PTE_SWP_EXCLUSIVE to achieve the same.
* this does *not * consider any kind of fork() after taking the reference:
  fork() after GUP never worked reliably with FOLL_GET.
* Not losing PG_anon_exclusive during swapout was the last remaining
  piece. KSM already makes sure that there are no other references on
  a page before considering it for sharing. Page migration maintains
  PG_anon_exclusive and simply fails when there are additional references
  (freezing the refcount fails). Only swapout code dropped the
  PG_anon_exclusive flag because it requires more work to remember +
  restore it.

With this series in place, most COW issues of [3] are fixed on said
architectures. Other architectures can implement
__HAVE_ARCH_PTE_SWP_EXCLUSIVE fairly easily.

What remains is the COW security issue on hugetlb with FOLL_GET, and
SOFTDIRTY tracking. I'll tackle both (guess what?) in part 4 once part 2
and part 3 are on its way upstream.

[1] https://lkml.kernel.org/r/20220131162940.210846-1-david@redhat.com
[2] https://lkml.kernel.org/r/20211217113049.23850-1-david@redhat.com
[3] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com
[4] https://lkml.kernel.org/r/20220315104741.63071-1-david@redhat.com


David Hildenbrand (7):
  mm/swap: remember PG_anon_exclusive via a swp pte bit
  mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  arm64/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s
  powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s

 arch/arm64/include/asm/pgtable-prot.h        |  1 +
 arch/arm64/include/asm/pgtable.h             | 23 ++++++--
 arch/powerpc/include/asm/book3s/64/pgtable.h | 31 ++++++++---
 arch/s390/include/asm/pgtable.h              | 37 ++++++++++---
 arch/x86/include/asm/pgtable.h               | 16 ++++++
 arch/x86/include/asm/pgtable_64.h            |  4 +-
 arch/x86/include/asm/pgtable_types.h         |  5 ++
 include/linux/pgtable.h                      | 29 +++++++++++
 include/linux/swapops.h                      |  2 +
 mm/debug_vm_pgtable.c                        | 15 ++++++
 mm/memory.c                                  | 55 ++++++++++++++++++--
 mm/rmap.c                                    | 19 ++++---
 mm/swapfile.c                                | 13 ++++-
 13 files changed, 219 insertions(+), 31 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-03-22  9:47 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-15 14:18 [PATCH v1 0/7] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 1/7] mm/swap: remember PG_anon_exclusive via a swp pte bit David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 2/7] mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 3/7] x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 4/7] arm64/pgtable: " David Hildenbrand
2022-03-16 18:27   ` Catalin Marinas
2022-03-17 10:04     ` David Hildenbrand
2022-03-17 17:58       ` Catalin Marinas
2022-03-18  9:59         ` David Hildenbrand
2022-03-18 11:33           ` Catalin Marinas
2022-03-18 14:14             ` David Hildenbrand
2022-03-21 14:38     ` Will Deacon
2022-03-21 14:39       ` Will Deacon
2022-03-21 15:07       ` David Hildenbrand
2022-03-21 17:44         ` Will Deacon
2022-03-21 18:27           ` Catalin Marinas
2022-03-22  9:46             ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 5/7] s390/pgtable: " David Hildenbrand
2022-03-15 16:21   ` Gerald Schaefer
2022-03-15 16:37     ` David Hildenbrand
2022-03-15 16:58       ` David Hildenbrand
2022-03-15 17:12         ` David Hildenbrand
2022-03-15 17:14           ` David Hildenbrand
2022-03-16 10:56           ` Gerald Schaefer
2022-03-16 11:06             ` David Hildenbrand
2022-03-16 13:01             ` Christian Borntraeger
2022-03-16 13:27               ` Gerald Schaefer
2022-03-16 14:00                 ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 6/7] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 7/7] powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE " David Hildenbrand
2022-03-18 23:48 ` [PATCH v1 0/7] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages Jason Gunthorpe
2022-03-19 11:17   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).