From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	linuxppc-dev@lists.ozlabs.org,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Dave Chinner <david@fromorbit.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
	Nadav Amit <namit@vmware.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH v2 0/7] mm/autonuma: replace savedwrite infrastructure
Date: Tue,  8 Nov 2022 18:46:45 +0100
Message-ID: <20221108174652.198904-1-david@redhat.com>

This series is based on mm-unstable.

As discussed in my talk at LPC, we can reuse the same mechanism for
deciding whether to map a pte writable when upgrading permissions via
mprotect() -- e.g., PROT_READ -> PROT_READ|PROT_WRITE -- to replace the
savedwrite infrastructure used for NUMA hinting faults (e.g., PROT_NONE
-> PROT_READ|PROT_WRITE).
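
For illustration, the mprotect() permission upgrade mentioned above can be
triggered from userspace roughly as follows. This is a minimal sketch and not
part of the series; the helper name upgrade_and_write() is made up here, and
whether the final write takes a write fault depends on how the kernel decided
to map the ptes, which this program cannot observe directly.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the series): populate a private anonymous
 * mapping, downgrade it to PROT_READ, upgrade it back to
 * PROT_READ|PROT_WRITE and write to it again.
 * Returns 0 on success, -1 on failure.
 */
static int upgrade_and_write(size_t len)
{
	char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = -1;

	if (mem == MAP_FAILED)
		return -1;

	/* Populate the pages so that ptes exist before changing protection. */
	memset(mem, 0xaa, len);

	/* Downgrade: the ptes lose write permission. */
	if (mprotect(mem, len, PROT_READ))
		goto out;
	/* Upgrade: PROT_READ -> PROT_READ|PROT_WRITE. */
	if (mprotect(mem, len, PROT_READ | PROT_WRITE))
		goto out;

	/* This write succeeds either way; only the fault count may differ. */
	mem[0] = 0x55;
	ret = (mem[0] == 0x55) ? 0 : -1;
out:
	munmap(mem, len);
	return ret;
}
```

Calling upgrade_and_write(4096) from a small main() is enough to exercise the
path.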

Instead of maintaining the previous write permission for a pte/pmd, we
re-determine whether the pte/pmd can be writable. The big benefit is that we
get common logic for deciding whether we can map a pte/pmd writable on
protection changes.

For private mappings, there should be no difference -- from
what I understand, that is what autonuma benchmarks care about.

I ran autonumabench for v1 on a system with 2 NUMA nodes (96 GiB each) via:
	perf stat --null --repeat 10
The numa01 benchmark is quite noisy in my environment, and I have not
managed to reduce the noise so far.

numa01:
	mm-unstable:   146.88 +- 6.54 seconds time elapsed  ( +-  4.45% )
	mm-unstable++: 147.45 +- 13.39 seconds time elapsed  ( +-  9.08% )

numa02:
	mm-unstable:   16.0300 +- 0.0624 seconds time elapsed  ( +-  0.39% )
	mm-unstable++: 16.1281 +- 0.0945 seconds time elapsed  ( +-  0.59% )
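
As a reading aid for the numbers above, the relative change of the mean can
be computed as sketched below. The helper name rel_change_pct() is made up
for this illustration. With the reported means, the change is roughly 0.39%
for numa01 and roughly 0.61% for numa02, to be read against the reported
run-to-run variation of 4.45%/9.08% and 0.39%/0.59%, respectively.

```c
/*
 * Hypothetical helper: relative change of the patched mean against the
 * baseline mean, in percent. Positive values mean the patched run was slower.
 */
static double rel_change_pct(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

For example, rel_change_pct(146.88, 147.45) gives the numa01 figure and
rel_change_pct(16.0300, 16.1281) the numa02 figure.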

It is worth noting that for shared writable mappings that require
writenotify, we will only avoid write faults if the pte/pmd is dirty
(inherited from the older mprotect logic). If we ever care about optimizing
that further, we'd need a different mechanism to identify whether the FS
still needs to get notified on the next write access.

In any case, such an optimization would then not be autonuma-specific;
mprotect() permission upgrades would similarly benefit from it.
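
The decision described in this cover letter can be summarized in a small
userspace model. This is only a sketch under stated assumptions, not the
kernel's can_change_pte_writable(): the struct, its fields and the function
name are hypothetical, the VMA is assumed to allow writes, and the shared case
is assumed to require writenotify. Any further checks the kernel performs are
left out.

```c
#include <stdbool.h>

/*
 * Simplified model of a pte/pmd; all names are hypothetical and do not
 * correspond to kernel data structures.
 */
struct pte_model {
	bool vma_shared;	/* shared mapping (otherwise private) */
	bool dirty;		/* pte/pmd is marked dirty */
	bool anon_exclusive;	/* private: maps an exclusive anonymous page */
};

/*
 * Re-determine writability from the current state instead of consulting a
 * saved write bit.
 */
static bool model_can_map_writable(const struct pte_model *p)
{
	if (p->vma_shared) {
		/*
		 * Shared mapping requiring writenotify: only a dirty pte/pmd
		 * avoids the write fault; a clean one must still fault so the
		 * FS gets notified.
		 */
		return p->dirty;
	}

	/* Private mapping: an exclusive anon page may be writable even if clean. */
	return p->anon_exclusive;
}
```

In this model, the same function serves both the mprotect() upgrade and the
NUMA hinting fault, which is the "common logic" point made earlier.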

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>

v1 -> v2:
* "mm/mprotect: factor out check whether manual PTE write upgrades are
   required"
 -> Added
* "mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite"
 -> Simplify and don't optimize for failed migration
 -> Update patch description

RFC -> v1:
* "mm/mprotect: allow clean exclusive anon pages to be writable"
 -> Move comment change to patch #2
* "mm/mprotect: minor can_change_pte_writable() cleanups"
 -> Adjust comments
* "mm/huge_memory: try avoiding write faults when changing PMD protection"
 -> Fix wrong check
* "selftests/vm: anon_cow: add mprotect() optimization tests"
 -> Add basic tests for the mprotect() optimization

David Hildenbrand (6):
  mm/mprotect: minor can_change_pte_writable() cleanups
  mm/huge_memory: try avoiding write faults when changing PMD protection
  mm/mprotect: factor out check whether manual PTE write upgrades are
    required
  mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite
  mm: remove unused savedwrite infrastructure
  selftests/vm: anon_cow: add mprotect() optimization tests

Nadav Amit (1):
  mm/mprotect: allow clean exclusive anon pages to be writable

 arch/powerpc/include/asm/book3s/64/pgtable.h | 80 +-------------------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c          |  2 +-
 include/linux/mm.h                           | 18 ++++-
 include/linux/pgtable.h                      | 24 ------
 mm/debug_vm_pgtable.c                        | 32 --------
 mm/huge_memory.c                             | 64 ++++++++++++----
 mm/ksm.c                                     |  9 +--
 mm/memory.c                                  | 16 +++-
 mm/mprotect.c                                | 50 ++++++------
 tools/testing/selftests/vm/anon_cow.c        | 49 +++++++++++-
 10 files changed, 158 insertions(+), 186 deletions(-)

-- 
2.38.1



