From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	David Hildenbrand <dhildenb@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v22 1/4] mm: add MAP_DROPPABLE for designating always lazily freeable mappings
Date: Thu, 11 Jul 2024 19:09:36 +0200
Message-ID: <ZpAR0CgLc28gEkV3@zx2c4.com>
In-Reply-To: <CAHk-=wh=vzhiDSNaLJdmjkhLqevB8+rhE49pqh0uBwhsV=1ccQ@mail.gmail.com>

Hi Linus, David,

On Wed, Jul 10, 2024 at 10:07:03PM -0700, Linus Torvalds wrote:
> The other approach might be to just let all the dirty handling happen
> - make droppable pages have a "page->mapping" (and not be anonymous),
> and have the mapping->a_ops->writepage() just always return success
> immediately.

When I was working on the earlier, syscall-based version of this patchset
this year, my initial approach was somewhat similar: setting up a special
mapping. It turned into kind of a mess and I couldn't get it working.
There's a lot of functionality built around anonymous pages that would
need to be duplicated (I think?). I'll revisit it if need be, but first
let's see if I can make avoiding the dirty bit propagation work.
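Linus's suggestion can be modeled in a few lines: if a droppable mapping's writeback hook reports success without writing anything, reclaim treats a dirty page as cleaned and frees it, so the dirty bit stops mattering. The following is a minimal userspace sketch under that assumption. Every model_* name is hypothetical and only stands in for page->mapping and mapping->a_ops->writepage(); locking, writeback state and the anonymous-page machinery mentioned above are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of "droppable mapping with a no-op writepage".
 * All model_* names are hypothetical; this is not the kernel's
 * struct address_space_operations.
 */
struct model_page;

struct model_aops {
	/* Returns 0 on success, following the ->writepage() convention. */
	int (*writepage)(struct model_page *page);
};

struct model_mapping {
	const struct model_aops *a_ops;
};

struct model_page {
	struct model_mapping *mapping;
	bool dirty;
	int writes; /* number of times data was actually written out */
};

/* Ordinary backing store: pretend to write the data somewhere. */
static int backed_writepage(struct model_page *page)
{
	page->writes++;
	return 0;
}

/* Droppable: report success immediately and write nothing. */
static int droppable_writepage(struct model_page *page)
{
	(void)page;
	return 0;
}

static const struct model_aops backed_aops = { .writepage = backed_writepage };
static const struct model_aops droppable_aops = { .writepage = droppable_writepage };

/* Simplified reclaim: a dirty page is cleaned via writeback before being freed. */
static bool model_reclaim(struct model_page *page)
{
	assert(page->mapping && page->mapping->a_ops);
	if (page->dirty) {
		page->dirty = false; /* model: cleared before ->writepage() is called */
		if (page->mapping->a_ops->writepage(page) != 0) {
			page->dirty = true; /* writeback failed: keep the page */
			return false;
		}
	}
	return true; /* clean now, so it can be freed */
}
```

Both mappings end up reclaimable; the only difference is that the droppable one never performs a write, which is the whole point of the suggestion.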

> It's mainly the pte_dirty games in mm/vmscan.c that does it
> (walk_pte_range), but also the tear-down in mm/memory.c
> (zap_present_folio_ptes). Possibly others that I didn't think of.
> 
> Both do have access to the vma, although in the case of
> walk_pte_range() we don't actually pass it down because we haven't
> needed it).

Actually, it's there hanging out in args->vma, and the function makes
use of that member already. So not so bad.

> 
> There's also page_vma_mkclean_one(), try_to_unmap_one() and
> try_to_migrate_one().  And possibly many others I haven't even thought
> about.
> 
> So quite a few places that do that "transfer dirty bit from pte to folio".
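Every site in that list has the same shape: read the dirty bit from a PTE and, if it is set, mark the folio dirty. The approach explored in the diff gates that transfer on the VMA. A simplified userspace sketch of the gated transfer, assuming a hypothetical MODEL_VM_DROPPABLE bit and model structs in place of vm_area_struct, folios and PTEs; the walk-args struct mirrors how walk_pte_range() reaches the VMA through args->vma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's definitions. */
#define MODEL_VM_WRITE     0x002UL
#define MODEL_VM_DROPPABLE 0x100UL

struct model_vma { unsigned long vm_flags; };
struct model_folio { bool dirty; };
struct model_pte { bool dirty; struct model_folio *folio; };

/* Stand-in for the walk arguments through which walk_pte_range() sees the VMA. */
struct model_walk_args { const struct model_vma *vma; };

/* Transfer the PTE dirty bit to the folio, except for droppable VMAs. */
static void transfer_dirty(const struct model_vma *vma, const struct model_pte *pte)
{
	assert(vma && pte && pte->folio);
	if (pte->dirty && !(vma->vm_flags & MODEL_VM_DROPPABLE))
		pte->folio->dirty = true;
}

/* Apply the gated transfer to each PTE in a range. */
static void model_walk(const struct model_walk_args *args,
		       const struct model_pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		transfer_dirty(args->vma, &ptes[i]);
}
```

Because the check lives at each transfer site rather than in one central place, every site in the list (and any that were missed) has to carry it.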

Alright, after an hour of fiddling, it doesn't actually work (yet?):
the selftest fails. A diff follows below.

So, hmm... The swapbacked thing really seemed so simple... I wonder if
there's a way of recovering that.
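The two approaches differ most visibly in the MADV_FREE check in try_to_unmap_one(). A simplified model of that decision, assuming a folio is reduced to its swapbacked and dirty flags plus reference and map counts (memory barriers, PTE remapping and the actual swap path are omitted). The hypothetical decide_v22() follows the v22 logic, where droppable folios are never swap-backed and may be discarded even when dirty; the hypothetical decide_dirty_gated() follows the diff below, where folios stay swap-backed and discarding depends on the folio dirty bit never having been set.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified folio state for this model; not the kernel's struct folio. */
struct model_folio {
	bool swapbacked;
	bool dirty;
	int ref_count;
	int map_count;
};

enum model_verdict { MODEL_DISCARD, MODEL_KEEP, MODEL_SWAP_PATH };

/* v22: droppable folios are never swap-backed and may be dropped while dirty. */
static enum model_verdict decide_v22(struct model_folio *f, bool droppable)
{
	assert(f->ref_count >= 0 && f->map_count >= 0);
	if (f->swapbacked)
		return MODEL_SWAP_PATH;          /* ordinary anon folio: swap it out */
	if (f->ref_count == 1 + f->map_count && (!f->dirty || droppable))
		return MODEL_DISCARD;
	if (!droppable)
		f->swapbacked = true;            /* redirtied MADV_FREE folio */
	return MODEL_KEEP;
}

/* Dirty-gated: folios stay swap-backed; droppable ones take the MADV_FREE branch. */
static enum model_verdict decide_dirty_gated(struct model_folio *f, bool droppable)
{
	assert(f->ref_count >= 0 && f->map_count >= 0);
	if (f->swapbacked && !droppable)
		return MODEL_SWAP_PATH;
	if (f->ref_count == 1 + f->map_count && !f->dirty)
		return MODEL_DISCARD;
	f->swapbacked = true;                    /* remapped and kept */
	return MODEL_KEEP;
}
```

In the dirty-gated model, a single dirty bit that reaches the folio by any path is enough to keep a droppable folio, so every site that can set it has to be covered.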

Jason


diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b..38745cc4fa06 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -990,7 +990,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
-		    !pte_dirty(pte) && !PageDirty(page))
+		    !pte_dirty(pte) && !PageDirty(page) &&
+		    !(vma->vm_flags & VM_DROPPABLE))
 			set_page_dirty(page);
 		/*
 		 * pte_mkyoung() would be more correct here, but atomic care
diff --git a/mm/ksm.c b/mm/ksm.c
index 34c4820e0d3d..2401fc4203ba 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1339,7 +1339,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct folio *folio,
 			goto out_unlock;
 		}

-		if (pte_dirty(entry))
+		if (pte_dirty(entry) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);
 		entry = pte_mkclean(entry);

@@ -1518,7 +1518,7 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 			 * Page reclaim just frees a clean page with no dirty
 			 * ptes: make sure that the ksm page would be swapped.
 			 */
-			if (!PageDirty(page))
+			if (!PageDirty(page) && !(vma->vm_flags & VM_DROPPABLE))
 				SetPageDirty(page);
 			err = 0;
 		} else if (pages_identical(page, kpage))
diff --git a/mm/memory.c b/mm/memory.c
index d10e616d7389..6a02d16309be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1479,7 +1479,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,

 	if (!folio_test_anon(folio)) {
 		ptent = get_and_clear_full_ptes(mm, addr, pte, nr, tlb->fullmm);
-		if (pte_dirty(ptent)) {
+		if (pte_dirty(ptent) && !(vma->vm_flags & VM_DROPPABLE)) {
 			folio_mark_dirty(folio);
 			if (tlb_delay_rmap(tlb)) {
 				delay_rmap = true;
@@ -6140,7 +6140,8 @@ static int __access_remote_vm(struct mm_struct *mm, unsigned long addr,
 			if (write) {
 				copy_to_user_page(vma, page, addr,
 						  maddr + offset, buf, bytes);
-				set_page_dirty_lock(page);
+				if (!(vma->vm_flags & VM_DROPPABLE))
+					set_page_dirty_lock(page);
 			} else {
 				copy_from_user_page(vma, page, addr,
 						    buf, maddr + offset, bytes);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index aecc71972a87..72d3f8eaae6e 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -216,7 +216,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			migrate->cpages++;

 			/* Set the dirty flag on the folio now the pte is gone. */
-			if (pte_dirty(pte))
+			if (pte_dirty(pte) && !(vma->vm_flags & VM_DROPPABLE))
 				folio_mark_dirty(folio);

 			/* Setup special migration page table entry */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1f9b5a9cb121..1688d06bb617 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1397,12 +1397,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
-	/*
-	 * VM_DROPPABLE mappings don't swap; instead they're just dropped when
-	 * under memory pressure.
-	 */
-	if (!(vma->vm_flags & VM_DROPPABLE))
-		__folio_set_swapbacked(folio);
+	__folio_set_swapbacked(folio);
 	__folio_set_anon(folio, vma, address, true);

 	if (likely(!folio_test_large(folio))) {
@@ -1777,7 +1772,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);

 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);

 		/* Update high watermark before we lower rss */
@@ -1822,7 +1817,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			}

 			/* MADV_FREE page check */
-			if (!folio_test_swapbacked(folio)) {
+			if (!folio_test_swapbacked(folio) || (vma->vm_flags & VM_DROPPABLE)) {
 				int ref_count, map_count;

 				/*
@@ -1846,13 +1841,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * plus the rmap(s) (dropped by discard:).
 				 */
 				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
+				    !folio_test_dirty(folio)) {
 					dec_mm_counter(mm, MM_ANONPAGES);
 					goto discard;
 				}
@@ -1862,12 +1851,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * discarded. Remap the page to page table.
 				 */
 				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
-					folio_set_swapbacked(folio);
+				folio_set_swapbacked(folio);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
 				break;
@@ -2151,7 +2135,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		}

 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);

 		/* Update high watermark before we lower rss */
@@ -2397,7 +2381,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);

 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);

 		/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2e34de9cd0d4..cf5b26bd067a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3396,6 +3396,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		walk->mm_stats[MM_LEAF_YOUNG]++;

 		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
+		    !(args->vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
@@ -3476,6 +3477,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		walk->mm_stats[MM_LEAF_YOUNG]++;

 		if (pmd_dirty(pmd[i]) && !folio_test_dirty(folio) &&
+		    !(vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
@@ -4076,6 +4078,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		young++;

 		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
+		    !(vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
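The VM_DROPPABLE test has to use bitwise `&`. With logical `&&`, `vm_flags && VM_DROPPABLE` evaluates to true for any VMA that has any flag set (assuming VM_DROPPABLE is nonzero), so the negated condition would suppress dirty propagation for essentially every mapping rather than only droppable ones. A small model of the difference, with hypothetical MODEL_* bit values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; not the kernel's definitions. */
#define MODEL_VM_READ      0x001UL
#define MODEL_VM_DROPPABLE 0x100UL

static_assert((MODEL_VM_READ & MODEL_VM_DROPPABLE) == 0, "model bits must not overlap");

/* Correct: true only if `bit` is set in vm_flags. */
static bool flag_set(unsigned long vm_flags, unsigned long bit)
{
	return (vm_flags & bit) != 0;
}

/* Buggy: true whenever both operands are nonzero, regardless of which bits. */
static bool flag_set_buggy(unsigned long vm_flags, unsigned long bit)
{
	return vm_flags && bit;
}
```

A read-only, non-droppable VMA is reported as droppable by the buggy form, which is exactly the case the gate must let through.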


