All of lore.kernel.org
 help / color / mirror / Atom feed
diff for duplicates of <20170307140453.GB2412@node>

diff --git a/a/1.txt b/N1/1.txt
index f327aca..17a3544 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -65,3 +65,52 @@ Okay, updated patch is below.
 MADV_DONTNEED would skip the pmd and it will not be cleared.
 Userspace expects the area of memory to be clean after MADV_DONTNEED, but
 it's not. It can lead to userspace misbehaviour.
+
+>From a0967b0293a6f8053d85785c4d6340e550e849ea Mon Sep 17 00:00:00 2001
+From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
+Date: Thu, 2 Mar 2017 16:47:45 +0300
+Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race
+
+Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem).
+It's critical to not clear pmd intermittently while handling MADV_FREE to
+avoid race with MADV_DONTNEED:
+
+	CPU0:				CPU1:
+				madvise_free_huge_pmd()
+				 pmdp_huge_get_and_clear_full()
+madvise_dontneed()
+ zap_pmd_range()
+  pmd_trans_huge(*pmd) == 0 (without ptl)
+  // skip the pmd
+				 set_pmd_at();
+				 // pmd is re-established
+
+It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It
+violates MADV_DONTNEED interface and can result is userspace misbehaviour.
+
+Basically it's the same race as with numa balancing in change_huge_pmd(),
+but a bit simpler to mitigate: we don't need to preserve dirty/young flags
+here due to MADV_FREE functionality.
+
+Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
+Cc: Minchan Kim <minchan@kernel.org>
+---
+ mm/huge_memory.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+diff --git a/mm/huge_memory.c b/mm/huge_memory.c
+index 51a8c376d020..3c9ef1104d85 100644
+--- a/mm/huge_memory.c
++++ b/mm/huge_memory.c
+@@ -1568,8 +1568,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ 		deactivate_page(page);
+ 
+ 	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
+-		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
+-			tlb->fullmm);
++		pmdp_invalidate(vma, addr, pmd);
+ 		orig_pmd = pmd_mkold(orig_pmd);
+ 		orig_pmd = pmd_mkclean(orig_pmd);
+ 
+-- 
+ Kirill A. Shutemov
diff --git a/a/content_digest b/N1/content_digest
index 51d9097..468dc5f 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -82,6 +82,55 @@
  "\"Vise versa\" part should be fine. The case I'm worry about is that\n"
  "MADV_DONTNEED would skip the pmd and it will not be cleared.\n"
  "Userspace expects the area of memory to be clean after MADV_DONTNEED, but\n"
- it's not. It can lead to userspace misbehaviour.
+ "it's not. It can lead to userspace misbehaviour.\n"
+ "\n"
+ ">From a0967b0293a6f8053d85785c4d6340e550e849ea Mon Sep 17 00:00:00 2001\n"
+ "From: \"Kirill A. Shutemov\" <kirill.shutemov@linux.intel.com>\n"
+ "Date: Thu, 2 Mar 2017 16:47:45 +0300\n"
+ "Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race\n"
+ "\n"
+ "Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem).\n"
+ "It's critical to not clear pmd intermittently while handling MADV_FREE to\n"
+ "avoid race with MADV_DONTNEED:\n"
+ "\n"
+ "\tCPU0:\t\t\t\tCPU1:\n"
+ "\t\t\t\tmadvise_free_huge_pmd()\n"
+ "\t\t\t\t pmdp_huge_get_and_clear_full()\n"
+ "madvise_dontneed()\n"
+ " zap_pmd_range()\n"
+ "  pmd_trans_huge(*pmd) == 0 (without ptl)\n"
+ "  // skip the pmd\n"
+ "\t\t\t\t set_pmd_at();\n"
+ "\t\t\t\t // pmd is re-established\n"
+ "\n"
+ "It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It\n"
+ "violates MADV_DONTNEED interface and can result is userspace misbehaviour.\n"
+ "\n"
+ "Basically it's the same race as with numa balancing in change_huge_pmd(),\n"
+ "but a bit simpler to mitigate: we don't need to preserve dirty/young flags\n"
+ "here due to MADV_FREE functionality.\n"
+ "\n"
+ "Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>\n"
+ "Cc: Minchan Kim <minchan@kernel.org>\n"
+ "---\n"
+ " mm/huge_memory.c | 3 +--\n"
+ " 1 file changed, 1 insertion(+), 2 deletions(-)\n"
+ "\n"
+ "diff --git a/mm/huge_memory.c b/mm/huge_memory.c\n"
+ "index 51a8c376d020..3c9ef1104d85 100644\n"
+ "--- a/mm/huge_memory.c\n"
+ "+++ b/mm/huge_memory.c\n"
+ "@@ -1568,8 +1568,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,\n"
+ " \t\tdeactivate_page(page);\n"
+ " \n"
+ " \tif (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {\n"
+ "-\t\torig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,\n"
+ "-\t\t\ttlb->fullmm);\n"
+ "+\t\tpmdp_invalidate(vma, addr, pmd);\n"
+ " \t\torig_pmd = pmd_mkold(orig_pmd);\n"
+ " \t\torig_pmd = pmd_mkclean(orig_pmd);\n"
+ " \n"
+ "-- \n"
+  Kirill A. Shutemov
 
-3eccf09ececc726aad7ba2c0d67d6c632efcfe71c2dcb3c1c21e826acb58b197
+9dc296dce0284a2556a118bd86583ce6fb624a3ccc46c60045930212a2e65dbb

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.