diff for duplicates of <20170307140453.GB2412@node> diff --git a/a/1.txt b/N1/1.txt index f327aca..17a3544 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -65,3 +65,52 @@ Okay, updated patch is below. MADV_DONTNEED would skip the pmd and it will not be cleared. Userspace expects the area of memory to be clean after MADV_DONTNEED, but it's not. It can lead to userspace misbehaviour. + +>From a0967b0293a6f8053d85785c4d6340e550e849ea Mon Sep 17 00:00:00 2001 +From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> +Date: Thu, 2 Mar 2017 16:47:45 +0300 +Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race + +Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem). +It's critical to not clear pmd intermittently while handling MADV_FREE to +avoid race with MADV_DONTNEED: + + CPU0: CPU1: + madvise_free_huge_pmd() + pmdp_huge_get_and_clear_full() +madvise_dontneed() + zap_pmd_range() + pmd_trans_huge(*pmd) == 0 (without ptl) + // skip the pmd + set_pmd_at(); + // pmd is re-established + +It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It +violates MADV_DONTNEED interface and can result is userspace misbehaviour. + +Basically it's the same race as with numa balancing in change_huge_pmd(), +but a bit simpler to mitigate: we don't need to preserve dirty/young flags +here due to MADV_FREE functionality. + +Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> +Cc: Minchan Kim <minchan@kernel.org> +--- + mm/huge_memory.c | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + +diff --git a/mm/huge_memory.c b/mm/huge_memory.c +index 51a8c376d020..3c9ef1104d85 100644 +--- a/mm/huge_memory.c ++++ b/mm/huge_memory.c +@@ -1568,8 +1568,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + deactivate_page(page); + + if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) { +- orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd, +- tlb->fullmm); ++ pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + orig_pmd = pmd_mkclean(orig_pmd); + +-- + Kirill A. Shutemov diff --git a/a/content_digest b/N1/content_digest index 51d9097..468dc5f 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -82,6 +82,55 @@ "\"Vise versa\" part should be fine. The case I'm worry about is that\n" "MADV_DONTNEED would skip the pmd and it will not be cleared.\n" "Userspace expects the area of memory to be clean after MADV_DONTNEED, but\n" - it's not. It can lead to userspace misbehaviour. + "it's not. It can lead to userspace misbehaviour.\n" + "\n" + ">From a0967b0293a6f8053d85785c4d6340e550e849ea Mon Sep 17 00:00:00 2001\n" + "From: \"Kirill A. Shutemov\" <kirill.shutemov@linux.intel.com>\n" + "Date: Thu, 2 Mar 2017 16:47:45 +0300\n" + "Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race\n" + "\n" + "Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem).\n" + "It's critical to not clear pmd intermittently while handling MADV_FREE to\n" + "avoid race with MADV_DONTNEED:\n" + "\n" + "\tCPU0:\t\t\t\tCPU1:\n" + "\t\t\t\tmadvise_free_huge_pmd()\n" + "\t\t\t\t pmdp_huge_get_and_clear_full()\n" + "madvise_dontneed()\n" + " zap_pmd_range()\n" + " pmd_trans_huge(*pmd) == 0 (without ptl)\n" + " // skip the pmd\n" + "\t\t\t\t set_pmd_at();\n" + "\t\t\t\t // pmd is re-established\n" + "\n" + "It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It\n" + "violates MADV_DONTNEED interface and can result is userspace misbehaviour.\n" + "\n" + "Basically it's the same race as with numa balancing in change_huge_pmd(),\n" + "but a bit simpler to mitigate: we don't need to preserve dirty/young flags\n" + "here due to MADV_FREE functionality.\n" + "\n" + "Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>\n" + "Cc: Minchan Kim <minchan@kernel.org>\n" + "---\n" + " mm/huge_memory.c | 3 +--\n" + " 1 file changed, 1 insertion(+), 2 deletions(-)\n" + "\n" + "diff --git a/mm/huge_memory.c b/mm/huge_memory.c\n" + "index 51a8c376d020..3c9ef1104d85 100644\n" + "--- a/mm/huge_memory.c\n" + "+++ b/mm/huge_memory.c\n" + "@@ -1568,8 +1568,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,\n" + " \t\tdeactivate_page(page);\n" + " \n" + " \tif (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {\n" + "-\t\torig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,\n" + "-\t\t\ttlb->fullmm);\n" + "+\t\tpmdp_invalidate(vma, addr, pmd);\n" + " \t\torig_pmd = pmd_mkold(orig_pmd);\n" + " \t\torig_pmd = pmd_mkclean(orig_pmd);\n" + " \n" + "-- \n" + Kirill A. Shutemov -3eccf09ececc726aad7ba2c0d67d6c632efcfe71c2dcb3c1c21e826acb58b197 +9dc296dce0284a2556a118bd86583ce6fb624a3ccc46c60045930212a2e65dbb
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.