* mm,tlb: revert 4647706ebeee? @ 2018-07-06 17:03 Rik van Riel 2018-07-06 17:10 ` [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" Rik van Riel 2018-07-07 15:25 ` mm,tlb: revert 4647706ebeee? Nicholas Piggin 0 siblings, 2 replies; 6+ messages in thread From: Rik van Riel @ 2018-07-06 17:03 UTC (permalink / raw) To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, Andrew Morton, Michal Hocko, kirill.shutemov, Minchan Kim, Mel Gorman, kernel-team [-- Attachment #1: Type: text/plain, Size: 1021 bytes --] Hello, It looks like last summer, there were 2 sets of patches in flight to fix the issue of simultaneous mprotect/madvise calls unmapping PTEs, and some pages not being flushed from the TLB before returning to userspace. Minchan posted these patches: 56236a59556c ("mm: refactor TLB gathering API") 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") Around the same time, Mel posted: 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") They both appear to solve the same bug. Only one of the two solutions is needed. However, 4647706ebeee appears to introduce extra TLB flushes - one per VMA, instead of one over the entire range unmapped, and also extra flushes when there are no simultaneous unmappers of the same mm. For that reason, it seems like we should revert 4647706ebeee and keep only Minchan's solution in the kernel. Am I overlooking any reason why we should not revert 4647706ebeee? kind regards, Rik -- All Rights Reversed. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" 2018-07-06 17:03 mm,tlb: revert 4647706ebeee? Rik van Riel @ 2018-07-06 17:10 ` Rik van Riel 2018-07-16 13:12 ` Mel Gorman 2018-07-07 15:25 ` mm,tlb: revert 4647706ebeee? Nicholas Piggin 1 sibling, 1 reply; 6+ messages in thread From: Rik van Riel @ 2018-07-06 17:10 UTC (permalink / raw) To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, Andrew Morton, Michal Hocko, kirill.shutemov, Minchan Kim, Mel Gorman, kernel-team There was a bug in Linux that could cause madvise (and mprotect?) system calls to return to userspace without the TLB having been flushed for all the pages involved. This could happen when multiple threads of a process made simultaneous madvise and/or mprotect calls. This was noticed in the summer of 2017, at which time two solutions were created: 56236a59556c ("mm: refactor TLB gathering API") 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") and 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") We need only one of these solutions, and the former appears to be a little more efficient than the latter, so revert that one. This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3. --- mm/memory.c | 14 +------------- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 7206a634270b..9d472e00fc2d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start, tlb_gather_mmu(&tlb, mm, start, end); update_hiwater_rss(mm); mmu_notifier_invalidate_range_start(mm, start, end); - for ( ; vma && vma->vm_start < end; vma = vma->vm_next) { + for ( ; vma && vma->vm_start < end; vma = vma->vm_next) unmap_single_vma(&tlb, vma, start, end, NULL); - - /* - * zap_page_range does not specify whether mmap_sem should be - * held for read or write. That allows parallel zap_page_range - * operations to unmap a PTE and defer a flush meaning that - * this call observes pte_none and fails to flush the TLB. - * Rather than adding a complex API, ensure that no stale - * TLB entries exist when this call returns. - */ - flush_tlb_range(vma, start, end); - } - mmu_notifier_invalidate_range_end(mm, start, end); tlb_finish_mmu(&tlb, start, end); } -- 2.14.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" 2018-07-06 17:10 ` [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" Rik van Riel @ 2018-07-16 13:12 ` Mel Gorman 0 siblings, 0 replies; 6+ messages in thread From: Mel Gorman @ 2018-07-16 13:12 UTC (permalink / raw) To: Rik van Riel Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Michal Hocko, kirill.shutemov, Minchan Kim, kernel-team On Fri, Jul 06, 2018 at 01:10:19PM -0400, Rik van Riel wrote: > There was a bug in Linux that could cause madvise (and mprotect?) > system calls to return to userspace without the TLB having been > flushed for all the pages involved. > > This could happen when multiple threads of a process made simultaneous > madvise and/or mprotect calls. > > This was noticed in the summer of 2017, at which time two solutions > were created: > 56236a59556c ("mm: refactor TLB gathering API") > 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") > and > 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") > > We need only one of these solutions, and the former appears to be > a little more efficient than the latter, so revert that one. > > This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3. > --- > mm/memory.c | 14 +------------- > 1 file changed, 1 insertion(+), 13 deletions(-) Acked-by: Mel Gorman <mgorman@techsingularity.net> -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,tlb: revert 4647706ebeee? 2018-07-06 17:03 mm,tlb: revert 4647706ebeee? Rik van Riel 2018-07-06 17:10 ` [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" Rik van Riel @ 2018-07-07 15:25 ` Nicholas Piggin 2018-07-10 0:13 ` Andrew Morton 1 sibling, 1 reply; 6+ messages in thread From: Nicholas Piggin @ 2018-07-07 15:25 UTC (permalink / raw) To: Rik van Riel Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Michal Hocko, kirill.shutemov, Minchan Kim, Mel Gorman, kernel-team On Fri, 06 Jul 2018 13:03:55 -0400 Rik van Riel <riel@surriel.com> wrote: > Hello, > > It looks like last summer, there were 2 sets of patches > in flight to fix the issue of simultaneous mprotect/madvise > calls unmapping PTEs, and some pages not being flushed from > the TLB before returning to userspace. > > Minchan posted these patches: > 56236a59556c ("mm: refactor TLB gathering API") > 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") > > Around the same time, Mel posted: > 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") > > They both appear to solve the same bug. > > Only one of the two solutions is needed. > > However, 4647706ebeee appears to introduce extra TLB > flushes - one per VMA, instead of one over the entire > range unmapped, and also extra flushes when there are > no simultaneous unmappers of the same mm. > > For that reason, it seems like we should revert > 4647706ebeee and keep only Minchan's solution in > the kernel. > > Am I overlooking any reason why we should not revert > 4647706ebeee? Yes I think so. Discussed here recently: https://marc.info/?l=linux-mm&m=152878780528037&w=2 Actually we realized that powerpc does not implement the mmu gather flushing quite right so it needs a fix before this revert. But I propose the revert for next merge window. Thanks, Nick ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,tlb: revert 4647706ebeee? 2018-07-07 15:25 ` mm,tlb: revert 4647706ebeee? Nicholas Piggin @ 2018-07-10 0:13 ` Andrew Morton 2018-07-10 5:04 ` Nicholas Piggin 0 siblings, 1 reply; 6+ messages in thread From: Andrew Morton @ 2018-07-10 0:13 UTC (permalink / raw) To: Nicholas Piggin Cc: Rik van Riel, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, kirill.shutemov, Minchan Kim, Mel Gorman, kernel-team, Aneesh Kumar K.V, Nadav Amit On Sun, 8 Jul 2018 01:25:38 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > On Fri, 06 Jul 2018 13:03:55 -0400 > Rik van Riel <riel@surriel.com> wrote: > > > Hello, > > > > It looks like last summer, there were 2 sets of patches > > in flight to fix the issue of simultaneous mprotect/madvise > > calls unmapping PTEs, and some pages not being flushed from > > the TLB before returning to userspace. > > > > Minchan posted these patches: > > 56236a59556c ("mm: refactor TLB gathering API") > > 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") > > > > Around the same time, Mel posted: > > 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") > > > > They both appear to solve the same bug. > > > > Only one of the two solutions is needed. > > > > However, 4647706ebeee appears to introduce extra TLB > > flushes - one per VMA, instead of one over the entire > > range unmapped, and also extra flushes when there are > > no simultaneous unmappers of the same mm. > > > > For that reason, it seems like we should revert > > 4647706ebeee and keep only Minchan's solution in > > the kernel. > > > > Am I overlooking any reason why we should not revert > > 4647706ebeee? > > Yes I think so. Discussed here recently: > > https://marc.info/?l=linux-mm&m=152878780528037&w=2 Unclear if that was an ack ;) > Actually we realized that powerpc does not implement the mmu > gather flushing quite right so it needs a fix before this > revert. But I propose the revert for next merge window. Yes, I have Rik's patch for 4.19-rc1. I added yourself, Aneesh and Nadav to cc so you'll see it fly past. If poss, please do get this all tested before the time comes and let me know? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,tlb: revert 4647706ebeee? 2018-07-10 0:13 ` Andrew Morton @ 2018-07-10 5:04 ` Nicholas Piggin 0 siblings, 0 replies; 6+ messages in thread From: Nicholas Piggin @ 2018-07-10 5:04 UTC (permalink / raw) To: Andrew Morton Cc: Rik van Riel, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, kirill.shutemov, Minchan Kim, Mel Gorman, kernel-team, Aneesh Kumar K.V, Nadav Amit, linux-arch On Mon, 9 Jul 2018 17:13:56 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > On Sun, 8 Jul 2018 01:25:38 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > > > On Fri, 06 Jul 2018 13:03:55 -0400 > > Rik van Riel <riel@surriel.com> wrote: > > > > > Hello, > > > > > > It looks like last summer, there were 2 sets of patches > > > in flight to fix the issue of simultaneous mprotect/madvise > > > calls unmapping PTEs, and some pages not being flushed from > > > the TLB before returning to userspace. > > > > > > Minchan posted these patches: > > > 56236a59556c ("mm: refactor TLB gathering API") > > > 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") > > > > > > Around the same time, Mel posted: > > > 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range") > > > > > > They both appear to solve the same bug. > > > > > > Only one of the two solutions is needed. > > > > > > However, 4647706ebeee appears to introduce extra TLB > > > flushes - one per VMA, instead of one over the entire > > > range unmapped, and also extra flushes when there are > > > no simultaneous unmappers of the same mm. > > > > > > For that reason, it seems like we should revert > > > 4647706ebeee and keep only Minchan's solution in > > > the kernel. > > > > > > Am I overlooking any reason why we should not revert > > > 4647706ebeee? > > > > Yes I think so. Discussed here recently: > > > > https://marc.info/?l=linux-mm&m=152878780528037&w=2 > > Unclear if that was an ack ;) > Sure, I'm thinking Rik's mail is a ack for my patch :) No actually I think it's okay, but was in the middle of testing my series when Aneesh pointed out a bit was missing from powerpc, so I had to go off and fix that, I think that's upstream now. So need to go back and re-test this revert. Wouldn't hurt for other arch maintainers to have a look I guess (cc linux-arch): The problem powerpc had is that mmu_gather flushing will flush a single page size based on the ptes it encounters when we zap. If we hit a different page size, it flushes and switches to the new size. If we have concurrent zaps on the same range, the other thread may have cleared a large page pte so we won't see that and will only do a small page flush for that range. Which means we can return before the other thread invalidated our TLB for the large pages in the range we wanted to flush. I suspect most arches are probably okay, but if you make any TLB flush choices based on the pte contents, then you could be exposed. Except in the case of archs like sparc and powerpc/hash which do the flushing in arch_leave_lazy_mmu_mode(), because that is called under the same page table lock, so there can't be concurrent zap. A quick look through the archs doesn't show anything obvious, but please take a look at your arch. And I'll try to do a bit more testing. Thanks, Nick ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-07-16 13:12 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-06 17:03 mm,tlb: revert 4647706ebeee? Rik van Riel 2018-07-06 17:10 ` [PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range" Rik van Riel 2018-07-16 13:12 ` Mel Gorman 2018-07-07 15:25 ` mm,tlb: revert 4647706ebeee? Nicholas Piggin 2018-07-10 0:13 ` Andrew Morton 2018-07-10 5:04 ` Nicholas Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).