* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() [not found] <20140514092948.GA17391@server-36.huawei.corp> @ 2014-05-13 22:44 ` Dave Hansen 2014-05-15 17:00 ` Anthony Iliopoulos 0 siblings, 1 reply; 6+ messages in thread From: Dave Hansen @ 2014-05-13 22:44 UTC (permalink / raw) To: Anthony Iliopoulos, Thomas Gleixner, Ingo Molnar, H. Peter Anvin Cc: x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > The invalidation is required in order to maintain proper semantics > under CoW conditions. In scenarios where a process clones several > threads, a thread operating on a core whose DTLB entry for a > particular hugepage has not been invalidated, will be reading from > the hugepage that belongs to the forked child process, even after > hugetlb_cow(). > > The thread will not see the updated page as long as the stale DTLB > entry remains cached, the thread attempts to write into the page, > the child process exits, or the thread gets migrated to a different > processor. No to be too nitpicky, but this applies to ITLB too, right? I believe this bug came all the way back from: > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > Author: David Gibson <david@gibson.dropbear.id.au> > Date: Fri Jan 6 00:10:44 2006 -0800 > > [PATCH] Hugetlb: Copy on Write support It was probably the first time that we ever changed an _existing_ hugetlbfs pte, and that patch probably just missed the TLB flush because none of the other pte-setting hugetlb.c code needed TLB flushes. The bogus x86 version of huge_ptep_clear_flush() came from the s390 guys, so double-shame on IBM! :P > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > Date: Mon Apr 28 02:13:28 2008 -0700 > > hugetlbfs: add missing TLB flush to hugetlb_cow() This is probably an opportunity for all the other architecture maintainers to make sure that they have proper copies of huge_ptep_clear_flush(). I went through the hugetlb code on x86 and couldn't find another TLB flush that fixes this issue, and I believe this is correct, so: Acked-by: Dave Hansen <dave.hansen@intel.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-13 22:44 ` [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Dave Hansen @ 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-14 17:24 ` Dave Hansen 2014-05-15 7:05 ` Oren Twaig 0 siblings, 2 replies; 6+ messages in thread From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw) To: Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson <david@gibson.dropbear.id.au> > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen <dave.hansen@intel.com> Many thanks for confirming, and for your prompt response. Regards, Anthony ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 17:00 ` Anthony Iliopoulos @ 2014-05-14 17:24 ` Dave Hansen 2014-05-15 7:05 ` Oren Twaig 1 sibling, 0 replies; 6+ messages in thread From: Dave Hansen @ 2014-05-14 17:24 UTC (permalink / raw) To: Anthony Iliopoulos Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/15/2014 10:00 AM, Anthony Iliopoulos wrote: > I have actually also wondered about another related thing: > even when we actually do invalidate the page, there may still be a > race between a thread on a core that reads the page in some tight > loop (e.g. on a spinlock), and the page fault handler running on > a different core, at the point where the pte is set. Since we > invalidate the page via the TLB shootdowns *before* we update > the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() > as well as hugetlb_cow()), there may be some tiny window where the > thread might re-read the page before the pte is set. Don't forget about the "clear" part. ptep_clear_flush() does: pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); so it makes the pte !present and guarantees that any other CPUs looking at it after the flush but before the set_pte() will also end up in the page fault handler, and they'll wait until the first fault has finished with the page tables. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-14 17:24 ` Dave Hansen @ 2014-05-15 7:05 ` Oren Twaig 2014-05-15 7:05 ` Oren Twaig 2014-05-17 9:24 ` Anthony Iliopoulos 1 sibling, 2 replies; 6+ messages in thread From: Oren Twaig @ 2014-05-15 7:05 UTC (permalink / raw) To: Anthony Iliopoulos, Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > I have dismissed this case, since I assume that there are many more > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > plus other related overhead (e.g. sched) than in updating the pte/pmd > so I am not sure how possible it would be to hit this condition. Hi Anthony, I have a question about the above statement. What will happen with multi cpu VMs ? won't the race described above can happen ? I.e one virtual CPU can will visit the host and the other will continue to encounter your race ? Thanks, Oren. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 7:05 ` Oren Twaig @ 2014-05-15 7:05 ` Oren Twaig 2014-05-17 9:24 ` Anthony Iliopoulos 1 sibling, 0 replies; 6+ messages in thread From: Oren Twaig @ 2014-05-15 7:05 UTC (permalink / raw) To: Anthony Iliopoulos, Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > I have dismissed this case, since I assume that there are many more > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > plus other related overhead (e.g. sched) than in updating the pte/pmd > so I am not sure how possible it would be to hit this condition. Hi Anthony, I have a question about the above statement. What will happen with multi cpu VMs ? won't the race described above can happen ? I.e one virtual CPU can will visit the host and the other will continue to encounter your race ? Thanks, Oren. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 7:05 ` Oren Twaig 2014-05-15 7:05 ` Oren Twaig @ 2014-05-17 9:24 ` Anthony Iliopoulos 1 sibling, 0 replies; 6+ messages in thread From: Anthony Iliopoulos @ 2014-05-17 9:24 UTC (permalink / raw) To: Oren Twaig Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch Hi Oren, On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote: > > On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > > I have dismissed this case, since I assume that there are many more > > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > > plus other related overhead (e.g. sched) than in updating the pte/pmd > > so I am not sure how possible it would be to hit this condition. > > Hi Anthony, > > I have a question about the above statement. What will happen with multi > cpu VMs ? won't the race described above can happen ? I.e one virtual CPU > can will visit the host and the other will continue to encounter your race ? I don't think there will be any race for the vcpu cases, since the sptes will be cleared via the mmu_notifier, so this should take care of it in the same manner as ptep_get_and_clear() does, as described by Dave earlier in this thread. Regards, Anthony ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-05-17 9:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20140514092948.GA17391@server-36.huawei.corp>
2014-05-13 22:44 ` [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Dave Hansen
2014-05-15 17:00 ` Anthony Iliopoulos
2014-05-14 17:24 ` Dave Hansen
2014-05-15 7:05 ` Oren Twaig
2014-05-15 7:05 ` Oren Twaig
2014-05-17 9:24 ` Anthony Iliopoulos
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).