Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
       [not found] <20140514092948.GA17391@server-36.huawei.corp>
@ 2014-05-13 22:44 ` Dave Hansen
  2014-05-15 17:00   ` Anthony Iliopoulos
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2014-05-13 22:44 UTC (permalink / raw)
  To: Anthony Iliopoulos, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt,
	Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion,
	Gerald Schaefer, David Gibson, linux-arch

On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> The invalidation is required in order to maintain proper semantics
> under CoW conditions. In scenarios where a process clones several
> threads, a thread operating on a core whose DTLB entry for a
> particular hugepage has not been invalidated, will be reading from
> the hugepage that belongs to the forked child process, even after
> hugetlb_cow().
> 
> The thread will not see the updated page as long as the stale DTLB
> entry remains cached, the thread attempts to write into the page,
> the child process exits, or the thread gets migrated to a different
> processor.

No to be too nitpicky, but this applies to ITLB too, right?

I believe this bug came all the way back from:

> commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri Jan 6 00:10:44 2006 -0800
> 
>     [PATCH] Hugetlb: Copy on Write support

It was probably the first time that we ever changed an _existing_
hugetlbfs pte, and that patch probably just missed the TLB flush because
none of the other pte-setting hugetlb.c code needed TLB flushes.

The bogus x86 version of huge_ptep_clear_flush() came from the s390
guys, so double-shame on IBM! :P

> commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Date:   Mon Apr 28 02:13:28 2008 -0700
> 
>     hugetlbfs: add missing TLB flush to hugetlb_cow()

This is probably an opportunity for all the other architecture
maintainers to make sure that they have proper copies of
huge_ptep_clear_flush().

I went through the hugetlb code on x86 and couldn't find another TLB
flush that fixes this issue, and I believe this is correct, so:

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-13 22:44 ` [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Dave Hansen
@ 2014-05-15 17:00   ` Anthony Iliopoulos
  2014-05-14 17:24     ` Dave Hansen
  2014-05-15  7:05     ` Oren Twaig
  0 siblings, 2 replies; 6+ messages in thread
From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch

On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote:
> On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> > The invalidation is required in order to maintain proper semantics
> > under CoW conditions. In scenarios where a process clones several
> > threads, a thread operating on a core whose DTLB entry for a
> > particular hugepage has not been invalidated, will be reading from
> > the hugepage that belongs to the forked child process, even after
> > hugetlb_cow().
> > 
> > The thread will not see the updated page as long as the stale DTLB
> > entry remains cached, the thread attempts to write into the page,
> > the child process exits, or the thread gets migrated to a different
> > processor.
> 
> No to be too nitpicky, but this applies to ITLB too, right?

Quite true, this does apply to ITLB too. I suppose there might be cases
like self-modifying code that touches the private text pages, or some
JIT compiler writing on mmap-ed executable regions that would observe
the same behavior under similar conditions.

> I believe this bug came all the way back from:
> 
> > commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Jan 6 00:10:44 2006 -0800
> > 
> >     [PATCH] Hugetlb: Copy on Write support
> 
> It was probably the first time that we ever changed an _existing_
> hugetlbfs pte, and that patch probably just missed the TLB flush because
> none of the other pte-setting hugetlb.c code needed TLB flushes.

This seems to be the case, I assume that our internal use case was
also probably the first that needed to rely on proper semantics under
a multithreaded CoW scenario with hugetlb pages, so this came into
the surface.

I have actually also wondered about another related thing:
even when we actually do invalidate the page, there may still be a
race between a thread on a core that reads the page in some tight
loop (e.g. on a spinlock), and the page fault handler running on
a different core, at the point where the pte is set. Since we
invalidate the page via the TLB shootdowns *before* we update
the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
as well as hugetlb_cow()), there may be some tiny window where the
thread might re-read the page before the pte is set.

I have dismissed this case, since I assume that there are many more
cycles spent in servicing the TLB invalidation IPI, walking the pgtable
plus other related overhead (e.g. sched) than in updating the pte/pmd
so I am not sure how possible it would be to hit this condition.

I am also hesitant to simply submit a patch that reverses the order
of flushing and setting the pte, due to the following commit:

    commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f
    Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
    Date:   Fri Sep 29 01:58:42 2006 -0700

    [PATCH] mm: fix a race condition under SMC + COW

> The bogus x86 version of huge_ptep_clear_flush() came from the s390
> guys, so double-shame on IBM! :P
> 
> > commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Date:   Mon Apr 28 02:13:28 2008 -0700
> > 
> >     hugetlbfs: add missing TLB flush to hugetlb_cow()
> 
> This is probably an opportunity for all the other architecture
> maintainers to make sure that they have proper copies of
> huge_ptep_clear_flush().
> 
> I went through the hugetlb code on x86 and couldn't find another TLB
> flush that fixes this issue, and I believe this is correct, so:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>

Many thanks for confirming, and for your prompt response.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15 17:00   ` Anthony Iliopoulos
@ 2014-05-14 17:24     ` Dave Hansen
  2014-05-15  7:05     ` Oren Twaig
  1 sibling, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2014-05-14 17:24 UTC (permalink / raw)
  To: Anthony Iliopoulos
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch

On 05/15/2014 10:00 AM, Anthony Iliopoulos wrote:
> I have actually also wondered about another related thing:
> even when we actually do invalidate the page, there may still be a
> race between a thread on a core that reads the page in some tight
> loop (e.g. on a spinlock), and the page fault handler running on
> a different core, at the point where the pte is set. Since we
> invalidate the page via the TLB shootdowns *before* we update
> the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
> as well as hugetlb_cow()), there may be some tiny window where the
> thread might re-read the page before the pte is set.

Don't forget about the "clear" part.  ptep_clear_flush() does:

        pte = ptep_get_and_clear(mm, address, ptep);
        if (pte_accessible(mm, pte))
                flush_tlb_page(vma, address);

so it makes the pte !present and guarantees that any other CPUs looking
at it after the flush but before the set_pte() will also end up in the
page fault handler, and they'll wait until the first fault has finished
with the page tables.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15 17:00   ` Anthony Iliopoulos
  2014-05-14 17:24     ` Dave Hansen
@ 2014-05-15  7:05     ` Oren Twaig
  2014-05-15  7:05       ` Oren Twaig
  2014-05-17  9:24       ` Anthony Iliopoulos
  1 sibling, 2 replies; 6+ messages in thread
From: Oren Twaig @ 2014-05-15  7:05 UTC (permalink / raw)
  To: Anthony Iliopoulos, Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch


On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> I have dismissed this case, since I assume that there are many more
> cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> plus other related overhead (e.g. sched) than in updating the pte/pmd
> so I am not sure how possible it would be to hit this condition.

Hi Anthony,

I have a question about the above statement. What will happen with multi
cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
can will visit the host and the other will continue to encounter your race ?

Thanks,
Oren.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15  7:05     ` Oren Twaig
@ 2014-05-15  7:05       ` Oren Twaig
  2014-05-17  9:24       ` Anthony Iliopoulos
  1 sibling, 0 replies; 6+ messages in thread
From: Oren Twaig @ 2014-05-15  7:05 UTC (permalink / raw)
  To: Anthony Iliopoulos, Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch


On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> I have dismissed this case, since I assume that there are many more
> cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> plus other related overhead (e.g. sched) than in updating the pte/pmd
> so I am not sure how possible it would be to hit this condition.

Hi Anthony,

I have a question about the above statement. What will happen with multi
cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
can will visit the host and the other will continue to encounter your race ?

Thanks,
Oren.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15  7:05     ` Oren Twaig
  2014-05-15  7:05       ` Oren Twaig
@ 2014-05-17  9:24       ` Anthony Iliopoulos
  1 sibling, 0 replies; 6+ messages in thread
From: Anthony Iliopoulos @ 2014-05-17  9:24 UTC (permalink / raw)
  To: Oren Twaig
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt,
	Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion,
	Gerald Schaefer, David Gibson, linux-arch

Hi Oren,

On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote:
> 
> On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> > I have dismissed this case, since I assume that there are many more
> > cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> > plus other related overhead (e.g. sched) than in updating the pte/pmd
> > so I am not sure how possible it would be to hit this condition.
> 
> Hi Anthony,
> 
> I have a question about the above statement. What will happen with multi
> cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
> can will visit the host and the other will continue to encounter your race ?

I don't think there will be any race for the vcpu cases, since the sptes
will be cleared via the mmu_notifier, so this should take care of it
in the same manner as ptep_get_and_clear() does, as described by Dave
earlier in this thread.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-05-17  9:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140514092948.GA17391@server-36.huawei.corp>
2014-05-13 22:44 ` [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Dave Hansen
2014-05-15 17:00   ` Anthony Iliopoulos
2014-05-14 17:24     ` Dave Hansen
2014-05-15  7:05     ` Oren Twaig
2014-05-15  7:05       ` Oren Twaig
2014-05-17  9:24       ` Anthony Iliopoulos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).