All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Jann Horn <jannh@google.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	kernel list <linux-kernel@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>
Subject: Re: ptep_get_lockless() on 32-bit x86/mips/sh looks wrong
Date: Thu, 6 Oct 2022 12:44:23 -0300	[thread overview]
Message-ID: <Yz7316NubHtw2xCa@nvidia.com> (raw)
In-Reply-To: <CAG48ez3h-mnp9ZFC10v+-BW_8NQvxbwBsMYJFP8JX31o0B17Pg@mail.gmail.com>

On Thu, Oct 06, 2022 at 05:23:59PM +0200, Jann Horn wrote:
> ptep_get_lockless() does the following under CONFIG_GUP_GET_PTE_LOW_HIGH:
> 
> pte_t pte;
> do {
>   pte.pte_low = ptep->pte_low;
>   smp_rmb();
>   pte.pte_high = ptep->pte_high;
>   smp_rmb();
> } while (unlikely(pte.pte_low != ptep->pte_low));
> 
> It has a comment above it that argues that this is correct because:
> 1. A present PTE can't become non-present and then become a present
> PTE pointing to another page without a TLB flush in between.
> 2. TLB flushes involve IPIs.
> 
> As far as I can tell, in particular on x86, _both_ of those
> assumptions are false; perhaps on mips and sh only one of them is?
> 
> Number 2 is straightforward: X86 can run under hypervisors, and when
> it runs under hypervisors, the MMU paravirtualization code (including
> the KVM version) can implement remote TLB flushes without IPIs.
> 
> Number 1 is gnarlier, because breaking that assumption implies that
> there can be a situation where different threads see different memory
> at the same virtual address because their TLBs are incoherent. But as
> far as I know, it can happen when MADV_DONTNEED races with an
> anonymous page fault, because zap_pte_range() does not always flush
> stale TLB entries before dropping the page table lock. I think that's
> probably fine, since it's a "garbage in, garbage out" kind of
> situation - but if a concurrent GUP-fast can then theoretically end up
> returning a completely unrelated page, that's bad.
> 
> 
> Sadly, mips and sh don't define arch_cmpxchg_double(), so we can't
> just change ptep_get_lockless() to use arch_cmpxchg_double() and be
> done with it...

I think the argument here has nothing to do with IPIs, but is more a
statement on memory ordering. What we want to get is a non-torn load
of low/high, under some restricted rules.

PTE writes should be ordered so that the present/not present bit is
properly:

Zapping a PTE:

write_low (not present)
wmb()
write_high (a)
wmb()

Reestablish a PTE:

write_high (b)
wmb()
write_low (present)
wmb()

This ordering is necessary to make the TLB's atomic 64 bit load work
properly, otherwise the TLB could read a present entry with a bogus
other half!

For ptep_get_lockless() we define non-torn as meaning the same as for the TLB:

     pre-zap low / high (present)
  restablish low / high (b) (present)
         any low / any high (not present)

Other combinations are forbidden.

The read side has a corresponding list of reads:

read_low
rmb()
read_high
rmb()
read_low

So, it seems plausible this could be OK based only on atomics (I did
not check that the present bit is properly placed in the right
low/high). Do you see a way the atomics don't work out?

Jason


  reply	other threads:[~2022-10-06 15:44 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-06 15:23 ptep_get_lockless() on 32-bit x86/mips/sh looks wrong Jann Horn
2022-10-06 15:44 ` Jason Gunthorpe [this message]
2022-10-06 15:55   ` Jann Horn
2022-10-06 15:58     ` Jann Horn
2022-10-06 16:15     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yz7316NubHtw2xCa@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=david@redhat.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.