From: Catalin Marinas <catalin.marinas@arm.com>
To: FF <figure1802@126.com>
Cc: mark.rutland@arm.com, steve.capper@arm.com,
runninglinuxkernel@126.com, will.deacon@arm.com,
julien.grall@arm.com,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
Date: Tue, 29 Oct 2019 12:11:54 +0000
Message-ID: <20191029121153.GB11440@arrakis.emea.arm.com>
In-Reply-To: <1b0920d5.c4b.16e1501ef37.Coremail.figure1802@126.com>
Hi Ben,
On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> >> Here is the scenario:
> >> A more complex situation is possible when all CPUs support hardware
> >> AF/DBM:
> >>
> >> a) Initial state: shareable + writable vma and pte_none(pte)
> >> b) Read fault taken by two threads of the same process on different
> >> CPUs
> >> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
> >> eventually reaches do_set_pte() which sets a writable + clean pte.
> >> CPU0 releases the mmap_sem
> >> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
> >> pte entry it reads is present, writable and clean and it continues
> >> to pte_mkyoung()
> >> e) CPU1 calls ptep_set_access_flags()
> >>
> >> If between (d) and (e) the hardware (another CPU) updates the dirty
> >> state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit,
> >> marking the entry clean again.
[...]
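To make steps (d) and (e) concrete, here is a minimal user-space model of the lost update. It assumes a simplified pte whose bit positions mirror arm64 as I recall them (PTE_RDONLY is AP[2], PTE_DBM is bit 51); hw_write_access() and replay_race() are hypothetical helpers standing in for the hardware DBM update and the two CPUs, and a plain store plays the role of a non-atomic ptep_set_access_flags().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed arm64 bit positions, for illustration only. */
#define PTE_VALID  ((pteval_t)1 << 0)
#define PTE_RDONLY ((pteval_t)1 << 7)   /* AP[2]: hardware read-only */
#define PTE_AF     ((pteval_t)1 << 10)  /* access flag */
#define PTE_DBM    ((pteval_t)1 << 51)  /* dirty bit modifier */

/* Hypothetical model of hardware DBM on a write access: a valid pte with
 * DBM set gets PTE_RDONLY cleared (hardware-dirty) and AF set. */
static void hw_write_access(pteval_t *ptep)
{
	if ((*ptep & PTE_VALID) && (*ptep & PTE_DBM))
		*ptep = (*ptep & ~PTE_RDONLY) | PTE_AF;
}

/* Replay steps (d)-(e) with a hardware update in between. */
static pteval_t replay_race(void)
{
	pteval_t pte = PTE_VALID | PTE_DBM | PTE_RDONLY; /* (c) writable + clean */
	pteval_t entry = pte | PTE_AF;  /* (d) CPU1 reads the pte, pte_mkyoung() */

	hw_write_access(&pte);          /* another CPU writes: RDONLY cleared */
	pte = entry;                    /* (e) CPU1 stores its stale copy */
	return pte;
}
```

replay_race() returns a pte with PTE_RDONLY set again, i.e. the dirty state recorded by the hardware has been lost.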
> I want to elaborate on the scenario. I saw that the first patch fixing
> ptep_set_access_flags() for hardware AF/DBM went into Linux 4.7-rc1:
> commit 66dbd6e6, "arm64: Implement ptep_set_access_flags() for
> hardware AF/DBM".
What are you trying to solve? Making ptep_set_access_flags() atomic
doesn't make anything worse. Do you think we wouldn't need this patch?
> I think the issue you describe exists on Linux 4.6, so let's assume we
> are looking at the Linux 4.6 source code.
>
> 1. Initial phase: we want to create a shareable+writable file mapping
> with the mmap() API; the filesystem is ext4.
See more below but I think we may need shm instead of a file mapping to
trigger the race (which, BTW, is rather theoretical; I haven't seen it
in practice).
> In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
> In mmap_region()->vma_set_page_prot(), some shared mappings want their
> pages marked read-only to track write events, so VM_SHARED is cleared
> and the pte attributes are taken from protection_map[], i.e. __P011.
>
> In Linux 4.6, __P011 is PAGE_COPY:
> #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>
> For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero,
> so the vm_flags used for the lookup are: VM_READ | VM_WRITE
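The protection lookup described above can be sketched as follows, assuming the generic vm_flags values from include/linux/mm.h and the protection_map[] indexing used by vm_get_page_prot(). page_prot_index() is a hypothetical simplification of vma_set_page_prot() that takes the vma_wants_writenotify() result as a parameter instead of computing it from the vma.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic vm_flags bits (include/linux/mm.h). */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* protection_map[] is indexed by these four bits: 0..7 are __P000..__P111,
 * 8..15 are __S000..__S111 (bits read as X, W, R). */
static unsigned long protmap_index(unsigned long vm_flags)
{
	return vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED);
}

/* Hypothetical, simplified vma_set_page_prot(): when write notification
 * is wanted, VM_SHARED is dropped from a local copy of the flags only. */
static unsigned long page_prot_index(unsigned long vm_flags,
				     bool wants_writenotify)
{
	if (wants_writenotify)
		vm_flags &= ~VM_SHARED;
	return protmap_index(vm_flags);
}
```

With write notification the index is 3 (__P011, PAGE_COPY here); without it the index is 11 (__S011, PAGE_SHARED).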
While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
mapping in the page table is read-only.
Anyway, I think with vma_wants_writenotify() we can't trigger this race,
since the pte is purely read-only (writes require kernel notification).
What we need is to end up with a writable+clean entry, which means
VM_SHARED stays set, leading to PAGE_SHARED attributes, which have
PTE_WRITE/DBM set. Note that set_pte_at() in 4.6 would mark the page as
PTE_RDONLY since pte_sw_dirty() is false.
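A sketch of the PTE_RDONLY fix-up that 4.6 set_pte_at() applies (the code is quoted further down in this thread), assuming pte_present() reduces to PTE_VALID, assumed arm64 bit positions, and hypothetical MODEL_PAGE_COPY/MODEL_PAGE_SHARED values that carry only the bits relevant here:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed arm64 bit positions (v4.6), for illustration only. */
#define PTE_VALID  ((pteval_t)1 << 0)
#define PTE_RDONLY ((pteval_t)1 << 7)   /* AP[2] */
#define PTE_AF     ((pteval_t)1 << 10)
#define PTE_WRITE  ((pteval_t)1 << 51)  /* PTE_WRITE aliases PTE_DBM */
#define PTE_DIRTY  ((pteval_t)1 << 55)  /* software dirty bit */

/* Hypothetical stand-ins for the real pgprots (relevant bits only). */
#define MODEL_PAGE_COPY   (PTE_VALID | PTE_AF)             /* no WRITE */
#define MODEL_PAGE_SHARED (PTE_VALID | PTE_AF | PTE_WRITE) /* DBM, clean */

/* Writable in hardware only if both software-dirty and PTE_WRITE. */
static pteval_t model_set_pte_at(pteval_t pte)
{
	if (pte & PTE_VALID) {
		if ((pte & PTE_DIRTY) && (pte & PTE_WRITE))
			pte &= ~PTE_RDONLY;
		else
			pte |= PTE_RDONLY;
	}
	return pte;
}
```

Both a PAGE_COPY-like and a clean PAGE_SHARED-like pte come out with PTE_RDONLY set; only the latter keeps PTE_WRITE (DBM), which is what lets the hardware make it writable later without a fault.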
> 2. Thread 1 on CPU0 wants to write to this page, so a page fault is
> triggered. In handle_pte_fault()->do_fault()->do_shared_fault(), a new
> page-cache page is allocated, and do_set_pte() calls
> "maybe_mkwrite(pte_mkdirty(entry), vma)" to set the pte entry, so
> the pte attributes should be: PTE_DIRTY | PTE_WRITE.
Yes, but the scenario I had in mind was a read fault here rather than
a write, which would use the PAGE_SHARED attributes, ending up with
PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).
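The difference between the two fault types can be modelled like this, again with assumed bit positions. install_pte() is a hypothetical stand-in for do_set_pte() on a VM_SHARED|VM_WRITE vma, and model_pte_dirty() follows my understanding of the arm64 pte_dirty(): a pte is dirty if it is software-dirty or hardware-dirty (DBM set and PTE_RDONLY clear).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed arm64 bit positions (v4.6), for illustration only. */
#define PTE_VALID  ((pteval_t)1 << 0)
#define PTE_RDONLY ((pteval_t)1 << 7)
#define PTE_AF     ((pteval_t)1 << 10)
#define PTE_WRITE  ((pteval_t)1 << 51)  /* PTE_DBM */
#define PTE_DIRTY  ((pteval_t)1 << 55)  /* software dirty */

/* PTE_RDONLY fix-up as done by 4.6 set_pte_at() on a present pte. */
static pteval_t fixup_rdonly(pteval_t pte)
{
	if ((pte & PTE_DIRTY) && (pte & PTE_WRITE))
		return pte & ~PTE_RDONLY;
	return pte | PTE_RDONLY;
}

/* Hypothetical do_set_pte() with PAGE_SHARED-like attributes: a write
 * fault applies maybe_mkwrite(pte_mkdirty(entry)), a read fault doesn't. */
static pteval_t install_pte(bool write_fault)
{
	pteval_t entry = PTE_VALID | PTE_AF | PTE_WRITE;

	if (write_fault)
		entry |= PTE_DIRTY;  /* pte_mkdirty(); PTE_WRITE already set */
	return fixup_rdonly(entry);
}

/* Software-dirty, or hardware-dirty (DBM set and PTE_RDONLY clear). */
static bool model_pte_dirty(pteval_t pte)
{
	bool hw_dirty = (pte & PTE_WRITE) && !(pte & PTE_RDONLY);

	return (pte & PTE_DIRTY) || hw_dirty;
}
```

The read-fault pte is clean, read-only and has DBM set, so a later hardware write only needs to clear PTE_RDONLY to make it dirty; that is the bit a non-atomic update can lose.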
> 3. Thread 2 on CPU1 also wants to read this page, but the pte had not
> yet been created by Thread 1, so a page fault happens. In
> pte_offset_map(), it finds that the pte has now been created by
> Thread 1, so it directly calls:
>
> entry = pte_mkyoung(entry);
> ptep_set_access_flags()
>
> In ptep_set_access_flags(), set_pte_at() is called to set the pte,
> and in set_pte_at():
>
> if (pte_present(pte)) {
>         if (pte_sw_dirty(pte) && pte_write(pte))
>                 pte_val(pte) &= ~PTE_RDONLY;
>         else
>                 pte_val(pte) |= PTE_RDONLY;
>         if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
>                 __sync_icache_dcache(pte, addr);
> }
>
> it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE are
> set in our scenario. For it to be set instead, someone would have to
> clear the PTE_DIRTY bit, and who would clear it?
Correct for your scenario but not if point 2 is a read.
> So I am very confused by the scenario in the commit log of "arm64:
> Implement ptep_set_access_flags() for hardware AF/DBM". Could you
> point out what I am missing?
If point 2 is a read fault, it goes via do_read_fault() and the pte
ends up clean with PTE_WRITE|PTE_RDONLY set, since it is not
pte_sw_dirty() (checked by set_pte_at()).
Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
read-only pte with DBM set because it took a read fault (same as Thread
1).
The problem appears if a Thread 3 on CPU2 performs a write access in
parallel with point 3 above. CPU2 sees the pte as valid with RDONLY and
DBM set, and proceeds to clear the RDONLY bit in hardware. CPU1 then
overwrites the PTE_RDONLY bit, setting it again, if
ptep_set_access_flags() is not atomic.
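One way to close this window is a compare-and-swap retry loop. The sketch below uses C11 atomics in user space and is not the kernel's implementation; set_access_flags_atomic() is a hypothetical helper that merges AF/WRITE/DIRTY from the new entry and keeps PTE_RDONLY set only if both the current pte and the entry have it, so a concurrent hardware clear of PTE_RDONLY is never undone. Bit positions are assumptions, as above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed arm64 bit positions, for illustration only. */
#define PTE_VALID  ((pteval_t)1 << 0)
#define PTE_RDONLY ((pteval_t)1 << 7)
#define PTE_AF     ((pteval_t)1 << 10)
#define PTE_WRITE  ((pteval_t)1 << 51)  /* PTE_DBM */
#define PTE_DIRTY  ((pteval_t)1 << 55)

static void set_access_flags_atomic(_Atomic pteval_t *ptep, pteval_t entry)
{
	pteval_t old = atomic_load(ptep);
	pteval_t next;

	/* Only the access flag, write permission and dirty state matter. */
	entry &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
	do {
		/* OR in AF/WRITE/DIRTY; PTE_RDONLY survives only if set in
		 * both the current pte and the new entry (most permissive). */
		next = (old & ~PTE_RDONLY) | (entry & ~PTE_RDONLY) |
		       (old & entry & PTE_RDONLY);
		/* On failure, 'old' is refreshed with the current pte. */
	} while (!atomic_compare_exchange_weak(ptep, &old, next));
}
```

If the hardware clears PTE_RDONLY between the load and the compare-and-swap, the exchange fails, old is refreshed with the current value and next is recomputed from it, so the hardware-dirty state is preserved.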
Now you need to find a vm_operations_struct that allows shared, writable
and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel