stable.vger.kernel.org archive mirror
* [PATCH 3/4] MIPS: Handle highmem pages in __update_cache
       [not found] <1456799879-14711-1-git-send-email-paul.burton@imgtec.com>
@ 2016-03-01  2:37 ` Paul Burton
  2016-03-29  8:39   ` Ralf Baechle
       [not found] ` <1456799879-14711-5-git-send-email-paul.burton@imgtec.com>
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips
  Cc: Paul Burton, Lars Persson, stable # v4 . 1+, linux-kernel,
	Andrew Morton, Jerome Marchand, Kirill A. Shutemov, Ralf Baechle

The following patch will expose __update_cache to highmem pages. Handle
them by mapping them in for the duration of the cache maintenance, just
like in __flush_dcache_page. The code for that isn't shared because we
need the page address in __update_cache, so sharing became messy. Given
that the entirety is an extra 5 lines, just duplicate it.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: stable <stable@vger.kernel.org> # v4.1+
---

 arch/mips/mm/cache.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 5a67d8c..8befa55 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -149,9 +149,17 @@ void __update_cache(struct vm_area_struct *vma, unsigned long address,
 		return;
 	page = pfn_to_page(pfn);
 	if (page_mapping(page) && Page_dcache_dirty(page)) {
-		addr = (unsigned long) page_address(page);
+		if (PageHighMem(page))
+			addr = (unsigned long)kmap_atomic(page);
+		else
+			addr = (unsigned long)page_address(page);
+
 		if (exec || pages_do_alias(addr, address & PAGE_MASK))
 			flush_data_cache_page(addr);
+
+		if (PageHighMem(page))
+			__kunmap_atomic((void *)addr);
+
 		ClearPageDcacheDirty(page);
 	}
 }
-- 
2.7.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
       [not found]     ` <20160301171940.GA26791@NP-P-BURTON>
@ 2016-03-02 14:12       ` Ralf Baechle
  0 siblings, 0 replies; 7+ messages in thread
From: Ralf Baechle @ 2016-03-02 14:12 UTC (permalink / raw)
  To: Paul Burton
  Cc: David Daney, linux-mips, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Tue, Mar 01, 2016 at 05:19:40PM +0000, Paul Burton wrote:

> > >+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > >+			      pte_t *ptep, pte_t pteval)
> > >+{
> > >+	extern void __update_cache(unsigned long address, pte_t pte);
> > >+
> > >+	if (!pte_present(pteval))
> > >+		goto cache_sync_done;
> > >+
> > >+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > >+		goto cache_sync_done;
> > >+
> > >+	__update_cache(addr, pteval);
> > >+cache_sync_done:
> > >+	set_pte(ptep, pteval);
> > >+}
> > >+
> > 
> > This seems crazy.
> 
> Perhaps, but also correct...
> 
> > I don't think any other architecture does this type of work in set_pte_at().
> 
> Yes they do. As mentioned in the commit message, see arm, ia64 or
> powerpc for architectures that do the same sort of thing in set_pte_at.
> 
> > Can you look into finding a better way?
> 
> Not that I can see.

FYI, this is the original commit message adding set_pte_at().  The commit
predates Linus' git history but is in the various history trees, for example
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=ae3d0a847f4b38812241e4a5dc3371965c752a8c

Or for your convenience:

commit ae3d0a847f4b38812241e4a5dc3371965c752a8c
Author: David S. Miller <davem@nuts.davemloft.net>
Date:   Tue Feb 22 23:42:56 2005 -0800

    [MM]: Add set_pte_at() which takes 'mm' and 'addr' args.
    
    I'm taking a slightly different approach this time around so things
    are easier to integrate.  Here is the first patch which builds the
    infrastructure.  Basically:
    
    1) Add set_pte_at() which is set_pte() with 'mm' and 'addr' arguments
       added.  All generic code uses set_pte_at().
    
       Most platforms simply get this define:
        #define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
    
       I chose this method over simply changing all set_pte() call sites
       because many platforms implement this in assembler and it would
       take forever to preserve the build and stabilize things if modifying
       that was necessary.
    
       Soon, with platform maintainer's help, we can kill of set_pte() entirely.
       To be honest, there are only a handful of set_pte() call sites in the
       arch specific code.
    
       Actually, in this patch ppc64 is completely set_pte() free and does not
       define it.
    
    2) pte_clear() gets 'mm' and 'addr' arguments now.
       This had a cascading effect on many ptep_test_and_*() routines.  Specifically:
       a) ptep_test_and_clear_{young,dirty}() now take 'vma' and 'address' args.
       b) ptep_get_and_clear now take 'mm' and 'address' args.
       c) ptep_mkdirty was deleted, unused by any code.
       d) ptep_set_wrprotect now takes 'mm' and 'address' args.
    
    I've tested this patch as follows:
    
    1) compile and run tested on sparc64/SMP
    2) compile tested on:
       a) ppc64/SMP
       b) i386 both with and without PAE enabled
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

Which doesn't specify whether the function is meant for cache management,
nor is it documented in Documentation/cachetlb.txt.

  Ralf


* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
       [not found] ` <1456799879-14711-5-git-send-email-paul.burton@imgtec.com>
       [not found]   ` <56D5CDB3.80407@caviumnetworks.com>
@ 2016-03-03  3:03   ` Leonid Yegoshin
  2016-03-04 19:02   ` [PATCH 4/4] " Lars Persson
  2 siblings, 0 replies; 7+ messages in thread
From: Leonid Yegoshin @ 2016-03-03  3:03 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Lars Persson, stable # v4 . 1+, Steven J. Hill, David Daney,
	Huacai Chen, Aneesh Kumar K.V, linux-kernel, Andrew Morton,
	Jerome Marchand, Kirill A. Shutemov, Ralf Baechle

Paul Burton wrote:

> It is, however, used in such a way by others & seems to me like the only
> correct way to implement the lazy cache flushing. The alternative would
> be to adjust all generic code to ensure flush_icache_page gets called
> before set_pte_at

... which is exactly the case right now. Both callers of flush_icache_page() are:

1) do_swap_page()

         flush_icache_page(vma, page);
         if (pte_swp_soft_dirty(orig_pte))
                 pte = pte_mksoft_dirty(pte);
         set_pte_at(mm, address, page_table, pte);
...

2) do_set_pte()

         flush_icache_page(vma, page);
         entry = mk_pte(page, vma->vm_page_prot);
         if (write)
                 entry = maybe_mkwrite(pte_mkdirty(entry), vma);
         if (anon) {
                 inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
                 page_add_new_anon_rmap(page, vma, address);
         } else {
                 inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
                 page_add_file_rmap(page);
         }
         set_pte_at(vma->vm_mm, address, pte, entry);

         /* no need to invalidate: a not-present page won't be cached */
         update_mmu_cache(vma, address, pte);
....

You only need to make sure that flush_icache_page() completes the lazy
D-cache flush.

- Leonid.


* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
       [not found] ` <1456799879-14711-5-git-send-email-paul.burton@imgtec.com>
       [not found]   ` <56D5CDB3.80407@caviumnetworks.com>
  2016-03-03  3:03   ` [4/4] " Leonid Yegoshin
@ 2016-03-04 19:02   ` Lars Persson
  2016-03-05  0:21     ` Paul Burton
  2 siblings, 1 reply; 7+ messages in thread
From: Lars Persson @ 2016-03-04 19:02 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips@linux-mips.org, stable # v4 . 1+, Steven J. Hill,
	David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel@vger.kernel.org, Andrew Morton, Jerome Marchand,
	Kirill A. Shutemov, Ralf Baechle

Hi

Some further thoughts on the matter. You have so far not shown a valid example of a race condition. The two examples you give in the commit message are for a _single_ thread existing in the address space (fork and execve).

BR,
 Lars

> On 1 Mar 2016, at 03:39, Paul Burton <paul.burton@imgtec.com> wrote:
> 
> It's possible for pages to become visible prior to update_mmu_cache
> running if a thread within the same address space preempts the current
> thread or runs simultaneously on another CPU. That is, the following
> scenario is possible:
> 
>    CPU0                            CPU1
> 
>    write to page
>    flush_dcache_page
>    flush_icache_page
>    set_pte_at
>                                    map page
>    update_mmu_cache
> 
> If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> & visible, and update_mmu_cache, where the dcache flush occurs, then CPU1's
> icache will fill from stale data (unless it fills from the dcache, in
> which case all is good, but most MIPS CPUs don't have this property).
> Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> attempted to fix that by performing the dcache flush in
> flush_icache_page such that it occurs before the set_pte_at call makes
> the page visible. However it has the problem that not all code that
> writes to pages exposed to userland calls flush_icache_page. There are
> many callers of set_pte_at under mm/ and only 2 of them call
> flush_icache_page. Thus the race window between a page becoming visible
> & being coherent between the icache & dcache remains open in some cases.
> 
> To illustrate some of the cases, a WARN was added to __update_cache with
> this patch applied that triggered in cases where a page about to be
> flushed from the dcache was not the last page provided to
> flush_icache_page. That is, backtraces were obtained for cases in which
> the race window is left open without this patch. The 2 standout examples
> follow.
> 
> When forking a process:
> 
> [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> [   15.293747] [<80413808>] handle_sys+0x128/0x14c
> 
> When exec'ing an ELF binary:
> 
> [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> [   14.480324] [<8055f938>] do_execve+0x38/0x44
> [   14.485137] [<80413808>] handle_sys+0x128/0x14c
> 
> These code paths write into a page, call flush_dcache_page then call
> set_pte_at without flush_icache_page in between. The end result is that
> the icache can become corrupted & userland processes may execute
> unexpected or invalid code, typically resulting in a reserved
> instruction exception, a trap or a segfault.
> 
> Fix this race condition fully by performing any cache maintenance
> required to keep the icache & dcache in sync in set_pte_at, before the
> page is made valid. This has the added bonus of ensuring the cache
> maintenance always happens in one location, rather than being duplicated
> in flush_icache_page & update_mmu_cache. It also matches the way other
> architectures solve the same problem (see arm, ia64 & powerpc).
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> Cc: stable <stable@vger.kernel.org> # v4.1+
> Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> 
> ---
> 
> arch/mips/include/asm/cacheflush.h |  6 ------
> arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
> arch/mips/mm/cache.c               | 19 +++----------------
> 3 files changed, 24 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index 7e9f468..34ed22e 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
>    unsigned long start, unsigned long end);
> extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> extern void __flush_dcache_page(struct page *page);
> -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
> 
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> static inline void flush_dcache_page(struct page *page)
> @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> static inline void flush_icache_page(struct vm_area_struct *vma,
>    struct page *page)
> {
> -    if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> -        Page_dcache_dirty(page)) {
> -        __flush_icache_page(vma, page);
> -        ClearPageDcacheDirty(page);
> -    }
> }
> 
> extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 9a4fe01..65bf2c0 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -127,10 +127,14 @@ do {                                    \
>    }                                \
> } while(0)
> 
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +                  pte_t *ptep, pte_t pteval);
> +
> #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> 
> #define pte_none(pte)        (!(((pte).pte_high) & ~_PAGE_GLOBAL))
> #define pte_present(pte)    ((pte).pte_low & _PAGE_PRESENT)
> +#define pte_no_exec(pte)    ((pte).pte_low & _PAGE_NO_EXEC)
> 
> static inline void set_pte(pte_t *ptep, pte_t pte)
> {
> @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>            buddy->pte_high |= _PAGE_GLOBAL;
>    }
> }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> 
> static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> 
> #define pte_none(pte)        (!(pte_val(pte) & ~_PAGE_GLOBAL))
> #define pte_present(pte)    (pte_val(pte) & _PAGE_PRESENT)
> +#define pte_no_exec(pte)    (pte_val(pte) & _PAGE_NO_EXEC)
> 
> /*
>  * Certain architectures need to do special things when pte's
> @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>    }
> #endif
> }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> 
> static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> }
> #endif
> 
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +                  pte_t *ptep, pte_t pteval)
> +{
> +    extern void __update_cache(unsigned long address, pte_t pte);
> +
> +    if (!pte_present(pteval))
> +        goto cache_sync_done;
> +
> +    if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> +        goto cache_sync_done;
> +
> +    __update_cache(addr, pteval);
> +cache_sync_done:
> +    set_pte(ptep, pteval);
> +}
> +
> /*
>  * (pmds are folded into puds so this doesn't get actually called,
>  * but the define is needed for a generic inline function.)
> @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> 
> extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
>    pte_t pte);
> -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -    pte_t pte);
> 
> static inline void update_mmu_cache(struct vm_area_struct *vma,
>    unsigned long address, pte_t *ptep)
> {
>    pte_t pte = *ptep;
>    __update_tlb(vma, address, pte);
> -    __update_cache(vma, address, pte);
> }
> 
> static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 8befa55..bf04c6c 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
> 
> EXPORT_SYMBOL(__flush_anon_page);
> 
> -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> -{
> -    unsigned long addr;
> -
> -    if (PageHighMem(page))
> -        return;
> -
> -    addr = (unsigned long) page_address(page);
> -    flush_data_cache_page(addr);
> -}
> -EXPORT_SYMBOL_GPL(__flush_icache_page);
> -
> -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -    pte_t pte)
> +void __update_cache(unsigned long address, pte_t pte)
> {
>    struct page *page;
>    unsigned long pfn, addr;
> -    int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> +    int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> 
>    pfn = pte_pfn(pte);
>    if (unlikely(!pfn_valid(pfn)))
>        return;
>    page = pfn_to_page(pfn);
> -    if (page_mapping(page) && Page_dcache_dirty(page)) {
> +    if (Page_dcache_dirty(page)) {
>        if (PageHighMem(page))
>            addr = (unsigned long)kmap_atomic(page);
>        else
> -- 
> 2.7.1
> 


* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-04 19:02   ` [PATCH 4/4] " Lars Persson
@ 2016-03-05  0:21     ` Paul Burton
  2016-03-05  0:27       ` Paul Burton
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Burton @ 2016-03-05  0:21 UTC (permalink / raw)
  To: Lars Persson
  Cc: linux-mips@linux-mips.org, stable # v4 . 1+, Steven J. Hill,
	David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel@vger.kernel.org, Andrew Morton, Jerome Marchand,
	Kirill A. Shutemov, Ralf Baechle

On Fri, Mar 04, 2016 at 07:02:24PM +0000, Lars Persson wrote:
> Hi
> 
> Some further thoughts on the matter. You have so far not shown a
> valid example of a race condition. The two examples you give in the
> commit message are for a _single_ thread existing in the address space
> (fork and execve).

Hi Lars,

Neither fork nor exec is limited to a single thread existing in the
address space - I'm not sure what you're saying? fork by its very
definition results in two.

Thanks,
    Paul

> BR,
>  Lars
> 
> > On 1 Mar 2016, at 03:39, Paul Burton <paul.burton@imgtec.com> wrote:
> > 
> > [quoted patch snipped; full text appears earlier in the thread]


* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-05  0:21     ` Paul Burton
@ 2016-03-05  0:27       ` Paul Burton
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Burton @ 2016-03-05  0:27 UTC (permalink / raw)
  To: Lars Persson
  Cc: linux-mips@linux-mips.org, stable # v4 . 1+, Steven J. Hill,
	David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel@vger.kernel.org, Andrew Morton, Jerome Marchand,
	Kirill A. Shutemov, Ralf Baechle

On Sat, Mar 05, 2016 at 12:21:54AM +0000, Paul Burton wrote:
> On Fri, Mar 04, 2016 at 07:02:24PM +0000, Lars Persson wrote:
> > Hi
> > 
> > Some further thoughts on the matter. You have so far not shown a
> > valid example of a race condition. The two examples you give in the
> > commit message are for a _single_ thread existing in the address space
> > (fork and execve).
> 
> Hi Lars,
> 
> Neither fork nor exec are limited to a single thread existing in the
> address space - I'm not sure what you're saying? fork by its very
> definition results in 2.

Ok, exec kinda is (it's late...). Still, fork clearly isn't.

Thanks,
    Paul

> Thanks,
>     Paul
> 
> > BR,
> >  Lars
> > 
> > > 1 mars 2016 kl. 03:39 skrev Paul Burton <paul.burton@imgtec.com>:
> > > 
> > > It's possible for pages to become visible prior to update_mmu_cache
> > > running if a thread within the same address space preempts the current
> > > thread or runs simultaneously on another CPU. That is, the following
> > > scenario is possible:
> > > 
> > >    CPU0                            CPU1
> > > 
> > >    write to page
> > >    flush_dcache_page
> > >    flush_icache_page
> > >    set_pte_at
> > >                                    map page
> > >    update_mmu_cache
> > > 
> > > If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> > > & visible, and update_mmu_cache where the dcache flush occurs then CPU1s
> > > icache will fill from stale data (unless it fills from the dcache, in
> > > which case all is good, but most MIPS CPUs don't have this property).
> > > Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > > attempted to fix that by performing the dcache flush in
> > > flush_icache_page such that it occurs before the set_pte_at call makes
> > > the page visible. However it has the problem that not all code that
> > > writes to pages exposed to userland call flush_icache_page. There are
> > > many callers of set_pte_at under mm/ and only 2 of them do call
> > > flush_icache_page. Thus the race window between a page becoming visible
> > > & being coherent between the icache & dcache remains open in some cases.
> > > 
> > > To illustrate some of the cases, a WARN was added to __update_cache with
> > > this patch applied that triggered in cases where a page about to be
> > > flushed from the dcache was not the last page provided to
> > > flush_icache_page. That is, backtraces were obtained for cases in which
> > > the race window is left open without this patch. The 2 standout examples
> > > follow.
> > > 
> > > When forking a process:
> > > 
> > > [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> > > [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> > > [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> > > [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> > > [   15.293747] [<80413808>] handle_sys+0x128/0x14c
> > > 
> > > When exec'ing an ELF binary:
> > > 
> > > [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> > > [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> > > [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> > > [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> > > [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> > > [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> > > [   14.480324] [<8055f938>] do_execve+0x38/0x44
> > > [   14.485137] [<80413808>] handle_sys+0x128/0x14c
> > > 
> > > These code paths write into a page, call flush_dcache_page, then call
> > > set_pte_at without flush_icache_page in between. The end result is that
> > > the icache can become corrupted & userland processes may execute
> > > unexpected or invalid code, typically resulting in a reserved
> > > instruction exception, a trap or a segfault.
> > > 
> > > Fix this race condition fully by performing any cache maintenance
> > > required to keep the icache & dcache in sync in set_pte_at, before the
> > > page is made valid. This has the added bonus of ensuring the cache
> > > maintenance always happens in one location, rather than being duplicated
> > > in flush_icache_page & update_mmu_cache. It also matches the way other
> > > architectures solve the same problem (see arm, ia64 & powerpc).
> > > 
> > > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > > Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> > > Cc: Lars Persson <lars.persson@axis.com>
> > > Cc: stable <stable@vger.kernel.org> # v4.1+
> > > Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > > 
> > > ---
> > > 
> > > arch/mips/include/asm/cacheflush.h |  6 ------
> > > arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
> > > arch/mips/mm/cache.c               | 19 +++----------------
> > > 3 files changed, 24 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > > index 7e9f468..34ed22e 100644
> > > --- a/arch/mips/include/asm/cacheflush.h
> > > +++ b/arch/mips/include/asm/cacheflush.h
> > > @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
> > >    unsigned long start, unsigned long end);
> > > extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> > > extern void __flush_dcache_page(struct page *page);
> > > -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
> > > 
> > > #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> > > static inline void flush_dcache_page(struct page *page)
> > > @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> > > static inline void flush_icache_page(struct vm_area_struct *vma,
> > >    struct page *page)
> > > {
> > > -    if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> > > -        Page_dcache_dirty(page)) {
> > > -        __flush_icache_page(vma, page);
> > > -        ClearPageDcacheDirty(page);
> > > -    }
> > > }
> > > 
> > > extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> > > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > > index 9a4fe01..65bf2c0 100644
> > > --- a/arch/mips/include/asm/pgtable.h
> > > +++ b/arch/mips/include/asm/pgtable.h
> > > @@ -127,10 +127,14 @@ do {                                    \
> > >    }                                \
> > > } while(0)
> > > 
> > > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > +                  pte_t *ptep, pte_t pteval);
> > > +
> > > #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> > > 
> > > #define pte_none(pte)        (!(((pte).pte_high) & ~_PAGE_GLOBAL))
> > > #define pte_present(pte)    ((pte).pte_low & _PAGE_PRESENT)
> > > +#define pte_no_exec(pte)    ((pte).pte_low & _PAGE_NO_EXEC)
> > > 
> > > static inline void set_pte(pte_t *ptep, pte_t pte)
> > > {
> > > @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> > >            buddy->pte_high |= _PAGE_GLOBAL;
> > >    }
> > > }
> > > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > > 
> > > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > > {
> > > @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > > 
> > > #define pte_none(pte)        (!(pte_val(pte) & ~_PAGE_GLOBAL))
> > > #define pte_present(pte)    (pte_val(pte) & _PAGE_PRESENT)
> > > +#define pte_no_exec(pte)    (pte_val(pte) & _PAGE_NO_EXEC)
> > > 
> > > /*
> > >  * Certain architectures need to do special things when pte's
> > > @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
> > >    }
> > > #endif
> > > }
> > > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > > 
> > > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > > {
> > > @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > > }
> > > #endif
> > > 
> > > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > +                  pte_t *ptep, pte_t pteval)
> > > +{
> > > +    extern void __update_cache(unsigned long address, pte_t pte);
> > > +
> > > +    if (!pte_present(pteval))
> > > +        goto cache_sync_done;
> > > +
> > > +    if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > > +        goto cache_sync_done;
> > > +
> > > +    __update_cache(addr, pteval);
> > > +cache_sync_done:
> > > +    set_pte(ptep, pteval);
> > > +}
> > > +
> > > /*
> > >  * (pmds are folded into puds so this doesn't get actually called,
> > >  * but the define is needed for a generic inline function.)
> > > @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> > > 
> > > extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> > >    pte_t pte);
> > > -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > > -    pte_t pte);
> > > 
> > > static inline void update_mmu_cache(struct vm_area_struct *vma,
> > >    unsigned long address, pte_t *ptep)
> > > {
> > >    pte_t pte = *ptep;
> > >    __update_tlb(vma, address, pte);
> > > -    __update_cache(vma, address, pte);
> > > }
> > > 
> > > static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> > > index 8befa55..bf04c6c 100644
> > > --- a/arch/mips/mm/cache.c
> > > +++ b/arch/mips/mm/cache.c
> > > @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
> > > 
> > > EXPORT_SYMBOL(__flush_anon_page);
> > > 
> > > -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> > > -{
> > > -    unsigned long addr;
> > > -
> > > -    if (PageHighMem(page))
> > > -        return;
> > > -
> > > -    addr = (unsigned long) page_address(page);
> > > -    flush_data_cache_page(addr);
> > > -}
> > > -EXPORT_SYMBOL_GPL(__flush_icache_page);
> > > -
> > > -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > > -    pte_t pte)
> > > +void __update_cache(unsigned long address, pte_t pte)
> > > {
> > >    struct page *page;
> > >    unsigned long pfn, addr;
> > > -    int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> > > +    int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> > > 
> > >    pfn = pte_pfn(pte);
> > >    if (unlikely(!pfn_valid(pfn)))
> > >        return;
> > >    page = pfn_to_page(pfn);
> > > -    if (page_mapping(page) && Page_dcache_dirty(page)) {
> > > +    if (Page_dcache_dirty(page)) {
> > >        if (PageHighMem(page))
> > >            addr = (unsigned long)kmap_atomic(page);
> > >        else
> > > -- 
> > > 2.7.1
> > > 


* Re: [PATCH 3/4] MIPS: Handle highmem pages in __update_cache
  2016-03-01  2:37 ` [PATCH 3/4] MIPS: Handle highmem pages in __update_cache Paul Burton
@ 2016-03-29  8:39   ` Ralf Baechle
  0 siblings, 0 replies; 7+ messages in thread
From: Ralf Baechle @ 2016-03-29  8:39 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, stable # v4 . 1+, linux-kernel,
	Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Tue, Mar 01, 2016 at 02:37:58AM +0000, Paul Burton wrote:

> The following patch will expose __update_cache to highmem pages. Handle
> them by mapping them in for the duration of the cache maintenance, just
> like in __flush_dcache_page. The code for that isn't shared because we
> need the page address in __update_cache so sharing became messy. Given
> that the entirety is an extra 5 lines, just duplicate it.
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> Cc: stable <stable@vger.kernel.org> # v4.1+
> ---
> 
>  arch/mips/mm/cache.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 5a67d8c..8befa55 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -149,9 +149,17 @@ void __update_cache(struct vm_area_struct *vma, unsigned long address,
>  		return;
>  	page = pfn_to_page(pfn);
>  	if (page_mapping(page) && Page_dcache_dirty(page)) {
> -		addr = (unsigned long) page_address(page);
> +		if (PageHighMem(page))
> +			addr = (unsigned long)kmap_atomic(page);
> +		else
> +			addr = (unsigned long)page_address(page);
> +
>  		if (exec || pages_do_alias(addr, address & PAGE_MASK))
>  			flush_data_cache_page(addr);
> +
> +		if (PageHighMem(page))
> +			__kunmap_atomic((void *)addr);
> +
>  		ClearPageDcacheDirty(page);
>  	}
>  }

Yet again this is betting the house on getting the right virtual address
from kmap_atomic.

  Ralf
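
Ralf's concern is about virtually-indexed dcache aliasing: on a VIPT dcache
whose way size exceeds the page size, the kernel virtual address returned by
kmap_atomic may index different cache sets than the user mapping of the same
physical page, so flushing through the kmap address can miss the dirty lines.
A minimal sketch of the alias test follows (hypothetical constants; in the real
kernel the mask is derived from the probed cache geometry, and the helper is
the existing pages_do_alias macro):

```c
#include <stdint.h>

/* Hypothetical sketch of a MIPS-style alias check (not kernel code).
 * Assume a 32 KiB, 4-way VIPT dcache with 4 KiB pages: each way covers
 * 8 KiB, so bit 12 of the virtual address selects the cache set even
 * though it lies above PAGE_SHIFT. Two mappings of one physical page
 * that differ in that bit land in different sets. */
#define PAGE_SIZE	0x1000UL
#define DCACHE_WAYSIZE	0x2000UL	/* 32 KiB / 4 ways */
#define SHM_ALIGN_MASK	(DCACHE_WAYSIZE - 1)

/* Analogue of pages_do_alias(): non-zero when flushing through addr1
 * would not touch the lines indexed by a mapping at addr2. */
static unsigned long pages_do_alias(unsigned long addr1, unsigned long addr2)
{
	return (addr1 ^ addr2) & SHM_ALIGN_MASK & ~(PAGE_SIZE - 1);
}
```

If kmap_atomic happens to hand back an address of the wrong colour,
flush_data_cache_page on it writes back the wrong sets, which is the
"betting the house" Ralf refers to.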



Thread overview: 7+ messages
     [not found] <1456799879-14711-1-git-send-email-paul.burton@imgtec.com>
2016-03-01  2:37 ` [PATCH 3/4] MIPS: Handle highmem pages in __update_cache Paul Burton
2016-03-29  8:39   ` Ralf Baechle
     [not found] ` <1456799879-14711-5-git-send-email-paul.burton@imgtec.com>
     [not found]   ` <56D5CDB3.80407@caviumnetworks.com>
     [not found]     ` <20160301171940.GA26791@NP-P-BURTON>
2016-03-02 14:12       ` [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at Ralf Baechle
2016-03-03  3:03   ` [4/4] " Leonid Yegoshin
2016-03-04 19:02   ` [PATCH 4/4] " Lars Persson
2016-03-05  0:21     ` Paul Burton
2016-03-05  0:27       ` Paul Burton
