public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
To: Lars Persson <lars.persson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>,
	"linux-mips@linux-mips.org" <linux-mips@linux-mips.org>,
	"james.hogan@imgtec.com" <james.hogan@imgtec.com>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"paul.burton@imgtec.com" <paul.burton@imgtec.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"manuel.lauss@gmail.com" <manuel.lauss@gmail.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"blogic@openwrt.org" <blogic@openwrt.org>,
	"markos.chandras@imgtec.com" <markos.chandras@imgtec.com>
Subject: Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"
Date: Fri, 5 Dec 2014 13:41:25 -0800	[thread overview]
Message-ID: <54822685.8070504@imgtec.com> (raw)
In-Reply-To: <1417771976.8400.8.camel@lnxlarper.se.axis.com>

Lars,

On 12/05/2014 01:32 AM, Lars Persson wrote:
> Hi
>
> Our setup includes both a non-DMA block device and a compressing
> file-system (UBIFS). A flush_dcache_page() is issued by UBIFS so your
> patch fixes another problem that we do not hit.
>
> The stack trace is not available now. Do we need it for any further
> analysis ? I think the mechanism of the race window is understood and it
> depends on the __flush_dcache_page() deciding that the flush should be
> postponed.

Unfortunately, the research of original case is still needed.
I looked into all cases of update_mmu_cache() besides HUGE page support 
and NUMA, and I see:

1.  insert_pfn()
     It is used to put a special page (read - VDSO) into memory map. No 
cache flush is needed here because page is done and flushed during 
system boot.

2.  do_wp_page(), first occurrence
     It has flush_cache_page() before it sets PTE in 
ptep_set_access_flags(). This flush is unconditional and affects all caches.

3.  do_wp_page(), second case
     It is done after preparing a clear new page or after COW. COW has 
an appropriate cache flush of destination in copy_user_highpage(). The 
immediate use of cleared new page as instruction (you had SIGILL, 
right?)... hm-m, something wrong in application in this case.

4.  do_swap_page()
     Well, it may be a case of flush_icache_page() is not used (see 
below) and page is taken from non-DMA swap. But I also recommend to look 
into

         http://patchwork.linux-mips.org/patch/7615/

there is a bug in swap entry number presentation and it may affect your 
system.

5.  do_anonymous_page()
     The similar to case (3) - cleared new page, using of it as 
instruction page may point to some app problem.

6.  do_set_pte()
     It also has flush_icache_page() which may have impact if not 
implemented, see below.

7.  handle_pte_fault()
     Page is not touched and cache flush is NO-OP.

8.  remove_migration_pte()
     Well, it is a place for suspicion. But it should not run in 
parallel with any running thread - dirtying page while other thread is 
running is a way to disaster.

So, you see - if I understand it correctly, there is no place for 
failure... besides application misbehaviour or potential kernel bug in 
migration. Of course, I may miss something and that is a reason why 
stack trace is desirable.


> I think the mechanism of the race window is understood and it
> depends on the __flush_dcache_page() deciding that the flush should be
> postponed.

As I remember, you said you use HIGHMEM patch, right? It changes a 
little __flush_dcache_page() and flush of any mapped page is not 
postponed anymore. So, it has an immediate effect for application pages.

- Leonid.

>
>
> - Lars
>
> On Fri, 2014-12-05 at 03:16 +0100, Leonid Yegoshin wrote:
>> (repeat mesg, first one went to wrong place)
>>
>> Lars,
>>
>> Do you have a stack trace or so then you found the second VPE between
>> set_pte_at and update_mmu_cache?
>> It would be interesting how it happens - generally, to get a consistent
>> SIGILL in applications due to misbehaviour of memory subsystem, the bug
>> in FS is not enough.
>>
>> Hold on - do you use non-DMA file system?
>> If so, I advice you to try this simple patch:
>>
>>       Author: Leonid Yegoshin <yegoshin@mips.com>
>>       Date:   Tue Apr 2 14:20:37 2013 -0700
>>
>>       MIPS: (opt) Fix of reading I-pages from non-DMA FS devices for ID
>> cache separation
>>
>>       This optional fix provides a D-cache flush for instruction code
>> pages on
>>       page faults. In case of non-DMA block device a driver doesn't know
>> that it
>>       reads I-page and doesn't flush D-cache generally on systems without
>>       cache aliasing. And that takes toll during page fault of
>> instruction pages.
>>
>>       It is not a perfect fix, it should be considered as a temporary fix.
>>       The permanent fix would track page origin in page cache and flushes
>> D-cache
>>       during reception of page from driver only but not at each page fault.
>>       It is not done yet.
>>
>>       Change-Id: I43f5943d6ce0509729179615f6b81e77803a34ac
>>       Author: Leonid Yegoshin <yegoshin@mips.com>
>>       Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>(imported from
>> commit 6ebd22eb7a3d9873582ebe990a77094f971652ee)(imported from commit
>> 0caf3b4a1eebb64572e81e4df6fdb3abf12c70
>>
>> arch/mips/include/asm/cacheflush.h:
>>
>>      @@ -61,6 +61,9 @@ static inline void flush_anon_page(struct
>> vm_area_struct *vma,
>>       static inline void flush_icache_page(struct vm_area_struct *vma,
>>              struct page *page)
>>       {
>>      +       if (cpu_has_dc_aliases ||
>>      +           ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc))
>>      +               __flush_dcache_page(page);
>>       }
>>
>>       extern void (*flush_icache_range)(unsigned long start, unsigned
>> long end);
>>
>>
>> It fixed crash problems with non-DMA FS in a couple of our customers.
>> Without it the non-DMA root FS crashes are catastrophic in aliasing
>> systems but it is still a problem for I-cache too but much rare.
>>
>> Unfortunately, it is also a performance hit, however is less than run a
>> page cache flush at each PTE setup. On 12/03/2014 06:03 AM, Lars Persson
>> wrote:
>>> It is the flush_dcache_page() that was called from the file-system
>>> reading the page contents into memory.
>>>
>>> - Lars
>>>
>>>
>
>


  reply	other threads:[~2014-12-05 21:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-03  3:25 [PATCH] Revert "MIPS: Remove race window in page fault handling" Leonid Yegoshin
2014-12-03  9:31 ` Lars Persson
2014-12-03 13:24   ` Ralf Baechle
2014-12-03 13:42   ` Ralf Baechle
2014-12-03 14:03     ` Lars Persson
2014-12-03 19:28       ` Leonid Yegoshin
2014-12-05  2:16       ` Leonid Yegoshin
2014-12-05  9:32         ` Lars Persson
2014-12-05 21:41           ` Leonid Yegoshin [this message]
2014-12-08  9:18             ` Lars Persson
2014-12-03 14:20     ` Lars Persson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54822685.8070504@imgtec.com \
    --to=leonid.yegoshin@imgtec.com \
    --cc=akpm@linux-foundation.org \
    --cc=blogic@openwrt.org \
    --cc=james.hogan@imgtec.com \
    --cc=keescook@chromium.org \
    --cc=lars.persson@axis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=manuel.lauss@gmail.com \
    --cc=markos.chandras@imgtec.com \
    --cc=paul.burton@imgtec.com \
    --cc=pbonzini@redhat.com \
    --cc=ralf@linux-mips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox