From: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
To: David Daney <ddaney.cavm@gmail.com>
Cc: <linux-mips@linux-mips.org>, <Zubair.Kakakhel@imgtec.com>,
	<david.daney@cavium.com>, <peterz@infradead.org>,
	<paul.gortmaker@windriver.com>, <davidlohr@hp.com>,
	<macro@linux-mips.org>, <chenhc@lemote.com>, <zajec5@gmail.com>,
	<james.hogan@imgtec.com>, <keescook@chromium.org>,
	<alex@alex-smith.me.uk>, <tglx@linutronix.de>,
	<blogic@openwrt.org>, <jchandra@broadcom.com>,
	<paul.burton@imgtec.com>, <qais.yousef@imgtec.com>,
	<linux-kernel@vger.kernel.org>, <ralf@linux-mips.org>,
	<markos.chandras@imgtec.com>, <manuel.lauss@gmail.com>,
	<akpm@linux-foundation.org>, <lars.persson@axis.com>
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack
Date: Mon, 6 Oct 2014 13:03:43 -0700
Message-ID: <5432F59F.9080709@imgtec.com>
In-Reply-To: <5432D9F8.9040004@gmail.com>

On 10/06/2014 11:05 AM, David Daney wrote:
> On 10/03/2014 08:17 PM, Leonid Yegoshin wrote:
>> Historically, during FPU emulation MIPS runs the live BD-slot instruction
>> on the stack. This is needed because it was the only way to correctly
>> handle branch exceptions with unknown COP2 instructions in the BD slot.
>> Now there is an eXecuteInhibit feature, and it is desirable to protect
>> the stack from execution for security reasons.
>> This patch moves FPU emulation from the stack area to a VDSO-located page
>> which is set write-protected for application access. The VDSO page itself
>> is now per-thread, and its addresses and offsets are stored in thread_info.
>> A small stack of emulation blocks is supported because nested traps are
>> possible in MIPS32/64 R6 emulation mixed with FPU emulation.
>>
>
> Can you explain how this per-thread mapping works.
>
> I am especially interested in what happens when a different thread 
> from the thread using the special mapping, issues flush_tlb_mm(), and 
> invalidates the TLBs on all CPUs.  How does the TLB entry for the 
> special mapping survive this?
>
>
This patch works as long as install_special_mapping() does not set up the
PTE itself but only installs a page fault handler. That is the only hidden
dependency on common Linux code.

MIPS code allocates a page (a copy of the standard 'VDSO' page), links it
to thread_info, and handles all allocation, deallocation and thread
creation via arch hooks. It does this only for threads which have a memory
map, not for kernel threads. Also, it does all of this only if the CPU has
the RI/XI capability (the HW execute-inhibit feature); otherwise everything
works as it does today.
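A rough user-space sketch of the per-thread lifecycle described above. Everything here (sim_thread_info, sim_vdso_thread_init(), sim_vdso_thread_exit(), and malloc() standing in for kernel page allocation) is hypothetical; it only models the decisions: skip kernel threads, skip CPUs without RI/XI, otherwise give the thread its own copy of the standard VDSO page.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/* Hypothetical stand-in for the per-thread state kept in thread_info. */
struct sim_thread_info {
	bool has_mm;      /* false for kernel threads */
	void *vdso_page;  /* per-thread copy of the standard VDSO page, or NULL */
};

/* Thread-creation hook (model). Returns 0 on success, -1 if allocation fails. */
static int sim_vdso_thread_init(struct sim_thread_info *ti,
				const void *vdso_template, bool cpu_has_rixi)
{
	ti->vdso_page = NULL;
	if (!ti->has_mm || !cpu_has_rixi)
		return 0;  /* kernel thread, or no RI/XI: keep the old behaviour */

	ti->vdso_page = malloc(SIM_PAGE_SIZE);
	if (!ti->vdso_page)
		return -1;
	memcpy(ti->vdso_page, vdso_template, SIM_PAGE_SIZE);
	return 0;
}

/* Thread-exit hook (model): release the shadow page. */
static void sim_vdso_thread_exit(struct sim_thread_info *ti)
{
	free(ti->vdso_page);
	ti->vdso_page = NULL;
}
```

The real patch does this through MIPS arch hooks; the point of the sketch is only that a shadow page exists exactly when the thread has a memory map and the CPU can enforce execute-inhibit.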

It still attaches the standard 'VDSO' page to the memory map for
accounting purposes, so /proc/.../maps shows the [VDSO] page. However, the
new (per-thread) page is what actually gets mapped; it shadows the
standard one.

When a TLB refill happens, it loads an empty PTE, and the subsequent TLBL
(TLB load page fault) exception reaches MIPS C code, which recognizes the
'VDSO' address, asks install_vdso_tlb() to fill the TLB directly, and
records the ASID of that entry in the memory map for this CPU.
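A simplified model of that fault path, assuming a per-CPU ASID array in the memory map and a single TLB entry described by a struct. The names (sim_mm, sim_thread, sim_handle_tlbl()) are hypothetical; in the patch the handler calls install_vdso_tlb() rather than filling a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_NR_CPUS 8

/* Hypothetical, simplified stand-ins for mm and per-thread state. */
struct sim_mm {
	uintptr_t vdso_base;                  /* user VA of the [vdso] page */
	unsigned long vdso_asid[SIM_NR_CPUS]; /* ASID that loaded the VDSO entry, per CPU */
};

struct sim_thread {
	struct sim_mm *mm;
	uintptr_t vdso_page_pa;  /* physical address of this thread's shadow page */
};

struct sim_tlb_entry {
	uintptr_t va;
	uintptr_t pa;
	unsigned long asid;
	bool valid;
};

/*
 * TLBL path model: the VDSO PTE is empty, so the fault reaches C code.
 * If the address is inside the VDSO page, fill the TLB entry directly from
 * the per-thread page and record the ASID for this CPU. Any other address
 * returns false (normal page-fault handling would take over).
 */
static bool sim_handle_tlbl(struct sim_thread *t, unsigned int cpu,
			    unsigned long asid, uintptr_t fault_va,
			    struct sim_tlb_entry *out)
{
	uintptr_t base = t->mm->vdso_base;

	assert(cpu < SIM_NR_CPUS);
	if (fault_va < base || fault_va >= base + SIM_PAGE_SIZE)
		return false;

	out->va = base;
	out->pa = t->vdso_page_pa;
	out->asid = asid;
	out->valid = true;
	t->mm->vdso_asid[cpu] = asid;
	return true;
}
```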

On each process (read: thread) reschedule, there is a check whether some
previous thread of the same memory map loaded the TLB entry on this CPU,
done by comparing ASIDs. If that happened and the ASIDs are the same, then
local_flush_tlb_page() is called to evict this TLB entry, because it has
the same ASID but may map a different per-thread page.
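The reschedule check can be modelled like this. The names are hypothetical, an ASID of 0 means "nothing recorded" in this sketch only, and the returned flag stands in for calling local_flush_tlb_page().

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 8

struct sim_mm {
	unsigned long vdso_asid[SIM_NR_CPUS]; /* ASID that loaded the VDSO entry; 0 = none */
};

/*
 * Called when a thread of @mm is scheduled in on @cpu with @cur_asid.
 * Threads of one mm share an ASID, so a recorded entry with the same ASID
 * may map another thread's shadow page and must be flushed.
 */
static bool sim_vdso_needs_flush(struct sim_mm *mm, unsigned int cpu,
				 unsigned long cur_asid)
{
	assert(cpu < SIM_NR_CPUS);
	if (mm->vdso_asid[cpu] != 0 && mm->vdso_asid[cpu] == cur_asid) {
		mm->vdso_asid[cpu] = 0;  /* entry is gone after the flush */
		return true;
	}
	return false;
}
```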

Because the PTE stays 0x00..00 and never changes, this sequence starts
again after the TLB entry is evicted for any reason (flush_tlb_mm(), some
other flush, or eviction by HW or SW TLB replacement), but only if the page
is demanded again.
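To illustrate why flush_tlb_mm() from another thread is harmless here, a toy model with a one-entry TLB: because the PTE is never populated, every miss goes through the slow path and reinstalls the current thread's page. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One-entry software TLB slot for the VDSO page (hypothetical model). */
struct sim_tlb {
	bool valid;
	unsigned long asid;
	uintptr_t pa;
};

struct sim_ctx {
	struct sim_tlb tlb;
	unsigned int faults;  /* how many times the TLBL slow path ran */
};

/* The VDSO PTE is never written, so a refill always sees 0. */
static const uintptr_t sim_vdso_pte = 0;

/* Access the VDSO page as a thread whose shadow page is at @thread_pa. */
static uintptr_t sim_access_vdso(struct sim_ctx *c, unsigned long asid,
				 uintptr_t thread_pa)
{
	if (c->tlb.valid && c->tlb.asid == asid)
		return c->tlb.pa;  /* TLB hit: no fault */

	/*
	 * Miss: the refill handler would load the empty PTE, producing a
	 * TLBL fault; the C handler then installs the per-thread page.
	 */
	assert(sim_vdso_pte == 0);
	c->faults++;
	c->tlb = (struct sim_tlb){ .valid = true, .asid = asid, .pa = thread_pa };
	return c->tlb.pa;
}

/* Any eviction: flush_tlb_mm(), another flush, or HW/SW replacement. */
static void sim_evict(struct sim_ctx *c)
{
	c->tlb.valid = false;
}
```

After sim_evict() the next access simply faults once more and gets the right page back, which is the behaviour described above.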

Now, the emulation part: a small stack of emulation blocks can be used,
starting from the top of the page. Each time an FPU instruction from a BD
slot is emulated, the kernel writes an emulation block onto that stack
through the kernel VA of the page, but sets the thread's EPC to the user
VA of that block. With cache aliasing, it flushes caches via different
addresses (D-cache via the kernel VA and I-cache via the user VA), and new
functions are needed here to avoid a huge performance loss from
flush_cache_page(). Without cache aliasing, it uses the regular
flush_cache_sigtramp(), because on some systems that can be much faster
(via SYNCI).
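A sketch of the emulation-block stack, assuming a hypothetical 16-byte block layout (the copied BD-slot instruction, a placeholder trap instruction, and a resume EPC); the patch's real layout may differ. The block is written through a kernel-side pointer while the value returned for EPC uses the user VA, and cache maintenance is only indicated by a comment. Pushes and pops nest, which is what allows emulation to be re-entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/* Hypothetical layout of one emulation block. */
struct sim_emu_block {
	uint32_t insn;        /* copied BD-slot instruction */
	uint32_t trap_insn;   /* placeholder: instruction that traps back to the kernel */
	uint64_t resume_epc;  /* where execution continues after emulation */
};

struct sim_emu_page {
	unsigned char *kva;   /* kernel virtual address of the shadow page */
	uintptr_t uva;        /* user virtual address of the same page */
	size_t top;           /* current stack top offset; starts at SIM_PAGE_SIZE */
};

/*
 * Push a block from the top of the page downwards, writing it via the
 * kernel VA. Returns the user VA to load into EPC, or 0 if the page is full.
 */
static uintptr_t sim_emu_push(struct sim_emu_page *p, uint32_t insn,
			      uint32_t trap_insn, uint64_t resume_epc)
{
	struct sim_emu_block b = { insn, trap_insn, resume_epc };

	if (p->top < sizeof(b))
		return 0;
	p->top -= sizeof(b);
	memcpy(p->kva + p->top, &b, sizeof(b));
	/* Kernel: flush D-cache via kva and I-cache via uva (or SYNCI) here. */
	return p->uva + p->top;
}

/* Pop the most recent block once its emulation has completed. */
static void sim_emu_pop(struct sim_emu_page *p)
{
	assert(p->top + sizeof(struct sim_emu_block) <= SIM_PAGE_SIZE);
	p->top += sizeof(struct sim_emu_block);
}
```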

The stack of emulation blocks is needed because I am working on a MIPS32/64
R6 architecture kernel, which needs to emulate some MIPS R2 instructions
that were removed in R6. In some rare cases emulation may be re-entered,
because FPU emulation and MIPS R2 emulation are separate subsystems.


Note: after Peter Zijlstra's remark about performance, I am considering
adding a check for the case where the same single thread is rescheduled
on the same CPU, and not flushing the TLB in that case. It just requires
yet another array of process IDs or 'VDSO' pages, one element per CPU, and
I am weighing that cost against the scheduling interval. Today the array
has at most 8 elements on MIPS, but that may change in the future. There
is also the possibility of writing a special TLB flush function which
compares each TLB entry's address with the page address and skips
evicting the entry if the addresses match.
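The per-CPU 'VDSO' page array idea could look roughly like this: a hypothetical extension of the reschedule check that also records which shadow page the loaded entry maps, and skips the flush when the incoming thread owns that same page. Names and the "ASID 0 = none" convention are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 8

struct sim_mm {
	unsigned long vdso_asid[SIM_NR_CPUS];  /* ASID that loaded the VDSO entry; 0 = none */
	const void *vdso_page[SIM_NR_CPUS];    /* shadow page that entry maps */
};

/* Flush only if the entry loaded on @cpu may belong to a different thread. */
static bool sim_vdso_needs_flush_per_thread(struct sim_mm *mm, unsigned int cpu,
					    unsigned long cur_asid,
					    const void *next_page)
{
	assert(cpu < SIM_NR_CPUS);
	if (mm->vdso_asid[cpu] == 0 || mm->vdso_asid[cpu] != cur_asid)
		return false;  /* nothing loaded here with this ASID */
	if (mm->vdso_page[cpu] == next_page)
		return false;  /* same thread back on the same CPU: entry still correct */
	mm->vdso_asid[cpu] = 0;
	mm->vdso_page[cpu] = NULL;
	return true;       /* caller would do local_flush_tlb_page() */
}
```

The trade-off is one extra pointer per CPU in each memory map, in exchange for avoiding a flush and a refault when a thread keeps running on the same CPU.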

- Leonid.
