From: "Christopher M. Riedl" <cmr@informatik.wtf>
To: "Christophe Leroy" <christophe.leroy@c-s.fr>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching
Date: Mon, 20 Apr 2020 23:22:49 -0500
Message-ID: <C26LKGPLFN9C.57CFF2U7I7X0@geist>
In-Reply-To: <3a37ab41-ab0e-6fae-9fbe-710f83a945f2@c-s.fr>

On Sat Apr 18, 2020 at 12:27 PM, Christophe Leroy wrote:
>
> On 15/04/2020 at 18:22, Christopher M Riedl wrote:
> >> On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>
> >> On 15/04/2020 at 07:16, Christopher M Riedl wrote:
> >>>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>>
> >>>>    
> >>>> This patch fixes the RFC series identified below.
> >>>> It fixes three points:
> >>>> - Failure with CONFIG_PPC_KUAP
> >>>> - Failure to write due to lack of DIRTY bit set on the 8xx
> >>>> - Inadequately complex WARN post verification
> >>>>
> >>>> However, it has an impact on the CPU load. Here is the time
> >>>> needed on an 8xx to run the ftrace selftests without and
> >>>> with this series:
> >>>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
> >>>>
> >>>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>>> ---
> >>>>    arch/powerpc/lib/code-patching.c | 5 ++++-
> >>>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >>>> index f156132e8975..4ccff427592e 100644
> >>>> --- a/arch/powerpc/lib/code-patching.c
> >>>> +++ b/arch/powerpc/lib/code-patching.c
> >>>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >>>>    	}
> >>>>    
> >>>>    	pte = mk_pte(page, pgprot);
> >>>> +	pte = pte_mkdirty(pte);
> >>>>    	set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>>>    
> >>>>    	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> >>>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >>>>    			(offset_in_page((unsigned long)addr) /
> >>>>    				sizeof(unsigned int));
> >>>>    
> >>>> +	allow_write_to_user(patch_addr, sizeof(instr));
> >>>>    	__patch_instruction(addr, instr, patch_addr);
> >>>> +	prevent_write_to_user(patch_addr, sizeof(instr));
> >>>>
> >>>
> >>> On radix, we can map the page with PAGE_KERNEL protection, which ends up
> >>> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> >>> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> >>>
> >>> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> >>> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> >>> because this is a temporary kernel mapping which really isn't userspace in
> >>> the usual sense.
> >>
> >> On the 8xx, that's pretty different.
> >>
> >> The PTE doesn't control whether a page is a user page or a kernel page.
> >> The only thing that is set in the PTE is whether a page is linked to a
> >> given PID or not.
> >> PAGE_KERNEL means that the page can be addressed with any PID.
> >>
> >> The user access right is given by a kind of zone, which is in the PGD
> >> entry. All pages above PAGE_OFFSET are defined as belonging to zone 0.
> >> All pages below PAGE_OFFSET are defined as belonging to zone 1.
> >>
> >> By default, zone 0 can only be accessed by the kernel, and zone 1 can only
> >> be accessed by user. When the kernel wants to access zone 1, it temporarily
> >> changes the properties of zone 1 to allow both kernel and user accesses.
> >>
> >> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and the kernel
> >> must unlock it to access it.
> >>
> >>
> >> And this is more or less the same on hash/32. This is managed by segment
> >> registers. One segment register corresponds to a 256 Mbyte area. By
> >> default, pages below PAGE_OFFSET can only be read by the kernel. Only user
> >> can write if the PTE allows it. When the kernel needs to write at an
> >> address below PAGE_OFFSET, it must change the segment properties in the
> >> corresponding segment register.
> >>
> >> So, for both cases, if we want to have it local to a task while still
> >> allowing kernel access, it means we have to define a new special area
> >> between TASK_SIZE and PAGE_OFFSET which belongs to the kernel zone.
> >>
> >> That looks complex to me for a small benefit, especially as 8xx is not
> >> SMP and neither are most of the hash/32 targets.
> >>
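To make the zone behaviour described above concrete, here is a small
userspace model of it. It is only a sketch: the sim_* names, the
0xc0000000 value used for PAGE_OFFSET and the two-entry rights table are
assumptions made for illustration, not the 8xx registers or any kernel
API.

```c
#include <stdbool.h>

/* Assumed value for illustration; the real PAGE_OFFSET is config dependent. */
#define SIM_PAGE_OFFSET 0xc0000000UL

enum sim_zone { SIM_ZONE_KERNEL = 0, SIM_ZONE_USER = 1 };

/* Hypothetical model of the per-zone access rights. */
struct sim_zone_rights {
	bool kernel_ok;
	bool user_ok;
};

/* Defaults: zone 0 is kernel only, zone 1 is user only. */
struct sim_zone_rights sim_zones[2] = {
	[SIM_ZONE_KERNEL] = { .kernel_ok = true,  .user_ok = false },
	[SIM_ZONE_USER]   = { .kernel_ok = false, .user_ok = true  },
};

/* The zone follows from the address alone, not from the PTE protection. */
enum sim_zone sim_addr_to_zone(unsigned long addr)
{
	return addr >= SIM_PAGE_OFFSET ? SIM_ZONE_KERNEL : SIM_ZONE_USER;
}

bool sim_kernel_can_access(unsigned long addr)
{
	return sim_zones[sim_addr_to_zone(addr)].kernel_ok;
}

/* Stand-ins for the temporary unlock/lock around a kernel access to zone 1. */
void sim_unlock_user_zone(void)
{
	sim_zones[SIM_ZONE_USER].kernel_ok = true;
}

void sim_lock_user_zone(void)
{
	sim_zones[SIM_ZONE_USER].kernel_ok = false;
}
```

In this model a patching address below SIM_PAGE_OFFSET is refused for the
kernel until sim_unlock_user_zone() is called, whatever protection the
mapping itself was created with.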
> > 
> > Agreed. So I guess the solution is to differentiate between radix/non-radix
> > and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
> > is enabled. Hmm, I need to think about this some more, especially if it's
> > acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
> > you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?
>
> 
> No, it shouldn't be a problem AFAICS, except maybe the CPU overhead it
> brings, as I mentioned previously (ftrace selftests going from 40 to 43
> seconds, i.e. about 8% overhead).
>
OK, great. I will have some performance numbers for POWER8 and POWER9 with
the next spin of the RFC.
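
For reference, the 8xx timings quoted earlier in this thread (38, 40 and
43 seconds) can be turned into relative overheads with a helper like the
one below. This is just a sketch of the arithmetic; overhead_percent() is
a name made up here, not something from the series.

```c
#include <assert.h>

/* Relative slowdown of `with` compared to `base`, in percent. */
double overhead_percent(double base, double with)
{
	assert(base > 0.0);
	return (with - base) / base * 100.0;
}
```

With these inputs, going from 40 to 43 seconds is 7.5% (the "8%" above,
rounded), and going from 38 to 43 seconds, i.e. against a kernel without
CONFIG_STRICT_KERNEL_RWX, is roughly 13%.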
> 
> > 
> > I don't necessarily want to drop the local mm patching idea for non-radix
> > platforms since that means we would have to maintain two implementations.
> > 
>
> 
> What's the problem with radix? Why can't PAGE_SHARED be used on radix?
>
It's not a problem. I would actually prefer to use PAGE_KERNEL since the
mapping is really for a kernel page. On radix, using PAGE_KERNEL allows us
to avoid the KUAP functions due to the HW implementation (AMR and EAA).
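
Roughly, the split discussed in this thread would look like the following
decision. This is a standalone sketch, not the patch: the enum, the struct
and choose_patch_policy() are hypothetical stand-ins, and the real code
would work with pgprot_t and the existing KUAP helpers instead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's pgprot_t values. */
enum patch_prot { PATCH_PROT_KERNEL, PATCH_PROT_SHARED };

struct patch_policy {
	enum patch_prot prot;  /* protection for the temporary mapping */
	bool wrap_with_kuap;   /* wrap the write in allow_/prevent_write_to_user()? */
};

/*
 * radix:     map with PAGE_KERNEL, no KUAP wrapping needed.
 * non-radix: map with PAGE_SHARED, wrap the write only when KUAP is enabled.
 */
struct patch_policy choose_patch_policy(bool radix, bool kuap_enabled)
{
	struct patch_policy p;

	if (radix) {
		p.prot = PATCH_PROT_KERNEL;
		p.wrap_with_kuap = false;
	} else {
		p.prot = PATCH_PROT_SHARED;
		p.wrap_with_kuap = kuap_enabled;
	}
	return p;
}
```

Keeping the decision in one place like this is one way to avoid
maintaining two separate patching implementations.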
> 
> Christophe

