LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Latest Git kernel doesn't compile because of the LINUX_VERSION_CODE issue
From: Christian Zigotzky @ 2021-02-26 12:34 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: Darren Stevens, R.T.Dickinson
In-Reply-To: <CAOSf1CHQ=QDwH=J4kLYqboe481poa7EdbC6gzq29W7KYHhn1YQ@mail.gmail.com>

Hello,

I tried to compile the latest Git kernel today. Unfortunately it doesn't 
compile.

Error messages:

   CC      arch/powerpc/kernel/udbg_16550.o
In file included from ./include/linux/stackprotector.h:10:0,
                  from arch/powerpc/kernel/smp.c:35:
./arch/powerpc/include/asm/stackprotector.h: In function 
‘boot_init_stack_canary’:
./arch/powerpc/include/asm/stackprotector.h:29:30: error: expected 
expression before ‘;’ token
   canary ^= LINUX_VERSION_CODE;
                               ^
scripts/Makefile.build:271: recipe for target 
'arch/powerpc/kernel/smp.o' failed
make[2]: *** [arch/powerpc/kernel/smp.o] Error 1

----

drivers/media/cec/core/cec-api.c: In function ‘cec_adap_g_caps’:
drivers/media/cec/core/cec-api.c:85:35: error: expected expression 
before ‘;’ token
   caps.version = LINUX_VERSION_CODE;

----

I have found the bad commit. It's "Merge tag 'kbuild-v5.12' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild" [1]

The changes in the Makefile (a/Makefile) are responsible for the 
compiling errors. [2]

I was able to revert this bad commit. After that it compiled without any 
problems.

Could you please compile the latest Git kernel and confirm this issue?

Thanks,
Christian

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6fbd6cf85a3be127454a1ad58525a3adcf8612ab
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/Makefile?id=6fbd6cf85a3be127454a1ad58525a3adcf8612ab

^ permalink raw reply

* Re: [PATCH V2 1/2] powerpc/perf: Infrastructure to support checking of attr.config*
From: Paul A. Clarke @ 2021-02-26 14:03 UTC (permalink / raw)
  To: Madhavan Srinivasan; +Cc: linuxppc-dev
In-Reply-To: <20210226065025.1254973-1-maddy@linux.ibm.com>

Another drive-by review... just some minor nits, below...

On Fri, Feb 26, 2021 at 12:20:24PM +0530, Madhavan Srinivasan wrote:
> Introduce code to support the checking of attr.config* for
> values which are reserved for a given platform.
> Performance Monitoring Unit (PMU) configuration registers
> have fields that are reserved and specific value to bit field

I'd reword to "some specific values for bit fields are reserved".

> as reserved. For ex., MMCRA[61:62] is Randome Sampling Mode (SM)

s/Randome/Random/
This occurs here, and below, and in patch 2/2.

> and value of 0b11 to this field is reserved.

s/to/for/

> Writing a non-zero values in these fields or writing invalid
> value to bit fields will have unknown behaviours.

Suggest: Writing non-zero or invalid values in these fields
will have unknown behaviors. (or "behaviours" ;-)

PC

> Patch adds a generic call-back function "check_attr_config"
> in "struct power_pmu", to be called in event_init to
> check for attr.config* values for a given platform.
> "check_attr_config" is valid only for raw event type.
> 
> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> ---
> Changelog v1:
> -Fixed commit message and in-code comments
> 
>  arch/powerpc/include/asm/perf_event_server.h |  6 ++++++
>  arch/powerpc/perf/core-book3s.c              | 14 ++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index 00e7e671bb4b..dde97d7d9253 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -67,6 +67,12 @@ struct power_pmu {
>  	 * the pmu supports extended perf regs capability
>  	 */
>  	int		capabilities;
> +	/*
> +	 * Function to check event code for values which are
> +	 * reserved. Function takes struct perf_event as input,
> +	 * since event code could be spread in attr.config*
> +	 */
> +	int		(*check_attr_config)(struct perf_event *ev);
>  };
> 
>  /*
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 6817331e22ff..c6eeb4fdc5fd 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1958,6 +1958,20 @@ static int power_pmu_event_init(struct perf_event *event)
> 
>  		if (ppmu->blacklist_ev && is_event_blacklisted(ev))
>  			return -EINVAL;
> +		/*
> +		 * PMU config registers have fields that are
> +		 * reserved and specific value to bit field as reserved.
> +		 * For ex., MMCRA[61:62] is Randome Sampling Mode (SM)
> +		 * and value of 0b11 to this field is reserved.
> +		 *
> +		 * This check is needed only for raw event type,
> +		 * since tools like fuzzer use raw event type to
> +		 * provide randomized event code values for test.
> +		 *
> +		 */
> +		if (ppmu->check_attr_config &&
> +		    ppmu->check_attr_config(event))
> +			return -EINVAL;
>  		break;
>  	default:
>  		return -ENOENT;
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: Freeing page tables through RCU
From: Jason Gunthorpe @ 2021-02-26 14:42 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linuxppc-dev, linux-kernel, linux-s390
In-Reply-To: <20210225205820.GC2858050@casper.infradead.org>

On Thu, Feb 25, 2021 at 08:58:20PM +0000, Matthew Wilcox wrote:

> I'd like to hear better ideas than this.

You didn't like my suggestion to put a sleepable lock around the
freeing of page tables during flushing?

I still don't see how you convert the sleepable page walkers to use
rcu??

Jason

^ permalink raw reply

* Re: [PATCH v2 13/37] KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together
From: Fabiano Rosas @ 2021-02-26 15:56 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210225134652.2127648-14-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> Switching the MMU from radix<->radix mode is tricky particularly as the
> MMU can remain enabled and requires a certain sequence of SPR updates.
> Move these together into their own functions.
>
> This also includes the radix TLB check / flush because it's tied in to
> MMU switching due to tlbiel getting LPID from LPIDR.
>
> (XXX: isync / hwsync synchronisation TBD)

I see bot mtlpidr and mtlpcr requiring a CSI in the ISA. Do you say we
might need more than an isync?

Regardless, I'd expect that to be separate from the refactoring here,
so:

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>

>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 55 +++++++++++++++++++++---------------
>  1 file changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 895090636295..23d6dc04b0e9 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3440,12 +3440,38 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	trace_kvmppc_run_core(vc, 1);
>  }
>
> +static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
> +{
> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> +	struct kvm_nested_guest *nested = vcpu->arch.nested;
> +	u32 lpid;
> +
> +	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> +
> +	mtspr(SPRN_LPID, lpid);
> +	mtspr(SPRN_LPCR, lpcr);
> +	mtspr(SPRN_PID, vcpu->arch.pid);
> +	isync();
> +
> +	/* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
> +	kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
> +}
> +
> +static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
> +{
> +	mtspr(SPRN_PID, pid);
> +	mtspr(SPRN_LPID, kvm->arch.host_lpid);
> +	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
> +	isync();
> +}
> +
>  /*
>   * Load up hypervisor-mode registers on P9.
>   */
>  static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  				     unsigned long lpcr)
>  {
> +	struct kvm *kvm = vcpu->kvm;
>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>  	s64 hdec;
>  	u64 tb, purr, spurr;
> @@ -3468,12 +3494,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
>  	 * so set HDICE before writing HDEC.
>  	 */
> -	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr | LPCR_HDICE);
> +	mtspr(SPRN_LPCR, kvm->arch.host_lpcr | LPCR_HDICE);
>  	isync();
>
>  	hdec = time_limit - mftb();
>  	if (hdec < 0) {
> -		mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
> +		mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
>  		isync();
>  		return BOOK3S_INTERRUPT_HV_DECREMENTER;
>  	}
> @@ -3508,7 +3534,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  	}
>  	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
>  	mtspr(SPRN_IC, vcpu->arch.ic);
> -	mtspr(SPRN_PID, vcpu->arch.pid);
>
>  	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
>  	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> @@ -3522,8 +3547,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	mtspr(SPRN_AMOR, ~0UL);
>
> -	mtspr(SPRN_LPCR, lpcr);
> -	isync();
> +	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
>
>  	kvmppc_xive_push_vcpu(vcpu);
>
> @@ -3562,7 +3586,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  		mtspr(SPRN_DAWR1, host_dawr1);
>  		mtspr(SPRN_DAWRX1, host_dawrx1);
>  	}
> -	mtspr(SPRN_PID, host_pidr);
>
>  	/*
>  	 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
> @@ -3577,9 +3600,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  	if (cpu_has_feature(CPU_FTR_ARCH_31))
>  		asm volatile(PPC_CP_ABORT);
>
> -	mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);	/* restore host LPID */
> -	isync();
> -
>  	vc->dpdes = mfspr(SPRN_DPDES);
>  	vc->vtb = mfspr(SPRN_VTB);
>  	mtspr(SPRN_DPDES, 0);
> @@ -3596,7 +3616,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  	}
>
>  	mtspr(SPRN_HDEC, 0x7fffffff);
> -	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
> +
> +	switch_mmu_to_host_radix(kvm, host_pidr);
>
>  	return trap;
>  }
> @@ -4130,7 +4151,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  {
>  	struct kvm_run *run = vcpu->run;
>  	int trap, r, pcpu;
> -	int srcu_idx, lpid;
> +	int srcu_idx;
>  	struct kvmppc_vcore *vc;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_nested_guest *nested = vcpu->arch.nested;
> @@ -4204,13 +4225,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  	vc->vcore_state = VCORE_RUNNING;
>  	trace_kvmppc_run_core(vc, 0);
>
> -	if (cpu_has_feature(CPU_FTR_HVMODE)) {
> -		lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> -		mtspr(SPRN_LPID, lpid);
> -		isync();
> -		kvmppc_check_need_tlb_flush(kvm, pcpu, nested);
> -	}
> -
>  	guest_enter_irqoff();
>
>  	srcu_idx = srcu_read_lock(&kvm->srcu);
> @@ -4229,11 +4243,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>
> -	if (cpu_has_feature(CPU_FTR_HVMODE)) {
> -		mtspr(SPRN_LPID, kvm->arch.host_lpid);
> -		isync();
> -	}
> -
>  	set_irq_happened(trap);
>
>  	kvmppc_set_host_core(pcpu);

^ permalink raw reply

* Re: Freeing page tables through RCU
From: Matthew Wilcox @ 2021-02-26 16:03 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linuxppc-dev, linux-kernel, linux-s390
In-Reply-To: <20210226144200.GV2643399@ziepe.ca>

On Fri, Feb 26, 2021 at 10:42:00AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 25, 2021 at 08:58:20PM +0000, Matthew Wilcox wrote:
> 
> > I'd like to hear better ideas than this.
> 
> You didn't like my suggestion to put a sleepable lock around the
> freeing of page tables during flushing?
> 
> I still don't see how you convert the sleepable page walkers to use
> rcu??

I don't want to convert the sleepable ones to use RCU ... I want to
convert the non-sleeping ones to use RCU.  A page_table_free_lock might
work, but it might have its own problems later (eg a sleeping lock can't
be acquired under RCU or spinlock, and it can't be a spinlock because
it'd have to be held while we wait for IPIs).

I think it would solve my immediate problem, and I wonder if it might
solve some other problems ...

^ permalink raw reply

* Re: Latest Git kernel doesn't compile because of the LINUX_VERSION_CODE issue
From: Christophe Leroy @ 2021-02-26 16:10 UTC (permalink / raw)
  To: Christian Zigotzky, Michael Ellerman, linuxppc-dev
  Cc: Darren Stevens, R.T.Dickinson
In-Reply-To: <99f6d05a-d431-7444-bb0a-180c042c2afd@xenosoft.de>



Le 26/02/2021 à 13:34, Christian Zigotzky a écrit :
> Hello,
> 
> I tried to compile the latest Git kernel today. Unfortunately it doesn't compile.

I have no such problem with latest git kernel.

Christophe

> 
> Error messages:
> 
>    CC      arch/powerpc/kernel/udbg_16550.o
> In file included from ./include/linux/stackprotector.h:10:0,
>                   from arch/powerpc/kernel/smp.c:35:
> ./arch/powerpc/include/asm/stackprotector.h: In function ‘boot_init_stack_canary’:
> ./arch/powerpc/include/asm/stackprotector.h:29:30: error: expected expression before ‘;’ token
>    canary ^= LINUX_VERSION_CODE;
>                                ^
> scripts/Makefile.build:271: recipe for target 'arch/powerpc/kernel/smp.o' failed
> make[2]: *** [arch/powerpc/kernel/smp.o] Error 1
> 
> ----
> 
> drivers/media/cec/core/cec-api.c: In function ‘cec_adap_g_caps’:
> drivers/media/cec/core/cec-api.c:85:35: error: expected expression before ‘;’ token
>    caps.version = LINUX_VERSION_CODE;
> 
> ----
> 
> I have found the bad commit. It's "Merge tag 'kbuild-v5.12' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild" [1]
> 
> The changes in the Makefile (a/Makefile) are responsible for the compiling errors. [2]
> 
> I was able to revert this bad commit. After that it compiled without any problems.
> 
> Could you please compile the latest Git kernel and confirm this issue?
> 
> Thanks,
> Christian
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6fbd6cf85a3be127454a1ad58525a3adcf8612ab 
> 
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/Makefile?id=6fbd6cf85a3be127454a1ad58525a3adcf8612ab 
> 

^ permalink raw reply

* Re: Freeing page tables through RCU
From: Gerald Schaefer @ 2021-02-26 16:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-s390, Janosch Frank, Christian Borntraeger, Heiko Carstens,
	linux-kernel, linux-mm, Alexander Gordeev, Claudio Imbrenda,
	linuxppc-dev
In-Reply-To: <20210225205820.GC2858050@casper.infradead.org>

On Thu, 25 Feb 2021 20:58:20 +0000
Matthew Wilcox <willy@infradead.org> wrote:

> In order to walk the page tables without the mmap semaphore, it must
> be possible to prevent them from being freed and reused (eg if munmap()
> races with viewing /proc/$pid/smaps).
> 
> There is various commentary within the mm on how to prevent this.  One way
> is to disable interrupts, relying on that to block rcu_sched or IPIs.
> I don't think the RT people are terribly happy about reading a proc file
> disabling interrupts, and it doesn't work for architectures that free
> page tables directly instead of batching them into an rcu_sched (because
> the IPI may not be sent to this CPU if the task has never run on it).
> 
> See "Fast GUP" in mm/gup.c
> 
> Ideally, I'd like rcu_read_lock() to delay page table reuse.  This is
> close to trivial for architectures which use entire pages or multiple
> pages for levels of their page tables as we can use the rcu_head embedded
> in struct page to queue the page for RCU.
> 
> s390 and powerpc are the only two architectures I know of that have
> levels of their page table that are smaller than their PAGE_SIZE.
> I'd like to discuss options.  There may be a complicated scheme that
> allows partial pages to be freed via RCU, but I have something simpler
> in mind.  For powerpc in particular, it can have a PAGE_SIZE of 64kB
> and then the MMU wants to see 4kB entries in the PMD.  I suggest that
> instead of allocating each 4kB entry individually, we allocate a 64kB
> page and fill in 16 consecutive PMDs.  This could cost a bit more memory
> (although if you've asked for a CONFIG_PAGE_SIZE of 64kB, you presumably
> don't care too much about it), but it'll make future page faults cheaper
> (as the PMDs will already be present, assuming you have good locality
> of reference).
> 
> I'd like to hear better ideas than this.

Some background on the situation for s390: The architecture defines
an 8 bit pagetable index, so we have 256 entries in a 2 KB pagetable,
but PAGE_SIZE is 4 KB. pte_alloc(_one) will use alloc_page() to allocate
a full 4 KB page, and then do some housekeeping to maintain a per-mm list
of such 4 KB pages, which will contain either one or two 2 KB pagetable
fragments.

This is also the reason why pgtable_t on s390 is not pointing to the
struct page of the (4 KB) page containing a 2 KB pagetable fragment, but
rather to the 2 KB pagetable itself.

I see at least two issues here, with using rcu_head embedded in the
struct page (for a 4 KB page):

1) There might be two 2 KB pagetables present in that 4 KB page,
and the rcu_head would affect both. Not sure if this would really be
a problem, because we already have a similar situation with the split
ptlock embedded in struct page, which also might lock two 2 KB pagetables,
i.e. more than necessary. It still is far less "over-locking" than
using mm->page_table_lock, and the move_pte() code e.g. takes care
to avoid a deadlock if src and dst ptlocks happen to be on the
same page.

So, a similar "over-locking" might also be possible and acceptable
for the rcu_head approach, but I do not really understand if that could
have some deadlock or other unwanted side-effects.

2) The "housekeeping" of our 2 KB pagetable fragments uses page->lru
to maintain the per-mm list. It also (mis)uses page->_refcount to mark
which 2 KB half is used/free, but that should not be an issue I guess.
Using page->lru will be an issue though. IIUC, then page->rcu_head will
overlay with page->lru, so using page->rcu_head for pagetable pages
on s390 would conflict with our page->lru usage for such pagetable pages.

I do not really see how that could be fixed, maybe we could find and
re-use other struct page members for our 2 KB fragment list. Also, for
kvm, there seem to be even more users of page->lru for pagetable pages,
in arch/s390/mm/gmap.c. Not sure though if those would actually also
affect "regular" pagetable walks, or if they are somehow independent.
But if we'd find some new list home for the 2 KB fragments, then that
could probably also be used for the gmap stuff.

^ permalink raw reply

* Re: Freeing page tables through RCU
From: Jason Gunthorpe @ 2021-02-26 16:21 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linuxppc-dev, linux-kernel, linux-s390
In-Reply-To: <20210226160354.GE2723601@casper.infradead.org>

On Fri, Feb 26, 2021 at 04:03:54PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 26, 2021 at 10:42:00AM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 25, 2021 at 08:58:20PM +0000, Matthew Wilcox wrote:
> > 
> > > I'd like to hear better ideas than this.
> > 
> > You didn't like my suggestion to put a sleepable lock around the
> > freeing of page tables during flushing?
> > 
> > I still don't see how you convert the sleepable page walkers to use
> > rcu??
> 
> I don't want to convert the sleepable ones to use RCU ... I want to
> convert the non-sleeping ones to use RCU.  

Why? Convert them to use the normal page table spinlocks?

It makes everything much more understandable if it is locked properly.

The lockless walks should be reserved for special places because they
are actually hard to do properly.

> A page_table_free_lock might work, but it might have its own
> problems later (eg a sleeping lock can't be acquired under RCU or
> spinlock, and it can't be a spinlock because it'd have to be held
> while we wait for IPIs).

The mmap_sem today is serving the function of the page_table_free_lock
idea, so I think a sleepable lock is already proven to work?

Jason

^ permalink raw reply

* Re: [PATCH v4 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset
From: Brian King @ 2021-02-26 16:39 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20210225215057.23020-3-tyreld@linux.ibm.com>

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v2 17/37] KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest LPCR
From: Fabiano Rosas @ 2021-02-26 16:38 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210225134652.2127648-18-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> LPCR[HDICE]=0 suppresses hypervisor decrementer exceptions on some
> processors, so it must be enabled before HDEC is set.
>
> Rather than set it in the host LPCR then setting HDEC, move the HDEC
> update to after the guest MMU context (including LPCR) is loaded.
> There shouldn't be much concern with delaying HDEC by some 10s or 100s
> of nanoseconds by setting it a bit later.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d4770b222d7e..63cc92c45c5d 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3490,23 +3490,13 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>  		host_dawrx1 = mfspr(SPRN_DAWRX1);
>  	}
>
> -	/*
> -	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
> -	 * so set HDICE before writing HDEC.
> -	 */
> -	mtspr(SPRN_LPCR, kvm->arch.host_lpcr | LPCR_HDICE);
> -	isync();
> -
> -	hdec = time_limit - mftb();

Would it be possible to leave the mftb() in this patch and then replace
them all at once in patch 20/37 - "KVM: PPC: Book3S HV P9: Reduce mftb
per guest entry/exit"?

> -	if (hdec < 0) {
> -		mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
> -		isync();
> +	tb = mftb();
> +	hdec = time_limit - tb;
> +	if (hdec < 0)
>  		return BOOK3S_INTERRUPT_HV_DECREMENTER;
> -	}
> -	mtspr(SPRN_HDEC, hdec);
>
>  	if (vc->tb_offset) {
> -		u64 new_tb = mftb() + vc->tb_offset;
> +		u64 new_tb = tb + vc->tb_offset;
>  		mtspr(SPRN_TBU40, new_tb);
>  		tb = mftb();
>  		if ((tb & 0xffffff) < (new_tb & 0xffffff))
> @@ -3549,6 +3539,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
>
> +	/*
> +	 * P9 suppresses the HDEC exception when LPCR[HDICE] = 0,
> +	 * so set guest LPCR (with HDICE) before writing HDEC.
> +	 */
> +	mtspr(SPRN_HDEC, hdec);
> +
>  	mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0);
>  	mtspr(SPRN_SRR1, vcpu->arch.shregs.srr1);

^ permalink raw reply

* Re: [PATCH v4 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM
From: Brian King @ 2021-02-26 16:43 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20210225215057.23020-6-tyreld@linux.ibm.com>

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v2 23/37] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
From: Fabiano Rosas @ 2021-02-26 22:37 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210225134652.2127648-24-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

Hi, thanks for this. It helped me clarify a bunch of details that I
haven't understood while reading the asm code.

Some comments below:

> Almost all logic is moved to C, by introducing a new in_guest mode that
> selects and branches very early in the interrupt handler to the P9 exit
> code.
>
> The remaining assembly is only about 160 lines of low level stack setup,
> with VCPU vs host register save and restore, plus a small shim to the
> legacy paths in the interrupt handler.
>
> There are two motivations for this, the first is just make the code more
> maintainable being in C. The second is to reduce the amount of code
> running in a special KVM mode, "realmode". I put that in quotes because
> with radix it is no longer necessarily real-mode in the MMU, but it
> still has to be treated specially because it may be in real-mode, and
> has various important registers like PID, DEC, TB, etc set to guest.
> This is hostile to the rest of Linux and can't use arbitrary kernel
> functionality or be instrumented well.
>
> This initial patch is a reasonably faithful conversion of the asm code.
> It does lack any loop to return quickly back into the guest without
> switching out of realmode in the case of unimportant or easily handled
> interrupts, as explained in the previous change, handling HV interrupts
> in real mode is not so important for P9.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---

<snip>

> diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
> index f826c8dc2e19..cc7b76865a16 100644
> --- a/arch/powerpc/kvm/book3s_64_entry.S
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -1,10 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
>  #include <asm/asm-offsets.h>
>  #include <asm/cache.h>
> +#include <asm/code-patching-asm.h>
>  #include <asm/exception-64s.h>
> +#include <asm/export.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_book3s_asm.h>
>  #include <asm/ppc_asm.h>
>  #include <asm/reg.h>
> +#include <asm/ultravisor-api.h>
>
>  /*
>   * These are branched to from interrupt handlers in exception-64s.S which set
> @@ -13,13 +17,24 @@
>  .global	kvmppc_hcall
>  .balign IFETCH_ALIGN_BYTES
>  kvmppc_hcall:
> +	lbz	r10,HSTATE_IN_GUEST(r13)
> +	cmpwi	r10,KVM_GUEST_MODE_GUEST_HV_FAST
> +	beq	kvmppc_p9_exit_hcall
>  	ld	r10,PACA_EXGEN+EX_R13(r13)
>  	SET_SCRATCH0(r10)
>  	li	r10,0xc00
> +	li	r11,PACA_EXGEN
> +	b	1f
>
>  .global	kvmppc_interrupt
>  .balign IFETCH_ALIGN_BYTES
>  kvmppc_interrupt:
> +	std	r10,HSTATE_SCRATCH0(r13)
> +	lbz	r10,HSTATE_IN_GUEST(r13)
> +	cmpwi	r10,KVM_GUEST_MODE_GUEST_HV_FAST
> +	beq	kvmppc_p9_exit_interrupt
> +	ld	r10,HSTATE_SCRATCH0(r13)
> +	lbz	r11,HSTATE_IN_GUEST(r13)
>  	li	r11,PACA_EXGEN
>  	cmpdi	r10,0x200
>  	bgt+	1f
> @@ -114,3 +129,169 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>  	GET_SCRATCH0(r13)
>  	HRFI_TO_KERNEL
>  #endif
> +
> +/* Stack frame offsets for kvmppc_hv_entry */
> +#define SFS			208
> +#define STACK_SLOT_VCPU		(SFS-8)
> +#define STACK_SLOT_NVGPRS	(SFS-152)	/* 18 gprs */
> +
> +/*
> + * void kvmppc_p9_enter_guest(struct vcpu *vcpu);
> + *
> + * Enter the guest on a ISAv3.0 or later system where we have exactly
> + * one vcpu per vcore, and both the host and guest are radix, and threads
> + * are set to "indepdent mode".
> + */
> +.balign	IFETCH_ALIGN_BYTES
> +_GLOBAL(kvmppc_p9_enter_guest)
> +EXPORT_SYMBOL_GPL(kvmppc_p9_enter_guest)
> +	mflr	r0
> +	std	r0,PPC_LR_STKOFF(r1)
> +	stdu	r1,-SFS(r1)
> +
> +	std	r1,HSTATE_HOST_R1(r13)
> +	std	r3,STACK_SLOT_VCPU(r1)

The vcpu was stored on the paca previously. I don't get the change,
could you explain?

> +
> +	mfcr	r4
> +	stw	r4,SFS+8(r1)
> +
> +	reg = 14
> +	.rept	18
> +	std	reg,STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
> +	reg = reg + 1
> +	.endr
> +
> +	ld	r4,VCPU_LR(r3)
> +	mtlr	r4
> +	ld	r4,VCPU_CTR(r3)
> +	mtctr	r4
> +	ld	r4,VCPU_XER(r3)
> +	mtspr	SPRN_XER,r4
> +
> +	ld	r1,VCPU_CR(r3)
> +
> +BEGIN_FTR_SECTION
> +	ld	r4,VCPU_CFAR(r3)
> +	mtspr	SPRN_CFAR,r4
> +END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> +BEGIN_FTR_SECTION
> +	ld	r4,VCPU_PPR(r3)
> +	mtspr	SPRN_PPR,r4
> +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> +
> +	reg = 4
> +	.rept	28
> +	ld	reg,__VCPU_GPR(reg)(r3)
> +	reg = reg + 1
> +	.endr
> +
> +	ld	r4,VCPU_KVM(r3)
> +	lbz	r4,KVM_SECURE_GUEST(r4)
> +	cmpdi	r4,0
> +	ld	r4,VCPU_GPR(R4)(r3)
> +	bne	.Lret_to_ultra
> +
> +	mtcr	r1
> +
> +	ld	r0,VCPU_GPR(R0)(r3)
> +	ld	r1,VCPU_GPR(R1)(r3)
> +	ld	r2,VCPU_GPR(R2)(r3)
> +	ld	r3,VCPU_GPR(R3)(r3)
> +
> +	HRFI_TO_GUEST
> +	b	.
> +
> +	/*
> +	 * Use UV_RETURN ultracall to return control back to the Ultravisor
> +	 * after processing an hypercall or interrupt that was forwarded
> +	 * (a.k.a. reflected) to the Hypervisor.
> +	 *
> +	 * All registers have already been reloaded except the ucall requires:
> +	 *   R0 = hcall result
> +	 *   R2 = SRR1, so UV can detect a synthesized interrupt (if any)
> +	 *   R3 = UV_RETURN
> +	 */
> +.Lret_to_ultra:
> +	mtcr	r1
> +	ld	r1,VCPU_GPR(R1)(r3)
> +
> +	ld	r0,VCPU_GPR(R3)(r3)
> +	mfspr	r2,SPRN_SRR1
> +	LOAD_REG_IMMEDIATE(r3, UV_RETURN)
> +	sc	2
> +
> +/*
> + * kvmppc_p9_exit_hcall and kvmppc_p9_exit_interrupt are branched to from
> + * above if the interrupt was taken for a guest that was entered via
> + * kvmppc_p9_enter_guest().
> + *
> + * This code recovers the host stack and vcpu pointer, saves all GPRs and
> + * CR, LR, CTR, XER as well as guest MSR and NIA into the VCPU, then re-
> + * establishes the host stack and registers to return from  the
> + * kvmppc_p9_enter_guest() function.
> + */
> +.balign	IFETCH_ALIGN_BYTES
> +kvmppc_p9_exit_hcall:
> +	mfspr	r11,SPRN_SRR0
> +	mfspr	r12,SPRN_SRR1
> +	li	r10,0xc00
> +	std	r10,HSTATE_SCRATCH0(r13)
> +
> +.balign	IFETCH_ALIGN_BYTES
> +kvmppc_p9_exit_interrupt:
> +	std     r1,HSTATE_SCRATCH1(r13)
> +	std     r3,HSTATE_SCRATCH2(r13)
> +	ld	r1,HSTATE_HOST_R1(r13)
> +	ld	r3,STACK_SLOT_VCPU(r1)
> +
> +	std	r9,VCPU_CR(r3)
> +
> +1:
> +	std	r11,VCPU_PC(r3)
> +	std	r12,VCPU_MSR(r3)
> +
> +	reg = 14
> +	.rept	18
> +	std	reg,__VCPU_GPR(reg)(r3)
> +	reg = reg + 1
> +	.endr
> +
> +	/* r1, r3, r9-r13 are saved to vcpu by C code */

If we just saved r1 and r3, why don't we put them in the vcpu right now?
That would avoid having the C code knowing about scratch areas.

> +	std	r0,VCPU_GPR(R0)(r3)
> +	std	r2,VCPU_GPR(R2)(r3)
> +	reg = 4
> +	.rept	5
> +	std	reg,__VCPU_GPR(reg)(r3)
> +	reg = reg + 1
> +	.endr
> +
> +	ld	r2,PACATOC(r13)
> +
> +	mflr	r4
> +	std	r4,VCPU_LR(r3)
> +	mfspr	r4,SPRN_XER
> +	std	r4,VCPU_XER(r3)
> +
> +	reg = 14
> +	.rept	18
> +	ld	reg,STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
> +	reg = reg + 1
> +	.endr
> +
> +	lwz	r4,SFS+8(r1)
> +	mtcr	r4
> +
> +	/*
> +	 * Flush the link stack here, before executing the first blr on the
> +	 * way out of the guest.
> +	 *
> +	 * The link stack won't match coming out of the guest anyway so the
> +	 * only cost is the flush itself. The call clobbers r0.
> +	 */
> +1:	nop
> +	patch_site 1b patch__call_kvm_flush_link_stack_2
> +
> +	addi	r1,r1,SFS
> +	ld	r0,PPC_LR_STKOFF(r1)
> +	mtlr	r0
> +	blr
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 1997cf347d3e..28a2761515e3 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1421,6 +1421,8 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>  	 */
>  	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
>  		r = RESUME_PAGE_FAULT;
> +		if (vcpu->arch.fault_dsisr == HDSISR_CANARY)
> +			r = RESUME_GUEST; /* Just retry if it's the canary */
>  		break;
>  	case BOOK3S_INTERRUPT_H_INST_STORAGE:
>  		vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
> @@ -3736,14 +3738,14 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>  		vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
>  		vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
>  		mtspr(SPRN_PSSCR_PR, host_psscr);
> -
>  		/* H_CEDE has to be handled now, not later */
> -		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
> +		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
>  		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
>  			kvmppc_cede(vcpu);
>  			kvmppc_set_gpr(vcpu, 3, 0);
>  			trap = 0;
>  		}
> +
>  	} else {
>  		kvmppc_xive_push_vcpu(vcpu);
>  		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
> @@ -3768,9 +3770,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>  			}
>  		}
>  		kvmppc_xive_pull_vcpu(vcpu);
> +
> +		vcpu->arch.slb_max = 0;
>  	}
>
> -	vcpu->arch.slb_max = 0;
>  	dec = mfspr(SPRN_DEC);
>  	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
>  		dec = (s32) dec;
> @@ -4429,11 +4432,19 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>  		else
>  			r = kvmppc_run_vcpu(vcpu);
>
> -		if (run->exit_reason == KVM_EXIT_PAPR_HCALL &&
> -		    !(vcpu->arch.shregs.msr & MSR_PR)) {
> -			trace_kvm_hcall_enter(vcpu);
> -			r = kvmppc_pseries_do_hcall(vcpu);
> -			trace_kvm_hcall_exit(vcpu, r);
> +		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
> +			if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
> +				/*
> +				 * Guest userspace executed sc 1, reflect it
> +				 * back as a privileged program check interrupt.
> +				 */
> +				kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
> +				r = RESUME_GUEST;

This is in conflict with this snippet in kvmppc_handle_exit_hv:

	case BOOK3S_INTERRUPT_SYSCALL:
	{
		/* hcall - punt to userspace */
		int i;

		/* hypercall with MSR_PR has already been handled in rmode,
		 * and never reaches here.
		 */

That function already queues some 0x700s so maybe we could move this one
in there as well.

> +			} else {
> +				trace_kvm_hcall_enter(vcpu);
> +				r = kvmppc_pseries_do_hcall(vcpu);
> +				trace_kvm_hcall_exit(vcpu, r);
> +			}
>  			kvmppc_core_prepare_to_enter(vcpu);
>  		} else if (r == RESUME_PAGE_FAULT) {
>  			srcu_idx = srcu_read_lock(&kvm->srcu);
> diff --git a/arch/powerpc/kvm/book3s_hv_interrupt.c b/arch/powerpc/kvm/book3s_hv_interrupt.c
> new file mode 100644
> index 000000000000..5a7b036c447f
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_hv_interrupt.c
> @@ -0,0 +1,221 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kernel.h>
> +#include <linux/kvm_host.h>
> +#include <asm/asm-prototypes.h>
> +#include <asm/dbell.h>
> +#include <asm/kvm_ppc.h>
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
> +static void __start_timing(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
> +{
> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> +	u64 tb = mftb() - vc->tb_offset_applied;
> +
> +	vcpu->arch.cur_activity = next;
> +	vcpu->arch.cur_tb_start = tb;
> +}
> +
> +static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
> +{
> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> +	struct kvmhv_tb_accumulator *curr;
> +	u64 tb = mftb() - vc->tb_offset_applied;
> +	u64 prev_tb;
> +	u64 delta;
> +	u64 seq;
> +
> +	curr = vcpu->arch.cur_activity;
> +	vcpu->arch.cur_activity = next;
> +	prev_tb = vcpu->arch.cur_tb_start;
> +	vcpu->arch.cur_tb_start = tb;
> +
> +	if (!curr)
> +		return;
> +
> +	delta = tb - prev_tb;
> +
> +	seq = curr->seqcount;
> +	curr->seqcount = seq + 1;
> +	smp_wmb();
> +	curr->tb_total += delta;
> +	if (seq == 0 || delta < curr->tb_min)
> +		curr->tb_min = delta;
> +	if (delta > curr->tb_max)
> +		curr->tb_max = delta;
> +	smp_wmb();
> +	curr->seqcount = seq + 2;
> +}
> +
> +#define start_timing(vcpu, next) __start_timing(vcpu, next)
> +#define end_timing(vcpu) __start_timing(vcpu, NULL)
> +#define accumulate_time(vcpu, next) __accumulate_time(vcpu, next)
> +#else
> +#define start_timing(vcpu, next) do {} while (0)
> +#define end_timing(vcpu) do {} while (0)
> +#define accumulate_time(vcpu, next) do {} while (0)
> +#endif
> +
> +static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
> +{
> +	asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
> +	asm volatile("slbmfee  %0,%1" : "=r" (*slbee) : "r" (idx));
> +}
> +
> +static inline void mtslb(unsigned int idx, u64 slbee, u64 slbev)
> +{
> +	BUG_ON((slbee & 0xfff) != idx);
> +
> +	asm volatile("slbmte %0,%1" :: "r" (slbev), "r" (slbee));
> +}
> +
> +static inline void slb_invalidate(unsigned int ih)
> +{
> +	asm volatile("slbia %0" :: "i"(ih));
> +}
> +
> +/*
> + * Malicious or buggy radix guests may have inserted SLB entries
> + * (only 0..3 because radix always runs with UPRT=1), so these must
> + * be cleared here to avoid side-channels. slbmte is used rather
> + * than slbia, as it won't clear cached translations.
> + */
> +static void radix_clear_slb(void)
> +{
> +	u64 slbee, slbev;
> +	int i;
> +
> +	for (i = 0; i < 4; i++) {
> +		mfslb(i, &slbee, &slbev);
> +		if (unlikely(slbee || slbev)) {
> +			slbee = i;
> +			slbev = 0;
> +			mtslb(i, slbee, slbev);
> +		}
> +	}
> +}
> +
> +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
> +{
> +	u64 *exsave;
> +	unsigned long msr = mfmsr();
> +	int trap;
> +
> +	start_timing(vcpu, &vcpu->arch.rm_entry);
> +
> +	vcpu->arch.ceded = 0;
> +
> +	WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_HV);
> +	WARN_ON_ONCE(!(vcpu->arch.shregs.msr & MSR_ME));
> +
> +	mtspr(SPRN_HSRR0, vcpu->arch.regs.nip);
> +	mtspr(SPRN_HSRR1, (vcpu->arch.shregs.msr & ~MSR_HV) | MSR_ME);
> +
> +	/*
> +	 * On POWER9 DD2.1 and below, sometimes on a Hypervisor Data Storage
> +	 * Interrupt (HDSI) the HDSISR is not be updated at all.

s/be //

> +	 *
> +	 * To work around this we put a canary value into the HDSISR before
> +	 * returning to a guest and then check for this canary when we take a
> +	 * HDSI. If we find the canary on a HDSI, we know the hardware didn't
> +	 * update the HDSISR. In this case we return to the guest to retake the
> +	 * HDSI which should correctly update the HDSISR the second time HDSI
> +	 * entry.
> +	 *
> +	 * Just do this on all p9 processors for now.
> +	 */
> +	mtspr(SPRN_HDSISR, HDSISR_CANARY);
> +
> +	accumulate_time(vcpu, &vcpu->arch.guest_time);
> +
> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
> +	kvmppc_p9_enter_guest(vcpu);
> +	// Radix host and guest means host never runs with guest MMU state
> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
> +
> +	accumulate_time(vcpu, &vcpu->arch.rm_intr);
> +
> +	/* Get these from r11/12 and paca exsave */
> +	vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
> +	vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
> +	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
> +	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
> +
> +	trap = local_paca->kvm_hstate.scratch0 & ~0x2;

Couldn't we return the trap from kvmppc_p9_enter_guest? Seems like a
nice pattern to have and avoids exposing the IVEC+0x2 magic which is hidden
in asm already.

> +	if (likely(trap > BOOK3S_INTERRUPT_MACHINE_CHECK)) {
> +		exsave = local_paca->exgen;
> +	} else if (trap == BOOK3S_INTERRUPT_SYSTEM_RESET) {
> +		exsave = local_paca->exnmi;
> +	} else { /* trap == 0x200 */
> +		exsave = local_paca->exmc;
> +	}
> +
> +	vcpu->arch.regs.gpr[1] = local_paca->kvm_hstate.scratch1;
> +	vcpu->arch.regs.gpr[3] = local_paca->kvm_hstate.scratch2;
> +	vcpu->arch.regs.gpr[9] = exsave[EX_R9/sizeof(u64)];
> +	vcpu->arch.regs.gpr[10] = exsave[EX_R10/sizeof(u64)];
> +	vcpu->arch.regs.gpr[11] = exsave[EX_R11/sizeof(u64)];
> +	vcpu->arch.regs.gpr[12] = exsave[EX_R12/sizeof(u64)];
> +	vcpu->arch.regs.gpr[13] = exsave[EX_R13/sizeof(u64)];
> +	vcpu->arch.ppr = exsave[EX_PPR/sizeof(u64)];
> +	vcpu->arch.cfar = exsave[EX_CFAR/sizeof(u64)];
> +	vcpu->arch.regs.ctr = exsave[EX_CTR/sizeof(u64)];
> +
> +	vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
> +
> +	if (unlikely(trap == BOOK3S_INTERRUPT_MACHINE_CHECK)) {
> +		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
> +		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
> +		kvmppc_realmode_machine_check(vcpu);
> +
> +	} else if (unlikely(trap == BOOK3S_INTERRUPT_HMI)) {
> +		kvmppc_realmode_hmi_handler();
> +
> +	} else if (trap == BOOK3S_INTERRUPT_H_EMUL_ASSIST) {
> +		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
> +
> +	} else if (trap == BOOK3S_INTERRUPT_H_DATA_STORAGE) {
> +		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
> +		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
> +		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
> +
> +	} else if (trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
> +		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
> +
> +	} else if (trap == BOOK3S_INTERRUPT_H_FAC_UNAVAIL) {
> +		vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	/*
> +	 * Softpatch interrupt for transactional memory emulation cases
> +	 * on POWER9 DD2.2.  This is early in the guest exit path - we
> +	 * haven't saved registers or done a treclaim yet.
> +	 */
> +	} else if (trap == BOOK3S_INTERRUPT_HV_SOFTPATCH) {
> +		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
> +
> +		/*
> +		 * The cases we want to handle here are those where the guest
> +		 * is in real suspend mode and is trying to transition to
> +		 * transactional mode.
> +		 */
> +		if (local_paca->kvm_hstate.fake_suspend &&
> +				(vcpu->arch.shregs.msr & MSR_TS_S)) {
> +			if (kvmhv_p9_tm_emulation_early(vcpu)) {
> +				/* Prevent it being handled again. */
> +				trap = 0;
> +			}
> +		}
> +#endif
> +	}
> +
> +	radix_clear_slb();
> +
> +	__mtmsrd(msr, 0);

Isn't this the same as mtmsr(msr) ?

> +
> +	accumulate_time(vcpu, &vcpu->arch.rm_exit);
> +
> +	end_timing(vcpu);
> +
> +	return trap;
> +}
> +EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9);

<snip>

^ permalink raw reply

* Re: [PATCH v2 01/37] KVM: PPC: Book3S 64: remove unused kvmppc_h_protect argument
From: Nicholas Piggin @ 2021-02-26 23:50 UTC (permalink / raw)
  To: Daniel Axtens, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <878s7ba0cm.fsf@linkitivity.dja.id.au>

Excerpts from Daniel Axtens's message of February 26, 2021 3:01 pm:
> Hi Nick,
> 
>> The va argument is not used in the function or set by its asm caller,
>> so remove it to be safe.
> 
> Huh, so it isn't. I tracked the original implementation down to commit
> a8606e20e41a ("KVM: PPC: Handle some PAPR hcalls in the kernel") where
> paulus first added the ability to handle it in the kernel - there it
> takes a va argument but even then doesn't do anything with it.
> 
> ajd also pointed out that we don't pass a va when linux is running as a
> guest, and LoPAR does not mention va as an argument.

Yeah interesting, maybe it was from a pre-release version of PAPR? Who
knows.

> One small nit: checkpatch is complaining about spaces vs tabs:
> ERROR: code indent should use tabs where possible
> #25: FILE: arch/powerpc/include/asm/kvm_ppc.h:770:
> +                      unsigned long pte_index, unsigned long avpn);$
> 
> WARNING: please, no spaces at the start of a line
> #25: FILE: arch/powerpc/include/asm/kvm_ppc.h:770:
> +                      unsigned long pte_index, unsigned long avpn);$

All the declarations are using the same style in this file so I think
I'll leave it for someone to do a cleanup patch on. Okay?

> 
> Once that is resolved,
>   Reviewed-by: Daniel Axtens <dja@axtens.net>

Thanks,
Nick

> 
> Kind regards,
> Daniel Axtens
> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h  | 3 +--
>>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 3 +--
>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 8aacd76bb702..9531b1c1b190 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -767,8 +767,7 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
>>                       unsigned long pte_index, unsigned long avpn);
>>  long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu);
>>  long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>> -                      unsigned long pte_index, unsigned long avpn,
>> -                      unsigned long va);
>> +                      unsigned long pte_index, unsigned long avpn);
>>  long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
>>                     unsigned long pte_index);
>>  long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
>> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> index 88da2764c1bb..7af7c70f1468 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> @@ -673,8 +673,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
>>  }
>>  
>>  long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
>> -		      unsigned long pte_index, unsigned long avpn,
>> -		      unsigned long va)
>> +		      unsigned long pte_index, unsigned long avpn)
>>  {
>>  	struct kvm *kvm = vcpu->kvm;
>>  	__be64 *hpte;
>> -- 
>> 2.23.0
> 

^ permalink raw reply

* Re: [PATCH v2 04/37] powerpc/64s: remove KVM SKIP test from instruction breakpoint handler
From: Nicholas Piggin @ 2021-02-26 23:51 UTC (permalink / raw)
  To: Daniel Axtens, kvm-ppc; +Cc: linuxppc-dev, Fabiano Rosas
In-Reply-To: <8735xj9yd4.fsf@linkitivity.dja.id.au>

Excerpts from Daniel Axtens's message of February 26, 2021 3:44 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> The code being executed in KVM_GUEST_MODE_SKIP is hypervisor code with
>> MSR[IR]=0, so the faults of concern are the d-side ones caused by access
>> to guest context by the hypervisor.
>>
>> Instruction breakpoint interrupts are not a concern here. It's unlikely
>> any good would come of causing breaks in this code, but skipping the
>> instruction that caused it won't help matters (e.g., skip the mtmsr that
>> sets MSR[DR]=0 or clears KVM_GUEST_MODE_SKIP).
> 
> I'm not entirely clear on the example here, but the patch makes sense
> and I can follow your logic for removing the IKVM_SKIP handler from the
> instruction breakpoint exception.

The example just means that a breakpoint interrupt on any instruction 
inside the guest mode skip region would be skipped, and skipping one of 
those (mtmsrd or store that gets us out of guest mode skip) would cause 
a crash.

Thanks,
Nick

> 
> On that basis:
> Reviewed-by: Daniel Axtens <dja@axtens.net>
> 
> Kind regards,
> Daniel
> 
>>
>> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/powerpc/kernel/exceptions-64s.S | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index a027600beeb1..0097e0676ed7 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -2553,7 +2553,6 @@ EXC_VIRT_NONE(0x5200, 0x100)
>>  INT_DEFINE_BEGIN(instruction_breakpoint)
>>  	IVEC=0x1300
>>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>> -	IKVM_SKIP=1
>>  	IKVM_REAL=1
>>  #endif
>>  INT_DEFINE_END(instruction_breakpoint)
>> -- 
>> 2.23.0
> 

^ permalink raw reply

* Re: [PATCH v2 05/37] KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
From: Nicholas Piggin @ 2021-02-26 23:55 UTC (permalink / raw)
  To: Daniel Axtens, kvm-ppc; +Cc: linuxppc-dev, Fabiano Rosas
In-Reply-To: <87zgzr8is2.fsf@linkitivity.dja.id.au>

Excerpts from Daniel Axtens's message of February 26, 2021 4:06 pm:
> Hi Nick,
> 
>>  void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
>>  {
>> +	/*
>> +	 * Guest must always run with machine check interrupt
>> +	 * enabled.
>> +	 */
>> +	if (!(msr & MSR_ME))
>> +		msr |= MSR_ME;
> 
> This 'if' is technically redundant but you mention a future patch warning
> on !(msr & MSR_ME) so I'm holding off on any judgement about the 'if' until
> I get to that patch :)

That's true. The warning is actually further down when we're setting up 
the msr to run in guest mode. At this point the MSR I think comes from
qemu (and arguably the guest setup code shouldn't need to know about HV
specific MSR bits) so a warning here wouldn't be appropriate.

I could remove the if, although the compiler might already do that.

> 
> The patch seems sane to me, I agree that we don't want guests running with
> MSR_ME=0 and kvmppc_set_msr_hv already ensures that the transactional state is
> sane so this is another sanity-enforcement in the same sort of vein.
> 
> All up:
> Reviewed-by: Daniel Axtens <dja@axtens.net>

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 13/37] KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together
From: Nicholas Piggin @ 2021-02-26 23:57 UTC (permalink / raw)
  To: Fabiano Rosas, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <875z2eyg9r.fsf@linux.ibm.com>

Excerpts from Fabiano Rosas's message of February 27, 2021 1:56 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> Switching the MMU from radix<->radix mode is tricky particularly as the
>> MMU can remain enabled and requires a certain sequence of SPR updates.
>> Move these together into their own functions.
>>
>> This also includes the radix TLB check / flush because it's tied in to
>> MMU switching due to tlbiel getting LPID from LPIDR.
>>
>> (XXX: isync / hwsync synchronisation TBD)
> 
> I see bot mtlpidr and mtlpcr requiring a CSI in the ISA. Do you say we
> might need more than an isync?

We might need a CSI before it, we might also need a hwsync before it. I
don't know whether we need isyncs between any of them (I don't think we
should because they're all mtsprs).


> 
> Regardless, I'd expect that to be separate from the refactoring here,
> so:
> 
> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>

Thanks,
Nick
> 
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 55 +++++++++++++++++++++---------------
>>  1 file changed, 32 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 895090636295..23d6dc04b0e9 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3440,12 +3440,38 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>>  	trace_kvmppc_run_core(vc, 1);
>>  }
>>
>> +static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
>> +{
>> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>> +	struct kvm_nested_guest *nested = vcpu->arch.nested;
>> +	u32 lpid;
>> +
>> +	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
>> +
>> +	mtspr(SPRN_LPID, lpid);
>> +	mtspr(SPRN_LPCR, lpcr);
>> +	mtspr(SPRN_PID, vcpu->arch.pid);
>> +	isync();
>> +
>> +	/* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
>> +	kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
>> +}
>> +
>> +static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
>> +{
>> +	mtspr(SPRN_PID, pid);
>> +	mtspr(SPRN_LPID, kvm->arch.host_lpid);
>> +	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
>> +	isync();
>> +}
>> +
>>  /*
>>   * Load up hypervisor-mode registers on P9.
>>   */
>>  static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  				     unsigned long lpcr)
>>  {
>> +	struct kvm *kvm = vcpu->kvm;
>>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>>  	s64 hdec;
>>  	u64 tb, purr, spurr;
>> @@ -3468,12 +3494,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
>>  	 * so set HDICE before writing HDEC.
>>  	 */
>> -	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr | LPCR_HDICE);
>> +	mtspr(SPRN_LPCR, kvm->arch.host_lpcr | LPCR_HDICE);
>>  	isync();
>>
>>  	hdec = time_limit - mftb();
>>  	if (hdec < 0) {
>> -		mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
>> +		mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
>>  		isync();
>>  		return BOOK3S_INTERRUPT_HV_DECREMENTER;
>>  	}
>> @@ -3508,7 +3534,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	}
>>  	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
>>  	mtspr(SPRN_IC, vcpu->arch.ic);
>> -	mtspr(SPRN_PID, vcpu->arch.pid);
>>
>>  	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
>>  	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
>> @@ -3522,8 +3547,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>
>>  	mtspr(SPRN_AMOR, ~0UL);
>>
>> -	mtspr(SPRN_LPCR, lpcr);
>> -	isync();
>> +	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
>>
>>  	kvmppc_xive_push_vcpu(vcpu);
>>
>> @@ -3562,7 +3586,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  		mtspr(SPRN_DAWR1, host_dawr1);
>>  		mtspr(SPRN_DAWRX1, host_dawrx1);
>>  	}
>> -	mtspr(SPRN_PID, host_pidr);
>>
>>  	/*
>>  	 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
>> @@ -3577,9 +3600,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	if (cpu_has_feature(CPU_FTR_ARCH_31))
>>  		asm volatile(PPC_CP_ABORT);
>>
>> -	mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);	/* restore host LPID */
>> -	isync();
>> -
>>  	vc->dpdes = mfspr(SPRN_DPDES);
>>  	vc->vtb = mfspr(SPRN_VTB);
>>  	mtspr(SPRN_DPDES, 0);
>> @@ -3596,7 +3616,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	}
>>
>>  	mtspr(SPRN_HDEC, 0x7fffffff);
>> -	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
>> +
>> +	switch_mmu_to_host_radix(kvm, host_pidr);
>>
>>  	return trap;
>>  }
>> @@ -4130,7 +4151,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>  {
>>  	struct kvm_run *run = vcpu->run;
>>  	int trap, r, pcpu;
>> -	int srcu_idx, lpid;
>> +	int srcu_idx;
>>  	struct kvmppc_vcore *vc;
>>  	struct kvm *kvm = vcpu->kvm;
>>  	struct kvm_nested_guest *nested = vcpu->arch.nested;
>> @@ -4204,13 +4225,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	vc->vcore_state = VCORE_RUNNING;
>>  	trace_kvmppc_run_core(vc, 0);
>>
>> -	if (cpu_has_feature(CPU_FTR_HVMODE)) {
>> -		lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
>> -		mtspr(SPRN_LPID, lpid);
>> -		isync();
>> -		kvmppc_check_need_tlb_flush(kvm, pcpu, nested);
>> -	}
>> -
>>  	guest_enter_irqoff();
>>
>>  	srcu_idx = srcu_read_lock(&kvm->srcu);
>> @@ -4229,11 +4243,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>
>>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>>
>> -	if (cpu_has_feature(CPU_FTR_HVMODE)) {
>> -		mtspr(SPRN_LPID, kvm->arch.host_lpid);
>> -		isync();
>> -	}
>> -
>>  	set_irq_happened(trap);
>>
>>  	kvmppc_set_host_core(pcpu);
> 

^ permalink raw reply

* Re: [PATCH v2 16/37] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path
From: Nicholas Piggin @ 2021-02-26 23:59 UTC (permalink / raw)
  To: Cédric Le Goater, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <47ae7b2f-9356-6cff-da38-142eaea773ca@kaod.org>

Excerpts from Cédric Le Goater's message of February 26, 2021 12:51 am:
> On 2/25/21 2:46 PM, Nicholas Piggin wrote:
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>> 
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>> 
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>> 
>> So it's much less important to handle hcalls in real mode in P9.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h      |  5 +++++
>>  arch/powerpc/kvm/book3s_hv.c            | 25 ++++++++++++++++++++++---
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 +++++
>>  arch/powerpc/kvm/book3s_xive.c          | 25 +++++++++++++++++++++++++
>>  4 files changed, 57 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 73b1ca5a6471..db6646c2ade2 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
>>  extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
>>  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
>>  extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
>> +extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
>>  extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
>>  extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
>>  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
>> @@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
>>  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
>>  static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>  	{ return 0; }
>> +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
>> +	{ return 0; }
>>  #endif
>>  
>>  #ifdef CONFIG_KVM_XIVE
>> @@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
>>  			       int level, bool line_status);
>>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
>> +extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
> 
> I can not find this routine. Is it missing or coming later in the patchset ? 

Yeah it leaked into a later patch but it belongs here. I'll fix it.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 17/37] KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest LPCR
From: Nicholas Piggin @ 2021-02-26 23:59 UTC (permalink / raw)
  To: Fabiano Rosas, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <8735xiyebs.fsf@linux.ibm.com>

Excerpts from Fabiano Rosas's message of February 27, 2021 2:38 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> LPCR[HDICE]=0 suppresses hypervisor decrementer exceptions on some
>> processors, so it must be enabled before HDEC is set.
>>
>> Rather than set it in the host LPCR then setting HDEC, move the HDEC
>> update to after the guest MMU context (including LPCR) is loaded.
>> There shouldn't be much concern with delaying HDEC by some 10s or 100s
>> of nanoseconds by setting it a bit later.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 24 ++++++++++--------------
>>  1 file changed, 10 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index d4770b222d7e..63cc92c45c5d 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3490,23 +3490,13 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
>>  		host_dawrx1 = mfspr(SPRN_DAWRX1);
>>  	}
>>
>> -	/*
>> -	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
>> -	 * so set HDICE before writing HDEC.
>> -	 */
>> -	mtspr(SPRN_LPCR, kvm->arch.host_lpcr | LPCR_HDICE);
>> -	isync();
>> -
>> -	hdec = time_limit - mftb();
> 
> Would it be possible to leave the mftb() in this patch and then replace
> them all at once in patch 20/37 - "KVM: PPC: Book3S HV P9: Reduce mftb
> per guest entry/exit"?

I suppose that makes sense. I'll see how that looks.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 23/37] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
From: Nicholas Piggin @ 2021-02-27  0:21 UTC (permalink / raw)
  To: Fabiano Rosas, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <87y2fawj4n.fsf@linux.ibm.com>

Excerpts from Fabiano Rosas's message of February 27, 2021 8:37 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> Hi, thanks for this. It helped me clarify a bunch of details that I
> haven't understood while reading the asm code.

Yes, me too :)

>> +/*
>> + * void kvmppc_p9_enter_guest(struct vcpu *vcpu);
>> + *
>> + * Enter the guest on a ISAv3.0 or later system where we have exactly
>> + * one vcpu per vcore, and both the host and guest are radix, and threads
>> + * are set to "indepdent mode".
>> + */
>> +.balign	IFETCH_ALIGN_BYTES
>> +_GLOBAL(kvmppc_p9_enter_guest)
>> +EXPORT_SYMBOL_GPL(kvmppc_p9_enter_guest)
>> +	mflr	r0
>> +	std	r0,PPC_LR_STKOFF(r1)
>> +	stdu	r1,-SFS(r1)
>> +
>> +	std	r1,HSTATE_HOST_R1(r13)
>> +	std	r3,STACK_SLOT_VCPU(r1)
> 
> The vcpu was stored on the paca previously. I don't get the change,
> could you explain?

The stack is a nicer place to keep things. We only need one place to 
save the stack, then most things can come from there. There's actually 
more paca stuff we could move into local variables or onto the stack
too.

It was probably done like this because PR KVM which probably can't 
access the stack in real mode when running in an LPAR, and came across 
to the HV code that way. New path doesn't require it.

>> +kvmppc_p9_exit_interrupt:
>> +	std     r1,HSTATE_SCRATCH1(r13)
>> +	std     r3,HSTATE_SCRATCH2(r13)
>> +	ld	r1,HSTATE_HOST_R1(r13)
>> +	ld	r3,STACK_SLOT_VCPU(r1)
>> +
>> +	std	r9,VCPU_CR(r3)
>> +
>> +1:
>> +	std	r11,VCPU_PC(r3)
>> +	std	r12,VCPU_MSR(r3)
>> +
>> +	reg = 14
>> +	.rept	18
>> +	std	reg,__VCPU_GPR(reg)(r3)
>> +	reg = reg + 1
>> +	.endr
>> +
>> +	/* r1, r3, r9-r13 are saved to vcpu by C code */
> 
> If we just saved r1 and r3, why don't we put them in the vcpu right now?
> That would avoid having the C code knowing about scratch areas.

Putting it in C avoids having the asm code know about scratch areas.

>> @@ -4429,11 +4432,19 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>>  		else
>>  			r = kvmppc_run_vcpu(vcpu);
>>
>> -		if (run->exit_reason == KVM_EXIT_PAPR_HCALL &&
>> -		    !(vcpu->arch.shregs.msr & MSR_PR)) {
>> -			trace_kvm_hcall_enter(vcpu);
>> -			r = kvmppc_pseries_do_hcall(vcpu);
>> -			trace_kvm_hcall_exit(vcpu, r);
>> +		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
>> +			if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +				/*
>> +				 * Guest userspace executed sc 1, reflect it
>> +				 * back as a privileged program check interrupt.
>> +				 */
>> +				kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> +				r = RESUME_GUEST;
> 
> This is in conflict with this snippet in kvmppc_handle_exit_hv:
> 
> 	case BOOK3S_INTERRUPT_SYSCALL:
> 	{
> 		/* hcall - punt to userspace */
> 		int i;
> 
> 		/* hypercall with MSR_PR has already been handled in rmode,
> 		 * and never reaches here.
> 		 */
> 
> That function already queues some 0x700s so maybe we could move this one
> in there as well.

I don't think it conflicts, but I think perhaps it should go in the 
patch which removed the real mode handlers.

kvmppc_handle_exit_hv is used by both HV paths so for now it's a bit 
neater to try get things into the same state, but we could do this in a 
later patch perhaps.

>> +	 *
>> +	 * To work around this we put a canary value into the HDSISR before
>> +	 * returning to a guest and then check for this canary when we take a
>> +	 * HDSI. If we find the canary on a HDSI, we know the hardware didn't
>> +	 * update the HDSISR. In this case we return to the guest to retake the
>> +	 * HDSI which should correctly update the HDSISR the second time HDSI
>> +	 * entry.
>> +	 *
>> +	 * Just do this on all p9 processors for now.
>> +	 */
>> +	mtspr(SPRN_HDSISR, HDSISR_CANARY);
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.guest_time);
>> +
>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
>> +	kvmppc_p9_enter_guest(vcpu);
>> +	// Radix host and guest means host never runs with guest MMU state
>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.rm_intr);
>> +
>> +	/* Get these from r11/12 and paca exsave */
>> +	vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
>> +	vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
>> +	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
>> +	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
>> +
>> +	trap = local_paca->kvm_hstate.scratch0 & ~0x2;
> 
> Couldn't we return the trap from kvmppc_p9_enter_guest? Seems like a
> nice pattern to have and avoids exposing the IVEC+0x2 magic which is hidden
> in asm already.

I use the 0x2 test in C in a later patch.

The IVEC+0x2 thing might just go away entirely though for new HV path, 
but we'd still clear it in C on the principle of minimal asm code.

>> +	radix_clear_slb();
>> +
>> +	__mtmsrd(msr, 0);
> 
> Isn't this the same as mtmsr(msr) ?

Yes. For 64s you can use the __ variant though. We use it in other 
places in this function with L=1 so it doesn't make sense to mix 
variants.

Thanks,
Nick

> 
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.rm_exit);
>> +
>> +	end_timing(vcpu);
>> +
>> +	return trap;
>> +}
>> +EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9);
> 
> <snip>
> 

^ permalink raw reply

* Re: [PATCH v2 23/37] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
From: Nicholas Piggin @ 2021-02-27  0:55 UTC (permalink / raw)
  To: Fabiano Rosas, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <1614384320.mv2klmakck.astroid@bobo.none>

Excerpts from Nicholas Piggin's message of February 27, 2021 10:21 am:
> Excerpts from Fabiano Rosas's message of February 27, 2021 8:37 am:
>> Nicholas Piggin <npiggin@gmail.com> writes:
>> 
>> Hi, thanks for this. It helped me clarify a bunch of details that I
>> haven't understood while reading the asm code.
> 
> Yes, me too :)
> 
>>> +/*
>>> + * void kvmppc_p9_enter_guest(struct vcpu *vcpu);
>>> + *
>>> + * Enter the guest on a ISAv3.0 or later system where we have exactly
>>> + * one vcpu per vcore, and both the host and guest are radix, and threads
>>> + * are set to "indepdent mode".
>>> + */
>>> +.balign	IFETCH_ALIGN_BYTES
>>> +_GLOBAL(kvmppc_p9_enter_guest)
>>> +EXPORT_SYMBOL_GPL(kvmppc_p9_enter_guest)
>>> +	mflr	r0
>>> +	std	r0,PPC_LR_STKOFF(r1)
>>> +	stdu	r1,-SFS(r1)
>>> +
>>> +	std	r1,HSTATE_HOST_R1(r13)
>>> +	std	r3,STACK_SLOT_VCPU(r1)
>> 
>> The vcpu was stored on the paca previously. I don't get the change,
>> could you explain?
> 
> The stack is a nicer place to keep things. We only need one place to 
> save the stack, then most things can come from there. There's actually 
> more paca stuff we could move into local variables or onto the stack
> too.
> 
> It was probably done like this because PR KVM which probably can't 
> access the stack in real mode when running in an LPAR, and came across 
> to the HV code that way. New path doesn't require it.
> 
>>> +kvmppc_p9_exit_interrupt:
>>> +	std     r1,HSTATE_SCRATCH1(r13)
>>> +	std     r3,HSTATE_SCRATCH2(r13)
>>> +	ld	r1,HSTATE_HOST_R1(r13)
>>> +	ld	r3,STACK_SLOT_VCPU(r1)
>>> +
>>> +	std	r9,VCPU_CR(r3)
>>> +
>>> +1:
>>> +	std	r11,VCPU_PC(r3)
>>> +	std	r12,VCPU_MSR(r3)
>>> +
>>> +	reg = 14
>>> +	.rept	18
>>> +	std	reg,__VCPU_GPR(reg)(r3)
>>> +	reg = reg + 1
>>> +	.endr
>>> +
>>> +	/* r1, r3, r9-r13 are saved to vcpu by C code */
>> 
>> If we just saved r1 and r3, why don't we put them in the vcpu right now?
>> That would avoid having the C code knowing about scratch areas.
> 
> Putting it in C avoids having the asm code know about scratch areas.
> 
>>> @@ -4429,11 +4432,19 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>>>  		else
>>>  			r = kvmppc_run_vcpu(vcpu);
>>>
>>> -		if (run->exit_reason == KVM_EXIT_PAPR_HCALL &&
>>> -		    !(vcpu->arch.shregs.msr & MSR_PR)) {
>>> -			trace_kvm_hcall_enter(vcpu);
>>> -			r = kvmppc_pseries_do_hcall(vcpu);
>>> -			trace_kvm_hcall_exit(vcpu, r);
>>> +		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
>>> +			if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>>> +				/*
>>> +				 * Guest userspace executed sc 1, reflect it
>>> +				 * back as a privileged program check interrupt.
>>> +				 */
>>> +				kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>>> +				r = RESUME_GUEST;
>> 
>> This is in conflict with this snippet in kvmppc_handle_exit_hv:
>> 
>> 	case BOOK3S_INTERRUPT_SYSCALL:
>> 	{
>> 		/* hcall - punt to userspace */
>> 		int i;
>> 
>> 		/* hypercall with MSR_PR has already been handled in rmode,
>> 		 * and never reaches here.
>> 		 */
>> 
>> That function already queues some 0x700s so maybe we could move this one
>> in there as well.
> 
> I don't think it conflicts, but I think perhaps it should go in the 
> patch which removed the real mode handlers.

Oh I'm wrong, it's actually the other way around by the looks.

Okay I'll fix this up.

Thanks,
Nick

^ permalink raw reply

* [PATCH v7 01/10] powerpc/uaccess: Add unsafe_copy_from_user()
From: Christopher M. Riedl @ 2021-02-27  1:12 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20210227011259.11992-1-cmr@codefail.de>

Use the same approach as unsafe_copy_to_user() but instead call
unsafe_get_user() in a loop.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
v7:	* Change implementation to call unsafe_get_user() and remove
	  dja's 'Reviewed-by' tag
---
 arch/powerpc/include/asm/uaccess.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 78e2a3990eab..ef5978b73a8d 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -487,6 +487,27 @@ user_write_access_begin(const void __user *ptr, size_t len)
 #define unsafe_put_user(x, p, e) \
 	__unsafe_put_user_goto((__typeof__(*(p)))(x), (p), sizeof(*(p)), e)
 
+#define unsafe_copy_from_user(d, s, l, e) \
+do {											\
+	u8 *_dst = (u8 *)(d);								\
+	const u8 __user *_src = (const u8 __user *)(s);					\
+	size_t _len = (l);								\
+	int _i;										\
+											\
+	for (_i = 0; _i < (_len & ~(sizeof(long) - 1)); _i += sizeof(long))		\
+		unsafe_get_user(*(long *)(_dst + _i), (long __user *)(_src + _i), e);	\
+	if (IS_ENABLED(CONFIG_PPC64) && (_len & 4)) {					\
+		unsafe_get_user(*(u32 *)(_dst + _i), (u32 __user *)(_src + _i), e);	\
+		_i += 4;								\
+	}										\
+	if (_len & 2) {									\
+		unsafe_get_user(*(u16 *)(_dst + _i), (u16 __user *)(_src + _i), e);	\
+		_i += 2;								\
+	}										\
+	if (_len & 1)									\
+		unsafe_get_user(*(u8 *)(_dst + _i), (u8 __user *)(_src + _i), e);	\
+} while (0)
+
 #define unsafe_copy_to_user(d, s, l, e) \
 do {									\
 	u8 __user *_dst = (u8 __user *)(d);				\
-- 
2.26.1


^ permalink raw reply related

* [PATCH v7 09/10] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
From: Christopher M. Riedl @ 2021-02-27  1:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210227011259.11992-1-cmr@codefail.de>

From: Daniel Axtens <dja@axtens.net>

Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/kernel/signal_64.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 788854734b9a..00c907022707 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -822,11 +822,11 @@ SYSCALL_DEFINE0(rt_sigreturn)
 		 */
 		current->thread.regs->msr &= ~MSR_TS_MASK;
 		if (!user_read_access_begin(&uc->uc_mcontext, sizeof(uc->uc_mcontext)))
-			return -EFAULT;
-		if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) {
-			user_read_access_end();
 			goto badframe;
-		}
+
+		unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext,
+					  badframe_block);
+
 		user_read_access_end();
 	}
 
@@ -836,6 +836,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	set_thread_flag(TIF_RESTOREALL);
 	return 0;
 
+badframe_block:
+	user_read_access_end();
 badframe:
 	signal_fault(current, regs, "rt_sigreturn", uc);
 
-- 
2.26.1


^ permalink raw reply related

* [PATCH v7 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-02-27  1:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210227011259.11992-1-cmr@codefail.de>

Reuse the "safe" implementation from signal.c but call unsafe_get_user()
directly in a loop to avoid the intermediate copy into a local buffer.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/kernel/signal.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 2559a681536e..d8dd76b1dc94 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -53,6 +53,26 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
 				&buf[i], label);\
 } while (0)
 
+#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
+	struct task_struct *__t = task;					\
+	u64 __user *buf = (u64 __user *)from;				\
+	int i;								\
+									\
+	for (i = 0; i < ELF_NFPREG - 1; i++)				\
+		unsafe_get_user(__t->thread.TS_FPR(i), &buf[i], label); \
+	unsafe_get_user(__t->thread.fp_state.fpscr, &buf[i], label);	\
+} while (0)
+
+#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
+	struct task_struct *__t = task;					\
+	u64 __user *buf = (u64 __user *)from;				\
+	int i;								\
+									\
+	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
+		unsafe_get_user(__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET], \
+				&buf[i], label);			\
+} while (0)
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
 	struct task_struct *__t = task;					\
@@ -80,6 +100,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
 	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
 			    ELF_NFPREG * sizeof(double), label)
 
+#define unsafe_copy_fpr_from_user(task, from, label)			\
+	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
+			    ELF_NFPREG * sizeof(double), label)
+
 static inline unsigned long
 copy_fpr_to_user(void __user *to, struct task_struct *task)
 {
@@ -115,6 +139,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
 #else
 #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
 
+#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
+
 static inline unsigned long
 copy_fpr_to_user(void __user *to, struct task_struct *task)
 {
-- 
2.26.1


^ permalink raw reply related

* [PATCH v7 04/10] powerpc: Reference parameter in MSR_TM_ACTIVE() macro
From: Christopher M. Riedl @ 2021-02-27  1:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210227011259.11992-1-cmr@codefail.de>

Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
causes an 'unused variable' compile warning unless the variable is also
guarded with CONFIG_PPC_TRANSACTIONAL_MEM.

Reference but do nothing with the argument in the macro to avoid a
potential compile warning.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/include/asm/reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index da103e92c112..1be20bc8dce2 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -124,7 +124,7 @@
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
 #else
-#define MSR_TM_ACTIVE(x) 0
+#define MSR_TM_ACTIVE(x) ((void)(x), 0)
 #endif
 
 #if defined(CONFIG_PPC_BOOK3S_64)
-- 
2.26.1


^ permalink raw reply related

* [PATCH v7 03/10] powerpc/signal64: Remove non-inline calls from setup_sigcontext()
From: Christopher M. Riedl @ 2021-02-27  1:12 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20210227011259.11992-1-cmr@codefail.de>

The majority of setup_sigcontext() can be refactored to execute in an
"unsafe" context assuming an open uaccess window except for some
non-inline function calls. Move these out into a separate
prepare_setup_sigcontext() function which must be called first and
before opening up a uaccess window. Non-inline function calls should be
avoided during a uaccess window for a few reasons:

	- KUAP should be enabled for as much kernel code as possible.
	  Opening a uaccess window disables KUAP which means any code
	  executed during this time contributes to a potential attack
	  surface.

	- Non-inline functions default to traceable which means they are
	  instrumented for ftrace. This adds more code which could run
	  with KUAP disabled.

	- Powerpc does not currently support the objtool UACCESS checks.
	  All code running with uaccess must be audited manually which
	  means: less code -> less work -> fewer problems (in theory).

A follow-up commit converts setup_sigcontext() to be "unsafe".

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/kernel/signal_64.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index f9e4a1ac440f..6ca546192cbf 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -79,6 +79,24 @@ static elf_vrreg_t __user *sigcontext_vmx_regs(struct sigcontext __user *sc)
 }
 #endif
 
+static void prepare_setup_sigcontext(struct task_struct *tsk)
+{
+#ifdef CONFIG_ALTIVEC
+	/* save altivec registers */
+	if (tsk->thread.used_vr)
+		flush_altivec_to_thread(tsk);
+	if (cpu_has_feature(CPU_FTR_ALTIVEC))
+		tsk->thread.vrsave = mfspr(SPRN_VRSAVE);
+#endif /* CONFIG_ALTIVEC */
+
+	flush_fp_to_thread(tsk);
+
+#ifdef CONFIG_VSX
+	if (tsk->thread.used_vsr)
+		flush_vsx_to_thread(tsk);
+#endif /* CONFIG_VSX */
+}
+
 /*
  * Set up the sigcontext for the signal frame.
  */
@@ -97,7 +115,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
 	 */
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc);
-	unsigned long vrsave;
 #endif
 	struct pt_regs *regs = tsk->thread.regs;
 	unsigned long msr = regs->msr;
@@ -112,7 +129,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
 
 	/* save altivec registers */
 	if (tsk->thread.used_vr) {
-		flush_altivec_to_thread(tsk);
 		/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
 		err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
 				      33 * sizeof(vector128));
@@ -124,17 +140,10 @@ static long setup_sigcontext(struct sigcontext __user *sc,
 	/* We always copy to/from vrsave, it's 0 if we don't have or don't
 	 * use altivec.
 	 */
-	vrsave = 0;
-	if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
-		vrsave = mfspr(SPRN_VRSAVE);
-		tsk->thread.vrsave = vrsave;
-	}
-
-	err |= __put_user(vrsave, (u32 __user *)&v_regs[33]);
+	err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
-	flush_fp_to_thread(tsk);
 	/* copy fpr regs and fpscr */
 	err |= copy_fpr_to_user(&sc->fp_regs, tsk);
 
@@ -150,7 +159,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
 	 * VMX data.
 	 */
 	if (tsk->thread.used_vsr && ctx_has_vsx_region) {
-		flush_vsx_to_thread(tsk);
 		v_regs += ELF_NVRREG;
 		err |= copy_vsx_to_user(v_regs, tsk);
 		/* set MSR_VSX in the MSR value in the frame to
@@ -655,6 +663,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 		ctx_has_vsx_region = 1;
 
 	if (old_ctx != NULL) {
+		prepare_setup_sigcontext(current);
 		if (!access_ok(old_ctx, ctx_size)
 		    || setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
 					ctx_has_vsx_region)
@@ -842,6 +851,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
 #endif
 	{
 		err |= __put_user(0, &frame->uc.uc_link);
+		prepare_setup_sigcontext(tsk);
 		err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
 					NULL, (unsigned long)ksig->ka.sa.sa_handler,
 					1);
-- 
2.26.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox