LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 3/8] xen-swiotlb: use io_tlb_end in xen_swiotlb_dma_supported
From: Konrad Rzeszutek Wilk @ 2021-02-19 21:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, xen-devel, Claire Chang, linuxppc-dev, Dongli Zhang
In-Reply-To: <20210207160934.2955931-4-hch@lst.de>

On Sun, Feb 07, 2021 at 05:09:29PM +0100, Christoph Hellwig wrote:
> Use the existing variable that holds the physical address for
> xen_io_tlb_end to simplify xen_swiotlb_dma_supported a bit, and remove
> the otherwise unused xen_io_tlb_end variable and the xen_virt_to_bus
> helper.
> 
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/xen/swiotlb-xen.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index a4026822a889f7..4298f74a083985 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -46,7 +46,7 @@
>   * API.
>   */
>  
> -static char *xen_io_tlb_start, *xen_io_tlb_end;
> +static char *xen_io_tlb_start;
>  static unsigned long xen_io_tlb_nslabs;
>  /*
>   * Quick lookup value of the bus address of the IOTLB.
> @@ -82,11 +82,6 @@ static inline phys_addr_t xen_dma_to_phys(struct device *dev,
>  	return xen_bus_to_phys(dev, dma_to_phys(dev, dma_addr));
>  }
>  
> -static inline dma_addr_t xen_virt_to_bus(struct device *dev, void *address)
> -{
> -	return xen_phys_to_dma(dev, virt_to_phys(address));
> -}
> -
>  static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  {
>  	unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p);
> @@ -250,7 +245,6 @@ int __ref xen_swiotlb_init(int verbose, bool early)
>  		rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
>  
>  end:
> -	xen_io_tlb_end = xen_io_tlb_start + bytes;
>  	if (!rc)
>  		swiotlb_set_max_segment(PAGE_SIZE);
>  
> @@ -558,7 +552,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
>  static int
>  xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> -	return xen_virt_to_bus(hwdev, xen_io_tlb_end - 1) <= mask;
> +	return xen_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
>  }
>  
>  const struct dma_map_ops xen_swiotlb_dma_ops = {
> -- 
> 2.29.2
> 

^ permalink raw reply

* Re: [PATCH 2/8] xen-swiotlb: use is_swiotlb_buffer in is_xen_swiotlb_buffer
From: Konrad Rzeszutek Wilk @ 2021-02-19 20:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, xen-devel, Claire Chang, linuxppc-dev, Dongli Zhang
In-Reply-To: <20210207160934.2955931-3-hch@lst.de>

On Sun, Feb 07, 2021 at 05:09:28PM +0100, Christoph Hellwig wrote:
> Use the is_swiotlb_buffer to check if a physical address is
> a swiotlb buffer.  This works because xen-swiotlb does use the
> same buffer as the main swiotlb code, and xen_io_tlb_{start,end}
> are just the addresses for it that went through phys_to_virt.
> 

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/xen/swiotlb-xen.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 2b385c1b4a99cb..a4026822a889f7 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -111,10 +111,8 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  	 * have the same virtual address as another address
>  	 * in our domain. Therefore _only_ check address within our domain.
>  	 */
> -	if (pfn_valid(PFN_DOWN(paddr))) {
> -		return paddr >= virt_to_phys(xen_io_tlb_start) &&
> -		       paddr < virt_to_phys(xen_io_tlb_end);
> -	}
> +	if (pfn_valid(PFN_DOWN(paddr)))
> +		return is_swiotlb_buffer(paddr);
>  	return 0;
>  }
>  
> -- 
> 2.29.2
> 

^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Lakshmi Ramasubramanian @ 2021-02-19 18:43 UTC (permalink / raw)
  To: Thiago Jung Bauermann, Mimi Zohar
  Cc: Sasha Levin, Rob Herring, Stephen Rothwell, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel@vger.kernel.org, AKASHI, Takahiro,
	devicetree, James Morse, Catalin Marinas, Joe Perches,
	linux-integrity, Will Deacon, linux-arm-kernel
In-Reply-To: <87blcgx72l.fsf@manicouagan.localdomain>

On 2/19/21 10:09 AM, Thiago Jung Bauermann wrote:
> 
> Mimi Zohar <zohar@linux.ibm.com> writes:
> 
>> On Fri, 2021-02-19 at 11:43 -0600, Rob Herring wrote:
>>> On Fri, Feb 19, 2021 at 10:57 AM Lakshmi Ramasubramanian
>>> <nramas@linux.microsoft.com> wrote:
>>>>
>>>> On 2/19/21 6:16 AM, Rob Herring wrote:
>>>>> On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
>>>>> <nramas@linux.microsoft.com> wrote:
>>>>>>
>>>>>> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
>>>>>>>
>>>>>>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>>>>>>>
>>>>>>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
>>>>>>>>
>>>>>>>> Hi Mimi,
>>>>>>>>
>>>>>>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
>>>>>>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
>>>>>>>>>> a new device tree object that includes architecture specific data
>>>>>>>>>> for kexec system call.  This should be defined only if the architecture
>>>>>>>>>> being built defines kexec architecture structure "struct kimage_arch".
>>>>>>>>>>
>>>>>>>>>> Define a new boolean config OF_KEXEC that is enabled if
>>>>>>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
>>>>>>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
>>>>>>>>>> if CONFIG_OF_KEXEC is enabled.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>>>>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/of/Kconfig  | 6 ++++++
>>>>>>>>>>      drivers/of/Makefile | 7 +------
>>>>>>>>>>      2 files changed, 7 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>>>>>>>>>> index 18450437d5d5..f2e8fa54862a 100644
>>>>>>>>>> --- a/drivers/of/Kconfig
>>>>>>>>>> +++ b/drivers/of/Kconfig
>>>>>>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
>>>>>>>>>>              # arches should select this if DMA is coherent by default for OF devices
>>>>>>>>>>              bool
>>>>>>>>>>      +config OF_KEXEC
>>>>>>>>>> +  bool
>>>>>>>>>> +  depends on KEXEC_FILE
>>>>>>>>>> +  depends on OF_FLATTREE
>>>>>>>>>> +  default y if ARM64 || PPC64
>>>>>>>>>> +
>>>>>>>>>>      endif # OF
>>>>>>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
>>>>>>>>>> index c13b982084a3..287579dd1695 100644
>>>>>>>>>> --- a/drivers/of/Makefile
>>>>>>>>>> +++ b/drivers/of/Makefile
>>>>>>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>>>>>>>>>>      obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>>>>>>>>>>      obj-$(CONFIG_OF_OVERLAY) += overlay.o
>>>>>>>>>>      obj-$(CONFIG_OF_NUMA) += of_numa.o
>>>>>>>>>> -
>>>>>>>>>> -ifdef CONFIG_KEXEC_FILE
>>>>>>>>>> -ifdef CONFIG_OF_FLATTREE
>>>>>>>>>> -obj-y     += kexec.o
>>>>>>>>>> -endif
>>>>>>>>>> -endif
>>>>>>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
>>>>>>>>>>        obj-$(CONFIG_OF_UNITTEST) += unittest-data/
>>>>>>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
>>>>>>>>>
>>>>>>>>
>>>>>>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
>>>>>>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
>>>>>>>>
>>>>>>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
>>>>>>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
>>>>>>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
>>>>>>>> breaks the build for arm64.
>>>>>>>
>>>>>>> One problem is that I believe that this patch won't placate the robot,
>>>>>>> because IIUC it generates config files at random and this change still
>>>>>>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
>>>>>>
>>>>>> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
>>>>>> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
>>>>>> would not be a problem.
>>>>>>
>>>>>>>
>>>>>>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
>>>>>>> would still allow building kexec.o, but would be used inside kexec.c to
>>>>>>> avoid accessing kimage.arch members.
>>>>>>>
>>>>>>
>>>>>> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
>>>>>> be selected by arm64 and ppc for now. I tried this, and it fixes the
>>>>>> build issue.
>>>>>>
>>>>>> Although, the name for the new config can be misleading since PARISC,
>>>>>> for instance, also defines "struct kimage_arch". Perhaps,
>>>>>> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
>>>>>> accessing ELF specific fields in "struct kimage_arch"?
>>>>>>
>>>>>> Rob/Mimi - please let us know which approach you think is better.
>>>>>
>>>>> I'd just move the fields to kimage.
>>>>>
>>>>
>>>> I think Mimi's suggestion to use CONFIG_HAVE_IMA_KEXEC for building
>>>> drivers/of/kexec.c would work and also avoid the bisect issue if we do
>>>> the following:
>>>
>>> That seems wrong given only a portion of the file depends on IMA. And
>>> it reduces our compile coverage.
>>   
>> I agree with you this is the wrong solution.  Lakshmi's patch
>> introduced a new option to prevent other arch's from including kexec.o,
>> which is the same functionality as CONFIG_HAVE_IMA_KEXEC.  I'm just not
>> sure what the right solution would be.
> 
> I think Rob's suggestion of just moving the elf_load_addr,
> elf_headers_sz fields (and for consistency, elf_headers as well even though it
> isn't used in tihs file) from kimage_arch to kimage.
> 
> The downside is that these fields will go unused on a number of
> architectures, but it's not worth complicating the code just because of
> it.
> 
> The patch to do that would have to go before "of: Add a common kexec FDT
> setup function". That should be enough to preserve bisectability for all arches.
> 

Agreed. I'll make this change and update.

  -lakshmi

^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Thiago Jung Bauermann @ 2021-02-19 18:09 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Sasha Levin, Rob Herring, Lakshmi Ramasubramanian,
	Greg Kroah-Hartman, linuxppc-dev, linux-kernel@vger.kernel.org,
	AKASHI, Takahiro, devicetree, James Morse, Catalin Marinas,
	Joe Perches, Stephen Rothwell, linux-integrity, Will Deacon,
	linux-arm-kernel
In-Reply-To: <6a197963deb8e44c71384ea9b89d7f3f13c947bf.camel@linux.ibm.com>


Mimi Zohar <zohar@linux.ibm.com> writes:

> On Fri, 2021-02-19 at 11:43 -0600, Rob Herring wrote:
>> On Fri, Feb 19, 2021 at 10:57 AM Lakshmi Ramasubramanian
>> <nramas@linux.microsoft.com> wrote:
>> >
>> > On 2/19/21 6:16 AM, Rob Herring wrote:
>> > > On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
>> > > <nramas@linux.microsoft.com> wrote:
>> > >>
>> > >> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
>> > >>>
>> > >>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>> > >>>
>> > >>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
>> > >>>>
>> > >>>> Hi Mimi,
>> > >>>>
>> > >>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
>> > >>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
>> > >>>>>> a new device tree object that includes architecture specific data
>> > >>>>>> for kexec system call.  This should be defined only if the architecture
>> > >>>>>> being built defines kexec architecture structure "struct kimage_arch".
>> > >>>>>>
>> > >>>>>> Define a new boolean config OF_KEXEC that is enabled if
>> > >>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
>> > >>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
>> > >>>>>> if CONFIG_OF_KEXEC is enabled.
>> > >>>>>>
>> > >>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>> > >>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
>> > >>>>>> Reported-by: kernel test robot <lkp@intel.com>
>> > >>>>>> ---
>> > >>>>>>     drivers/of/Kconfig  | 6 ++++++
>> > >>>>>>     drivers/of/Makefile | 7 +------
>> > >>>>>>     2 files changed, 7 insertions(+), 6 deletions(-)
>> > >>>>>>
>> > >>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>> > >>>>>> index 18450437d5d5..f2e8fa54862a 100644
>> > >>>>>> --- a/drivers/of/Kconfig
>> > >>>>>> +++ b/drivers/of/Kconfig
>> > >>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
>> > >>>>>>             # arches should select this if DMA is coherent by default for OF devices
>> > >>>>>>             bool
>> > >>>>>>     +config OF_KEXEC
>> > >>>>>> +  bool
>> > >>>>>> +  depends on KEXEC_FILE
>> > >>>>>> +  depends on OF_FLATTREE
>> > >>>>>> +  default y if ARM64 || PPC64
>> > >>>>>> +
>> > >>>>>>     endif # OF
>> > >>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
>> > >>>>>> index c13b982084a3..287579dd1695 100644
>> > >>>>>> --- a/drivers/of/Makefile
>> > >>>>>> +++ b/drivers/of/Makefile
>> > >>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>> > >>>>>>     obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>> > >>>>>>     obj-$(CONFIG_OF_OVERLAY) += overlay.o
>> > >>>>>>     obj-$(CONFIG_OF_NUMA) += of_numa.o
>> > >>>>>> -
>> > >>>>>> -ifdef CONFIG_KEXEC_FILE
>> > >>>>>> -ifdef CONFIG_OF_FLATTREE
>> > >>>>>> -obj-y     += kexec.o
>> > >>>>>> -endif
>> > >>>>>> -endif
>> > >>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
>> > >>>>>>       obj-$(CONFIG_OF_UNITTEST) += unittest-data/
>> > >>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
>> > >>>>>
>> > >>>>
>> > >>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
>> > >>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
>> > >>>>
>> > >>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
>> > >>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
>> > >>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
>> > >>>> breaks the build for arm64.
>> > >>>
>> > >>> One problem is that I believe that this patch won't placate the robot,
>> > >>> because IIUC it generates config files at random and this change still
>> > >>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
>> > >>
>> > >> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
>> > >> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
>> > >> would not be a problem.
>> > >>
>> > >>>
>> > >>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
>> > >>> would still allow building kexec.o, but would be used inside kexec.c to
>> > >>> avoid accessing kimage.arch members.
>> > >>>
>> > >>
>> > >> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
>> > >> be selected by arm64 and ppc for now. I tried this, and it fixes the
>> > >> build issue.
>> > >>
>> > >> Although, the name for the new config can be misleading since PARISC,
>> > >> for instance, also defines "struct kimage_arch". Perhaps,
>> > >> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
>> > >> accessing ELF specific fields in "struct kimage_arch"?
>> > >>
>> > >> Rob/Mimi - please let us know which approach you think is better.
>> > >
>> > > I'd just move the fields to kimage.
>> > >
>> >
>> > I think Mimi's suggestion to use CONFIG_HAVE_IMA_KEXEC for building
>> > drivers/of/kexec.c would work and also avoid the bisect issue if we do
>> > the following:
>> 
>> That seems wrong given only a portion of the file depends on IMA. And
>> it reduces our compile coverage.
>  
> I agree with you this is the wrong solution.  Lakshmi's patch
> introduced a new option to prevent other arch's from including kexec.o,
> which is the same functionality as CONFIG_HAVE_IMA_KEXEC.  I'm just not
> sure what the right solution would be.

I think Rob's suggestion of just moving the elf_load_addr,
elf_headers_sz fields (and for consistency, elf_headers as well even though it
isn't used in tihs file) from kimage_arch to kimage.

The downside is that these fields will go unused on a number of
architectures, but it's not worth complicating the code just because of
it.

The patch to do that would have to go before "of: Add a common kexec FDT
setup function". That should be enough to preserve bisectability for all arches.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Mimi Zohar @ 2021-02-19 18:03 UTC (permalink / raw)
  To: Rob Herring, Lakshmi Ramasubramanian
  Cc: Sasha Levin, Stephen Rothwell, devicetree, Catalin Marinas,
	linuxppc-dev, linux-kernel@vger.kernel.org, AKASHI, Takahiro,
	James Morse, Greg Kroah-Hartman, Joe Perches, linux-integrity,
	Will Deacon, Thiago Jung Bauermann, linux-arm-kernel
In-Reply-To: <CAL_Jsq+R-zOT581_W0Ar5H58rfPnGiWeetoF_b+BaW7er22bPA@mail.gmail.com>

On Fri, 2021-02-19 at 11:43 -0600, Rob Herring wrote:
> On Fri, Feb 19, 2021 at 10:57 AM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
> >
> > On 2/19/21 6:16 AM, Rob Herring wrote:
> > > On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
> > > <nramas@linux.microsoft.com> wrote:
> > >>
> > >> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
> > >>>
> > >>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> > >>>
> > >>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
> > >>>>
> > >>>> Hi Mimi,
> > >>>>
> > >>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
> > >>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
> > >>>>>> a new device tree object that includes architecture specific data
> > >>>>>> for kexec system call.  This should be defined only if the architecture
> > >>>>>> being built defines kexec architecture structure "struct kimage_arch".
> > >>>>>>
> > >>>>>> Define a new boolean config OF_KEXEC that is enabled if
> > >>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
> > >>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
> > >>>>>> if CONFIG_OF_KEXEC is enabled.
> > >>>>>>
> > >>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> > >>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
> > >>>>>> Reported-by: kernel test robot <lkp@intel.com>
> > >>>>>> ---
> > >>>>>>     drivers/of/Kconfig  | 6 ++++++
> > >>>>>>     drivers/of/Makefile | 7 +------
> > >>>>>>     2 files changed, 7 insertions(+), 6 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> > >>>>>> index 18450437d5d5..f2e8fa54862a 100644
> > >>>>>> --- a/drivers/of/Kconfig
> > >>>>>> +++ b/drivers/of/Kconfig
> > >>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
> > >>>>>>             # arches should select this if DMA is coherent by default for OF devices
> > >>>>>>             bool
> > >>>>>>     +config OF_KEXEC
> > >>>>>> +  bool
> > >>>>>> +  depends on KEXEC_FILE
> > >>>>>> +  depends on OF_FLATTREE
> > >>>>>> +  default y if ARM64 || PPC64
> > >>>>>> +
> > >>>>>>     endif # OF
> > >>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> > >>>>>> index c13b982084a3..287579dd1695 100644
> > >>>>>> --- a/drivers/of/Makefile
> > >>>>>> +++ b/drivers/of/Makefile
> > >>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
> > >>>>>>     obj-$(CONFIG_OF_RESOLVE)  += resolver.o
> > >>>>>>     obj-$(CONFIG_OF_OVERLAY) += overlay.o
> > >>>>>>     obj-$(CONFIG_OF_NUMA) += of_numa.o
> > >>>>>> -
> > >>>>>> -ifdef CONFIG_KEXEC_FILE
> > >>>>>> -ifdef CONFIG_OF_FLATTREE
> > >>>>>> -obj-y     += kexec.o
> > >>>>>> -endif
> > >>>>>> -endif
> > >>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
> > >>>>>>       obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> > >>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
> > >>>>>
> > >>>>
> > >>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
> > >>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
> > >>>>
> > >>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
> > >>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
> > >>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
> > >>>> breaks the build for arm64.
> > >>>
> > >>> One problem is that I believe that this patch won't placate the robot,
> > >>> because IIUC it generates config files at random and this change still
> > >>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
> > >>
> > >> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
> > >> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
> > >> would not be a problem.
> > >>
> > >>>
> > >>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
> > >>> would still allow building kexec.o, but would be used inside kexec.c to
> > >>> avoid accessing kimage.arch members.
> > >>>
> > >>
> > >> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
> > >> be selected by arm64 and ppc for now. I tried this, and it fixes the
> > >> build issue.
> > >>
> > >> Although, the name for the new config can be misleading since PARISC,
> > >> for instance, also defines "struct kimage_arch". Perhaps,
> > >> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
> > >> accessing ELF specific fields in "struct kimage_arch"?
> > >>
> > >> Rob/Mimi - please let us know which approach you think is better.
> > >
> > > I'd just move the fields to kimage.
> > >
> >
> > I think Mimi's suggestion to use CONFIG_HAVE_IMA_KEXEC for building
> > drivers/of/kexec.c would work and also avoid the bisect issue if we do
> > the following:
> 
> That seems wrong given only a portion of the file depends on IMA. And
> it reduces our compile coverage.
 
I agree with you this is the wrong solution.  Lakshmi's patch
introduced a new option to prevent other arch's from including kexec.o,
which is the same functionality as CONFIG_HAVE_IMA_KEXEC.  I'm just not
sure what the right solution would be.

Mimi


^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Rob Herring @ 2021-02-19 17:43 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Sasha Levin, Stephen Rothwell, devicetree, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel@vger.kernel.org, Mimi Zohar,
	AKASHI, Takahiro, James Morse, Catalin Marinas, Joe Perches,
	linux-integrity, Will Deacon, Thiago Jung Bauermann,
	linux-arm-kernel
In-Reply-To: <98a061d1-05ea-eff2-5c5c-a59f491fe924@linux.microsoft.com>

On Fri, Feb 19, 2021 at 10:57 AM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 2/19/21 6:16 AM, Rob Herring wrote:
> > On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
> > <nramas@linux.microsoft.com> wrote:
> >>
> >> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
> >>>
> >>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> >>>
> >>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
> >>>>
> >>>> Hi Mimi,
> >>>>
> >>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
> >>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
> >>>>>> a new device tree object that includes architecture specific data
> >>>>>> for kexec system call.  This should be defined only if the architecture
> >>>>>> being built defines kexec architecture structure "struct kimage_arch".
> >>>>>>
> >>>>>> Define a new boolean config OF_KEXEC that is enabled if
> >>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
> >>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
> >>>>>> if CONFIG_OF_KEXEC is enabled.
> >>>>>>
> >>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> >>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
> >>>>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>>>> ---
> >>>>>>     drivers/of/Kconfig  | 6 ++++++
> >>>>>>     drivers/of/Makefile | 7 +------
> >>>>>>     2 files changed, 7 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> >>>>>> index 18450437d5d5..f2e8fa54862a 100644
> >>>>>> --- a/drivers/of/Kconfig
> >>>>>> +++ b/drivers/of/Kconfig
> >>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
> >>>>>>             # arches should select this if DMA is coherent by default for OF devices
> >>>>>>             bool
> >>>>>>     +config OF_KEXEC
> >>>>>> +  bool
> >>>>>> +  depends on KEXEC_FILE
> >>>>>> +  depends on OF_FLATTREE
> >>>>>> +  default y if ARM64 || PPC64
> >>>>>> +
> >>>>>>     endif # OF
> >>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> >>>>>> index c13b982084a3..287579dd1695 100644
> >>>>>> --- a/drivers/of/Makefile
> >>>>>> +++ b/drivers/of/Makefile
> >>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
> >>>>>>     obj-$(CONFIG_OF_RESOLVE)  += resolver.o
> >>>>>>     obj-$(CONFIG_OF_OVERLAY) += overlay.o
> >>>>>>     obj-$(CONFIG_OF_NUMA) += of_numa.o
> >>>>>> -
> >>>>>> -ifdef CONFIG_KEXEC_FILE
> >>>>>> -ifdef CONFIG_OF_FLATTREE
> >>>>>> -obj-y     += kexec.o
> >>>>>> -endif
> >>>>>> -endif
> >>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
> >>>>>>       obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> >>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
> >>>>>
> >>>>
> >>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
> >>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
> >>>>
> >>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
> >>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
> >>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
> >>>> breaks the build for arm64.
> >>>
> >>> One problem is that I believe that this patch won't placate the robot,
> >>> because IIUC it generates config files at random and this change still
> >>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
> >>
> >> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
> >> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
> >> would not be a problem.
> >>
> >>>
> >>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
> >>> would still allow building kexec.o, but would be used inside kexec.c to
> >>> avoid accessing kimage.arch members.
> >>>
> >>
> >> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
> >> be selected by arm64 and ppc for now. I tried this, and it fixes the
> >> build issue.
> >>
> >> Although, the name for the new config can be misleading since PARISC,
> >> for instance, also defines "struct kimage_arch". Perhaps,
> >> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
> >> accessing ELF specific fields in "struct kimage_arch"?
> >>
> >> Rob/Mimi - please let us know which approach you think is better.
> >
> > I'd just move the fields to kimage.
> >
>
> I think Mimi's suggestion to use CONFIG_HAVE_IMA_KEXEC for building
> drivers/of/kexec.c would work and also avoid the bisect issue if we do
> the following:

That seems wrong given only a portion of the file depends on IMA. And
it reduces our compile coverage.

>   - In the patch set for carrying forward the IMA log on kexec, move the
> following patch to a later point in the set
>
> "[PATCH v18 04/11] arm64: Use common of_kexec_alloc_and_setup_fdt()"
>
> and merge the above patch with the following patch
> "[PATCH v18 11/11] arm64: Enable passing IMA log to next kernel on kexec"

If we're doing all that, then I'm dropping all this for 5.12.

Rob

^ permalink raw reply

* [PATCH] powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used
From: Sebastian Andrzej Siewior @ 2021-02-19 16:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Paul Mackerras, Sebastian Andrzej Siewior

The mutex linear_mapping_mutex is defined at the of the file while its
only two user are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.

Move the definition of linear_mapping_mutex to ifdef block where it is
used.

Fixes: 1f73ad3e8d755 ("powerpc/mm: print warning in arch_remove_linear_mapping()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/powerpc/mm/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index afab328d08874..d6c3f0b79f1d1 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -54,7 +54,6 @@
 
 #include <mm/mmu_decl.h>
 
-static DEFINE_MUTEX(linear_mapping_mutex);
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
@@ -72,6 +71,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 EXPORT_SYMBOL(phys_mem_access_prot);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static DEFINE_MUTEX(linear_mapping_mutex);
 
 #ifdef CONFIG_NUMA
 int memory_add_physaddr_to_nid(u64 start)
-- 
2.30.0


^ permalink raw reply related

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Lakshmi Ramasubramanian @ 2021-02-19 16:56 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sasha Levin, Stephen Rothwell, devicetree, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel@vger.kernel.org, Mimi Zohar,
	AKASHI, Takahiro, James Morse, Catalin Marinas, Joe Perches,
	linux-integrity, Will Deacon, Thiago Jung Bauermann,
	linux-arm-kernel
In-Reply-To: <CAL_JsqJiRV5xShOgso0PH2pFhv-yozay58i1uGQC0dJCVxkJPA@mail.gmail.com>

On 2/19/21 6:16 AM, Rob Herring wrote:
> On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>>
>> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
>>>
>>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>>>
>>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
>>>>
>>>> Hi Mimi,
>>>>
>>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
>>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
>>>>>> a new device tree object that includes architecture specific data
>>>>>> for kexec system call.  This should be defined only if the architecture
>>>>>> being built defines kexec architecture structure "struct kimage_arch".
>>>>>>
>>>>>> Define a new boolean config OF_KEXEC that is enabled if
>>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
>>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
>>>>>> if CONFIG_OF_KEXEC is enabled.
>>>>>>
>>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>> ---
>>>>>>     drivers/of/Kconfig  | 6 ++++++
>>>>>>     drivers/of/Makefile | 7 +------
>>>>>>     2 files changed, 7 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>>>>>> index 18450437d5d5..f2e8fa54862a 100644
>>>>>> --- a/drivers/of/Kconfig
>>>>>> +++ b/drivers/of/Kconfig
>>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
>>>>>>             # arches should select this if DMA is coherent by default for OF devices
>>>>>>             bool
>>>>>>     +config OF_KEXEC
>>>>>> +  bool
>>>>>> +  depends on KEXEC_FILE
>>>>>> +  depends on OF_FLATTREE
>>>>>> +  default y if ARM64 || PPC64
>>>>>> +
>>>>>>     endif # OF
>>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
>>>>>> index c13b982084a3..287579dd1695 100644
>>>>>> --- a/drivers/of/Makefile
>>>>>> +++ b/drivers/of/Makefile
>>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>>>>>>     obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>>>>>>     obj-$(CONFIG_OF_OVERLAY) += overlay.o
>>>>>>     obj-$(CONFIG_OF_NUMA) += of_numa.o
>>>>>> -
>>>>>> -ifdef CONFIG_KEXEC_FILE
>>>>>> -ifdef CONFIG_OF_FLATTREE
>>>>>> -obj-y     += kexec.o
>>>>>> -endif
>>>>>> -endif
>>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
>>>>>>       obj-$(CONFIG_OF_UNITTEST) += unittest-data/
>>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
>>>>>
>>>>
>>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
>>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
>>>>
>>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
>>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
>>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
>>>> breaks the build for arm64.
>>>
>>> One problem is that I believe that this patch won't placate the robot,
>>> because IIUC it generates config files at random and this change still
>>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
>>
>> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
>> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
>> would not be a problem.
>>
>>>
>>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
>>> would still allow building kexec.o, but would be used inside kexec.c to
>>> avoid accessing kimage.arch members.
>>>
>>
>> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
>> be selected by arm64 and ppc for now. I tried this, and it fixes the
>> build issue.
>>
>> Although, the name for the new config can be misleading since PARISC,
>> for instance, also defines "struct kimage_arch". Perhaps,
>> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
>> accessing ELF specific fields in "struct kimage_arch"?
>>
>> Rob/Mimi - please let us know which approach you think is better.
> 
> I'd just move the fields to kimage.
> 

I think Mimi's suggestion to use CONFIG_HAVE_IMA_KEXEC for building 
drivers/of/kexec.c would work and also avoid the bisect issue if we do 
the following:

  - In the patch set for carrying forward the IMA log on kexec, move the 
following patch to a later point in the set

"[PATCH v18 04/11] arm64: Use common of_kexec_alloc_and_setup_fdt()"

and merge the above patch with the following patch
"[PATCH v18 11/11] arm64: Enable passing IMA log to next kernel on kexec"

I will try this now, and update.

thanks,
  -lakshmi

^ permalink raw reply

* Re: [PATCH] powerpc/kexec_file: Restore FDT size estimation for kdump kernel
From: Lakshmi Ramasubramanian @ 2021-02-19 16:00 UTC (permalink / raw)
  To: Thiago Jung Bauermann, linuxppc-dev
  Cc: Rob Herring, kexec, linux-kernel, Mimi Zohar, Hari Bathini
In-Reply-To: <20210219142552.762608-1-bauerman@linux.ibm.com>

On 2/19/21 6:25 AM, Thiago Jung Bauermann wrote:

One small nit in the function header (please see below), but otherwise 
the change looks good.

Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>

> Commit 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump
> kernel") fixed how elf64_load() estimates the FDT size needed by the
> crashdump kernel.
> 
> At the same time, commit 130b2d59cec0 ("powerpc: Use common
> of_kexec_alloc_and_setup_fdt()") changed the same code to use the generic
> function of_kexec_alloc_and_setup_fdt() to calculate the FDT size. That
> change made the code overestimate it a bit by counting twice the space
> required for the kernel command line and /chosen properties.
> 
> Therefore change kexec_fdt_totalsize_ppc64() to calculate just the extra
> space needed by the kdump kernel, and change the function name so that it
> better reflects what the function is now doing.
> 
> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/kexec.h  |  2 +-
>   arch/powerpc/kexec/elf_64.c       |  2 +-
>   arch/powerpc/kexec/file_load_64.c | 26 ++++++++------------------
>   3 files changed, 10 insertions(+), 20 deletions(-)
> 
> Applies on top of next-20210219.
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index baab158e215c..5a11cc8d2350 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -128,7 +128,7 @@ int load_crashdump_segments_ppc64(struct kimage *image,
>   int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
>   			  const void *fdt, unsigned long kernel_load_addr,
>   			  unsigned long fdt_load_addr);
> -unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image);
> +unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
>   int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>   			unsigned long initrd_load_addr,
>   			unsigned long initrd_len, const char *cmdline);
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index 0492ca6003f3..5a569bb51349 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -104,7 +104,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
>   
>   	fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr,
>   					   initrd_len, cmdline,
> -					   kexec_fdt_totalsize_ppc64(image));
> +					   kexec_extra_fdt_size_ppc64(image));
>   	if (!fdt) {
>   		pr_err("Error setting up the new device tree.\n");
>   		ret = -EINVAL;
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index 3609de30a170..8541ba731908 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -927,37 +927,27 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
>   }
>   
>   /**
> - * kexec_fdt_totalsize_ppc64 - Return the estimated size needed to setup FDT
> - *                             for kexec/kdump kernel.
> - * @image:                     kexec image being loaded.
> + * kexec_extra_fdt_size_ppc63 - Return the estimated size needed to setup FDT

Perhaps change to

"Return the estimated additional size needed to setup FDT for 
kexec/kdump kernel"?

  -lakshmi

> + *                              for kexec/kdump kernel.
> + * @image:                      kexec image being loaded.
>    *
> - * Returns the estimated size needed for kexec/kdump kernel FDT.
> + * Returns the estimated extra size needed for kexec/kdump kernel FDT.
>    */
> -unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image)
> +unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
>   {
> -	unsigned int fdt_size;
>   	u64 usm_entries;
>   
> -	/*
> -	 * The below estimate more than accounts for a typical kexec case where
> -	 * the additional space is to accommodate things like kexec cmdline,
> -	 * chosen node with properties for initrd start & end addresses and
> -	 * a property to indicate kexec boot..
> -	 */
> -	fdt_size = fdt_totalsize(initial_boot_params) + (2 * COMMAND_LINE_SIZE);
>   	if (image->type != KEXEC_TYPE_CRASH)
> -		return fdt_size;
> +		return 0;
>   
>   	/*
> -	 * For kdump kernel, also account for linux,usable-memory and
> +	 * For kdump kernel, account for linux,usable-memory and
>   	 * linux,drconf-usable-memory properties. Get an approximate on the
>   	 * number of usable memory entries and use for FDT size estimation.
>   	 */
>   	usm_entries = ((memblock_end_of_DRAM() / drmem_lmb_size()) +
>   		       (2 * (resource_size(&crashk_res) / drmem_lmb_size())));
> -	fdt_size += (unsigned int)(usm_entries * sizeof(u64));
> -
> -	return fdt_size;
> +	return (unsigned int)(usm_entries * sizeof(u64));
>   }
>   
>   /**
> 


^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Mimi Zohar @ 2021-02-19 14:41 UTC (permalink / raw)
  To: Thiago Jung Bauermann, Lakshmi Ramasubramanian
  Cc: sashal, robh, sfr, gregkh, linuxppc-dev, linux-kernel,
	takahiro.akashi, devicetree, james.morse, catalin.marinas, joe,
	linux-integrity, will, linux-arm-kernel
In-Reply-To: <87eehcxi88.fsf@manicouagan.localdomain>

On Fri, 2021-02-19 at 11:08 -0300, Thiago Jung Bauermann wrote:
> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> 
> > On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
> >> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> >> 
> >>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
> >>>
> >>> Hi Mimi,
> >>>
> >>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
> >>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
> >>>>> a new device tree object that includes architecture specific data
> >>>>> for kexec system call.  This should be defined only if the architecture
> >>>>> being built defines kexec architecture structure "struct kimage_arch".
> >>>>>
> >>>>> Define a new boolean config OF_KEXEC that is enabled if
> >>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
> >>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
> >>>>> if CONFIG_OF_KEXEC is enabled.
> >>>>>
> >>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> >>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
> >>>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>>> ---
> >>>>>    drivers/of/Kconfig  | 6 ++++++
> >>>>>    drivers/of/Makefile | 7 +------
> >>>>>    2 files changed, 7 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> >>>>> index 18450437d5d5..f2e8fa54862a 100644
> >>>>> --- a/drivers/of/Kconfig
> >>>>> +++ b/drivers/of/Kconfig
> >>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
> >>>>>    	# arches should select this if DMA is coherent by default for OF devices
> >>>>>    	bool
> >>>>>    +config OF_KEXEC
> >>>>> +	bool
> >>>>> +	depends on KEXEC_FILE
> >>>>> +	depends on OF_FLATTREE
> >>>>> +	default y if ARM64 || PPC64
> >>>>> +
> >>>>>    endif # OF
> >>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> >>>>> index c13b982084a3..287579dd1695 100644
> >>>>> --- a/drivers/of/Makefile
> >>>>> +++ b/drivers/of/Makefile
> >>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
> >>>>>    obj-$(CONFIG_OF_RESOLVE)  += resolver.o
> >>>>>    obj-$(CONFIG_OF_OVERLAY) += overlay.o
> >>>>>    obj-$(CONFIG_OF_NUMA) += of_numa.o
> >>>>> -
> >>>>> -ifdef CONFIG_KEXEC_FILE
> >>>>> -ifdef CONFIG_OF_FLATTREE
> >>>>> -obj-y	+= kexec.o
> >>>>> -endif
> >>>>> -endif
> >>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
> >>>>>      obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> >>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
> >>>>
> >>>
> >>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
> >>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
> >>>
> >>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
> >>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
> >>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
> >>> breaks the build for arm64.
> >> One problem is that I believe that this patch won't placate the robot,
> >> because IIUC it generates config files at random and this change still
> >> allows hppa and s390 to enable CONFIG_OF_KEXEC.
> >
> > I enabled CONFIG_OF_KEXEC for s390. With my patch applied, CONFIG_OF_KEXEC is
> > removed. So I think the robot enabling this config would not be a problem.
> >
> >> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
> >> would still allow building kexec.o, but would be used inside kexec.c to
> >> avoid accessing kimage.arch members.
> >> 
> >
> > I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will be
> > selected by arm64 and ppc for now. I tried this, and it fixes the build issue.
> >
> > Although, the name for the new config can be misleading since PARISC, for
> > instance, also defines "struct kimage_arch". Perhaps,
> > CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is 
> > accessing ELF specific fields in "struct kimage_arch"?
> 
> Ah, right. I should have digged into the code before making my
> suggestion. CONFIG_HAVE_KIMAGE_ARCH isn't appropriate, indeed.
> 
> >
> > Rob/Mimi - please let us know which approach you think is better.
> 
> Ah! We can actually use the existing CONFIG_HAVE_IMA_KEXEC, no? I don't
> know why I didn't think of it before.

Including kexec.o based on CONFIG_HAVE_IMA_KEXEC is a bisect issue on
ARM64, as Lakshmi pointed out.   Defining a new, maybe temporary, flag
would solve the problem.

Mimi



^ permalink raw reply

* [PATCH] powerpc/kexec_file: Restore FDT size estimation for kdump kernel
From: Thiago Jung Bauermann @ 2021-02-19 14:25 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Rob Herring, kexec, linux-kernel, Mimi Zohar,
	Lakshmi Ramasubramanian, Thiago Jung Bauermann, Hari Bathini

Commit 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump
kernel") fixed how elf64_load() estimates the FDT size needed by the
crashdump kernel.

At the same time, commit 130b2d59cec0 ("powerpc: Use common
of_kexec_alloc_and_setup_fdt()") changed the same code to use the generic
function of_kexec_alloc_and_setup_fdt() to calculate the FDT size. That
change made the code overestimate it a bit by counting twice the space
required for the kernel command line and /chosen properties.

Therefore change kexec_fdt_totalsize_ppc64() to calculate just the extra
space needed by the kdump kernel, and change the function name so that it
better reflects what the function is now doing.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |  2 +-
 arch/powerpc/kexec/elf_64.c       |  2 +-
 arch/powerpc/kexec/file_load_64.c | 26 ++++++++------------------
 3 files changed, 10 insertions(+), 20 deletions(-)

Applies on top of next-20210219.

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index baab158e215c..5a11cc8d2350 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -128,7 +128,7 @@ int load_crashdump_segments_ppc64(struct kimage *image,
 int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
 			  const void *fdt, unsigned long kernel_load_addr,
 			  unsigned long fdt_load_addr);
-unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image);
+unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
 int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
 			unsigned long initrd_load_addr,
 			unsigned long initrd_len, const char *cmdline);
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 0492ca6003f3..5a569bb51349 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -104,7 +104,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 
 	fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr,
 					   initrd_len, cmdline,
-					   kexec_fdt_totalsize_ppc64(image));
+					   kexec_extra_fdt_size_ppc64(image));
 	if (!fdt) {
 		pr_err("Error setting up the new device tree.\n");
 		ret = -EINVAL;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 3609de30a170..8541ba731908 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -927,37 +927,27 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
 }
 
 /**
- * kexec_fdt_totalsize_ppc64 - Return the estimated size needed to setup FDT
- *                             for kexec/kdump kernel.
- * @image:                     kexec image being loaded.
+ * kexec_extra_fdt_size_ppc63 - Return the estimated size needed to setup FDT
+ *                              for kexec/kdump kernel.
+ * @image:                      kexec image being loaded.
  *
- * Returns the estimated size needed for kexec/kdump kernel FDT.
+ * Returns the estimated extra size needed for kexec/kdump kernel FDT.
  */
-unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image)
+unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 {
-	unsigned int fdt_size;
 	u64 usm_entries;
 
-	/*
-	 * The below estimate more than accounts for a typical kexec case where
-	 * the additional space is to accommodate things like kexec cmdline,
-	 * chosen node with properties for initrd start & end addresses and
-	 * a property to indicate kexec boot..
-	 */
-	fdt_size = fdt_totalsize(initial_boot_params) + (2 * COMMAND_LINE_SIZE);
 	if (image->type != KEXEC_TYPE_CRASH)
-		return fdt_size;
+		return 0;
 
 	/*
-	 * For kdump kernel, also account for linux,usable-memory and
+	 * For kdump kernel, account for linux,usable-memory and
 	 * linux,drconf-usable-memory properties. Get an approximate on the
 	 * number of usable memory entries and use for FDT size estimation.
 	 */
 	usm_entries = ((memblock_end_of_DRAM() / drmem_lmb_size()) +
 		       (2 * (resource_size(&crashk_res) / drmem_lmb_size())));
-	fdt_size += (unsigned int)(usm_entries * sizeof(u64));
-
-	return fdt_size;
+	return (unsigned int)(usm_entries * sizeof(u64));
 }
 
 /**

^ permalink raw reply related

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Rob Herring @ 2021-02-19 14:16 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Sasha Levin, Stephen Rothwell, devicetree, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel@vger.kernel.org, Mimi Zohar,
	AKASHI, Takahiro, James Morse, Catalin Marinas, Joe Perches,
	linux-integrity, Will Deacon, Thiago Jung Bauermann,
	linux-arm-kernel
In-Reply-To: <3ca0aa87-ca83-8024-4067-c2382a360db9@linux.microsoft.com>

On Thu, Feb 18, 2021 at 8:53 PM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
> >
> > Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
> >
> >> On 2/18/21 4:07 PM, Mimi Zohar wrote:
> >>
> >> Hi Mimi,
> >>
> >>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
> >>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
> >>>> a new device tree object that includes architecture specific data
> >>>> for kexec system call.  This should be defined only if the architecture
> >>>> being built defines kexec architecture structure "struct kimage_arch".
> >>>>
> >>>> Define a new boolean config OF_KEXEC that is enabled if
> >>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
> >>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
> >>>> if CONFIG_OF_KEXEC is enabled.
> >>>>
> >>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> >>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
> >>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>> ---
> >>>>    drivers/of/Kconfig  | 6 ++++++
> >>>>    drivers/of/Makefile | 7 +------
> >>>>    2 files changed, 7 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> >>>> index 18450437d5d5..f2e8fa54862a 100644
> >>>> --- a/drivers/of/Kconfig
> >>>> +++ b/drivers/of/Kconfig
> >>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
> >>>>            # arches should select this if DMA is coherent by default for OF devices
> >>>>            bool
> >>>>    +config OF_KEXEC
> >>>> +  bool
> >>>> +  depends on KEXEC_FILE
> >>>> +  depends on OF_FLATTREE
> >>>> +  default y if ARM64 || PPC64
> >>>> +
> >>>>    endif # OF
> >>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> >>>> index c13b982084a3..287579dd1695 100644
> >>>> --- a/drivers/of/Makefile
> >>>> +++ b/drivers/of/Makefile
> >>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
> >>>>    obj-$(CONFIG_OF_RESOLVE)  += resolver.o
> >>>>    obj-$(CONFIG_OF_OVERLAY) += overlay.o
> >>>>    obj-$(CONFIG_OF_NUMA) += of_numa.o
> >>>> -
> >>>> -ifdef CONFIG_KEXEC_FILE
> >>>> -ifdef CONFIG_OF_FLATTREE
> >>>> -obj-y     += kexec.o
> >>>> -endif
> >>>> -endif
> >>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
> >>>>      obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> >>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
> >>>
> >>
> >> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
> >> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
> >>
> >> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
> >> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
> >> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
> >> breaks the build for arm64.
> >
> > One problem is that I believe that this patch won't placate the robot,
> > because IIUC it generates config files at random and this change still
> > allows hppa and s390 to enable CONFIG_OF_KEXEC.
>
> I enabled CONFIG_OF_KEXEC for s390. With my patch applied,
> CONFIG_OF_KEXEC is removed. So I think the robot enabling this config
> would not be a problem.
>
> >
> > Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
> > would still allow building kexec.o, but would be used inside kexec.c to
> > avoid accessing kimage.arch members.
> >
>
> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will
> be selected by arm64 and ppc for now. I tried this, and it fixes the
> build issue.
>
> Although, the name for the new config can be misleading since PARISC,
> for instance, also defines "struct kimage_arch". Perhaps,
> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is
> accessing ELF specific fields in "struct kimage_arch"?
>
> Rob/Mimi - please let us know which approach you think is better.

I'd just move the fields to kimage.

Rob

^ permalink raw reply

* Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'
From: Thiago Jung Bauermann @ 2021-02-19 14:08 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: sashal, robh, sfr, gregkh, linuxppc-dev, linux-kernel, Mimi Zohar,
	takahiro.akashi, devicetree, james.morse, catalin.marinas, joe,
	linux-integrity, will, linux-arm-kernel
In-Reply-To: <3ca0aa87-ca83-8024-4067-c2382a360db9@linux.microsoft.com>


Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:

> On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:
>> Lakshmi Ramasubramanian <nramas@linux.microsoft.com> writes:
>> 
>>> On 2/18/21 4:07 PM, Mimi Zohar wrote:
>>>
>>> Hi Mimi,
>>>
>>>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
>>>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
>>>>> a new device tree object that includes architecture specific data
>>>>> for kexec system call.  This should be defined only if the architecture
>>>>> being built defines kexec architecture structure "struct kimage_arch".
>>>>>
>>>>> Define a new boolean config OF_KEXEC that is enabled if
>>>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
>>>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
>>>>> if CONFIG_OF_KEXEC is enabled.
>>>>>
>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>> ---
>>>>>    drivers/of/Kconfig  | 6 ++++++
>>>>>    drivers/of/Makefile | 7 +------
>>>>>    2 files changed, 7 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>>>>> index 18450437d5d5..f2e8fa54862a 100644
>>>>> --- a/drivers/of/Kconfig
>>>>> +++ b/drivers/of/Kconfig
>>>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
>>>>>    	# arches should select this if DMA is coherent by default for OF devices
>>>>>    	bool
>>>>>    +config OF_KEXEC
>>>>> +	bool
>>>>> +	depends on KEXEC_FILE
>>>>> +	depends on OF_FLATTREE
>>>>> +	default y if ARM64 || PPC64
>>>>> +
>>>>>    endif # OF
>>>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
>>>>> index c13b982084a3..287579dd1695 100644
>>>>> --- a/drivers/of/Makefile
>>>>> +++ b/drivers/of/Makefile
>>>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>>>>>    obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>>>>>    obj-$(CONFIG_OF_OVERLAY) += overlay.o
>>>>>    obj-$(CONFIG_OF_NUMA) += of_numa.o
>>>>> -
>>>>> -ifdef CONFIG_KEXEC_FILE
>>>>> -ifdef CONFIG_OF_FLATTREE
>>>>> -obj-y	+= kexec.o
>>>>> -endif
>>>>> -endif
>>>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
>>>>>      obj-$(CONFIG_OF_UNITTEST) += unittest-data/
>>>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
>>>>
>>>
>>> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
>>> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
>>>
>>> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
>>> set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
>>> of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
>>> breaks the build for arm64.
>> One problem is that I believe that this patch won't placate the robot,
>> because IIUC it generates config files at random and this change still
>> allows hppa and s390 to enable CONFIG_OF_KEXEC.
>
> I enabled CONFIG_OF_KEXEC for s390. With my patch applied, CONFIG_OF_KEXEC is
> removed. So I think the robot enabling this config would not be a problem.
>
>> Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
>> would still allow building kexec.o, but would be used inside kexec.c to
>> avoid accessing kimage.arch members.
>> 
>
> I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will be
> selected by arm64 and ppc for now. I tried this, and it fixes the build issue.
>
> Although, the name for the new config can be misleading since PARISC, for
> instance, also defines "struct kimage_arch". Perhaps,
> CONFIG_HAVE_ELF_KIMAGE_ARCH since of_kexec_alloc_and_setup_fdt() is 
> accessing ELF specific fields in "struct kimage_arch"?

Ah, right. I should have digged into the code before making my
suggestion. CONFIG_HAVE_KIMAGE_ARCH isn't appropriate, indeed.

>
> Rob/Mimi - please let us know which approach you think is better.

Ah! We can actually use the existing CONFIG_HAVE_IMA_KEXEC, no? I don't
know why I didn't think of it before.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings
From: Ding Tianhong @ 2021-02-19  8:52 UTC (permalink / raw)
  To: Nicholas Piggin, Andrew Morton, linux-mm
  Cc: linux-arch, linux-kernel, Christoph Hellwig, Jonathan Cameron,
	Rick Edgecombe, linuxppc-dev
In-Reply-To: <1613720396.pnvmwaa8om.astroid@bobo.none>

On 2021/2/19 15:45, Nicholas Piggin wrote:
> Excerpts from Ding Tianhong's message of February 19, 2021 1:45 pm:
>> Hi Nicholas:
>>
>> I met some problem for this patch, like this:
>>
>> kva = vmalloc(3*1024k);
>>
>> remap_vmalloc_range(xxx, kva, xxx)
>>
>> It failed because that the check for page_count(page) is null so return, it break the some logic for current modules.
>> because the new huge page is not valid for composed page.
> 
> Hey Ding, that's a good catch. How are you testing this stuff, do you 
> have a particular driver that does this?
> 

yes, The driver would get a memory from the vmalloc in kernel space, and then the physical same memory will mmap to the user space. The drivers could not work when applying this patch.

>> I think some guys really don't get used to the changes for the vmalloc that the small pages was transparency to the hugepage
>> when the size is bigger than the PMD_SIZE.
> 
> I think in this case vmalloc could allocate the large page as a compound
> page which would solve this problem I think? (without having actually 
> tested it)
> 

yes, i think the __GFP_COMP flag could fix this.

>> can we think about give a new static huge page to fix it? just like use a a new vmalloc_huge_xxx function to disginguish the current function,
>> the user could choose to use the transparent hugepage or static hugepage for vmalloc.
> 
> Yeah that's a good question, there are a few things in the huge vmalloc 
> code that accounts things as small pages and you can't assume large or 
> small. If there is benefit from forcing large pages that could certainly
> be added.
> 

The vmalloc transparent is good, but not fit every user scenes, some guys like to use the deterministic function
for performance critical area.

Thanks
Ding

> Interestingly, remap_vmalloc_range in theory could map the pages as 
> large in userspace as well. That takes more work but if something
> really needs that for performance, it could be done.
> 
> Thanks,
> Nick
> .
> 


^ permalink raw reply

* Re: [RFC PATCH 1/9] KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
From: Nicholas Piggin @ 2021-02-19  8:03 UTC (permalink / raw)
  To: Daniel Axtens, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <87o8ggab50.fsf@linkitivity.dja.id.au>

Excerpts from Daniel Axtens's message of February 19, 2021 3:18 pm:
> Hi Nick,
> 
>> +++ b/arch/powerpc/kvm/book3s_64_entry.S
>> @@ -0,0 +1,34 @@
>> +#include <asm/cache.h>
>> +#include <asm/ppc_asm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/reg.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/kvm_book3s_asm.h>
>> +
>> +/*
>> + * We come here from the first-level interrupt handlers.
>> + */
>> +.global	kvmppc_interrupt
>> +.balign IFETCH_ALIGN_BYTES
>> +kvmppc_interrupt:
>> +	/*
>> +	 * Register contents:
> 
> Clearly r9 contains some data at this point, and I think it's guest r9
> because of what you say later on in
> book3s_hv_rmhandlers.S::kvmppc_interrupt_hv. Is that right?

Yes I hope so.

> Should that
> be documented in this comment as well?

Normally it can be assumed the registers not explicitly enumerated would 
be unchanged from the interrupted context, so they would all contain 
guest values. I added the R9 contents comment later because I changed
it later.

> 
>> +	 * R12		= (guest CR << 32) | interrupt vector
>> +	 * R13		= PACA
>> +	 * guest R12 saved in shadow VCPU SCRATCH0
>> +	 * guest R13 saved in SPRN_SCRATCH0
>> +	 */
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +	std	r9, HSTATE_SCRATCH2(r13)
>> +	lbz	r9, HSTATE_IN_GUEST(r13)
>> +	cmpwi	r9, KVM_GUEST_MODE_HOST_HV
>> +	beq	kvmppc_bad_host_intr
>> +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>> +	cmpwi	r9, KVM_GUEST_MODE_GUEST
>> +	ld	r9, HSTATE_SCRATCH2(r13)
>> +	beq	kvmppc_interrupt_pr
>> +#endif
>> +	b	kvmppc_interrupt_hv
>> +#else
>> +	b	kvmppc_interrupt_pr
>> +#endif
> 
> Apart from that I had a look and convinced myself that the code will
> behave the same as before. On that basis:
> 
> Reviewed-by: Daniel Axtens <dja@axtens.net>
> 
> Kind regards,
> Daniel

Thanks,
Nick
> 

^ permalink raw reply

* Re: [RFC PATCH 2/9] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
From: Nicholas Piggin @ 2021-02-19  7:56 UTC (permalink / raw)
  To: Daniel Axtens, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <87lfbka92i.fsf@linkitivity.dja.id.au>

Excerpts from Daniel Axtens's message of February 19, 2021 4:03 pm:
> Hi Nick,
> 
>> +maybe_skip:
>> +	cmpwi	r12,0x200
>> +	beq	1f
>> +	cmpwi	r12,0x300
>> +	beq	1f
>> +	cmpwi	r12,0x380
>> +	beq	1f
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +	/* XXX: cbe stuff? instruction breakpoint? */
>> +	cmpwi	r12,0xe02
>> +	beq	2f
>> +#endif
>> +	b	no_skip
>> +1:	mfspr	r9,SPRN_SRR0
>> +	addi	r9,r9,4
>> +	mtspr	SPRN_SRR0,r9
>> +	ld	r12,HSTATE_SCRATCH0(r13)
>> +	ld	r9,HSTATE_SCRATCH2(r13)
>> +	GET_SCRATCH0(r13)
>> +	RFI_TO_KERNEL
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +2:	mfspr	r9,SPRN_HSRR0
>> +	addi	r9,r9,4
>> +	mtspr	SPRN_HSRR0,r9
>> +	ld	r12,HSTATE_SCRATCH0(r13)
>> +	ld	r9,HSTATE_SCRATCH2(r13)
>> +	GET_SCRATCH0(r13)
>> +	HRFI_TO_KERNEL
>> +#endif
> 
> If I understand correctly, label 1 is the kvmppc_skip_interrupt and
> label 2 is the kvmppc_skip_Hinterrupt. Would it be easier to understand
> if we used symbolic labels, or do you think the RFI_TO_KERNEL vs
> HRFI_TO_KERNEL and other changes are sufficient?

Yeah my thinking was it's okay this way because we've got all the 
context there, whereas prior to this patch those were branched to from 
far away places so the names helped more.

If the discontiguity or nesting was any larger then yes I would say 
naming the labels is probably a good idea (e.g., maybe_skip / no_skip).

> Apart from that, I haven't checked the precise copy-paste to make sure
> nothing has changed by accident, but I am able to follow the general
> idea of the patch and am vigorously in favour of anything that
> simplifies our exception/interrupt paths!

Thanks,
Nick

^ permalink raw reply

* Re: [RFC PATCH 2/9] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
From: Nicholas Piggin @ 2021-02-19  7:53 UTC (permalink / raw)
  To: Fabiano Rosas, kvm-ppc; +Cc: linuxppc-dev
In-Reply-To: <87tuqhxc01.fsf@linux.ibm.com>

Excerpts from Fabiano Rosas's message of February 13, 2021 6:33 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM
>> internal detail that has no real need to be in common handlers.
> 
> LGTM,
> 
>>
>> (XXX: Need to confirm CBE handlers etc)
> 
> Do these interrupts exist only in Cell? I see that they set HSRRs and
> MSR_HV, but CPU_FTRS_CELL does not contain CPU_HVMODE. So I don't get
> why they use the KVM macros.

Good question, I asked Michael Ellerman and he said there is a bare 
metal Cell which predates or otherwise does not define HVMODE.

However it does not support KVM. So I think we can remove it.

> And for the instruction_breakpoint (0x1300) I think it would help if we
> could at least restrict when it is built. But I don't know what
> ISA/processor version it is from.

Yeah we should be documenting these non-architected handlers a little 
better IMO.

I think we can get rid of the kvm skip from this one though. I've done 
that in a separate patch in the next series.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings
From: Nicholas Piggin @ 2021-02-19  7:45 UTC (permalink / raw)
  To: Andrew Morton, Ding Tianhong, linux-mm
  Cc: linux-arch, linux-kernel, Christoph Hellwig, Jonathan Cameron,
	Rick Edgecombe, linuxppc-dev
In-Reply-To: <e18ef38c-ecef-b15c-29b1-bd4acf0e7fe5@huawei.com>

Excerpts from Ding Tianhong's message of February 19, 2021 1:45 pm:
> Hi Nicholas:
> 
> I met some problem for this patch, like this:
> 
> kva = vmalloc(3*1024k);
> 
> remap_vmalloc_range(xxx, kva, xxx)
> 
> It failed because that the check for page_count(page) is null so return, it break the some logic for current modules.
> because the new huge page is not valid for composed page.

Hey Ding, that's a good catch. How are you testing this stuff, do you 
have a particular driver that does this?

> I think some guys really don't get used to the changes for the vmalloc that the small pages was transparency to the hugepage
> when the size is bigger than the PMD_SIZE.

I think in this case vmalloc could allocate the large page as a compound
page which would solve this problem I think? (without having actually 
tested it)

> can we think about give a new static huge page to fix it? just like use a a new vmalloc_huge_xxx function to disginguish the current function,
> the user could choose to use the transparent hugepage or static hugepage for vmalloc.

Yeah that's a good question, there are a few things in the huge vmalloc 
code that accounts things as small pages and you can't assume large or 
small. If there is benefit from forcing large pages that could certainly
be added.

Interestingly, remap_vmalloc_range in theory could map the pages as 
large in userspace as well. That takes more work but if something
really needs that for performance, it could be done.

Thanks,
Nick

^ permalink raw reply

* [PATCH 13/13] KVM: PPC: Book3S HV: Implement the rest of the P9 entry/exit handling in C
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

Almost all logic is moved to C, by introducing a new in_guest mode that
selects and branches very early in the interrupt handler to the P9 exit
code.

The remaining assembly is only about 160 lines of low level stack setup,
with VCPU vs host register save and restore, plus a small shim to the
legacy paths in the interrupt handler.

There are two motivations for this, the first is just make the code more
maintainable being in C. The second is to reduce the amount of code
running in a special KVM mode, "realmode". I put that in quotes because
with radix it is no longer necessarily real-mode in the MMU, but it
still has to be treated specially because it may be in real-mode, and
has various important registers like PID, DEC, TB, etc set to guest.
This is hostile to the rest of Linux and can't use arbitrary kernel
functionality or be instrumented well.

This initial patch is a reasonably faithful conversion of the asm code,
one notable difference being no real-mode hcall handler which necessitates
a small change to the H_CEDE call (although that makes it nicely common
with the nested HV case).

This also lacks any loop to return quickly back into the guest without
switching out of realmode in the case of unimportant or easily handled
interrupts.

The point was to reduce complexity and improve the ability to instrument
things, and it actually remains to be seen whether these short cuts are
required for modern processors. Radix, independent threads, XIVE, large
decrementer etc should all combine to vastly reduce guest exit frequency
compared with a POWER8. And radix mode additionally makes such exits
less costly by avoiding the need to flush and reload the SLB, flush
ERATs, etc.

Some of these things may be re-added if performance requires, or if more
of the other paths begin to be converted into C as well.

not-yet-Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |   3 +-
 arch/powerpc/include/asm/kvm_asm.h        |   3 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |   2 +
 arch/powerpc/include/asm/kvm_ppc.h        |   2 +
 arch/powerpc/kernel/security.c            |   5 +-
 arch/powerpc/kvm/Makefile                 |   3 +
 arch/powerpc/kvm/book3s_64_entry.S        | 180 +++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c              |  19 +-
 arch/powerpc/kvm/book3s_hv_interrupt.c    | 208 ++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 112 +-----------
 arch/powerpc/kvm/book3s_xive.c            |  32 ++++
 11 files changed, 453 insertions(+), 116 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_interrupt.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index d0b832cbbec8..00eb5224019c 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -151,6 +151,7 @@ extern s32 patch__call_flush_branch_caches3;
 extern s32 patch__flush_count_cache_return;
 extern s32 patch__flush_link_stack_return;
 extern s32 patch__call_kvm_flush_link_stack;
+extern s32 patch__call_kvm_flush_link_stack_2;
 extern s32 patch__memset_nocache, patch__memcpy_nocache;
 
 extern long flush_branch_caches;
@@ -171,7 +172,7 @@ void kvmhv_load_host_pmu(void);
 void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
 void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
 
-int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
 
 long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
 long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr,
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index a3633560493b..b4f9996bd331 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -146,7 +146,8 @@
 #define KVM_GUEST_MODE_GUEST	1
 #define KVM_GUEST_MODE_SKIP	2
 #define KVM_GUEST_MODE_GUEST_HV	3
-#define KVM_GUEST_MODE_HOST_HV	4
+#define KVM_GUEST_MODE_GUEST_HV_FAST	4 /* ISA v3.0 with host radix mode */
+#define KVM_GUEST_MODE_HOST_HV	5
 
 #define KVM_INST_FETCH_FAILED	-1
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9bb9bb370b53..7f08f6ed73df 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,6 +153,8 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
 	return radix;
 }
 
+int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 45b7610773b1..013e9d8b0754 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -670,6 +670,7 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 			       int level, bool line_status);
 extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
 
 static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
 {
@@ -710,6 +711,7 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) { retur
 static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 				      int level, bool line_status) { return -ENODEV; }
 static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
 
 static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
 	{ return 0; }
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index e4e1a94ccf6a..6c37aeed0650 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -430,16 +430,19 @@ device_initcall(stf_barrier_debugfs_init);
 
 static void update_branch_cache_flush(void)
 {
-	u32 *site;
+	u32 *site, __maybe_unused *site2;
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	site = &patch__call_kvm_flush_link_stack;
+	site2 = &patch__call_kvm_flush_link_stack_2;
 	// This controls the branch from guest_exit_cont to kvm_flush_link_stack
 	if (link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
 		patch_instruction_site(site, ppc_inst(PPC_INST_NOP));
+		patch_instruction_site(site2, ppc_inst(PPC_INST_NOP));
 	} else {
 		// Could use HW flush, but that could also flush count cache
 		patch_branch_site(site, (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
+		patch_branch_site(site2, (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
 	}
 #endif
 
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index cdd119028f64..b94be8c9bad1 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -43,6 +43,9 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
 kvm-book3s_64-builtin-objs-$(CONFIG_SPAPR_TCE_IOMMU) := \
 	book3s_64_vio_hv.o
 
+kvm-book3s_64-objs-y += \
+	book3s_hv_interrupt.o
+
 kvm-pr-y := \
 	fpu.o \
 	emulate.o \
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 6f06b58b1bdd..5fb605dc457c 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -1,10 +1,13 @@
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
+#include <asm/code-patching-asm.h>
 #include <asm/exception-64s.h>
+#include <asm/export.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_book3s_asm.h>
 #include <asm/ppc_asm.h>
 #include <asm/reg.h>
+#include <asm/ultravisor-api.h>
 
 /*
  * These are branched to from interrupt handlers in exception-64s.S which set
@@ -13,13 +16,24 @@
 .global	kvmppc_hcall
 .balign IFETCH_ALIGN_BYTES
 kvmppc_hcall:
+	lbz	r10,HSTATE_IN_GUEST(r13)
+	cmpwi	r10,KVM_GUEST_MODE_GUEST_HV_FAST
+	beq	kvmppc_p9_exit_hcall
 	ld	r10,PACA_EXGEN+EX_R13(r13)
 	SET_SCRATCH0(r10)
 	li	r10,0xc00
+	li	r11,PACA_EXGEN
+	b	1f
 
 .global	kvmppc_interrupt
 .balign IFETCH_ALIGN_BYTES
 kvmppc_interrupt:
+	std	r10,HSTATE_SCRATCH0(r13)
+	lbz	r10,HSTATE_IN_GUEST(r13)
+	cmpwi	r10,KVM_GUEST_MODE_GUEST_HV_FAST
+	beq	kvmppc_p9_exit_interrupt
+	ld	r10,HSTATE_SCRATCH0(r13)
+	lbz	r11,HSTATE_IN_GUEST(r13)
 	li	r11,PACA_EXGEN
 	cmpdi	r10,0x200
 	bgt+	1f
@@ -113,3 +127,169 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	GET_SCRATCH0(r13)
 	HRFI_TO_KERNEL
 #endif
+
+/* Stack frame offsets for kvmppc_hv_entry */
+#define SFS			208
+#define STACK_SLOT_VCPU		(SFS-8)
+#define STACK_SLOT_NVGPRS	(SFS-152)	/* 18 gprs */
+
+/*
+ * void kvmppc_p9_enter_guest(struct vcpu *vcpu);
+ *
+ * Enter the guest on a ISAv3.0 or later system where we have exactly
+ * one vcpu per vcore, and both the host and guest are radix, and threads
+ * are set to "indepdent mode".
+ */
+.balign	IFETCH_ALIGN_BYTES
+_GLOBAL(kvmppc_p9_enter_guest)
+EXPORT_SYMBOL_GPL(kvmppc_p9_enter_guest)
+	mflr	r0
+	std	r0, PPC_LR_STKOFF(r1)
+	stdu	r1, -SFS(r1)
+
+	std	r1, HSTATE_HOST_R1(r13)
+	std	r3, STACK_SLOT_VCPU(r1)
+
+	mfcr	r4
+	stw	r4, SFS+8(r1)
+
+	reg = 14
+	.rept	18
+	std	reg, STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
+	reg = reg + 1
+	.endr
+
+	ld	r4,VCPU_LR(r3)
+	mtlr	r4
+	ld	r4,VCPU_CTR(r3)
+	mtctr	r4
+	ld	r4,VCPU_XER(r3)
+	mtspr	SPRN_XER,r4
+
+	ld	r1,VCPU_CR(r3)
+
+BEGIN_FTR_SECTION
+	ld	r4,VCPU_CFAR(r3)
+	mtspr	SPRN_CFAR,r4
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+	ld	r4,VCPU_PPR(r3)
+	mtspr	SPRN_PPR,r4
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	reg = 4
+	.rept	28
+	ld	reg, __VCPU_GPR(reg)(r3)
+	reg = reg + 1
+	.endr
+
+	ld	r4,VCPU_KVM(r3)
+	lbz	r4,KVM_SECURE_GUEST(r4)
+	cmpdi	r4,0
+	ld	r4,VCPU_GPR(R4)(r3)
+	bne	.Lret_to_ultra
+
+	mtcr	r1
+
+	ld	r0,VCPU_GPR(R0)(r3)
+	ld	r1,VCPU_GPR(R1)(r3)
+	ld	r2,VCPU_GPR(R2)(r3)
+	ld	r3,VCPU_GPR(R3)(r3)
+
+	HRFI_TO_GUEST
+	b	.
+
+	/*
+	 * Use UV_RETURN ultracall to return control back to the Ultravisor
+	 * after processing an hypercall or interrupt that was forwarded
+	 * (a.k.a. reflected) to the Hypervisor.
+	 *
+	 * All registers have already been reloaded except the ucall requires:
+	 *   R0 = hcall result
+	 *   R2 = SRR1, so UV can detect a synthesized interrupt (if any)
+	 *   R3 = UV_RETURN
+	 */
+.Lret_to_ultra:
+	mtcr	r1
+	ld	r1,VCPU_GPR(R1)(r3)
+
+	ld	r0,VCPU_GPR(R3)(r3)
+	mfspr	r2,SPRN_SRR1
+	LOAD_REG_IMMEDIATE(r3, UV_RETURN)
+	sc	2
+
+/*
+ * kvmppc_p9_exit_hcall and kvmppc_p9_exit_interrupt are branched to from
+ * above if the interrupt was taken for a guest that was entered via
+ * kvmppc_p9_enter_guest().
+ *
+ * This code recovers the host stack and vcpu pointer, saves all GPRs and
+ * CR, LR, CTR, XER as well as guest MSR and NIA into the VCPU, then re-
+ * establishes the host stack and registers to return from  the
+ * kvmppc_p9_enter_guest() function.
+ */
+.balign	IFETCH_ALIGN_BYTES
+kvmppc_p9_exit_hcall:
+	mfspr	r11,SPRN_SRR0
+	mfspr	r12,SPRN_SRR1
+	li	r10,0xc00
+	std	r10,HSTATE_SCRATCH0(r13)
+
+.balign	IFETCH_ALIGN_BYTES
+kvmppc_p9_exit_interrupt:
+	std     r1,HSTATE_SCRATCH1(r13)
+	std     r3,HSTATE_SCRATCH2(r13)
+	ld	r1,HSTATE_HOST_R1(r13)
+	ld	r3, STACK_SLOT_VCPU(r1)
+
+	std	r9,VCPU_CR(r3)
+
+1:
+	std	r11,VCPU_PC(r3)
+	std	r12,VCPU_MSR(r3)
+
+	reg = 14
+	.rept	18
+	std	reg,__VCPU_GPR(reg)(r3)
+	reg = reg + 1
+	.endr
+
+	/* r1, r3, r9-r13 are saved to vcpu by C code */
+	std	r0,VCPU_GPR(R0)(r3)
+	std	r2,VCPU_GPR(R2)(r3)
+	reg = 4
+	.rept	5
+	std	reg,__VCPU_GPR(reg)(r3)
+	reg = reg + 1
+	.endr
+
+	ld	r2,PACATOC(r13)
+
+	mflr	r4
+	std	r4,VCPU_LR(r3)
+	mfspr	r4,SPRN_XER
+	std	r4,VCPU_XER(r3)
+
+	reg = 14
+	.rept	18
+	ld	reg, STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
+	reg = reg + 1
+	.endr
+
+	lwz	r4, SFS+8(r1)
+	mtcr	r4
+
+	/*
+	 * Flush the link stack here, before executing the first blr on the
+	 * way out of the guest.
+	 *
+	 * The link stack won't match coming out of the guest anyway so the
+	 * only cost is the flush itself. The call clobbers r0.
+	 */
+1:	nop
+	patch_site 1b patch__call_kvm_flush_link_stack_2
+
+	addi	r1, r1, SFS
+	ld	r0, PPC_LR_STKOFF(r1)
+	mtlr	r0
+	blr
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6bf7f5ce4865..a2eafc207407 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1127,7 +1127,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
  * This has to be done early, not in kvmppc_pseries_do_hcall(), so
  * that the cede logic in kvmppc_run_single_vcpu() works properly.
  */
-static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
+static void kvmppc_cede(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.shregs.msr |= MSR_EE;
 	vcpu->arch.ceded = 1;
@@ -3719,18 +3719,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
 		vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
 		mtspr(SPRN_PSSCR_PR, host_psscr);
-
-		/* H_CEDE has to be handled now, not later */
-		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
-		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
-			kvmppc_nested_cede(vcpu);
-			kvmppc_set_gpr(vcpu, 3, 0);
-			trap = 0;
-		}
 	} else {
 		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
 	}
 
+	/* H_CEDE has to be handled now, not later */
+	if (trap == BOOK3S_INTERRUPT_SYSCALL &&
+	    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
+		kvmppc_cede(vcpu);
+		kvmppc_set_gpr(vcpu, 3, 0);
+		trap = 0;
+	}
+
 	vcpu->arch.slb_max = 0;
 	dec = mfspr(SPRN_DEC);
 	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
@@ -4396,6 +4396,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 			r = kvmppc_pseries_do_hcall(vcpu);
 			trace_kvm_hcall_exit(vcpu, r);
 			kvmppc_core_prepare_to_enter(vcpu);
+			/* XXX: reflect sc 1 from PR=1 as a syscall to guest? */
 		} else if (r == RESUME_PAGE_FAULT) {
 			srcu_idx = srcu_read_lock(&kvm->srcu);
 			r = kvmppc_book3s_hv_page_fault(vcpu,
diff --git a/arch/powerpc/kvm/book3s_hv_interrupt.c b/arch/powerpc/kvm/book3s_hv_interrupt.c
new file mode 100644
index 000000000000..8623fc927ed3
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_interrupt.c
@@ -0,0 +1,208 @@
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <asm/asm-prototypes.h>
+#include <asm/dbell.h>
+#include <asm/kvm_ppc.h>
+
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+static void __start_timing(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
+{
+	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+	u64 tb = mftb() - vc->tb_offset_applied;
+
+	vcpu->arch.cur_activity = next;
+	vcpu->arch.cur_tb_start = tb;
+}
+
+static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
+{
+	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+	struct kvmhv_tb_accumulator *curr;
+	u64 tb = mftb() - vc->tb_offset_applied;
+	u64 prev_tb;
+	u64 delta;
+	u64 seq;
+
+	curr = vcpu->arch.cur_activity;
+	vcpu->arch.cur_activity = next;
+	prev_tb = vcpu->arch.cur_tb_start;
+	vcpu->arch.cur_tb_start = tb;
+
+	if (!curr)
+		return;
+
+	delta = tb - prev_tb;
+
+	seq = curr->seqcount;
+	curr->seqcount = seq + 1;
+	smp_wmb();
+	curr->tb_total += delta;
+	if (seq == 0 || delta < curr->tb_min)
+		curr->tb_min = delta;
+	if (delta > curr->tb_max)
+		curr->tb_max = delta;
+	smp_wmb();
+	curr->seqcount = seq + 2;
+}
+
+#define start_timing(vcpu, next) __start_timing(vcpu, next)
+#define end_timing(vcpu) __start_timing(vcpu, NULL)
+#define accumulate_time(vcpu, next) __accumulate_time(vcpu, next)
+#else
+#define start_timing(vcpu, next) do {} while (0)
+#define end_timing(vcpu) do {} while (0)
+#define accumulate_time(vcpu, next) do {} while (0)
+#endif
+
+static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
+{
+	asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
+	asm volatile("slbmfee  %0,%1" : "=r" (*slbee) : "r" (idx));
+}
+
+static inline void mtslb(unsigned int idx, u64 slbee, u64 slbev)
+{
+	BUG_ON((slbee & 0xfff) != idx);
+
+	asm volatile("slbmte %0,%1" :: "r" (slbev), "r" (slbee));
+}
+
+static inline void slb_invalidate(unsigned int ih)
+{
+	asm volatile("slbia %0" :: "i"(ih));
+}
+
+/*
+ * Malicious or buggy radix guests may have inserted SLB entries
+ * (only 0..3 because radix always runs with UPRT=1), so these must
+ * be cleared here to avoid side-channels. slbmte is used rather
+ * than slbia, as it won't clear cached translations.
+ */
+static void radix_clear_slb(void)
+{
+	u64 slbee, slbev;
+	int i;
+
+	for (i = 0; i < 4; i++) {
+		mfslb(i, &slbee, &slbev);
+		if (unlikely(slbee || slbev)) {
+			slbee = i;
+			slbev = 0;
+			mtslb(i, slbee, slbev);
+		}
+	}
+}
+
+int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
+{
+	u64 *exsave;
+	unsigned long msr = mfmsr();
+	int trap;
+
+	start_timing(vcpu, &vcpu->arch.rm_entry);
+
+	vcpu->arch.ceded = 0;
+
+	WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_HV);
+	WARN_ON_ONCE(!(vcpu->arch.shregs.msr & MSR_ME));
+
+	mtspr(SPRN_HSRR0, vcpu->arch.regs.nip);
+	mtspr(SPRN_HSRR1, (vcpu->arch.shregs.msr & ~MSR_HV) | MSR_ME);
+
+	accumulate_time(vcpu, &vcpu->arch.guest_time);
+
+	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
+	kvmppc_p9_enter_guest(vcpu);
+	// Radix host and guest means host never runs with guest MMU state
+	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
+
+	accumulate_time(vcpu, &vcpu->arch.rm_intr);
+
+	/* Get these from r11/12 and paca exsave */
+	vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
+	vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
+	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
+	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
+
+	trap = local_paca->kvm_hstate.scratch0 & ~0x2;
+	if (likely(trap > BOOK3S_INTERRUPT_MACHINE_CHECK)) {
+		exsave = local_paca->exgen;
+	} else if (trap == BOOK3S_INTERRUPT_SYSTEM_RESET) {
+		exsave = local_paca->exnmi;
+	} else { /* trap == 0x200 */
+		exsave = local_paca->exmc;
+	}
+
+	vcpu->arch.regs.gpr[1] = local_paca->kvm_hstate.scratch1;
+	vcpu->arch.regs.gpr[3] = local_paca->kvm_hstate.scratch2;
+	vcpu->arch.regs.gpr[9] = exsave[EX_R9/sizeof(u64)];
+	vcpu->arch.regs.gpr[10] = exsave[EX_R10/sizeof(u64)];
+	vcpu->arch.regs.gpr[11] = exsave[EX_R11/sizeof(u64)];
+	vcpu->arch.regs.gpr[12] = exsave[EX_R12/sizeof(u64)];
+	vcpu->arch.regs.gpr[13] = exsave[EX_R13/sizeof(u64)];
+	vcpu->arch.ppr = exsave[EX_PPR/sizeof(u64)];
+	vcpu->arch.cfar = exsave[EX_CFAR/sizeof(u64)];
+	vcpu->arch.regs.ctr = exsave[EX_CTR/sizeof(u64)];
+
+	vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
+
+	if (unlikely(trap == BOOK3S_INTERRUPT_MACHINE_CHECK)) {
+		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
+		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
+		kvmppc_realmode_machine_check(vcpu);
+
+	} else if (unlikely(trap == BOOK3S_INTERRUPT_HMI)) {
+		kvmppc_realmode_hmi_handler();
+
+	} else if (trap == BOOK3S_INTERRUPT_H_EMUL_ASSIST) {
+		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
+
+	} else if (trap == BOOK3S_INTERRUPT_H_DATA_STORAGE) {
+		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
+		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
+		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
+
+	} else if (trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
+		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
+
+	} else if (trap == BOOK3S_INTERRUPT_H_FAC_UNAVAIL) {
+		vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	/*
+	 * Softpatch interrupt for transactional memory emulation cases
+	 * on POWER9 DD2.2.  This is early in the guest exit path - we
+	 * haven't saved registers or done a treclaim yet.
+	 */
+	} else if (trap == BOOK3S_INTERRUPT_HV_SOFTPATCH) {
+		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
+
+		/*
+		 * The cases we want to handle here are those where the guest
+		 * is in real suspend mode and is trying to transition to
+		 * transactional mode.
+		 */
+		if (local_paca->kvm_hstate.fake_suspend &&
+				(vcpu->arch.shregs.msr & MSR_TS_S)) {
+			if (kvmhv_p9_tm_emulation(vcpu)) {
+				/* Could return to guest quickly if handled */
+				/* XXX: can this all be done by the hv exit
+				 * handler? May need to adjust hv exit handler.
+				 */
+			}
+		}
+#endif
+	}
+
+	radix_clear_slb();
+
+	__mtmsrd(msr, 0);
+
+	accumulate_time(vcpu, &vcpu->arch.rm_exit);
+
+	kvmppc_xive_pull_vcpu(vcpu);
+
+	end_timing(vcpu);
+
+	return trap;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index eff4437e381c..aa8199c148f2 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -46,7 +46,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 /* Stack frame offsets for kvmppc_hv_entry */
 #define SFS			208
 #define STACK_SLOT_TRAP		(SFS-4)
-#define STACK_SLOT_SHORT_PATH	(SFS-8)
 #define STACK_SLOT_TID		(SFS-16)
 #define STACK_SLOT_PSSCR	(SFS-24)
 #define STACK_SLOT_PID		(SFS-32)
@@ -994,9 +993,6 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 no_xive:
 #endif /* CONFIG_KVM_XICS */
 
-	li	r0, 0
-	stw	r0, STACK_SLOT_SHORT_PATH(r1)
-
 deliver_guest_interrupt:	/* r4 = vcpu, r13 = paca */
 	/* Check if we can deliver an external or decrementer interrupt now */
 	ld	r0, VCPU_PENDING_EXC(r4)
@@ -1121,97 +1117,6 @@ ret_to_ultra:
 	ld	r4, VCPU_GPR(R4)(r4)
 	sc	2
 
-/*
- * Enter the guest on a P9 or later system where we have exactly
- * one vcpu per vcore and we don't need to go to real mode
- * (which implies that host and guest are both using radix MMU mode).
- * r3 = vcpu pointer
- * Most SPRs and all the VSRs have been loaded already.
- */
-_GLOBAL(__kvmhv_vcpu_entry_p9)
-EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9)
-	mflr	r0
-	std	r0, PPC_LR_STKOFF(r1)
-	stdu	r1, -SFS(r1)
-
-	li	r0, 1
-	stw	r0, STACK_SLOT_SHORT_PATH(r1)
-
-	std	r3, HSTATE_KVM_VCPU(r13)
-	mfcr	r4
-	stw	r4, SFS+8(r1)
-
-	std	r1, HSTATE_HOST_R1(r13)
-
-	reg = 14
-	.rept	18
-	std	reg, STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
-	reg = reg + 1
-	.endr
-
-	reg = 14
-	.rept	18
-	ld	reg, __VCPU_GPR(reg)(r3)
-	reg = reg + 1
-	.endr
-
-	mfmsr	r10
-	std	r10, HSTATE_HOST_MSR(r13)
-
-	mr	r4, r3
-	b	fast_guest_entry_c
-guest_exit_short_path:
-	/*
-	 * Malicious or buggy radix guests may have inserted SLB entries
-	 * (only 0..3 because radix always runs with UPRT=1), so these must
-	 * be cleared here to avoid side-channels. slbmte is used rather
-	 * than slbia, as it won't clear cached translations.
-	 */
-	li	r0,0
-	slbmte	r0,r0
-	li	r4,1
-	slbmte	r0,r4
-	li	r4,2
-	slbmte	r0,r4
-	li	r4,3
-	slbmte	r0,r4
-
-	li	r0, KVM_GUEST_MODE_NONE
-	stb	r0, HSTATE_IN_GUEST(r13)
-
-	reg = 14
-	.rept	18
-	std	reg, __VCPU_GPR(reg)(r9)
-	reg = reg + 1
-	.endr
-
-	reg = 14
-	.rept	18
-	ld	reg, STACK_SLOT_NVGPRS + ((reg - 14) * 8)(r1)
-	reg = reg + 1
-	.endr
-
-	lwz	r4, SFS+8(r1)
-	mtcr	r4
-
-	mr	r3, r12		/* trap number */
-
-	addi	r1, r1, SFS
-	ld	r0, PPC_LR_STKOFF(r1)
-	mtlr	r0
-
-	/* If we are in real mode, do a rfid to get back to the caller */
-	mfmsr	r4
-	andi.	r5, r4, MSR_IR
-	bnelr
-	rldicl	r5, r4, 64 - MSR_TS_S_LG, 62	/* extract TS field */
-	mtspr	SPRN_SRR0, r0
-	ld	r10, HSTATE_HOST_MSR(r13)
-	rldimi	r10, r5, MSR_TS_S_LG, 63 - MSR_TS_T_LG
-	mtspr	SPRN_SRR1, r10
-	RFI_TO_KERNEL
-	b	.
-
 secondary_too_late:
 	li	r12, 0
 	stw	r12, STACK_SLOT_TRAP(r1)
@@ -1466,17 +1371,11 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 #endif /* CONFIG_KVM_XICS */
 
 	/*
-	 * Possibly flush the link stack here, before we do a blr in
-	 * guest_exit_short_path.
+	 * Possibly flush the link stack here.
 	 */
 1:	nop
 	patch_site 1b patch__call_kvm_flush_link_stack
 
-	/* If we came in through the P9 short path, go back out to C now */
-	lwz	r0, STACK_SLOT_SHORT_PATH(r1)
-	cmpwi	r0, 0
-	bne	guest_exit_short_path
-
 	/* For hash guest, read the guest SLB and save it away */
 	ld	r5, VCPU_KVM(r9)
 	lbz	r0, KVM_RADIX(r5)
@@ -1529,7 +1428,12 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
 	 * This is probably harmless but is unexpected.
 	 */
 	stw	r5,VCPU_SLB_MAX(r9)
-	/* Sanitise radix guest SLB, see guest_exit_short_path comment. */
+	/*
+	 * Malicious or buggy radix guests may have inserted SLB entries
+	 * (only 0..3 because radix always runs with UPRT=1), so these must
+	 * be cleared here to avoid side-channels. slbmte is used rather
+	 * than slbia, as it won't clear cached translations.
+	 */
 	li	r0,0
 	slbmte	r0,r0
 	li	r4,1
@@ -3331,7 +3235,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mtspr	SPRN_CIABR, r0
 	mtspr	SPRN_DAWRX0, r0
 
-	/* Clear hash and radix guest SLB, see guest_exit_short_path comment. */
+	/* Clear hash and radix guest SLB. */
 	slbmte	r0, r0
 	PPC_SLBIA(6)
 
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 30dfeac731c6..22a3be0aafa1 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -127,6 +127,38 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_xive_push_vcpu);
 
+/*
+ * Pull a vcpu's context from the XIVE on guest exit.
+ * This assumes we are in virtual mode (MMU on)
+ */
+void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
+{
+	void __iomem *tima = local_paca->kvm_hstate.xive_tima_virt;
+
+	if (!vcpu->arch.xive_pushed)
+		return;
+
+	/*
+	 * Sould not have been pushed if there is no tima
+	 */
+	if (WARN_ON(!tima))
+		return
+
+	eieio();
+	/* First load to pull the context, we ignore the value */
+	__raw_readw(tima + TM_SPC_PULL_OS_CTX);
+	/* Second load to recover the context state (Words 0 and 1) */
+	vcpu->arch.xive_saved_state.w01 = __raw_readq(tima + TM_QW1_OS);
+
+	/* Fixup some of the state for the next load */
+	vcpu->arch.xive_pushed = 0;
+	vcpu->arch.xive_saved_state.lsmfb = 0;
+	vcpu->arch.xive_saved_state.ack = 0xff;
+	eieio();
+}
+EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
+
+
 /*
  * This is a simple trigger for a generic XIVE IRQ. This must
  * only be called for interrupts that support a trigger page
-- 
2.23.0


^ permalink raw reply related

* [PATCH 12/13] KVM: PPC: Book3S HV: Move radix MMU switching together in the P9 path
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

Switching the MMU from radix<->radix mode is tricky particularly as the
MMU can remain enabled and requires a certain sequence of SPR updates.
Move these together into their own functions.

This also includes the radix TLB check / flush because it's tied in to
MMU switching due to tlbiel getting LPID from LPIDR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 50 +++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 53d0cbfe5933..6bf7f5ce4865 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3446,12 +3446,38 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	trace_kvmppc_run_core(vc, 1);
 }
 
+static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
+{
+	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+	struct kvm_nested_guest *nested = vcpu->arch.nested;
+	u32 lpid;
+
+	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
+
+	mtspr(SPRN_LPID, lpid);
+	mtspr(SPRN_LPCR, lpcr);
+	mtspr(SPRN_PID, vcpu->arch.pid);
+	isync();
+
+	/* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
+	kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
+}
+
+static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
+{
+	mtspr(SPRN_PID, pid);
+	mtspr(SPRN_LPID, kvm->arch.host_lpid);
+	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
+	isync();
+}
+
 /*
  * Load up hypervisor-mode registers on P9.
  */
 static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 				     unsigned long lpcr)
 {
+	struct kvm *kvm = vcpu->kvm;
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	s64 hdec;
 	u64 tb, purr, spurr;
@@ -3467,12 +3493,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
 	 * so set HDICE before writing HDEC.
 	 */
-	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr | LPCR_HDICE);
+	mtspr(SPRN_LPCR, kvm->arch.host_lpcr | LPCR_HDICE);
 	isync();
 
 	hdec = time_limit - mftb();
 	if (hdec < 0) {
-		mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
+		mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
 		isync();
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	}
@@ -3503,7 +3529,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	}
 	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
 	mtspr(SPRN_IC, vcpu->arch.ic);
-	mtspr(SPRN_PID, vcpu->arch.pid);
 
 	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
 	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
@@ -3517,8 +3542,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	mtspr(SPRN_AMOR, ~0UL);
 
-	mtspr(SPRN_LPCR, lpcr);
-	isync();
+	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
 
 	kvmppc_xive_push_vcpu(vcpu);
 
@@ -3553,7 +3577,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	mtspr(SPRN_CIABR, host_ciabr);
 	mtspr(SPRN_DAWR0, host_dawr);
 	mtspr(SPRN_DAWRX0, host_dawrx);
-	mtspr(SPRN_PID, host_pidr);
 
 	/*
 	 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
@@ -3568,9 +3591,6 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
 		asm volatile(PPC_CP_ABORT);
 
-	mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);	/* restore host LPID */
-	isync();
-
 	vc->dpdes = mfspr(SPRN_DPDES);
 	vc->vtb = mfspr(SPRN_VTB);
 	mtspr(SPRN_DPDES, 0);
@@ -3587,7 +3607,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	}
 
 	mtspr(SPRN_HDEC, 0x7fffffff);
-	mtspr(SPRN_LPCR, vcpu->kvm->arch.host_lpcr);
+
+	switch_mmu_to_host_radix(kvm, host_pidr);
 
 	return trap;
 }
@@ -4117,7 +4138,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 {
 	struct kvm_run *run = vcpu->run;
 	int trap, r, pcpu;
-	int srcu_idx, lpid;
+	int srcu_idx;
 	struct kvmppc_vcore *vc;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_nested_guest *nested = vcpu->arch.nested;
@@ -4191,13 +4212,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 	vc->vcore_state = VCORE_RUNNING;
 	trace_kvmppc_run_core(vc, 0);
 
-	if (cpu_has_feature(CPU_FTR_HVMODE)) {
-		lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
-		mtspr(SPRN_LPID, lpid);
-		isync();
-		kvmppc_check_need_tlb_flush(kvm, pcpu, nested);
-	}
-
 	guest_enter_irqoff();
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-- 
2.23.0


^ permalink raw reply related

* [PATCH 11/13] KVM: PPC: Book3S HV: Minimise hcall handler calling convention differences
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

This sets up the same calling convention from interrupt entry to
KVM interrupt handler for system calls as exists for other interrupt
types.

This is a better API, it uses a save area rather than SPR, and it has
more registers free to use. Using a single common API helps maintain
it, and it becomes easier to use in C in a later patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 16 +++++++++++++++-
 arch/powerpc/kvm/book3s_64_entry.S   | 22 +++-------------------
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4878f05b5325..6f086ee922e0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1923,8 +1923,22 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 TRAMP_REAL_BEGIN(kvm_hcall)
+	std	r9,PACA_EXGEN+EX_R9(r13)
+	std	r11,PACA_EXGEN+EX_R11(r13)
+	std	r12,PACA_EXGEN+EX_R12(r13)
+	mfcr	r9
 	mfctr	r10
-	SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */
+	std	r10,PACA_EXGEN+EX_R13(r13)
+	li	r10,0
+	std	r10,PACA_EXGEN+EX_CFAR(r13)
+	std	r10,PACA_EXGEN+EX_CTR(r13)
+BEGIN_FTR_SECTION
+	mfspr	r10,SPRN_PPR
+	std	r10,PACA_EXGEN+EX_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	HMT_MEDIUM
+
 #ifdef CONFIG_RELOCATABLE
 	/*
 	 * Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 6b2fda5dee38..6f06b58b1bdd 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -13,24 +13,9 @@
 .global	kvmppc_hcall
 .balign IFETCH_ALIGN_BYTES
 kvmppc_hcall:
-	/*
-	 * This is a hcall, so register convention is as
-	 * Documentation/powerpc/papr_hcalls.rst, with these additions:
-	 * R13		= PACA
-	 * guest R13 saved in SPRN_SCRATCH0
-	 * R10		= free
-	 */
-BEGIN_FTR_SECTION
-	mfspr	r10,SPRN_PPR
-	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	HMT_MEDIUM
-	mfcr	r10
-	std	r12,HSTATE_SCRATCH0(r13)
-	sldi	r12,r10,32
-	ori	r12,r12,0xc00
-	ld	r10,PACA_EXGEN+EX_R10(r13)
-	b	do_kvm_interrupt
+	ld	r10,PACA_EXGEN+EX_R13(r13)
+	SET_SCRATCH0(r10)
+	li	r10,0xc00
 
 .global	kvmppc_interrupt
 .balign IFETCH_ALIGN_BYTES
@@ -61,7 +46,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r10,EX_R10(r11)
 	ld	r11,EX_R11(r11)
 
-do_kvm_interrupt:
 	/*
 	 * Hcalls and other interrupts come here after normalising register
 	 * contents and save locations:
-- 
2.23.0


^ permalink raw reply related

* [PATCH 10/13] KVM: PPC: Book3S HV: move bad_host_intr check to HV handler
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

This is not used by PR KVM.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_64_entry.S      | 3 ---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++-
 arch/powerpc/kvm/book3s_segment.S       | 7 +++++++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index ef946e202773..6b2fda5dee38 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -77,11 +77,8 @@ do_kvm_interrupt:
 	beq-	.Lmaybe_skip
 .Lno_skip:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	cmpwi	r9,KVM_GUEST_MODE_HOST_HV
-	beq	kvmppc_bad_host_intr
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	cmpwi	r9,KVM_GUEST_MODE_GUEST
-	ld	r9,HSTATE_SCRATCH2(r13)
 	beq	kvmppc_interrupt_pr
 #endif
 	b	kvmppc_interrupt_hv
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bbf786a0c0d6..eff4437e381c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1251,6 +1251,7 @@ hdec_soon:
 kvmppc_interrupt_hv:
 	/*
 	 * Register contents:
+	 * R9		= HSTATE_IN_GUEST
 	 * R12		= (guest CR << 32) | interrupt vector
 	 * R13		= PACA
 	 * guest R12 saved in shadow VCPU SCRATCH0
@@ -1258,6 +1259,8 @@ kvmppc_interrupt_hv:
 	 * guest R9 saved in HSTATE_SCRATCH2
 	 */
 	/* We're now back in the host but in guest MMU context */
+	cmpwi	r9,KVM_GUEST_MODE_HOST_HV
+	beq	kvmppc_bad_host_intr
 	li	r9, KVM_GUEST_MODE_HOST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
 
@@ -3254,7 +3257,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST)
  * cfar is saved in HSTATE_CFAR(r13)
  * ppr is saved in HSTATE_PPR(r13)
  */
-.global kvmppc_bad_host_intr
 kvmppc_bad_host_intr:
 	/*
 	 * Switch to the emergency stack, but start half-way down in
diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 1f492aa4c8d6..ef1d88b869bf 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -167,8 +167,15 @@ kvmppc_interrupt_pr:
 	 * R12             = (guest CR << 32) | exit handler id
 	 * R13             = PACA
 	 * HSTATE.SCRATCH0 = guest R12
+	 *
+	 * If HV is possible, additionally:
+	 * R9              = HSTATE_IN_GUEST
+	 * HSTATE.SCRATCH2 = guest R9
 	 */
 #ifdef CONFIG_PPC64
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	ld	r9,HSTATE_SCRATCH2(r13)
+#endif
 	/* Match 32-bit entry */
 	rotldi	r12, r12, 32		  /* Flip R12 halves for stw */
 	stw	r12, HSTATE_SCRATCH1(r13) /* CR is now in the low half */
-- 
2.23.0


^ permalink raw reply related

* [PATCH 09/13] KVM: PPC: Book3S HV: Move interrupt early register setup to KVM
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

Like the earlier patch for hcalls, KVM interrupt entry requires a
different calling convention than the Linux interrupt handlers
set up. Move the code that converts from one to the other into KVM.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 126 ++++-----------------------
 arch/powerpc/kvm/book3s_64_entry.S   |  34 +++++++-
 2 files changed, 50 insertions(+), 110 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6872f0376003..4878f05b5325 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -191,7 +191,6 @@ do_define_int n
 	.endif
 .endm
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 /*
  * All interrupts which set HSRR registers, as well as SRESET and MCE and
  * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
@@ -224,54 +223,25 @@ do_define_int n
  * to KVM to handle.
  */
 
-.macro KVMTEST name
+.macro KVMTEST name handler
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 	lbz	r10,HSTATE_IN_GUEST(r13)
 	cmpwi	r10,0
-	bne	\name\()_kvm
-.endm
-
-.macro GEN_KVM name
-	.balign IFETCH_ALIGN_BYTES
-\name\()_kvm:
-
-BEGIN_FTR_SECTION
-	ld	r10,IAREA+EX_CFAR(r13)
-	std	r10,HSTATE_CFAR(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
-
-	ld	r10,IAREA+EX_CTR(r13)
-	mtctr	r10
-BEGIN_FTR_SECTION
-	ld	r10,IAREA+EX_PPR(r13)
-	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	ld	r11,IAREA+EX_R11(r13)
-	ld	r12,IAREA+EX_R12(r13)
-	std	r12,HSTATE_SCRATCH0(r13)
-	sldi	r12,r9,32
-	ld	r9,IAREA+EX_R9(r13)
-	ld	r10,IAREA+EX_R10(r13)
 	/* HSRR variants have the 0x2 bit added to their trap number */
 	.if IHSRR_IF_HVMODE
 	BEGIN_FTR_SECTION
-	ori	r12,r12,(IVEC + 0x2)
+	li	r10,(IVEC + 0x2)
 	FTR_SECTION_ELSE
-	ori	r12,r12,(IVEC)
+	li	r10,(IVEC)
 	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 	.elseif IHSRR
-	ori	r12,r12,(IVEC+ 0x2)
+	li	r10,(IVEC + 0x2)
 	.else
-	ori	r12,r12,(IVEC)
+	li	r10,(IVEC)
 	.endif
-	b	kvmppc_interrupt
-.endm
-
-#else
-.macro KVMTEST name
-.endm
-.macro GEN_KVM name
-.endm
+	bne	\handler
 #endif
+.endm
 
 /*
  * This is the BOOK3S interrupt entry code macro.
@@ -413,7 +383,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
 	.if IKVM_REAL
-		KVMTEST \name
+		KVMTEST \name kvm_interrupt
 	.endif
 
 	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
@@ -436,7 +406,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
 	.if IKVM_VIRT
-		KVMTEST \name
+		KVMTEST \name kvm_interrupt
 1:
 	.endif
 	.endif /* IVIRT */
@@ -450,7 +420,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
 	.if IKVM_REAL
-		KVMTEST \name
+		KVMTEST \name kvm_interrupt
 	.endif
 .endm
 
@@ -1011,8 +981,6 @@ EXC_COMMON_BEGIN(system_reset_common)
 	EXCEPTION_RESTORE_REGS
 	RFI_TO_USER_OR_KERNEL
 
-	GEN_KVM system_reset
-
 
 /**
  * Interrupt 0x200 - Machine Check Interrupt (MCE).
@@ -1196,7 +1164,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 	/*
 	 * Check if we are coming from guest. If yes, then run the normal
 	 * exception handler which will take the
-	 * machine_check_kvm->kvmppc_interrupt branch to deliver the MC event
+	 * machine_check_kvm->kvm_interrupt branch to deliver the MC event
 	 * to guest.
 	 */
 	lbz	r11,HSTATE_IN_GUEST(r13)
@@ -1267,8 +1235,6 @@ EXC_COMMON_BEGIN(machine_check_common)
 	bl	machine_check_exception
 	b	interrupt_return
 
-	GEN_KVM machine_check
-
 
 #ifdef CONFIG_PPC_P7_NAP
 /*
@@ -1393,8 +1359,6 @@ MMU_FTR_SECTION_ELSE
 	b	handle_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
-	GEN_KVM data_access
-
 
 /**
  * Interrupt 0x380 - Data Segment Interrupt (DSLB).
@@ -1449,8 +1413,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	bl	do_bad_slb_fault
 	b	interrupt_return
 
-	GEN_KVM data_access_slb
-
 
 /**
  * Interrupt 0x400 - Instruction Storage Interrupt (ISI).
@@ -1489,8 +1451,6 @@ MMU_FTR_SECTION_ELSE
 	b	handle_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
-	GEN_KVM instruction_access
-
 
 /**
  * Interrupt 0x480 - Instruction Segment Interrupt (ISLB).
@@ -1540,8 +1500,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	bl	do_bad_slb_fault
 	b	interrupt_return
 
-	GEN_KVM instruction_access_slb
-
 
 /**
  * Interrupt 0x500 - External Interrupt.
@@ -1588,8 +1546,6 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	bl	do_IRQ
 	b	interrupt_return
 
-	GEN_KVM hardware_interrupt
-
 
 /**
  * Interrupt 0x600 - Alignment Interrupt
@@ -1617,8 +1573,6 @@ EXC_COMMON_BEGIN(alignment_common)
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
-	GEN_KVM alignment
-
 
 /**
  * Interrupt 0x700 - Program Interrupt (program check).
@@ -1681,8 +1635,6 @@ EXC_COMMON_BEGIN(program_check_common)
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
-	GEN_KVM program_check
-
 
 /*
  * Interrupt 0x800 - Floating-Point Unavailable Interrupt.
@@ -1735,8 +1687,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	b	interrupt_return
 #endif
 
-	GEN_KVM fp_unavailable
-
 
 /**
  * Interrupt 0x900 - Decrementer Interrupt.
@@ -1777,8 +1727,6 @@ EXC_COMMON_BEGIN(decrementer_common)
 	bl	timer_interrupt
 	b	interrupt_return
 
-	GEN_KVM decrementer
-
 
 /**
  * Interrupt 0x980 - Hypervisor Decrementer Interrupt.
@@ -1825,8 +1773,6 @@ EXC_COMMON_BEGIN(hdecrementer_common)
 	ld	r13,PACA_EXGEN+EX_R13(r13)
 	HRFI_TO_KERNEL
 
-	GEN_KVM hdecrementer
-
 
 /**
  * Interrupt 0xa00 - Directed Privileged Doorbell Interrupt.
@@ -1868,8 +1814,6 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #endif
 	b	interrupt_return
 
-	GEN_KVM doorbell_super
-
 
 EXC_REAL_NONE(0xb00, 0x100)
 EXC_VIRT_NONE(0x4b00, 0x100)
@@ -1919,7 +1863,7 @@ INT_DEFINE_END(system_call)
 	GET_PACA(r13)
 	std	r10,PACA_EXGEN+EX_R10(r13)
 	INTERRUPT_TO_KERNEL
-	KVMTEST system_call /* uses r10, branch to system_call_kvm */
+	KVMTEST system_call kvm_hcall /* uses r10, branch to kvm_hcall */
 	mfctr	r9
 #else
 	mr	r9,r13
@@ -1978,7 +1922,7 @@ EXC_VIRT_BEGIN(system_call, 0x4c00, 0x100)
 EXC_VIRT_END(system_call, 0x4c00, 0x100)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-TRAMP_REAL_BEGIN(system_call_kvm)
+TRAMP_REAL_BEGIN(kvm_hcall)
 	mfctr	r10
 	SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */
 #ifdef CONFIG_RELOCATABLE
@@ -2018,8 +1962,6 @@ EXC_COMMON_BEGIN(single_step_common)
 	bl	single_step_exception
 	b	interrupt_return
 
-	GEN_KVM single_step
-
 
 /**
  * Interrupt 0xe00 - Hypervisor Data Storage Interrupt (HDSI).
@@ -2060,8 +2002,6 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 	b       interrupt_return
 
-	GEN_KVM h_data_storage
-
 
 /**
  * Interrupt 0xe20 - Hypervisor Instruction Storage Interrupt (HISI).
@@ -2087,8 +2027,6 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 	bl	unknown_exception
 	b	interrupt_return
 
-	GEN_KVM h_instr_storage
-
 
 /**
  * Interrupt 0xe40 - Hypervisor Emulation Assistance Interrupt.
@@ -2113,8 +2051,6 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
-	GEN_KVM emulation_assist
-
 
 /**
  * Interrupt 0xe60 - Hypervisor Maintenance Interrupt (HMI).
@@ -2187,8 +2123,6 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	EXCEPTION_RESTORE_REGS hsrr=1
 	GEN_INT_ENTRY hmi_exception, virt=0
 
-	GEN_KVM hmi_exception_early
-
 EXC_COMMON_BEGIN(hmi_exception_common)
 	GEN_COMMON hmi_exception
 	FINISH_NAP
@@ -2197,8 +2131,6 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	bl	handle_hmi_exception
 	b	interrupt_return
 
-	GEN_KVM hmi_exception
-
 
 /**
  * Interrupt 0xe80 - Directed Hypervisor Doorbell Interrupt.
@@ -2231,8 +2163,6 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #endif
 	b	interrupt_return
 
-	GEN_KVM h_doorbell
-
 
 /**
  * Interrupt 0xea0 - Hypervisor Virtualization Interrupt.
@@ -2261,8 +2191,6 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
 	bl	do_IRQ
 	b	interrupt_return
 
-	GEN_KVM h_virt_irq
-
 
 EXC_REAL_NONE(0xec0, 0x20)
 EXC_VIRT_NONE(0x4ec0, 0x20)
@@ -2308,8 +2236,6 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	bl	performance_monitor_exception
 	b	interrupt_return
 
-	GEN_KVM performance_monitor
-
 
 /**
  * Interrupt 0xf20 - Vector Unavailable Interrupt.
@@ -2362,8 +2288,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	bl	altivec_unavailable_exception
 	b	interrupt_return
 
-	GEN_KVM altivec_unavailable
-
 
 /**
  * Interrupt 0xf40 - VSX Unavailable Interrupt.
@@ -2415,8 +2339,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	bl	vsx_unavailable_exception
 	b	interrupt_return
 
-	GEN_KVM vsx_unavailable
-
 
 /**
  * Interrupt 0xf60 - Facility Unavailable Interrupt.
@@ -2445,8 +2367,6 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
-	GEN_KVM facility_unavailable
-
 
 /**
  * Interrupt 0xf60 - Hypervisor Facility Unavailable Interrupt.
@@ -2475,8 +2395,6 @@ EXC_COMMON_BEGIN(h_facility_unavailable_common)
 	REST_NVGPRS(r1) /* XXX Shouldn't be necessary in practice */
 	b	interrupt_return
 
-	GEN_KVM h_facility_unavailable
-
 
 EXC_REAL_NONE(0xfa0, 0x20)
 EXC_VIRT_NONE(0x4fa0, 0x20)
@@ -2506,8 +2424,6 @@ EXC_COMMON_BEGIN(cbe_system_error_common)
 	bl	cbe_system_error_exception
 	b	interrupt_return
 
-	GEN_KVM cbe_system_error
-
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
@@ -2533,8 +2449,6 @@ EXC_COMMON_BEGIN(instruction_breakpoint_common)
 	bl	instruction_breakpoint_exception
 	b	interrupt_return
 
-	GEN_KVM instruction_breakpoint
-
 
 EXC_REAL_NONE(0x1400, 0x100)
 EXC_VIRT_NONE(0x5400, 0x100)
@@ -2655,8 +2569,6 @@ EXC_COMMON_BEGIN(denorm_exception_common)
 	bl	unknown_exception
 	b	interrupt_return
 
-	GEN_KVM denorm_exception
-
 
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_maintenance)
@@ -2674,8 +2586,6 @@ EXC_COMMON_BEGIN(cbe_maintenance_common)
 	bl	cbe_maintenance_exception
 	b	interrupt_return
 
-	GEN_KVM cbe_maintenance
-
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
@@ -2706,8 +2616,6 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 #endif
 	b	interrupt_return
 
-	GEN_KVM altivec_assist
-
 
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_thermal)
@@ -2725,8 +2633,6 @@ EXC_COMMON_BEGIN(cbe_thermal_common)
 	bl	cbe_thermal_exception
 	b	interrupt_return
 
-	GEN_KVM cbe_thermal
-
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
@@ -2999,6 +2905,10 @@ TRAMP_REAL_BEGIN(rfscv_flush_fallback)
 
 USE_TEXT_SECTION()
 
+	/* conditional branch in KVMTEST can't reach all the way, make a stub */
+kvm_interrupt:
+	b	kvmppc_interrupt
+
 _GLOBAL(do_uaccess_flush)
 	UACCESS_FLUSH_FIXUP_SECTION
 	nop
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 7050197cd359..ef946e202773 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -30,15 +30,45 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	sldi	r12,r10,32
 	ori	r12,r12,0xc00
 	ld	r10,PACA_EXGEN+EX_R10(r13)
+	b	do_kvm_interrupt
 
 .global	kvmppc_interrupt
 .balign IFETCH_ALIGN_BYTES
 kvmppc_interrupt:
+	li	r11,PACA_EXGEN
+	cmpdi	r10,0x200
+	bgt+	1f
+	li	r11,PACA_EXMC
+	beq	1f
+	li	r11,PACA_EXNMI
+1:	add	r11,r11,r13
+
+BEGIN_FTR_SECTION
+	ld	r12,EX_CFAR(r11)
+	std	r12,HSTATE_CFAR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+	ld	r12,EX_CTR(r11)
+	mtctr	r12
+BEGIN_FTR_SECTION
+	ld	r12,EX_PPR(r11)
+	std	r12,HSTATE_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	ld	r12,EX_R12(r11)
+	std	r12,HSTATE_SCRATCH0(r13)
+	sldi	r12,r9,32
+	or	r12,r12,r10
+	ld	r9,EX_R9(r11)
+	ld	r10,EX_R10(r11)
+	ld	r11,EX_R11(r11)
+
+do_kvm_interrupt:
 	/*
-	 * Register contents:
+	 * Hcalls and other interrupts come here after normalising register
+	 * contents and save locations:
+	 *
 	 * R12		= (guest CR << 32) | interrupt vector
 	 * R13		= PACA
-	 * guest R12 saved in shadow VCPU SCRATCH0
+	 * guest R12 saved in shadow HSTATE_SCRATCH0
 	 * guest R13 saved in SPRN_SCRATCH0
 	 */
 	std	r9,HSTATE_SCRATCH2(r13)
-- 
2.23.0


^ permalink raw reply related

* [PATCH 08/13] KVM: PPC: Book3S HV: Move hcall early register setup to KVM
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

System calls / hcalls have a different calling convention than
other interrupts, so there is code in the KVMTEST to massage these
into the same form as other interrupt handlers.

Move this work into the KVM hcall handler. This means teaching KVM
a little more about the low level interrupt handler setup, PACA save
areas, etc., although that's not obviously worse than the current
approach of coming up with an entirely different interrupt register
/ save convention.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/exception-64s.h | 13 +++++++
 arch/powerpc/kernel/exceptions-64s.S     | 44 ++----------------------
 arch/powerpc/kvm/book3s_64_entry.S       | 17 +++++++++
 3 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index c1a8aac01cf9..bb6f78fcf981 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -35,6 +35,19 @@
 /* PACA save area size in u64 units (exgen, exmc, etc) */
 #define EX_SIZE		10
 
+/* PACA save area offsets */
+#define EX_R9		0
+#define EX_R10		8
+#define EX_R11		16
+#define EX_R12		24
+#define EX_R13		32
+#define EX_DAR		40
+#define EX_DSISR	48
+#define EX_CCR		52
+#define EX_CFAR		56
+#define EX_PPR		64
+#define EX_CTR		72
+
 /*
  * maximum recursive depth of MCE exceptions
  */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a61a45704925..6872f0376003 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -21,22 +21,6 @@
 #include <asm/feature-fixups.h>
 #include <asm/kup.h>
 
-/* PACA save area offsets (exgen, exmc, etc) */
-#define EX_R9		0
-#define EX_R10		8
-#define EX_R11		16
-#define EX_R12		24
-#define EX_R13		32
-#define EX_DAR		40
-#define EX_DSISR	48
-#define EX_CCR		52
-#define EX_CFAR		56
-#define EX_PPR		64
-#define EX_CTR		72
-.if EX_SIZE != 10
-	.error "EX_SIZE is wrong"
-.endif
-
 /*
  * Following are fixed section helper macros.
  *
@@ -1995,45 +1979,21 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 TRAMP_REAL_BEGIN(system_call_kvm)
-	/*
-	 * This is a hcall, so register convention is as above, with these
-	 * differences:
-	 * r13 = PACA
-	 * ctr = orig r13
-	 * orig r10 saved in PACA
-	 */
-	 /*
-	  * Save the PPR (on systems that support it) before changing to
-	  * HMT_MEDIUM. That allows the KVM code to save that value into the
-	  * guest state (it is the guest's PPR value).
-	  */
-BEGIN_FTR_SECTION
-	mfspr	r10,SPRN_PPR
-	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	HMT_MEDIUM
 	mfctr	r10
-	SET_SCRATCH0(r10)
-	mfcr	r10
-	std	r12,HSTATE_SCRATCH0(r13)
-	sldi	r12,r10,32
-	ori	r12,r12,0xc00
+	SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */
 #ifdef CONFIG_RELOCATABLE
 	/*
-	 * Requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives
+	 * Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives
 	 * outside the head section.
 	 */
 	__LOAD_FAR_HANDLER(r10, kvmppc_hcall)
 	mtctr   r10
-	ld	r10,PACA_EXGEN+EX_R10(r13)
 	bctr
 #else
-	ld	r10,PACA_EXGEN+EX_R10(r13)
 	b       kvmppc_hcall
 #endif
 #endif
 
-
 /**
  * Interrupt 0xd00 - Trace Interrupt.
  * This is a synchronous interrupt in response to instruction step or
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 53addbbe7b1a..7050197cd359 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -13,6 +13,23 @@
 .global	kvmppc_hcall
 .balign IFETCH_ALIGN_BYTES
 kvmppc_hcall:
+	/*
+	 * This is a hcall, so register convention is as
+	 * Documentation/powerpc/papr_hcalls.rst, with these additions:
+	 * R13		= PACA
+	 * guest R13 saved in SPRN_SCRATCH0
+	 * R10		= free
+	 */
+BEGIN_FTR_SECTION
+	mfspr	r10,SPRN_PPR
+	std	r10,HSTATE_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	HMT_MEDIUM
+	mfcr	r10
+	std	r12,HSTATE_SCRATCH0(r13)
+	sldi	r12,r10,32
+	ori	r12,r12,0xc00
+	ld	r10,PACA_EXGEN+EX_R10(r13)
 
 .global	kvmppc_interrupt
 .balign IFETCH_ALIGN_BYTES
-- 
2.23.0


^ permalink raw reply related

* [PATCH 07/13] KVM: PPC: Book3S 64: add hcall interrupt handler
From: Nicholas Piggin @ 2021-02-19  6:35 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin, Fabiano Rosas
In-Reply-To: <20210219063542.1425130-1-npiggin@gmail.com>

Add a separate hcall entry point. This can be used to deal with the
different calling convention.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++--
 arch/powerpc/kvm/book3s_64_entry.S   | 6 +++++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 96f22c582213..a61a45704925 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2023,13 +2023,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	 * Requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives
 	 * outside the head section.
 	 */
-	__LOAD_FAR_HANDLER(r10, kvmppc_interrupt)
+	__LOAD_FAR_HANDLER(r10, kvmppc_hcall)
 	mtctr   r10
 	ld	r10,PACA_EXGEN+EX_R10(r13)
 	bctr
 #else
 	ld	r10,PACA_EXGEN+EX_R10(r13)
-	b       kvmppc_interrupt
+	b       kvmppc_hcall
 #endif
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 820d103e5f50..53addbbe7b1a 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -7,9 +7,13 @@
 #include <asm/reg.h>
 
 /*
- * This is branched to from interrupt handlers in exception-64s.S which set
+ * These are branched to from interrupt handlers in exception-64s.S which set
  * IKVM_REAL or IKVM_VIRT, if HSTATE_IN_GUEST was found to be non-zero.
  */
+.global	kvmppc_hcall
+.balign IFETCH_ALIGN_BYTES
+kvmppc_hcall:
+
 .global	kvmppc_interrupt
 .balign IFETCH_ALIGN_BYTES
 kvmppc_interrupt:
-- 
2.23.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox