LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Applied "ASoC: fsl_esai: Mark expected switch fall-through" to the asoc tree
From: Mark Brown @ 2018-08-03 17:01 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Mark Brown, Timur Tabi, Nicolin Chen, Xiubo Li, Fabio Estevam,
	linuxppc-dev, linux-kernel, alsa-devel, Gustavo A. R. Silva,
	Liam Girdwood, Takashi Iwai, Mark Brown, alsa-devel
In-Reply-To: <b866336e8826ed65e2c0d74429f0834474bf1152.1533312229.git.gustavo@embeddedor.com>

The patch

   ASoC: fsl_esai: Mark expected switch fall-through

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 16bbeb2b43c3f5d69e1348477e75a24ae6d55d5a Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Fri, 3 Aug 2018 11:29:53 -0500
Subject: [PATCH] ASoC: fsl_esai: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1222121 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 sound/soc/fsl/fsl_esai.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 8f43110373b8..c1d1d06783e5 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -249,6 +249,7 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai *dai, int clk_id,
 		break;
 	case ESAI_HCKT_EXTAL:
 		ecr |= ESAI_ECR_ETI;
+		/* fall through */
 	case ESAI_HCKR_EXTAL:
 		ecr |= ESAI_ECR_ERI;
 		break;
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH 2/2] powerpc/cpu: post the event cpux add/remove instead of online/offline during hotplug
From: kbuild test robot @ 2018-08-03 17:09 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kbuild-all, linuxppc-dev, Pingfan Liu, Benjamin Herrenschmidt,
	Michael Ellerman, Rafael J. Wysocki, Tyrel Datwyler, linux-pm
In-Reply-To: <1533217359-11420-2-git-send-email-kernelfans@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1732 bytes --]

Hi Pingfan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc7 next-20180802]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/powerpc-cpuidle-dynamically-register-unregister-cpuidle_device-during-hotplug/20180803-151916
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/sysfs.c: In function 'unregister_ppc_cpu':
>> arch/powerpc/kernel/sysfs.c:1052:2: error: implicit declaration of function 'unregister_cpu'; did you mean 'register_cpu'? [-Werror=implicit-function-declaration]
     unregister_cpu(container_of(get_cpu_device(cpu),
     ^~~~~~~~~~~~~~
     register_cpu
   cc1: all warnings being treated as errors

vim +1052 arch/powerpc/kernel/sysfs.c

  1047	
  1048	int unregister_ppc_cpu(int cpu)
  1049	{
  1050		device_remove_file(&per_cpu(cpu_devices, cpu).dev,
  1051			&dev_attr_physical_id);
> 1052		unregister_cpu(container_of(get_cpu_device(cpu),
  1053			struct cpu, dev));
  1054		return 0;
  1055	}
  1056	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5685 bytes --]

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-03 18:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803160246.GA13794@infradead.org>

On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> Well, if your setup is so fucked up I see no way to support it in Linux.
> 
> Let's end the discussion right now then.

You are saying something along the lines of "I don't like an
instruction in your ISA, let's not support your entire CPU architecture
in Linux".

Our setup is not fucked. It makes a LOT of sense and it's a very
sensible design. It's hitting a problem due to a corner case oddity in
virtio bypassing the MMU, we've worked around such corner cases many
times in the past without any problem, I fail to see what the problem
is here.

We aren't going to cancel years of HW and SW development for our
security infrastructure bcs you don't like a 2 lines hook into virtio
to make things work and aren't willing to even consider the options.

Ben.

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-03 19:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <eb1750e90e4bd45da297fa6f78f8ef93671b7c2f.camel@kernel.crashing.org>

On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > >   2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> > 
> > And total NAK the customer platform-provided part of this.  We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup. 
> 
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

I think the fact you can't add flags from the hypervisor is
a sign of a problematic architecture, you should look at
adding that down the road - you will likely need it at some point.

However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph do you see a problem with that?

> >  To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable plaform will use swiotlb automatically.
> 
> This cannot be done as you describe it.
> 
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
> 
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instanciated etc...
> 
> Only the guest kernel knows because it initates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc... 
> 
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
> 
> Ben.
> 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-03 19:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, benh, mpe, hch, linuxram, haren,
	paulus, srikar
In-Reply-To: <b31d2162-884c-0ed4-00df-e7debe0cc0e7@redhat.com>

On Fri, Aug 03, 2018 at 10:41:41AM +0800, Jason Wang wrote:
> 
> 
> On 2018年08月03日 04:55, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> > > This patch series is the follow up on the discussions we had before about
> > > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> > > for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> > > were suggestions about doing away with two different paths of transactions
> > > with the host/QEMU, first being the direct GPA and the other being the DMA
> > > API based translations.
> > > 
> > > First patch attempts to create a direct GPA mapping based DMA operations
> > > structure called 'virtio_direct_dma_ops' with exact same implementation
> > > of the direct GPA path which virtio core currently has but just wrapped in
> > > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> > > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> > > existing semantics. The second patch does exactly that inside the function
> > > virtio_finalize_features(). The third patch removes the default direct GPA
> > > path from virtio core forcing it to use DMA API callbacks for all devices.
> > > Now with that change, every device must have a DMA operations structure
> > > associated with it. The fourth patch adds an additional hook which gives
> > > the platform an opportunity to do yet another override if required. This
> > > platform hook can be used on POWER Ultravisor based protected guests to
> > > load up SWIOTLB DMA callbacks to do the required (as discussed previously
> > > in the above mentioned thread how host is allowed to access only parts of
> > > the guest GPA range) bounce buffering into the shared memory for all I/O
> > > scatter gather buffers to be consumed on the host side.
> > > 
> > > Please go through these patches and review whether this approach broadly
> > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > the patches or the approach in general. Thank you.
> > Jason did some work on profiling this. Unfortunately he reports
> > about 4% extra overhead from this switch on x86 with no vIOMMU.
> 
> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> guest and measure PPS on tap on host.
> 
> Thanks

Could you supply host configuration involved please?

> > 
> > I expect he's writing up the data in more detail, but
> > just wanted to let you know this would be one more
> > thing to debug before we can just switch to DMA APIs.
> > 
> > 
> > > Anshuman Khandual (4):
> > >    virtio: Define virtio_direct_dma_ops structure
> > >    virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
> > >    virtio: Force virtio core to use DMA API callbacks for all virtio devices
> > >    virtio: Add platform specific DMA API translation for virito devices
> > > 
> > >   arch/powerpc/include/asm/dma-mapping.h |  6 +++
> > >   arch/powerpc/platforms/pseries/iommu.c |  6 +++
> > >   drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
> > >   drivers/virtio/virtio_pci_common.h     |  3 ++
> > >   drivers/virtio/virtio_ring.c           | 65 +-----------------------------
> > >   5 files changed, 89 insertions(+), 63 deletions(-)
> > > 
> > > -- 
> > > 2.9.3

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-03 19:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803070507.GA1344@infradead.org>

On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> > So let's differenciate the two problems of having an IOMMU (real or
> > emulated) which indeeds adds overhead etc... and using the DMA API.
> > 
> > At the moment, virtio does this all over the place:
> > 
> > 	if (use_dma_api)
> > 		dma_map/alloc_something(...)
> > 	else
> > 		use_pa
> > 
> > The idea of the patch set is to do two, somewhat orthogonal, changes
> > that together achieve what we want. Let me know where you think there
> > is "a bunch of issues" because I'm missing it:
> > 
> >  1- Replace the above if/else constructs with just calling the DMA API,
> > and have virtio, at initialization, hookup its own dma_ops that just
> > "return pa" (roughly) when the IOMMU stuff isn't used.
> > 
> > This adds an indirect function call to the path that previously didn't
> > have one (the else case above). Is that a significant/measurable
> > overhead ?
> 
> If you call it often enough it does:
> 
> https://www.spinics.net/lists/netdev/msg495413.html
> 
> >  2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
> 
> And total NAK the customer platform-provided part of this.  We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal plaform
> dma mapping setup.  To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable plaform will use swiotlb automatically.

It seems reasonable to teach a platform to override dma-range
for a specific device e.g. in case it knows about bugs in ACPI.

-- 
MST

^ permalink raw reply

* Re: vio.c:__vio_register_driver && LPAR Migration issue
From: Tyrel Datwyler @ 2018-08-03 21:23 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev, Tyrel Datwyler, Michael Ellerman,
	engebret, santil, hollisb, rcjenn
In-Reply-To: <e815e222-e468-4bd4-5f6f-8657fc980689@linux.vnet.ibm.com>

On 08/02/2018 11:15 AM, Michael Bringmann wrote:
> Hello:
>     I have been observing an anomaly during LPAR migrations between
> a couple of P8 systems.
> 
> This is the problem.  After migrating an LPAR, the PPC mobility code
> receives RTAS requests to delete nodes with platform-/hardware-specific
> attributes when restarting the kernel after a migration.  My example is
> for migration between a P8 Alpine and a P8 Brazos.  Among the nodes
> that I see being deleted are 'ibm,random-v1', 'ibm,compression-v1',
> 'ibm,platform-facilities', and 'ibm,sym-encryption-v1'.  Of these
> nodes, the following are created during initial boot by calls to
> vio_register_driver:
> 
> drivers/char/hw_random/pseries-rng.c
>         ibm,random-v1
> 
> drivers/crypto/nx/nx-842-pseries.c
>         ibm,compression-v1
> 
> drivers/crypto/nx/nx.c
>         ibm,sym-encryption-v1
> 
> After the migration, these nodes are deleted, but nothing recreates
> them.  If I boot the LPAR on the target system, the nodes are added
> again.
> 
> My question is how do we recreate these nodes after migration?

Hmm, I'd have to see the scenario in action, but these should be added back by ibm,update-nodes RTAS call. There is some debug code in driver/of/dynamic.c that can be enabled that will log node/property dynamic reconfiguration events.

-Tyrel

> 
> Thanks.
> 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-04  1:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803220443-mutt-send-email-mst@kernel.org>

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-04  1:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803220443-mutt-send-email-mst@kernel.org>

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-04  1:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803220443-mutt-send-email-mst@kernel.org>

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-04  1:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803220443-mutt-send-email-mst@kernel.org>

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

(Appologies if you got this twice, my mailer had a brain fart and I don't
know if the first one got through & am about to disappear in a plane for 17h)

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 

^ permalink raw reply

* Re: [PATCH] powerpc/powernv/opal: Use standard interrupts property when available
From: Michael Ellerman @ 2018-08-04  2:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: Stewart Smith
In-Reply-To: <1523344570.11062.65.camel@kernel.crashing.org>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
...
> diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
> index 9d1b8c0aaf93..46785eaf625d 100644
> --- a/arch/powerpc/platforms/powernv/opal-irqchip.c
> +++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
> @@ -174,24 +175,21 @@ void opal_event_shutdown(void)
>  
>  	/* First free interrupts, which will also mask them */
>  	for (i = 0; i < opal_irq_count; i++) {
> -		if (!opal_irqs[i])
> +		if (!opal_irqs || !opal_irqs[i].start)
>  			continue;
>  
>  		if (in_interrupt())
> -			disable_irq_nosync(opal_irqs[i]);
> +			disable_irq_nosync(opal_irqs[i].start);
>  		else
> -			free_irq(opal_irqs[i], NULL);
> -
> -		opal_irqs[i] = 0;

This             ^^^^^^^^^^^^^^

> +			free_irq(opal_irqs[i].start, NULL);
>  	}

causes:

  ------------[ cut here ]------------
  Trying to free already-free IRQ 22
  WARNING: CPU: 0 PID: 1295 at ../kernel/irq/manage.c:1583 __free_irq+0xe0/0x420
  Modules linked in:
  CPU: 0 PID: 1295 Comm: init Tainted: G        W         4.18.0-rc3-gcc-7.3.1-00187-g46a3659b3791-dirty #80
  NIP:  c00000000017ca90 LR: c00000000017ca8c CTR: c000000000771a90
  REGS: c00000007552b810 TRAP: 0700   Tainted: G        W          (4.18.0-rc3-gcc-7.3.1-00187-g46a3659b3791-dirty)
  MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000222  XER: 20000000
  CFAR: c000000000106f10 IRQMASK: 1 
  GPR00: c00000000017ca8c c00000007552ba90 c000000000fd1600 0000000000000022 
  GPR04: 0000000000000001 000000000000015e 9000000000009033 0000000000000000 
  GPR08: 000000007ef50000 c000000000ef0bf0 c000000000ef0bf0 9000000000001003 
  GPR12: 0000000000000000 c000000001160000 0000000000000000 0000000000000000 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  GPR24: 00000000100c0828 0000000000000000 0000000000000016 c000000077118094 
  GPR28: c000000077118148 0000000000000000 c000000077118000 0000000000000000 
  NIP [c00000000017ca90] __free_irq+0xe0/0x420
  LR [c00000000017ca8c] __free_irq+0xdc/0x420
  Call Trace:
  [c00000007552ba90] [c00000000017ca8c] __free_irq+0xdc/0x420 (unreliable)
  [c00000007552bb30] [c00000000017ced8] free_irq+0x78/0xe0
  [c00000007552bb60] [c0000000000a7c48] opal_event_shutdown+0x158/0x160
  [c00000007552bbf0] [c00000000009d150] pnv_prepare_going_down+0x20/0x80
  [c00000007552bc10] [c00000000009d1d4] pnv_power_off+0x24/0x70
  [c00000007552bc40] [c00000000009d240] pnv_restart+0x0/0x70
  [c00000007552bc60] [c00000000002c2a0] machine_halt+0x60/0x70
  [c00000007552bc80] [c000000000139f74] kernel_halt+0x84/0xa0
  [c00000007552bce0] [c00000000013a36c] sys_reboot+0x28c/0x2c0
  [c00000007552be30] [c00000000000b9e4] system_call+0x5c/0x70
  Instruction dump:
  e93f0008 7fa9e840 419e0098 7feafb78 ebea0018 2fbf0000 409effe8 3c62ffd0 
  7f44d378 386326c8 4bf8a421 60000000 <0fe00000> 7f24cb78 7f63db78 48952bdd 
  ---[ end trace 926f8007a9952304 ]---


Because we're calling opal_event_shutdown() twice, via opal_shutdown()
and then pnv_prepare_going_down().

Did you drop that line on purpose, or was it just collateral damage?

The obvious fix does work:

+	opal_irqs[i].start = 0;


cheers

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-04  1:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, mpe, hch, linuxram, haren, paulus,
	srikar
In-Reply-To: <20180803220812-mutt-send-email-mst@kernel.org>

On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > > Please go through these patches and review whether this approach broadly
> > > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > > the patches or the approach in general. Thank you.
> > > 
> > > Jason did some work on profiling this. Unfortunately he reports
> > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > 
> > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > guest and measure PPS on tap on host.
> > 
> > Thanks
> 
> Could you supply host configuration involved please?

I wonder how much of that could be caused by Spectre mitigations
blowing up indirect function calls...

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/cpu: post the event cpux add/remove instead of online/offline during hotplug
From: Michael Ellerman @ 2018-08-04  2:48 UTC (permalink / raw)
  To: Pingfan Liu, linuxppc-dev
  Cc: Pingfan Liu, Benjamin Herrenschmidt, Rafael J. Wysocki,
	Tyrel Datwyler, linux-pm
In-Reply-To: <1533217359-11420-2-git-send-email-kernelfans@gmail.com>

Hi Pingfan,

Pingfan Liu <kernelfans@gmail.com> writes:
> Technically speaking, echo 1/0 > cpuX/online is only a subset of cpu
> hotplug/unplug, i.e. add/remove. The latter one includes the physical
> adding/removing of a cpu device. Some user space tools such as kexec-tools
> resort to the event add/remove to automatically rebuild dtb.
> If the dtb is not rebuilt correctly, we may hang on 2nd kernel due to
> lack the info of boot-cpu-hwid in dtb.

I notice you also sent a patch for ppc64_cpu to deal with CPUs being
removed rather than just offlined.

If I apply this patch then existing ppc64_cpu (without your other patch)
will break. Is that right?

cheers

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-04  8:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <22310f58605169fe9de83abf78b59f593ff7fbb7.camel@kernel.crashing.org>

On Fri, Aug 03, 2018 at 01:58:46PM -0500, Benjamin Herrenschmidt wrote:
> You are saying something along the lines of "I don't like an
> instruction in your ISA, let's not support your entire CPU architecture
> in Linux".

No.  I'm saying if you can't describe your architecture in the virtio
spec document it is bogus.

> Our setup is not fucked. It makes a LOT of sense and it's a very
> sensible design. It's hitting a problem due to a corner case oddity in
> virtio bypassing the MMU, we've worked around such corner cases many
> times in the past without any problem, I fail to see what the problem
> is here.

No matter if you like it or not (I don't!) virtio is defined to bypass
dma translations, it is very clearly stated in the spec.  It has some
ill-defined bits to bypass it, so if you want the dma mapping API
to be used you'll have to set that bit (in its original form, a refined
form, or an entirely newly defined sane form) and make sure your
hypersivors always sets it.  It's not rocket science, just a little bit
for work to make sure your setup is actually going to work reliably
and portably.

> We aren't going to cancel years of HW and SW development for our

Maybe you should have actually read the specs you are claiming to
implemented before spending all that effort.

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-04  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Benjamin Herrenschmidt, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180803221634-mutt-send-email-mst@kernel.org>

On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> It seems reasonable to teach a platform to override dma-range
> for a specific device e.g. in case it knows about bugs in ACPI.

A platform will be able override dma-range using the dev->bus_dma_mask
field starting in 4.19.  But we'll still need a way how to

  a) document in the virtio spec that all bus dma quirks are to be
     applied
  b) a way to document in a virtio-related spec how the bus handles
     dma for Ben's totally fucked up hypervisor.  Without that there
     is not way we'll get interoperable implementations.

^ permalink raw reply

* Re: linux-next: manual merge of the powerpc tree with the m68k tree
From: Finn Thain @ 2018-08-04 12:25 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Stephen Rothwell, Michael Ellerman, Benjamin Herrenschmidt,
	linuxppc-dev, Linux-Next, Linux Kernel Mailing List,
	Arnd Bergmann
In-Reply-To: <CAMuHMdXTW8iSA+HnnDNzoNkMwUchLY1J6YBYCBCfGRT5UatBqg@mail.gmail.com>

On Thu, 2 Aug 2018, Geert Uytterhoeven wrote:

> Hi Stephen,
> 
> On Thu, Aug 2, 2018 at 1:42 AM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > [forgot the conflict resolution ...]
> >
> > On Thu, 2 Aug 2018 09:27:20 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > >
> > > Today's linux-next merge of the powerpc tree got a conflict in:
> > >
> > >   arch/m68k/mac/misc.c
> > >
> > > between commit:
> > >
> > >   5b9bfb8ec467 ("m68k: mac: Use time64_t in RTC handling")
> > >
> > > from the m68k tree and commit:
> > >
> > >   ebd722275f9c ("macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver")
> > >
> > > from the powerpc tree.
> > >
> > > I fixed it up (see below) and can carry the fix as necessary. This 
> > > is now fixed as far as linux-next is concerned, but any non trivial 
> > > conflicts should be mentioned to your upstream maintainer when your 
> > > tree is submitted for merging.  You may also want to consider 
> > > cooperating with the maintainer of the conflicting tree to minimise 
> > > any particularly complex conflicts.
> 
> Ah, now I remember Finn said he was going to rebase his series once the 
> time64_t patch has entered my tree...
> 

The conflict I was worried about was avoided when I dropped v3 patch 10/12 
("macintosh: Use common code to access RTC"). I'll rework that patch after 
all the PMU and RTC work makes its way into Linus' tree.

> > --- a/arch/m68k/mac/misc.c
> > +++ b/arch/m68k/mac/misc.c
> > @@@ -90,11 -85,11 +90,11 @@@ static void cuda_write_pram(int offset
> >   }
> >   #endif /* CONFIG_ADB_CUDA */
> >
> > - #ifdef CONFIG_ADB_PMU68K
> > + #ifdef CONFIG_ADB_PMU
> >  -static long pmu_read_time(void)
> >  +static time64_t pmu_read_time(void)
> >   {
> >         struct adb_request req;
> >  -      long time;
> >  +      time64_t time;
> >
> >         if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
> >                 return 0;
> 
> Thanks, looks good to me!
> 

Looks good to me, too.

Thanks.

-- 

> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> 

^ permalink raw reply

* [PATCH] powpc:feature: of_node_put is not needed after iterator.
From: zhong jiang @ 2018-08-04 14:25 UTC (permalink / raw)
  To: benh; +Cc: paulus, mpe, linuxppc-dev, linux-kernel

for_each_node_by_name iterators only  exit normally when the loop
cursor is NULL, So there is no point to call of_node_put.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 arch/powerpc/platforms/powermac/feature.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/feature.c b/arch/powerpc/platforms/powermac/feature.c
index 3f82cb2..4eb8cb3 100644
--- a/arch/powerpc/platforms/powermac/feature.c
+++ b/arch/powerpc/platforms/powermac/feature.c
@@ -2889,10 +2889,8 @@ static void __init probe_one_macio(const char *name, const char *compat, int typ
 	/* On all machines, switch modem & serial ports off */
 	for_each_node_by_name(np, "ch-a")
 		initial_serial_shutdown(np);
-	of_node_put(np);
 	for_each_node_by_name(np, "ch-b")
 		initial_serial_shutdown(np);
-	of_node_put(np);
 }
 
 void __init
-- 
1.7.12.4

^ permalink raw reply related

* Re: [PATCH] powpc:feature: of_node_put is not needed after iterator.
From: Tyrel Datwyler @ 2018-08-04 20:37 UTC (permalink / raw)
  To: zhong jiang, benh; +Cc: paulus, linuxppc-dev, linux-kernel
In-Reply-To: <1533392700-24621-1-git-send-email-zhongjiang@huawei.com>

On 08/04/2018 07:25 AM, zhong jiang wrote:
> for_each_node_by_name iterators only  exit normally when the loop
> cursor is NULL, So there is no point to call of_node_put.
> 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>

Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

> ---
>  arch/powerpc/platforms/powermac/feature.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powermac/feature.c b/arch/powerpc/platforms/powermac/feature.c
> index 3f82cb2..4eb8cb3 100644
> --- a/arch/powerpc/platforms/powermac/feature.c
> +++ b/arch/powerpc/platforms/powermac/feature.c
> @@ -2889,10 +2889,8 @@ static void __init probe_one_macio(const char *name, const char *compat, int typ
>  	/* On all machines, switch modem & serial ports off */
>  	for_each_node_by_name(np, "ch-a")
>  		initial_serial_shutdown(np);
> -	of_node_put(np);
>  	for_each_node_by_name(np, "ch-b")
>  		initial_serial_shutdown(np);
> -	of_node_put(np);
>  }
> 
>  void __init
> 

^ permalink raw reply

* Re: vio.c:__vio_register_driver && LPAR Migration issue
From: Tyrel Datwyler @ 2018-08-04 23:01 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev, Tyrel Datwyler, Michael Ellerman,
	engebret
In-Reply-To: <d3d9e08c-70a7-aa6b-13f4-1afed294d39e@linux.vnet.ibm.com>

On 08/03/2018 02:23 PM, Tyrel Datwyler wrote:
> On 08/02/2018 11:15 AM, Michael Bringmann wrote:
>> Hello:
>>     I have been observing an anomaly during LPAR migrations between
>> a couple of P8 systems.
>>
>> This is the problem.  After migrating an LPAR, the PPC mobility code
>> receives RTAS requests to delete nodes with platform-/hardware-specific
>> attributes when restarting the kernel after a migration.  My example is
>> for migration between a P8 Alpine and a P8 Brazos.  Among the nodes
>> that I see being deleted are 'ibm,random-v1', 'ibm,compression-v1',
>> 'ibm,platform-facilities', and 'ibm,sym-encryption-v1'.  Of these
>> nodes, the following are created during initial boot by calls to
>> vio_register_driver:
>>
>> drivers/char/hw_random/pseries-rng.c
>>         ibm,random-v1
>>
>> drivers/crypto/nx/nx-842-pseries.c
>>         ibm,compression-v1
>>
>> drivers/crypto/nx/nx.c
>>         ibm,sym-encryption-v1
>>
>> After the migration, these nodes are deleted, but nothing recreates
>> them.  If I boot the LPAR on the target system, the nodes are added
>> again.
>>
>> My question is how do we recreate these nodes after migration?
> 
> Hmm, I'd have to see the scenario in action, but these should be added back by ibm,update-nodes RTAS call. There is some debug code in driver/of/dynamic.c that can be enabled that will log node/property dynamic reconfiguration events.
> 

I took a quick look by turning of phandle caching and turning on the OF reconfig debug on one of my lpars. I noticed that it looks like this could be a firmware issue. Looks to me like we aren't being notified to re-add those child nodes of ibm,platform-facilities.

[ 1671.638041] OF: notify DETACH_NODE     /cpus/l2-cache@2022
[ 1671.638094] OF: notify DETACH_NODE     /cpus/l3-cache@3122
[ 1671.638119] OF: notify DETACH_NODE     /cpus/l2-cache@2023
[ 1671.638136] OF: notify DETACH_NODE     /cpus/l3-cache@3123
[ 1671.638164] OF: notify DETACH_NODE     /ibm,platform-facilities/ibm,random-v1
[ 1671.638182] OF: notify DETACH_NODE     /ibm,platform-facilities/ibm,compression-v1
[ 1671.638198] OF: notify DETACH_NODE     /ibm,platform-facilities/ibm,sym-encryption-v1
[ 1671.638219] OF: notify DETACH_NODE     /ibm,platform-facilities
[ 1671.776419] OF: notify UPDATE_PROPERTY /:ibm,model-class

... snipped property updates

[ 1672.129941] OF: notify UPDATE_PROPERTY /cpus/PowerPC,POWER8@8:ibm,dfp
[ 1672.147551] OF: notify ATTACH_NODE     /cpus/l2-cache@202e
[ 1672.166321] OF: notify ATTACH_NODE     /cpus/l3-cache@312e
[ 1672.183971] OF: notify ATTACH_NODE     /cpus/l2-cache@202f
[ 1672.202752] OF: notify ATTACH_NODE     /cpus/l3-cache@312f
[ 1672.230760] OF: notify ATTACH_NODE     /ibm,platform-facilities

Need to verify this by tracing the RTAS calls to ibm,update-nodes. I'll try and look at that tomorrow. The loop that processes the nodes to update in pseries_devicetree_update() blindly ignores the return codes from delete_dt_node(), update_dt_node(), and add_dt_node(). So, it is also possible that we are notified, but are silently failing the add.

-Tyrel

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-05  0:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180804081500.GA1455@infradead.org>

On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> > It seems reasonable to teach a platform to override dma-range
> > for a specific device e.g. in case it knows about bugs in ACPI.
> 
> A platform will be able override dma-range using the dev->bus_dma_mask
> field starting in 4.19.  But we'll still need a way how to
> 
>   a) document in the virtio spec that all bus dma quirks are to be
>      applied

I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU
should be extended to cover that. But see below.

>   b) a way to document in a virtio-related spec how the bus handles
>      dma for Ben's totally fucked up hypervisor.  Without that there
>      is not way we'll get interoperable implementations.

So in this case however I'm not sure what exactly do we want to add. It
seems that from point of view of the device, there is nothing special -
it just gets a PA and writes there.  It also seems that guest does not
need to get any info from the device either. Instead guest itself needs
device to DMA into specific addresses, for its own reasons.

It seems that the fact that within guest it's implemented using a bounce
buffer and that it's easiest to do by switching virtio to use the DMA API
isn't something virtio spec concerns itself with.

I'm open to suggestions.

-- 
MST

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-05  0:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <051fd78e15595b414839fa8f9d445b9f4d7576c6.camel@kernel.crashing.org>

On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > > running on a secure VM in our case.
> > > > 
> > > > And total NAK the customer platform-provided part of this.  We need
> > > > a flag passed in from the hypervisor that the device needs all bus
> > > > specific dma api treatment, and then just use the normal plaform
> > > > dma mapping setup. 
> > > 
> > > Christoph, as I have explained already, we do NOT have a way to provide
> > > such a flag as neither the hypervisor nor qemu knows anything about
> > > this when the VM is created.
> > 
> > I think the fact you can't add flags from the hypervisor is
> > a sign of a problematic architecture, you should look at
> > adding that down the road - you will likely need it at some point.
> 
> Well, we can later in the boot process. At VM creation time, it's just
> a normal VM. The VM firmware, bootloader etc... are just operating
> normally etc...

I see the allure of this, but I think down the road you will
discover passing a flag in libvirt XML saying
"please use a secure mode" or whatever is a good idea.

Even thought it is probably not required to address this
specific issue.

For example, I don't think ballooning works in secure mode,
you will be able to teach libvirt not to try to add a
balloon to the guest.

> Later on, (we may have even already run Linux at that point,
> unsecurely, as we can use Linux as a bootloader under some
> circumstances), we start a "secure image".
> 
> This is a kernel zImage that includes a "ticket" that has the
> appropriate signature etc... so that when that kernel starts, it can
> authenticate with the ultravisor, be verified (along with its ramdisk)
> etc... and copied (by the UV) into secure memory & run from there.
> 
> At that point, the hypervisor is informed that the VM has become
> secure.
> 
> So at that point, we could exit to qemu to inform it of the change,

That's probably a good idea too.

> and
> have it walk the qtree and "Switch" all the virtio devices to use the
> IOMMU I suppose, but it feels a lot grosser to me.

That part feels gross, yes.

> That's the only other option I can think of.
> 
> > However in this specific case, the flag does not need to come from the
> > hypervisor, it can be set by arch boot code I think.
> > Christoph do you see a problem with that?
> 
> The above could do that yes. Another approach would be to do it from a
> small virtio "quirk" that pokes a bit in the device to force it to
> iommu mode when it detects that we are running in a secure VM. That's a
> bit warty on the virito side but probably not as much as having a qemu
> one that walks of the virtio devices to change how they behave.
> 
> What do you reckon ?

I think you are right that for the dma limit the hypervisor doesn't seem
to need to know.

> What we want to avoid is to expose any of this to the *end user* or
> libvirt or any other higher level of the management stack. We really
> want that stuff to remain contained between the VM itself, KVM and
> maybe qemu.
>
> We will need some other qemu changes for migration so that's ok. But
> the minute you start touching libvirt and the higher levels it becomes
> a nightmare.
> 
> Cheers,
> Ben.

I don't believe you'll be able to avoid that entirely. The split between
libvirt and qemu is more about community than about code, random bits of
functionality tend to land on random sides of that fence.  Better add a
tag in domain XML early is my advice. Having said that, it's your
hypervisor. I'm just suggesting that when hypervisor does somehow need
to care then I suspect most people won't be receptive to the argument
that changing libvirt is a nightmare.

> > > >  To get swiotlb you'll need to then use the DT/ACPI
> > > > dma-range property to limit the addressable range, and a swiotlb
> > > > capable plaform will use swiotlb automatically.
> > > 
> > > This cannot be done as you describe it.
> > > 
> > > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > > at a point where it has *no idea* that the VM will later become secure
> > > and thus will have to restrict which pages can be used for "DMA".
> > > 
> > > The VM will *at runtime* turn itself into a secure VM via interactions
> > > with the security HW and the Ultravisor layer (which sits below the
> > > HV). This happens way after the DT has been created and consumed, the
> > > qemu devices instanciated etc...
> > > 
> > > Only the guest kernel knows because it initates the transition. When
> > > that happens, the virtio devices have already been used by the guest
> > > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > > one, etc... 
> > > 
> > > So instead of running around saying NAK NAK NAK, please explain how we
> > > can solve that differently.
> > > 
> > > Ben.
> > > 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-05  0:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <398d498048a5b288cce06003c0808bfefa1e9b1f.camel@kernel.crashing.org>

On Fri, Aug 03, 2018 at 08:22:11PM -0500, Benjamin Herrenschmidt wrote:
> (Appologies if you got this twice, my mailer had a brain fart and I don't
> know if the first one got through & am about to disappear in a plane for 17h)

I got like 3 of these. I hope that's true for everyone as I replied to
1st one.

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-05  0:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jason Wang, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, mpe, hch, linuxram,
	haren, paulus, srikar
In-Reply-To: <01c74680c4b3aa25d9b4375a9ab5e10046b7c71b.camel@kernel.crashing.org>

On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > > > Please go through these patches and review whether this approach broadly
> > > > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > > > the patches or the approach in general. Thank you.
> > > > 
> > > > Jason did some work on profiling this. Unfortunately he reports
> > > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > 
> > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > > guest and measure PPS on tap on host.
> > > 
> > > Thanks
> > 
> > Could you supply host configuration involved please?
> 
> I wonder how much of that could be caused by Spectre mitigations
> blowing up indirect function calls...
> 
> Cheers,
> Ben.

I won't be surprised. If yes I suggested a way to mitigate the overhead.

-- 
MSR 

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-05  0:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180801081637.GA14438@arm.com>

On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > However the question people raise is that DMA API is already full of
> > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > above. How is this one much worse?
> > > 
> > > None of these warts is visible to the driver, they are all handled in
> > > the architecture (possibly on a per-bus basis).
> > > 
> > > So for virtio we really need to decide if it has one set of behavior
> > > as specified in the virtio spec, or if it behaves exactly as if it
> > > was on a PCI bus, or in fact probably both as you lined up.  But no
> > > magic arch specific behavior inbetween.
> > 
> > The only arch specific behaviour is needed in the case where it doesn't
> > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > our secure VMs, we still need to make it use swiotlb in order to bounce
> > through non-secure pages.
> 
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed DMA
> ops then everything has happened to work. If we blindly enable the arch DMA
> ops, we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).
> 
> Will

Right that's not very different from placing the device within the IOMMU
domain but in fact bypassing the IOMMU. I wonder whether anyone ever
needs a non coherent virtio-mmio. If yes we can extend
PLATFORM_IOMMU to cover that or add another bit.

What exactly do the non-coherent ops do that causes the corruption?

-- 
MST

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox