LinuxPPC-Dev Archive on lore.kernel.org

* [PATCH] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Aneesh Kumar K.V @ 2013-10-02 14:38 UTC
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm, kvm-ppc, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This build break was introduced by commit 85a0d845d8bb5df5d2669416212f56cbe1474c6b:

arch/powerpc/kvm/book3s_pr.c: In function 'kvmppc_core_vcpu_create':
arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_pr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8941885..6075dbd 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1179,7 +1179,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 
 #ifdef CONFIG_KVM_BOOK3S_32
 	vcpu->arch.shadow_vcpu =
-		kzalloc(sizeof(*vcpu_book3s->shadow_vcpu), GFP_KERNEL);
+		kzalloc(sizeof(*vcpu->arch.shadow_vcpu), GFP_KERNEL);
 	if (!vcpu->arch.shadow_vcpu)
 		goto free_vcpu3s;
 #endif
-- 
1.8.1.2
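
The one-line fix above takes sizeof from the same pointer expression that receives the allocation, so the size cannot go stale when a field moves from one structure to another. A minimal userspace sketch of that idiom follows. It assumes hypothetical stand-in structures (the `_sketch` types and their fields are illustrative, not the kernel's definitions) and uses calloc() in place of kzalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel structures differ. */
struct shadow_vcpu_sketch {
	unsigned long gpr[14];
	unsigned int cr;
};

struct vcpu_sketch {
	struct {
		struct shadow_vcpu_sketch *shadow_vcpu;
	} arch;
};

/*
 * Allocate zeroed storage for the shadow vcpu. sizeof() is applied to the
 * dereferenced destination pointer, so it always matches the pointee type.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int alloc_shadow_vcpu(struct vcpu_sketch *vcpu)
{
	assert(vcpu != NULL);
	vcpu->arch.shadow_vcpu = calloc(1, sizeof(*vcpu->arch.shadow_vcpu));
	return vcpu->arch.shadow_vcpu ? 0 : -1;
}
```

Because the operand of sizeof names the destination itself, a rename such as the one that caused this build break either compiles correctly or fails at the assignment, rather than silently allocating the wrong size.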

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Paolo Bonzini @ 2013-10-02 14:38 UTC
  To: Alexander Graf
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm
In-Reply-To: <C4834CF8-F81C-4B82-B1A7-1751D50AADB7@suse.de>

On 02/10/2013 16:36, Alexander Graf wrote:
>>
>> With Michael's earlier patch in this series, the hwrng is accessible by
>> host userspace via /dev/hwrng, no?
> Yes, but there's no token from user space that gets passed into the
> kernel to check whether access is ok or not. So while QEMU may not have
> permission to open /dev/hwrng it could spawn a guest that opens it,
> drains all entropy out of it and thus stalls other processes which try to
> fetch entropy, no?
> 
> Maybe I haven't fully grasped the interface yet though :).

Yes, that's right.  I don't think it's a huge problem, but it's another
point in favor of just doing the hypercall in userspace.

Paolo

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Gleb Natapov @ 2013-10-02 14:37 UTC
  To: Paolo Bonzini
  Cc: tytso, kvm, linuxppc-dev, Alexander Graf, kvm-ppc, linux-kernel,
	herbert, Paul Mackerras, mpm
In-Reply-To: <524C2EAE.7090209@redhat.com>

On Wed, Oct 02, 2013 at 04:33:18PM +0200, Paolo Bonzini wrote:
> On 02/10/2013 16:08, Alexander Graf wrote:
> > > The hwrng is accessible by host userspace via /dev/mem.
> > 
> > A guest should live on the same permission level as a user space
> > application. If you run QEMU as UID 1000 without access to /dev/mem, why
> > should the guest suddenly be able to directly access a memory location
> > (MMIO) it couldn't access directly through a normal user space interface.
> > 
> > It's basically a layering violation.
> 
> With Michael's earlier patch in this series, the hwrng is accessible by
> host userspace via /dev/hwrng, no?
> 
Access to it can be controlled by its file permissions. The permissions
of /dev/kvm may be different. If we route the hypercall via userspace and
configure QEMU to get entropy from /dev/hwrng, everything will fall
nicely into place (except performance).

--
			Gleb.
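
The permission argument above can be sketched in userspace C: if the hypercall is serviced by a process that opens the entropy device itself, the ordinary file-permission check in open() decides whether the guest gets data. The helper name `read_random_u64` is hypothetical and is not part of QEMU; this is only an illustration under that assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper for a userspace H_RANDOM handler: read 8 bytes from
 * an entropy device. open() enforces the file permissions of `path`, so a
 * process without access gets a negative errno instead of random data.
 * Returns 0 on success or a negative errno value.
 */
static int read_random_u64(const char *path, uint64_t *out)
{
	unsigned char *dst = (unsigned char *)out;
	size_t done = 0;
	int fd;

	assert(path != NULL && out != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	while (done < sizeof(*out)) {
		ssize_t n = read(fd, dst + done, sizeof(*out) - done);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n <= 0) {
			int err = n < 0 ? errno : EIO;

			close(fd);
			return -err;
		}
		done += (size_t)n;
	}
	close(fd);
	return 0;
}
```

A process running as an unprivileged UID that calls this with "/dev/hwrng" would typically fail in open() unless the administrator granted access, which is the access control that a direct in-kernel implementation bypasses.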

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Alexander Graf @ 2013-10-02 14:36 UTC
  To: Paolo Bonzini
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm
In-Reply-To: <524C2EAE.7090209@redhat.com>

On 02.10.2013, at 16:33, Paolo Bonzini wrote:

> On 02/10/2013 16:08, Alexander Graf wrote:
>>> The hwrng is accessible by host userspace via /dev/mem.
>>
>> A guest should live on the same permission level as a user space
>> application. If you run QEMU as UID 1000 without access to /dev/mem, why
>> should the guest suddenly be able to directly access a memory location
>> (MMIO) it couldn't access directly through a normal user space interface.
>>
>> It's basically a layering violation.
>
> With Michael's earlier patch in this series, the hwrng is accessible by
> host userspace via /dev/hwrng, no?

Yes, but there's no token from user space that gets passed into the
kernel to check whether access is ok or not. So while QEMU may not have
permission to open /dev/hwrng it could spawn a guest that opens it,
drains all entropy out of it and thus stalls other processes which try to
fetch entropy, no?

Maybe I haven't fully grasped the interface yet though :).

Alex

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Paolo Bonzini @ 2013-10-02 14:33 UTC
  To: Alexander Graf
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm
In-Reply-To: <029A8D6C-C23C-42B2-8C26-D76B59E2C9DD@suse.de>

On 02/10/2013 16:08, Alexander Graf wrote:
> > The hwrng is accessible by host userspace via /dev/mem.
> 
> A guest should live on the same permission level as a user space
> application. If you run QEMU as UID 1000 without access to /dev/mem, why
> should the guest suddenly be able to directly access a memory location
> (MMIO) it couldn't access directly through a normal user space interface.
> 
> It's basically a layering violation.

With Michael's earlier patch in this series, the hwrng is accessible by
host userspace via /dev/hwrng, no?

Paolo

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Gleb Natapov @ 2013-10-02 14:10 UTC
  To: Michael Ellerman
  Cc: tytso, kvm, linuxppc-dev, Alexander Graf, kvm-ppc, linux-kernel,
	herbert, Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <1380722275.12149.28.camel@concordia>

On Wed, Oct 02, 2013 at 11:57:55PM +1000, Michael Ellerman wrote:
> On Wed, 2013-10-02 at 13:02 +0300, Gleb Natapov wrote:
> > On Wed, Oct 02, 2013 at 11:50:50AM +0200, Alexander Graf wrote:
> > > 
> > > On 02.10.2013, at 11:11, Alexander Graf wrote:
> > > 
> > > So how do you solve live migration between a kernel that has this patch and one that doesn't?
> > > 
> > Yes, I alluded to it in my email to Paul, and Paolo asked as well. How is this
> > interface disabled?
> 
> Yes that is a valid point.
> 
> We can't disable the interface at runtime, the guest detects its
> presence at boot.
> 
> What will happen is the hcall will come through to QEMU, which will
> reject it with H_FUNCTION (~= ENOSYS).
> 
> The current pseries-rng driver does not handle that case well, which is
> exactly why I sent patches to fix it recently.
> 
> The only other option would be to feed it with /dev/random.
> 
What about the other way around: what if a guest migrates from a KVM that
does not have this hypercall to one that does? We try not to change the
hardware under a guest during migration.

> > Also, hwrnd is MMIO on the host, so why does the guest need to
> > use a hypercall instead of emulating the device (in the kernel or somewhere
> > else)?
> 
> Because PAPR is a platform specification and it specifies that the
> interface is a hypervisor call. We can't just decide we want to do it
> differently.
Any insights on why it was specified this way? What is special about the
hwrnd device that a hypercall is needed to access it? I understand that you
didn't just decide to implement it that way :) Also, what will happen if the
guest finds an emulated hwrnd device? Will it use it?

> 
> > Another thing is that on a host, hwrnd is protected from
> > direct userspace access by virtue of being a device, but guest code (even
> > kernel mode) is userspace as far as the host's security model goes, so by
> > implementing this hypercall in a way that directly accesses hwrnd you
> > expose hwrnd to userspace unconditionally. Why is this a good idea?
> 
> I'm not sure I follow you.
> 
> The hwrng is accessible by host userspace via /dev/mem.
> 
A regular user has no access to /dev/mem, but can start a KVM guest and
gain access to the device.

--
			Gleb.

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Alexander Graf @ 2013-10-02 14:08 UTC
  To: Michael Ellerman
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <1380722275.12149.28.camel@concordia>


On 02.10.2013, at 15:57, Michael Ellerman wrote:

> On Wed, 2013-10-02 at 13:02 +0300, Gleb Natapov wrote:
>> On Wed, Oct 02, 2013 at 11:50:50AM +0200, Alexander Graf wrote:
>>>
>>> On 02.10.2013, at 11:11, Alexander Graf wrote:
>>>
>>> So how do you solve live migration between a kernel that has this patch and one that doesn't?
>>>
>> Yes, I alluded to it in my email to Paul, and Paolo asked as well. How is this
>> interface disabled?
>
> Yes that is a valid point.
>
> We can't disable the interface at runtime, the guest detects its
> presence at boot.
>
> What will happen is the hcall will come through to QEMU, which will
> reject it with H_FUNCTION (~= ENOSYS).
>
> The current pseries-rng driver does not handle that case well, which is
> exactly why I sent patches to fix it recently.
>
> The only other option would be to feed it with /dev/random.
>
>> Also, hwrnd is MMIO on the host, so why does the guest need to
>> use a hypercall instead of emulating the device (in the kernel or somewhere
>> else)?
>
> Because PAPR is a platform specification and it specifies that the
> interface is a hypervisor call. We can't just decide we want to do it
> differently.
>
>> Another thing is that on a host, hwrnd is protected from
>> direct userspace access by virtue of being a device, but guest code (even
>> kernel mode) is userspace as far as the host's security model goes, so by
>> implementing this hypercall in a way that directly accesses hwrnd you
>> expose hwrnd to userspace unconditionally. Why is this a good idea?
>
> I'm not sure I follow you.
>
> The hwrng is accessible by host userspace via /dev/mem.

A guest should live on the same permission level as a user space
application. If you run QEMU as UID 1000 without access to /dev/mem, why
should the guest suddenly be able to directly access a memory location
(MMIO) it couldn't access directly through a normal user space interface.

It's basically a layering violation.


Alex

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Michael Ellerman @ 2013-10-02 13:57 UTC
  To: Gleb Natapov
  Cc: tytso, kvm, linuxppc-dev, Alexander Graf, kvm-ppc, linux-kernel,
	herbert, Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <20131002100224.GF17294@redhat.com>

On Wed, 2013-10-02 at 13:02 +0300, Gleb Natapov wrote:
> On Wed, Oct 02, 2013 at 11:50:50AM +0200, Alexander Graf wrote:
> > 
> > On 02.10.2013, at 11:11, Alexander Graf wrote:
> > 
> > So how do you solve live migration between a kernel that has this patch and one that doesn't?
> > 
> Yes, I alluded to it in my email to Paul, and Paolo asked as well. How is this
> interface disabled?

Yes that is a valid point.

We can't disable the interface at runtime, the guest detects its
presence at boot.

What will happen is the hcall will come through to QEMU, which will
reject it with H_FUNCTION (~= ENOSYS).

The current pseries-rng driver does not handle that case well, which is
exactly why I sent patches to fix it recently.

The only other option would be to feed it with /dev/random.

> Also, hwrnd is MMIO on the host, so why does the guest need to
> use a hypercall instead of emulating the device (in the kernel or somewhere
> else)?

Because PAPR is a platform specification and it specifies that the
interface is a hypervisor call. We can't just decide we want to do it
differently.

> Another thing is that on a host, hwrnd is protected from
> direct userspace access by virtue of being a device, but guest code (even
> kernel mode) is userspace as far as the host's security model goes, so by
> implementing this hypercall in a way that directly accesses hwrnd you
> expose hwrnd to userspace unconditionally. Why is this a good idea?

I'm not sure I follow you.

The hwrng is accessible by host userspace via /dev/mem.

cheers

* Re: [PATCH 1/2][v7] powerpc/mpc85xx:Add initial device tree support of T104x
From: Prabhakar Kushwaha @ 2013-10-02 12:31 UTC
  To: Scott Wood; +Cc: Varun Sethi, linuxppc-dev, Poonam Aggrwal, Priyanka Jain
In-Reply-To: <1380657408.10618.52.camel@snotra.buserror.net>

On 10/02/2013 01:26 AM, Scott Wood wrote:
> On Tue, 2013-10-01 at 08:56 +0530, Prabhakar Kushwaha wrote:
>> On 10/01/2013 01:17 AM, Scott Wood wrote:
>>> On Mon, 2013-09-30 at 12:24 +0530, Prabhakar Kushwaha wrote:
>>>>       - Removed l2switch. It will be added later
>>> Why?
>> I am not aware of bindings required for l2switch as we are not working
>> on the driver.
>> Earlier I thought of putting a place holder. but as you suggested to put
>> bindings in documentation.
>> It will be good if it is put by actual driver owner.
> Is there a reason to believe the binding will be complicated?
>
> Does any such "driver owner" exist yet?

I don't know; I am not aware of any l2switch driver.

>
>>>> +sata@220000 {
>>>> +			fsl,iommu-parent = <&pamu0>;
>>>> +			fsl,liodn-reg = <&guts 0x550>; /* SATA1LIODNR */
>>>> +};
>>>> +/include/ "qoriq-sata2-1.dtsi"
>>>> +sata@221000 {
>>>> +			fsl,iommu-parent = <&pamu0>;
>>>> +			fsl,liodn-reg = <&guts 0x554>; /* SATA2LIODNR */
>>>> +};
>>> Whitespace
>> do we have any scripts which check for whitespace as checkpatch never
>> give any warning/error.
>> it is a very silly mistake which I am doing continuously :(
> checkpatch doesn't check dts files.
Manual check :(

>>>> +/include/ "t1040si-post.dtsi"
>>> Should at least have a comment indicating that eventually this should
>>> hold the l2 switch node.
>> yes. Ideally it should be.
>> but if I put a comment then I believe this patch will not be completed.
>> it will think as a RFC.
>> as I believe putting of TODO is generally for RFC patches.
> As is, one would wonder why the separate file exists at all.
>
> The TODO is there whether you have a comment acknowledging it or
> not. :-)
>
>
I agree. I will add a comment.

Regards,
Prabhakar

* Re: [PATCH] powerpc/iommu: use GFP_KERNEL instead of GFP_ATOMIC in iommu_init_table()
From: Thadeu Lima de Souza Cascardo @ 2013-10-02 12:32 UTC
  To: Nishanth Aravamudan; +Cc: linuxppc-dev, Anton Blanchard, Paul Mackerras
In-Reply-To: <20131001210453.GB4065@linux.vnet.ibm.com>

On Tue, Oct 01, 2013 at 02:04:53PM -0700, Nishanth Aravamudan wrote:
> Under heavy (DLPAR?) stress, we tripped this panic() in
> arch/powerpc/kernel/iommu.c::iommu_init_table():
>     
> 	page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz));
> 	if (!page)
> 		panic("iommu_init_table: Can't allocate %ld bytes\n",
>     sz);
>     
> Before the panic() we got a page allocation failure for an order-2
> allocation. There appears to be memory free, but perhaps not in the
> ATOMIC context. I looked through all the call-sites of
> iommu_init_table() and didn't see any obvious reason to need an ATOMIC
> allocation. Most call-sites in fact have an explicit GFP_KERNEL
> allocation shortly before the call to iommu_init_table(), indicating we
> are not in an atomic context. There is some indirection for some paths,
> but I didn't see any locks indicating that GFP_KERNEL is inappropriate.
> 
> With this change under the same conditions, we have not been able to
> reproduce the panic.
>     
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 0adab06..572bb5b 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -661,7 +661,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
>  	/* number of bytes needed for the bitmap */
>  	sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> 
> -	page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz));
> +	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
>  	if (!page)
>  		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
>  	tbl->it_map = page_address(page);

I didn't respond to the previous message, but I also checked whether there
was any history in the logs and found that it has been this way from the
start. I also found no reason why it needs to be atomic. Therefore,

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
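
For a sense of scale, the bitmap size and allocation order computed in the quoted hunk can be reproduced in userspace. The sketch below re-creates BITS_TO_LONGS() and a get_order()-style calculation; the function names are hypothetical, and the table size (131072 entries) and 4 KiB page size used in the note that follows are illustrative assumptions, since the report does not give either.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace re-creation of BITS_TO_LONGS(): longs needed to hold `bits`. */
static size_t bits_to_longs(size_t bits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (bits + bits_per_long - 1) / bits_per_long;
}

/* Bytes needed for a bitmap with one bit per table entry. */
static size_t bitmap_bytes(size_t it_size)
{
	return bits_to_longs(it_size) * sizeof(unsigned long);
}

/* Smallest order such that (page_size << order) >= size, like get_order(). */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}
```

Under those assumptions, bitmap_bytes(131072) is 16384 bytes and order_for_size(16384, 4096) is 2, that is, four physically contiguous pages. An allocation of that shape is harder to satisfy when the allocator is not allowed to sleep, which is consistent with the failure described in the report.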

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Gleb Natapov @ 2013-10-02 10:02 UTC
  To: Alexander Graf
  Cc: tytso, kvm, linuxppc-dev, linux-kernel, kvm-ppc, herbert,
	Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <3CBF5732-E7EE-4C96-8132-6D7B77270DAF@suse.de>

On Wed, Oct 02, 2013 at 11:50:50AM +0200, Alexander Graf wrote:
> 
> On 02.10.2013, at 11:11, Alexander Graf wrote:
> 
> > 
> > On 02.10.2013, at 11:06, Benjamin Herrenschmidt wrote:
> > 
> >> On Wed, 2013-10-02 at 10:46 +0200, Paolo Bonzini wrote:
> >> 
> >>> 
> >>> Thanks.  Any chance you can give some numbers of a kernel hypercall and
> >>> a userspace hypercall on Power, so we have actual data?  For example a
> >>> hypercall that returns H_PARAMETER as soon as possible.
> >> 
> >> I don't have (yet) numbers at hand but we have basically 3 places where
> >> we can handle hypercalls:
> >> 
> >> - Kernel real mode. This is where most of our MMU stuff goes for
> >> example unless it needs to trigger a page fault in Linux. This is
> >> executed with translation disabled and the MMU still in guest context.
> >> This is the fastest path since we don't take out the other threads nor
> >> perform any expensive context change. This is where we put the
> >> "accelerated" H_RANDOM as well.
> >> 
> >> - Kernel virtual mode. That's a full exit, so all threads are out and
> >> MMU switched back to host Linux. Things like vhost MMIO emulation goes
> >> there, page faults, etc...
> >> 
> >> - Qemu. This adds the round trip to userspace on top of the above.
> > 
> > Right, and the difference for the patch in question is really whether we handle it in kernel virtual mode or in QEMU, so the bulk of the overhead (kicking threads out of guest context, switching MMU context, etc) happens either way.
> > 
> > So the additional overhead when handling it in QEMU here really boils down to the user space roundtrip (plus another random number read roundtrip).
> 
> Ah, sorry, I misread the patch. You're running the handler in real mode of course :).
> 
> So how do you solve live migration between a kernel that has this patch and one that doesn't?
> 
Yes, I alluded to it in my email to Paul, and Paolo asked as well. How is this
interface disabled? Also, hwrnd is MMIO on the host, so why does the guest need to
use a hypercall instead of emulating the device (in the kernel or somewhere
else)? Another thing is that on a host, hwrnd is protected from
direct userspace access by virtue of being a device, but guest code (even
kernel mode) is userspace as far as the host's security model goes, so by
implementing this hypercall in a way that directly accesses hwrnd you
expose hwrnd to userspace unconditionally. Why is this a good idea?

--
			Gleb.

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Alexander Graf @ 2013-10-02  9:50 UTC
  To: Benjamin Herrenschmidt
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <668E4650-BC22-4CBF-A282-E7875DF29DB6@suse.de>


On 02.10.2013, at 11:11, Alexander Graf wrote:

>
> On 02.10.2013, at 11:06, Benjamin Herrenschmidt wrote:
>
>> On Wed, 2013-10-02 at 10:46 +0200, Paolo Bonzini wrote:
>>
>>>
>>> Thanks.  Any chance you can give some numbers of a kernel hypercall and
>>> a userspace hypercall on Power, so we have actual data?  For example a
>>> hypercall that returns H_PARAMETER as soon as possible.
>>
>> I don't have (yet) numbers at hand but we have basically 3 places where
>> we can handle hypercalls:
>>
>> - Kernel real mode. This is where most of our MMU stuff goes for
>> example unless it needs to trigger a page fault in Linux. This is
>> executed with translation disabled and the MMU still in guest context.
>> This is the fastest path since we don't take out the other threads nor
>> perform any expensive context change. This is where we put the
>> "accelerated" H_RANDOM as well.
>>
>> - Kernel virtual mode. That's a full exit, so all threads are out and
>> MMU switched back to host Linux. Things like vhost MMIO emulation goes
>> there, page faults, etc...
>>
>> - Qemu. This adds the round trip to userspace on top of the above.
>
> Right, and the difference for the patch in question is really whether we handle it in kernel virtual mode or in QEMU, so the bulk of the overhead (kicking threads out of guest context, switching MMU context, etc) happens either way.
>
> So the additional overhead when handling it in QEMU here really boils down to the user space roundtrip (plus another random number read roundtrip).

Ah, sorry, I misread the patch. You're running the handler in real mode of course :).

So how do you solve live migration between a kernel that has this patch and one that doesn't?


Alex

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Alexander Graf @ 2013-10-02  9:11 UTC
  To: Benjamin Herrenschmidt
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	herbert, Paul Mackerras, mpm, Paolo Bonzini
In-Reply-To: <1380704789.645.57.camel@pasglop>


On 02.10.2013, at 11:06, Benjamin Herrenschmidt wrote:

> On Wed, 2013-10-02 at 10:46 +0200, Paolo Bonzini wrote:
>
>>
>> Thanks.  Any chance you can give some numbers of a kernel hypercall and
>> a userspace hypercall on Power, so we have actual data?  For example a
>> hypercall that returns H_PARAMETER as soon as possible.
>
> I don't have (yet) numbers at hand but we have basically 3 places where
> we can handle hypercalls:
>
> - Kernel real mode. This is where most of our MMU stuff goes for
> example unless it needs to trigger a page fault in Linux. This is
> executed with translation disabled and the MMU still in guest context.
> This is the fastest path since we don't take out the other threads nor
> perform any expensive context change. This is where we put the
> "accelerated" H_RANDOM as well.
>
> - Kernel virtual mode. That's a full exit, so all threads are out and
> MMU switched back to host Linux. Things like vhost MMIO emulation goes
> there, page faults, etc...
>
> - Qemu. This adds the round trip to userspace on top of the above.

Right, and the difference for the patch in question is really whether we handle it in kernel virtual mode or in QEMU, so the bulk of the overhead (kicking threads out of guest context, switching MMU context, etc) happens either way.

So the additional overhead when handling it in QEMU here really boils down to the user space roundtrip (plus another random number read roundtrip).


Alex

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Benjamin Herrenschmidt @ 2013-10-02  9:06 UTC
  To: Paolo Bonzini
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	agraf, herbert, Paul Mackerras, mpm
In-Reply-To: <524BDD73.3020106@redhat.com>

On Wed, 2013-10-02 at 10:46 +0200, Paolo Bonzini wrote:

> 
> Thanks.  Any chance you can give some numbers of a kernel hypercall and
> a userspace hypercall on Power, so we have actual data?  For example a
> hypercall that returns H_PARAMETER as soon as possible.

I don't have (yet) numbers at hand but we have basically 3 places where
we can handle hypercalls:

 - Kernel real mode. This is where most of our MMU stuff goes for
example unless it needs to trigger a page fault in Linux. This is
executed with translation disabled and the MMU still in guest context.
This is the fastest path since we don't take out the other threads nor
perform any expensive context change. This is where we put the
"accelerated" H_RANDOM as well.

 - Kernel virtual mode. That's a full exit, so all threads are out and
MMU switched back to host Linux. Things like vhost MMIO emulation goes
there, page faults, etc...

 - Qemu. This adds the round trip to userspace on top of the above.

Cheers,
Ben.
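
The three handling places listed above can be pictured as a fall-through dispatch: each tier either completes the hypercall or punts it to the next, more expensive one. The sketch below is only an illustration of that ordering. The handler type, the tier enum and the use of an H_TOO_HARD-style marker value are assumptions made for the example, not a description of the actual KVM code paths.

```c
#include <assert.h>

#define H_SUCCESS   0L
#define H_TOO_HARD  9999L	/* assumed "punt to the next tier" marker */

enum hcall_tier { TIER_REAL_MODE, TIER_VIRTUAL_MODE, TIER_USERSPACE };

/* Hypothetical handler: returns a status, or H_TOO_HARD to punt upward. */
typedef long (*tier_handler)(unsigned long token);

/* Demo handlers: one always punts, one always completes the call. */
static long punt(unsigned long token)
{
	(void)token;
	return H_TOO_HARD;
}

static long succeed(unsigned long token)
{
	(void)token;
	return H_SUCCESS;
}

/*
 * Try the cheapest tier first. Real mode needs no full exit; virtual mode
 * implies a full exit to the host kernel; userspace adds a round trip on
 * top of that. *handled_by records which tier produced the result.
 */
static long dispatch_hcall(unsigned long token, tier_handler real_mode,
			   tier_handler virt_mode, tier_handler userspace,
			   enum hcall_tier *handled_by)
{
	long rc;

	assert(real_mode && virt_mode && userspace && handled_by);

	rc = real_mode(token);
	if (rc != H_TOO_HARD) {
		*handled_by = TIER_REAL_MODE;
		return rc;
	}

	rc = virt_mode(token);
	if (rc != H_TOO_HARD) {
		*handled_by = TIER_VIRTUAL_MODE;
		return rc;
	}

	*handled_by = TIER_USERSPACE;
	return userspace(token);
}
```

Calling dispatch_hcall(0x300, punt, punt, succeed, &tier) ends in the userspace tier, which corresponds to the most expensive path in the list above.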

* Re: linux-next: build failure after merge of the akpm tree
From: Frederic Weisbecker @ 2013-10-02  8:53 UTC
  To: Andrew Morton
  Cc: Stephen Rothwell, Greg KH, Hugh Dickins, linux-kernel,
	Sergei Trofimovich, linux-next, ppc-dev, Timur Tabi
In-Reply-To: <20130925144328.c679dc74178e78e188386b5a@linux-foundation.org>

On Wed, Sep 25, 2013 at 02:43:28PM -0700, Andrew Morton wrote:
> On Wed, 25 Sep 2013 14:32:14 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> 
> > On Wed, 25 Sep 2013, Andrew Morton wrote:
> > > On Wed, 25 Sep 2013 11:06:43 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > > > Hi Andrew,
> > > > 
> > > > After merging the akpm tree, linux-next builds (powerpc allmodconfig)
> > > > fail like this:
> > > 
> > > I can't get powerpc to build at all at present:
> > > 
> > >   CHK     include/config/kernel.release
> > >   CHK     include/generated/uapi/linux/version.h
> > >   CHK     include/generated/utsrelease.h
> > >   CC      arch/powerpc/kernel/asm-offsets.s
> > > In file included from include/linux/vtime.h:6,
> > >                  from include/linux/hardirq.h:7,
> > >                  from include/linux/memcontrol.h:24,
> > >                  from include/linux/swap.h:8,
> > >                  from include/linux/suspend.h:4,
> > >                  from arch/powerpc/kernel/asm-offsets.c:24:
> > > arch/powerpc/include/generated/asm/vtime.h:1:31: error: asm-generic/vtime.h: No such file or directory
> > 
> > That caught me too: include/asm-generic/vtime.h is a patch-unfriendly
> > 0-length file in the git tree;
> 
> hm, this?
> 
> 
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: include/asm-generic/vtime.h: avoid zero-length file
> 
> patch(1) can't handle zero-length files - it appears to simply not create
> the file, so my powerpc build fails.
> 
> Put something in here to make life easier.
> 
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  include/asm-generic/vtime.h |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff -puN /dev/null include/asm-generic/vtime.h
> --- /dev/null
> +++ a/include/asm-generic/vtime.h
> @@ -0,0 +1 @@
> +/* no content, but patch(1) dislikes empty files */
> _
> 
> 
> 
> > I wonder what use it's supposed to have.
> 
> Frederic, can you please confirm that include/asm-generic/vtime.h is
> supposed to be empty?

Yep. I use <asm/vtime.h> to let archs override some CPP symbols. If they
don't override these, they simply get the generic vtime.h file, which is empty
and as such doesn't override anything.

Maybe that's an ugly way to handle this kind of override scenario, but I
couldn't find a better mechanism.

Actually, a Kconfig symbol would do the trick. It just seemed like
overkill at the time, but it may be better.

Thanks.

> 
> > (And I'm not very keen on the growing trend for symlinks in the git tree.)
> 
> ooh, that explains why I lost my arch/microblaze/boot/dts/system.dts.
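
The override mechanism Frederic describes (an arch header that may define CPP symbols, with an empty generic header as the fallback) follows a common preprocessor pattern. A minimal single-file sketch is shown below. All symbol and function names are hypothetical, and the two "headers" are represented by commented sections rather than real include files.

```c
#include <assert.h>

/* --- what an overriding arch's <asm/vtime.h> might contain --- */
#ifdef SKETCH_BUILD_FOR_OVERRIDING_ARCH
#define ARCH_HAS_VTIME_HOOK_SKETCH
static int vtime_hook_sketch(void)
{
	return 1;	/* arch-specific behaviour */
}
#endif
/* --- the empty generic header contributes nothing at this point --- */

/* --- common header: supply the default unless the arch overrode it --- */
#ifndef ARCH_HAS_VTIME_HOOK_SKETCH
static int vtime_hook_sketch(void)
{
	return 0;	/* generic default */
}
#endif
```

Built without SKETCH_BUILD_FOR_OVERRIDING_ARCH, the first section expands to nothing, just like the empty generic header, and the default definition is used. The Kconfig alternative mentioned above would replace the header-defined symbol with a CONFIG_ option tested in the common header.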

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Paolo Bonzini @ 2013-10-02  8:46 UTC
  To: Paul Mackerras
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	agraf, herbert, mpm
In-Reply-To: <20131002050940.GA25363@drongo>

On 02/10/2013 07:09, Paul Mackerras wrote:
> On Tue, Oct 01, 2013 at 01:19:06PM +0200, Paolo Bonzini wrote:
> 
>> Anyhow, I would like to know more about this hwrng and hypercall.
>>
>> Does the hwrng return random numbers (like rdrand) or real entropy (like
>> rdseed that Intel will add in Broadwell)?  What about the hypercall?
> 
> Well, firstly, your terminology is inaccurate.  Real entropy will give
> you random numbers.  I think when you say "random numbers" you
> actually mean "pseudo-random numbers".

Yes---I meant pseudo-random numbers where the generator is periodically
seeded by a random number.

> Secondly, the RNG produces real entropy.

Good to know, thanks.

> Not sure why they are particularly "precious"; we get 64 bits per
> microsecond whether we use them or not.  What are you suggesting
> arch_get_random_long() should do instead?

If you are running rngd, there is no need to have arch_get_random_long()
at all.

>> 3) If the hypercall returns random numbers, then it is a pretty
>> braindead interface since returning 8 bytes at a time limits the
>> throughput to a handful of MB/s (compare to 200 MB/sec for x86 rdrand).
>>  But more important: in this case drivers/char/hw_random/pseries-rng.c
>> is completely broken and insecure, just like patch 2 in case (1) above.
> 
> Assuming that by "random numbers" you actually mean "pseudo-random
> numbers", then this doesn't apply.

Indeed.

>> 4) If the hypercall returns entropy (same as virtio-rng), the same
>> considerations on speed apply.  If you can only produce entropy at say 1
>> MB/s (so reading 8 bytes take 8 microseconds---which is actually very
>> fast), it doesn't matter that much to spend 7 microseconds on a
>> userspace roundtrip.  It's going to be only half the speed of bare
>> metal, not 100 times slower.
> 
> 8 bytes takes at most 1 microsecond, so the round-trip to userspace is
> definitely noticeable.

Thanks.  Any chance you can give some numbers of a kernel hypercall and
a userspace hypercall on Power, so we have actual data?  For example a
hypercall that returns H_PARAMETER as soon as possible.

Paolo
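
The disagreement above comes down to how large the userspace round trip is relative to the cost of producing 8 bytes. A small worked calculation, using only the figures quoted in this message (8 microseconds versus at most 1 microsecond per 8 bytes, and a 7 microsecond round trip, which is Paolo's estimate rather than a Power measurement); the function name is hypothetical:

```c
#include <assert.h>

/*
 * Slowdown factor of a call that costs `base_us` on its own once an extra
 * `roundtrip_us` is added to every call.
 */
static double slowdown_factor(double base_us, double roundtrip_us)
{
	assert(base_us > 0.0 && roundtrip_us >= 0.0);
	return (base_us + roundtrip_us) / base_us;
}
```

With an 8 microsecond base cost, slowdown_factor(8.0, 7.0) is 1.875, which is the "half the speed" case. With the 1 microsecond upper bound Paul gives, slowdown_factor(1.0, 7.0) is 8.0, which is why the round trip becomes noticeable.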

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Paolo Bonzini @ 2013-10-02  8:38 UTC
  To: Benjamin Herrenschmidt
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	agraf, herbert, Paul Mackerras, mpm
In-Reply-To: <1380663871.645.44.camel@pasglop>

On 01/10/2013 23:44, Benjamin Herrenschmidt wrote:
> On Tue, 2013-10-01 at 13:19 +0200, Paolo Bonzini wrote:
>> On 01/10/2013 11:38, Benjamin Herrenschmidt wrote:
>>> So for the sake of that dogma you are going to make us do something that
>>> is about 100 times slower ? (and possibly involves more lines of code)
>>
>> If it's 100 times slower there is something else that's wrong.  It's
>> most likely not 100 times slower, and this makes me wonder if you or
>> Michael actually timed the code at all.
> 
> So no we haven't measured. But it is going to be VERY VERY VERY much
> slower. Our exit latencies are bad with our current MMU *and* any exit
> is going to cause all secondary threads on the core to have to exit as
> well (remember P7 is 4 threads, P8 is 8)

Ok, this is indeed the main difference between Power and x86.

>>   100 cycles            bare metal rdrand
>>   2000 cycles           guest->hypervisor->guest
>>   15000 cycles          guest->userspace->guest
>>
>> (100 cycles = 40 ns = 200 MB/sec; 2000 cycles = ~1 microseconds; 15000
>> cycles = ~7.5 microseconds).  Even on 5 year old hardware, a userspace
>> roundtrip is around a dozen microseconds.
> 
> So in your case going to qemu to "emulate" rdrand would indeed be 150
> times slower, I don't see in what universe that would be considered a
> good idea.

rdrand is not privileged on x86, guests can use it.  But my point is
that going to the kernel is already 20 times slower.  Getting entropy
(not just a pseudo-random number seeded by the HWRNG) with rdrand is
~1000 times slower according to Intel's recommendations, so the
roundtrip to userspace is entirely invisible in that case.

The numbers for PPC seem to be a bit different though (it's faster to
read entropy, and slower to do a userspace exit).

> It's a random number obtained from sampling a set of oscillators. It's
> slightly biased but we have very simple code (I believe shared with the
> host kernel implementation) for whitening it as is required by PAPR.

Good.  Actually, passing the dieharder tests does not mean much (an
AES-encrypted counter should also pass them with flying colors), but
if it's specified by the architecture gods it's likely to have received
some scrutiny.

>> 2) If the hwrng returns entropy, a read from the hwrng is going to be even
>> more expensive than an x86 rdrand (perhaps ~2000 cycles).
> 
> Depends how often you read, the HW I think is sampling asynchronously so
> you only block on the MMIO if you already consumed the previous sample
> but I'll let Paulus provide more details here.

Given Paul's description, there's indeed very little extra cost compared
to a "nop" hypercall.  That's nice.

Still, considering that QEMU code has to be there anyway for
compatibility, kernel emulation is not particularly necessary IMHO.  I
would of course like to see actual performance numbers, but besides that
are you ever going to see this in the profile except if you run "dd
if=/dev/hwrng of=/dev/null"?

Can you instrument pHyp to find out how many times per second this
hypercall is called by a "normal" Linux or AIX guest?

>> 3) If the hypercall returns random numbers, then it is a pretty
>> braindead interface since returning 8 bytes at a time limits the
>> throughput to a handful of MB/s (compare to 200 MB/sec for x86 rdrand).
>>  But more important: in this case drivers/char/hw_random/pseries-rng.c
>> is completely broken and insecure, just like patch 2 in case (1) above.
> 
> How so ?

Paul confirmed that it returns real entropy so this is moot.

Paolo

^ permalink raw reply

* [PATCH] powerpc/perf: Fix handling of FAB events
From: Michael Ellerman @ 2013-10-02  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: soonair3

Commit 4df4899 "Add power8 EBB support" included a bug in the handling
of the FAB_CRESP_MATCH and FAB_TYPE_MATCH fields.

These values are pulled out of the event code using EVENT_THR_CTL_SHIFT,
however we were then or'ing that value directly into MMCR1.

This meant we were failing to set the FAB fields correctly, and also
potentially corrupting the value for PMC4SEL, leading to no counts for
the FAB events and incorrect counts for PMC4.

The fix is simply to shift left the FAB value correctly before or'ing it
with MMCR1.

Reported-by: Sooraj Ravindran Nair <soonair3@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.10+
---
 arch/powerpc/perf/power8-pmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Ben for 3.13 please.

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 2ee4a70..a3f7abd 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -199,6 +199,7 @@
 #define MMCR1_UNIT_SHIFT(pmc)		(60 - (4 * ((pmc) - 1)))
 #define MMCR1_COMBINE_SHIFT(pmc)	(35 - ((pmc) - 1))
 #define MMCR1_PMCSEL_SHIFT(pmc)		(24 - (((pmc) - 1)) * 8)
+#define MMCR1_FAB_SHIFT			36
 #define MMCR1_DC_QUAL_SHIFT		47
 #define MMCR1_IC_QUAL_SHIFT		46
 
@@ -388,8 +389,8 @@ static int power8_compute_mmcr(u64 event[], int n_ev,
 		 * the threshold bits are used for the match value.
 		 */
 		if (event_is_fab_match(event[i])) {
-			mmcr1 |= (event[i] >> EVENT_THR_CTL_SHIFT) &
-				  EVENT_THR_CTL_MASK;
+			mmcr1 |= ((event[i] >> EVENT_THR_CTL_SHIFT) &
+				  EVENT_THR_CTL_MASK) << MMCR1_FAB_SHIFT;
 		} else {
 			val = (event[i] >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK;
 			mmcra |= val << MMCRA_THR_CTL_SHIFT;
-- 
1.8.1.2
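
The effect of the missing shift can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code itself: MMCR1_FAB_SHIFT and MMCR1_PMCSEL_SHIFT() are copied from the hunk above, while the values of EVENT_THR_CTL_SHIFT and EVENT_THR_CTL_MASK are assumptions (they are not shown in this patch), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed values: these two are not shown in the patch. */
#define EVENT_THR_CTL_SHIFT	32
#define EVENT_THR_CTL_MASK	0xffull

/* Copied from the patch context. */
#define MMCR1_FAB_SHIFT		36
#define MMCR1_PMCSEL_SHIFT(pmc)	(24 - (((pmc) - 1)) * 8)

/* Hypothetical helper: pull the FAB match value out of the event code. */
static uint64_t fab_field(uint64_t event)
{
	return (event >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK;
}

/* Pre-patch behaviour: the value is OR'ed in unshifted, so it lands in
 * bits 0-7, which is where MMCR1_PMCSEL_SHIFT(4) places PMC4SEL. */
static uint64_t mmcr1_fab_old(uint64_t mmcr1, uint64_t event)
{
	return mmcr1 | fab_field(event);
}

/* Post-patch behaviour: the value is moved up to the FAB field first. */
static uint64_t mmcr1_fab_fixed(uint64_t mmcr1, uint64_t event)
{
	return mmcr1 | (fab_field(event) << MMCR1_FAB_SHIFT);
}
```

With an event carrying 0xA5 in the match field, mmcr1_fab_old() leaves 0xA5 in the low byte of MMCR1 and nothing in the FAB field, whereas mmcr1_fab_fixed() leaves the low byte untouched.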

^ permalink raw reply related

* Re: [PATCH net-next] net:drivers/net: Miscellaneous conversions to ETH_ALEN
From: Arend van Spriel @ 2013-10-02  7:27 UTC (permalink / raw)
  To: Joe Perches, netdev
  Cc: bridge, e1000-devel, brcm80211-dev-list, linux-usb,
	linux-wireless, linux-kernel, ath10k, wil6210, netfilter-devel,
	b43-dev, linuxppc-dev
In-Reply-To: <1380679480.2081.24.camel@joe-AO722>

On 10/02/2013 04:04 AM, Joe Perches wrote:
> Convert the memset/memcpy uses of 6 to ETH_ALEN
> where appropriate.
>
> Also convert some struct definitions and u8 array
> declarations of [6] to ETH_ALEN.
>
For brcmsmac

Acked-by: Arend van Spriel <arend@broadcom.com>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>   drivers/net/wireless/brcm80211/brcmsmac/main.c     |  6 +-

^ permalink raw reply

* Re: [PATCH v2 2/6] PCI/MSI: Factor out pci_get_msi_cap() interface
From: Alexander Gordeev @ 2013-10-02  7:26 UTC (permalink / raw)
  To: Mark Lord
  Cc: linuxppc-dev, Joerg Roedel, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
	Jan Beulich, linux-pci@vger.kernel.org, Tejun Heo, Bjorn Helgaas,
	Ingo Molnar
In-Reply-To: <524B8908.8080607@start.ca>

On Tue, Oct 01, 2013 at 10:46:32PM -0400, Mark Lord wrote:
> >>> The last pattern makes the most sense to me and could be updated with a
> >>> clearer sequence - a call to (a slightly modified) pci_msix_table_size() followed
> >>> by a call to pci_enable_msix(). I think this pattern can effectively
> >>> supersede the currently recommended "loop" practice.
> >>
> >> The loop is still necessary, because there's a race between those two calls,
> >> so that pci_enable_msix() can still fail due to lack of MSIX slots.
> > 
> > Moreover, the existing loop pattern is racy and could fail just as easily ;)
> 
> Yes, but it then loops again to correct things.

No. If it failed it should exit the loop.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com
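
For readers following the argument, the "loop" pattern under discussion can be sketched roughly as below. This is only an illustration under stated assumptions: struct fake_dev and fake_enable_msix() are hypothetical stand-ins that mimic the return convention described for pci_enable_msix() at the time (0 on success, a negative errno on error, a positive count when fewer vectors are available than requested). It is not the PCI core code.

```c
#include <errno.h>

/* Hypothetical device model, for illustration only. */
struct fake_dev {
	int available;	/* vectors the platform can currently grant */
	int err;	/* non-zero: every call fails with -err */
	int attempts;	/* number of enable calls made so far */
};

/* Stand-in for the enable call: 0 = success, <0 = error,
 * >0 = number of vectors that could have been allocated. */
static int fake_enable_msix(struct fake_dev *dev, int nvec)
{
	dev->attempts++;
	if (dev->err)
		return -dev->err;
	if (dev->available == 0)
		return -ENOSPC;
	if (nvec <= dev->available)
		return 0;
	return dev->available;
}

/* Retry with the hinted count on a positive return; stop on any error. */
static int enable_msix_with_retry(struct fake_dev *dev, int nvec, int minvec)
{
	while (nvec >= minvec) {
		int rc = fake_enable_msix(dev, nvec);

		if (rc == 0)
			return nvec;	/* got everything we asked for */
		if (rc < 0)
			return rc;	/* hard failure: leave the loop */
		nvec = rc;		/* positive: retry with fewer */
	}
	return -ENOSPC;
}
```

In this model a positive return drives another pass through the loop, while a negative errno ends it after a single attempt, which is the distinction being argued about above.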

^ permalink raw reply

* [PATCH 2/2] powerpc/tm: Turn interrupts hard off in tm_reclaim()
From: Michael Neuling @ 2013-10-02  7:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev
In-Reply-To: <1380698115-25841-1-git-send-email-mikey@neuling.org>

We can't take IRQs in tm_reclaim as we might have a bogus r13 and r1.

This turns IRQs hard off in this function.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/kernel/tm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 7b60b98..8ece190 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -123,6 +123,7 @@ _GLOBAL(tm_reclaim)
 	mr	r15, r14
 	ori	r15, r15, MSR_FP
 	li	r16, MSR_RI
+	ori	r16, r16, MSR_EE /* IRQs hard off */
 	andc	r15, r15, r16
 	oris	r15, r15, MSR_VEC@h
 #ifdef CONFIG_VSX
-- 
1.8.1.2
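
The register arithmetic in the hunk above can be mirrored in plain C to see which MSR bits change. This is a sketch that models only the instructions visible in the hunk; what tm_reclaim does with r15 afterwards is outside the quoted context. The bit positions are assumptions based on the usual 64-bit Book3S MSR layout, and tm_reclaim_msr() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed MSR bit positions (usual Book3S layout). */
#define MSR_RI	(1ull << 1)	/* recoverable interrupt */
#define MSR_FP	(1ull << 13)	/* floating point available */
#define MSR_EE	(1ull << 15)	/* external interrupt enable */
#define MSR_VEC	(1ull << 25)	/* vector (Altivec) available */

/* Mirrors the visible instructions, with r14 holding the current MSR. */
static uint64_t tm_reclaim_msr(uint64_t r14)
{
	uint64_t r15, r16;

	r15 = r14;		/* mr   r15, r14 */
	r15 |= MSR_FP;		/* ori  r15, r15, MSR_FP */
	r16 = MSR_RI;		/* li   r16, MSR_RI */
	r16 |= MSR_EE;		/* ori  r16, r16, MSR_EE (added by this patch) */
	r15 &= ~r16;		/* andc r15, r15, r16 */
	r15 |= MSR_VEC;		/* oris r15, r15, MSR_VEC@h */
	return r15;
}
```

Before the patch only MSR_RI was in the mask, so MSR_EE passed through unchanged; with the extra ori both bits are cleared in the computed value.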

^ permalink raw reply related

* [PATCH 1/2] powerpc/tm: Remove interrupt disable in __switch_to()
From: Michael Neuling @ 2013-10-02  7:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev

We currently turn IRQs off in __switch_to() but this is unnecessary as it's
already disabled in the caller.

This removes the IRQ disable but adds a check to make sure it is really off
in case this changes in future.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/kernel/process.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 96d2fdf..384c27e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -596,12 +596,13 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	struct task_struct *new)
 {
 	struct thread_struct *new_thread, *old_thread;
-	unsigned long flags;
 	struct task_struct *last;
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct ppc64_tlb_batch *batch;
 #endif
 
+	WARN_ON(!irqs_disabled());
+
 	/* Back up the TAR across context switches.
 	 * Note that the TAR is not available for use in the kernel.  (To
 	 * provide this, the TAR should be backed up/restored on exception
@@ -721,8 +722,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	}
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-	local_irq_save(flags);
-
 	/*
 	 * We can't take a PMU exception inside _switch() since there is a
 	 * window where the kernel stack SLB and the kernel stack are out
@@ -742,8 +741,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	}
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-	local_irq_restore(flags);
-
 	return last;
 }
 
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH] Revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"
From: Sebastian Andrzej Siewior @ 2013-10-02  7:12 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linuxppc-dev, Anatolij Gustschin, linux-rt-users
In-Reply-To: <20131001190344.GA3006@katana>

On 10/01/2013 09:03 PM, Wolfram Sang wrote:
> 
> Yup. But I just remembered a better solution:
> 
> From: Wolfram Sang <wsa@the-dreams.de>
> Subject: [PATCH] ppc: mpc52xx: silence false positive from old GCC
> 
> So people can compile with -Werror.
> 
> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
> ---
>  arch/powerpc/platforms/52xx/mpc52xx_pic.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> index b89ef65..2898b73 100644
> --- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> +++ b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> @@ -340,7 +340,7 @@ static int mpc52xx_irqhost_map(struct irq_domain *h, unsigned int virq,
>  {
>  	int l1irq;
>  	int l2irq;
> -	struct irq_chip *irqchip;
> +	struct irq_chip *uninitialized_var(irqchip);
>  	void *hndlr;
>  	int type;
>  	u32 reg;
> 
> 
> uninitialized_var was created for exactly that purpose IIRC.

Yup, looks good, thanks.

> 
> Thanks,
> 
> Wolfram
> 

Sebastian

^ permalink raw reply

* Re: [PATCH v2 2/6] PCI/MSI: Factor out pci_get_msi_cap() interface
From: Alexander Gordeev @ 2013-10-02  7:10 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Joerg Roedel, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
	Jan Beulich, linux-pci@vger.kernel.org, Tejun Heo, Bjorn Helgaas,
	Ingo Molnar
In-Reply-To: <20131002024324.GC22748@concordia>

On Wed, Oct 02, 2013 at 12:43:24PM +1000, Michael Ellerman wrote:
> On Tue, Oct 01, 2013 at 12:35:27PM +0200, Alexander Gordeev wrote:
> > On Tue, Oct 01, 2013 at 05:51:33PM +1000, Michael Ellerman wrote:
> > > The disadvantage is that any restriction imposed on us above the quota
> > > can only be reported as an error from pci_enable_msix().
> > > 
> > > The quota code, called from pci_get_msix_limit(), can only do so much to
> > > interrogate firmware about the limitations. The ultimate way to check if
> > > firmware will give us enough MSIs is to try and allocate them. But we
> > > can't do that from pci_get_msix_limit() because the driver is not asking
> > > us to enable MSIs, just query them.
> > 
> > If things are this way then pci_enable_msix() is already exposed to this
> > problem internally on pSeries.
> > 
> > I see that even successful quota checks in rtas_msi_check_device() and
> > rtas_setup_msi_irqs() do not guarantee (as you say) that firmware will
> > give enough MSIs. Hence, pci_enable_msix() might fail even though
> > its quota checks succeeded.
> 
> Yes, but it can report that failure to the caller, which can then retry.

If a driver wants to retry after a failure it is up to the driver (but why?).
The current guidelines state:

"If this function returns a negative number, it indicates an error and
the driver should not attempt to allocate any more MSI-X interrupts for
this device."

Anyway, what number could the driver retry with after it got a negative errno?

> > Therefore, nothing will really change if we make pci_get_msix_limit() check
> > quota and hope the follow-up call to pci_enable_msix() succeeded.
> 
> No that's not equivalent. Under your scheme if pci_enable_msix() fails
> then the caller just bails, it will never try again with a lower number.

Currently under the very same circumstances (the quota check within
rtas_setup_msi_irqs() returned Q vectors while the firmware has only F
vectors to allocate and Q > F) rtas_setup_msi_irqs() fails, pci_enable_msix()
fails, the caller bails and never tries again with a lower number.

Am I missing something here?

> cheers

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
From: Paul Mackerras @ 2013-10-02  5:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: tytso, kvm, Gleb Natapov, linuxppc-dev, linux-kernel, kvm-ppc,
	agraf, herbert, mpm
In-Reply-To: <524AAFAA.3010801@redhat.com>

On Tue, Oct 01, 2013 at 01:19:06PM +0200, Paolo Bonzini wrote:

> Anyhow, I would like to know more about this hwrng and hypercall.
> 
> Does the hwrng return random numbers (like rdrand) or real entropy (like
> rdseed that Intel will add in Broadwell)?  What about the hypercall?

Well, firstly, your terminology is inaccurate.  Real entropy will give
you random numbers.  I think when you say "random numbers" you
actually mean "pseudo-random numbers".

Secondly, the RNG produces real entropy.  The way it works is that
there are 64 ring oscillators running at different frequencies (above
1GHz).  They get sampled at (typically) 1MHz and the samples get put
in a 64-entry FIFO, which is read by MMIO.  There is practically no
correlation between bits or between adjacent samples.  The only
deficiency is that the distribution of each bit is not always
precisely 50% zero / 50% one (it is somewhere between 40/60 and
60/40).

The whitening addresses this bias.  Looking at the stream of values
for a given bit, we XOR that stream with another stream that is
uncorrelated and has a 50/50 distribution (or very very close to
that), which gives a stream whose distribution is closer to 50/50 than
either input stream.  The second stream is effectively derived by
XORing together all 64 bits of some previous sample.  XORing together
many uncorrelated streams that are each close to 50/50 distribution
gives a stream that is much closer to a 50/50 distribution (by the
"piling up lemma").  The result passes all the dieharder tests.

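As a rough check on the whitening argument above, the piling-up lemma can be evaluated numerically. The sketch below assumes the streams are independent and writes each bit's bias as eps, with P(bit = 0) = 1/2 + eps; under that assumption the XOR of n bits has bias 2^(n-1) times the product of the individual biases. The function name xor_bias() is hypothetical.

```c
/* Bias of the XOR of n independent bits, where eps[i] is the bias of
 * bit i (P(bit = 0) = 0.5 + eps[i]). Piling-up lemma:
 *   bias = 2^(n-1) * eps[0] * ... * eps[n-1]
 * computed here as 0.5 * product of (2 * eps[i]). */
static double xor_bias(const double *eps, int n)
{
	double bias = 0.5;
	int i;

	for (i = 0; i < n; i++)
		bias *= 2.0 * eps[i];
	return bias;
}
```

For two bits at the worst case quoted above (60/40, eps = 0.1) this gives 0.02, i.e. a 52/48 split. For 64 such bits the product is vanishingly small, which is consistent with the second stream being described as very close to 50/50.
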
> For example virtio-rng is specified to return actual entropy, it doesn't
> matter if it is from hardware or software.
> 
> In either case, the patches have problems.
> 
> 1) If the hwrng returns random numbers, the whitening you're doing is
> totally insufficient and patch 2 is forging entropy that doesn't exist.
> 
> 2) If the hwrng returns entropy, a read from the hwrng is going to be even
> more expensive than an x86 rdrand (perhaps ~2000 cycles).  Hence, doing

The MMIO itself is reasonably quick if the FIFO is not empty, but the
long-term overall rate is limited by the sampling rate.

> the emulation in the kernel is even less necessary.  Also, if the hwrng
> returns entropy patch 1 is unnecessary: you do not need to waste
> precious entropy bits by passing them to arch_get_random_long; just run
> rngd in the host as that will put the entropy to much better use.

Not sure why they are particularly "precious"; we get 64 bits per
microsecond whether we use them or not.  What are you suggesting
arch_get_random_long() should do instead?

> 3) If the hypercall returns random numbers, then it is a pretty
> braindead interface since returning 8 bytes at a time limits the
> throughput to a handful of MB/s (compare to 200 MB/sec for x86 rdrand).
>  But more important: in this case drivers/char/hw_random/pseries-rng.c
> is completely broken and insecure, just like patch 2 in case (1) above.

Assuming that by "random numbers" you actually mean "pseudo-random
numbers", then this doesn't apply.

> 4) If the hypercall returns entropy (same as virtio-rng), the same
> considerations on speed apply.  If you can only produce entropy at say 1
> MB/s (so reading 8 bytes takes 8 microseconds---which is actually very
> fast), it doesn't matter that much to spend 7 microseconds on a
> userspace roundtrip.  It's going to be only half the speed of bare
> metal, not 100 times slower.

8 bytes takes at most 1 microsecond, so the round-trip to userspace is
definitely noticeable.

Paul.
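
To put numbers on the disagreement, the throughput arithmetic can be sketched as follows. The assumption here, which is a simplification, is that the time to obtain a sample and the exit round-trip simply add up per 8-byte read; the 7 microsecond round-trip is the illustrative figure from the quoted text, not a measurement on Power. bytes_per_second() is a hypothetical helper.

```c
#include <stdint.h>

/* Sustained rate when each read of `bytes` costs source_ns to obtain
 * the sample plus roundtrip_ns of exit overhead, assumed serial. */
static uint64_t bytes_per_second(uint64_t bytes, uint64_t source_ns,
				 uint64_t roundtrip_ns)
{
	return bytes * 1000000000ull / (source_ns + roundtrip_ns);
}
```

With 8 bytes per microsecond and no overhead this gives 8,000,000 bytes/s; adding a 7 microsecond round-trip drops it to 1,000,000 bytes/s, an eightfold reduction rather than the factor of two estimated under the 1 MB/s source assumption.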

^ permalink raw reply
