Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems

From: Paolo Bonzini <pbonzini@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Gleb Natapov <gleb@redhat.com>,
	Michael Ellerman <michael@ellerman.id.au>,
	linux-kernel@vger.kernel.org, Paul Mackerras <paulus@samba.org>,
	agraf@suse.de, mpm@selenic.com, herbert@gondor.hengli.com.au,
	linuxppc-dev@ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org, tytso@mit.edu
Subject: Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
Date: Wed, 02 Oct 2013 08:38:21 +0000
Message-ID: <524BDB7D.8000708@redhat.com>
In-Reply-To: <1380663871.645.44.camel@pasglop>

On 01/10/2013 23:44, Benjamin Herrenschmidt wrote:
> On Tue, 2013-10-01 at 13:19 +0200, Paolo Bonzini wrote:
>> On 01/10/2013 11:38, Benjamin Herrenschmidt wrote:
>>> So for the sake of that dogma you are going to make us do something that
>>> is about 100 times slower ? (and possibly involves more lines of code)
>>
>> If it's 100 times slower there is something else that's wrong.  It's
>> most likely not 100 times slower, and this makes me wonder if you or
>> Michael actually timed the code at all.
> 
> So no we haven't measured. But it is going to be VERY VERY VERY much
> slower. Our exit latencies are bad with our current MMU *and* any exit
> is going to cause all secondary threads on the core to have to exit as
> well (remember P7 is 4 threads, P8 is 8)

Ok, this is indeed the main difference between Power and x86.

>>   100 cycles            bare metal rdrand
>>   2000 cycles           guest->hypervisor->guest
>>   15000 cycles          guest->userspace->guest
>>
>> (100 cycles = 40 ns = 200 MB/sec; 2000 cycles = ~1 microseconds; 15000
>> cycles = ~7.5 microseconds).  Even on 5 year old hardware, a userspace
>> roundtrip is around a dozen microseconds.
> 
> So in your case going to qemu to "emulate" rdrand would indeed be 150
> times slower, I don't see in what universe that would be considered a
> good idea.

rdrand is not privileged on x86, guests can use it.  But my point is
that going to the kernel is already 20 times slower.  Getting entropy
(not just a pseudo-random number seeded by the HWRNG) with rdrand is
~1000 times slower according to Intel's recommendations, so the
roundtrip to userspace is entirely invisible in that case.
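
As a rough check of the ratios above, here is a minimal sketch that turns the quoted cycle counts into latency, throughput and slowdown. The 2.5 GHz clock is an assumption inferred from "100 cycles = 40 ns", and 8 bytes per read is an assumption taken from the 8-bytes-at-a-time figure quoted elsewhere in this thread; the function names are purely illustrative. The quoted "~7.5 microseconds" for 15000 cycles suggests a somewhat lower clock, so treat the results as order-of-magnitude figures only.

```python
# Assumption: 2.5 GHz, inferred from "100 cycles = 40 ns" in the quoted table.
CLOCK_HZ = 2.5e9
# Assumption: each read returns one 64-bit value (8 bytes).
BYTES_PER_READ = 8

# Cycle counts as quoted in the thread (x86 figures).
COSTS = {
    "bare metal rdrand": 100,
    "guest->hypervisor->guest": 2000,
    "guest->userspace->guest": 15000,
}


def latency_ns(cycles, clock_hz=CLOCK_HZ):
    """Time for one read, in nanoseconds."""
    return cycles / clock_hz * 1e9


def throughput_mb_s(cycles, clock_hz=CLOCK_HZ, nbytes=BYTES_PER_READ):
    """Best-case throughput in MB/s if every read costs `cycles`."""
    return nbytes * clock_hz / cycles / 1e6


def slowdown(cycles, baseline=COSTS["bare metal rdrand"]):
    """How many times slower a path is than bare metal rdrand."""
    return cycles / baseline


if __name__ == "__main__":
    for name, cycles in COSTS.items():
        print(f"{name:26s} {latency_ns(cycles):8.0f} ns "
              f"{throughput_mb_s(cycles):7.1f} MB/s "
              f"{slowdown(cycles):5.0f}x")
```

Under these assumptions the kernel path comes out 20 times slower than bare metal and the userspace path 150 times slower, which matches the figures used in this exchange.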

The numbers for PPC seem to be a bit different though (it's faster to
read entropy, and slower to do a userspace exit).

> It's a random number obtained from sampling a set of oscillators. It's
> slightly biased but we have very simple code (I believe shared with the
> host kernel implementation) for whitening it as is required by PAPR.

Good.  Actually, passing the dieharder tests does not mean much (an
AES-encrypted counter should also pass them with flying colors), but
if it's specified by the architecture gods it's likely to have received
some scrutiny.

>> 2) If the hwrng returns entropy, a read from the hwrng is going to be even
>> more expensive than an x86 rdrand (perhaps ~2000 cycles).
> 
> Depends how often you read, the HW I think is sampling asynchronously so
> you only block on the MMIO if you already consumed the previous sample
> but I'll let Paulus provide more details here.

Given Paul's description, there's indeed very little extra cost compared
to a "nop" hypercall.  That's nice.

Still, considering that QEMU code has to be there anyway for
compatibility, kernel emulation is not particularly necessary IMHO.  I
would of course like to see actual performance numbers, but besides that
are you ever going to see this in the profile except if you run "dd
if=/dev/hwrng of=/dev/null"?

Can you instrument pHyp to find out how many times per second this
hypercall is called by a "normal" Linux or AIX guest?

>> 3) If the hypercall returns random numbers, then it is a pretty
>> braindead interface since returning 8 bytes at a time limits the
>> throughput to a handful of MB/s (compare to 200 MB/sec for x86 rdrand).
>>  But more important: in this case drivers/char/hw_random/pseries-rng.c
>> is completely broken and insecure, just like patch 2 in case (1) above.
> 
> How so ?

Paul confirmed that it returns real entropy so this is moot.

Paolo
