qemu-devel.nongnu.org archive mirror
From: Greg Kurz <groug@kaod.org>
To: qemu-devel@nongnu.org
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	qemu-ppc@nongnu.org, "Cédric Le Goater" <clg@kaod.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] spapr: fix regression with older machine types
Date: Thu, 28 Jun 2018 21:48:25 +0200
Message-ID: <20180628214618.09123598@bahia.lan>
In-Reply-To: <153018086531.336571.17029459443980070626.stgit@bahia.lan>

On Thu, 28 Jun 2018 12:14:25 +0200
Greg Kurz <groug@kaod.org> wrote:

> Since the recent cleanups to hide host configuration details from guests,
> it isn't possible to start an older machine type with HV KVM [*]:
> 
> qemu-system-ppc64: KVM doesn't support for base page shift 34
> 
> This basically boils down to the fact that it isn't safe to call
> the kvmppc_hpt_needs_host_contiguous_pages() helper from a class
> init function because:
> - KVM isn't initialized yet, and kvm_enabled() always returns false
>   in this case. This causes kvmppc_hpt_needs_host_contiguous_pages()
>   to do nothing and we end up choosing a 16G default page size
>   which is not supported by KVM.
> - even if we drop kvm_enabled() we then have the issue that
>   kvmppc_hpt_needs_host_contiguous_pages() assumes CPUs are
>   created, which isn't the case either.
> 
> The choice was made to initialize capabilities during machine
> init before creating the CPUs, and I don't think we should
> revert to the previous behavior. Let's go forward instead and
> ensure we can retrieve the MMU information from KVM before
> CPUs are created.
> 
> To fix this, we first change kvm_get_smmu_info() so that it
> doesn't need a CPU object. This allows us to stop using first_cpu
> in kvmppc_hpt_needs_host_contiguous_pages(). Then we delay
> the setting of the default value to machine init time, so
> that we're sure that KVM is fully initialized.
> 
> As a bonus, the last patch is an attempt to detect such misuse
> of the *_enabled() accelerator helpers earlier.
> 
> Please comment.
> 
> [*] it also breaks PR KVM actually, but the error is different and
>     I need to dig some more.
> 

With current master:

1) qemu-system-ppc64 -machine pseries,accel=kvm,kvm-type=PR

The guest starts but its kernel oopses at some point:

[    0.011328] kernel tried to execute exec-protected page (c000000001611244) -exploit attempt? (uid: 0)
[    0.011379] Unable to handle kernel paging request for instruction fetch
[    0.011416] Faulting instruction address: 0xc000000001611244
[    0.011453] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.011482] LE SMP NR_CPUS=1024 NUMA pSeries
[    0.011512] Modules linked in:
[    0.011557] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.2-200.fc28.ppc64le #1
[    0.011600] NIP:  c000000001611244 LR: c00000000000acec CTR: 0000000000000000
[    0.011643] REGS: c00000003fffba90 TRAP: 0400   Not tainted  (4.17.2-200.fc28.ppc64le)
[    0.011694] MSR:  b000000010001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000848  XER: 20000000
[    0.011741] CFAR: 0000000000000000 SOFTE: 1 
[    0.011741] GPR00: 0000000000000000 c00000003fffbd10 c000000001570b00 c00000003fffbd80 
[    0.011741] GPR04: c000000000034418 0000000048000000 000000000000000a 000000004aa21de8 
[    0.011741] GPR08: 000000007d410164 0000000000000000 0000000000000002 0000000000000900 
[    0.011741] GPR12: b000000002009033 c000000001840000 c000000000071a2c 00000000495de1a4 
[    0.011741] GPR16: 0000000000000078 c00000000160fd10 c000000000e705e0 000000007c1b03a6 
[    0.011741] GPR20: 000000007c1ffaa6 c0000000016125b8 c0000000014253e8 000000007c1303a6 
[    0.011741] GPR24: 000000007c1643a6 000000007c1a03a6 c00000000160fd08 ffffffffebc0f008 
[    0.011741] GPR28: ffffffffebc0f000 c0000000000345d8 c0000000000345d8 0000000000000000 
[    0.012138] NIP [c000000001611244] kvm_tmp+0x1534/0x100000
[    0.012170] LR [c00000000000acec] soft_nmi_common+0xcc/0xd0
[    0.012199] Call Trace:
[    0.012214] Instruction dump:
[    0.012236] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.012289] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.012334] ---[ end trace d2ee28832d481d2d ]---
[    0.012362] 
[    1.012387] kernel tried to execute exec-protected page (c000000001611808) -exploit attempt? (uid: 0)
[    1.012433] Unable to handle kernel paging request for instruction fetch
[    1.012468] Faulting instruction address: 0xc000000001611808
[    1.012504] Oops: Kernel access of bad area, sig: 11 [#2]
[    1.012532] LE SMP NR_CPUS=1024 NUMA pSeries
[    1.012561] Modules linked in:
[    1.012583] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           4.17.2-200.fc28.ppc64le #1
[    1.012641] NIP:  c000000001611808 LR: c0000000001247fc CTR: c000000001840000
[    1.012684] REGS: c00000003fffb5d0 TRAP: 0400   Tainted: G      D            (4.17.2-200.fc28.ppc64le)
[    1.012740] MSR:  b000000010001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48000224  XER: 20000000
[    1.012785] CFAR: 0000000000000000 SOFTE: 0 
[    1.012785] GPR00: c0000000001247fc c00000003fffb850 c000000001570b00 0000000000000000 
[    1.012785] GPR04: 0000000000000000 c0000000fe9e4900 fffffffffffffffd c0000000fe9e4900 
[    1.012785] GPR08: 00000000fed50000 b000000000001033 0000000000000009 c00000003fffb55f 
[    1.012785] GPR12: 0000000000000000 c000000001840000 c000000000071a2c 00000000495de1a4 
[    1.012785] GPR16: 0000000000000078 c00000000160fd10 c000000000e705e0 000000007c1b03a6 
[    1.012785] GPR20: 000000007c1ffaa6 c0000000016125b8 c0000000014253e8 000000007c1303a6 
[    1.012785] GPR24: 000000007c1643a6 000000007c1a03a6 c00000000160fd08 ffffffffebc0f008 
[    1.012785] GPR28: 0000000000000000 000000000000000b 000000000000000b c0000000fe9e4900 
[    1.013166] NIP [c000000001611808] kvm_tmp+0x1af8/0x100000
[    1.013196] LR [c0000000001247fc] do_exit+0x12c/0xd30
[    1.013224] Call Trace:
[    1.013238] Instruction dump:
[    1.013260] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    1.013303] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    1.013348] ---[ end trace d2ee28832d481d2e ]---
[    1.013375] 
[    2.013391] Fixing recursive fault but reboot is needed!

and the guest becomes unresponsive.

2) qemu-system-ppc64 -machine pseries-2.12,accel=kvm,kvm-type=PR

prints an error message and terminates right away:

qemu-system-ppc64: KVM doesn't support page shift 24/12

This error is expected: since PR KVM doesn't set KVM_PPC_PAGE_SIZES_REAL,
we choose to support all possible page sizes, but PR KVM doesn't actually
support this page shift combination. Unsurprisingly, we get the
same error with:

-machine pseries,accel=kvm,kvm-type=PR,cap-hpt-max-page-size=${pagesize}

if ${pagesize} is >= 16m. This is the result of PR KVM not supporting
MPSS at all, even though it supports 16m pages in a 16m segment. We
cannot really fix this in QEMU, unless we completely filter out MPSS
in spapr_pagesize_cb() but I'm pretty sure we don't want that. :)
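
For readers less used to page shifts, here is a small sketch of how I read
the "24/12" in the error above. It assumes the first number is the actual
page shift and the second is the segment base page shift; fmt_shift() and
describe() are hypothetical helper names for illustration, not QEMU code.

```python
def fmt_shift(shift):
    """Render a page shift as a human-readable size, e.g. 24 -> '16M'."""
    for unit, unit_shift in (("G", 30), ("M", 20), ("K", 10)):
        if shift >= unit_shift:
            return f"{1 << (shift - unit_shift)}{unit}"
    return f"{1 << shift}"


def describe(page_shift, seg_shift):
    """Describe a page/segment combination.

    Assumption: a combination is MPSS (multiple page sizes per segment)
    when the actual page size differs from the segment base page size.
    """
    kind = "MPSS" if page_shift != seg_shift else "same-size"
    return (f"{fmt_shift(page_shift)} pages in a "
            f"{fmt_shift(seg_shift)} segment ({kind})")


if __name__ == "__main__":
    print(describe(24, 12))  # the combination rejected in case 2)
    print(describe(24, 24))  # the 16m/16m combination PR KVM does support
    print(fmt_shift(34))     # the base page shift from the HV KVM error
```

Under that reading, 24/12 is 16m pages in a 4k segment, which is an MPSS
combination, while 24/24 is not.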

But then, if we go for a 64k limit, we hit 1).

An obvious change in the DT since the page size cleanup is:

                            [4k seg    [4k pg]] [64k seg      [64k pg]] [16m seg      [16m pg]]
- ibm,segment-page-sizes = <0xc 0x0 0x1 0xc 0x0 0x10 0x110 0x1 0x10 0x1 0x18 0x100 0x1 0x18 0x0>;
+ ibm,segment-page-sizes = <0xc 0x0 0x1 0xc 0x0 0x10 0x110 0x1 0x10 0x1>;
                            [4k seg    [4k pg]] [64k seg      [64k pg]]

If I add the 16m entry back, the guest boots just fine.
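
To make the diff above easier to compare, here is a minimal sketch of a
decoder for the property cells. It assumes the layout suggested by the
annotations: for each segment, a base page shift, an SLB encoding and a
count N, followed by N (page shift, PTE encoding) pairs. The function and
constant names are made up for this example.

```python
def decode_segment_page_sizes(cells):
    """Decode a flat list of ibm,segment-page-sizes cells into segments."""
    segments = []
    i = 0
    while i < len(cells):
        if i + 3 > len(cells):
            raise ValueError("truncated segment header")
        seg_shift, slb_enc, count = cells[i:i + 3]
        i += 3
        if i + 2 * count > len(cells):
            raise ValueError("truncated page size list")
        pages = []
        for _ in range(count):
            page_shift, pte_enc = cells[i:i + 2]
            pages.append((page_shift, pte_enc))
            i += 2
        segments.append(
            {"seg_shift": seg_shift, "slb_enc": slb_enc, "pages": pages}
        )
    return segments


# Cell values copied from the two property versions shown above.
BEFORE = [0xc, 0x0, 0x1, 0xc, 0x0,
          0x10, 0x110, 0x1, 0x10, 0x1,
          0x18, 0x100, 0x1, 0x18, 0x0]
AFTER = BEFORE[:10]

if __name__ == "__main__":
    for name, cells in (("before", BEFORE), ("after", AFTER)):
        for seg in decode_segment_page_sizes(cells):
            print(name, seg)
```

With these inputs the only difference is the third segment (base shift
0x18, i.e. 16m), which is the entry that disappeared from the DT.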

Not sure yet what's happening... any ideas?

Cheers,

--
Greg


> --
> Greg
> 
> ---
> 
> Greg Kurz (3):
>       target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
>       spapr: compute default value of "hpt-max-page-size" later
>       accel: forbid early use of kvm_enabled() and friends
> 
> 
>  accel/accel.c           |    7 +++++++
>  hw/ppc/spapr.c          |   25 ++++++++++++++++++-------
>  include/qemu-common.h   |    3 ++-
>  include/sysemu/accel.h  |    1 +
>  include/sysemu/kvm.h    |    3 ++-
>  qom/cpu.c               |    1 +
>  stubs/Makefile.objs     |    1 +
>  stubs/accel.c           |   14 ++++++++++++++
>  target/i386/hax-all.c   |    2 +-
>  target/i386/whpx-all.c  |    2 +-
>  target/ppc/kvm.c        |   37 ++++++++++++++++++-------------------
>  target/ppc/mmu-hash64.h |    8 +++++++-
>  12 files changed, 73 insertions(+), 31 deletions(-)
> 
> 
