From: Greg Kurz <groug@kaod.org>
To: qemu-devel@nongnu.org
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	qemu-ppc@nongnu.org, "Cédric Le Goater" <clg@kaod.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] spapr: fix regression with older machine types
Date: Thu, 28 Jun 2018 21:48:25 +0200
Message-ID: <20180628214618.09123598@bahia.lan>
In-Reply-To: <153018086531.336571.17029459443980070626.stgit@bahia.lan>

On Thu, 28 Jun 2018 12:14:25 +0200
Greg Kurz <groug@kaod.org> wrote:

> Since the recent cleanups to hide host configuration details from guests,
> it isn't possible to start an older machine type with HV KVM [*]:
> 
> qemu-system-ppc64: KVM doesn't support for base page shift 34
> 
> This basically boils down to the fact that it isn't safe to call
> the kvmppc_hpt_needs_host_contiguous_pages() helper from a class
> init function because:
> - KVM isn't initialized yet, and kvm_enabled() always returns false
>   in this case. This causes kvmppc_hpt_needs_host_contiguous_pages()
>   to do nothing and we end up choosing a 16G default page size
>   which is not supported by KVM.
> - even if we drop kvm_enabled() we then have the issue that
>   kvmppc_hpt_needs_host_contiguous_pages() assumes CPUs are
>   created, which isn't the case either.
> 
> The choice was made to initialize capabilities during machine
> init before creating the CPUs, and I don't think we should
> revert to the previous behavior. Let's go forward instead and
> ensure we can retrieve the MMU information from KVM before
> CPUs are created.
> 
> To fix this, we first change kvm_get_smmu_info() so that it
> doesn't need a CPU object. This allows us to stop using first_cpu
> in kvmppc_hpt_needs_host_contiguous_pages(). Then we delay
> the setting of the default value to machine init time, so
> that we're sure that KVM is fully initialized.
> 
> As a bonus, the last patch is an attempt to detect such misuse
> of the *_enabled() accelerator helpers earlier.
> 
> Please comment.
> 
> [*] it actually also breaks PR KVM, but the error is different and
>     I need to dig further.
> 

With current master:

1) qemu-system-ppc64 -machine pseries,accel=kvm,kvm-type=PR

The guest starts but its kernel oopses at some point:

[    0.011328] kernel tried to execute exec-protected page (c000000001611244) -exploit attempt? (uid: 0)
[    0.011379] Unable to handle kernel paging request for instruction fetch
[    0.011416] Faulting instruction address: 0xc000000001611244
[    0.011453] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.011482] LE SMP NR_CPUS=1024 NUMA pSeries
[    0.011512] Modules linked in:
[    0.011557] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.2-200.fc28.ppc64le #1
[    0.011600] NIP:  c000000001611244 LR: c00000000000acec CTR: 0000000000000000
[    0.011643] REGS: c00000003fffba90 TRAP: 0400   Not tainted  (4.17.2-200.fc28.ppc64le)
[    0.011694] MSR:  b000000010001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000848  XER: 20000000
[    0.011741] CFAR: 0000000000000000 SOFTE: 1 
[    0.011741] GPR00: 0000000000000000 c00000003fffbd10 c000000001570b00 c00000003fffbd80 
[    0.011741] GPR04: c000000000034418 0000000048000000 000000000000000a 000000004aa21de8 
[    0.011741] GPR08: 000000007d410164 0000000000000000 0000000000000002 0000000000000900 
[    0.011741] GPR12: b000000002009033 c000000001840000 c000000000071a2c 00000000495de1a4 
[    0.011741] GPR16: 0000000000000078 c00000000160fd10 c000000000e705e0 000000007c1b03a6 
[    0.011741] GPR20: 000000007c1ffaa6 c0000000016125b8 c0000000014253e8 000000007c1303a6 
[    0.011741] GPR24: 000000007c1643a6 000000007c1a03a6 c00000000160fd08 ffffffffebc0f008 
[    0.011741] GPR28: ffffffffebc0f000 c0000000000345d8 c0000000000345d8 0000000000000000 
[    0.012138] NIP [c000000001611244] kvm_tmp+0x1534/0x100000
[    0.012170] LR [c00000000000acec] soft_nmi_common+0xcc/0xd0
[    0.012199] Call Trace:
[    0.012214] Instruction dump:
[    0.012236] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.012289] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.012334] ---[ end trace d2ee28832d481d2d ]---
[    0.012362] 
[    1.012387] kernel tried to execute exec-protected page (c000000001611808) -exploit attempt? (uid: 0)
[    1.012433] Unable to handle kernel paging request for instruction fetch
[    1.012468] Faulting instruction address: 0xc000000001611808
[    1.012504] Oops: Kernel access of bad area, sig: 11 [#2]
[    1.012532] LE SMP NR_CPUS=1024 NUMA pSeries
[    1.012561] Modules linked in:
[    1.012583] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           4.17.2-200.fc28.ppc64le #1
[    1.012641] NIP:  c000000001611808 LR: c0000000001247fc CTR: c000000001840000
[    1.012684] REGS: c00000003fffb5d0 TRAP: 0400   Tainted: G      D            (4.17.2-200.fc28.ppc64le)
[    1.012740] MSR:  b000000010001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48000224  XER: 20000000
[    1.012785] CFAR: 0000000000000000 SOFTE: 0 
[    1.012785] GPR00: c0000000001247fc c00000003fffb850 c000000001570b00 0000000000000000 
[    1.012785] GPR04: 0000000000000000 c0000000fe9e4900 fffffffffffffffd c0000000fe9e4900 
[    1.012785] GPR08: 00000000fed50000 b000000000001033 0000000000000009 c00000003fffb55f 
[    1.012785] GPR12: 0000000000000000 c000000001840000 c000000000071a2c 00000000495de1a4 
[    1.012785] GPR16: 0000000000000078 c00000000160fd10 c000000000e705e0 000000007c1b03a6 
[    1.012785] GPR20: 000000007c1ffaa6 c0000000016125b8 c0000000014253e8 000000007c1303a6 
[    1.012785] GPR24: 000000007c1643a6 000000007c1a03a6 c00000000160fd08 ffffffffebc0f008 
[    1.012785] GPR28: 0000000000000000 000000000000000b 000000000000000b c0000000fe9e4900 
[    1.013166] NIP [c000000001611808] kvm_tmp+0x1af8/0x100000
[    1.013196] LR [c0000000001247fc] do_exit+0x12c/0xd30
[    1.013224] Call Trace:
[    1.013238] Instruction dump:
[    1.013260] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    1.013303] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    1.013348] ---[ end trace d2ee28832d481d2e ]---
[    1.013375] 
[    2.013391] Fixing recursive fault but reboot is needed!

and the guest gets unresponsive.

2) qemu-system-ppc64 -machine pseries-2.12,accel=kvm,kvm-type=PR

prints an error message and terminates right away:

qemu-system-ppc64: KVM doesn't support page shift 24/12

This error is expected: since PR KVM doesn't set KVM_PPC_PAGE_SIZES_REAL,
we choose to support all possible page sizes, but PR KVM doesn't
actually support this page shift combination. Unsurprisingly, we get the
same error with:

-machine pseries,accel=kvm,kvm-type=PR,cap-hpt-max-page-size=${pagesize}

if ${pagesize} is >= 16m. This is because PR KVM doesn't support
MPSS at all, even though it supports 16m pages in a 16m segment. We
cannot really fix this in QEMU unless we completely filter out MPSS
in spapr_pagesize_cb(), but I'm pretty sure we don't want that. :)

But then, if we go for a 64k limit, we hit 1).

An obvious change in the DT since the page size cleanup is:

                            [4k seg    [4k pg]] [64k seg      [64k pg]] [16m seg      [16m pg]]
- ibm,segment-page-sizes = <0xc 0x0 0x1 0xc 0x0 0x10 0x110 0x1 0x10 0x1 0x18 0x100 0x1 0x18 0x0>;
+ ibm,segment-page-sizes = <0xc 0x0 0x1 0xc 0x0 0x10 0x110 0x1 0x10 0x1>;
                            [4k seg    [4k pg]] [64k seg      [64k pg]]

If I add the 16m entry back, the guest boots just fine.

Not sure yet what's happening... any idea?

Cheers,

--
Greg


> --
> Greg
> 
> ---
> 
> Greg Kurz (3):
>       target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
>       spapr: compute default value of "hpt-max-page-size" later
>       accel: forbid early use of kvm_enabled() and friends
> 
> 
>  accel/accel.c           |    7 +++++++
>  hw/ppc/spapr.c          |   25 ++++++++++++++++++-------
>  include/qemu-common.h   |    3 ++-
>  include/sysemu/accel.h  |    1 +
>  include/sysemu/kvm.h    |    3 ++-
>  qom/cpu.c               |    1 +
>  stubs/Makefile.objs     |    1 +
>  stubs/accel.c           |   14 ++++++++++++++
>  target/i386/hax-all.c   |    2 +-
>  target/i386/whpx-all.c  |    2 +-
>  target/ppc/kvm.c        |   37 ++++++++++++++++++-------------------
>  target/ppc/mmu-hash64.h |    8 +++++++-
>  12 files changed, 73 insertions(+), 31 deletions(-)
> 
> 
