From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41151)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1bIZgk-0001De-SM
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:59:20 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1bIZgf-00029l-TM
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:59:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47915)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1bIZgf-00029f-LH
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:59:13 -0400
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
	(int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 07E55C049D5A
	for <qemu-devel@nongnu.org>; Thu, 30 Jun 2016 10:59:13 +0000 (UTC)
Date: Thu, 30 Jun 2016 11:59:09 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20160630105908.GA2683@work-vm>
References: <20160617154905.GH18662@thinpad.lan.raisama.net>
	<20160621194440.GN17952@thinpad.lan.raisama.net>
	<9b76415a-23e6-3ded-4dbc-42838cc164b0@redhat.com>
	<20160622142414.GI30202@redhat.com>
	<20160623014216-mutt-send-email-mst@redhat.com>
	<20160622232308.GQ30202@redhat.com>
	<20160623024400-mutt-send-email-mst@redhat.com>
	<1466671203.26189.35.camel@redhat.com>
	<20160629164252.GD10488@work-vm>
	<1467267046.15123.94.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1467267046.15123.94.camel@redhat.com>
Subject: Re: [Qemu-devel] Default for phys-addr-bits? (was Re: [PATCH 4/5]
 x86: Allow physical address bits to be set)
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>, Marcel Apfelbaum <marcel@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org, Eduardo Habkost <ehabkost@redhat.com>

* Gerd Hoffmann (kraxel@redhat.com) wrote:
>   Hi,
> 
> > Something somewhere in qemu/ kernel/ firmware is already reading the number
> > of physical bits to determine PCI mapping; if I do:
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 -m 4096,slots=16,maxmem=128T
> 
> No, it's not the physbits.  You add some memory hotplug slots here.
> Qemu will ask seabios to reserve address space for those, which seabios
> promptly does and maps 64bit pci bars above the reserved address space.

Right, that's what I was trying to do - I wanted to see if I could get something
to use the non-existent address space.

> >    -vga none -device qxl-vga,bus=pcie.0,ram_size_mb=2048,vram64_size_mb=2048 -vnc 0.0.0.0:0 /home/vms/7.2a.qcow2 -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline -cpu host,phys-bits=48
> > 
> > it will happily map the qxl VRAM right up high, but if I lower
> > the phys-bits down to 46 it won't.
> 
> I suspect the linux kernel remaps the bar because the seabios mapping is
> unreachable.  Check dmesg.

Right, and that is dependent on physbits; if I run with:

./x86_64-softmmu/qemu-system-x86_64 -machine q35,accel=kvm,usb=off -m 4096,slots=16,maxmem=128T   -vga none -device qxl-vga,bus=pcie.0,ram_size_mb=2048,vram64_size_mb=2048 -vnc 0.0.0.0:0 /home/vms/7.2a.qcow2 -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline -cpu host,phys-bits=48
  (on a 46 bit xeon) it happily maps that 64-bit bar into somewhere
that shouldn't be accessible:

[    0.266183] pci_bus 0000:00: root bus resource [mem 0x800480000000-0x8004ffffffff]
[    0.321611] pci 0000:00:02.0: reg 0x20: [mem 0x800480000000-0x8004ffffffff 64bit pref]
[    0.423257] pci_bus 0000:00: resource 8 [mem 0x800480000000-0x8004ffffffff]

lspci -v:

00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04) (prog-if 00 [VGA controller])
	Subsystem: Red Hat, Inc QEMU Virtual Machine
	Flags: fast devsel, IRQ 22
	Memory at c0000000 (32-bit, non-prefetchable) [size=512M]
	Memory at e0000000 (32-bit, non-prefetchable) [size=64M]
	Memory at e4070000 (32-bit, non-prefetchable) [size=8K]
	I/O ports at c080 [size=32]
	Memory at 800480000000 (64-bit, prefetchable) [size=2G]
	Expansion ROM at e4060000 [disabled] [size=64K]
	Kernel driver in use: qxl

So that's mapped at an address beyond host phys-bits.
And it hasn't failed/crashed etc - but I guess maybe nothing is using that 2G space?

If I change the phys-bits=48 to 46 the kernel avoids it:
    [    0.414867] acpi PNP0A08:00: host bridge window [0x800480000000-0x8004ffffffff] (ignored, not CPU addressable)
    [    0.683134] pci 0000:00:02.0: can't claim BAR 4 [mem 0x800480000000-0x8004ffffffff 64bit pref]: no compatible bridge window
    [    0.703948] pci 0000:00:02.0: BAR 4: [mem size 0x80000000 64bit pref] conflicts with PCI mem [mem 0x00000000-0x3fffffffffff]
    [    0.703951] pci 0000:00:02.0: BAR 4: failed to assign [mem size 0x80000000 64bit pref]

lspci shows:
    Memory at <ignored> (64-bit, prefetchable)

(Although, interestingly, qemu's 'info pci' still shows it.)

The 'ignored, not CPU addressable' message comes from acpi_pci_root_validate_resources()
in the kernel's drivers/acpi/pci_root.c, which uses a value set in arch/x86/kernel/setup.c:
    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;

So at least the Linux kernel does sanity check using the phys_bits value.
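
To illustrate that check, here is a rough sketch of the arithmetic in Python.
It is not the kernel code: the function names are made up for this example,
and the only inputs are the phys-bits values and the BAR 4 range from the
dmesg output above.

```python
def iomem_end(phys_bits: int) -> int:
    """Highest CPU-addressable physical address for a given phys-bits,
    mirroring iomem_resource.end = (1ULL << x86_phys_bits) - 1."""
    return (1 << phys_bits) - 1


def is_cpu_addressable(start: int, end: int, phys_bits: int) -> bool:
    """True if the whole [start, end] range fits below the phys-bits limit.
    (Hypothetical helper; the kernel's real check works on resource structs.)"""
    return 0 <= start <= end <= iomem_end(phys_bits)


# BAR 4 of the qxl device, taken from the dmesg lines above.
BAR4_START = 0x800480000000
BAR4_END = 0x8004FFFFFFFF

if __name__ == "__main__":
    for bits in (46, 48):
        ok = is_cpu_addressable(BAR4_START, BAR4_END, bits)
        print(f"phys-bits={bits}: limit={iomem_end(bits):#x} addressable={ok}")
```

With phys-bits=46 the limit works out to 0x3fffffffffff, which is the upper
end of the 'PCI mem' range in the conflict message, so the BAR is rejected;
with phys-bits=48 the same range passes.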

Obviously 128T is a bit silly for maxmem at the moment; however, I was worried about what
happens with 36/39/40bit hosts, and it's not unusual to pick a maxmem that's a few TB
even if the VMs you're initially creating are only a handful of GB. (oVirt/RHEV seems to use
a 4TB default for maxmem).

Still, this is only a problem if you hit the combination of:
   a) You use large PCI bars
   b) On a 36/39/40bit host
   c) With a large maxmem that forces those PCI bars up to something silly.
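
As a back-of-the-envelope sketch of that combination (Python, with a made-up
helper name), this computes how many physical address bits an address needs.
It assumes, as a simplification, that the 64-bit BARs end up at or above the
maxmem value; the real placement by the firmware adds further offset and
alignment, as the 128T example above shows.

```python
TIB = 1 << 40  # one tebibyte


def min_phys_bits(addr: int) -> int:
    """Smallest number of physical address bits that can express addr.
    (Hypothetical helper for illustration only.)"""
    return max(addr.bit_length(), 1)


# End of BAR 4 from the 128T example above.
BAR4_END = 0x8004FFFFFFFF

# Assumed lowest possible BAR base with a 4TB maxmem (simplified model:
# the BAR sits no lower than maxmem itself).
BAR_BASE_4T = 4 * TIB

if __name__ == "__main__":
    print("bits needed for the 128T example:", min_phys_bits(BAR4_END))
    needed = min_phys_bits(BAR_BASE_4T)
    for host_bits in (36, 39, 40, 46):
        print(f"4TB maxmem on a {host_bits}bit host: reachable={host_bits >= needed}")
```

Under that assumption even a 4TB maxmem pushes the BARs past what a
36/39/40bit host can address.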

Dave

> 
> cheers,
>   Gerd
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK