From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
Date: Thu, 09 Jan 2014 11:47:58 -0700
Message-ID: <1389293278.3209.248.camel@bling.home>
In-Reply-To: <20140109180003.GA6819@redhat.com>

On Thu, 2014-01-09 at 20:00 +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 09, 2014 at 10:24:47AM -0700, Alex Williamson wrote:
> > On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
> > > From: Paolo Bonzini <pbonzini@redhat.com>
> > > 
> > > As an alternative to commit 818f86b (exec: limit system memory
> > > size, 2013-11-04) let's just make all address spaces 64-bit wide.
> > > This eliminates problems with phys_page_find ignoring bits above
> > > TARGET_PHYS_ADDR_SPACE_BITS and address_space_translate_internal
> > > consequently messing up the computations.
> > > 
> > > In Luiz's reported crash, at startup gdb attempts to read from address
> > > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive.  The region it gets
> > > is the newly introduced master abort region, which is as big as the PCI
> > > address space (see pci_bus_init).  Due to a typo that's only 2^63-1,
> > > not 2^64.  But we get it anyway because phys_page_find ignores the upper
> > > bits of the physical address.  In address_space_translate_internal then
> > > 
> > >     diff = int128_sub(section->mr->size, int128_make64(addr));
> > >     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> > > 
> > > diff becomes negative, and int128_get64 booms.
> > > 
> > > The size of the PCI address space region should be fixed anyway.
> > > 
> > > Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  exec.c | 8 ++------
> > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index 7e5ce93..f907f5f 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -94,7 +94,7 @@ struct PhysPageEntry {
> > >  #define PHYS_MAP_NODE_NIL (((uint32_t)~0) >> 6)
> > >  
> > >  /* Size of the L2 (and L3, etc) page tables.  */
> > > -#define ADDR_SPACE_BITS TARGET_PHYS_ADDR_SPACE_BITS
> > > +#define ADDR_SPACE_BITS 64
> > >  
> > >  #define P_L2_BITS 10
> > >  #define P_L2_SIZE (1 << P_L2_BITS)
> > > @@ -1861,11 +1861,7 @@ static void memory_map_init(void)
> > >  {
> > >      system_memory = g_malloc(sizeof(*system_memory));
> > >  
> > > -    assert(ADDR_SPACE_BITS <= 64);
> > > -
> > > -    memory_region_init(system_memory, NULL, "system",
> > > -                       ADDR_SPACE_BITS == 64 ?
> > > -                       UINT64_MAX : (0x1ULL << ADDR_SPACE_BITS));
> > > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> > >      address_space_init(&address_space_memory, system_memory, "memory");
> > >  
> > >      system_io = g_malloc(sizeof(*system_io));
> > 
> > This seems to have some unexpected consequences around sizing 64bit PCI
> > BARs that I'm not sure how to handle.
> 
> BARs are often disabled during sizing. Maybe you
> don't detect BAR being disabled?

See the trace below; the BARs are not disabled.  QEMU pci-core is doing
the sizing and memory region updates for the BARs; vfio is just a
pass-through here.
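
For readers following the trace, here is a minimal sketch in C of the
sizing sequence on a 64-bit memory BAR.  It is a toy model, not QEMU's
pci-core code: the struct and function names are hypothetical, and the
16 KiB size and flag bits are taken from the values in the trace.  It
only shows what address the two registers decode to while the upper
half holds the sizing mask and memory decode is still enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one 64-bit memory BAR (hypothetical names, not QEMU code). */
#define BAR_SIZE  0x4000ULL   /* 16 KiB, matching febe0000 - febe3fff */
#define BAR_FLAGS 0x4u        /* read-only low bits: memory BAR, 64-bit type */

struct bar64 {
    uint32_t lo;              /* config offset 0x10 in the trace */
    uint32_t hi;              /* config offset 0x14 in the trace */
};

/* Writes only stick in the address bits above the BAR size. */
static void bar_write_lo(struct bar64 *b, uint32_t val)
{
    b->lo = (val & ~(uint32_t)(BAR_SIZE - 1)) | BAR_FLAGS;
}

static void bar_write_hi(struct bar64 *b, uint32_t val)
{
    b->hi = val & (uint32_t)(~(BAR_SIZE - 1) >> 32);
}

/* Address the BAR currently decodes to, with the flag bits stripped. */
static uint64_t bar_addr(const struct bar64 *b)
{
    return ((uint64_t)b->hi << 32) | (b->lo & ~0xfu);
}

/* Size the lower half: save, write all ones, read the mask, restore. */
static uint32_t size_lower_half(struct bar64 *b)
{
    uint32_t saved = b->lo;
    uint32_t mask;

    assert((BAR_SIZE & (BAR_SIZE - 1)) == 0);   /* sizes are powers of two */
    bar_write_lo(b, 0xffffffffu);
    mask = b->lo;
    bar_write_lo(b, saved);
    return mask;
}

/*
 * Size the upper half the same way.  *transient receives the address the
 * BAR decodes to while the upper half holds the mask and the lower half
 * has already been restored.
 */
static uint32_t size_upper_half(struct bar64 *b, uint64_t *transient)
{
    uint32_t saved = b->hi;
    uint32_t mask;

    bar_write_hi(b, 0xffffffffu);
    *transient = bar_addr(b);
    mask = b->hi;
    bar_write_hi(b, saved);
    return mask;
}
```

Starting from lo = 0xfebe0004 and hi = 0, size_lower_half() returns
0xffffc004, and size_upper_half() reports a transient address of
0xfffffffffebe0000, the same values that appear in the trace.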

> >  After this patch I get vfio
> > traces like this:
> > 
> > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) febe0004
> > (save lower 32bits of BAR)
> > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xffffffff, len=0x4)
> > (write mask to BAR)
> > vfio: region_del febe0000 - febe3fff
> > (memory region gets unmapped)
> > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) ffffc004
> > (read size mask)
> > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xfebe0004, len=0x4)
> > (restore BAR)
> > vfio: region_add febe0000 - febe3fff [0x7fcf3654d000]
> > (memory region re-mapped)
> > vfio: vfio_pci_read_config(0000:01:10.0, @0x14, len=0x4) 0
> > (save upper 32bits of BAR)
> > vfio: vfio_pci_write_config(0000:01:10.0, @0x14, 0xffffffff, len=0x4)
> > (write mask to BAR)
> > vfio: region_del febe0000 - febe3fff
> > (memory region gets unmapped)
> > vfio: region_add fffffffffebe0000 - fffffffffebe3fff [0x7fcf3654d000]
> > (memory region gets re-mapped with new address)
> > qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfffffffffebe0000, 0x4000, 0x7fcf3654d000) = -14 (Bad address)
> > (iommu barfs because it can only handle 48bit physical addresses)
> > 
> 
> Why are you trying to program BAR addresses for dma in the iommu?

Two reasons.  First, I can't tell the difference between RAM and MMIO.
Second, it enables peer-to-peer DMA between devices, which is something
that we might be able to take advantage of with GPU passthrough.

> > Prior to this change, there was no re-map with the fffffffffebe0000
> > address, presumably because it was beyond the address space of the PCI
> > window.  This address is clearly not in a PCI MMIO space, so why are we
> > allowing it to be realized in the system address space at this location?
> > Thanks,
> > 
> > Alex
> 
> Why do you think it is not in PCI MMIO space?
> True, CPU can't access this address but other pci devices can.

What happens on real hardware when an address like this is programmed to
a device?  The CPU doesn't have the physical bits to access it.  I have
serious doubts that another PCI device would be able to access it
either.  Maybe in some limited scenario where the devices are on the
same conventional PCI bus.  In the typical case, PCI addresses are
always limited by some kind of aperture, whether that's explicit in
bridge windows or implicit in hardware design (and perhaps made explicit
in ACPI).  Even if I wanted to filter these out as noise in vfio, how
would I do it in a way that still allows real 64bit MMIO to be
programmed?  PCI has this knowledge, I hope.  VFIO doesn't.  Thanks,

Alex
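
To make the aperture argument concrete, here is a minimal sketch in C
of the kind of check that would be needed.  Everything in it is an
assumption for illustration: the struct, the function names and the
window values are hypothetical, and the mail's point is precisely that
vfio has no such list of windows to consult.  The second helper checks
a range against an address width, such as the 48 bits mentioned for
the IOMMU above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One decode window, bounds inclusive (hypothetical type). */
struct aperture {
    uint64_t base;
    uint64_t limit;
};

/* True if [start, start + size - 1] lies entirely inside one window. */
static bool range_in_aperture(const struct aperture *w, size_t n,
                              uint64_t start, uint64_t size)
{
    uint64_t end;
    size_t i;

    if (size == 0) {
        return false;
    }
    end = start + (size - 1);
    if (end < start) {
        return false;               /* range wraps past 2^64 */
    }
    for (i = 0; i < n; i++) {
        if (start >= w[i].base && end <= w[i].limit) {
            return true;
        }
    }
    return false;
}

/* True if the whole range is addressable with the given number of bits. */
static bool range_fits_width(uint64_t start, uint64_t size, unsigned bits)
{
    uint64_t max, end;

    assert(bits >= 1 && bits <= 64);
    if (size == 0) {
        return false;
    }
    max = bits == 64 ? UINT64_MAX : (1ULL << bits) - 1;
    end = start + (size - 1);
    return end >= start && end <= max;
}
```

With a made-up 32-bit window of 0xe0000000 - 0xfebfffff and a made-up
64-bit window of 0x800000000 - 0xfffffffff, the BAR at febe0000 passes
both checks, while the transient mapping at fffffffffebe0000 fails
both.  A BAR placed inside the 64-bit window would still be accepted,
which is the distinction a filter would have to preserve.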
