From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
Daniel Henrique Barboza <danielhb413@gmail.com>,
qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
Piotr Jaroszynski <pjaroszynski@nvidia.com>,
Jose Ricardo Ziviani <joserz@linux.ibm.com>
Subject: Re: [Qemu-devel] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough
Date: Fri, 8 Feb 2019 16:28:50 +1100
Message-ID: <20190208052849.GB6434@umbus.fritz.box>
In-Reply-To: <20190207202620.23e9c063@x1.home>
On Thu, Feb 07, 2019 at 08:26:20PM -0700, Alex Williamson wrote:
> On Fri, 8 Feb 2019 13:29:37 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> > On 08/02/2019 02:18, Alex Williamson wrote:
> > > On Thu, 7 Feb 2019 15:43:18 +1100
> > > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> > >
> > >> On 07/02/2019 04:22, Daniel Henrique Barboza wrote:
> > >>> Based on this series, I've sent a Libvirt patch to allow a QEMU process
> > >>> to inherit IPC_LOCK when using VFIO passthrough with the Tesla V100
> > >>> GPU:
> > >>>
> > >>> https://www.redhat.com/archives/libvir-list/2019-February/msg00219.html
> > >>>
> > >>>
> > >>> In that thread, Alex raised concerns about allowing QEMU to freely lock
> > >>> all the memory it wants. Is this an issue to be considered in the review
> > >>> of this series here?
> > >>>
> > >>> Reading the patches, especially patch 3/3, it seems to me that QEMU is
> > >>> going to lock the KVM memory to populate the NUMA node with memory
> > >>> of the GPU itself, so at first there is no risk of not taking over the
> > >>> host RAM.
> > >>> Am I missing something?
> > >>
> > >>
> > >> The GPU memory belongs to the device; it is not visible to the host as
> > >> memory blocks and is not covered by page structs. For the host it is more
> > >> like MMIO, which is passed through to the guest without that locked
> > >> accounting. I'd expect libvirt to keep working as usual except that:
> > >>
> > >> when libvirt calculates the amount of memory needed for TCE tables
> > >> (which is guestRAM/64k*8), now it needs to use the end of the last GPU
> > >> RAM window as a guest RAM size. For example, in QEMU HMP "info mtree -f":
> > >>
> > >> FlatView #2
> > >> AS "memory", root: system
> > >> AS "cpu-memory-0", root: system
> > >> Root memory region: system
> > >> 0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
> > >> 0000010000000000-0000011fffffffff (prio 0, ram): nvlink2-mr
> > >>
> > >> So previously the DMA window would cover 0x7fffffff+1, now it has to
> > >> cover 0x11fffffffff+1.
> > >
> > > This looks like a chicken and egg problem, you're saying libvirt needs
> > > to query mtree to understand the extent of the GPU layout, but we need
> > > to specify the locked memory limits in order for QEMU to start? Is
> > > libvirt supposed to start the VM with unlimited locked memory and fix
> > > it at some indeterminate point in the future? Run a dummy VM with
> > > unlimited locked memory in order to determine the limits for the real
> > > VM? Neither of these sound practical. Thanks,
> >
> >
> > QEMU maps GPU RAM at known locations (which depend only on the vPHB's
> > index or can be set explicitly) and libvirt knows how many GPUs are
> > passed, so it is quite easy to calculate the required amount of memory.
> >
> > Here is the window start calculation:
> > https://github.com/aik/qemu/commit/7073cad3ae7708d657e01672bcf53092808b54fb#diff-662409c2a5a150fe231d07ea8384b920R3812
> >
> > We do not exactly know the GPU RAM window size until QEMU reads it from
> > VFIO/nvlink2 but we know that all existing hardware has a window of
> > 128GB (the adapters I have access to only have 16/32GB on board).
>
> So you're asking that libvirt add 128GB per GPU with magic nvlink
> properties, which may be 8x what's actually necessary and libvirt
> determines which GPUs to apply this to how? Does libvirt need to sort
> through device tree properties for this? Thanks,
Hm. If the GPU memory is really separate from main RAM, which it
sounds like, I don't think it makes sense to account it against the
same locked memory limit as regular RAM.
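
For reference, the TCE table arithmetic quoted above (guestRAM/64k*8) can be
checked with a short sketch. It assumes a flat, single-level table with 64 KiB
IOMMU pages and 8-byte entries, and uses the two address ranges from the
"info mtree -f" output. The function and constant names are made up for
illustration; none of this is taken from QEMU or libvirt source.

```python
# Sketch only: follows the "guestRAM/64k*8" arithmetic quoted in this thread.
# Names are hypothetical; this is not QEMU or libvirt code.
TCE_PAGE_SIZE = 64 * 1024   # 64 KiB IOMMU page (the "64k" in the formula)
TCE_ENTRY_SIZE = 8          # bytes per TCE entry (the "*8" in the formula)


def tce_table_bytes(window_end_inclusive: int) -> int:
    """Bytes of a flat TCE table covering addresses 0..window_end_inclusive."""
    window_size = window_end_inclusive + 1
    # Round up so a partially used last page still gets an entry.
    entries = -(-window_size // TCE_PAGE_SIZE)
    return entries * TCE_ENTRY_SIZE


# Ranges taken from the FlatView listing quoted above.
ram_only = tce_table_bytes(0x7fffffff)        # window ends at ppc_spapr.ram
with_gpu = tce_table_bytes(0x11fffffffff)     # window ends at nvlink2-mr
gpu_window = 0x11fffffffff - 0x10000000000 + 1

print(f"RAM-only DMA window: {ram_only // 1024} KiB of TCE table")
print(f"Window incl. GPU RAM: {with_gpu // (1024 * 1024)} MiB of TCE table")
print(f"GPU RAM window size:  {gpu_window // (1024 ** 3)} GiB")
```

Under these assumptions the table grows from 256 KiB to 144 MiB once the
window has to reach the end of the 128 GiB nvlink2-mr region, even though the
guest's regular RAM is still only 2 GiB.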
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson