Re: [Qemu-devel] [PATCH qemu RFC 5/7] spapr-iommu: Always advertise the maximum possible DMA window size

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	"Alistair Popple" <alistair@popple.id.au>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Oliver O'Halloran" <oohall@gmail.com>
Subject: Re: [Qemu-devel] [PATCH qemu RFC 5/7] spapr-iommu: Always advertise the maximum possible DMA window size
Date: Mon, 19 Nov 2018 13:42:39 +1100	[thread overview]
Message-ID: <20181119024239.GF23503@umbus> (raw)
In-Reply-To: <20181113083104.2692-6-aik@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 4196 bytes --]

On Tue, Nov 13, 2018 at 07:31:02PM +1100, Alexey Kardashevskiy wrote:
> When deciding about the huge DMA window, the typical Linux pseries guest
> uses the maximum allowed RAM size as the upper limit. We did the same
> on QEMU side to match that logic. Now we are going to support GPU RAM
> pass through which is not available at the guest boot time as it requires
> the guest driver interaction. As the result, the guest requests a smaller
> window than it should. Therefore the guest needs to be patched to
> understand this new memory and so does QEMU.
> 
> Instead of reimplementing here whatever solution we will choose for
> the guest, this advertises the biggest possible window size limited by
> 32 bit (as defined by LoPAPR).
> 
> This seems to be safe as:
> 1. The guest visible emulated table is allocated in KVM (actual pages
> are allocated in page fault handler) and QEMU (actual pages are allocated
> when changed);
> 2. The hardware table (and corresponding userspace addresses cache)
> supports sparse allocation and also checks for locked_vm limit so
> it is unable to cause the host any damage.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

This seems like a good idea to me, even without the NPU stuff.  It
always bothered me slightly that we based what's effectively the IOVA
limit on the guest RAM size which doesn't have any direct connection
to it.

As long as it doesn't hit the locked memory limits, I don't see any
reason we should prevent a guest from doing something weird like
mapping a small bit of RAM over and over in IOVA space, or mapping its
RAM sparsely in IOVA space.

> ---
>  hw/ppc/spapr_rtas_ddw.c | 19 +++----------------
>  1 file changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
> index 329feb1..df60351 100644
> --- a/hw/ppc/spapr_rtas_ddw.c
> +++ b/hw/ppc/spapr_rtas_ddw.c
> @@ -96,9 +96,8 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
>                                           uint32_t nret, target_ulong rets)
>  {
>      sPAPRPHBState *sphb;
> -    uint64_t buid, max_window_size;
> +    uint64_t buid;
>      uint32_t avail, addr, pgmask = 0;
> -    MachineState *machine = MACHINE(spapr);
>  
>      if ((nargs != 3) || (nret != 5)) {
>          goto param_error_exit;
> @@ -114,27 +113,15 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
>      /* Translate page mask to LoPAPR format */
>      pgmask = spapr_page_mask_to_query_mask(sphb->page_size_mask);
>  
> -    /*
> -     * This is "Largest contiguous block of TCEs allocated specifically
> -     * for (that is, are reserved for) this PE".
> -     * Return the maximum number as maximum supported RAM size was in 4K pages.
> -     */
> -    if (machine->ram_size == machine->maxram_size) {
> -        max_window_size = machine->ram_size;
> -    } else {
> -        max_window_size = machine->device_memory->base +
> -                          memory_region_size(&machine->device_memory->mr);
> -    }
> -
>      avail = SPAPR_PCI_DMA_MAX_WINDOWS - spapr_phb_get_active_win_num(sphb);
>  
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      rtas_st(rets, 1, avail);
> -    rtas_st(rets, 2, max_window_size >> SPAPR_TCE_PAGE_SHIFT);
> +    rtas_st(rets, 2, 0xFFFFFFFF); /* Largest contiguous block of TCEs */

One detail though.. where does this limit actually come from?  Is it
actually a limit imposed by the hardware somewhere, or just because
the RTAS call doesn't ahve room for anything more?

Previously limits would usually have been a power of 2; is it likely
to matter that now it won't be?

>      rtas_st(rets, 3, pgmask);
>      rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
>  
> -    trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, pgmask);
> +    trace_spapr_iommu_ddw_query(buid, addr, avail, 0xFFFFFFFF, pgmask);
>      return;
>  
>  param_error_exit:

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2018-11-19  3:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-13  8:30 [Qemu-devel] [PATCH qemu RFC 0/7] spapr_pci, vfio: NVIDIA V100 + P9 passthrough Alexey Kardashevskiy
2018-11-13  8:30 ` [Qemu-devel] [PATCH qemu RFC 1/7] vfio/spapr: Fix indirect levels calculation Alexey Kardashevskiy
2018-11-19  1:45   ` David Gibson
2018-11-19  5:51     ` Alexey Kardashevskiy
2018-11-13  8:30 ` [Qemu-devel] [PATCH qemu RFC 2/7] linux-header: Update for new capabilities Alexey Kardashevskiy
2018-11-13  8:31 ` [Qemu-devel] [PATCH qemu RFC 3/7] pci: Move NVIDIA vendor id to the rest of ids Alexey Kardashevskiy
2018-11-19  1:46   ` David Gibson
2018-11-20 18:27   ` Alistair Francis
2018-12-14  3:36     ` Alexey Kardashevskiy
2019-01-16  4:20       ` Alexey Kardashevskiy
2019-02-14  2:26         ` Alexey Kardashevskiy
2019-02-14  3:21           ` Alex Williamson
2018-11-13  8:31 ` [Qemu-devel] [PATCH qemu RFC 4/7] vfio/nvidia-v100: Disable VBIOS update Alexey Kardashevskiy
2018-11-19  2:36   ` David Gibson
2018-11-13  8:31 ` [Qemu-devel] [PATCH qemu RFC 5/7] spapr-iommu: Always advertise the maximum possible DMA window size Alexey Kardashevskiy
2018-11-19  2:42   ` David Gibson [this message]
2018-11-19  5:08     ` Alexey Kardashevskiy
2018-11-19  5:31       ` David Gibson
2018-11-13  8:31 ` [Qemu-devel] [PATCH qemu RFC 6/7] vfio: Make vfio_get_region_info_cap public Alexey Kardashevskiy
2018-11-13  8:31 ` [Qemu-devel] [PATCH qemu RFC 7/7] spapr: Add NVLink2 pass through support Alexey Kardashevskiy
2018-11-19  3:01   ` David Gibson
2018-11-19  5:22     ` Alexey Kardashevskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181119024239.GF23503@umbus \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=alistair@popple.id.au \
    --cc=arbab@linux.ibm.com \
    --cc=joserz@linux.ibm.com \
    --cc=lagarcia@br.ibm.com \
    --cc=oohall@gmail.com \
    --cc=pjaroszynski@nvidia.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).