From: David Gibson <david@gibson.dropbear.id.au>
To: peter.maydell@linaro.org
Cc: groug@kaod.org, qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
lvivier@redhat.com, Alexey Kardashevskiy <aik@ozlabs.ru>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [PULL 01/60] vfio/spapr: Fix indirect levels calculation
Date: Sun, 10 Mar 2019 19:26:04 +1100 [thread overview]
Message-ID: <20190310082703.1245-2-david@gibson.dropbear.id.au> (raw)
In-Reply-To: <20190310082703.1245-1-david@gibson.dropbear.id.au>
From: Alexey Kardashevskiy <aik@ozlabs.ru>
The current code assumes that we can address more bits on a PCI bus
for DMA than we really can but there is no way knowing the actual limit.
This makes a better guess for the number of levels and if the kernel
fails to allocate that, this increases the level numbers till succeeded
or reached the 64bit limit.
This adds levels to the trace point.
This may cause the kernel to warn about failed allocation:
[65122.837458] Failed to allocate a TCE memory, level shift=28
which might happen if MAX_ORDER is not large enough as it can vary:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Kconfig?h=v5.0-rc2#n727
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190227085149.38596-3-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/vfio/spapr.c | 43 +++++++++++++++++++++++++++++++++----------
hw/vfio/trace-events | 2 +-
2 files changed, 34 insertions(+), 11 deletions(-)
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index becf71a3fc..88437a79e6 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -143,10 +143,10 @@ int vfio_spapr_create_window(VFIOContainer *container,
MemoryRegionSection *section,
hwaddr *pgsize)
{
- int ret;
+ int ret = 0;
IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
uint64_t pagesize = memory_region_iommu_get_min_page_size(iommu_mr);
- unsigned entries, pages;
+ unsigned entries, bits_total, bits_per_level, max_levels;
struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
long systempagesize = qemu_getrampagesize();
@@ -176,16 +176,38 @@ int vfio_spapr_create_window(VFIOContainer *container,
create.window_size = int128_get64(section->size);
create.page_shift = ctz64(pagesize);
/*
- * SPAPR host supports multilevel TCE tables, there is some
- * heuristic to decide how many levels we want for our table:
- * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4
+ * SPAPR host supports multilevel TCE tables. We try to guess optimal
+ * levels number and if this fails (for example due to the host memory
+ * fragmentation), we increase levels. The DMA address structure is:
+ * rrrrrrrr rxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx iiiiiiii
+ * where:
+ * r = reserved (bits >= 55 are reserved in the existing hardware)
+ * i = IOMMU page offset (64K in this example)
+ * x = bits to index a TCE which can be split to equal chunks to index
+ * within the level.
+ * The aim is to split "x" to smaller possible number of levels.
*/
entries = create.window_size >> create.page_shift;
- pages = MAX((entries * sizeof(uint64_t)) / getpagesize(), 1);
- pages = MAX(pow2ceil(pages), 1); /* Round up */
- create.levels = ctz64(pages) / 6 + 1;
-
- ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+ /* bits_total is number of "x" needed */
+ bits_total = ctz64(entries * sizeof(uint64_t));
+ /*
+ * bits_per_level is a safe guess of how much we can allocate per level:
+ * 8 is the current minimum for CONFIG_FORCE_MAX_ZONEORDER and MAX_ORDER
+ * is usually bigger than that.
+ * Below we look at getpagesize() as TCEs are allocated from system pages.
+ */
+ bits_per_level = ctz64(getpagesize()) + 8;
+ create.levels = bits_total / bits_per_level;
+ if (bits_total % bits_per_level) {
+ ++create.levels;
+ }
+ max_levels = (64 - create.page_shift) / ctz64(getpagesize());
+ for ( ; create.levels <= max_levels; ++create.levels) {
+ ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+ if (!ret) {
+ break;
+ }
+ }
if (ret) {
error_report("Failed to create a window, ret = %d (%m)", ret);
return -errno;
@@ -200,6 +222,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
return -EINVAL;
}
trace_vfio_spapr_create_window(create.page_shift,
+ create.levels,
create.window_size,
create.start_addr);
*pgsize = pagesize;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ed2f333ad7..cf1e886818 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -129,6 +129,6 @@ vfio_prereg_listener_region_add_skip(uint64_t start, uint64_t end) "0x%"PRIx64"
vfio_prereg_listener_region_del_skip(uint64_t start, uint64_t end) "0x%"PRIx64" - 0x%"PRIx64
vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=0x%"PRIx64" size=0x%"PRIx64" ret=%d"
vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=0x%"PRIx64" size=0x%"PRIx64" ret=%d"
-vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
+vfio_spapr_create_window(int ps, unsigned int levels, uint64_t ws, uint64_t off) "pageshift=0x%x levels=%u winsize=0x%"PRIx64" offset=0x%"PRIx64
vfio_spapr_remove_window(uint64_t off) "offset=0x%"PRIx64
vfio_spapr_group_attach(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
--
2.20.1
next prev parent reply other threads:[~2019-03-10 8:27 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-10 8:26 [Qemu-devel] [PULL 00/60] ppc-for-4.0 queue 20190310 David Gibson
2019-03-10 8:26 ` David Gibson [this message]
2019-03-10 8:26 ` [Qemu-devel] [PULL 02/60] vfio/spapr: Rename local systempagesize variable David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 03/60] spapr: Simulate CAS for qtest David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 04/60] Revert "spapr: support memory unplug for qtest" David Gibson
2019-03-11 10:52 ` Greg Kurz
2019-03-12 1:08 ` David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 05/60] target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 06/60] target/ppc: Implement large decrementer support for TCG David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 07/60] target/ppc: Implement large decrementer support for KVM David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 08/60] target/ppc/spapr: Enable the large decrementer for pseries-4.0 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 09/60] target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 10/60] target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 11/60] target/ppc/tcg: make spapr_caps apply cap-[cfpc/sbbc/ibs] non-fatal for tcg David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 12/60] target/ppc/spapr: Enable mitigations by default for pseries-4.0 machine type David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 13/60] target/ppc: Move exception vector offset computation into a function David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 14/60] target/ppc: Move handling of hardware breakpoints to a separate function David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 15/60] target/ppc: Refactor kvm_handle_debug David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 16/60] PPC: E500: Update u-boot to v2019.01 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 17/60] target/ppc/spapr: Clear partition table entry when allocating hash table David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 18/60] spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit) David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 19/60] target/ppc/spapr: Enable H_PAGE_INIT in-kernel handling David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 20/60] PPC: E500: Add FSL I2C controller and integrate RTC with it David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 21/60] ppc/xive: hardwire the Physical CAM line of the thread context David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 22/60] ppc: externalize ppc_get_vcpu_by_pir() David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 23/60] ppc/xive: export the TIMA memory accessors David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 24/60] ppc/pnv: export the xive_router_notify() routine David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 25/60] ppc/pnv: change the CPU machine_data presenter type to Object * David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 26/60] ppc/pnv: add a XIVE interrupt controller model for POWER9 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 27/60] ppc/pnv: introduce a new dt_populate() operation to the chip model David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 28/60] ppc/pnv: introduce a new pic_print_info() " David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 29/60] ppc/xive: activate HV support David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 30/60] ppc/pnv: fix logging primitives using Ox David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 31/60] ppc/pnv: psi: add a PSIHB_REG macro David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 32/60] ppc/pnv: psi: add a reset handler David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 33/60] spapr_iommu: Do not replay mappings from just created DMA window David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 34/60] target/ppc: introduce single fpr_offset() function David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 35/60] target/ppc: introduce single vsrl_offset() function David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 36/60] target/ppc: move Vsr* macros from internal.h to cpu.h David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 37/60] target/ppc: introduce avr_full_offset() function David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 38/60] target/ppc: improve avr64_offset() and use it to simplify get_avr64()/set_avr64() David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 39/60] target/ppc: switch fpr/vsrl registers so all VSX registers are in host endian order David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 40/60] target/ppc: introduce vsr64_offset() to simplify get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 41/60] mac_oldworld: use node name instead of alias name for hd device in FWPathProvider David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 42/60] mac_newworld: " David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 43/60] ppc/pnv: add a PSI bridge class model David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 44/60] ppc/pnv: add a PSI bridge model for POWER9 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 45/60] ppc/pnv: lpc: fix OPB address ranges David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 46/60] ppc/pnv: add a LPC Controller class model David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 47/60] ppc/pnv: add a 'dt_isa_nodename' to the chip David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 48/60] ppc/pnv: add a LPC Controller model for POWER9 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 49/60] ppc/pnv: add SerIRQ routing registers David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 50/60] ppc/pnv: add a OCC model class David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 51/60] ppc/pnv: add a OCC model for POWER9 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 52/60] ppc/pnv: extend XSCOM core support " David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 53/60] ppc/pnv: POWER9 XSCOM quad support David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 54/60] ppc/pnv: activate XSCOM tests for POWER9 David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 55/60] ppc/pnv: add more dummy XSCOM addresses David Gibson
2019-03-10 8:26 ` [Qemu-devel] [PULL 56/60] ppc/pnv: add a "ibm, opal/power-mgt" device tree node on POWER9 David Gibson
2019-03-10 8:27 ` [Qemu-devel] [PULL 57/60] target/ppc: add HV support for POWER9 David Gibson
[not found] ` <20190312150115.6zuaid43gr7hklt5@unused>
[not found] ` <58de43c6-31d5-a0a3-b443-54a33f11d75a@kaod.org>
[not found] ` <20190312191409.vxnpscrephtk6otv@dhcp-17-165.bos.redhat.com>
[not found] ` <1746025955.7399905.1552419034356.JavaMail.zimbra@redhat.com>
[not found] ` <154364d7-fe5b-4f40-b976-b85ff9060ee0@kaod.org>
2019-06-28 13:20 ` Philippe Mathieu-Daudé
2019-07-01 5:04 ` David Gibson
2019-07-01 9:45 ` Philippe Mathieu-Daudé
2019-07-02 0:14 ` David Gibson
2019-07-02 6:13 ` Cédric Le Goater
2019-07-02 9:22 ` Philippe Mathieu-Daudé
2019-03-10 8:27 ` [Qemu-devel] [PULL 58/60] target/ppc: Optimize xviexpdp() using deposit_i64() David Gibson
2019-03-10 8:27 ` [Qemu-devel] [PULL 59/60] target/ppc: Optimize x[sv]xsigdp " David Gibson
2019-03-10 8:27 ` [Qemu-devel] [PULL 60/60] spapr: Use CamelCase properly David Gibson
2019-03-10 9:23 ` [Qemu-devel] [PULL 00/60] ppc-for-4.0 queue 20190310 no-reply
2019-03-10 16:06 ` Peter Maydell
2019-03-11 10:40 ` Alex Bennée
2019-03-12 0:26 ` David Gibson
2019-03-12 0:44 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190310082703.1245-2-david@gibson.dropbear.id.au \
--to=david@gibson.dropbear.id.au \
--cc=aik@ozlabs.ru \
--cc=groug@kaod.org \
--cc=lvivier@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).