From: Paul Durrant <Paul.Durrant@citrix.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
"Tim (Xen.org)" <tim@xen.org>,
George Dunlap <George.Dunlap@citrix.com>,
Julien Grall <julien.grall@arm.com>,
Jan Beulich <jbeulich@suse.com>,
Ian Jackson <Ian.Jackson@citrix.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [PATCH v10 07/11] vpci/bars: add handlers to map the BARs
Date: Mon, 19 Mar 2018 10:22:36 +0000 [thread overview]
Message-ID: <1fc2234601b74ccbaa59af1ebf015326@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <20180316133008.66443-8-roger.pau@citrix.com>
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 16 March 2018 13:30
> To: xen-devel@lists.xenproject.org
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Roger Pau Monne <roger.pau@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>; Julien Grall
> <julien.grall@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Tim
> (Xen.org) <tim@xen.org>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v10 07/11] vpci/bars: add handlers to map the BARs
>
> Introduce a set of handlers that trap accesses to the PCI BARs and the
> command register, in order to snoop BAR sizing and BAR relocation.
>
> The command handler is used to detect changes to bit 2 (response to
> memory space accesses), and maps/unmaps the BARs of the device into
> the guest p2m. A rangeset is used in order to figure out which memory
> to map/unmap. This makes it easier to keep track of the possible
> overlaps with other BARs, and will also simplify MSI-X support, where
> certain regions of a BAR might be used for the MSI-X table or PBA.
>
> The BAR register handlers are used to detect attempts by the guest to
> size or relocate the BARs.
>
> Note that the long running BAR mapping and unmapping operations are
> deferred to be performed by hvm_io_pending, so that they can be safely
> preempted.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
ioreq code mod...
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Paul Durrant <paul.durrant@citrix.com>
> ---
> Changes since v9:
> - Expand comments to clarify the code.
> - Rename rom to rom_only in the vpci_cpu struct.
> - Change definition style of dummy vpci_cpu.
> - Replace incorrect usage of PFN_UP.
> - Use system_state in order to check if the mapping functions are
> being called from Dom0 builder context.
> - Split the maybe_defer_map into two functions and place the Dom0
> builder one in the init section.
>
> Changes since v8:
> - Do not pretend to support ARM in the map_range function. Explain
> the required changes in the comment.
> - Introduce PCI_HEADER_{NORMAL/BRIDGE}_NR_BARS defines.
> - Rename 'rom' boolean variable to 'rom_only', which is more
> descriptive of it's meaning.
> - Introduce vpci_remove_device which removes all handlers for a
> device.
> - Simplify error handling when modifying BARs mapping. Any error will
> cause the device to be unplugged (by calling vpci_remove_device).
> - Return an error code in modify_bars. Add comments describing why
> the error is sometimes ignored.
>
> Changes since v7:
> - Order includes.
> - Add newline between switch cases.
> - Fix typo in comment (hopping).
> - Wrap ternary conditional in parentheses.
> - Remove CONFIG_HAS_PCI gueard from sched.h vpci_vcpu usage.
> - Add comment regarding vpci_vcpu usage.
> - Move rom_enabled from BAR struct to header.
> - Do not protect vpci_vcpu with __XEN__ guards.
>
> Changes since v6:
> - s/vpci_check_pending/vpci_process_pending/.
> - Improve error handling in vpci_process_pending.
> - Add a comment that explains how vpci_check_bar_overlap works.
> - Add error messages to vpci_modify_bars and vpci_modify_rom.
> - Introduce vpci_hw_read16/32, in order to passthrough reads to
> the underlying hw.
> - Print BAR number on error in vpci_bar_write.
> - Place the CONFIG_HAS_PCI guards inside the vpci.h header and
> provide an empty vpci_vcpu structure for the !CONFIG_HAS_PCI case.
> - Define CONFIG_HAS_PCI in the test harness emul.h header before
> including vpci.h
> - Add ARM TODOs and an ARM-specific bodge to vpci_map_range due to
> the lack of preemption in {un}map_mmio_regions.
> - Make vpci_maybe_defer_map void.
> - Set rom_enabled in vpci_init_bars.
> - Defer enabling/disabling the memory decoding (or the ROM enable
> bit) until the memory has been mapped/unmapped.
> - Remove vpci_ prefix from static functions.
> - Use the same code in order to map the general BARs and the ROM
> BARs.
> - Remove the seg/bus local variables and use pdev->{seg,bus} instead.
> - Convert the bools in the BAR related structs into bool bitfields.
> - Add the must_check attribute to vpci_process_pending.
> - Open code check_bar_overlap inside modify_bars, which was it's only
> user.
>
> Changes since v5:
> - Switch to the new handler type.
> - Use pci_sbdf_t to size the BARs.
> - Use a single return for vpci_modify_bar.
> - Do not return an error code from vpci_modify_bars, just log the
> failure.
> - Remove the 'sizing' parameter. Instead just let the guest write
> directly to the BAR, and read the value back. This simplifies the
> BAR register handlers, specially the read one.
> - Ignore ROM BAR writes with memory decoding enabled and ROM enabled.
> - Do not propagate failures to setup the ROM BAR in vpci_init_bars.
> - Add preemption support to the BAR mapping/unmapping operations.
>
> Changes since v4:
> - Expand commit message to mention the reason behind the usage of
> rangesets.
> - Fix comment related to the inclusiveness of rangesets.
> - Fix off-by-one error in the calculation of the end of memory
> regions.
> - Store the state of the BAR (mapped/unmapped) in the vpci_bar
> enabled field, previously was only used by ROMs.
> - Fix double negation of return code.
> - Modify vpci_cmd_write so it has a single call to pci_conf_write16.
> - Print a warning when trying to write to the BAR with memory
> decoding enabled (and ignore the write).
> - Remove header_type local variable, it's used only once.
> - Move the read of the command register.
> - Restore previous command register value in the exit paths.
> - Only set address to INVALID_PADDR if the initial BAR value matches
> ~0 & PCI_BASE_ADDRESS_MEM_MASK.
> - Don't disable the enabled bit in the expansion ROM register, memory
> decoding is already disabled and takes precedence.
> - Don't use INVALID_PADDR, just set the initial BAR address to the
> value found in the hardware.
> - Introduce rom_enabled to store the status of the
> PCI_ROM_ADDRESS_ENABLE bit.
> - Reorder fields of the structure to prevent holes.
>
> Changes since v3:
> - Propagate previous changes: drop xen_ prefix and use u8/u16/u32
> instead of the previous half_word/word/double_word.
> - Constify some of the paramerters.
> - s/VPCI_BAR_MEM/VPCI_BAR_MEM32/.
> - Simplify the number of fields stored for each BAR, a single address
> field is stored and contains the address of the BAR both on Xen and
> in the guest.
> - Allow the guest to move the BARs around in the physical memory map.
> - Add support for expansion ROM BARs.
> - Do not cache the value of the command register.
> - Remove a label used in vpci_cmd_write.
> - Fix the calculation of the sizing mask in vpci_bar_write.
> - Check the memory decode bit in order to decide if a BAR is
> positioned or not.
> - Disable memory decoding before sizing the BARs in Xen.
> - When mapping/unmapping BARs check if there's overlap between BARs,
> in order to avoid unmapping memory required by another BAR.
> - Introduce a macro to check whether a BAR is mappable or not.
> - Add a comment regarding the lack of support for SR-IOV.
> - Remove the usage of the GENMASK macro.
>
> Changes since v2:
> - Detect unset BARs and allow the hardware domain to position them.
> ---
> tools/tests/vpci/emul.h | 1 +
> xen/arch/x86/hvm/ioreq.c | 4 +
> xen/drivers/vpci/Makefile | 2 +-
> xen/drivers/vpci/header.c | 552
> ++++++++++++++++++++++++++++++++++++++++++++++
> xen/drivers/vpci/vpci.c | 45 ++--
> xen/include/xen/sched.h | 4 +
> xen/include/xen/vpci.h | 61 +++++
> 7 files changed, 655 insertions(+), 14 deletions(-)
> create mode 100644 xen/drivers/vpci/header.c
>
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> index fd0317995a..5d47544bf7 100644
> --- a/tools/tests/vpci/emul.h
> +++ b/tools/tests/vpci/emul.h
> @@ -80,6 +80,7 @@ typedef union {
> };
> } pci_sbdf_t;
>
> +#define CONFIG_HAS_VPCI
> #include "vpci.h"
>
> #define __hwdom_init
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 7e66965bcd..90c9e3cd59 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -26,6 +26,7 @@
> #include <xen/domain.h>
> #include <xen/event.h>
> #include <xen/paging.h>
> +#include <xen/vpci.h>
>
> #include <asm/hvm/hvm.h>
> #include <asm/hvm/ioreq.h>
> @@ -48,6 +49,9 @@ bool hvm_io_pending(struct vcpu *v)
> struct domain *d = v->domain;
> struct hvm_ioreq_server *s;
>
> + if ( has_vpci(d) && vpci_process_pending(v) )
> + return true;
> +
> list_for_each_entry ( s,
> &d->arch.hvm_domain.ioreq_server.list,
> list_entry )
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> index 840a906470..241467212f 100644
> --- a/xen/drivers/vpci/Makefile
> +++ b/xen/drivers/vpci/Makefile
> @@ -1 +1 @@
> -obj-y += vpci.o
> +obj-y += vpci.o header.o
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> new file mode 100644
> index 0000000000..b502bac81e
> --- /dev/null
> +++ b/xen/drivers/vpci/header.c
> @@ -0,0 +1,552 @@
> +/*
> + * Generic functionality for handling accesses to the PCI header from the
> + * configuration space.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/p2m-common.h>
> +#include <xen/sched.h>
> +#include <xen/softirq.h>
> +#include <xen/vpci.h>
> +
> +#include <asm/event.h>
> +
> +#define MAPPABLE_BAR(x) \
> + ((x)->type == VPCI_BAR_MEM32 || (x)->type == VPCI_BAR_MEM64_LO
> || \
> + (x)->type == VPCI_BAR_ROM)
> +
> +struct map_data {
> + struct domain *d;
> + bool map;
> +};
> +
> +static int map_range(unsigned long s, unsigned long e, void *data,
> + unsigned long *c)
> +{
> + const struct map_data *map = data;
> + int rc;
> +
> + for ( ; ; )
> + {
> + unsigned long size = e - s + 1;
> +
> + /*
> + * ARM TODOs:
> + * - On ARM whether the memory is prefetchable or not should be
> passed
> + * to map_mmio_regions in order to decide which memory attributes
> + * should be used.
> + *
> + * - {un}map_mmio_regions doesn't support preemption.
> + */
> +
> + rc = (map->map ? map_mmio_regions : unmap_mmio_regions)
> + (map->d, _gfn(s), size, _mfn(s));
> + if ( rc == 0 )
> + {
> + *c += size;
> + break;
> + }
> + if ( rc < 0 )
> + {
> + printk(XENLOG_G_WARNING
> + "Failed to identity %smap [%" PRI_gfn ", %" PRI_gfn "] for d%d:
> %d\n",
> + map ? "" : "un", s, e, map->d->domain_id, rc);
> + break;
> + }
> + ASSERT(rc < size);
> + *c += rc;
> + s += rc;
> + if ( general_preempt_check() )
> + return -ERESTART;
> + }
> +
> + return rc;
> +}
> +
> +/*
> + * The rom_only parameter is used to signal the map/unmap helpers that
> the ROM
> + * BAR's enable bit has changed with the memory decoding bit already
> enabled.
> + * If rom_only is not set then it's the memory decoding bit that changed.
> + */
> +static void modify_decoding(const struct pci_dev *pdev, bool map, bool
> rom_only)
> +{
> + struct vpci_header *header = &pdev->vpci->header;
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + uint16_t cmd;
> + unsigned int i;
> +
> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> + {
> + if ( !MAPPABLE_BAR(&header->bars[i]) )
> + continue;
> +
> + if ( rom_only && header->bars[i].type == VPCI_BAR_ROM )
> + {
> + unsigned int rom_pos = (i == PCI_HEADER_NORMAL_NR_BARS)
> + ? PCI_ROM_ADDRESS : PCI_ROM_ADDRESS1;
> + uint32_t val = pci_conf_read32(pdev->seg, pdev->bus, slot, func,
> + rom_pos);
> +
> + header->bars[i].enabled = header->rom_enabled = map;
> +
> + val &= ~PCI_ROM_ADDRESS_ENABLE;
> + val |= map ? PCI_ROM_ADDRESS_ENABLE : 0;
> + pci_conf_write32(pdev->seg, pdev->bus, slot, func, rom_pos, val);
> + return;
> + }
> +
> + if ( !rom_only &&
> + (header->bars[i].type != VPCI_BAR_ROM || header->rom_enabled)
> )
> + header->bars[i].enabled = map;
> + }
> +
> + ASSERT(!rom_only);
> + cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> PCI_COMMAND);
> + cmd &= ~PCI_COMMAND_MEMORY;
> + cmd |= map ? PCI_COMMAND_MEMORY : 0;
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> + cmd);
> +}
> +
> +bool vpci_process_pending(struct vcpu *v)
> +{
> + if ( v->vpci.mem )
> + {
> + struct map_data data = {
> + .d = v->domain,
> + .map = v->vpci.map,
> + };
> + int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
> +
> + if ( rc == -ERESTART )
> + return true;
> +
> + spin_lock(&v->vpci.pdev->vpci->lock);
> + /* Disable memory decoding unconditionally on failure. */
> + modify_decoding(v->vpci.pdev, rc ? false : v->vpci.map,
> + rc ? false : v->vpci.rom_only);
> + spin_unlock(&v->vpci.pdev->vpci->lock);
> +
> + rangeset_destroy(v->vpci.mem);
> + v->vpci.mem = NULL;
> + if ( rc )
> + /*
> + * FIXME: in case of failure remove the device from the domain.
> + * Note that there might still be leftover mappings. While this is
> + * safe for Dom0, for DomUs the domain will likely need to be
> + * killed in order to avoid leaking stale p2m mappings on
> + * failure.
> + */
> + vpci_remove_device(v->vpci.pdev);
> + }
> +
> + return false;
> +}
> +
> +static int __init apply_map(struct domain *d, struct pci_dev *pdev,
> + struct rangeset *mem)
> +{
> + struct map_data data = { .d = d, .map = true };
> + int rc;
> +
> + while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -
> ERESTART )
> + process_pending_softirqs();
> + rangeset_destroy(mem);
> + if ( rc )
> + return rc;
> + modify_decoding(pdev, true, false);
> +
> + return rc;
> +}
> +
> +static void defer_map(struct domain *d, struct pci_dev *pdev,
> + struct rangeset *mem, bool map, bool rom_only)
> +{
> + struct vcpu *curr = current;
> +
> + /*
> + * FIXME: when deferring the {un}map the state of the device should not
> + * be trusted. For example the enable bit is toggled after the device
> + * is mapped. This can lead to parallel mapping operations being
> + * started for the same device if the domain is not well-behaved.
> + */
> + curr->vpci.pdev = pdev;
> + curr->vpci.mem = mem;
> + curr->vpci.map = map;
> + curr->vpci.rom_only = rom_only;
> +}
> +
> +static int modify_bars(const struct pci_dev *pdev, bool map, bool
> rom_only)
> +{
> + struct vpci_header *header = &pdev->vpci->header;
> + struct rangeset *mem = rangeset_new(NULL, NULL, 0);
> + struct pci_dev *tmp, *dev = NULL;
> + unsigned int i;
> + int rc;
> +
> + if ( !mem )
> + return -ENOMEM;
> +
> + /*
> + * Create a rangeset that represents the current device BARs memory
> region
> + * and compare it against all the currently active BAR memory regions. If
> + * an overlap is found, subtract it from the region to be
> mapped/unmapped.
> + *
> + * First fill the rangeset with all the BARs of this device or with the ROM
> + * BAR only, depending on whether the guest is toggling the memory
> decode
> + * bit of the command register, or the enable bit of the ROM BAR register.
> + */
> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> + {
> + const struct vpci_bar *bar = &header->bars[i];
> +
> + if ( !MAPPABLE_BAR(bar) ||
> + (rom_only ? bar->type != VPCI_BAR_ROM
> + : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
> + continue;
> +
> + rc = rangeset_add_range(mem, PFN_DOWN(bar->addr),
> + PFN_DOWN(bar->addr + bar->size - 1));
> + if ( rc )
> + {
> + printk(XENLOG_G_WARNING
> + "Failed to add [%" PRI_gfn ", %" PRI_gfn "]: %d\n",
> + PFN_DOWN(bar->addr), PFN_DOWN(bar->addr + bar->size - 1),
> + rc);
> + rangeset_destroy(mem);
> + return rc;
> + }
> + }
> +
> + /*
> + * Check for overlaps with other BARs. Note that only BARs that are
> + * currently mapped (enabled) are checked for overlaps.
> + */
> + list_for_each_entry(tmp, &pdev->domain->arch.pdev_list, domain_list)
> + {
> + if ( tmp == pdev )
> + {
> + /*
> + * Need to store the device so it's not constified and
> + * maybe_defer_map can modify it in case of error.
> + */
> + dev = tmp;
> + if ( !rom_only )
> + /*
> + * If memory decoding is toggled avoid checking against the
> + * same device, or else all regions will be removed from the
> + * memory map in the unmap case.
> + */
> + continue;
> + }
> +
> + for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> + {
> + const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> + unsigned long start = PFN_DOWN(bar->addr);
> + unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
> +
> + if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
> + /*
> + * If only the ROM enable bit is toggled check against other
> + * BARs in the same device for overlaps, but not against the
> + * same ROM BAR.
> + */
> + (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
> + continue;
> +
> + rc = rangeset_remove_range(mem, start, end);
> + if ( rc )
> + {
> + printk(XENLOG_G_WARNING
> + "Failed to remove [%" PRI_gfn ", %" PRI_gfn "]: %d\n",
> + start, end, rc);
> + rangeset_destroy(mem);
> + return rc;
> + }
> + }
> + }
> +
> + ASSERT(dev);
> +
> + if ( system_state < SYS_STATE_active )
> + {
> + /*
> + * Mappings might be created when building Dom0 if the memory
> decoding
> + * bit of PCI devices is enabled. In that case it's not possible to
> + * defer the operation, so call apply_map in order to create the
> + * mappings right away. Note that at build time this function will only
> + * be called iff the memory decoding bit is enabled, thus the operation
> + * will always be to establish mappings and process all the BARs.
> + */
> + ASSERT(map && !rom_only);
> + return apply_map(pdev->domain, dev, mem);
> + }
> +
> + defer_map(pdev->domain, dev, mem, map, rom_only);
> +
> + return 0;
> +}
> +
> +static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t cmd, void *data)
> +{
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + uint16_t current_cmd = pci_conf_read16(pdev->seg, pdev->bus, slot,
> func,
> + reg);
> +
> + /*
> + * Let Dom0 play with all the bits directly except for the memory
> + * decoding one.
> + */
> + if ( (cmd ^ current_cmd) & PCI_COMMAND_MEMORY )
> + /*
> + * Ignore the error. No memory has been added or removed from the
> p2m
> + * (because the actual p2m changes are deferred in maybe_defer_map)
> + * and the memory decoding bit has not been changed, so leave
> + * everything as-is, hoping the guest will realize and try again.
> + */
> + modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
> + else
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
> +}
> +
> +static void bar_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + struct vpci_bar *bar = data;
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + bool hi = false;
> +
> + if ( pci_conf_read16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND)
> &
> + PCI_COMMAND_MEMORY )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: ignored BAR %lu write with memory
> decoding enabled\n",
> + pdev->seg, pdev->bus, slot, func,
> + bar - pdev->vpci->header.bars);
> + return;
> + }
> +
> + if ( bar->type == VPCI_BAR_MEM64_HI )
> + {
> + ASSERT(reg > PCI_BASE_ADDRESS_0);
> + bar--;
> + hi = true;
> + }
> + else
> + val &= PCI_BASE_ADDRESS_MEM_MASK;
> +
> + /*
> + * Update the cached address, so that when memory decoding is enabled
> + * Xen can map the BAR into the guest p2m.
> + */
> + bar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
> + bar->addr |= (uint64_t)val << (hi ? 32 : 0);
> +
> + /* Make sure Xen writes back the same value for the BAR RO bits. */
> + if ( !hi )
> + {
> + val |= bar->type == VPCI_BAR_MEM32 ?
> PCI_BASE_ADDRESS_MEM_TYPE_32
> + : PCI_BASE_ADDRESS_MEM_TYPE_64;
> + val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> + }
> +
> + pci_conf_write32(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), reg, val);
> +}
> +
> +static void rom_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + struct vpci_header *header = &pdev->vpci->header;
> + struct vpci_bar *rom = data;
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + uint16_t cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> + PCI_COMMAND);
> + bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
> +
> + if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled &&
> new_enabled )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: ignored ROM BAR write with memory
> decoding enabled\n",
> + pdev->seg, pdev->bus, slot, func);
> + return;
> + }
> +
> + if ( !header->rom_enabled )
> + rom->addr = val & PCI_ROM_ADDRESS_MASK;
> +
> + /* Check if ROM BAR should be mapped/unmapped. */
> + if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled !=
> new_enabled )
> + {
> + if ( modify_bars(pdev, new_enabled, true) )
> + /*
> + * Return on error in order to avoid updating the 'addr' field. No
> + * memory has been added or removed from the p2m (because the
> + * actual p2m changes are deferred in maybe_defer_map) and the
> ROM
> + * enable bit has not been changed, so leave everything as-is,
> + * hoping the guest will realize and try again.
> + */
> + return;
> + }
> + else
> + {
> + header->rom_enabled = new_enabled;
> + pci_conf_write32(pdev->seg, pdev->bus, slot, func, reg, val);
> + }
> +
> + if ( !new_enabled )
> + rom->addr = val & PCI_ROM_ADDRESS_MASK;
> +}
> +
> +static int init_bars(struct pci_dev *pdev)
> +{
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + uint16_t cmd;
> + uint64_t addr, size;
> + unsigned int i, num_bars, rom_reg;
> + struct vpci_header *header = &pdev->vpci->header;
> + struct vpci_bar *bars = header->bars;
> + pci_sbdf_t sbdf = {
> + .seg = pdev->seg,
> + .bus = pdev->bus,
> + .dev = slot,
> + .func = func,
> + };
> + int rc;
> +
> + switch ( pci_conf_read8(pdev->seg, pdev->bus, slot, func,
> PCI_HEADER_TYPE)
> + & 0x7f )
> + {
> + case PCI_HEADER_TYPE_NORMAL:
> + num_bars = PCI_HEADER_NORMAL_NR_BARS;
> + rom_reg = PCI_ROM_ADDRESS;
> + break;
> +
> + case PCI_HEADER_TYPE_BRIDGE:
> + num_bars = PCI_HEADER_BRIDGE_NR_BARS;
> + rom_reg = PCI_ROM_ADDRESS1;
> + break;
> +
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + /* Setup a handler for the command register. */
> + rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
> PCI_COMMAND,
> + 2, header);
> + if ( rc )
> + return rc;
> +
> + /* Disable memory decoding before sizing. */
> + cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
> PCI_COMMAND);
> + if ( cmd & PCI_COMMAND_MEMORY )
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> + cmd & ~PCI_COMMAND_MEMORY);
> +
> + for ( i = 0; i < num_bars; i++ )
> + {
> + uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> + uint32_t val;
> +
> + if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> + {
> + bars[i].type = VPCI_BAR_MEM64_HI;
> + rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> + 4, &bars[i]);
> + if ( rc )
> + {
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func,
> + PCI_COMMAND, cmd);
> + return rc;
> + }
> +
> + continue;
> + }
> +
> + val = pci_conf_read32(pdev->seg, pdev->bus, slot, func, reg);
> + if ( (val & PCI_BASE_ADDRESS_SPACE) ==
> PCI_BASE_ADDRESS_SPACE_IO )
> + {
> + bars[i].type = VPCI_BAR_IO;
> + continue;
> + }
> + if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> + PCI_BASE_ADDRESS_MEM_TYPE_64 )
> + bars[i].type = VPCI_BAR_MEM64_LO;
> + else
> + bars[i].type = VPCI_BAR_MEM32;
> +
> + rc = pci_size_mem_bar(sbdf, reg, &addr, &size,
> + (i == num_bars - 1) ? PCI_BAR_LAST : 0);
> + if ( rc < 0 )
> + {
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> + cmd);
> + return rc;
> + }
> +
> + if ( size == 0 )
> + {
> + bars[i].type = VPCI_BAR_EMPTY;
> + continue;
> + }
> +
> + bars[i].addr = addr;
> + bars[i].size = size;
> + bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> +
> + rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> + &bars[i]);
> + if ( rc )
> + {
> + pci_conf_write16(pdev->seg, pdev->bus, slot, func, PCI_COMMAND,
> + cmd);
> + return rc;
> + }
> + }
> +
> + /* Check expansion ROM. */
> + rc = pci_size_mem_bar(sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
> + if ( rc > 0 && size )
> + {
> + struct vpci_bar *rom = &header->bars[num_bars];
> +
> + rom->type = VPCI_BAR_ROM;
> + rom->size = size;
> + rom->addr = addr;
> + header->rom_enabled = pci_conf_read32(pdev->seg, pdev->bus, slot,
> func,
> + rom_reg) & PCI_ROM_ADDRESS_ENABLE;
> +
> + rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
> rom_reg,
> + 4, rom);
> + if ( rc )
> + rom->type = VPCI_BAR_EMPTY;
> + }
> +
> + return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true,
> false) : 0;
> +}
> +REGISTER_VPCI_INIT(init_bars);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 4740d02edf..e5b49b9d82 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -34,6 +34,23 @@ struct vpci_register {
> struct list_head node;
> };
>
> +void vpci_remove_device(struct pci_dev *pdev)
> +{
> + spin_lock(&pdev->vpci->lock);
> + while ( !list_empty(&pdev->vpci->handlers) )
> + {
> + struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> + struct vpci_register,
> + node);
> +
> + list_del(&r->node);
> + xfree(r);
> + }
> + spin_unlock(&pdev->vpci->lock);
> + xfree(pdev->vpci);
> + pdev->vpci = NULL;
> +}
> +
> int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
> {
> unsigned int i;
> @@ -57,19 +74,7 @@ int __hwdom_init vpci_add_handlers(struct pci_dev
> *pdev)
> }
>
> if ( rc )
> - {
> - while ( !list_empty(&pdev->vpci->handlers) )
> - {
> - struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> - struct vpci_register,
> - node);
> -
> - list_del(&r->node);
> - xfree(r);
> - }
> - xfree(pdev->vpci);
> - pdev->vpci = NULL;
> - }
> + vpci_remove_device(pdev);
>
> return rc;
> }
> @@ -102,6 +107,20 @@ static void vpci_ignored_write(const struct pci_dev
> *pdev, unsigned int reg,
> {
> }
>
> +uint32_t vpci_hw_read16(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), reg);
> +}
> +
> +uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return pci_conf_read32(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), reg);
> +}
> +
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> vpci_write_t *write_handler, unsigned int offset,
> unsigned int size, void *data)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 39f938644a..a452546453 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -20,6 +20,7 @@
> #include <xen/smp.h>
> #include <xen/perfc.h>
> #include <asm/atomic.h>
> +#include <xen/vpci.h>
> #include <xen/wait.h>
> #include <public/xen.h>
> #include <public/domctl.h>
> @@ -264,6 +265,9 @@ struct vcpu
>
> struct evtchn_fifo_vcpu *evtchn_fifo;
>
> + /* vPCI per-vCPU area, used to store data for long running operations. */
> + struct vpci_vcpu vpci;
> +
> struct arch_vcpu arch;
> };
>
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 9f2864fb0c..6bf8b22b4f 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -1,6 +1,8 @@
> #ifndef _XEN_VPCI_H_
> #define _XEN_VPCI_H_
>
> +#ifdef CONFIG_HAS_VPCI
> +
> #include <xen/pci.h>
> #include <xen/types.h>
> #include <xen/list.h>
> @@ -20,6 +22,9 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
> /* Add vPCI handlers to device. */
> int __must_check vpci_add_handlers(struct pci_dev *dev);
>
> +/* Remove all handlers and free vpci related structures. */
> +void vpci_remove_device(struct pci_dev *pdev);
> +
> /* Add/remove a register handler. */
> int __must_check vpci_add_register(struct vpci *vpci,
> vpci_read_t *read_handler,
> @@ -34,12 +39,68 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
> unsigned int size);
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> uint32_t data);
>
> +/* Passthrough handlers. */
> +uint32_t vpci_hw_read16(const struct pci_dev *pdev, unsigned int reg,
> + void *data);
> +uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
> + void *data);
> +
> +/*
> + * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
> + * should not run.
> + */
> +bool __must_check vpci_process_pending(struct vcpu *v);
> +
> struct vpci {
> /* List of vPCI handlers for a device. */
> struct list_head handlers;
> spinlock_t lock;
> +
> +#ifdef __XEN__
> + /* Hide the rest of the vpci struct from the user-space test harness. */
> + struct vpci_header {
> + /* Information about the PCI BARs of this device. */
> + struct vpci_bar {
> + uint64_t addr;
> + uint64_t size;
> + enum {
> + VPCI_BAR_EMPTY,
> + VPCI_BAR_IO,
> + VPCI_BAR_MEM32,
> + VPCI_BAR_MEM64_LO,
> + VPCI_BAR_MEM64_HI,
> + VPCI_BAR_ROM,
> + } type;
> + bool prefetchable : 1;
> + /* Store whether the BAR is mapped into guest p2m. */
> + bool enabled : 1;
> +#define PCI_HEADER_NORMAL_NR_BARS 6
> +#define PCI_HEADER_BRIDGE_NR_BARS 2
> + } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
> + /* At most 6 BARS + 1 expansion ROM BAR. */
> +
> + /*
> + * Store whether the ROM enable bit is set (doesn't imply ROM BAR
> + * is mapped into guest p2m) if there's a ROM BAR on the device.
> + */
> + bool rom_enabled : 1;
> + /* FIXME: currently there's no support for SR-IOV. */
> + } header;
> +#endif
> +};
> +
> +struct vpci_vcpu {
> + /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
> + struct rangeset *mem;
> + struct pci_dev *pdev;
> + bool map : 1;
> + bool rom_only : 1;
> };
>
> +#else /* !CONFIG_HAS_VPCI */
> +struct vpci_vcpu {};
> +#endif
> +
> #endif
>
> /*
> --
> 2.16.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
next prev parent reply other threads:[~2018-03-19 10:22 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-16 13:29 [PATCH v10 00/11] vpci: PCI config space emulation Roger Pau Monne
2018-03-16 13:29 ` [PATCH v10 01/11] vpci: introduce basic handlers to trap accesses to the PCI config space Roger Pau Monne
2018-03-20 11:10 ` Jan Beulich
2018-03-21 16:01 ` Wei Liu
2018-03-16 13:29 ` [PATCH v10 02/11] x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 03/11] x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0 Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 04/11] pci: split code to size BARs from pci_add_device Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 05/11] pci: add support to size ROM BARs to pci_size_mem_bar Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 06/11] xen: introduce rangeset_consume_ranges Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 07/11] vpci/bars: add handlers to map the BARs Roger Pau Monne
2018-03-19 10:22 ` Paul Durrant [this message]
2018-03-19 16:18 ` Jan Beulich
2018-03-20 11:12 ` Roger Pau Monné
2018-03-20 11:29 ` Jan Beulich
2018-03-16 13:30 ` [PATCH v10 08/11] x86/pt: mask MSI vectors on unbind Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 09/11] vpci/msi: add MSI handlers Roger Pau Monne
2018-03-16 14:04 ` Jan Beulich
2018-03-16 14:34 ` Roger Pau Monné
2018-03-16 15:05 ` Jan Beulich
2018-03-16 15:38 ` Roger Pau Monné
2018-03-16 16:24 ` Jan Beulich
2018-03-16 17:51 ` Roger Pau Monné
2018-03-19 8:06 ` Jan Beulich
2018-03-16 13:30 ` [PATCH v10 10/11] vpci: add a priority parameter to the vPCI register initializer Roger Pau Monne
2018-03-16 13:30 ` [PATCH v10 11/11] vpci/msix: add MSI-X handlers Roger Pau Monne
2018-03-20 11:03 ` Jan Beulich
2018-03-20 11:43 ` Roger Pau Monné
2018-03-20 11:48 ` Roger Pau Monné
2018-03-20 13:10 ` Jan Beulich
2018-03-20 13:48 ` Roger Pau Monné
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1fc2234601b74ccbaa59af1ebf015326@AMSPEX02CL03.citrite.net \
--to=paul.durrant@citrix.com \
--cc=Andrew.Cooper3@citrix.com \
--cc=George.Dunlap@citrix.com \
--cc=Ian.Jackson@citrix.com \
--cc=boris.ostrovsky@oracle.com \
--cc=jbeulich@suse.com \
--cc=julien.grall@arm.com \
--cc=roger.pau@citrix.com \
--cc=sstabellini@kernel.org \
--cc=tim@xen.org \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).