From: Paul Durrant <Paul.Durrant@citrix.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Wei Liu <wei.liu2@citrix.com>,
"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Jan Beulich <jbeulich@suse.com>,
Ian Jackson <Ian.Jackson@citrix.com>,
"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [PATCH v7 for-next 03/12] vpci: introduce basic handlers to trap accesses to the PCI config space
Date: Fri, 20 Oct 2017 09:34:10 +0000
Message-ID: <4fb72a7cc1ec4a14b53461e5197af7f5@AMSPEX02CL03.citrite.net>
In-Reply-To: <20171018114034.36587-4-roger.pau@citrix.com>
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 18 October 2017 12:40
> To: xen-devel@lists.xenproject.org
> Cc: konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; Roger Pau Monne
> <roger.pau@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v7 for-next 03/12] vpci: introduce basic handlers to trap
> accesses to the PCI config space
>
> This functionality is going to reside in vpci.c (and the corresponding
> vpci.h header), and should be arch-agnostic. The handlers introduced
> in this patch set up the basic functionality required to trap
> accesses to the PCI config space, decode the address and find the
> handler that should handle the access (although no register handlers
> are implemented yet).
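As a rough illustration of the decoding step, here is a standalone sketch of how a value latched in 0xcf8 plus the 0xcfc data port offset map to bus/device/function and a register offset, following PCI configuration mechanism #1. In the patch this job is done by hvm_pci_decode_addr; the names `struct cf8_decoded`, `cf8_enabled` and `decode_cf8` are hypothetical, and the sketch ignores the upper register bits (27:24) that some chipsets use to reach extended config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a configuration mechanism #1 address. */
struct cf8_decoded {
    uint8_t bus;
    uint8_t dev;      /* 5 bits */
    uint8_t func;     /* 3 bits */
    unsigned int reg; /* byte offset into the 256-byte config space */
};

/* Bit 31 of CONFIG_ADDRESS (0xcf8) enables accesses through 0xcfc. */
bool cf8_enabled(uint32_t cf8)
{
    return cf8 & 0x80000000u;
}

/*
 * Combine the latched 0xcf8 value with the data port actually accessed
 * (0xcfc-0xcff): bits 7:2 of cf8 pick the dword, the low two bits of the
 * port pick the byte within it.
 */
struct cf8_decoded decode_cf8(uint32_t cf8, unsigned int port)
{
    struct cf8_decoded d;

    assert(port >= 0xcfc && port <= 0xcff);

    d.bus = (cf8 >> 16) & 0xff;
    d.dev = (cf8 >> 11) & 0x1f;
    d.func = (cf8 >> 8) & 0x7;
    d.reg = (cf8 & 0xfc) | (port & 3);

    return d;
}
```

For example, cf8 = 0x80011a0c with an access to port 0xcfe decodes to bus 1, device 3, function 2, register 0x0e.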
>
> Note that the traps for the PCI IO port registers (0xcf8/0xcfc) are
> set up in an x86 HVM file, since that access mechanism is not shared
> with other arches.
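To see which port accesses the new handler claims, the predicate used by vpci_portio_accept in the io.c hunk can be exercised on its own. This is a sketch: `portio_accept` is a hypothetical name, and it takes a plain address and size instead of an ioreq_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Same condition as the patch's accept hook: claim full 32-bit accesses
 * to 0xcf8 (CONFIG_ADDRESS) and any access that starts inside the
 * 4-byte 0xcfc-0xcff window (CONFIG_DATA).
 */
bool portio_accept(uint64_t addr, uint32_t size)
{
    return (addr == 0xcf8 && size == 4) || (addr & ~(uint64_t)3) == 0xcfc;
}
```

Only a full 32-bit access to 0xcf8 is claimed; narrower accesses to 0xcf8 are not taken by this handler, while any byte, word or dword access starting within 0xcfc-0xcff is.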
>
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal to
> Xen whether a domain should use the newly introduced vPCI handlers.
> For now it is only enabled for PVH Dom0.
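The emulation_flags_ok hunk in xen/arch/x86/domain.c encodes which flag combinations an HVM-container domain may request once XEN_X86_EMU_VPCI exists. A condensed sketch of that policy follows; the EMU_* bit values and the function name `hvm_emflags_ok` are made up for illustration (the real XEN_X86_EMU_* values come from the public arch-x86 header), and only a subset of the flags is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only; NOT the real public-header values. */
#define EMU_LAPIC  (1u << 0)
#define EMU_IOAPIC (1u << 1)
#define EMU_PIT    (1u << 2)
#define EMU_VPCI   (1u << 3)
#define EMU_ALL    (EMU_LAPIC | EMU_IOAPIC | EMU_PIT | EMU_VPCI)

bool hvm_emflags_ok(bool is_hwdom, uint32_t emflags)
{
    /* Hardware domain (PVH Dom0): exactly LAPIC, IO-APIC and vPCI. */
    if ( is_hwdom )
        return emflags == (EMU_LAPIC | EMU_IOAPIC | EMU_VPCI);

    switch ( emflags )
    {
    case EMU_ALL & ~EMU_VPCI: /* full HVM emulation, but never vPCI */
    case EMU_LAPIC:           /* local APIC only */
    case 0:                   /* no emulated devices */
        return true;
    default:
        return false;
    }
}
```

The effect is that, for now, only the hardware domain can have vPCI enabled; a domU asking for the complete EMU_ALL set is rejected.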
>
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has proven
> quite helpful during development, since the logic to handle partial
> accesses or accesses that span multiple registers is not trivial.
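Handling partial accesses boils down to merging a register's value into the right byte lanes of the result, which is the job the changelog assigns to merge_result. The following is a minimal sketch of such a merge under a hypothetical name (`merge_bytes`), not the helper from vpci.c, assuming accesses of at most 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Insert the low 'size' bytes of 'part' into 'data' starting at byte
 * 'offset', leaving the other bytes of 'data' untouched.
 */
uint32_t merge_bytes(uint32_t data, uint32_t part, unsigned int size,
                     unsigned int offset)
{
    uint32_t mask;

    assert(size >= 1 && size <= 4 && offset + size <= 4);

    /* size is 1..4, so the shift is 24..0 and never undefined. */
    mask = 0xffffffffu >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((part & mask) << (offset * 8));
}
```

Starting from all 1's and merging r5, r6 and r7 (0xba, 0xba, 0xbd) at byte offsets 1, 2 and 3 reproduces the 0xbdbabaff that the test harness expects for a 4-byte read at offset 4.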
>
> The handlers for the registers are added to a linked list that is kept
> sorted at all times. Both the read and write handlers support accesses
> that span multiple emulated registers and that contain non-emulated
> gaps.
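A minimal sketch of the sorted-list idea: registers are inserted in offset order, overlapping ones are rejected, and a read walks the list merging every register that intersects the access while leaving gaps as all 1's (what the user-space harness returns for non-emulated bytes). The names `struct reg`, `reg_add` and `reg_read` are hypothetical; per the changelog, the real code stores handler callbacks rather than raw values and reads only the non-emulated gaps from hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical emulated register: a byte range backed by a value. */
struct reg {
    unsigned int offset, size;
    uint32_t val;
    struct reg *next;
};

/* Insert keeping the list sorted by offset; -1 if 'r' overlaps. */
int reg_add(struct reg **head, struct reg *r)
{
    struct reg **pp = head;

    /* Skip registers that end at or before the start of the new one. */
    while ( *pp && (*pp)->offset + (*pp)->size <= r->offset )
        pp = &(*pp)->next;

    /* The next register must start at or after the end of the new one. */
    if ( *pp && (*pp)->offset < r->offset + r->size )
        return -1;

    r->next = *pp;
    *pp = r;
    return 0;
}

/* Read 'size' (1..4) bytes at 'offset'; gaps read as 0xff. */
uint32_t reg_read(const struct reg *head, unsigned int offset,
                  unsigned int size)
{
    uint32_t data;
    const struct reg *r;

    assert(size >= 1 && size <= 4);
    data = 0xffffffffu >> (32 - 8 * size);

    /* Sorted order lets us stop at the first register past the access. */
    for ( r = head; r && r->offset < offset + size; r = r->next )
    {
        unsigned int i;

        /* Copy each byte that lies in both the register and the access. */
        for ( i = 0; i < r->size; i++ )
        {
            unsigned int pos = r->offset + i;
            uint32_t byte = (r->val >> (8 * i)) & 0xff;

            if ( pos < offset || pos >= offset + size )
                continue;
            data &= ~(0xffu << (8 * (pos - offset)));
            data |= byte << (8 * (pos - offset));
        }
    }

    return data;
}
```

With single-byte registers at 5, 6 and 7 and a 2-byte register at 12 (as in the test harness), a 4-byte read at 4 yields 0xbdbabaff and a 4-byte read at 12 yields 0xffff8696.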
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
io parts:
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Paul Durrant <paul.durrant@citrix.com>
> ---
> Changes since v6:
> - Align the vpci handlers in the linker script.
> - Switch add/remove register functions to take a vpci parameter
> instead of a pci_dev.
> - Expand comment of merge_result.
> - Return X86EMUL_UNHANDLEABLE if accessing cfc and cf8 is disabled.
>
> Changes since v5:
> - Use a spinlock per pci device.
> - Use the recently introduced pci_sbdf_t type.
> - Fix test harness to use the right handler type and the newly
> introduced lock.
> - Move the position of the vpci sections in the linker scripts.
> - Constify domain and pci_dev in vpci_{read/write}.
> - Fix typos in comments.
> - Use _XEN_VPCI_H_ as header guard.
>
> Changes since v4:
> * User-space test harness:
> - Do not redirect the output of the test.
> - Add main.c and emul.h as dependencies of the Makefile target.
> - Use the same rule to modify the vpci and list headers.
> - Remove underscores from local macro variables.
> - Add _check suffix to the test harness multiread function.
> - Change the value written by every different size in the multiwrite
> test.
> - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
> - Perform some of the read checks with the local variable directly.
> - Expand some comments.
> - Implement a dummy rwlock.
> * Hypervisor code:
> - Guard the linker script changes with CONFIG_HAS_PCI.
> - Rename vpci_access_check to vpci_access_allowed and make it return
> bool.
> - Make hvm_pci_decode_addr return the register as return value.
> - Use ~3 instead of 0xfffc to remove the register offset when
> checking accesses to IO ports.
> - s/head/prev in vpci_add_register.
> - Add parentheses around & in vpci_add_register.
> - Fix register removal.
> - Change the BUGs in vpci_{read/write}_hw helpers to
> ASSERT_UNREACHABLE.
> - Make merge_result static and change the computation of the mask to
> avoid using a uint64_t.
> - Modify vpci_read to only read from hardware the not-emulated gaps.
> - Remove the vpci_val union and use a uint32_t instead.
> - Change handler read type to return a uint32_t instead of modifying
> a variable passed by reference.
> - Constify the data opaque parameter of read handlers.
> - Change the size parameter of the vpci_{read/write} functions to
> unsigned int.
> - Place the array of initialization handlers in init.rodata or
> .rodata depending on whether late-hwdom is enabled.
> - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
> remove the device while trying to access it.
> - Change the recursive spinlock into a rw lock for performance
> reasons.
>
> Changes since v3:
> * User-space test harness:
> - Fix spaces in container_of macro.
> - Implement dummy locking functions.
> - Remove the 'current' macro and make current a pointer to the
> statically allocated vcpu.
> - Remove unneeded parentheses in the pci_conf_readX macros.
> - Fix the name of the write test macro.
> - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
> code only).
> - Import the max macro.
> - Test all possible read/write size combinations with all possible
> emulated register sizes.
> - Introduce a test for register removal.
> * Hypervisor code:
> - Use a sorted list in order to store the config space handlers.
> - Remove some unneeded 'else' branches.
> - Make the IO port handlers always return X86EMUL_OKAY, and set the
> data to all 1's in case of read failure (writes are simply ignored).
> - In hvm_select_ioreq_server reuse local variables when calling
> XEN_DMOP_PCI_SBDF.
> - Store the pointers to the initialization functions in the .rodata
> section.
> - Do not ignore the return value of xen_vpci_add_handlers in
> setup_one_hwdom_device.
> - Remove the vpci_init macro.
> - Do not hide the pointers inside of the vpci_{read/write}_t
> typedefs.
> - Rename priv_data to private in vpci_register.
> - Simplify checking for register overlap in vpci_register_cmp.
> - Check that the offset and the length match before removing a
> register in xen_vpci_remove_register.
> - Make vpci_read_hw return a value rather than storing it in a
> pointer passed by parameter.
> - Handler dispatcher functions vpci_{read/write} no longer return an
> error code, errors on reads/writes should be treated like hardware
> (writes ignored, reads return all 1's or garbage).
> - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
> - Use a recursive spinlock for the vpci lock, so that spin_is_locked
> checks that the current CPU is holding the lock.
> - Make the code less error-chatty by removing some of the printk's.
> - Pass the slot and the function as separate parameters to the
> handler dispatchers (instead of passing devfn).
> - Allow handlers to be registered with either a read or write
> function only, the missing handler will be replaced by a dummy
> handler (writes ignored, reads return 1's).
> - Introduce PCI_CFG_SPACE_* defines from Linux.
> - Simplify the handler dispatchers by removing the recursion, now the
> dispatchers iterate over the list of sorted handlers and call them
> in order.
> - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
> and instead provide a merge_result function in order to merge a
> register output into a partial result.
> - Rename the fields of the vpci_val union to u8/u16/u32.
> - Remove the return values from the read/write handlers, errors
> should be handled internally and signaled as would be done on
> native hardware.
> - Remove the usage of the GENMASK macro.
>
> Changes since v2:
> - Generalize the PCI address decoding and use it for IOREQ code also.
>
> Changes since v1:
> - Allow access to cross a word-boundary.
> - Add locking.
> - Add cleanup to xen_vpci_add_handlers in case of failure.
> ---
> .gitignore | 3 +
> tools/libxl/libxl_x86.c | 2 +-
> tools/tests/Makefile | 1 +
> tools/tests/vpci/Makefile | 37 ++++
> tools/tests/vpci/emul.h | 133 +++++++++++
> tools/tests/vpci/main.c | 309 ++++++++++++++++++++++++++
> xen/arch/arm/xen.lds.S | 14 ++
> xen/arch/x86/domain.c | 18 +-
> xen/arch/x86/hvm/hvm.c | 2 +
> xen/arch/x86/hvm/io.c | 103 +++++++++
> xen/arch/x86/setup.c | 3 +-
> xen/arch/x86/xen.lds.S | 14 ++
> xen/drivers/Makefile | 2 +-
> xen/drivers/passthrough/pci.c | 10 +-
> xen/drivers/vpci/Makefile | 1 +
> xen/drivers/vpci/vpci.c | 451 ++++++++++++++++++++++++++++++++++++++
> xen/include/asm-x86/domain.h | 1 +
> xen/include/asm-x86/hvm/io.h | 3 +
> xen/include/public/arch-x86/xen.h | 5 +-
> xen/include/xen/pci.h | 3 +
> xen/include/xen/pci_regs.h | 8 +
> xen/include/xen/vpci.h | 53 +++++
> 22 files changed, 1166 insertions(+), 10 deletions(-)
> create mode 100644 tools/tests/vpci/Makefile
> create mode 100644 tools/tests/vpci/emul.h
> create mode 100644 tools/tests/vpci/main.c
> create mode 100644 xen/drivers/vpci/Makefile
> create mode 100644 xen/drivers/vpci/vpci.c
> create mode 100644 xen/include/xen/vpci.h
>
> diff --git a/.gitignore b/.gitignore
> index d64b03d06c..cfe54c6e8f 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -245,6 +245,9 @@ tools/tests/regression/build/*
> tools/tests/regression/downloads/*
> tools/tests/mem-sharing/memshrtool
> tools/tests/mce-test/tools/xen-mceinj
> +tools/tests/vpci/list.h
> +tools/tests/vpci/vpci.[hc]
> +tools/tests/vpci/test_vpci
> tools/xcutils/lsevtchn
> tools/xcutils/readnotes
> tools/xenbackendd/_paths.h
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 5f91fe4f92..8f6a5bc6f2 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -9,7 +9,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
> {
> switch(d_config->c_info.type) {
> case LIBXL_DOMAIN_TYPE_HVM:
> - xc_config->emulation_flags = XEN_X86_EMU_ALL;
> + xc_config->emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
> break;
> case LIBXL_DOMAIN_TYPE_PVH:
> if (libxl_defbool_val(d_config->b_info.apic))
> diff --git a/tools/tests/Makefile b/tools/tests/Makefile
> index 7162945121..f6942a93fb 100644
> --- a/tools/tests/Makefile
> +++ b/tools/tests/Makefile
> @@ -13,6 +13,7 @@ endif
> SUBDIRS-$(CONFIG_X86) += x86_emulator
> SUBDIRS-y += xen-access
> SUBDIRS-y += xenstore
> +SUBDIRS-$(CONFIG_HAS_PCI) += vpci
>
> .PHONY: all clean install distclean uninstall
> all clean distclean: %: subdirs-%
> diff --git a/tools/tests/vpci/Makefile b/tools/tests/vpci/Makefile
> new file mode 100644
> index 0000000000..e45fcb5cd9
> --- /dev/null
> +++ b/tools/tests/vpci/Makefile
> @@ -0,0 +1,37 @@
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +TARGET := test_vpci
> +
> +.PHONY: all
> +all: $(TARGET)
> +
> +.PHONY: run
> +run: $(TARGET)
> + ./$(TARGET)
> +
> +$(TARGET): vpci.c vpci.h list.h main.c emul.h
> + $(HOSTCC) -g -o $@ vpci.c main.c
> +
> +.PHONY: clean
> +clean:
> + rm -rf $(TARGET) *.o *~ vpci.h vpci.c list.h
> +
> +.PHONY: distclean
> +distclean: clean
> +
> +.PHONY: install
> +install:
> +
> +vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> + # Trick the compiler so it doesn't complain about missing symbols
> + sed -e '/#include/d' \
> + -e '1s;^;#include "emul.h"\
> + vpci_register_init_t *const __start_vpci_array[1]\;\
> + vpci_register_init_t *const __end_vpci_array[1]\;\
> + ;' <$< >$@
> +
> +list.h: $(XEN_ROOT)/xen/include/xen/list.h
> +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> +list.h vpci.h:
> + sed -e '/#include/d' <$< >$@
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> new file mode 100644
> index 0000000000..ebd676723d
> --- /dev/null
> +++ b/tools/tests/vpci/emul.h
> @@ -0,0 +1,133 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _TEST_VPCI_
> +#define _TEST_VPCI_
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#define container_of(ptr, type, member) ({ \
> + typeof(((type *)0)->member) *mptr = (ptr); \
> + \
> + (type *)((char *)mptr - offsetof(type, member)); \
> +})
> +
> +#define smp_wmb()
> +#define prefetch(x) __builtin_prefetch(x)
> +#define ASSERT(x) assert(x)
> +#define __must_check __attribute__((__warn_unused_result__))
> +
> +#include "list.h"
> +
> +struct domain {
> +};
> +
> +struct pci_dev {
> + struct vpci *vpci;
> +};
> +
> +struct vcpu
> +{
> + const struct domain *domain;
> +};
> +
> +extern const struct vcpu *current;
> +extern const struct pci_dev test_pdev;
> +
> +typedef bool spinlock_t;
> +#define spin_lock_init(l) (*(l) = false)
> +#define spin_lock(l) (*(l) = true)
> +#define spin_unlock(l) (*(l) = false)
> +
> +typedef union {
> + uint32_t sbdf;
> + struct {
> + union {
> + uint16_t bdf;
> + struct {
> + union {
> + struct {
> + uint8_t func : 3,
> + dev : 5;
> + };
> + uint8_t extfunc;
> + };
> + uint8_t bus;
> + };
> + };
> + uint16_t seg;
> + };
> +} pci_sbdf_t;
> +
> +#include "vpci.h"
> +
> +#define __hwdom_init
> +
> +#define has_vpci(d) true
> +
> +#define xzalloc(type) ((type *)calloc(1, sizeof(type)))
> +#define xmalloc(type) ((type *)malloc(sizeof(type)))
> +#define xfree(p) free(p)
> +
> +#define pci_get_pdev_by_domain(...) &test_pdev
> +
> +/* Dummy native helpers. Writes are ignored, reads return 1's. */
> +#define pci_conf_read8(...) 0xff
> +#define pci_conf_read16(...) 0xffff
> +#define pci_conf_read32(...) 0xffffffff
> +#define pci_conf_write8(...)
> +#define pci_conf_write16(...)
> +#define pci_conf_write32(...)
> +
> +#define PCI_CFG_SPACE_EXP_SIZE 4096
> +
> +#define BUG() assert(0)
> +#define ASSERT_UNREACHABLE() assert(0)
> +
> +#define min(x, y) ({ \
> + const typeof(x) tx = (x); \
> + const typeof(y) ty = (y); \
> + \
> + (void) (&tx == &ty); \
> + tx < ty ? tx : ty; \
> +})
> +
> +#define max(x, y) ({ \
> + const typeof(x) tx = (x); \
> + const typeof(y) ty = (y); \
> + \
> + (void) (&tx == &ty); \
> + tx > ty ? tx : ty; \
> +})
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
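The pci_sbdf_t union in emul.h overlays segment, bus, device and function on a single 32-bit value, which assumes a little-endian host and GCC's bitfield allocation. An equivalent shift-based encoding, sketched here with hypothetical helper names, makes that layout explicit: function in bits 2:0, device in 7:3, bus in 15:8 and segment in 31:16.

```c
#include <assert.h>
#include <stdint.h>

/* Pack seg/bus/dev/func into the layout pci_sbdf_t.sbdf has on x86. */
uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    assert(dev < 32 && func < 8);

    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | func;
}

/* Field extractors matching the union members. */
uint16_t sbdf_seg(uint32_t sbdf)  { return sbdf >> 16; }
uint8_t  sbdf_bus(uint32_t sbdf)  { return (sbdf >> 8) & 0xff; }
uint8_t  sbdf_dev(uint32_t sbdf)  { return (sbdf >> 3) & 0x1f; }
uint8_t  sbdf_func(uint32_t sbdf) { return sbdf & 0x7; }
```

For instance, segment 1, bus 3, device 0x1f, function 7 packs to 0x000103ff; the low 16 bits (0x03ff) are what the union exposes as bdf.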
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> new file mode 100644
> index 0000000000..b9a0a6006b
> --- /dev/null
> +++ b/tools/tests/vpci/main.c
> @@ -0,0 +1,309 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "emul.h"
> +
> +/* Single vcpu (current), and single domain with a single PCI device. */
> +static struct vpci vpci;
> +
> +const static struct domain d;
> +
> +const struct pci_dev test_pdev = {
> + .vpci = &vpci,
> +};
> +
> +const static struct vcpu v = {
> + .domain = &d
> +};
> +
> +const struct vcpu *current = &v;
> +
> +/* Dummy hooks, write stores data, read fetches it. */
> +static uint32_t vpci_read8(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return *(uint8_t *)data;
> +}
> +
> +static void vpci_write8(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + *(uint8_t *)data = val;
> +}
> +
> +static uint32_t vpci_read16(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return *(uint16_t *)data;
> +}
> +
> +static void vpci_write16(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + *(uint16_t *)data = val;
> +}
> +
> +static uint32_t vpci_read32(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return *(uint32_t *)data;
> +}
> +
> +static void vpci_write32(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + *(uint32_t *)data = val;
> +}
> +
> +#define VPCI_READ(reg, size, data) ({ \
> + data = vpci_read((pci_sbdf_t){ .sbdf = 0 }, reg, size); \
> +})
> +
> +#define VPCI_READ_CHECK(reg, size, expected) ({ \
> + uint32_t rd; \
> + \
> + VPCI_READ(reg, size, rd); \
> + assert(rd == (expected)); \
> +})
> +
> +#define VPCI_WRITE(reg, size, data) ({ \
> + vpci_write((pci_sbdf_t){ .sbdf = 0 }, reg, size, data); \
> +})
> +
> +#define VPCI_WRITE_CHECK(reg, size, data) ({ \
> + VPCI_WRITE(reg, size, data); \
> + VPCI_READ_CHECK(reg, size, data); \
> +})
> +
> +#define VPCI_ADD_REG(fread, fwrite, off, size, store) \
> + assert(!vpci_add_register(test_pdev.vpci, fread, fwrite, off, size, \
> + &store))
> +
> +#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size) \
> + assert(vpci_add_register(test_pdev.vpci, fread, fwrite, off, size, NULL))
> +
> +#define VPCI_REMOVE_REG(off, size) \
> + assert(!vpci_remove_register(test_pdev.vpci, off, size))
> +
> +#define VPCI_REMOVE_INVALID_REG(off, size) \
> + assert(vpci_remove_register(test_pdev.vpci, off, size))
> +
> +/* Read a 32b register using all possible sizes. */
> +void multiread4_check(unsigned int reg, uint32_t val)
> +{
> + unsigned int i;
> +
> + /* Read using bytes. */
> + for ( i = 0; i < 4; i++ )
> + VPCI_READ_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
> +
> + /* Read using 2bytes. */
> + for ( i = 0; i < 2; i++ )
> + VPCI_READ_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
> +
> + VPCI_READ_CHECK(reg, 4, val);
> +}
> +
> +void multiwrite4_check(unsigned int reg)
> +{
> + unsigned int i;
> + uint32_t val = 0xa2f51732;
> +
> + /* Write using bytes. */
> + for ( i = 0; i < 4; i++ )
> + VPCI_WRITE_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
> + multiread4_check(reg, val);
> +
> + /* Change the value each time to be sure writes work fine. */
> + val = 0x2b836fda;
> + /* Write using 2bytes. */
> + for ( i = 0; i < 2; i++ )
> + VPCI_WRITE_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
> + multiread4_check(reg, val);
> +
> + val = 0xc4693beb;
> + VPCI_WRITE_CHECK(reg, 4, val);
> + multiread4_check(reg, val);
> +}
> +
> +int
> +main(int argc, char **argv)
> +{
> + /* Index storage by offset. */
> + uint32_t r0 = 0xdeadbeef;
> + uint8_t r5 = 0xef;
> + uint8_t r6 = 0xbe;
> + uint8_t r7 = 0xef;
> + uint16_t r12 = 0x8696;
> + uint8_t r16[4] = { };
> + uint16_t r20[2] = { };
> + uint32_t r24 = 0;
> + uint8_t r28, r30;
> + unsigned int i;
> + int rc;
> +
> + INIT_LIST_HEAD(&vpci.handlers);
> + spin_lock_init(&vpci.lock);
> +
> + VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
> + VPCI_READ_CHECK(0, 4, r0);
> + VPCI_WRITE_CHECK(0, 4, 0xbcbcbcbc);
> +
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
> + VPCI_READ_CHECK(5, 1, r5);
> + VPCI_WRITE_CHECK(5, 1, 0xba);
> +
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
> + VPCI_READ_CHECK(6, 1, r6);
> + VPCI_WRITE_CHECK(6, 1, 0xba);
> +
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
> + VPCI_READ_CHECK(7, 1, r7);
> + VPCI_WRITE_CHECK(7, 1, 0xbd);
> +
> + VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
> + VPCI_READ_CHECK(12, 2, r12);
> + VPCI_READ_CHECK(12, 4, 0xffff8696);
> +
> + /*
> + * At this point we have the following layout:
> + *
> + * Note that this refers to the position of the variables,
> + * but the value has already changed from the one given at
> + * initialization time because write tests have been performed.
> + *
> + * 32 24 16 8 0
> + * +-----+-----+-----+-----+
> + * | r0 | 0
> + * +-----+-----+-----+-----+
> + * | r7 | r6 | r5 |/////| 32
> + * +-----+-----+-----+-----|
> + * |///////////////////////| 64
> + * +-----------+-----------+
> + * |///////////| r12 | 96
> + * +-----------+-----------+
> + * ...
> + * / = unhandled.
> + */
> +
> + /* Try to add an overlapping register handler. */
> + VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
> +
> + /* Try to add a non-aligned register. */
> + VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
> +
> + /* Try to add a register with wrong size. */
> + VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
> +
> + /* Try to add a register with missing handlers. */
> + VPCI_ADD_INVALID_REG(NULL, NULL, 8, 2);
> +
> + /* Read/write of unset register. */
> + VPCI_READ_CHECK(8, 4, 0xffffffff);
> + VPCI_READ_CHECK(8, 2, 0xffff);
> + VPCI_READ_CHECK(8, 1, 0xff);
> + VPCI_WRITE(10, 2, 0xbeef);
> + VPCI_READ_CHECK(10, 2, 0xffff);
> +
> + /* Read of multiple registers */
> + VPCI_WRITE_CHECK(7, 1, 0xbd);
> + VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> +
> + /* Partial read of a register. */
> + VPCI_WRITE_CHECK(0, 4, 0x1a1b1c1d);
> + VPCI_READ_CHECK(2, 1, 0x1b);
> + VPCI_READ_CHECK(6, 2, 0xbdba);
> +
> + /* Write of multiple registers. */
> + VPCI_WRITE_CHECK(4, 4, 0xaabbccff);
> +
> + /* Partial write of a register. */
> + VPCI_WRITE_CHECK(2, 1, 0xfe);
> + VPCI_WRITE_CHECK(6, 2, 0xfebc);
> +
> + /*
> + * Test all possible read/write size combinations.
> + *
> + * Place 4 1B registers at 128bits (16B), 2 2B registers at 160bits
> + * (20B) and finally 1 4B register at 192bits (24B).
> + *
> + * Then perform all possible write and read sizes on each of them.
> + *
> + * ...
> + * 32 24 16 8 0
> + * +------+------+------+------+
> + * |r16[3]|r16[2]|r16[1]|r16[0]| 16
> + * +------+------+------+------+
> + * | r20[1] | r20[0] | 20
> + * +-------------+-------------|
> + * | r24 | 24
> + * +-------------+-------------+
> + *
> + */
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 16, 1, r16[0]);
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 17, 1, r16[1]);
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 18, 1, r16[2]);
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 19, 1, r16[3]);
> +
> + VPCI_ADD_REG(vpci_read16, vpci_write16, 20, 2, r20[0]);
> + VPCI_ADD_REG(vpci_read16, vpci_write16, 22, 2, r20[1]);
> +
> + VPCI_ADD_REG(vpci_read32, vpci_write32, 24, 4, r24);
> +
> + /* Check the initial value is 0. */
> + multiread4_check(16, 0);
> + multiread4_check(20, 0);
> + multiread4_check(24, 0);
> +
> + multiwrite4_check(16);
> + multiwrite4_check(20);
> + multiwrite4_check(24);
> +
> + /*
> + * Check multiple non-consecutive gaps on the same read/write:
> + *
> + * 32 24 16 8 0
> + * +------+------+------+------+
> + * |//////| r30 |//////| r28 | 28
> + * +------+------+------+------+
> + *
> + */
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 28, 1, r28);
> + VPCI_ADD_REG(vpci_read8, vpci_write8, 30, 1, r30);
> + VPCI_WRITE_CHECK(28, 4, 0xffacffdc);
> +
> + /* Finally try to remove a couple of registers. */
> + VPCI_REMOVE_REG(28, 1);
> + VPCI_REMOVE_REG(24, 4);
> + VPCI_REMOVE_REG(12, 2);
> +
> + VPCI_REMOVE_INVALID_REG(20, 1);
> + VPCI_REMOVE_INVALID_REG(16, 2);
> + VPCI_REMOVE_INVALID_REG(30, 2);
> +
> + return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index c9b9546435..98b82680c6 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -65,6 +65,13 @@ SECTIONS
> __param_start = .;
> *(.data.param)
> __param_end = .;
> +
> +#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM)
> + . = ALIGN(POINTER_ALIGN);
> + __start_vpci_array = .;
> + *(.data.vpci)
> + __end_vpci_array = .;
> +#endif
> } :text
>
> #if defined(BUILD_ID)
> @@ -173,6 +180,13 @@ SECTIONS
> *(.init_array)
> *(SORT(.init_array.*))
> __ctors_end = .;
> +
> +#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM)
> + . = ALIGN(POINTER_ALIGN);
> + __start_vpci_array = .;
> + *(.data.vpci)
> + __end_vpci_array = .;
> +#endif
> } :text
> __init_end_efi = .;
> . = ALIGN(STACK_SIZE);
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index c28ac38fbe..4c22e0952e 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -397,11 +397,21 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
> if ( is_hvm_domain(d) )
> {
> if ( is_hardware_domain(d) &&
> - emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
> - return false;
> - if ( !is_hardware_domain(d) && emflags &&
> - emflags != XEN_X86_EMU_ALL && emflags != XEN_X86_EMU_LAPIC )
> + emflags != (XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> + XEN_X86_EMU_VPCI) )
> return false;
> + if ( !is_hardware_domain(d) )
> + {
> + switch ( emflags )
> + {
> + case XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI:
> + case XEN_X86_EMU_LAPIC:
> + case 0:
> + break;
> + default:
> + return false;
> + }
> + }
> }
> else if ( emflags != 0 && emflags != XEN_X86_EMU_PIT )
> {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 205b4cb685..8ed6718bf6 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -36,6 +36,7 @@
> #include <xen/rangeset.h>
> #include <xen/monitor.h>
> #include <xen/warning.h>
> +#include <xen/vpci.h>
> #include <asm/shadow.h>
> #include <asm/hap.h>
> #include <asm/current.h>
> @@ -629,6 +630,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long domcr_flags,
> d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
>
> register_g2m_portio_handler(d);
> + register_vpci_portio_handler(d);
>
> hvm_ioreq_init(d);
>
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 6579e119ff..6c12cf5d22 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -25,6 +25,7 @@
> #include <xen/trace.h>
> #include <xen/event.h>
> #include <xen/hypercall.h>
> +#include <xen/vpci.h>
> #include <asm/current.h>
> #include <asm/cpufeature.h>
> #include <asm/processor.h>
> @@ -278,6 +279,108 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> return CF8_ADDR_LO(cf8) | (addr & 3);
> }
>
> +/* Do some sanity checks. */
> +static bool vpci_access_allowed(unsigned int reg, unsigned int len)
> +{
> + /* Check access size. */
> + if ( len != 1 && len != 2 && len != 4 )
> + return false;
> +
> + /* Check that access is size aligned. */
> + if ( (reg & (len - 1)) )
> + return false;
> +
> + return true;
> +}
> +
> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool vpci_portio_accept(const struct hvm_io_handler *handler,
> + const ioreq_t *p)
> +{
> + return (p->addr == 0xcf8 && p->size == 4) || (p->addr & ~3) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> + uint64_t addr, uint32_t size, uint64_t *data)
> +{
> + struct domain *d = current->domain;
> + unsigned int reg;
> + pci_sbdf_t sbdf;
> + uint32_t cf8;
> +
> + *data = ~(uint64_t)0;
> +
> + if ( addr == 0xcf8 )
> + {
> + ASSERT(size == 4);
> + *data = d->arch.hvm_domain.pci_cf8;
> + return X86EMUL_OKAY;
> + }
> +
> + cf8 = ACCESS_ONCE(d->arch.hvm_domain.pci_cf8);
> + if ( !CF8_ENABLED(cf8) )
> + return X86EMUL_UNHANDLEABLE;
> +
> + reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> +
> + if ( !vpci_access_allowed(reg, size) )
> + return X86EMUL_OKAY;
> +
> + *data = vpci_read(sbdf, reg, size);
> +
> + return X86EMUL_OKAY;
> +}
> +
> +static int vpci_portio_write(const struct hvm_io_handler *handler,
> + uint64_t addr, uint32_t size, uint64_t data)
> +{
> + struct domain *d = current->domain;
> + unsigned int reg;
> + pci_sbdf_t sbdf;
> + uint32_t cf8;
> +
> + if ( addr == 0xcf8 )
> + {
> + ASSERT(size == 4);
> + d->arch.hvm_domain.pci_cf8 = data;
> + return X86EMUL_OKAY;
> + }
> +
> + cf8 = ACCESS_ONCE(d->arch.hvm_domain.pci_cf8);
> + if ( !CF8_ENABLED(cf8) )
> + return X86EMUL_UNHANDLEABLE;
> +
> + reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> +
> + if ( !vpci_access_allowed(reg, size) )
> + return X86EMUL_OKAY;
> +
> + vpci_write(sbdf, reg, size, data);
> +
> + return X86EMUL_OKAY;
> +}
> +
> +static const struct hvm_io_ops vpci_portio_ops = {
> + .accept = vpci_portio_accept,
> + .read = vpci_portio_read,
> + .write = vpci_portio_write,
> +};
> +
> +void register_vpci_portio_handler(struct domain *d)
> +{
> + struct hvm_io_handler *handler;
> +
> + if ( !has_vpci(d) )
> + return;
> +
> + handler = hvm_next_io_handler(d);
> + if ( !handler )
> + return;
> +
> + handler->type = IOREQ_TYPE_PIO;
> + handler->ops = &vpci_portio_ops;
> +}
> +
> /*
> * Local variables:
> * mode: C
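The port I/O read path in the io.c hunk can be modelled in user space to see how its outcomes differ: a read of 0xcf8 returns the latched value, a disabled CONFIG_ADDRESS yields X86EMUL_UNHANDLEABLE (as per the v7 changelog), and a badly sized or misaligned access completes with all 1's. Everything here (`struct vdom`, the toy `cfg` array, the `EMUL_*` codes, `port_read`) is a hypothetical stand-in, and bus/device/function decoding is skipped because only one device is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for X86EMUL_OKAY / X86EMUL_UNHANDLEABLE. */
enum { EMUL_OKAY, EMUL_UNHANDLEABLE };

/* Per-domain state: only the latched CONFIG_ADDRESS value. */
struct vdom {
    uint32_t pci_cf8;
};

/* Toy config space of a single device, read little-endian. */
uint8_t cfg[256];

/* Same sanity check as vpci_access_allowed: 1/2/4 bytes, size aligned. */
bool access_allowed(unsigned int reg, unsigned int len)
{
    return (len == 1 || len == 2 || len == 4) && !(reg & (len - 1));
}

uint32_t cfg_read(unsigned int reg, unsigned int len)
{
    uint32_t val = 0;
    unsigned int i;

    for ( i = 0; i < len; i++ )
        val |= (uint32_t)cfg[reg + i] << (8 * i);

    return val;
}

int port_read(const struct vdom *d, uint64_t addr, uint32_t size,
              uint64_t *data)
{
    unsigned int reg;

    /* Default result: all 1's, as for a missing device. */
    *data = ~(uint64_t)0;

    if ( addr == 0xcf8 )
    {
        *data = d->pci_cf8;
        return EMUL_OKAY;
    }

    /* Bit 31 clear: config space is not enabled through 0xcfc. */
    if ( !(d->pci_cf8 & 0x80000000u) )
        return EMUL_UNHANDLEABLE;

    /* Dword offset from cf8, byte lane from the port. */
    reg = (d->pci_cf8 & 0xfc) | (unsigned int)(addr & 3);

    /* Bad size or misaligned: complete the access, data stays all 1's. */
    if ( !access_allowed(reg, size) )
        return EMUL_OKAY;

    *data = cfg_read(reg, size);
    return EMUL_OKAY;
}
```

Note that a 2-byte read at 0xcfd (register offset 1) is completed rather than refused, matching the patch's choice to treat such errors as hardware would.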
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 32bb02e3a5..528cc464ba 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1582,7 +1582,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> domcr_flags |= DOMCRF_hvm |
> ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
> DOMCRF_hap : 0);
> - config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> + config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC|
> + XEN_X86_EMU_VPCI;
> }
>
> /* Create initial domain 0. */
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index d5e8821d41..6c50916ed2 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -124,6 +124,13 @@ SECTIONS
> __param_start = .;
> *(.data.param)
> __param_end = .;
> +
> +#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM)
> + . = ALIGN(POINTER_ALIGN);
> + __start_vpci_array = .;
> + *(.data.vpci)
> + __end_vpci_array = .;
> +#endif
> } :text
>
> #if defined(BUILD_ID)
> @@ -213,6 +220,13 @@ SECTIONS
> *(.init_array)
> *(SORT(.init_array.*))
> __ctors_end = .;
> +
> +#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM)
> + . = ALIGN(POINTER_ALIGN);
> + __start_vpci_array = .;
> + *(.data.vpci)
> + __end_vpci_array = .;
> +#endif
> } :text
>
> #ifdef EFI
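The linker-script hunks gather pointers to the vPCI initialization functions between __start_vpci_array and __end_vpci_array, and per the changelog place them in init or persistent data depending on CONFIG_LATE_HWDOM (presumably because a late hardware domain needs them after init sections are freed). The same registration-array pattern can be tried in user space with an ELF toolchain: GNU ld synthesizes __start_<name>/__stop_<name> for any section whose name is a valid C identifier, whereas Xen's ".data.vpci" contains a dot and so needs explicit symbols. The `REGISTER_INIT` macro, the hook names and the "vpci_array" section are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-device init hook, standing in for vpci_register_init_t. */
typedef int init_fn_t(int dev);

/*
 * Emit a pointer to 'fn' into the "vpci_array" section. 'used' stops the
 * compiler from discarding the otherwise unreferenced variable.
 */
#define REGISTER_INIT(fn)                                   \
    static init_fn_t *const fn##_entry                      \
        __attribute__((used, section("vpci_array"))) = fn

/* Provided automatically by GNU ld for the "vpci_array" section. */
extern init_fn_t *const __start_vpci_array[];
extern init_fn_t *const __stop_vpci_array[];

int hooks_run;

int init_header(int dev)
{
    hooks_run++;
    return dev < 0 ? -1 : 0;
}

int init_msi(int dev)
{
    hooks_run++;
    return dev < 0 ? -1 : 0;
}

REGISTER_INIT(init_header);
REGISTER_INIT(init_msi);

size_t count_init_hooks(void)
{
    return __stop_vpci_array - __start_vpci_array;
}

/* Call every hook in section order, stopping at the first error. */
int run_init_hooks(int dev)
{
    init_fn_t *const *fn;

    for ( fn = __start_vpci_array; fn < __stop_vpci_array; fn++ )
    {
        int rc = (*fn)(dev);

        if ( rc )
            return rc;
    }

    return 0;
}
```

The iteration only works if every entry sits at pointer-sized stride from the start symbol, which is why aligning the start of the array (the ALIGN(POINTER_ALIGN) lines) matters.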
> diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile
> index 19391802a8..d51c766453 100644
> --- a/xen/drivers/Makefile
> +++ b/xen/drivers/Makefile
> @@ -1,6 +1,6 @@
> subdir-y += char
> subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq
> -subdir-$(CONFIG_HAS_PCI) += pci
> +subdir-$(CONFIG_HAS_PCI) += pci vpci
> subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough
> subdir-$(CONFIG_ACPI) += acpi
> subdir-$(CONFIG_VIDEO) += video
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 469dfc6c3d..519993d536 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -31,6 +31,7 @@
> #include <xen/radix-tree.h>
> #include <xen/softirq.h>
> #include <xen/tasklet.h>
> +#include <xen/vpci.h>
> #include <xsm/xsm.h>
> #include <asm/msi.h>
> #include "ats.h"
> @@ -1052,10 +1053,10 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
> struct pci_dev *pdev)
> {
> u8 devfn = pdev->devfn;
> + int err;
>
> do {
> - int err = ctxt->handler(devfn, pdev);
> -
> + err = ctxt->handler(devfn, pdev);
> if ( err )
> {
> printk(XENLOG_ERR "setup %04x:%02x:%02x.%u for d%d failed (%d)\n",
> @@ -1067,6 +1068,11 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
> devfn += pdev->phantom_stride;
> } while ( devfn != pdev->devfn &&
> PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> + err = vpci_add_handlers(pdev);
> + if ( err )
> + printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
> + ctxt->d->domain_id, err);
> }
>
> static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg)
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> new file mode 100644
> index 0000000000..840a906470
> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> new file mode 100644
> index 0000000000..788825f5fd
> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,451 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern vpci_register_init_t *const __start_vpci_array[];
> +extern vpci_register_init_t *const __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> + vpci_read_t *read;
> + vpci_write_t *write;
> + unsigned int size;
> + unsigned int offset;
> + void *private;
> + struct list_head node;
> +};
> +
> +int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
> +{
> + unsigned int i;
> + int rc = 0;
> +
> + if ( !has_vpci(pdev->domain) )
> + return 0;
> +
> + pdev->vpci = xzalloc(struct vpci);
> + if ( !pdev->vpci )
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&pdev->vpci->handlers);
> + spin_lock_init(&pdev->vpci->lock);
> +
> + for ( i = 0; i < NUM_VPCI_INIT; i++ )
> + {
> + rc = __start_vpci_array[i](pdev);
> + if ( rc )
> + break;
> + }
> +
> + if ( rc )
> + {
> + while ( !list_empty(&pdev->vpci->handlers) )
> + {
> + struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> + struct vpci_register,
> + node);
> +
> + list_del(&r->node);
> + xfree(r);
> + }
> + xfree(pdev->vpci);
> + pdev->vpci = NULL;
> + }
> +
> + return rc;
> +}
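The linker script hunks gather every pointer emitted by REGISTER_VPCI_INIT into a contiguous array delimited by __start_vpci_array/__end_vpci_array, and vpci_add_handlers() just walks that array, stopping at the first failing initializer. Because earlier initializers may already have added handlers, the error path has to free the whole list, not only the last entry. A minimal user-space sketch of that control flow, assuming a plain static array in place of the linker-generated one; all names (fake_dev, add_one, add_handlers) are hypothetical:

```c
#include <assert.h>

/* Hypothetical device: only tracks how many handlers are registered. */
struct fake_dev {
    unsigned int nr_handlers;
    int fail_at;                 /* initializer index that fails, or -1 */
};

typedef int init_fn_t(struct fake_dev *dev, unsigned int idx);

static int add_one(struct fake_dev *dev, unsigned int idx)
{
    if ( dev->fail_at == (int)idx )
        return -12;              /* stand-in for -ENOMEM */
    dev->nr_handlers++;
    return 0;
}

/* Plain array standing in for __start_vpci_array .. __end_vpci_array. */
static init_fn_t *const init_array[] = { add_one, add_one, add_one };
#define NUM_INIT (sizeof(init_array) / sizeof(init_array[0]))

/* Run each initializer in order; on the first error drop everything. */
static int add_handlers(struct fake_dev *dev)
{
    unsigned int i;
    int rc = 0;

    assert(dev != 0);

    for ( i = 0; i < NUM_INIT; i++ )
    {
        rc = init_array[i](dev, i);
        if ( rc )
            break;
    }

    if ( rc )
        dev->nr_handlers = 0;    /* mirrors freeing the whole handler list */

    return rc;
}
```

With fail_at set to 1, the first initializer succeeds, the second fails, and the device ends up with no handlers at all, which matches the all-or-nothing behaviour of the error path in the patch.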
> +
> +static int vpci_register_cmp(const struct vpci_register *r1,
> + const struct vpci_register *r2)
> +{
> + /* Return 0 if registers overlap. */
> + if ( r1->offset < r2->offset + r2->size &&
> + r2->offset < r1->offset + r1->size )
> + return 0;
> + if ( r1->offset < r2->offset )
> + return -1;
> + if ( r1->offset > r2->offset )
> + return 1;
> +
> + ASSERT_UNREACHABLE();
> + return 0;
> +}
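vpci_register_cmp() treats any two overlapping byte ranges as "equal" and otherwise orders them by start offset. Since every register has a non-zero size, two ranges with the same offset always overlap, which is why the trailing ASSERT_UNREACHABLE() should never fire. A standalone sketch of the same comparator, using a hypothetical minimal descriptor instead of struct vpci_register:

```c
#include <assert.h>

/* Hypothetical minimal register descriptor: byte range [offset, offset + size). */
struct reg {
    unsigned int offset;
    unsigned int size;
};

/*
 * Overlapping ranges compare equal (0); otherwise order by start offset.
 * Non-empty ranges with the same offset always overlap, so the final
 * comparison never sees r1->offset == r2->offset.
 */
static int reg_cmp(const struct reg *r1, const struct reg *r2)
{
    assert(r1->size && r2->size);

    if ( r1->offset < r2->offset + r2->size &&
         r2->offset < r1->offset + r1->size )
        return 0;

    return r1->offset < r2->offset ? -1 : 1;
}
```

Treating overlap as equality lets a single comparator serve both vpci_add_register() (reject a clashing handler with -EEXIST) and vpci_read()/vpci_write() (find every register touched by an access in one ordered walk).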
> +
> +/* Dummy hooks, writes are ignored, reads return 1's */
> +static uint32_t vpci_ignored_read(const struct pci_dev *pdev, unsigned int reg,
> + void *data)
> +{
> + return ~(uint32_t)0;
> +}
> +
> +static void vpci_ignored_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> +}
> +
> +int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> + vpci_write_t *write_handler, unsigned int offset,
> + unsigned int size, void *data)
> +{
> + struct list_head *prev;
> + struct vpci_register *r;
> +
> + /* Some sanity checks. */
> + if ( (size != 1 && size != 2 && size != 4) ||
> + offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
> + (!read_handler && !write_handler) )
> + return -EINVAL;
> +
> + r = xmalloc(struct vpci_register);
> + if ( !r )
> + return -ENOMEM;
> +
> + r->read = read_handler ?: vpci_ignored_read;
> + r->write = write_handler ?: vpci_ignored_write;
> + r->size = size;
> + r->offset = offset;
> + r->private = data;
> +
> + spin_lock(&vpci->lock);
> +
> + /* The list of handlers must be kept sorted at all times. */
> + list_for_each ( prev, &vpci->handlers )
> + {
> + const struct vpci_register *this =
> + list_entry(prev, const struct vpci_register, node);
> + int cmp = vpci_register_cmp(r, this);
> +
> + if ( cmp < 0 )
> + break;
> + if ( cmp == 0 )
> + {
> + spin_unlock(&vpci->lock);
> + xfree(r);
> + return -EEXIST;
> + }
> + }
> +
> + list_add_tail(&r->node, prev);
> + spin_unlock(&vpci->lock);
> +
> + return 0;
> +}
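The insertion loop in vpci_add_register() stops at the first entry that sorts after the new register and bails out on any overlap, so the list stays sorted and free of overlaps. A minimal sketch of the same idea on a singly linked list, assuming a pointer-to-pointer walk instead of Xen's list_head helpers; the types and names are hypothetical, and the read/write handler checks are left out:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical singly linked stand-in for struct vpci_register. */
struct reg {
    unsigned int offset, size;
    struct reg *next;
};

static int reg_cmp(const struct reg *r1, const struct reg *r2)
{
    if ( r1->offset < r2->offset + r2->size &&
         r2->offset < r1->offset + r1->size )
        return 0;                        /* ranges overlap */
    return r1->offset < r2->offset ? -1 : 1;
}

/* Same sanity checks as the patch: size 1/2/4, naturally aligned, < 4096. */
static int reg_insert(struct reg **head, struct reg *r)
{
    struct reg **pos;

    assert(head != 0 && r != 0);

    if ( (r->size != 1 && r->size != 2 && r->size != 4) ||
         r->offset >= 4096 || (r->offset & (r->size - 1)) )
        return -EINVAL;

    for ( pos = head; *pos != 0; pos = &(*pos)->next )
    {
        int cmp = reg_cmp(r, *pos);

        if ( cmp < 0 )
            break;                       /* insert before *pos */
        if ( cmp == 0 )
            return -EEXIST;              /* clashes with an existing entry */
    }

    r->next = *pos;
    *pos = r;
    return 0;
}
```

Inserting ranges at offsets 4, 0 and 2 in that order yields the list 0, 2, 4. A 1-byte register at offset 5 then fails with -EEXIST, and a misaligned 2-byte register at offset 3 fails with -EINVAL.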
> +
> +int vpci_remove_register(struct vpci *vpci, unsigned int offset,
> + unsigned int size)
> +{
> + const struct vpci_register r = { .offset = offset, .size = size };
> + struct vpci_register *rm;
> +
> + spin_lock(&vpci->lock);
> + list_for_each_entry ( rm, &vpci->handlers, node )
> + {
> + int cmp = vpci_register_cmp(&r, rm);
> +
> + /*
> + * NB: do not use a switch so that we can use break to
> + * get out of the list loop earlier if required.
> + */
> + if ( !cmp && rm->offset == offset && rm->size == size )
> + {
> + list_del(&rm->node);
> + spin_unlock(&vpci->lock);
> + xfree(rm);
> + return 0;
> + }
> + if ( cmp <= 0 )
> + break;
> + }
> + spin_unlock(&vpci->lock);
> +
> + return -ENOENT;
> +}
> +
> +/* Wrappers for performing reads/writes to the underlying hardware. */
> +static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
> + unsigned int size)
> +{
> + uint32_t data;
> +
> + switch ( size )
> + {
> + case 4:
> + data = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
> + break;
> + case 3:
> + /*
> + * This is possible because a 4byte read can have 1byte trapped and
> + * the rest passed-through.
> + */
> + if ( reg & 1 )
> + {
> + data = pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
> + reg);
> + data |= pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
> + reg + 1) << 8;
> + }
> + else
> + {
> + data = pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
> + reg);
> + data |= pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
> + reg + 2) << 16;
> + }
> + break;
> + case 2:
> + data = pci_conf_read16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
> + break;
> + case 1:
> + data = pci_conf_read8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg);
> + break;
> + default:
> + ASSERT_UNREACHABLE();
> + data = ~(uint32_t)0;
> + break;
> + }
> +
> + return data;
> +}
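The size 3 case only arises as the leftover of a dword access whose first or last byte is emulated. The leftover therefore starts either one byte into the dword (odd reg, split as 1 + 2) or at the dword boundary (even reg, split as 2 + 1), so both halves stay naturally aligned. A sketch of that split, assuming a hypothetical byte array in place of pci_conf_read8/16 and little-endian assembly of the result:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-byte config space standing in for real hardware. */
static uint8_t cfg[256];

static uint32_t read8(unsigned int reg)  { return cfg[reg]; }
static uint32_t read16(unsigned int reg) { return cfg[reg] | (cfg[reg + 1] << 8); }

/* Read 3 bytes at reg using only naturally aligned 1- and 2-byte reads. */
static uint32_t read3(unsigned int reg)
{
    assert(reg + 3 <= sizeof(cfg));

    if ( reg & 1 )
        return read8(reg) | (read16(reg + 1) << 8);   /* 1 + 2 */
    return read16(reg) | (read8(reg + 2) << 16);      /* 2 + 1 */
}
```

The write side in vpci_write_hw() mirrors this split, shifting the data right instead of assembling it.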
> +
> +static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> + uint32_t data)
> +{
> + switch ( size )
> + {
> + case 4:
> + pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
> + break;
> + case 3:
> + /*
> + * This is possible because a 4byte write can have 1byte trapped and
> + * the rest passed-through.
> + */
> + if ( reg & 1 )
> + {
> + pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg,
> + data);
> + pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg + 1,
> + data >> 8);
> + }
> + else
> + {
> + pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg,
> + data);
> + pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg + 2,
> + data >> 16);
> + }
> + break;
> + case 2:
> + pci_conf_write16(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
> + break;
> + case 1:
> + pci_conf_write8(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, reg, data);
> + break;
> + default:
> + ASSERT_UNREACHABLE();
> + break;
> + }
> +}
> +
> +/*
> + * Merge new data into a partial result.
> + *
> + * Copy the value found in 'new' from [0, size) left shifted by
> + * 'offset' into 'data'. Note that both 'size' and 'offset' are
> + * in byte units.
> + */
> +static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
> + unsigned int offset)
> +{
> + uint32_t mask = 0xffffffff >> (32 - 8 * size);
> +
> + return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
> +}
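A worked example of merge_result(): with size = 1 and offset = 2, the mask is 0xff shifted to bits 23:16, so merging 0xab into 0xffffffff gives 0xffabffff. The sketch below repeats the helper with one added precondition (an assumption, not in the patch): size must be 1..4 and must fit within the dword, because size == 0 would make the 32-bit shift undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the low 'size' bytes of 'new' into 'data' at byte 'offset'. */
static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
                             unsigned int offset)
{
    uint32_t mask;

    /* Added check: a shift by 32 (size == 0) is undefined behaviour. */
    assert(size >= 1 && size <= 4 && offset + size <= 4);
    mask = 0xffffffff >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
}
```

Bytes of 'new' above 'size' are discarded, so callers can pass an unmasked handler result, as vpci_read() does.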
> +
> +uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> +{
> + const struct domain *d = current->domain;
> + const struct pci_dev *pdev;
> + const struct vpci_register *r;
> + unsigned int data_offset = 0;
> + uint32_t data = ~(uint32_t)0;
> +
> + /* Find the PCI dev matching the address. */
> + pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.extfunc);
> + if ( !pdev )
> + return vpci_read_hw(sbdf, reg, size);
> +
> + spin_lock(&pdev->vpci->lock);
> +
> + /* Read from the hardware or the emulated register handlers. */
> + list_for_each_entry ( r, &pdev->vpci->handlers, node )
> + {
> + const struct vpci_register emu = {
> + .offset = reg + data_offset,
> + .size = size - data_offset
> + };
> + int cmp = vpci_register_cmp(&emu, r);
> + uint32_t val;
> + unsigned int read_size;
> +
> + if ( cmp < 0 )
> + break;
> + if ( cmp > 0 )
> + continue;
> +
> + if ( emu.offset < r->offset )
> + {
> + /* Heading gap, read partial content from hardware. */
> + read_size = r->offset - emu.offset;
> + val = vpci_read_hw(sbdf, emu.offset, read_size);
> + data = merge_result(data, val, read_size, data_offset);
> + data_offset += read_size;
> + }
> +
> + val = r->read(pdev, r->offset, r->private);
> +
> + /* Check if the read is in the middle of a register. */
> + if ( r->offset < emu.offset )
> + val >>= (emu.offset - r->offset) * 8;
> +
> + /* Find the intersection size between the two sets. */
> + read_size = min(emu.offset + emu.size, r->offset + r->size) -
> + max(emu.offset, r->offset);
> + /* Merge the emulated data into the native read value. */
> + data = merge_result(data, val, read_size, data_offset);
> + data_offset += read_size;
> + if ( data_offset == size )
> + break;
> + ASSERT(data_offset < size);
> + }
> +
> + if ( data_offset < size )
> + {
> + /* Trailing gap, read the remaining. */
> + uint32_t tmp_data = vpci_read_hw(sbdf, reg + data_offset,
> + size - data_offset);
> +
> + data = merge_result(data, tmp_data, size - data_offset, data_offset);
> + }
> + spin_unlock(&pdev->vpci->lock);
> +
> + return data & (0xffffffff >> (32 - 8 * size));
> +}
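The loop in vpci_read() stitches one access together from three kinds of pieces: a leading hardware gap before an emulated register, the overlapping part of the register itself (shifted down if the access starts mid-register), and a trailing hardware gap. A simplified model of that walk, assuming a sorted array of emulated registers with fixed values and a byte array standing in for hardware; all names are hypothetical and locking is omitted:

```c
#include <assert.h>
#include <stdint.h>

struct emu_reg {
    unsigned int offset, size;
    uint32_t value;              /* what the hypothetical handler returns */
};

static uint8_t hw[256];          /* hypothetical pass-through config space */

static uint32_t hw_read(unsigned int reg, unsigned int size)
{
    uint32_t v = 0;
    unsigned int i;

    for ( i = 0; i < size; i++ )
        v |= (uint32_t)hw[reg + i] << (8 * i);
    return v;
}

static uint32_t merge(uint32_t data, uint32_t new, unsigned int size,
                      unsigned int offset)
{
    uint32_t mask = 0xffffffff >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
}

/* Read [reg, reg + size) mixing emulated registers with hardware gaps. */
static uint32_t split_read(const struct emu_reg *regs, unsigned int nr,
                           unsigned int reg, unsigned int size)
{
    uint32_t data = ~(uint32_t)0;
    unsigned int done = 0, i;

    assert(size >= 1 && size <= 4);

    for ( i = 0; i < nr && done < size; i++ )
    {
        const struct emu_reg *r = &regs[i];
        unsigned int start = reg + done, end = reg + size, lo, hi;

        if ( r->offset + r->size <= start )
            continue;            /* register entirely before the access */
        if ( r->offset >= end )
            break;               /* register entirely after the access */

        if ( start < r->offset )
        {
            /* Leading gap: fetch it from hardware. */
            data = merge(data, hw_read(start, r->offset - start),
                         r->offset - start, done);
            done += r->offset - start;
        }

        /* Intersection of the access with this register. */
        lo = start > r->offset ? start : r->offset;
        hi = end < r->offset + r->size ? end : r->offset + r->size;
        data = merge(data, r->value >> ((lo - r->offset) * 8), hi - lo, done);
        done += hi - lo;
    }

    if ( done < size )           /* trailing gap */
        data = merge(data, hw_read(reg + done, size - done), size - done, done);

    return data & (0xffffffff >> (32 - 8 * size));
}
```

With hardware bytes aa bb cc dd at offsets 0..3 and a single emulated byte 0x11 at offset 1, a 4-byte read at offset 0 returns 0xddcc11aa: one leading byte from hardware, one emulated byte, two trailing bytes from hardware.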
> +
> +/*
> + * Perform a maybe partial write to a register.
> + *
> + * Note that this will only work for simple registers, if Xen needs to
> + * trap accesses to rw1c registers (like the status PCI header register)
> + * the logic in vpci_write will have to be expanded in order to correctly
> + * deal with them.
> + */
> +static void vpci_write_helper(const struct pci_dev *pdev,
> + const struct vpci_register *r, unsigned int size,
> + unsigned int offset, uint32_t data)
> +{
> + ASSERT(size <= r->size);
> +
> + if ( size != r->size )
> + {
> + uint32_t val;
> +
> + val = r->read(pdev, r->offset, r->private);
> + data = merge_result(val, data, size, offset);
> + }
> +
> + r->write(pdev, r->offset, data & (0xffffffff >> (32 - 8 * r->size)),
> + r->private);
> +}
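vpci_write_helper() turns a narrow write into a full-width one by reading the register through its handler, merging the new bytes in, and writing back the whole register. As the comment above notes, this is only safe for plain registers. For RW1C bits, writing back a 1 that was merely read would clear it. A sketch of the read-modify-write, assuming a hypothetical in-memory register in place of the read/write handlers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulated register with its backing value. */
struct emu_reg {
    unsigned int size;           /* register width in bytes */
    uint32_t value;
};

static uint32_t merge(uint32_t data, uint32_t new, unsigned int size,
                      unsigned int offset)
{
    uint32_t mask = 0xffffffff >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
}

/* Write 'size' bytes of 'data' at byte 'offset' inside register r. */
static void partial_write(struct emu_reg *r, unsigned int size,
                          unsigned int offset, uint32_t data)
{
    assert(size >= 1 && offset + size <= r->size);

    if ( size != r->size )
        data = merge(r->value, data, size, offset);  /* keep other bytes */

    r->value = data & (0xffffffff >> (32 - 8 * r->size));
}
```

Writing 0xab to byte 2 of a dword register holding 0x11223344 leaves 0x11ab3344, with the other three bytes preserved.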
> +
> +void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> + uint32_t data)
> +{
> + const struct domain *d = current->domain;
> + const struct pci_dev *pdev;
> + const struct vpci_register *r;
> + unsigned int data_offset = 0;
> +
> + /*
> + * Find the PCI dev matching the address.
> + * Passthrough everything that's not trapped.
> + */
> + pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.extfunc);
> + if ( !pdev )
> + {
> + vpci_write_hw(sbdf, reg, size, data);
> + return;
> + }
> +
> + spin_lock(&pdev->vpci->lock);
> +
> + /* Write the value to the hardware or emulated registers. */
> + list_for_each_entry ( r, &pdev->vpci->handlers, node )
> + {
> + const struct vpci_register emu = {
> + .offset = reg + data_offset,
> + .size = size - data_offset
> + };
> + int cmp = vpci_register_cmp(&emu, r);
> + unsigned int write_size;
> +
> + if ( cmp < 0 )
> + break;
> + if ( cmp > 0 )
> + continue;
> +
> + if ( emu.offset < r->offset )
> + {
> + /* Heading gap, write partial content to hardware. */
> + vpci_write_hw(sbdf, emu.offset, r->offset - emu.offset,
> + data >> (data_offset * 8));
> + data_offset += r->offset - emu.offset;
> + }
> +
> + /* Find the intersection size between the two sets. */
> + write_size = min(emu.offset + emu.size, r->offset + r->size) -
> + max(emu.offset, r->offset);
> + vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset,
> + data >> (data_offset * 8));
> + data_offset += write_size;
> + if ( data_offset == size )
> + break;
> + ASSERT(data_offset < size);
> + }
> +
> + if ( data_offset < size )
> + /* Trailing gap, write the remaining. */
> + vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> + data >> (data_offset * 8));
> +
> + spin_unlock(&pdev->vpci->lock);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 4d0b77dc28..72a3dd8e89 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -430,6 +430,7 @@ struct arch_domain
> #define has_vpit(d) (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
> #define has_pirq(d) (!!((d)->arch.emulation_flags & \
> XEN_X86_EMU_USE_PIRQ))
> +#define has_vpci(d) (!!((d)->arch.emulation_flags & XEN_X86_EMU_VPCI))
>
> #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list))
>
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 707665fbba..ff0bea5d53 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -160,6 +160,9 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> */
> void register_g2m_portio_handler(struct domain *d);
>
> +/* HVM port IO handler for vPCI accesses. */
> +void register_vpci_portio_handler(struct domain *d);
> +
> #endif /* __ASM_X86_HVM_IO_H__ */
>
>
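The port IO handler registered by register_vpci_portio_handler() has to turn the latched 0xcf8 value plus the accessed data port into an SBDF and register offset before calling vpci_read()/vpci_write(). A sketch of the standard configuration mechanism #1 layout (bit 31 enable, bus 23:16, device 15:11, function 10:8, dword register 7:2, with the low two bits taken from the 0xcfc..0xcff port). The struct and function names are hypothetical and do not reproduce hvm_pci_decode_addr()'s signature. The sketch also ignores any vendor-specific extended register bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded address. */
struct sbdf_reg {
    unsigned int bus, dev, func, reg;
};

/* Decode CF8 plus the data port used; returns 0 if the enable bit is clear. */
static int decode_cf8(uint32_t cf8, unsigned int port, struct sbdf_reg *out)
{
    assert(port >= 0xcfc && port <= 0xcff);

    if ( !(cf8 & 0x80000000u) )
        return 0;

    out->bus  = (cf8 >> 16) & 0xff;
    out->dev  = (cf8 >> 11) & 0x1f;
    out->func = (cf8 >> 8) & 0x7;
    out->reg  = (cf8 & 0xfc) | (port & 3);   /* dword offset + byte lane */
    return 1;
}
```

For example, CF8 = 0x80031110 with an access to port 0xcfe decodes to bus 3, device 2, function 1, register 0x12.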
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index ff918310f6..06ef4772cd 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -293,12 +293,15 @@ struct xen_arch_domainconfig {
> #define XEN_X86_EMU_PIT (1U<<_XEN_X86_EMU_PIT)
> #define _XEN_X86_EMU_USE_PIRQ 9
> #define XEN_X86_EMU_USE_PIRQ (1U<<_XEN_X86_EMU_USE_PIRQ)
> +#define _XEN_X86_EMU_VPCI 10
> +#define XEN_X86_EMU_VPCI (1U<<_XEN_X86_EMU_VPCI)
>
> #define XEN_X86_EMU_ALL (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET | \
> XEN_X86_EMU_PM | XEN_X86_EMU_RTC | \
> XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC | \
> XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU | \
> - XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
> + XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
> + XEN_X86_EMU_VPCI)
> uint32_t emulation_flags;
> };
>
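XEN_X86_EMU_VPCI takes bit 10 (0x400), right after XEN_X86_EMU_USE_PIRQ, and has_vpci() follows the existing pattern of masking emulation_flags and folding the result to 0/1 with `!!`. A small sketch of that check, using the two bit numbers shown in this hunk and a hypothetical fake_domain in place of struct domain:

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers as defined in the public header hunk above. */
#define _XEN_X86_EMU_USE_PIRQ 9
#define XEN_X86_EMU_USE_PIRQ  (1U << _XEN_X86_EMU_USE_PIRQ)
#define _XEN_X86_EMU_VPCI     10
#define XEN_X86_EMU_VPCI      (1U << _XEN_X86_EMU_VPCI)

/* Hypothetical stand-in for struct domain's arch.emulation_flags. */
struct fake_domain {
    uint32_t emulation_flags;
};

/* '!!' turns the masked bit (0x400) into a plain 0/1 truth value. */
#define has_vpci(d) (!!((d)->emulation_flags & XEN_X86_EMU_VPCI))

static int domain_has_vpci(const struct fake_domain *d)
{
    assert(d != 0);
    return has_vpci(d);
}
```

Without the `!!`, the macro would evaluate to 0x400 rather than 1, which matters if the result is ever stored in a narrow type such as bool_t.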
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index dd5ec43a70..b7a6abfc53 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -112,6 +112,9 @@ struct pci_dev {
> #define PT_FAULT_THRESHOLD 10
> } fault;
> u64 vf_rlen[6];
> +
> + /* Data for vPCI. */
> + struct vpci *vpci;
> };
>
> #define for_each_pdev(domain, pdev) \
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index ecd6124d91..cc4ee3b83e 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -23,6 +23,14 @@
> #define LINUX_PCI_REGS_H
>
> /*
> + * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
> + * configuration space. PCI-X Mode 2 and PCIe devices have 4096 bytes of
> + * configuration space.
> + */
> +#define PCI_CFG_SPACE_SIZE 256
> +#define PCI_CFG_SPACE_EXP_SIZE 4096
> +
> +/*
> * Under PCI, each device has 256 bytes of configuration address space,
> * of which the first 64 bytes are standardized as follows:
> */
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> new file mode 100644
> index 0000000000..9f2864fb0c
> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,53 @@
> +#ifndef _XEN_VPCI_H_
> +#define _XEN_VPCI_H_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +#include <xen/list.h>
> +
> +typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
> + void *data);
> +
> +typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data);
> +
> +typedef int vpci_register_init_t(struct pci_dev *dev);
> +
> +#define REGISTER_VPCI_INIT(x) \
> + static vpci_register_init_t *const x##_entry \
> + __used_section(".data.vpci") = x
> +
> +/* Add vPCI handlers to device. */
> +int __must_check vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int __must_check vpci_add_register(struct vpci *vpci,
> + vpci_read_t *read_handler,
> + vpci_write_t *write_handler,
> + unsigned int offset, unsigned int size,
> + void *data);
> +int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
> + unsigned int size);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
> +void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> + uint32_t data);
> +
> +struct vpci {
> + /* List of vPCI handlers for a device. */
> + struct list_head handlers;
> + spinlock_t lock;
> +};
> +
> +#endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 2.13.5 (Apple Git-94)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel