All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] Support more Capability Structures and Device Specific
@ 2008-06-27  7:38 Yuji Shimada
  2008-06-27 10:14 ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
  2008-06-27 13:51 ` [PATCH][RFC] Support more Capability Structures and Device Specific Samuel Thibault
  0 siblings, 2 replies; 33+ messages in thread
From: Yuji Shimada @ 2008-06-27  7:38 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2496 bytes --]

I am submitting the patch which supports more Capability Structures
and Device Specific Registers for passthrough device.

In Xen 3.3 unstable, qemu-dm supports Configuration Header, MSI
Capability Structure, and MSI-X Capability Structure. But qemu-dm does
not support PCI Express Capability Structure, Device Specific
Registers, etc (writing them is ignored).

To support various I/O devices, I implemented following Capability
Structures and Device Specific Registers.

    * Configuration Header Type 0
        -> emulation.
           "emulation" does not mean no accessing real I/O device.
           Access real I/O device, but guest value and real value
           might be different.
    * PCI Express Capability Structure
        -> emulation.
    * PCI Power Management Capability Structure
        -> emulation.
    * Vital Product Data Capability Structure
        -> emulation (almost passthrough).
    * Vendor Specific Capability Structure
        -> emulation (almost passthrough).
    * Device Specific Register (exclude capability structures)
        -> passthrough.
           The device drivers in guest domain are allowed to access
           Device Specific Register. So various I/O device will work.

Currently MSI Capability Structure and MSI-X Capability Structure is
not implemented, and they are hidden from guest software. I disabled
MSI and MSI-X in qemu-dm temporary. I am implementing MSI Capability
Structure and merging current MSI routines. I will release the patch
if you agree with me.

MSI-X will be after MSI. I will be very happy if anyone can help me.

Other Capability Structures are hidden from guest software. To do
this, I change Next Capability Pointer's value to point only the
Capability Structure that need to be exported to guest software
(see emulate capabilities above). And some Capability Structures are 0
hardwired, and others are passthrough.

This patch removes "switch" statements for emulation, and introduces
table based emulation derived from pciback driver. You can implement
new Capability Structure by adding new table.
The other advantage of using this table is that you can easily change
the emulation policy of each field/bit by just simply modifying the
"emu_mask" value provided in each register table.
And for only special emulation or interacting with other components
(like hypervisor), you have to implement function corresponding to the
register.

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

--
Yuji Shimada

[-- Attachment #2: pci_config_passthrough.patch --]
[-- Type: application/octet-stream, Size: 75944 bytes --]

diff -r 926a366ca82f tools/ioemu/hw/pass-through.c
--- a/tools/ioemu/hw/pass-through.c	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pass-through.c	Fri Jun 27 11:58:26 2008 +0900
@@ -26,7 +26,7 @@
 #include "pass-through.h"
 #include "pci/header.h"
 #include "pci/pci.h"
-#include "pt-msi.h"
+//#include "pt-msi.h"
 
 extern FILE *logfile;
 
@@ -46,6 +46,498 @@ struct dpci_infos {
 
 } dpci_infos;
 
+/* prototype */
+static uint32_t pt_common_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_ptr_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_status_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_irqpin_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_bar_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_linkctrl2_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint8_t pt_reg_grp_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static uint8_t pt_msi_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static uint8_t pt_vendor_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static int pt_byte_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint8_t *valueu, uint8_t valid_mask);
+static int pt_word_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint16_t *value, uint16_t valid_mask);
+static int pt_long_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint32_t *value, uint32_t valid_mask);
+static int pt_bar_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint32_t *value, uint32_t valid_mask);
+static int pt_byte_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint8_t *value, uint8_t dev_value, uint8_t valid_mask);
+static int pt_word_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_long_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_cmd_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_bar_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_exp_rom_bar_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_pmcsr_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_devctrl_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_linkctrl_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_devctrl2_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_linkctrl2_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+
+/* Header Type0 reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_header0_tbl[] = {
+    /* Command reg */
+    {
+        .offset     = PCI_COMMAND,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xF880,
+        .emu_mask   = 0x0340,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_cmd_reg_write,
+    },
+    /* Capabilities Pointer reg */
+    {
+        .offset     = PCI_CAPABILITY_LIST,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Status reg */
+    /* use emulated Cap Ptr value to initialize, 
+     * so need to be declared after Cap Ptr reg 
+     */
+    {
+        .offset     = PCI_STATUS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x06FF,
+        .emu_mask   = 0x0010,
+        .init       = pt_status_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_word_reg_write,
+    },
+    /* Cache Line Size reg */
+    {
+        .offset     = PCI_CACHE_LINE_SIZE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Latency Timer reg */
+    {
+        .offset     = PCI_LATENCY_TIMER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Header Type reg */
+    {
+        .offset     = PCI_HEADER_TYPE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0x80,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Interrupt Line reg */
+    {
+        .offset     = PCI_INTERRUPT_LINE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Interrupt Pin reg */
+    {
+        .offset     = PCI_INTERRUPT_PIN,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_irqpin_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* BAR 0 reg */
+    /* mask of BAR need to be decided later, depends on IO/MEM type */
+    {
+        .offset     = PCI_BASE_ADDRESS_0,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 1 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_1,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 2 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_2,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 3 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_3,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 4 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_4,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 5 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_5,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* Expansion ROM BAR reg */
+    {
+        .offset     = PCI_ROM_ADDRESS,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x000007FE,
+        .emu_mask   = 0xFFFFF800,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_long_reg_read,
+        .u.dw.write = pt_exp_rom_bar_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Power Management Capability reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_pm_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Power Management Capabilities reg */
+    {
+        .offset     = PCI_CAP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFE8,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_word_reg_write,
+    },
+    /* PCI Power Management Control/Status reg */
+    {
+        .offset     = PCI_PM_CTRL,
+        .size       = 2,
+        .init_val   = 0x0008,
+        .ro_mask    = 0x60FC,
+        .emu_mask   = 0xFF0B,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_pmcsr_reg_write,
+    },
+    /* Data reg */
+    {
+        .offset     = PCI_PM_DATA_REGISTER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Vital Product Data Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_vpd_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Vendor Specific Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_vendor_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* PCI Express Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_pcie_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Device Capabilities reg */
+    {
+        .offset     = PCI_EXP_DEVCAP,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x1FFCFFFF,
+        .emu_mask   = 0x10000000,
+        .init       = pt_common_reg_init,
+        .u.dw.read  = pt_long_reg_read,
+        .u.dw.write = pt_long_reg_write,
+    },
+    /* Device Control reg */
+    {
+        .offset     = PCI_EXP_DEVCTL,
+        .size       = 2,
+        .init_val   = 0x2810,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_devctrl_reg_write,
+    },
+    /* Link Control reg */
+    {
+        .offset     = PCI_EXP_LNKCTL,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_linkctrl_reg_write,
+    },
+    /* Device Control 2 reg */
+    {
+        .offset     = 0x28,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_devctrl2_reg_write,
+    },
+    /* Link Control 2 reg */
+    {
+        .offset     = 0x30,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_linkctrl2_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_linkctrl2_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* emul reg group static infomation table */
+static const struct pt_reg_grp_info_tbl pt_emu_reg_grp_tbl[] = {
+    /* Header Type0 reg group */
+    {
+        .grp_id     = 0xFF,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x40,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_header0_tbl,
+    },
+    /* PCI PowerManagement Capability reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PM,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = PCI_PM_SIZEOF,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_pm_tbl,
+    },
+    /* AGP Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Vital Product Data Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_VPD,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_vpd_tbl,
+    },
+    /* Slot Identification reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SLOTID,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x04,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* MSI Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_MSI,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0xFF,
+        .size_init  = pt_msi_size_init,
+    },
+    /* PCI-X Capabilities List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PCIX,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x18,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Vendor Specific Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_VNDR,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = pt_vendor_size_init,
+        .emu_reg_tbl= pt_emu_reg_vendor_tbl,
+    },
+    /* SHPC Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_HOTPLUG,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SSVID,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* AGP 8x Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP3,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* PCI Express Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_EXP,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x3C,
+        .size_init  = pt_vendor_size_init,
+        .emu_reg_tbl= pt_emu_reg_pcie_tbl,
+    },
+    /* MSI-X Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_MSIX,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x0C,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    {
+        .grp_size = 0,
+    }, 
+};
+
 static int token_value(char *token)
 {
     return strtol(token, NULL, 16);
@@ -197,15 +689,15 @@ void pt_iomem_map(PCIDevice *d, int i, u
     assigned_device->bases[i].e_physbase = e_phys;
     assigned_device->bases[i].e_size= e_size;
 
-    PT_LOG("e_phys=%08x maddr=%08x type=%d len=%08x index=%d\n",
-        e_phys, assigned_device->bases[i].access.maddr, type, e_size, i);
+    PT_LOG("e_phys=%08x maddr=%lx type=%d len=%d index=%d first_map=%d\n",
+        e_phys, assigned_device->bases[i].access.maddr, 
+        type, e_size, i, first_map);
 
     if ( e_size == 0 )
         return;
 
     if ( !first_map )
     {
-        add_msix_mapping(assigned_device, i);
         /* Remove old mapping */
         ret = xc_domain_memory_mapping(xc_handle, domid,
                 old_ebase >> XC_PAGE_SHIFT,
@@ -219,18 +711,21 @@ void pt_iomem_map(PCIDevice *d, int i, u
         }
     }
 
-    /* Create new mapping */
-    ret = xc_domain_memory_mapping(xc_handle, domid,
-            assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
-            assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
-            (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
-            DPCI_ADD_MAPPING);
-    if ( ret != 0 )
-        PT_LOG("Error: create new mapping failed!\n");
-
-    ret = remove_msix_mapping(assigned_device, i);
-    if ( ret != 0 )
-        PT_LOG("Error: remove MSX-X mmio mapping failed!\n");
+    /* map only valid guest address (include 0) */
+    if (e_phys != -1)
+    {
+        /* Create new mapping */
+        ret = xc_domain_memory_mapping(xc_handle, domid,
+                assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
+                assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
+                (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
+                DPCI_ADD_MAPPING);
+
+        if ( ret != 0 )
+        {
+            PT_LOG("Error: create new mapping failed!\n");
+        }
+    }
 }
 
 /* Being called each time a pio region has been updated */
@@ -245,9 +740,9 @@ void pt_ioport_map(PCIDevice *d, int i,
     assigned_device->bases[i].e_physbase = e_phys;
     assigned_device->bases[i].e_size= e_size;
 
-    PT_LOG("e_phys=%04x pio_base=%04x len=%04x index=%d\n",
+    PT_LOG("e_phys=%04x pio_base=%04x len=%d index=%d first_map=%d\n",
         (uint16_t)e_phys, (uint16_t)assigned_device->bases[i].access.pio_base,
-        (uint16_t)e_size, i);
+        (uint16_t)e_size, i, first_map);
 
     if ( e_size == 0 )
         return;
@@ -265,13 +760,84 @@ void pt_ioport_map(PCIDevice *d, int i,
         }
     }
 
-    /* Create new mapping */
-    ret = xc_domain_ioport_mapping(xc_handle, domid, e_phys,
-                assigned_device->bases[i].access.pio_base, e_size,
-                DPCI_ADD_MAPPING);
-    if ( ret != 0 )
-        PT_LOG("Error: create new mapping failed!\n");
-
+    /* map only valid guest address (include 0) */
+    if (e_phys != -1)
+    {
+        /* Create new mapping */
+        ret = xc_domain_ioport_mapping(xc_handle, domid, e_phys,
+                    assigned_device->bases[i].access.pio_base, e_size,
+                    DPCI_ADD_MAPPING);
+        if ( ret != 0 )
+        {
+            PT_LOG("Error: create new mapping failed!\n");
+        }
+    }
+}
+
+/* find emulate register group entry */
+struct pt_reg_grp_tbl* pt_find_reg_grp(
+        struct pt_dev *ptdev, uint32_t address)
+{
+    struct pt_reg_grp_tbl* reg_grp_entry = NULL;
+
+    /* find register group entry */
+    list_for_each_entry(reg_grp_entry, &ptdev->pt_reg_grp_tbl_list, list)
+    {
+        /* check address */
+        if ((reg_grp_entry->base_offset <= address) &&
+            ((reg_grp_entry->base_offset + reg_grp_entry->size) > address))
+            goto out;
+    }
+    /* group entry not found */
+    reg_grp_entry = NULL;
+
+out:
+    return reg_grp_entry;
+}
+
+/* find emulate register entry */
+struct pt_reg_tbl* pt_find_reg(
+        struct pt_reg_grp_tbl* reg_grp, uint32_t address)
+{
+    struct pt_reg_tbl* reg_entry = NULL;
+    struct pt_reg_info_tbl* reg = NULL;
+    uint32_t real_offset = 0;
+
+    /* find register entry */
+    list_for_each_entry(reg_entry, &reg_grp->pt_reg_tbl_list, list)
+    {
+        reg = reg_entry->reg;
+        real_offset = (reg_grp->base_offset + reg->offset);
+        /* check address */
+        if ((real_offset <= address) && ((real_offset + reg->size) > address))
+            goto out;
+    }
+    /* register entry not found */
+    reg_entry = NULL;
+
+out:
+    return reg_entry;
+}
+
+/* get BAR index */
+static int pt_bar_offset_to_index(uint32_t offset)
+{
+    int index = 0;
+
+    /* check Exp ROM BAR */
+    if (offset == PCI_ROM_ADDRESS)
+    {
+        index = PCI_ROM_SLOT;
+        goto out;
+    }
+
+    /* calculate BAR index */
+    index = ((offset - PCI_BASE_ADDRESS_0) >> 2);
+    if (index >= PCI_NUM_REGIONS)
+        index = -1;
+
+out:
+    return index;
 }
 
 static void pt_pci_write_config(PCIDevice *d, uint32_t address, uint32_t val,
@@ -279,60 +845,258 @@ static void pt_pci_write_config(PCIDevic
 {
     struct pt_dev *assigned_device = (struct pt_dev *)d;
     struct pci_dev *pci_dev = assigned_device->pci_dev;
-
-#ifdef PT_DEBUG_PCI_CONFIG_ACCESS
-    PT_LOG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-       (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-#endif
-
-    /* Pre-write hooking */
-    switch ( address ) {
-    case 0x0C ... 0x3F:
-        pci_default_write_config(d, address, val, len);
-        return;
-    }
-
-    if ( pt_msi_write(assigned_device, address, val, len) )
-        return;
-
-    if ( pt_msix_write(assigned_device, address, val, len) )
-        return;
-
-    /* PCI config pass-through */
-    if (address == 0x4) {
-        switch (len){
-        case 1:
-            pci_write_byte(pci_dev, address, val);
-            break;
-        case 2:
-            pci_write_word(pci_dev, address, val);
-            break;
-        case 4:
-            pci_write_long(pci_dev, address, val);
-            break;
-        }
-    }
-
-    if (address == 0x4) {
-        /* Post-write hooking */
-        pci_default_write_config(d, address, val, len);
-    }
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_grp_info_tbl *reg_grp = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_reg_info_tbl *reg = NULL;
+    uint32_t find_addr = address;
+    uint32_t real_offset = 0;
+    uint32_t valid_mask = 0xFFFFFFFF;
+    uint32_t read_val = 0;
+    uint8_t *ptr_val = NULL;
+    int emul_len = 0;
+    int index = 0;
+    int ret = 0;
+
+    PT_LOG("write(%x.%x): address=%04x val=0x%08x len=%d\n",
+        (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
+
+    /* check offset range */
+    if (address >= 0xFF)
+    {
+        PT_LOG("Failed to write register with offset exceeding FFh. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check write size */
+    if ((len != 1) && (len != 2) && (len != 4))
+    {
+        PT_LOG("Failed to write register with invalid access length. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check offset alignment */
+    if (address & (len-1))
+    {
+        PT_LOG("Failed to write register with invalid access size alignment. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check unused BAR register */
+    index = pt_bar_offset_to_index(address);
+    if ((index >= 0) && !(val > 0 && val < PT_BAR_ALLF) &&
+        (assigned_device->bases[index].bar_flag == PT_BAR_FLAG_UNUSED))
+    {
+        PT_LOG("Guest attempt to set address to unused Base Address Register. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), 
+            (d->devfn & 0x7), address, len);
+    }
+
+    /* find register group entry */
+    reg_grp_entry = pt_find_reg_grp(assigned_device, address);
+    if (reg_grp_entry)
+    {
+        reg_grp = reg_grp_entry->reg_grp;
+        /* check 0 Hardwired register group */
+        if (reg_grp->grp_type == GRP_TYPE_HARDWIRED)
+        {
+            /* ignore silently */
+            PT_LOG("Access to 0 Hardwired register.\n");
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    switch (len) {
+    case 1:
+        read_val = pci_read_byte(pci_dev, address);
+        break;
+    case 2:
+        read_val = pci_read_word(pci_dev, address);
+        break;
+    case 4:
+        read_val = pci_read_long(pci_dev, address);
+        break;
+    }
+
+    /* check libpci error */
+    valid_mask = (0xFFFFFFFF >> ((4 - len) << 3));
+    if ((read_val & valid_mask) == valid_mask)
+    {
+        PT_LOG("libpci read error. No emulation. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+    
+    /* pass directly to libpci for passthrough type register group */
+    if (reg_grp_entry == NULL)
+        goto out;
+
+    /* adjust the write value to appropriate CFC-CFF window */
+    val <<= ((address & 3) << 3);
+    emul_len = len;
+
+    /* loop Guest request size */
+    while (0 < emul_len)
+    {
+        /* find register entry to be emulated */
+        reg_entry = pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry)
+        {
+            reg = reg_entry->reg;
+            real_offset = (reg_grp_entry->base_offset + reg->offset);
+            valid_mask = (0xFFFFFFFF >> ((4 - emul_len) << 3));
+            valid_mask <<= ((find_addr - real_offset) << 3);
+            ptr_val = ((uint8_t *)&val + (real_offset & 3));
+
+            /* do emulation depend on register size */
+            switch (reg->size) {
+            case 1:
+                /* emulate write to byte register */
+                if (reg->u.b.write)
+                    ret = reg->u.b.write(assigned_device, reg_entry,
+                               (uint8_t *)ptr_val, 
+                               (uint8_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint8_t)valid_mask);
+                break;
+            case 2:
+                /* emulate write to word register */
+                if (reg->u.w.write)
+                    ret = reg->u.w.write(assigned_device, reg_entry,
+                               (uint16_t *)ptr_val, 
+                               (uint16_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint16_t)valid_mask);
+                break;
+            case 4:
+                /* emulate write to double word register */
+                if (reg->u.dw.write)
+                    ret = reg->u.dw.write(assigned_device, reg_entry,
+                               (uint32_t *)ptr_val, 
+                               (uint32_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint32_t)valid_mask);
+                break;
+            }
+
+            /* write emulation error */
+            if (ret < 0)
+            {
+                /* exit I/O emulator */
+                PT_LOG("I/O emulator exit()\n");
+                exit(1);
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0)
+                find_addr = real_offset + reg->size;
+        }
+        else
+        {
+            /* nothing to do with passthrough type register, 
+             * continue to find next byte 
+             */
+            emul_len--;
+            find_addr++;
+        }
+    }
+    
+    /* need to shift back before passing them to libpci */
+    val >>= ((address & 3) << 3);
+
+out:
+    switch (len){
+    case 1:
+        pci_write_byte(pci_dev, address, val);
+        break;
+    case 2:
+        pci_write_word(pci_dev, address, val);
+        break;
+    case 4:
+        pci_write_long(pci_dev, address, val);
+        break;
+    }
+
+exit:
+    return;
 }
 
 static uint32_t pt_pci_read_config(PCIDevice *d, uint32_t address, int len)
 {
     struct pt_dev *assigned_device = (struct pt_dev *)d;
     struct pci_dev *pci_dev = assigned_device->pci_dev;
-    uint32_t val = 0xFF;
-
-    /* Pre-hooking */
-    switch ( address ) {
-    case 0x0C ... 0x3F:
-        val = pci_default_read_config(d, address, len);
+    uint32_t val = 0xFFFFFFFF;
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_grp_info_tbl *reg_grp = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_reg_info_tbl *reg = NULL;
+    uint32_t find_addr = address;
+    uint32_t real_offset = 0;
+    uint32_t valid_mask = 0xFFFFFFFF;
+    uint8_t *ptr_val = NULL;
+    int emul_len = 0;
+    int ret = 0;
+
+    PT_LOG("read(%x.%x): address=%04x len=%d\n",
+        (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, len);
+
+    /* check offset range */
+    if (address >= 0xFF)
+    {
+        PT_LOG("Failed to read register with offset exceeding FFh. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
         goto exit;
     }
 
-    switch ( len ) {
+    /* check read size */
+    if ((len != 1) && (len != 2) && (len != 4))
+    {
+        PT_LOG("Failed to read register with invalid access length. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check offset alignment */
+    if (address & (len-1))
+    {
+        PT_LOG("Failed to read register with invalid access size alignment. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* find register group entry */
+    reg_grp_entry = pt_find_reg_grp(assigned_device, address);
+    if (reg_grp_entry)
+    {
+        reg_grp = reg_grp_entry->reg_grp;
+        /* check 0 Hardwired register group */
+        if (reg_grp->grp_type == GRP_TYPE_HARDWIRED)
+        {
+            /* no need to emulate, just return 0 */
+            val = 0;
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    switch (len) {
     case 1:
         val = pci_read_byte(pci_dev, address);
         break;
@@ -344,15 +1108,92 @@ static uint32_t pt_pci_read_config(PCIDe
         break;
     }
 
-    pt_msi_read(assigned_device, address, len, &val);
-    pt_msix_read(assigned_device, address, len, &val);
+    /* check libpci error */
+    valid_mask = (0xFFFFFFFF >> ((4 - len) << 3));
+    if ((val & valid_mask) == valid_mask)
+    {
+        PT_LOG("libpci read error. No emulation. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* just return the I/O device register value for 
+     * passthrough type register group 
+     */
+    if (reg_grp_entry == NULL)
+        goto exit;
+
+    /* adjust the read value to appropriate CFC-CFF window */
+    val <<= ((address & 3) << 3);
+    emul_len = len;
+
+    /* loop Guest request size */
+    while (0 < emul_len)
+    {
+        /* find register entry to be emulated */
+        reg_entry = pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry)
+        {
+            reg = reg_entry->reg;
+            real_offset = (reg_grp_entry->base_offset + reg->offset);
+            valid_mask = (0xFFFFFFFF >> ((4 - emul_len) << 3));
+            valid_mask <<= ((find_addr - real_offset) << 3);
+            ptr_val = ((uint8_t *)&val + (real_offset & 3));
+
+            /* do emulation depend on register size */
+            switch (reg->size) {
+            case 1:
+                /* emulate read to byte register */
+                if (reg->u.b.read)
+                    ret = reg->u.b.read(assigned_device, reg_entry,
+                                        (uint8_t *)ptr_val, 
+                                        (uint8_t)valid_mask);
+                break;
+            case 2:
+                /* emulate read to word register */
+                if (reg->u.w.read)
+                    ret = reg->u.w.read(assigned_device, reg_entry,
+                                        (uint16_t *)ptr_val, 
+                                        (uint16_t)valid_mask);
+                break;
+            case 4:
+                /* emulate read to double word register */
+                if (reg->u.dw.read)
+                    ret = reg->u.dw.read(assigned_device, reg_entry,
+                                        (uint32_t *)ptr_val, 
+                                        (uint32_t)valid_mask);
+                break;
+            }
+
+            /* read emulation error */
+            if (ret < 0)
+            {
+                /* exit I/O emulator */
+                PT_LOG("I/O emulator exit()\n");
+                exit(1);
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0)
+                find_addr = real_offset + reg->size;
+        }
+        else
+        {
+            /* nothing to do with passthrough type register, 
+             * continue to find next byte 
+             */
+            emul_len--;
+            find_addr++;
+        }
+    }
+    
+    /* need to shift back before returning them to pci bus emulator */
+    val >>= ((address & 3) << 3);
+
 exit:
-
-#ifdef PT_DEBUG_PCI_CONFIG_ACCESS
-    PT_LOG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-       (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-#endif
-
     return val;
 }
 
@@ -488,11 +1329,879 @@ uint8_t find_cap_offset(struct pci_dev *
     return 0;
 }
 
+/* parse BAR */
+static int pt_bar_reg_parse(
+        struct pt_dev *ptdev, struct pt_reg_info_tbl *reg)
+{
+    PCIDevice *d = &ptdev->dev;
+    struct pt_region *region = NULL;
+    uint32_t bar_64 = (reg->offset - 4);
+    uint32_t bar_data = 0;
+    int bar_flag = PT_BAR_FLAG_UNUSED;
+    int index = 0;
+    int i;
+
+    /* set again the BAR config because it has been overwritten
+     * by pci_register_io_region()
+     */
+    for (i=reg->offset; i<(reg->offset + 4); i++)
+        d->config[i] = pci_read_byte(ptdev->pci_dev, i);
+
+    /* check 64bit BAR */
+    index = pt_bar_offset_to_index(reg->offset);
+    if ((index > 0) && (index < PCI_ROM_SLOT) &&
+        (d->config[bar_64] & PCI_BASE_ADDRESS_MEM_TYPE_64))
+    {
+        region = &ptdev->bases[index-1];
+        if (region->bar_flag != PT_BAR_FLAG_UPPER)
+        {
+            bar_flag = PT_BAR_FLAG_UPPER;
+            goto out;
+        }
+    }
+
+    /* check unused BAR */
+    bar_data = *((uint32_t*)(d->config + reg->offset));
+    if ((!bar_data) ||
+        ((reg->offset == PCI_ROM_ADDRESS) &&
+        !(d->config[reg->offset] & PCI_ROM_ADDRESS_ENABLE)))
+            goto out;
+
+    /* check BAR I/O indicator */
+    if (d->config[reg->offset] & PCI_BASE_ADDRESS_SPACE_IO)
+        bar_flag = PT_BAR_FLAG_IO;
+    else
+        bar_flag = PT_BAR_FLAG_MEM;
+
+out:
+    return bar_flag;
+}
+
+/* mapping BAR */
+static void pt_bar_mapping(struct pt_dev *ptdev, int io_enable, int mem_enable)
+{
+    PCIDevice *dev = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    struct pt_region *base = NULL;
+    uint32_t r_size = 0;
+    int ret = 0;
+    int i;
+
+    for (i=0; i<PCI_NUM_REGIONS; i++)
+    {
+        r = &dev->io_regions[i];
+
+        /* check valid region */
+        if (!r->size)
+            continue;
+
+        base = &ptdev->bases[i];
+        /* skip unused BAR or upper 64bit BAR */
+        if ((base->bar_flag == PT_BAR_FLAG_UNUSED) || 
+           (base->bar_flag == PT_BAR_FLAG_UPPER))
+               continue;
+
+        /* clear region address in case I/O Space or Memory Space disable */
+        if (((base->bar_flag == PT_BAR_FLAG_IO) && !io_enable ) ||
+            ((base->bar_flag == PT_BAR_FLAG_MEM) && !mem_enable ))
+            r->addr = -1;
+
+        /* align resource size (memory type only) */
+        r_size = r->size;
+        PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+        /* check overlapped address */
+        ret = pt_chk_bar_overlap(dev->bus, dev->devfn, r->addr, r_size);
+        if (ret > 0)
+        {
+            PT_LOG("Base Address[%d] is overlapped. "
+                "[Address:%08xh][Size:%04xh]\n",
+                i, r->addr, r_size);
+               continue;
+        }
+
+        /* check whether we need to update the mapping or not */
+        if (r->addr != ptdev->bases[i].e_physbase)
+        {
+            /* mapping BAR */
+            r->map_func((PCIDevice *)ptdev, i, r->addr, 
+                         r_size, r->type);
+        }
+    }
+
+    return;
+}
+
+/* initialize emulate register */
+static int pt_config_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_tbl *reg_grp,
+        struct pt_reg_info_tbl *reg)
+{
+    struct pt_reg_tbl *reg_entry;
+    uint32_t data = 0;
+    int err = 0;
+
+    /* allocate register entry */
+    reg_entry = qemu_mallocz(sizeof(struct pt_reg_tbl));
+    if (reg_entry == NULL)
+    {
+        PT_LOG("Failed to allocate memory.\n");
+        err = -1;
+        goto out;
+    }
+
+    /* initialize register entry */
+    reg_entry->reg = reg;
+    reg_entry->data = 0;
+
+    if (reg->init)
+    {
+        /* initialize emulate register */
+        data = reg->init(ptdev, reg_entry->reg,
+                        (reg_grp->base_offset + reg->offset));
+        if (data == PT_BAR_ALLF)
+        {
+            /* free unused BAR register entry */
+            free(reg_entry);
+            goto out;
+        }
+        /* set register value */
+        reg_entry->data = data;
+    }
+    /* list add register entry */
+    list_add_tail(&reg_entry->list, &reg_grp->pt_reg_tbl_list);
+
+out:
+    return err;
+}
+
+/* initialize emulate register group */
+static int pt_config_init(struct pt_dev *ptdev)
+{
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_info_tbl *reg_tbl = NULL;
+    uint32_t reg_grp_offset = 0;
+    int i, j, err = 0;
+
+    /* initialize register group list */
+    INIT_LIST_HEAD(&ptdev->pt_reg_grp_tbl_list);
+
+    /* initialize register group */
+    for (i=0; pt_emu_reg_grp_tbl[i].grp_size != 0; i++)
+    {
+        if (pt_emu_reg_grp_tbl[i].grp_id != 0xFF)
+        {
+            reg_grp_offset = (uint32_t)find_cap_offset(ptdev->pci_dev, 
+                                 pt_emu_reg_grp_tbl[i].grp_id);
+            if (!reg_grp_offset) 
+                continue;
+        }
+
+        /* allocate register group table */
+        reg_grp_entry = qemu_mallocz(sizeof(struct pt_reg_grp_tbl));
+        if (reg_grp_entry == NULL)
+        {
+            PT_LOG("Failed to allocate memory.\n");
+            err = -1;
+            goto out;
+        }
+
+        /* initialize register group entry */
+        INIT_LIST_HEAD(&reg_grp_entry->pt_reg_tbl_list);
+
+        /* need to declare here, to enable searching Cap Ptr reg 
+         * (which is in the same reg group) when initializing Status reg 
+         */
+        list_add_tail(&reg_grp_entry->list, &ptdev->pt_reg_grp_tbl_list);
+
+        reg_grp_entry->base_offset = reg_grp_offset;
+        reg_grp_entry->reg_grp = 
+                (struct pt_reg_grp_info_tbl*)&pt_emu_reg_grp_tbl[i];
+        if (pt_emu_reg_grp_tbl[i].size_init)
+        {
+            /* get register group size */
+            reg_grp_entry->size = pt_emu_reg_grp_tbl[i].size_init(ptdev,
+                                      reg_grp_entry->reg_grp, 
+                                      reg_grp_offset);
+        }
+
+        if (pt_emu_reg_grp_tbl[i].grp_type == GRP_TYPE_EMU)
+        {
+            if (pt_emu_reg_grp_tbl[i].emu_reg_tbl)
+            {
+                reg_tbl = pt_emu_reg_grp_tbl[i].emu_reg_tbl;
+                /* initialize capability register */
+                for (j=0; reg_tbl->size != 0; j++, reg_tbl++)
+                {
+                    /* initialize capability register */
+                    err = pt_config_reg_init(ptdev, reg_grp_entry, reg_tbl);
+                    if (err < 0)
+                        goto out;
+                }
+            }
+        }
+        reg_grp_offset = 0;
+    }
+
+out:
+    return err;
+}
+
+/* initialize common register value */
+static uint32_t pt_common_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    return reg->init_val;
+}
+
+/* initialize Capabilities Pointer or Next Pointer register */
+static uint32_t pt_ptr_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    uint32_t reg_field = (uint32_t)ptdev->dev.config[real_offset];
+    int i;
+
+    /* find capability offset */
+    while (reg_field)
+    {
+        for (i=0; pt_emu_reg_grp_tbl[i].grp_size != 0; i++)
+        {
+            /* check whether the next capability 
+             * should be exported to guest or not 
+             */
+            if (pt_emu_reg_grp_tbl[i].grp_id == ptdev->dev.config[reg_field])
+            {
+                if (pt_emu_reg_grp_tbl[i].grp_type == GRP_TYPE_EMU)
+                    goto out;
+                /* ignore the 0 hardwired capability, find next one */
+                break;
+            }
+        }
+        /* next capability */
+        reg_field = (uint32_t)ptdev->dev.config[reg_field + 1];
+    }
+
+out:
+    return reg_field;
+}
+
+/* initialize Status register */
+static uint32_t pt_status_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    int reg_field = 0;
+
+    /* find Header register group */
+    reg_grp_entry = pt_find_reg_grp(ptdev, PCI_CAPABILITY_LIST);
+    if (reg_grp_entry)
+    {
+        /* find Capabilities Pointer register */
+        reg_entry = pt_find_reg(reg_grp_entry, PCI_CAPABILITY_LIST);
+        if (reg_entry)
+        {
+            /* check Capabilities Pointer register */
+            if (reg_entry->data)
+                reg_field |= PCI_STATUS_CAP_LIST;
+            else
+                reg_field &= ~PCI_STATUS_CAP_LIST;
+        }
+        else
+        {
+            /* exit I/O emulator */
+            PT_LOG("I/O emulator exit()\n");
+            exit(1);
+        }
+    }
+    else
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    return reg_field;
+}
+
+/* initialize Interrupt Pin register */
+static uint32_t pt_irqpin_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+
+    /* set Interrupt Pin register to use INTA# if it has */
+    if (ptdev->dev.config[real_offset])
+        reg_field = 0x01;
+
+    return reg_field;
+}
+
+/* initialize BAR */
+static uint32_t pt_bar_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+    int index;
+
+    /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    /* set initial guest physical base address to -1 */
+    ptdev->bases[index].e_physbase = -1;
+
+    /* set BAR flag */
+    ptdev->bases[index].bar_flag = pt_bar_reg_parse(ptdev, reg);
+    if (ptdev->bases[index].bar_flag == PT_BAR_FLAG_UNUSED)
+        reg_field = PT_BAR_ALLF;
+
+    return reg_field;
+}
+
+/* initialize Link Control 2 register */
+static uint32_t pt_linkctrl2_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+
+    /* set Supported Link Speed */
+    reg_field |= 
+        (0x0F & 
+         ptdev->dev.config[(real_offset - reg->offset) + PCI_EXP_LNKCAP]);
+
+    return reg_field;
+}
+
+/* get register group size */
+static uint8_t pt_reg_grp_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    return grp_reg->grp_size;
+}
+
+/* get MSI Capability Structure register group size */
+static uint8_t pt_msi_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    PCIDevice *d = &ptdev->dev;
+    uint16_t msg_ctrl = 
+        *((uint16_t*)(d->config + (base_offset + PCI_MSI_FLAGS)));
+    uint8_t msi_size = 0;
+
+    /* check 64 bit address capable & Per-vector masking capable */
+    switch (msg_ctrl & (PCI_MSI_FLAGS_MASK_BIT | PCI_MSI_FLAGS_64BIT))
+    {
+    case 0x0000:
+        msi_size = 0x0A;
+        break;
+    case PCI_MSI_FLAGS_64BIT:
+        msi_size = 0x14;
+        break;
+    case PCI_MSI_FLAGS_MASK_BIT:
+        msi_size = 0x0E;
+        break;
+    case (PCI_MSI_FLAGS_MASK_BIT | PCI_MSI_FLAGS_64BIT):
+        msi_size = 0x18;
+        break;
+    }
+
+    return msi_size;
+}
+
+/* get Vendor Specific Capability Structure register group size */
+static uint8_t pt_vendor_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    return ptdev->dev.config[base_offset + 0x02];
+}
+
+/* read byte size emulate register */
+static int pt_byte_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint8_t *value, uint8_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint8_t valid_emu_mask = 0;
+
+    /* emulate byte register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+    return 0;
+}
+
+/* read word size emulate register */
+static int pt_word_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint16_t *value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+
+    /* emulate word register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+    return 0;
+}
+
+/* read long size emulate register */
+static int pt_long_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint32_t *value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+
+    /* emulate long register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+   return 0;
+}
+
+/* read BAR */
+static int pt_bar_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint32_t *value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    int index;
+
+    /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    /* set emulate mask depend on BAR flag */
+    switch (ptdev->bases[index].bar_flag)
+    {
+    case PT_BAR_FLAG_MEM:
+        bar_emu_mask = PT_BAR_MEM_EMU_MASK;
+        break;
+    case PT_BAR_FLAG_IO:
+        bar_emu_mask = PT_BAR_IO_EMU_MASK;
+        break;
+    case PT_BAR_FLAG_UPPER:
+        *value = 0;
+        goto out;
+    default:
+        break;
+    }
+
+    /* emulate BAR */
+    valid_emu_mask = bar_emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+out:
+   return 0;
+}
+
+/* write byte size emulate register */
+static int pt_byte_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint8_t *value, uint8_t dev_value, uint8_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint8_t writable_mask = 0;
+    uint8_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write word size emulate register */
+static int pt_word_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write long size emulate register */
+static int pt_long_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Command register */
+static int pt_cmd_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t wr_value = *value;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) | (dev_value & ~throughable_mask));
+
+    /* mapping BAR */
+    pt_bar_mapping(ptdev, wr_value & PCI_COMMAND_IO, 
+                          wr_value & PCI_COMMAND_MEMORY);
+
+    return 0;
+}
+
+/* write BAR */
+static int pt_bar_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_region *base = NULL;
+    PCIDevice *d = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+    uint32_t new_addr, last_addr;
+    uint32_t prev_offset;
+    uint32_t r_size = 0;
+    int index = 0;
+
+   /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    r = &d->io_regions[index];
+    r_size = r->size;
+    base = &ptdev->bases[index];
+    /* align resource size (memory type only) */
+    PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+    /* check guest write value */
+    if (*value == PT_BAR_ALLF)
+    {
+        /* set register with resource size alligned to page size */
+        cfg_entry->data = ~(r_size - 1);
+        /* avoid writing ALL F to I/O device register */
+        *value = dev_value;
+    }
+    else
+    {
+        /* set emulate mask and read-only mask depend on BAR flag */
+        switch (ptdev->bases[index].bar_flag)
+        {
+        case PT_BAR_FLAG_MEM:
+            bar_emu_mask = PT_BAR_MEM_EMU_MASK;
+            bar_ro_mask = PT_BAR_MEM_RO_MASK;
+            break;
+        case PT_BAR_FLAG_IO:
+            new_addr = *value;
+            last_addr = new_addr + r_size - 1;
+            /* check 64K range */
+            if (last_addr <= new_addr || !new_addr || last_addr >= 0x10000)
+            {
+                PT_LOG("Guest attempt to set Base Address over the 64KB. "
+                    "[%02x:%02x.%x][Offset:%02xh][Range:%08xh-%08xh]\n",
+                    pci_bus_num(d->bus), 
+                    ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+                    reg->offset, new_addr, last_addr);
+                /* just remove mapping */
+                r->addr = -1;
+                goto exit;
+            }
+            bar_emu_mask = PT_BAR_IO_EMU_MASK;
+            bar_ro_mask = PT_BAR_IO_RO_MASK;
+            break;
+        case PT_BAR_FLAG_UPPER:
+            if (*value)
+            {
+                PT_LOG("Guest attempt to set high MMIO Base Address. "
+                   "Ignore mapping. "
+                   "[%02x:%02x.%x][Offset:%02xh][High Address:%08xh]\n",
+                    pci_bus_num(d->bus), 
+                    ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+                    reg->offset, *value);
+                /* clear lower address */
+                d->io_regions[index-1].addr = -1;
+            }
+            else
+            {
+                /* find lower 32bit BAR */
+                prev_offset = (reg->offset - 4);
+                reg_grp_entry = pt_find_reg_grp(ptdev, prev_offset);
+                if (reg_grp_entry)
+                {
+                    reg_entry = pt_find_reg(reg_grp_entry, prev_offset);
+                    if (reg_entry)
+                        /* restore lower address */
+                        d->io_regions[index-1].addr = reg_entry->data;
+                    else
+                        return -1;
+                }
+                else
+                    return -1;
+            }
+            cfg_entry->data = 0;
+            r->addr = -1;
+            goto exit;
+        }
+
+        /* modify emulate register */
+        writable_mask = bar_emu_mask & ~bar_ro_mask & valid_mask;
+        cfg_entry->data = ((*value & writable_mask) |
+                           (cfg_entry->data & ~writable_mask));
+        /* update the corresponding virtual region address */
+        r->addr = cfg_entry->data;
+
+        /* create value for writing to I/O device register */
+        throughable_mask = ~bar_emu_mask & valid_mask;
+        *value = ((*value & throughable_mask) |
+                  (dev_value & ~throughable_mask));
+    }
+
+exit:
+    return 0;
+}
+
+/* write Exp ROM BAR */
+static int pt_exp_rom_bar_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    struct pt_region *base = NULL;
+    PCIDevice *d = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t r_size = 0;
+
+    r = &d->io_regions[PCI_ROM_SLOT];
+    r_size = r->size;
+    base = &ptdev->bases[PCI_ROM_SLOT];
+    /* align memory type resource size */
+    PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+    /* check guest write value */
+    if (*value == PT_BAR_ALLF)
+    {
+        /* set register with resource size alligned to page size */
+        cfg_entry->data = ~(r_size - 1);
+        /* avoid writing ALL F to I/O device register */
+        *value = dev_value;
+    }
+    else
+    {
+        /* modify emulate register */
+        writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+        cfg_entry->data = ((*value & writable_mask) |
+                           (cfg_entry->data & ~writable_mask));
+        /* update the corresponding virtual region address */
+        r->addr = cfg_entry->data;
+
+        /* create value for writing to I/O device register */
+        throughable_mask = ~reg->emu_mask & valid_mask;
+        *value = ((*value & throughable_mask) |
+                  (dev_value & ~throughable_mask));
+    }
+
+    return 0;
+}
+
+/* write Power Management Control/Status register */
+static int pt_pmcsr_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t pmcsr_mask = (PCI_PM_CTRL_PME_ENABLE | 
+                           PCI_PM_CTRL_DATA_SEL_MASK |
+                           PCI_PM_CTRL_PME_STATUS);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~pmcsr_mask;
+    /* ignore it when the requested state neither D3 nor D0 */
+    if (((*value & PCI_PM_CTRL_STATE_MASK) != PCI_PM_CTRL_STATE_MASK) &&
+        ((*value & PCI_PM_CTRL_STATE_MASK) != 0))
+        writable_mask &= ~PCI_PM_CTRL_STATE_MASK;
+
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Device Control register */
+static int pt_devctrl_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t devctrl_mask = (PCI_EXP_DEVCTL_AUX_PME | 0x8000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~devctrl_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Link Control register */
+static int pt_linkctrl_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t linkctrl_mask = (PCI_EXP_LNKCTL_ASPM | 0x04 |
+                              PCI_EXP_LNKCTL_DISABLE |
+                              PCI_EXP_LNKCTL_RETRAIN | 
+                              0x0400 | 0x0800 | 0xF000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~linkctrl_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Device Control2 register */
+static int pt_devctrl2_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t devctrl2_mask = 0xFFE0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~devctrl2_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Link Control2 register */
+static int pt_linkctrl2_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t linkctrl2_mask = (0x0040 | 0xE000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & 
+                    ~linkctrl2_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
 struct pt_dev * register_real_device(PCIBus *e_bus,
         const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev,
         uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access)
 {
-    int rc = -1, i, pos;
+    int rc = -1, i;
     struct pt_dev *assigned_device = NULL;
     struct pci_dev *pci_dev;
     uint8_t e_device, e_intx;
@@ -539,7 +2248,6 @@ struct pt_dev * register_real_device(PCI
         dpci_infos.php_devs[PCI_TO_PHP_SLOT(free_pci_slot)].pt_dev = assigned_device;
 
     assigned_device->pci_dev = pci_dev;
-
 
     /* Assign device */
     machine_bdf.reg = 0;
@@ -554,18 +2262,22 @@ struct pt_dev * register_real_device(PCI
     for ( i = 0; i < PCI_CONFIG_SIZE; i++ )
         assigned_device->dev.config[i] = pci_read_byte(pci_dev, i);
 
-    if ( (pos = find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) )
-        pt_msi_init(assigned_device, pos);
-
-    if ( (pos = find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) )
-        pt_msix_init(assigned_device, pos);
-
     /* Handle real device's MMIO/PIO BARs */
     pt_register_regions(assigned_device);
 
+    /* reinitialize each config register to be emulated */
+    rc = pt_config_init(assigned_device);
+    if ( rc < 0 ) {
+        return NULL;
+    }
+
     /* Bind interrupt */
+    if (!assigned_device->dev.config[0x3d])
+        goto out;
+
     e_device = (assigned_device->dev.devfn >> 3) & 0x1f;
-    e_intx = assigned_device->dev.config[0x3d]-1;
+    /* fix virtual interrupt pin to INTA# */
+    e_intx = 0;
 
     if ( PT_MACHINE_IRQ_AUTO == machine_irq )
     {
@@ -602,6 +2314,7 @@ struct pt_dev * register_real_device(PCI
             *(uint16_t *)(&assigned_device->dev.config[0x04]));
     }
 
+out:
     PT_LOG("Real physical device %02x:%02x.%x registered successfuly!\n", 
         r_bus, r_dev, r_func);
 
diff -r 926a366ca82f tools/ioemu/hw/pass-through.h
--- a/tools/ioemu/hw/pass-through.h	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pass-through.h	Fri Jun 27 11:58:26 2008 +0900
@@ -21,6 +21,7 @@
 #include "vl.h"
 #include "pci/header.h"
 #include "pci/pci.h"
+#include "list.h"
 
 /* Log acesss */
 #define PT_LOGGING_ENABLED
@@ -42,6 +43,38 @@
 #define PCI_EXP_DEVCAP_FLR      (1 << 28)
 #define PCI_EXP_DEVCTL_FLR      (1 << 15)
 #define PCI_BAR_ENTRIES         (6)
+
+/* because the current version of libpci (2.2.0) doesn't define these ID,
+ * so we define Capability ID here.
+ */
+/* SHPC Capability List Item reg group */
+#define PCI_CAP_ID_HOTPLUG      0x0C
+/* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+#define PCI_CAP_ID_SSVID        0x0D
+/* interrupt masking & reporting supported */
+#define PCI_MSI_FLAGS_MASK_BIT  0x0100
+
+#define PT_BAR_ALLF             0xFFFFFFFF      /* BAR mask */
+#define PT_BAR_MEM_RO_MASK      0x0000000F      /* BAR ReadOnly mask(Memory) */
+#define PT_BAR_MEM_EMU_MASK     0xFFFFFFF0      /* BAR emul mask(Memory) */
+#define PT_BAR_IO_RO_MASK       0x00000003      /* BAR ReadOnly mask(I/O) */
+#define PT_BAR_IO_EMU_MASK      0xFFFFFFFC      /* BAR emul mask(I/O) */
+enum {
+    PT_BAR_FLAG_MEM = 0,                        /* Memory type BAR */
+    PT_BAR_FLAG_IO,                             /* I/O type BAR */
+    PT_BAR_FLAG_UPPER,                          /* upper 64bit BAR */
+    PT_BAR_FLAG_UNUSED,                         /* unused BAR */
+};
+enum {
+    GRP_TYPE_HARDWIRED = 0,                     /* 0 Hardwired reg group */
+    GRP_TYPE_EMU,                               /* emul reg group */
+};
+
+#define PT_GET_EMUL_SIZE(flag, r_size) do { \
+    if (flag == PT_BAR_FLAG_MEM) {\
+        r_size = (((r_size) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)); \
+    }\
+} while(0)
 
 struct pt_region {
     /* Virtual phys base & size */
@@ -49,11 +82,13 @@ struct pt_region {
     uint32_t e_size;
     /* Index of region in qemu */
     uint32_t memory_index;
+    /* BAR flag */
+    uint32_t bar_flag;
     /* Translation of the emulated address */
     union {
-        uint32_t maddr;
-        uint32_t pio_base;
-        uint32_t u;
+        uint64_t maddr;
+        uint64_t pio_base;
+        uint64_t u;
     } access;
 };
 
@@ -89,8 +124,9 @@ struct pt_msix_info {
 */
 struct pt_dev {
     PCIDevice dev;
-    struct pci_dev *pci_dev;                     /* libpci struct */
+    struct pci_dev *pci_dev;                    /* libpci struct */
     struct pt_region bases[PCI_NUM_REGIONS];    /* Access regions */
+    struct list_head pt_reg_grp_tbl_list;       /* emul reg group list */
     struct pt_msi_info *msi;                    /* MSI virtualization */
     struct pt_msix_info *msix;                  /* MSI-X virtualization */
 };
@@ -113,5 +149,121 @@ struct pci_config_cf8 {
 
 int pt_init(PCIBus * e_bus, char * direct_pci);
 
+/* emul reg group management table */
+struct pt_reg_grp_tbl {
+    /* emul reg group list */
+    struct list_head list;
+    /* emul reg group info table */
+    struct pt_reg_grp_info_tbl *reg_grp;
+    /* emul reg group base offset */
+    uint32_t base_offset;
+    /* emul reg group size */
+    uint8_t size;
+    /* emul reg management table list */
+    struct list_head pt_reg_tbl_list;
+};
+
+/* emul reg group size initialize method */
+typedef uint8_t (*pt_reg_size_init) (struct pt_dev *ptdev, 
+                                     struct pt_reg_grp_info_tbl *grp_reg, 
+                                     uint32_t base_offset);
+/* emul reg group infomation table */
+struct pt_reg_grp_info_tbl {
+    /* emul reg group ID */
+    uint8_t grp_id;
+    /* emul reg group type */
+    uint8_t grp_type;
+    /* emul reg group size */
+    uint8_t grp_size;
+    /* emul reg get size method */
+    pt_reg_size_init size_init;
+    /* emul reg info table */
+    struct pt_reg_info_tbl *emu_reg_tbl;
+};
+
+/* emul reg management table */
+struct pt_reg_tbl {
+    /* emul reg table list */
+    struct list_head list;
+    /* emul reg info table */
+    struct pt_reg_info_tbl *reg;
+    /* emul reg value */
+    uint32_t data;
+};
+
+/* emul reg initialize method */
+typedef uint32_t (*conf_reg_init) (struct pt_dev *ptdev, 
+                                   struct pt_reg_info_tbl *reg, 
+                                   uint32_t real_offset);
+/* emul reg long write method */
+typedef int (*conf_dword_write) (struct pt_dev *ptdev,
+                                 struct pt_reg_tbl *cfg_entry, 
+                                 uint32_t *value, 
+                                 uint32_t dev_value,
+                                 uint32_t valid_mask);
+/* emul reg word write method */
+typedef int (*conf_word_write) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint16_t *value, 
+                                uint16_t dev_value,
+                                uint16_t valid_mask);
+/* emul reg byte write method */
+typedef int (*conf_byte_write) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint8_t *value, 
+                                uint8_t dev_value,
+                                uint8_t valid_mask);
+/* emul reg long read methods */
+typedef int (*conf_dword_read) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint32_t *value,
+                                uint32_t valid_mask);
+/* emul reg word read method */
+typedef int (*conf_word_read) (struct pt_dev *ptdev,
+                               struct pt_reg_tbl *cfg_entry, 
+                               uint16_t *value,
+                               uint16_t valid_mask);
+/* emul reg byte read method */
+typedef int (*conf_byte_read) (struct pt_dev *ptdev,
+                               struct pt_reg_tbl *cfg_entry, 
+                               uint8_t *value,
+                               uint8_t valid_mask);
+
+/* emul reg infomation table */
+struct pt_reg_info_tbl {
+    /* reg relative offset */
+    uint32_t offset;
+    /* reg size */
+    uint32_t size;
+    /* reg initial value */
+    uint32_t init_val;
+    /* reg read only field mask (ON:RO/ROS, OFF:other) */
+    uint32_t ro_mask;
+    /* reg emulate field mask (ON:emu, OFF:passthrough) */
+    uint32_t emu_mask;
+    /* emul reg initialize method */
+    conf_reg_init init;
+    union {
+        struct {
+            /* emul reg long write method */
+            conf_dword_write write;
+            /* emul reg long read method */
+            conf_dword_read read;
+        } dw;
+        struct {
+            /* emul reg word write method */
+            conf_word_write write;
+            /* emul reg word read method */
+            conf_word_read read;
+        } w;
+        struct {
+            /* emul reg byte write method */
+            conf_byte_write write;
+            /* emul reg byte read method */
+            conf_byte_read read;
+        } b;
+    } u;
+};
+
 #endif /* __PASSTHROUGH_H__ */
 
diff -r 926a366ca82f tools/ioemu/hw/pci.c
--- a/tools/ioemu/hw/pci.c	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pci.c	Fri Jun 27 11:58:26 2008 +0900
@@ -641,3 +641,31 @@ PCIBus *pci_bridge_init(PCIBus *bus, int
     s->bus = pci_register_secondary_bus(&s->dev, map_irq);
     return s->bus;
 }
+
+int pt_chk_bar_overlap(PCIBus *bus, int devfn, uint32_t addr, uint32_t size)
+{
+    PCIDevice *devices = (PCIDevice *)bus->devices;
+    PCIIORegion *r;
+    int ret = 0;
+    int i, j;
+
+    /* check Overlapped to Base Address */
+    for (i=0; i<256; i++, devices++)
+    {
+        if ((devices == NULL) || (devices->devfn == devfn))
+            continue;
+
+        for (j=0; j<PCI_NUM_REGIONS; j++)
+        {
+            r = &devices->io_regions[j];
+            if ((addr < (r->addr + r->size)) && ((addr + size) > r->addr))
+            {
+                ret = 1;
+                goto out;
+            }
+        }
+    }
+
+out:
+    return ret;
+}
diff -r 926a366ca82f tools/ioemu/list.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/list.h	Fri Jun 27 11:58:26 2008 +0900
@@ -0,0 +1,89 @@
+#ifndef _IOEMU_LIST_H
+#define _IOEMU_LIST_H
+/* Taken from Linux kernel code, but de-kernelized for userspace. */
+#include <stddef.h>
+
+/*
+ * These are non-NULL pointers that will result in page faults
+ * under normal circumstances, used to verify that nobody uses
+ * non-initialized list entries.
+ */
+#define LIST_POISON1  ((void *) 0x00100100)
+#define LIST_POISON2  ((void *) 0x00200200)
+
+#define container_of(ptr, type, member) ({                \
+        typeof( ((type *)0)->member ) *__mptr = (ptr);    \
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+
+/*
+ * Simple doubly linked list implementation.
+ *
+ * Some of the internal functions ("__xxx") are useful when
+ * manipulating whole lists rather than single entries, as
+ * sometimes we already know the next/prev entries and we can
+ * generate better code by using them directly rather than
+ * using the generic single-entry routines.
+ */
+
+struct list_head {
+    struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define INIT_LIST_HEAD(ptr) do { \
+    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
+} while (0)
+
+
+/*
+ * Insert a new entry between two known consecutive entries. 
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_add(struct list_head *new,
+                  struct list_head *prev,
+                  struct list_head *next)
+{
+    next->prev = new;
+    new->next = next;
+    new->prev = prev;
+    prev->next = new;
+}
+
+/**
+ * list_add_tail - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head.
+ * This is useful for implementing queues.
+ */
+static inline void list_add_tail(struct list_head *new, 
+                                 struct list_head *head)
+{
+    __list_add(new, head->prev, head);
+}
+
+/**
+ * list_entry - get the struct for this entry
+ * @ptr:    the &struct list_head pointer.
+ * @type:   the type of the struct this is embedded in.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_entry(ptr, type, member)  \
+    container_of(ptr, type, member)
+
+/**
+ * list_for_each_entry - iterate over list of given type
+ * @pos:    the type * to use as a loop counter.
+ * @head:   the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry(pos, head, member)                    \
+    for (pos = list_entry((head)->next, typeof(*pos), member);    \
+         &pos->member != (head);                                  \
+         pos = list_entry(pos->member.next, typeof(*pos), member))
+
+#endif
diff -r 926a366ca82f tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/vl.h	Fri Jun 27 11:58:26 2008 +0900
@@ -832,6 +832,8 @@ void pci_register_io_region(PCIDevice *p
                             uint32_t size, int type, 
                             PCIMapIORegionFunc *map_func);
 
+int pt_chk_bar_overlap(PCIBus *bus, int devfn, uint32_t addr, uint32_t size);
+
 void pci_set_irq(PCIDevice *pci_dev, int irq_num, int level);
 
 uint32_t pci_default_read_config(PCIDevice *d, 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27  7:38 [PATCH][RFC] Support more Capability Structures and Device Specific Yuji Shimada
@ 2008-06-27 10:14 ` Dong, Eddie
  2008-06-27 10:19   ` Keir Fraser
  2008-06-27 13:51 ` [PATCH][RFC] Support more Capability Structures and Device Specific Samuel Thibault
  1 sibling, 1 reply; 33+ messages in thread
From: Dong, Eddie @ 2008-06-27 10:14 UTC (permalink / raw)
  To: Yuji Shimada, xen-devel; +Cc: Dong, Eddie

[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

Yuji:
	We have a discussion in xen summit for PCI CFGS emulation, are
you in the summit too?
	Here is the slide for the discussion, we can coordinate.
Thx, eddie

Yuji Shimada wrote:
> I am submitting the patch which supports more Capability
> Structures and Device Specific Registers for passthrough
> device. 
> 
> In Xen 3.3 unstable, qemu-dm supports Configuration
> Header, MSI Capability Structure, and MSI-X Capability
> Structure. But qemu-dm does not support PCI Express
> Capability Structure, Device Specific Registers, etc
> (writing them is ignored). 
> 
> To support various I/O devices, I implemented following
> Capability Structures and Device Specific Registers.
> 
>     * Configuration Header Type 0
>         -> emulation.
>            "emulation" does not mean no accessing real
>            I/O device. Access real I/O device, but guest
>            value and real value might be different.
>     * PCI Express Capability Structure
>         -> emulation.
>     * PCI Power Management Capability Structure
>         -> emulation.
>     * Vital Product Data Capability Structure
>         -> emulation (almost passthrough).
>     * Vendor Specific Capability Structure
>         -> emulation (almost passthrough).
>     * Device Specific Register (exclude capability
>         structures) -> passthrough.
>            The device drivers in guest domain are allowed
>            to access Device Specific Register. So various
> I/O device will work. 
> 
> Currently MSI Capability Structure and MSI-X Capability
> Structure is not implemented, and they are hidden from
> guest software. I disabled MSI and MSI-X in qemu-dm
> temporary. I am implementing MSI Capability Structure and
> merging current MSI routines. I will release the patch if
> you agree with me. 
> 
> MSI-X will be after MSI. I will be very happy if anyone
> can help me. 
> 
> Other Capability Structures are hidden from guest
> software. To do this, I change Next Capability Pointer's
> value to point only the Capability Structure that need to
> be exported to guest software (see emulate capabilities
> above). And some Capability Structures are 0 hardwired,
> and others are passthrough. 
> 
> This patch removes "switch" statements for emulation, and
> introduces table based emulation derived from pciback
> driver. You can implement new Capability Structure by
> adding new table. 
> The other advantage of using this table is that you can
> easily change the emulation policy of each field/bit by
> just simply modifying the "emu_mask" value provided in
> each register table. 
> And for only special emulation or interacting with other
> components (like hypervisor), you have to implement
> function corresponding to the register.
> 
> Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>


[-- Attachment #2: Xen - VT-D Enhance.pdf --]
[-- Type: application/octet-stream, Size: 299576 bytes --]

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 10:14 ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
@ 2008-06-27 10:19   ` Keir Fraser
  2008-06-27 10:25     ` Dong, Eddie
  2008-06-27 13:27     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Ian Jackson
  0 siblings, 2 replies; 33+ messages in thread
From: Keir Fraser @ 2008-06-27 10:19 UTC (permalink / raw)
  To: Dong, Eddie, Yuji Shimada, xen-devel; +Cc: Ian Jackson

What do you think of Yuji's patch? One thing to consider is that perhaps the
patch should be against the upstream qemu merge now. But I'm not sure that
PCI passthrough is even supported yet in that tree (Ian?).

 -- Keir

On 27/6/08 11:14, "Dong, Eddie" <eddie.dong@intel.com> wrote:

> Yuji:
> We have a discussion in xen summit for PCI CFGS emulation, are
> you in the summit too?
> Here is the slide for the discussion, we can coordinate.
> Thx, eddie
> 
> Yuji Shimada wrote:
>> I am submitting the patch which supports more Capability
>> Structures and Device Specific Registers for passthrough
>> device. 
>> 
>> In Xen 3.3 unstable, qemu-dm supports Configuration
>> Header, MSI Capability Structure, and MSI-X Capability
>> Structure. But qemu-dm does not support PCI Express
>> Capability Structure, Device Specific Registers, etc
>> (writing them is ignored).
>> 
>> To support various I/O devices, I implemented following
>> Capability Structures and Device Specific Registers.
>> 
>>     * Configuration Header Type 0
>>         -> emulation.
>>            "emulation" does not mean no accessing real
>>            I/O device. Access real I/O device, but guest
>>            value and real value might be different.
>>     * PCI Express Capability Structure
>>         -> emulation.
>>     * PCI Power Management Capability Structure
>>         -> emulation.
>>     * Vital Product Data Capability Structure
>>         -> emulation (almost passthrough).
>>     * Vendor Specific Capability Structure
>>         -> emulation (almost passthrough).
>>     * Device Specific Register (exclude capability
>>         structures) -> passthrough.
>>            The device drivers in guest domain are allowed
>>            to access Device Specific Register. So various
>> I/O device will work.
>> 
>> Currently MSI Capability Structure and MSI-X Capability
>> Structure is not implemented, and they are hidden from
>> guest software. I disabled MSI and MSI-X in qemu-dm
>> temporary. I am implementing MSI Capability Structure and
>> merging current MSI routines. I will release the patch if
>> you agree with me.
>> 
>> MSI-X will be after MSI. I will be very happy if anyone
>> can help me. 
>> 
>> Other Capability Structures are hidden from guest
>> software. To do this, I change Next Capability Pointer's
>> value to point only the Capability Structure that need to
>> be exported to guest software (see emulate capabilities
>> above). And some Capability Structures are 0 hardwired,
>> and others are passthrough.
>> 
>> This patch removes "switch" statements for emulation, and
>> introduces table based emulation derived from pciback
>> driver. You can implement new Capability Structure by
>> adding new table.
>> The other advantage of using this table is that you can
>> easily change the emulation policy of each field/bit by
>> just simply modifying the "emu_mask" value provided in
>> each register table.
>> And for only special emulation or interacting with other
>> components (like hypervisor), you have to implement
>> function corresponding to the register.
>> 
>> Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 10:19   ` Keir Fraser
@ 2008-06-27 10:25     ` Dong, Eddie
  2008-06-27 13:34       ` Ian Jackson
  2008-06-30  7:14       ` Yuji Shimada
  2008-06-27 13:27     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Ian Jackson
  1 sibling, 2 replies; 33+ messages in thread
From: Dong, Eddie @ 2008-06-27 10:25 UTC (permalink / raw)
  To: Keir Fraser, Yuji Shimada, xen-devel; +Cc: Dong, Eddie, Ian Jackson

Keir Fraser wrote:
> What do you think of Yuji's patch? One thing to consider
> is that perhaps the patch should be against the upstream
> qemu merge now. But I'm not sure that PCI passthrough is
> even supported yet in that tree (Ian?). 
> 

If we agree the basic policy is pass through except the ones with known
behavior, I think we don't need that many case to case handle. Dexuan is
working on the implementation base on the summit talk and close to end,
maybe Yuji and Dexuan can coordinate first to see if the proposed policy
can server yuji's purpose.

Thx, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 10:19   ` Keir Fraser
  2008-06-27 10:25     ` Dong, Eddie
@ 2008-06-27 13:27     ` Ian Jackson
  2008-06-27 13:55       ` Ian Jackson
  1 sibling, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2008-06-27 13:27 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Dong, Eddie

Keir Fraser writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> What do you think of Yuji's patch? One thing to consider is that perhaps the
> patch should be against the upstream qemu merge now. But I'm not sure that
> PCI passthrough is even supported yet in that tree (Ian?).

In theory PCI passthrough is supposed to be supported in my merged
tree.  But, I haven't tested it yet so I doubt it works.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 10:25     ` Dong, Eddie
@ 2008-06-27 13:34       ` Ian Jackson
  2008-06-30  4:31         ` Yuji Shimada
  2008-07-01  2:12         ` Dong, Eddie
  2008-06-30  7:14       ` Yuji Shimada
  1 sibling, 2 replies; 33+ messages in thread
From: Ian Jackson @ 2008-06-27 13:34 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: Yuji Shimada, xen-devel, Keir Fraser

Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> If we agree the basic policy is pass through except the ones with known
> behavior, I think we don't need that many case to case handle. Dexuan is
> working on the implementation base on the summit talk and close to end,
> maybe Yuji and Dexuan can coordinate first to see if the proposed policy
> can server yuji's purpose.

Is it really safe to pass through operations with unknown behavious ?
Particularly if the system has an iommu, the administrator may be
expecting the passthrough mechanism to defend the host from rogue
behaviour by the card and its owning guest.

(I'm no expert on PCI so forgive me if this question is stupid.)

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures and Device Specific
  2008-06-27  7:38 [PATCH][RFC] Support more Capability Structures and Device Specific Yuji Shimada
  2008-06-27 10:14 ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
@ 2008-06-27 13:51 ` Samuel Thibault
  2008-06-30  7:12   ` Yuji Shimada
  1 sibling, 1 reply; 33+ messages in thread
From: Samuel Thibault @ 2008-06-27 13:51 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel

Hello,

diff -r 926a366ca82f tools/ioemu/list.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/list.h	Fri Jun 27 11:58:26 2008 +0900
@@ -0,0 +1,89 @@
+#ifndef _IOEMU_LIST_H
+#define _IOEMU_LIST_H
+/* Taken from Linux kernel code, but de-kernelized for userspace. */
+#include <stddef.h>

Could you use the list implementation of qemu instead please?

Samuel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 13:27     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Ian Jackson
@ 2008-06-27 13:55       ` Ian Jackson
  2008-06-30  8:00         ` Yuji Shimada
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2008-06-27 13:55 UTC (permalink / raw)
  To: Keir Fraser, Yuji Shimada, xen-devel, Dong, Eddie

Ian Jackson writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> Keir Fraser writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> > What do you think of Yuji's patch? One thing to consider is that
> > perhaps the patch should be against the upstream qemu merge
> > now. But I'm not sure that PCI passthrough is even supported yet
> > in that tree (Ian?).
> 
> In theory PCI passthrough is supposed to be supported in my merged
> tree.  But, I haven't tested it yet so I doubt it works.

I should expand on that.  I'd very much like for these kind of patches
to be going into the merged qemu tree now, and I'd also like people to
help test it.

So it would be very good if people who actually use PCI passthrough
were to take a look at the merged qemu tree.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 13:34       ` Ian Jackson
@ 2008-06-30  4:31         ` Yuji Shimada
  2008-06-30  5:48           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
  2008-07-01  2:27           ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
  2008-07-01  2:12         ` Dong, Eddie
  1 sibling, 2 replies; 33+ messages in thread
From: Yuji Shimada @ 2008-06-30  4:31 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Dong, Eddie, Keir Fraser

I think it is NOT safe to pass through operations with unknown
behaviour. qemu-dm should prevent guest software setting unsafe value
to register.  We have to investigate each register and decide to
emulate(virtualize) or passthrough.

I haven't investigated some capability structures (like PCI-X
Capability Structure).  I hide them from guest software.

Device Specific Registers (exclude capability structures) is
passthrough. In non-virtualized environment, OS does not touch device
specific registers, but device drivers configure them. In virtualized
environment, we have to allow device drivers to configure them.

--
Yuji Shimada

On Fri, 27 Jun 2008 14:34:11 +0100
Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> > If we agree the basic policy is pass through except the ones with known
> > behavior, I think we don't need that many case to case handle. Dexuan is
> > working on the implementation base on the summit talk and close to end,
> > maybe Yuji and Dexuan can coordinate first to see if the proposed policy
> > can server yuji's purpose.
> 
> Is it really safe to pass through operations with unknown behavious ?
> Particularly if the system has an iommu, the administrator may be
> expecting the passthrough mechanism to defend the host from rogue
> behaviour by the card and its owning guest.
> 
> (I'm no expert on PCI so forgive me if this question is stupid.)
> 
> Ian.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30  4:31         ` Yuji Shimada
@ 2008-06-30  5:48           ` Cui, Dexuan
  2008-06-30  8:14             ` Yuji Shimada
  2008-07-01  2:27           ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
  1 sibling, 1 reply; 33+ messages in thread
From: Cui, Dexuan @ 2008-06-30  5:48 UTC (permalink / raw)
  To: Yuji Shimada, Ian Jackson; +Cc: xen-devel, Dong, Eddie, Keir Fraser

Hi Yuji,
I looked at the patch.  It seems pretty good. 
Except for the (temporary) absence of MSI/MSI-X stuff, looks the passthrough policy in the patch is almost the same as what is discussed in the PDF file Eddie posted.

I also made some tests against the patch, and found there may be some unstable issues:
I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi" parameter)), it can easily (30%~80% probable) stay for a very long (i.e., >40s) at "Starting udev:", and after I login in shell, the NIC seems not present (the guest has no network available), but "lspci" shows the NIC is there.
If I use the Qemu without your patch, the issue disappears at once, and NIC in guest works well.

I haven't found issue in your patch yet. :)

Thanks,
-- Dexuan


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Yuji Shimada
Sent: 2008年6月30日 12:32
To: Ian Jackson
Cc: xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific

I think it is NOT safe to pass through operations with unknown
behaviour. qemu-dm should prevent guest software setting unsafe value
to register.  We have to investigate each register and decide to
emulate(virtualize) or passthrough.

I haven't investigated some capability structures (like PCI-X
Capability Structure).  I hide them from guest software.

Device Specific Registers (exclude capability structures) is
passthrough. In non-virtualized environment, OS does not touch device
specific registers, but device drivers configure them. In virtualized
environment, we have to allow device drivers to configure them.

--
Yuji Shimada

On Fri, 27 Jun 2008 14:34:11 +0100
Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> > If we agree the basic policy is pass through except the ones with known
> > behavior, I think we don't need that many case to case handle. Dexuan is
> > working on the implementation base on the summit talk and close to end,
> > maybe Yuji and Dexuan can coordinate first to see if the proposed policy
> > can server yuji's purpose.
> 
> Is it really safe to pass through operations with unknown behavious ?
> Particularly if the system has an iommu, the administrator may be
> expecting the passthrough mechanism to defend the host from rogue
> behaviour by the card and its owning guest.
> 
> (I'm no expert on PCI so forgive me if this question is stupid.)
> 
> Ian.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures and Device Specific
  2008-06-27 13:51 ` [PATCH][RFC] Support more Capability Structures and Device Specific Samuel Thibault
@ 2008-06-30  7:12   ` Yuji Shimada
  2008-06-30 10:22     ` Samuel Thibault
  0 siblings, 1 reply; 33+ messages in thread
From: Yuji Shimada @ 2008-06-30  7:12 UTC (permalink / raw)
  To: Samuel Thibault, xen-devel

Do you mean the list implementation inside tools/ioemu/audio/sys-queue.h
in xen-unstable tree ?
I haven't checked the upstream qemu code.

Thanks.

--
Yuji Shimada

> Hello,
> 
> diff -r 926a366ca82f tools/ioemu/list.h
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/ioemu/list.h	Fri Jun 27 11:58:26 2008 +0900
> @@ -0,0 +1,89 @@
> +#ifndef _IOEMU_LIST_H
> +#define _IOEMU_LIST_H
> +/* Taken from Linux kernel code, but de-kernelized for userspace. */
> +#include <stddef.h>
> 
> Could you use the list implementation of qemu instead please?
> 
> Samuel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 10:25     ` Dong, Eddie
  2008-06-27 13:34       ` Ian Jackson
@ 2008-06-30  7:14       ` Yuji Shimada
  2008-06-30  9:02         ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
  1 sibling, 1 reply; 33+ messages in thread
From: Yuji Shimada @ 2008-06-30  7:14 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: xen-devel, Ian Jackson, Keir Fraser

I think it is better to emulate registers related to error reporting
at least.

We can consider two patterns of error handling as following.

PATTERN-1: AER is enabled on host but will not be notified to guest domain.

Actual AER is enable, but guest software is not allowed to enable AER
by _OSC in guest firmware. When error occurs, dom0 will kill guest
domain and reset the device or bus. Registers related to error
reporting should be emulated to prevent guest software turning off
actual error reporting.

PATTERN-2: AER is enabled on host and will also be notified to guest domain.

Actual AER is always enable, but guest can enable/disable its virtual
AER. When error occurs, dom0 check guest emulated AER is enabled or not.
If guest emulated AER is enabled, dom0 will notify error to guest
software. Then guest software will reset I/O device or bus.
If guest emulated AER is disabled, dom0 will not notify error to
guest software. Dom0 will kill guest domain, and reset I/O device or
bus. To do this, registers related to error reporting should be
emulated. And Root Port emulation is also required.


I've taken a look at "Xen - VT-D Enhance.pdf".

Is Dexuan implementing Memory Mapped Configuration Access Mechanism to
support offset 256-4095? Are following interfaces not changed ?

    pci_dev->config_read
    pci_dev->config_write

If they are not changed, I think Dexuan's Memory Mapped Configuration
Access Mechanism and my code can be merged easily.

Thanks a lot.

--
Yuji Shimada

> Keir Fraser wrote:
> > What do you think of Yuji's patch? One thing to consider
> > is that perhaps the patch should be against the upstream
> > qemu merge now. But I'm not sure that PCI passthrough is
> > even supported yet in that tree (Ian?). 
> > 
> 
> If we agree the basic policy is pass through except the ones with known
> behavior, I think we don't need that many case to case handle. Dexuan is
> working on the implementation base on the summit talk and close to end,
> maybe Yuji and Dexuan can coordinate first to see if the proposed policy
> can server yuji's purpose.
> 
> Thx, eddie
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 13:55       ` Ian Jackson
@ 2008-06-30  8:00         ` Yuji Shimada
  2008-06-30 16:50           ` Ian Jackson
  0 siblings, 1 reply; 33+ messages in thread
From: Yuji Shimada @ 2008-06-30  8:00 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Dong, Eddie, Keir Fraser

I am going to release these following patches.

    1. the patch supporting MSI Capability Structure for Xen Unstable.
       (list.h will be removed. some bugs will be fixed.)

    2. the patch supporting MSI-X Capability Structure for Xen Unstable.

    3. the patch against the upstream qemu merge (including MSI/MSI-X
       Capability Structure).

Should I release patch against the upstream qemu merge prior to Xen
Unstable ?
We'd be very happy if you could help me merged that patch, so that I can
concentrate on MSI/MSI-X patches.

Thanks.

--
Yuji Shimada

On Fri, 27 Jun 2008 14:55:01 +0100
Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Ian Jackson writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> > Keir Fraser writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> > > What do you think of Yuji's patch? One thing to consider is that
> > > perhaps the patch should be against the upstream qemu merge
> > > now. But I'm not sure that PCI passthrough is even supported yet
> > > in that tree (Ian?).
> > 
> > In theory PCI passthrough is supposed to be supported in my merged
> > tree.  But, I haven't tested it yet so I doubt it works.
> 
> I should expand on that.  I'd very much like for these kind of patches
> to be going into the merged qemu tree now, and I'd also like people to
> help test it.
> 
> So it would be very good if people who actually use PCI passthrough
> were to take a look at the merged qemu tree.
> 
> Thanks,
> Ian.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30  5:48           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
@ 2008-06-30  8:14             ` Yuji Shimada
  2008-06-30  9:29               ` Cui, Dexuan
  0 siblings, 1 reply; 33+ messages in thread
From: Yuji Shimada @ 2008-06-30  8:14 UTC (permalink / raw)
  To: Cui, Dexuan; +Cc: Dong, Eddie, xen-devel, Ian Jackson, Keir Fraser

Hi Dexuan,

I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  In my test
environment (with "pci=nomsi" set for Dom0 boot parameter), guest
OS can use the assigned NIC and can communicate with external machine.

Does guest OS recieve interrupt? You can check via /proc/interrupts.

Thanks.

--
Yuji Shimada

> Hi Yuji,
> I looked at the patch.  It seems pretty good. 
> Except for the (temporary) absence of MSI/MSI-X stuff, looks the passthrough policy in the patch is almost the same as what is discussed in the PDF file Eddie posted.
> 
> I also made some tests against the patch, and found there may be some unstable issues:
> I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi" parameter)), it can easily (30%~80% probable) stay for a very long (i.e., >40s) at "Starting udev:", and after I login in shell, the NIC seems not present (the guest has no network available), but "lspci" shows the NIC is there.
> If I use the Qemu without your patch, the issue disappears at once, and NIC in guest works well.
> 
> I haven't found issue in your patch yet. :)
> 
> Thanks,
> -- Dexuan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30  7:14       ` Yuji Shimada
@ 2008-06-30  9:02         ` Cui, Dexuan
  0 siblings, 0 replies; 33+ messages in thread
From: Cui, Dexuan @ 2008-06-30  9:02 UTC (permalink / raw)
  To: Yuji Shimada, Dong, Eddie; +Cc: xen-devel, Ian Jackson, Keir Fraser

I didn't work on the memory mapped access machinism.
I'm not sure whether adding that support is OK since now Qemu emulates an old chipset which is actually PCIe-unaware.

Thanks,
-- Dexuan


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Yuji Shimada
Sent: 2008年6月30日 15:15
To: Dong, Eddie
Cc: xen-devel@lists.xensource.com; Ian Jackson; Keir Fraser
Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific

I think it is better to emulate registers related to error reporting
at least.

We can consider two patterns of error handling as following.

PATTERN-1: AER is enabled on host but will not be notified to guest domain.

Actual AER is enable, but guest software is not allowed to enable AER
by _OSC in guest firmware. When error occurs, dom0 will kill guest
domain and reset the device or bus. Registers related to error
reporting should be emulated to prevent guest software turning off
actual error reporting.

PATTERN-2: AER is enabled on host and will also be notified to guest domain.

Actual AER is always enable, but guest can enable/disable its virtual
AER. When error occurs, dom0 check guest emulated AER is enabled or not.
If guest emulated AER is enabled, dom0 will notify error to guest
software. Then guest software will reset I/O device or bus.
If guest emulated AER is disabled, dom0 will not notify error to
guest software. Dom0 will kill guest domain, and reset I/O device or
bus. To do this, registers related to error reporting should be
emulated. And Root Port emulation is also required.


I've taken a look at "Xen - VT-D Enhance.pdf".

Is Dexuan implementing Memory Mapped Configuration Access Mechanism to
support offset 256-4095? Are following interfaces not changed ?

    pci_dev->config_read
    pci_dev->config_write

If they are not changed, I think Dexuan's Memory Mapped Configuration
Access Mechanism and my code can be merged easily.

Thanks a lot.

--
Yuji Shimada

> Keir Fraser wrote:
> > What do you think of Yuji's patch? One thing to consider
> > is that perhaps the patch should be against the upstream
> > qemu merge now. But I'm not sure that PCI passthrough is
> > even supported yet in that tree (Ian?). 
> > 
> 
> If we agree the basic policy is pass through except the ones with known
> behavior, I think we don't need that many case to case handle. Dexuan is
> working on the implementation base on the summit talk and close to end,
> maybe Yuji and Dexuan can coordinate first to see if the proposed policy
> can server yuji's purpose.
> 
> Thx, eddie
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30  8:14             ` Yuji Shimada
@ 2008-06-30  9:29               ` Cui, Dexuan
  2008-07-02  1:03                 ` Yuji Shimada
  0 siblings, 1 reply; 33+ messages in thread
From: Cui, Dexuan @ 2008-06-30  9:29 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: Dong, Eddie, xen-devel, Ian Jackson, Keir Fraser

I'm using x86_64 c/s 17888: 6ace85eb96c0, and assigning a 82541PI Gigabit Etherer NIC to guest.
I also tried  "pci=nomsi" for Dom0, and the issus is still there. 
When the issue happens, eth0 doesn't occur in /proc/interrupt though the device driver module is loaded.
The issue doesn't happen every time. Really strange...

Thanks,
-- Dexuan


-----Original Message-----
From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp] 
Sent: 2008年6月30日 16:15
To: Cui, Dexuan
Cc: Ian Jackson; xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific

Hi Dexuan,

I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  In my test
environment (with "pci=nomsi" set for Dom0 boot parameter), guest
OS can use the assigned NIC and can communicate with external machine.

Does guest OS recieve interrupt? You can check via /proc/interrupts.

Thanks.

--
Yuji Shimada

> Hi Yuji,
> I looked at the patch.  It seems pretty good. 
> Except for the (temporary) absence of MSI/MSI-X stuff, looks the passthrough policy in the patch is almost the same as what is discussed in the PDF file Eddie posted.
> 
> I also made some tests against the patch, and found there may be some unstable issues:
> I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi" parameter)), it can easily (30%~80% probable) stay for a very long (i.e., >40s) at "Starting udev:", and after I login in shell, the NIC seems not present (the guest has no network available), but "lspci" shows the NIC is there.
> If I use the Qemu without your patch, the issue disappears at once, and NIC in guest works well.
> 
> I haven't found issue in your patch yet. :)
> 
> Thanks,
> -- Dexuan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures and Device Specific
  2008-06-30  7:12   ` Yuji Shimada
@ 2008-06-30 10:22     ` Samuel Thibault
  0 siblings, 0 replies; 33+ messages in thread
From: Samuel Thibault @ 2008-06-30 10:22 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel

Yuji Shimada, le Mon 30 Jun 2008 16:12:31 +0900, a écrit :
> Do you mean the list implementation inside tools/ioemu/audio/sys-queue.h
> in xen-unstable tree ?

Yes.

> I haven't checked the upstream qemu code.

It's the same, with an additional QEMU prefix.

Samuel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-30  8:00         ` Yuji Shimada
@ 2008-06-30 16:50           ` Ian Jackson
  2008-07-01  2:25             ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2008-06-30 16:50 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Dong, Eddie, Keir Fraser

Yuji Shimada writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> I am going to release these following patches.
...
> Should I release patch against the upstream qemu merge prior to Xen
> Unstable ?

We're currently working to integrate the upstream qemu merge into
patchman.  I hope that we can be successful with this, so that we can
switch to this tree as the default in 3.3.

With that in mind:

Have you looked at the upstream merge at all yet ?  I haven't yet done
any PCI passthrough testing and I think it's very important that we do
some of that testing.  If you have any time available to do some
checking of the qemu upstream merge tree, even before we consider your
recent patches, that would be very helpful.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-27 13:34       ` Ian Jackson
  2008-06-30  4:31         ` Yuji Shimada
@ 2008-07-01  2:12         ` Dong, Eddie
  1 sibling, 0 replies; 33+ messages in thread
From: Dong, Eddie @ 2008-07-01  2:12 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Yuji Shimada, xen-devel, Dong, Eddie, Keir Fraser

Ian Jackson wrote:
> Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support
> more Capability Structures andDevice Specific"): 
>> If we agree the basic policy is pass through except the
>> ones with known behavior, I think we don't need that
>> many case to case handle. Dexuan is working on the
>> implementation base on the summit talk and close to end,
>> maybe Yuji and Dexuan can coordinate first to see if the
>> proposed policy can server yuji's purpose. 
> 
> Is it really safe to pass through operations with unknown
> behavious ? 
> Particularly if the system has an iommu, the
> administrator may be 
> expecting the passthrough mechanism to defend the host
> from rogue 
> behaviour by the card and its owning guest.


What kind of operations in your mind will hurt host?
But yes, guest may not work properly for some cases such as some vidoe
card I mentioned in summit which may map host address to internal
register and used by drivers. For those kind of devices, CP will just
disable assignment thru kind of assignable check such as blacklist.

Other than that, I didn't see how pass through will make things worse.
Can u specify ? All guest access to memory is protected by IOMMU and
thus no imapct to host. 


Thx, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30 16:50           ` Ian Jackson
@ 2008-07-01  2:25             ` Cui, Dexuan
  0 siblings, 0 replies; 33+ messages in thread
From: Cui, Dexuan @ 2008-07-01  2:25 UTC (permalink / raw)
  To: Ian Jackson, Yuji Shimada; +Cc: xen-devel, Dong, Eddie, Keir Fraser

The PCI config space passthrough is actually crucial for Xen 3.3.

We have already found many device assignment issues caused by the lack of the passthrough (and I think more issues will be found).
For example, the assigned BroadCom NIC can't work because the driver needs to access the register BNX2_PCICFG_INT_ACK_CMD (at 0x84); another example: the USB assignment doesn't work -- one of the reasons is: UHCI-hc needs to access config space reigster USB_LEGKEY(at 0xC0-C1) to enable USB IRQ.

So it will be great to push the patch into xen-unstable tree.

Thanks,
-- Dexuan


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Ian Jackson
Sent: 2008年7月1日 0:50
To: Yuji Shimada
Cc: xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific

Yuji Shimada writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> I am going to release these following patches.
...
> Should I release patch against the upstream qemu merge prior to Xen
> Unstable ?

We're currently working to integrate the upstream qemu merge into
patchman.  I hope that we can be successful with this, so that we can
switch to this tree as the default in 3.3.

With that in mind:

Have you looked at the upstream merge at all yet ?  I haven't yet done
any PCI passthrough testing and I think it's very important that we do
some of that testing.  If you have any time available to do some
checking of the qemu upstream merge tree, even before we consider your
recent patches, that would be very helpful.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-06-30  4:31         ` Yuji Shimada
  2008-06-30  5:48           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
@ 2008-07-01  2:27           ` Dong, Eddie
  2008-07-01  8:00             ` Yuji Shimada
  1 sibling, 1 reply; 33+ messages in thread
From: Dong, Eddie @ 2008-07-01  2:27 UTC (permalink / raw)
  To: Yuji Shimada, Ian Jackson; +Cc: xen-devel, Dong, Eddie, Keir Fraser

Yuji Shimada wrote:
> I think it is NOT safe to pass through operations with
> unknown behaviour. qemu-dm should prevent guest software

This used to be the reason why we only pass through CMD register, but we
then suffer from that in different devices and I have to admit, without
passthrough real setting, the device won't function correctly in many
cases.

Do u have any real data ? Device memory access is fine with IOMMU, irq
vector is virtualized now.

> setting unsafe value to register.  We have to investigate
> each register and decide to emulate(virtualize) or
> passthrough. 

Investigation is defintely good, eventually we need to know all
configuration registers, but even with that, there are still device
specifc registers we have to deal such as Vendor specific capability and
those registers directly defined by devices (not a standard PCI
capability). But hidding settings due to the reason we didn't
investigate yet will simply make things worse, and we already observed
this with more devices tested such as UHCI mouse etc.


> 
> I haven't investigated some capability structures (like
> PCI-X Capability Structure).  I hide them from guest
> software. 

ditto

> 
> Device Specific Registers (exclude capability structures)
> is passthrough. In non-virtualized environment, OS does
> not touch device specific registers, but device drivers
> configure them. In virtualized environment, we have to
> allow device drivers to configure them. 

We need to pass through except the device is doing IRQ vector setting
which can't be handled in either case, those devices are simply not
assignable.

Thanks, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-01  2:27           ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
@ 2008-07-01  8:00             ` Yuji Shimada
  2008-07-01  9:54               ` Ian Jackson
  0 siblings, 1 reply; 33+ messages in thread
From: Yuji Shimada @ 2008-07-01  8:00 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: xen-devel, Ian Jackson, Keir Fraser

In my patch, registers inside Vendor Specific Capability Structure
(Capability ID is 09h) are all passthrough, except Next Capability
Pointer Register.

Registers area defined by device are not belonging to Capability
Structure, and they are all passthrough.

Do you mean non standard Capability Structure should be passthrough?


For PCI-X Capability Structure, we have to emulate Function Number
Field, Device Number Field, and Bus Number Field in PCI-X Status
Register at least. The reason is that the value for host and guest
maybe different.

Additionally, I think we have to emulate Maximum Outstanding Split
Transactions Field in PCI-X Command Register too. The reason is that
the real value should be decided by considering all over the system,
and we should also prevent guest software writing value.

Thanks.

--
Yuji Shimada

On Tue, 1 Jul 2008 10:27:02 +0800
"Dong, Eddie" <eddie.dong@intel.com> wrote:

> Yuji Shimada wrote:
> > I think it is NOT safe to pass through operations with
> > unknown behaviour. qemu-dm should prevent guest software
> 
> This used to be the reason why we only pass through CMD register, but we
> then suffer from that in different devices and I have to admit, without
> passthrough real setting, the device won't function correctly in many
> cases.
> 
> Do u have any real data ? Device memory access is fine with IOMMU, irq
> vector is virtualized now.
> 
> > setting unsafe value to register.  We have to investigate
> > each register and decide to emulate(virtualize) or
> > passthrough. 
> 
> Investigation is defintely good, eventually we need to know all
> configuration registers, but even with that, there are still device
> specifc registers we have to deal such as Vendor specific capability and
> those registers directly defined by devices (not a standard PCI
> capability). But hidding settings due to the reason we didn't
> investigate yet will simply make things worse, and we already observed
> this with more devices tested such as UHCI mouse etc.
> 
> 
> > 
> > I haven't investigated some capability structures (like
> > PCI-X Capability Structure).  I hide them from guest
> > software. 
> 
> ditto
> 
> > 
> > Device Specific Registers (exclude capability structures)
> > is passthrough. In non-virtualized environment, OS does
> > not touch device specific registers, but device drivers
> > configure them. In virtualized environment, we have to
> > allow device drivers to configure them. 
> 
> We need to pass through except the device is doing IRQ vector setting
> which can't be handled in either case, those devices are simply not
> assignable.
> 
> Thanks, eddie
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-01  8:00             ` Yuji Shimada
@ 2008-07-01  9:54               ` Ian Jackson
  2008-07-01 23:23                 ` Dong, Eddie
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2008-07-01  9:54 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Dong, Eddie, Keir Fraser

Yuji Shimada writes ("Re: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> In my patch, registers inside Vendor Specific Capability Structure
> (Capability ID is 09h) are all passthrough, except Next Capability
> Pointer Register.

My worry is that a device may say, in its vendor-specific
register-level programming documentation for these configurations,
something like:

  Do _not_ set USE_EXTERNAL_INPUT and USE_INTERNAL_INPUT
  simultaneously; this may cause damage to the Gnomovision PCI
  card and may also cause the Gnomovision PCI card to draw
  excessive current from the host power supply.

Or

  Do _not_ use the UPLOAD_FIRMWARE_* configuration.  These are for use
  by the approved Gnomovision firmware loader only.  Uploading bad
  firmware may cause damage [etc. etc.]

I haven't read many modern PCI card specs but with the constant
shifting of functionality (even functionality which is intended to
preserve hardware integrity) to software and firmware, I would be wary
of assuming that every unknown PCI card has no register and
configuration settings which can cause hardware damage or other kinds
of unexpected and undesirable events.

If there is there a requirement written into the general PCI
specification that this won't happen, then fine - if so please quote
chapter and verse.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-01  9:54               ` Ian Jackson
@ 2008-07-01 23:23                 ` Dong, Eddie
  2008-07-02 10:30                   ` Ian Jackson
  0 siblings, 1 reply; 33+ messages in thread
From: Dong, Eddie @ 2008-07-01 23:23 UTC (permalink / raw)
  To: Ian Jackson, Yuji Shimada; +Cc: xen-devel, Dong, Eddie, Keir Fraser


> I haven't read many modern PCI card specs but with the
> constant 
> shifting of functionality (even functionality which is
> intended to 
> preserve hardware integrity) to software and firmware, I
> would be wary 
> of assuming that every unknown PCI card has no register
> and 
> configuration settings which can cause hardware damage or
> other kinds 
> of unexpected and undesirable events.

For those vendor specific registers (in PCI cap or not), we always have
dillemma: pass through with functionality or discard with malfunction.
If we pass through them, then the opposite card are in black list that
we can't support, if we discard them, the former class of card becomes
un-assignable.

Per current data, pass through get many known bug fixed as the case
Dexuan mentioned. But we didn't see a HW damaging host. Some know issue
could be a device issuing tons of PCIe traffic, absorbing extra power,
issuing interrupt storm etc, but right now we didn't see issues yet.

> 
> If there is there a requirement written into the general
> PCI 
> specification that this won't happen, then fine - if so
> please quote 
> chapter and verse.
> 
Defintely if PCIe spec can be modified to support this one is best, but
we still have legacy devices.

Thx, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-06-30  9:29               ` Cui, Dexuan
@ 2008-07-02  1:03                 ` Yuji Shimada
  2008-07-02  2:07                   ` Cui, Dexuan
  2008-07-03  1:49                   ` Dong, Eddie
  0 siblings, 2 replies; 33+ messages in thread
From: Yuji Shimada @ 2008-07-02  1:03 UTC (permalink / raw)
  To: Cui, Dexuan; +Cc: Dong, Eddie, xen-devel, Ian Jackson, Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]

I've done some bug fixes as follows.

1. correct the size calculation of MSI Capability Structure in
   pt_msi_size_init(). The next capability might be hidden due to wrong
   large size of MSI.

2. modify the decision logic for determining unused Exp ROM BAR in
   pt_bar_reg_parse(). Use PCIIORegion table instead of parsing
   BAR itself.

3. bug fix on .size_init func for PCI Express Capability Structure
   in pt_emu_reg_grp_tbl[].
   (pt_vendor_size_init ---> pt_reg_grp_size_init)

4. small bug fix on the decision logic for checking unused BAR in
   pt_pci_write_config().

5. add printf message to show overlapped device in pt_chk_bar_overlap().

6. modify pt_bar_mapping() to prevent guest software mapping memory
   resource to 00000000h

7. modify pt_bar_mapping() to map resource even if overlapping is
   detected.

I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  Without
"pci=nomsi", guest OS can use the assigned NIC and can communicate
with external machine.

Additionally I assigned UHCI Controller to guest domain. Guest OS can
use USB-HDD and USB-Mouse.

Could you test the patch?


I am going to remove list.h and enable MSI.

Thanks.

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

--
Yuji Shimada

On Mon, 30 Jun 2008 17:29:38 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:

> I'm using x86_64 c/s 17888: 6ace85eb96c0, and assigning a 82541PI Gigabit Etherer NIC to guest.
> I also tried  "pci=nomsi" for Dom0, and the issus is still there. 
> When the issue happens, eth0 doesn't occur in /proc/interrupt though the device driver module is loaded.
> The issue doesn't happen every time. Really strange...
> 
> Thanks,
> -- Dexuan
> 
> 
> -----Original Message-----
> From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp] 
> Sent: 2008夏・花可30科苛 16:15
> To: Cui, Dexuan
> Cc: Ian Jackson; xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
> Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific
> 
> Hi Dexuan,
> 
> I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  In my test
> environment (with "pci=nomsi" set for Dom0 boot parameter), guest
> OS can use the assigned NIC and can communicate with external machine.
> 
> Does guest OS recieve interrupt? You can check via /proc/interrupts.
> 
> Thanks.
> 
> --
> Yuji Shimada
> 
> > Hi Yuji,
> > I looked at the patch.  It seems pretty good. 
> > Except for the (temporary) absence of MSI/MSI-X stuff, looks the passthrough policy in the patch is almost the same as what is discussed in the PDF file Eddie posted.
> > 
> > I also made some tests against the patch, and found there may be some unstable issues:
> > I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi" parameter)), it can easily (30%~80% probable) stay for a very long (i.e., >40s) at "Starting udev:", and after I login in shell, the NIC seems not present (the guest has no network available), but "lspci" shows the NIC is there.
> > If I use the Qemu without your patch, the issue disappears at once, and NIC in guest works well.
> > 
> > I haven't found issue in your patch yet. :)
> > 
> > Thanks,
> > -- Dexuan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

[-- Attachment #2: pci_config_passthrough-080702-02.patch --]
[-- Type: application/octet-stream, Size: 76170 bytes --]

diff -r 926a366ca82f tools/ioemu/hw/pass-through.c
--- a/tools/ioemu/hw/pass-through.c	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pass-through.c	Tue Jul 01 20:35:37 2008 +0900
@@ -26,7 +26,7 @@
 #include "pass-through.h"
 #include "pci/header.h"
 #include "pci/pci.h"
-#include "pt-msi.h"
+//#include "pt-msi.h"
 
 extern FILE *logfile;
 
@@ -46,6 +46,498 @@ struct dpci_infos {
 
 } dpci_infos;
 
+/* prototype */
+static uint32_t pt_common_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_ptr_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_status_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_irqpin_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_bar_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_linkctrl2_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint8_t pt_reg_grp_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static uint8_t pt_msi_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static uint8_t pt_vendor_size_init(struct pt_dev *ptdev,
+    struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
+static int pt_byte_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint8_t *valueu, uint8_t valid_mask);
+static int pt_word_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint16_t *value, uint16_t valid_mask);
+static int pt_long_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint32_t *value, uint32_t valid_mask);
+static int pt_bar_reg_read(struct pt_dev *ptdev,
+    struct pt_reg_tbl *cfg_entry,
+    uint32_t *value, uint32_t valid_mask);
+static int pt_byte_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint8_t *value, uint8_t dev_value, uint8_t valid_mask);
+static int pt_word_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_long_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_cmd_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_bar_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_exp_rom_bar_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint32_t *value, uint32_t dev_value, uint32_t valid_mask);
+static int pt_pmcsr_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_devctrl_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_linkctrl_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_devctrl2_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+static int pt_linkctrl2_reg_write(struct pt_dev *ptdev, 
+    struct pt_reg_tbl *cfg_entry, 
+    uint16_t *value, uint16_t dev_value, uint16_t valid_mask);
+
+/* Header Type0 reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_header0_tbl[] = {
+    /* Command reg */
+    {
+        .offset     = PCI_COMMAND,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xF880,
+        .emu_mask   = 0x0340,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_cmd_reg_write,
+    },
+    /* Capabilities Pointer reg */
+    {
+        .offset     = PCI_CAPABILITY_LIST,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Status reg */
+    /* use emulated Cap Ptr value to initialize, 
+     * so need to be declared after Cap Ptr reg 
+     */
+    {
+        .offset     = PCI_STATUS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x06FF,
+        .emu_mask   = 0x0010,
+        .init       = pt_status_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_word_reg_write,
+    },
+    /* Cache Line Size reg */
+    {
+        .offset     = PCI_CACHE_LINE_SIZE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Latency Timer reg */
+    {
+        .offset     = PCI_LATENCY_TIMER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Header Type reg */
+    {
+        .offset     = PCI_HEADER_TYPE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0x80,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Interrupt Line reg */
+    {
+        .offset     = PCI_INTERRUPT_LINE,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0x00,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Interrupt Pin reg */
+    {
+        .offset     = PCI_INTERRUPT_PIN,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_irqpin_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* BAR 0 reg */
+    /* mask of BAR need to be decided later, depends on IO/MEM type */
+    {
+        .offset     = PCI_BASE_ADDRESS_0,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 1 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_1,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 2 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_2,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 3 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_3,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 4 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_4,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* BAR 5 reg */
+    {
+        .offset     = PCI_BASE_ADDRESS_5,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_bar_reg_read,
+        .u.dw.write = pt_bar_reg_write,
+    },
+    /* Expansion ROM BAR reg */
+    {
+        .offset     = PCI_ROM_ADDRESS,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x000007FE,
+        .emu_mask   = 0xFFFFF800,
+        .init       = pt_bar_reg_init,
+        .u.dw.read  = pt_long_reg_read,
+        .u.dw.write = pt_exp_rom_bar_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Power Management Capability reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_pm_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Power Management Capabilities reg */
+    {
+        .offset     = PCI_CAP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFE8,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_word_reg_write,
+    },
+    /* PCI Power Management Control/Status reg */
+    {
+        .offset     = PCI_PM_CTRL,
+        .size       = 2,
+        .init_val   = 0x0008,
+        .ro_mask    = 0x60FC,
+        .emu_mask   = 0xFF0B,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_pmcsr_reg_write,
+    },
+    /* Data reg */
+    {
+        .offset     = PCI_PM_DATA_REGISTER,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_common_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Vital Product Data Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_vpd_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* Vendor Specific Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_vendor_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* PCI Express Capability Structure reg static infomation table */
+static struct pt_reg_info_tbl pt_emu_reg_pcie_tbl[] = {
+    /* Next Pointer reg */
+    {
+        .offset     = PCI_CAP_LIST_NEXT,
+        .size       = 1,
+        .init_val   = 0x00,
+        .ro_mask    = 0xFF,
+        .emu_mask   = 0xFF,
+        .init       = pt_ptr_reg_init,
+        .u.b.read   = pt_byte_reg_read,
+        .u.b.write  = pt_byte_reg_write,
+    },
+    /* Device Capabilities reg */
+    {
+        .offset     = PCI_EXP_DEVCAP,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0x1FFCFFFF,
+        .emu_mask   = 0x10000000,
+        .init       = pt_common_reg_init,
+        .u.dw.read  = pt_long_reg_read,
+        .u.dw.write = pt_long_reg_write,
+    },
+    /* Device Control reg */
+    {
+        .offset     = PCI_EXP_DEVCTL,
+        .size       = 2,
+        .init_val   = 0x2810,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_devctrl_reg_write,
+    },
+    /* Link Control reg */
+    {
+        .offset     = PCI_EXP_LNKCTL,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_linkctrl_reg_write,
+    },
+    /* Device Control 2 reg */
+    {
+        .offset     = 0x28,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_common_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_devctrl2_reg_write,
+    },
+    /* Link Control 2 reg */
+    {
+        .offset     = 0x30,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0x0000,
+        .emu_mask   = 0xFFFF,
+        .init       = pt_linkctrl2_reg_init,
+        .u.w.read   = pt_word_reg_read,
+        .u.w.write  = pt_linkctrl2_reg_write,
+    },
+    {
+        .size = 0,
+    }, 
+};
+
+/* emul reg group static infomation table */
+static const struct pt_reg_grp_info_tbl pt_emu_reg_grp_tbl[] = {
+    /* Header Type0 reg group */
+    {
+        .grp_id     = 0xFF,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x40,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_header0_tbl,
+    },
+    /* PCI PowerManagement Capability reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PM,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = PCI_PM_SIZEOF,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_pm_tbl,
+    },
+    /* AGP Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Vital Product Data Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_VPD,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_vpd_tbl,
+    },
+    /* Slot Identification reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SLOTID,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x04,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* MSI Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_MSI,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0xFF,
+        .size_init  = pt_msi_size_init,
+    },
+    /* PCI-X Capabilities List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_PCIX,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x18,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Vendor Specific Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_VNDR,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = pt_vendor_size_init,
+        .emu_reg_tbl= pt_emu_reg_vendor_tbl,
+    },
+    /* SHPC Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_HOTPLUG,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+    {
+        .grp_id     = PCI_CAP_ID_SSVID,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x08,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* AGP 8x Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_AGP3,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x30,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    /* PCI Express Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_EXP,
+        .grp_type   = GRP_TYPE_EMU,
+        .grp_size   = 0x3C,
+        .size_init  = pt_reg_grp_size_init,
+        .emu_reg_tbl= pt_emu_reg_pcie_tbl,
+    },
+    /* MSI-X Capability Structure reg group */
+    {
+        .grp_id     = PCI_CAP_ID_MSIX,
+        .grp_type   = GRP_TYPE_HARDWIRED,
+        .grp_size   = 0x0C,
+        .size_init  = pt_reg_grp_size_init,
+    },
+    {
+        .grp_size = 0,
+    }, 
+};
+
 static int token_value(char *token)
 {
     return strtol(token, NULL, 16);
@@ -197,15 +689,15 @@ void pt_iomem_map(PCIDevice *d, int i, u
     assigned_device->bases[i].e_physbase = e_phys;
     assigned_device->bases[i].e_size= e_size;
 
-    PT_LOG("e_phys=%08x maddr=%08x type=%d len=%08x index=%d\n",
-        e_phys, assigned_device->bases[i].access.maddr, type, e_size, i);
+    PT_LOG("e_phys=%08x maddr=%lx type=%d len=%d index=%d first_map=%d\n",
+        e_phys, assigned_device->bases[i].access.maddr, 
+        type, e_size, i, first_map);
 
     if ( e_size == 0 )
         return;
 
     if ( !first_map )
     {
-        add_msix_mapping(assigned_device, i);
         /* Remove old mapping */
         ret = xc_domain_memory_mapping(xc_handle, domid,
                 old_ebase >> XC_PAGE_SHIFT,
@@ -219,18 +711,21 @@ void pt_iomem_map(PCIDevice *d, int i, u
         }
     }
 
-    /* Create new mapping */
-    ret = xc_domain_memory_mapping(xc_handle, domid,
-            assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
-            assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
-            (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
-            DPCI_ADD_MAPPING);
-    if ( ret != 0 )
-        PT_LOG("Error: create new mapping failed!\n");
-
-    ret = remove_msix_mapping(assigned_device, i);
-    if ( ret != 0 )
-        PT_LOG("Error: remove MSX-X mmio mapping failed!\n");
+    /* map only valid guest address (include 0) */
+    if (e_phys != -1)
+    {
+        /* Create new mapping */
+        ret = xc_domain_memory_mapping(xc_handle, domid,
+                assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
+                assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
+                (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
+                DPCI_ADD_MAPPING);
+
+        if ( ret != 0 )
+        {
+            PT_LOG("Error: create new mapping failed!\n");
+        }
+    }
 }
 
 /* Being called each time a pio region has been updated */
@@ -245,9 +740,9 @@ void pt_ioport_map(PCIDevice *d, int i,
     assigned_device->bases[i].e_physbase = e_phys;
     assigned_device->bases[i].e_size= e_size;
 
-    PT_LOG("e_phys=%04x pio_base=%04x len=%04x index=%d\n",
+    PT_LOG("e_phys=%04x pio_base=%04x len=%d index=%d first_map=%d\n",
         (uint16_t)e_phys, (uint16_t)assigned_device->bases[i].access.pio_base,
-        (uint16_t)e_size, i);
+        (uint16_t)e_size, i, first_map);
 
     if ( e_size == 0 )
         return;
@@ -265,13 +760,84 @@ void pt_ioport_map(PCIDevice *d, int i,
         }
     }
 
-    /* Create new mapping */
-    ret = xc_domain_ioport_mapping(xc_handle, domid, e_phys,
-                assigned_device->bases[i].access.pio_base, e_size,
-                DPCI_ADD_MAPPING);
-    if ( ret != 0 )
-        PT_LOG("Error: create new mapping failed!\n");
-
+    /* map only valid guest address (include 0) */
+    if (e_phys != -1)
+    {
+        /* Create new mapping */
+        ret = xc_domain_ioport_mapping(xc_handle, domid, e_phys,
+                    assigned_device->bases[i].access.pio_base, e_size,
+                    DPCI_ADD_MAPPING);
+        if ( ret != 0 )
+        {
+            PT_LOG("Error: create new mapping failed!\n");
+        }
+    }
+}
+
+/* find emulate register group entry */
+struct pt_reg_grp_tbl* pt_find_reg_grp(
+        struct pt_dev *ptdev, uint32_t address)
+{
+    struct pt_reg_grp_tbl* reg_grp_entry = NULL;
+
+    /* find register group entry */
+    list_for_each_entry(reg_grp_entry, &ptdev->pt_reg_grp_tbl_list, list)
+    {
+        /* check address */
+        if ((reg_grp_entry->base_offset <= address) &&
+            ((reg_grp_entry->base_offset + reg_grp_entry->size) > address))
+            goto out;
+    }
+    /* group entry not found */
+    reg_grp_entry = NULL;
+
+out:
+    return reg_grp_entry;
+}
+
+/* find emulate register entry */
+struct pt_reg_tbl* pt_find_reg(
+        struct pt_reg_grp_tbl* reg_grp, uint32_t address)
+{
+    struct pt_reg_tbl* reg_entry = NULL;
+    struct pt_reg_info_tbl* reg = NULL;
+    uint32_t real_offset = 0;
+
+    /* find register entry */
+    list_for_each_entry(reg_entry, &reg_grp->pt_reg_tbl_list, list)
+    {
+        reg = reg_entry->reg;
+        real_offset = (reg_grp->base_offset + reg->offset);
+        /* check address */
+        if ((real_offset <= address) && ((real_offset + reg->size) > address))
+            goto out;
+    }
+    /* register entry not found */
+    reg_entry = NULL;
+
+out:
+    return reg_entry;
+}
+
+/* get BAR index */
+static int pt_bar_offset_to_index(uint32_t offset)
+{
+    int index = 0;
+
+    /* check Exp ROM BAR */
+    if (offset == PCI_ROM_ADDRESS)
+    {
+        index = PCI_ROM_SLOT;
+        goto out;
+    }
+
+    /* calculate BAR index */
+    index = ((offset - PCI_BASE_ADDRESS_0) >> 2);
+    if (index >= PCI_NUM_REGIONS)
+        index = -1;
+
+out:
+    return index;
 }
 
 static void pt_pci_write_config(PCIDevice *d, uint32_t address, uint32_t val,
@@ -279,60 +845,258 @@ static void pt_pci_write_config(PCIDevic
 {
     struct pt_dev *assigned_device = (struct pt_dev *)d;
     struct pci_dev *pci_dev = assigned_device->pci_dev;
-
-#ifdef PT_DEBUG_PCI_CONFIG_ACCESS
-    PT_LOG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-       (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-#endif
-
-    /* Pre-write hooking */
-    switch ( address ) {
-    case 0x0C ... 0x3F:
-        pci_default_write_config(d, address, val, len);
-        return;
-    }
-
-    if ( pt_msi_write(assigned_device, address, val, len) )
-        return;
-
-    if ( pt_msix_write(assigned_device, address, val, len) )
-        return;
-
-    /* PCI config pass-through */
-    if (address == 0x4) {
-        switch (len){
-        case 1:
-            pci_write_byte(pci_dev, address, val);
-            break;
-        case 2:
-            pci_write_word(pci_dev, address, val);
-            break;
-        case 4:
-            pci_write_long(pci_dev, address, val);
-            break;
-        }
-    }
-
-    if (address == 0x4) {
-        /* Post-write hooking */
-        pci_default_write_config(d, address, val, len);
-    }
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_grp_info_tbl *reg_grp = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_reg_info_tbl *reg = NULL;
+    uint32_t find_addr = address;
+    uint32_t real_offset = 0;
+    uint32_t valid_mask = 0xFFFFFFFF;
+    uint32_t read_val = 0;
+    uint8_t *ptr_val = NULL;
+    int emul_len = 0;
+    int index = 0;
+    int ret = 0;
+
+    PT_LOG("write(%x.%x): address=%04x val=0x%08x len=%d\n",
+        (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
+
+    /* check offset range */
+    if (address >= 0xFF)
+    {
+        PT_LOG("Failed to write register with offset exceeding FFh. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check write size */
+    if ((len != 1) && (len != 2) && (len != 4))
+    {
+        PT_LOG("Failed to write register with invalid access length. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check offset alignment */
+    if (address & (len-1))
+    {
+        PT_LOG("Failed to write register with invalid access size alignment. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check unused BAR register */
+    index = pt_bar_offset_to_index(address);
+    if ((index >= 0) && (val > 0 && val < PT_BAR_ALLF) &&
+        (assigned_device->bases[index].bar_flag == PT_BAR_FLAG_UNUSED))
+    {
+        PT_LOG("Guest attempt to set address to unused Base Address Register. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), 
+            (d->devfn & 0x7), address, len);
+    }
+
+    /* find register group entry */
+    reg_grp_entry = pt_find_reg_grp(assigned_device, address);
+    if (reg_grp_entry)
+    {
+        reg_grp = reg_grp_entry->reg_grp;
+        /* check 0 Hardwired register group */
+        if (reg_grp->grp_type == GRP_TYPE_HARDWIRED)
+        {
+            /* ignore silently */
+            PT_LOG("Access to 0 Hardwired register.\n");
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    switch (len) {
+    case 1:
+        read_val = pci_read_byte(pci_dev, address);
+        break;
+    case 2:
+        read_val = pci_read_word(pci_dev, address);
+        break;
+    case 4:
+        read_val = pci_read_long(pci_dev, address);
+        break;
+    }
+
+    /* check libpci error */
+    valid_mask = (0xFFFFFFFF >> ((4 - len) << 3));
+    if ((read_val & valid_mask) == valid_mask)
+    {
+        PT_LOG("libpci read error. No emulation. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+    
+    /* pass directly to libpci for passthrough type register group */
+    if (reg_grp_entry == NULL)
+        goto out;
+
+    /* adjust the write value to appropriate CFC-CFF window */
+    val <<= ((address & 3) << 3);
+    emul_len = len;
+
+    /* loop Guest request size */
+    while (0 < emul_len)
+    {
+        /* find register entry to be emulated */
+        reg_entry = pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry)
+        {
+            reg = reg_entry->reg;
+            real_offset = (reg_grp_entry->base_offset + reg->offset);
+            valid_mask = (0xFFFFFFFF >> ((4 - emul_len) << 3));
+            valid_mask <<= ((find_addr - real_offset) << 3);
+            ptr_val = ((uint8_t *)&val + (real_offset & 3));
+
+            /* do emulation depend on register size */
+            switch (reg->size) {
+            case 1:
+                /* emulate write to byte register */
+                if (reg->u.b.write)
+                    ret = reg->u.b.write(assigned_device, reg_entry,
+                               (uint8_t *)ptr_val, 
+                               (uint8_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint8_t)valid_mask);
+                break;
+            case 2:
+                /* emulate write to word register */
+                if (reg->u.w.write)
+                    ret = reg->u.w.write(assigned_device, reg_entry,
+                               (uint16_t *)ptr_val, 
+                               (uint16_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint16_t)valid_mask);
+                break;
+            case 4:
+                /* emulate write to double word register */
+                if (reg->u.dw.write)
+                    ret = reg->u.dw.write(assigned_device, reg_entry,
+                               (uint32_t *)ptr_val, 
+                               (uint32_t)(read_val >> ((real_offset & 3) << 3)),
+                               (uint32_t)valid_mask);
+                break;
+            }
+
+            /* write emulation error */
+            if (ret < 0)
+            {
+                /* exit I/O emulator */
+                PT_LOG("I/O emulator exit()\n");
+                exit(1);
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0)
+                find_addr = real_offset + reg->size;
+        }
+        else
+        {
+            /* nothing to do with passthrough type register, 
+             * continue to find next byte 
+             */
+            emul_len--;
+            find_addr++;
+        }
+    }
+    
+    /* need to shift back before passing them to libpci */
+    val >>= ((address & 3) << 3);
+
+out:
+    switch (len){
+    case 1:
+        pci_write_byte(pci_dev, address, val);
+        break;
+    case 2:
+        pci_write_word(pci_dev, address, val);
+        break;
+    case 4:
+        pci_write_long(pci_dev, address, val);
+        break;
+    }
+
+exit:
+    return;
 }
 
 static uint32_t pt_pci_read_config(PCIDevice *d, uint32_t address, int len)
 {
     struct pt_dev *assigned_device = (struct pt_dev *)d;
     struct pci_dev *pci_dev = assigned_device->pci_dev;
-    uint32_t val = 0xFF;
-
-    /* Pre-hooking */
-    switch ( address ) {
-    case 0x0C ... 0x3F:
-        val = pci_default_read_config(d, address, len);
+    uint32_t val = 0xFFFFFFFF;
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_grp_info_tbl *reg_grp = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_reg_info_tbl *reg = NULL;
+    uint32_t find_addr = address;
+    uint32_t real_offset = 0;
+    uint32_t valid_mask = 0xFFFFFFFF;
+    uint8_t *ptr_val = NULL;
+    int emul_len = 0;
+    int ret = 0;
+
+    PT_LOG("read(%x.%x): address=%04x len=%d\n",
+        (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, len);
+
+    /* check offset range */
+    if (address >= 0xFF)
+    {
+        PT_LOG("Failed to read register with offset exceeding FFh. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
         goto exit;
     }
 
-    switch ( len ) {
+    /* check read size */
+    if ((len != 1) && (len != 2) && (len != 4))
+    {
+        PT_LOG("Failed to read register with invalid access length. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* check offset alignment */
+    if (address & (len-1))
+    {
+        PT_LOG("Failed to read register with invalid access size alignment. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* find register group entry */
+    reg_grp_entry = pt_find_reg_grp(assigned_device, address);
+    if (reg_grp_entry)
+    {
+        reg_grp = reg_grp_entry->reg_grp;
+        /* check 0 Hardwired register group */
+        if (reg_grp->grp_type == GRP_TYPE_HARDWIRED)
+        {
+            /* no need to emulate, just return 0 */
+            val = 0;
+            goto exit;
+        }
+    }
+
+    /* read I/O device register value */
+    switch (len) {
     case 1:
         val = pci_read_byte(pci_dev, address);
         break;
@@ -344,15 +1108,92 @@ static uint32_t pt_pci_read_config(PCIDe
         break;
     }
 
-    pt_msi_read(assigned_device, address, len, &val);
-    pt_msix_read(assigned_device, address, len, &val);
+    /* check libpci error */
+    valid_mask = (0xFFFFFFFF >> ((4 - len) << 3));
+    if ((val & valid_mask) == valid_mask)
+    {
+        PT_LOG("libpci read error. No emulation. "
+            "[%02x:%02x.%x][Offset:%02xh][Length:%d]\n",
+            pci_bus_num(d->bus), ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+            address, len);
+        goto exit;
+    }
+
+    /* just return the I/O device register value for 
+     * passthrough type register group 
+     */
+    if (reg_grp_entry == NULL)
+        goto exit;
+
+    /* adjust the read value to appropriate CFC-CFF window */
+    val <<= ((address & 3) << 3);
+    emul_len = len;
+
+    /* loop Guest request size */
+    while (0 < emul_len)
+    {
+        /* find register entry to be emulated */
+        reg_entry = pt_find_reg(reg_grp_entry, find_addr);
+        if (reg_entry)
+        {
+            reg = reg_entry->reg;
+            real_offset = (reg_grp_entry->base_offset + reg->offset);
+            valid_mask = (0xFFFFFFFF >> ((4 - emul_len) << 3));
+            valid_mask <<= ((find_addr - real_offset) << 3);
+            ptr_val = ((uint8_t *)&val + (real_offset & 3));
+
+            /* do emulation depend on register size */
+            switch (reg->size) {
+            case 1:
+                /* emulate read to byte register */
+                if (reg->u.b.read)
+                    ret = reg->u.b.read(assigned_device, reg_entry,
+                                        (uint8_t *)ptr_val, 
+                                        (uint8_t)valid_mask);
+                break;
+            case 2:
+                /* emulate read to word register */
+                if (reg->u.w.read)
+                    ret = reg->u.w.read(assigned_device, reg_entry,
+                                        (uint16_t *)ptr_val, 
+                                        (uint16_t)valid_mask);
+                break;
+            case 4:
+                /* emulate read to double word register */
+                if (reg->u.dw.read)
+                    ret = reg->u.dw.read(assigned_device, reg_entry,
+                                        (uint32_t *)ptr_val, 
+                                        (uint32_t)valid_mask);
+                break;
+            }
+
+            /* read emulation error */
+            if (ret < 0)
+            {
+                /* exit I/O emulator */
+                PT_LOG("I/O emulator exit()\n");
+                exit(1);
+            }
+
+            /* calculate next address to find */
+            emul_len -= reg->size;
+            if (emul_len > 0)
+                find_addr = real_offset + reg->size;
+        }
+        else
+        {
+            /* nothing to do with passthrough type register, 
+             * continue to find next byte 
+             */
+            emul_len--;
+            find_addr++;
+        }
+    }
+    
+    /* need to shift back before returning them to pci bus emulator */
+    val >>= ((address & 3) << 3);
+
 exit:
-
-#ifdef PT_DEBUG_PCI_CONFIG_ACCESS
-    PT_LOG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-       (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-#endif
-
     return val;
 }
 
@@ -488,11 +1329,880 @@ uint8_t find_cap_offset(struct pci_dev *
     return 0;
 }
 
+/* parse BAR */
+static int pt_bar_reg_parse(
+        struct pt_dev *ptdev, struct pt_reg_info_tbl *reg)
+{
+    PCIDevice *d = &ptdev->dev;
+    struct pt_region *region = NULL;
+    PCIIORegion *r;
+    uint32_t bar_64 = (reg->offset - 4);
+    int bar_flag = PT_BAR_FLAG_UNUSED;
+    int index = 0;
+    int i;
+
+    /* set again the BAR config because it has been overwritten
+     * by pci_register_io_region()
+     */
+    for (i=reg->offset; i<(reg->offset + 4); i++)
+        d->config[i] = pci_read_byte(ptdev->pci_dev, i);
+
+    /* check 64bit BAR */
+    index = pt_bar_offset_to_index(reg->offset);
+    if ((index > 0) && (index < PCI_ROM_SLOT) &&
+        (d->config[bar_64] & PCI_BASE_ADDRESS_MEM_TYPE_64))
+    {
+        region = &ptdev->bases[index-1];
+        if (region->bar_flag != PT_BAR_FLAG_UPPER)
+        {
+            bar_flag = PT_BAR_FLAG_UPPER;
+            goto out;
+        }
+    }
+
+    /* check unused BAR */
+    r = &d->io_regions[index];
+    if (!r->size)
+        goto out;
+
+    /* check BAR I/O indicator */
+    if (d->config[reg->offset] & PCI_BASE_ADDRESS_SPACE_IO)
+        bar_flag = PT_BAR_FLAG_IO;
+    else
+        bar_flag = PT_BAR_FLAG_MEM;
+
+out:
+    return bar_flag;
+}
+
+/* mapping BAR */
+static void pt_bar_mapping(struct pt_dev *ptdev, int io_enable, int mem_enable)
+{
+    PCIDevice *dev = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    struct pt_region *base = NULL;
+    uint32_t r_size = 0;
+    int ret = 0;
+    int i;
+
+    for (i=0; i<PCI_NUM_REGIONS; i++)
+    {
+        r = &dev->io_regions[i];
+
+        /* check valid region */
+        if (!r->size)
+            continue;
+
+        base = &ptdev->bases[i];
+        /* skip unused BAR or upper 64bit BAR */
+        if ((base->bar_flag == PT_BAR_FLAG_UNUSED) || 
+           (base->bar_flag == PT_BAR_FLAG_UPPER))
+               continue;
+
+        /* clear region address in case I/O Space or Memory Space disable */
+        if (((base->bar_flag == PT_BAR_FLAG_IO) && !io_enable ) ||
+            ((base->bar_flag == PT_BAR_FLAG_MEM) && !mem_enable ))
+            r->addr = -1;
+
+        /* prevent guest software mapping memory resource to 00000000h */
+        if ((base->bar_flag == PT_BAR_FLAG_MEM) && (r->addr == 0))
+            r->addr = -1;
+
+        /* align resource size (memory type only) */
+        r_size = r->size;
+        PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+        /* check overlapped address */
+        ret = pt_chk_bar_overlap(dev->bus, dev->devfn, r->addr, r_size);
+        if (ret > 0)
+        {
+            PT_LOG("Base Address[%d] is overlapped. "
+                "[Address:%08xh][Size:%04xh]\n",
+                i, r->addr, r_size);
+        }
+
+        /* check whether we need to update the mapping or not */
+        if (r->addr != ptdev->bases[i].e_physbase)
+        {
+            /* mapping BAR */
+            r->map_func((PCIDevice *)ptdev, i, r->addr, 
+                         r_size, r->type);
+        }
+    }
+
+    return;
+}
+
+/* initialize emulate register */
+static int pt_config_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_tbl *reg_grp,
+        struct pt_reg_info_tbl *reg)
+{
+    struct pt_reg_tbl *reg_entry;
+    uint32_t data = 0;
+    int err = 0;
+
+    /* allocate register entry */
+    reg_entry = qemu_mallocz(sizeof(struct pt_reg_tbl));
+    if (reg_entry == NULL)
+    {
+        PT_LOG("Failed to allocate memory.\n");
+        err = -1;
+        goto out;
+    }
+
+    /* initialize register entry */
+    reg_entry->reg = reg;
+    reg_entry->data = 0;
+
+    if (reg->init)
+    {
+        /* initialize emulate register */
+        data = reg->init(ptdev, reg_entry->reg,
+                        (reg_grp->base_offset + reg->offset));
+        if (data == PT_BAR_ALLF)
+        {
+            /* free unused BAR register entry */
+            free(reg_entry);
+            goto out;
+        }
+        /* set register value */
+        reg_entry->data = data;
+    }
+    /* list add register entry */
+    list_add_tail(&reg_entry->list, &reg_grp->pt_reg_tbl_list);
+
+out:
+    return err;
+}
+
+/* initialize emulate register group */
+static int pt_config_init(struct pt_dev *ptdev)
+{
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_info_tbl *reg_tbl = NULL;
+    uint32_t reg_grp_offset = 0;
+    int i, j, err = 0;
+
+    /* initialize register group list */
+    INIT_LIST_HEAD(&ptdev->pt_reg_grp_tbl_list);
+
+    /* initialize register group */
+    for (i=0; pt_emu_reg_grp_tbl[i].grp_size != 0; i++)
+    {
+        if (pt_emu_reg_grp_tbl[i].grp_id != 0xFF)
+        {
+            reg_grp_offset = (uint32_t)find_cap_offset(ptdev->pci_dev, 
+                                 pt_emu_reg_grp_tbl[i].grp_id);
+            if (!reg_grp_offset) 
+                continue;
+        }
+
+        /* allocate register group table */
+        reg_grp_entry = qemu_mallocz(sizeof(struct pt_reg_grp_tbl));
+        if (reg_grp_entry == NULL)
+        {
+            PT_LOG("Failed to allocate memory.\n");
+            err = -1;
+            goto out;
+        }
+
+        /* initialize register group entry */
+        INIT_LIST_HEAD(&reg_grp_entry->pt_reg_tbl_list);
+
+        /* need to declare here, to enable searching Cap Ptr reg 
+         * (which is in the same reg group) when initializing Status reg 
+         */
+        list_add_tail(&reg_grp_entry->list, &ptdev->pt_reg_grp_tbl_list);
+
+        reg_grp_entry->base_offset = reg_grp_offset;
+        reg_grp_entry->reg_grp = 
+                (struct pt_reg_grp_info_tbl*)&pt_emu_reg_grp_tbl[i];
+        if (pt_emu_reg_grp_tbl[i].size_init)
+        {
+            /* get register group size */
+            reg_grp_entry->size = pt_emu_reg_grp_tbl[i].size_init(ptdev,
+                                      reg_grp_entry->reg_grp, 
+                                      reg_grp_offset);
+        }
+
+        if (pt_emu_reg_grp_tbl[i].grp_type == GRP_TYPE_EMU)
+        {
+            if (pt_emu_reg_grp_tbl[i].emu_reg_tbl)
+            {
+                reg_tbl = pt_emu_reg_grp_tbl[i].emu_reg_tbl;
+                /* initialize capability register */
+                for (j=0; reg_tbl->size != 0; j++, reg_tbl++)
+                {
+                    /* initialize capability register */
+                    err = pt_config_reg_init(ptdev, reg_grp_entry, reg_tbl);
+                    if (err < 0)
+                        goto out;
+                }
+            }
+        }
+        reg_grp_offset = 0;
+    }
+
+out:
+    return err;
+}
+
+/* initialize common register value */
+static uint32_t pt_common_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    return reg->init_val;
+}
+
+/* initialize Capabilities Pointer or Next Pointer register */
+static uint32_t pt_ptr_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    uint32_t reg_field = (uint32_t)ptdev->dev.config[real_offset];
+    int i;
+
+    /* find capability offset */
+    while (reg_field)
+    {
+        for (i=0; pt_emu_reg_grp_tbl[i].grp_size != 0; i++)
+        {
+            /* check whether the next capability 
+             * should be exported to guest or not 
+             */
+            if (pt_emu_reg_grp_tbl[i].grp_id == ptdev->dev.config[reg_field])
+            {
+                if (pt_emu_reg_grp_tbl[i].grp_type == GRP_TYPE_EMU)
+                    goto out;
+                /* ignore the 0 hardwired capability, find next one */
+                break;
+            }
+        }
+        /* next capability */
+        reg_field = (uint32_t)ptdev->dev.config[reg_field + 1];
+    }
+
+out:
+    return reg_field;
+}
+
+/* initialize Status register */
+static uint32_t pt_status_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    int reg_field = 0;
+
+    /* find Header register group */
+    reg_grp_entry = pt_find_reg_grp(ptdev, PCI_CAPABILITY_LIST);
+    if (reg_grp_entry)
+    {
+        /* find Capabilities Pointer register */
+        reg_entry = pt_find_reg(reg_grp_entry, PCI_CAPABILITY_LIST);
+        if (reg_entry)
+        {
+            /* check Capabilities Pointer register */
+            if (reg_entry->data)
+                reg_field |= PCI_STATUS_CAP_LIST;
+            else
+                reg_field &= ~PCI_STATUS_CAP_LIST;
+        }
+        else
+        {
+            /* exit I/O emulator */
+            PT_LOG("I/O emulator exit()\n");
+            exit(1);
+        }
+    }
+    else
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    return reg_field;
+}
+
+/* initialize Interrupt Pin register */
+static uint32_t pt_irqpin_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+
+    /* set Interrupt Pin register to use INTA# if it has */
+    if (ptdev->dev.config[real_offset])
+        reg_field = 0x01;
+
+    return reg_field;
+}
+
+/* initialize BAR */
+static uint32_t pt_bar_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+    int index;
+
+    /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    /* set initial guest physical base address to -1 */
+    ptdev->bases[index].e_physbase = -1;
+
+    /* set BAR flag */
+    ptdev->bases[index].bar_flag = pt_bar_reg_parse(ptdev, reg);
+    if (ptdev->bases[index].bar_flag == PT_BAR_FLAG_UNUSED)
+        reg_field = PT_BAR_ALLF;
+
+    return reg_field;
+}
+
+/* initialize Link Control 2 register */
+static uint32_t pt_linkctrl2_reg_init(struct pt_dev *ptdev,
+        struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    int reg_field = 0;
+
+    /* set Supported Link Speed */
+    reg_field |= 
+        (0x0F & 
+         ptdev->dev.config[(real_offset - reg->offset) + PCI_EXP_LNKCAP]);
+
+    return reg_field;
+}
+
+/* get register group size */
+static uint8_t pt_reg_grp_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    return grp_reg->grp_size;
+}
+
+/* get MSI Capability Structure register group size */
+static uint8_t pt_msi_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    PCIDevice *d = &ptdev->dev;
+    uint16_t msg_ctrl = 
+        *((uint16_t*)(d->config + (base_offset + PCI_MSI_FLAGS)));
+    uint8_t msi_size = 0;
+
+    /* check 64 bit address capable & Per-vector masking capable */
+    switch (msg_ctrl & (PCI_MSI_FLAGS_MASK_BIT | PCI_MSI_FLAGS_64BIT))
+    {
+    case 0x0000:
+        msi_size = 0x0A;
+        break;
+    case PCI_MSI_FLAGS_64BIT:
+        msi_size = 0x0E;
+        break;
+    case PCI_MSI_FLAGS_MASK_BIT:
+        msi_size = 0x14;
+        break;
+    case (PCI_MSI_FLAGS_MASK_BIT | PCI_MSI_FLAGS_64BIT):
+        msi_size = 0x18;
+        break;
+    }
+
+    return msi_size;
+}
+
+/* get Vendor Specific Capability Structure register group size */
+static uint8_t pt_vendor_size_init(struct pt_dev *ptdev,
+        struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset)
+{
+    return ptdev->dev.config[base_offset + 0x02];
+}
+
+/* read byte size emulate register */
+static int pt_byte_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint8_t *value, uint8_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint8_t valid_emu_mask = 0;
+
+    /* emulate byte register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+    return 0;
+}
+
+/* read word size emulate register */
+static int pt_word_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint16_t *value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t valid_emu_mask = 0;
+
+    /* emulate word register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+    return 0;
+}
+
+/* read long size emulate register */
+static int pt_long_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint32_t *value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+
+    /* emulate long register */
+    valid_emu_mask = reg->emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+   return 0;
+}
+
+/* read BAR */
+static int pt_bar_reg_read(struct pt_dev *ptdev,
+        struct pt_reg_tbl *cfg_entry,
+        uint32_t *value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t valid_emu_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    int index;
+
+    /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    /* set emulate mask depend on BAR flag */
+    switch (ptdev->bases[index].bar_flag)
+    {
+    case PT_BAR_FLAG_MEM:
+        bar_emu_mask = PT_BAR_MEM_EMU_MASK;
+        break;
+    case PT_BAR_FLAG_IO:
+        bar_emu_mask = PT_BAR_IO_EMU_MASK;
+        break;
+    case PT_BAR_FLAG_UPPER:
+        *value = 0;
+        goto out;
+    default:
+        break;
+    }
+
+    /* emulate BAR */
+    valid_emu_mask = bar_emu_mask & valid_mask;
+    *value = ((*value & ~valid_emu_mask) | 
+              (cfg_entry->data & valid_emu_mask));
+
+out:
+   return 0;
+}
+
+/* write byte size emulate register */
+static int pt_byte_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint8_t *value, uint8_t dev_value, uint8_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint8_t writable_mask = 0;
+    uint8_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write word size emulate register */
+static int pt_word_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write long size emulate register */
+static int pt_long_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Command register */
+static int pt_cmd_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t wr_value = *value;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) | (dev_value & ~throughable_mask));
+
+    /* mapping BAR */
+    pt_bar_mapping(ptdev, wr_value & PCI_COMMAND_IO, 
+                          wr_value & PCI_COMMAND_MEMORY);
+
+    return 0;
+}
+
+/* write BAR */
+static int pt_bar_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    struct pt_reg_grp_tbl *reg_grp_entry = NULL;
+    struct pt_reg_tbl *reg_entry = NULL;
+    struct pt_region *base = NULL;
+    PCIDevice *d = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t bar_emu_mask = 0;
+    uint32_t bar_ro_mask = 0;
+    uint32_t new_addr, last_addr;
+    uint32_t prev_offset;
+    uint32_t r_size = 0;
+    int index = 0;
+
+   /* get BAR index */
+    index = pt_bar_offset_to_index(reg->offset);
+    if (index < 0)
+    {
+        /* exit I/O emulator */
+        PT_LOG("I/O emulator exit()\n");
+        exit(1);
+    }
+
+    r = &d->io_regions[index];
+    r_size = r->size;
+    base = &ptdev->bases[index];
+    /* align resource size (memory type only) */
+    PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+    /* check guest write value */
+    if (*value == PT_BAR_ALLF)
+    {
+        /* set register with resource size alligned to page size */
+        cfg_entry->data = ~(r_size - 1);
+        /* avoid writing ALL F to I/O device register */
+        *value = dev_value;
+    }
+    else
+    {
+        /* set emulate mask and read-only mask depend on BAR flag */
+        switch (ptdev->bases[index].bar_flag)
+        {
+        case PT_BAR_FLAG_MEM:
+            bar_emu_mask = PT_BAR_MEM_EMU_MASK;
+            bar_ro_mask = PT_BAR_MEM_RO_MASK;
+            break;
+        case PT_BAR_FLAG_IO:
+            new_addr = *value;
+            last_addr = new_addr + r_size - 1;
+            /* check 64K range */
+            if (last_addr <= new_addr || !new_addr || last_addr >= 0x10000)
+            {
+                PT_LOG("Guest attempt to set Base Address over the 64KB. "
+                    "[%02x:%02x.%x][Offset:%02xh][Range:%08xh-%08xh]\n",
+                    pci_bus_num(d->bus), 
+                    ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+                    reg->offset, new_addr, last_addr);
+                /* just remove mapping */
+                r->addr = -1;
+                goto exit;
+            }
+            bar_emu_mask = PT_BAR_IO_EMU_MASK;
+            bar_ro_mask = PT_BAR_IO_RO_MASK;
+            break;
+        case PT_BAR_FLAG_UPPER:
+            if (*value)
+            {
+                PT_LOG("Guest attempt to set high MMIO Base Address. "
+                   "Ignore mapping. "
+                   "[%02x:%02x.%x][Offset:%02xh][High Address:%08xh]\n",
+                    pci_bus_num(d->bus), 
+                    ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+                    reg->offset, *value);
+                /* clear lower address */
+                d->io_regions[index-1].addr = -1;
+            }
+            else
+            {
+                /* find lower 32bit BAR */
+                prev_offset = (reg->offset - 4);
+                reg_grp_entry = pt_find_reg_grp(ptdev, prev_offset);
+                if (reg_grp_entry)
+                {
+                    reg_entry = pt_find_reg(reg_grp_entry, prev_offset);
+                    if (reg_entry)
+                        /* restore lower address */
+                        d->io_regions[index-1].addr = reg_entry->data;
+                    else
+                        return -1;
+                }
+                else
+                    return -1;
+            }
+            cfg_entry->data = 0;
+            r->addr = -1;
+            goto exit;
+        }
+
+        /* modify emulate register */
+        writable_mask = bar_emu_mask & ~bar_ro_mask & valid_mask;
+        cfg_entry->data = ((*value & writable_mask) |
+                           (cfg_entry->data & ~writable_mask));
+        /* update the corresponding virtual region address */
+        r->addr = cfg_entry->data;
+
+        /* create value for writing to I/O device register */
+        throughable_mask = ~bar_emu_mask & valid_mask;
+        *value = ((*value & throughable_mask) |
+                  (dev_value & ~throughable_mask));
+    }
+
+exit:
+    return 0;
+}
+
+/* write Exp ROM BAR */
+static int pt_exp_rom_bar_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint32_t *value, uint32_t dev_value, uint32_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    struct pt_region *base = NULL;
+    PCIDevice *d = (PCIDevice *)&ptdev->dev;
+    PCIIORegion *r;
+    uint32_t writable_mask = 0;
+    uint32_t throughable_mask = 0;
+    uint32_t r_size = 0;
+
+    r = &d->io_regions[PCI_ROM_SLOT];
+    r_size = r->size;
+    base = &ptdev->bases[PCI_ROM_SLOT];
+    /* align memory type resource size */
+    PT_GET_EMUL_SIZE(base->bar_flag, r_size);
+
+    /* check guest write value */
+    if (*value == PT_BAR_ALLF)
+    {
+        /* set register with resource size alligned to page size */
+        cfg_entry->data = ~(r_size - 1);
+        /* avoid writing ALL F to I/O device register */
+        *value = dev_value;
+    }
+    else
+    {
+        /* modify emulate register */
+        writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask;
+        cfg_entry->data = ((*value & writable_mask) |
+                           (cfg_entry->data & ~writable_mask));
+        /* update the corresponding virtual region address */
+        r->addr = cfg_entry->data;
+
+        /* create value for writing to I/O device register */
+        throughable_mask = ~reg->emu_mask & valid_mask;
+        *value = ((*value & throughable_mask) |
+                  (dev_value & ~throughable_mask));
+    }
+
+    return 0;
+}
+
+/* write Power Management Control/Status register */
+static int pt_pmcsr_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t pmcsr_mask = (PCI_PM_CTRL_PME_ENABLE | 
+                           PCI_PM_CTRL_DATA_SEL_MASK |
+                           PCI_PM_CTRL_PME_STATUS);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~pmcsr_mask;
+    /* ignore it when the requested state neither D3 nor D0 */
+    if (((*value & PCI_PM_CTRL_STATE_MASK) != PCI_PM_CTRL_STATE_MASK) &&
+        ((*value & PCI_PM_CTRL_STATE_MASK) != 0))
+        writable_mask &= ~PCI_PM_CTRL_STATE_MASK;
+
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Device Control register */
+static int pt_devctrl_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t devctrl_mask = (PCI_EXP_DEVCTL_AUX_PME | 0x8000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~devctrl_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Link Control register */
+static int pt_linkctrl_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t linkctrl_mask = (PCI_EXP_LNKCTL_ASPM | 0x04 |
+                              PCI_EXP_LNKCTL_DISABLE |
+                              PCI_EXP_LNKCTL_RETRAIN | 
+                              0x0400 | 0x0800 | 0xF000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~linkctrl_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Device Control2 register */
+static int pt_devctrl2_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t devctrl2_mask = 0xFFE0;
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & ~devctrl2_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
+/* write Link Control2 register */
+static int pt_linkctrl2_reg_write(struct pt_dev *ptdev, 
+        struct pt_reg_tbl *cfg_entry, 
+        uint16_t *value, uint16_t dev_value, uint16_t valid_mask)
+{
+    struct pt_reg_info_tbl *reg = cfg_entry->reg;
+    uint16_t writable_mask = 0;
+    uint16_t throughable_mask = 0;
+    uint16_t linkctrl2_mask = (0x0040 | 0xE000);
+
+    /* modify emulate register */
+    writable_mask = reg->emu_mask & ~reg->ro_mask & valid_mask & 
+                    ~linkctrl2_mask;
+    cfg_entry->data = ((*value & writable_mask) |
+                       (cfg_entry->data & ~writable_mask));
+
+    /* create value for writing to I/O device register */
+    throughable_mask = ~reg->emu_mask & valid_mask;
+    *value = ((*value & throughable_mask) |
+              (dev_value & ~throughable_mask));
+
+    return 0;
+}
+
 struct pt_dev * register_real_device(PCIBus *e_bus,
         const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev,
         uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access)
 {
-    int rc = -1, i, pos;
+    int rc = -1, i;
     struct pt_dev *assigned_device = NULL;
     struct pci_dev *pci_dev;
     uint8_t e_device, e_intx;
@@ -539,7 +2249,6 @@ struct pt_dev * register_real_device(PCI
         dpci_infos.php_devs[PCI_TO_PHP_SLOT(free_pci_slot)].pt_dev = assigned_device;
 
     assigned_device->pci_dev = pci_dev;
-
 
     /* Assign device */
     machine_bdf.reg = 0;
@@ -554,18 +2263,22 @@ struct pt_dev * register_real_device(PCI
     for ( i = 0; i < PCI_CONFIG_SIZE; i++ )
         assigned_device->dev.config[i] = pci_read_byte(pci_dev, i);
 
-    if ( (pos = find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) )
-        pt_msi_init(assigned_device, pos);
-
-    if ( (pos = find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) )
-        pt_msix_init(assigned_device, pos);
-
     /* Handle real device's MMIO/PIO BARs */
     pt_register_regions(assigned_device);
 
+    /* reinitialize each config register to be emulated */
+    rc = pt_config_init(assigned_device);
+    if ( rc < 0 ) {
+        return NULL;
+    }
+
     /* Bind interrupt */
+    if (!assigned_device->dev.config[0x3d])
+        goto out;
+
     e_device = (assigned_device->dev.devfn >> 3) & 0x1f;
-    e_intx = assigned_device->dev.config[0x3d]-1;
+    /* fix virtual interrupt pin to INTA# */
+    e_intx = 0;
 
     if ( PT_MACHINE_IRQ_AUTO == machine_irq )
     {
@@ -602,6 +2315,7 @@ struct pt_dev * register_real_device(PCI
             *(uint16_t *)(&assigned_device->dev.config[0x04]));
     }
 
+out:
     PT_LOG("Real physical device %02x:%02x.%x registered successfuly!\n", 
         r_bus, r_dev, r_func);
 
diff -r 926a366ca82f tools/ioemu/hw/pass-through.h
--- a/tools/ioemu/hw/pass-through.h	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pass-through.h	Tue Jul 01 20:35:37 2008 +0900
@@ -21,6 +21,7 @@
 #include "vl.h"
 #include "pci/header.h"
 #include "pci/pci.h"
+#include "list.h"
 
 /* Log acesss */
 #define PT_LOGGING_ENABLED
@@ -42,6 +43,38 @@
 #define PCI_EXP_DEVCAP_FLR      (1 << 28)
 #define PCI_EXP_DEVCTL_FLR      (1 << 15)
 #define PCI_BAR_ENTRIES         (6)
+
+/* because the current version of libpci (2.2.0) doesn't define these ID,
+ * so we define Capability ID here.
+ */
+/* SHPC Capability List Item reg group */
+#define PCI_CAP_ID_HOTPLUG      0x0C
+/* Subsystem ID and Subsystem Vendor ID Capability List Item reg group */
+#define PCI_CAP_ID_SSVID        0x0D
+/* interrupt masking & reporting supported */
+#define PCI_MSI_FLAGS_MASK_BIT  0x0100
+
+#define PT_BAR_ALLF             0xFFFFFFFF      /* BAR mask */
+#define PT_BAR_MEM_RO_MASK      0x0000000F      /* BAR ReadOnly mask(Memory) */
+#define PT_BAR_MEM_EMU_MASK     0xFFFFFFF0      /* BAR emul mask(Memory) */
+#define PT_BAR_IO_RO_MASK       0x00000003      /* BAR ReadOnly mask(I/O) */
+#define PT_BAR_IO_EMU_MASK      0xFFFFFFFC      /* BAR emul mask(I/O) */
+enum {
+    PT_BAR_FLAG_MEM = 0,                        /* Memory type BAR */
+    PT_BAR_FLAG_IO,                             /* I/O type BAR */
+    PT_BAR_FLAG_UPPER,                          /* upper 64bit BAR */
+    PT_BAR_FLAG_UNUSED,                         /* unused BAR */
+};
+enum {
+    GRP_TYPE_HARDWIRED = 0,                     /* 0 Hardwired reg group */
+    GRP_TYPE_EMU,                               /* emul reg group */
+};
+
+#define PT_GET_EMUL_SIZE(flag, r_size) do { \
+    if (flag == PT_BAR_FLAG_MEM) {\
+        r_size = (((r_size) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)); \
+    }\
+} while(0)
 
 struct pt_region {
     /* Virtual phys base & size */
@@ -49,11 +82,13 @@ struct pt_region {
     uint32_t e_size;
     /* Index of region in qemu */
     uint32_t memory_index;
+    /* BAR flag */
+    uint32_t bar_flag;
     /* Translation of the emulated address */
     union {
-        uint32_t maddr;
-        uint32_t pio_base;
-        uint32_t u;
+        uint64_t maddr;
+        uint64_t pio_base;
+        uint64_t u;
     } access;
 };
 
@@ -89,8 +124,9 @@ struct pt_msix_info {
 */
 struct pt_dev {
     PCIDevice dev;
-    struct pci_dev *pci_dev;                     /* libpci struct */
+    struct pci_dev *pci_dev;                    /* libpci struct */
     struct pt_region bases[PCI_NUM_REGIONS];    /* Access regions */
+    struct list_head pt_reg_grp_tbl_list;       /* emul reg group list */
     struct pt_msi_info *msi;                    /* MSI virtualization */
     struct pt_msix_info *msix;                  /* MSI-X virtualization */
 };
@@ -113,5 +149,121 @@ struct pci_config_cf8 {
 
 int pt_init(PCIBus * e_bus, char * direct_pci);
 
+/* emul reg group management table */
+struct pt_reg_grp_tbl {
+    /* emul reg group list */
+    struct list_head list;
+    /* emul reg group info table */
+    struct pt_reg_grp_info_tbl *reg_grp;
+    /* emul reg group base offset */
+    uint32_t base_offset;
+    /* emul reg group size */
+    uint8_t size;
+    /* emul reg management table list */
+    struct list_head pt_reg_tbl_list;
+};
+
+/* emul reg group size initialize method */
+typedef uint8_t (*pt_reg_size_init) (struct pt_dev *ptdev, 
+                                     struct pt_reg_grp_info_tbl *grp_reg, 
+                                     uint32_t base_offset);
+/* emul reg group infomation table */
+struct pt_reg_grp_info_tbl {
+    /* emul reg group ID */
+    uint8_t grp_id;
+    /* emul reg group type */
+    uint8_t grp_type;
+    /* emul reg group size */
+    uint8_t grp_size;
+    /* emul reg get size method */
+    pt_reg_size_init size_init;
+    /* emul reg info table */
+    struct pt_reg_info_tbl *emu_reg_tbl;
+};
+
+/* emul reg management table */
+struct pt_reg_tbl {
+    /* emul reg table list */
+    struct list_head list;
+    /* emul reg info table */
+    struct pt_reg_info_tbl *reg;
+    /* emul reg value */
+    uint32_t data;
+};
+
+/* emul reg initialize method */
+typedef uint32_t (*conf_reg_init) (struct pt_dev *ptdev, 
+                                   struct pt_reg_info_tbl *reg, 
+                                   uint32_t real_offset);
+/* emul reg long write method */
+typedef int (*conf_dword_write) (struct pt_dev *ptdev,
+                                 struct pt_reg_tbl *cfg_entry, 
+                                 uint32_t *value, 
+                                 uint32_t dev_value,
+                                 uint32_t valid_mask);
+/* emul reg word write method */
+typedef int (*conf_word_write) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint16_t *value, 
+                                uint16_t dev_value,
+                                uint16_t valid_mask);
+/* emul reg byte write method */
+typedef int (*conf_byte_write) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint8_t *value, 
+                                uint8_t dev_value,
+                                uint8_t valid_mask);
+/* emul reg long read methods */
+typedef int (*conf_dword_read) (struct pt_dev *ptdev,
+                                struct pt_reg_tbl *cfg_entry, 
+                                uint32_t *value,
+                                uint32_t valid_mask);
+/* emul reg word read method */
+typedef int (*conf_word_read) (struct pt_dev *ptdev,
+                               struct pt_reg_tbl *cfg_entry, 
+                               uint16_t *value,
+                               uint16_t valid_mask);
+/* emul reg byte read method */
+typedef int (*conf_byte_read) (struct pt_dev *ptdev,
+                               struct pt_reg_tbl *cfg_entry, 
+                               uint8_t *value,
+                               uint8_t valid_mask);
+
+/* emul reg infomation table */
+struct pt_reg_info_tbl {
+    /* reg relative offset */
+    uint32_t offset;
+    /* reg size */
+    uint32_t size;
+    /* reg initial value */
+    uint32_t init_val;
+    /* reg read only field mask (ON:RO/ROS, OFF:other) */
+    uint32_t ro_mask;
+    /* reg emulate field mask (ON:emu, OFF:passthrough) */
+    uint32_t emu_mask;
+    /* emul reg initialize method */
+    conf_reg_init init;
+    union {
+        struct {
+            /* emul reg long write method */
+            conf_dword_write write;
+            /* emul reg long read method */
+            conf_dword_read read;
+        } dw;
+        struct {
+            /* emul reg word write method */
+            conf_word_write write;
+            /* emul reg word read method */
+            conf_word_read read;
+        } w;
+        struct {
+            /* emul reg byte write method */
+            conf_byte_write write;
+            /* emul reg byte read method */
+            conf_byte_read read;
+        } b;
+    } u;
+};
+
 #endif /* __PASSTHROUGH_H__ */
 
diff -r 926a366ca82f tools/ioemu/hw/pci.c
--- a/tools/ioemu/hw/pci.c	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/hw/pci.c	Tue Jul 01 20:35:37 2008 +0900
@@ -641,3 +641,34 @@ PCIBus *pci_bridge_init(PCIBus *bus, int
     s->bus = pci_register_secondary_bus(&s->dev, map_irq);
     return s->bus;
 }
+
+int pt_chk_bar_overlap(PCIBus *bus, int devfn, uint32_t addr, uint32_t size)
+{
+    PCIDevice *devices = (PCIDevice *)bus->devices;
+    PCIIORegion *r;
+    int ret = 0;
+    int i, j;
+
+    /* check Overlapped to Base Address */
+    for (i=0; i<256; i++, devices++)
+    {
+        if ((devices == NULL) || (devices->devfn == devfn))
+            continue;
+
+        for (j=0; j<PCI_NUM_REGIONS; j++)
+        {
+            r = &devices->io_regions[j];
+            if ((addr < (r->addr + r->size)) && ((addr + size) > r->addr))
+            {
+                printf("Overlapped to device[%02x:%02x.%x] region:%d addr:%08x"
+                    " size:%08x\n", bus->bus_num, (devices->devfn >> 3) & 0x1F,
+                    (devices->devfn & 0x7), j, r->addr, r->size);
+                ret = 1;
+                goto out;
+            }
+        }
+    }
+
+out:
+    return ret;
+}
diff -r 926a366ca82f tools/ioemu/list.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/list.h	Tue Jul 01 18:12:23 2008 +0900
@@ -0,0 +1,89 @@
+#ifndef _IOEMU_LIST_H
+#define _IOEMU_LIST_H
+/* Taken from Linux kernel code, but de-kernelized for userspace. */
+#include <stddef.h>
+
+/*
+ * These are non-NULL pointers that will result in page faults
+ * under normal circumstances, used to verify that nobody uses
+ * non-initialized list entries.
+ */
+#define LIST_POISON1  ((void *) 0x00100100)
+#define LIST_POISON2  ((void *) 0x00200200)
+
+#define container_of(ptr, type, member) ({                \
+        typeof( ((type *)0)->member ) *__mptr = (ptr);    \
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+
+/*
+ * Simple doubly linked list implementation.
+ *
+ * Some of the internal functions ("__xxx") are useful when
+ * manipulating whole lists rather than single entries, as
+ * sometimes we already know the next/prev entries and we can
+ * generate better code by using them directly rather than
+ * using the generic single-entry routines.
+ */
+
+struct list_head {
+    struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define INIT_LIST_HEAD(ptr) do { \
+    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
+} while (0)
+
+
+/*
+ * Insert a new entry between two known consecutive entries. 
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_add(struct list_head *new,
+                  struct list_head *prev,
+                  struct list_head *next)
+{
+    next->prev = new;
+    new->next = next;
+    new->prev = prev;
+    prev->next = new;
+}
+
+/**
+ * list_add_tail - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head.
+ * This is useful for implementing queues.
+ */
+static inline void list_add_tail(struct list_head *new, 
+                                 struct list_head *head)
+{
+    __list_add(new, head->prev, head);
+}
+
+/**
+ * list_entry - get the struct for this entry
+ * @ptr:    the &struct list_head pointer.
+ * @type:   the type of the struct this is embedded in.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_entry(ptr, type, member)  \
+    container_of(ptr, type, member)
+
+/**
+ * list_for_each_entry - iterate over list of given type
+ * @pos:    the type * to use as a loop counter.
+ * @head:   the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry(pos, head, member)                    \
+    for (pos = list_entry((head)->next, typeof(*pos), member);    \
+         &pos->member != (head);                                  \
+         pos = list_entry(pos->member.next, typeof(*pos), member))
+
+#endif
diff -r 926a366ca82f tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Fri Jun 20 15:21:26 2008 +0100
+++ b/tools/ioemu/vl.h	Tue Jul 01 20:35:37 2008 +0900
@@ -832,6 +832,8 @@ void pci_register_io_region(PCIDevice *p
                             uint32_t size, int type, 
                             PCIMapIORegionFunc *map_func);
 
+int pt_chk_bar_overlap(PCIBus *bus, int devfn, uint32_t addr, uint32_t size);
+
 void pci_set_irq(PCIDevice *pci_dev, int irq_num, int level);
 
 uint32_t pci_default_read_config(PCIDevice *d, 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-07-02  1:03                 ` Yuji Shimada
@ 2008-07-02  2:07                   ` Cui, Dexuan
  2008-07-03  1:49                   ` Dong, Eddie
  1 sibling, 0 replies; 33+ messages in thread
From: Cui, Dexuan @ 2008-07-02  2:07 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: Dong, Eddie, xen-devel, Ian Jackson, Keir Fraser

I had a test with your new patch: the occasional-NIC-doesn't-work issue I noticed disappeared.
I'll have a look why it disappeared.  :-)

Thanks,
-- Dexuan


-----Original Message-----
From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp] 
Sent: 2008年7月2日 9:03
To: Cui, Dexuan
Cc: Ian Jackson; xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific

I've done some bug fixes as follows.

1. correct the size calculation of MSI Capability Structure in
   pt_msi_size_init(). The next capability might be hidden due to wrong
   large size of MSI.

2. modify the decision logic for determining unused Exp ROM BAR in
   pt_bar_reg_parse(). Use PCIIORegion table instead of parsing
   BAR itself.

3. bug fix on .size_init func for PCI Express Capability Structure
   in pt_emu_reg_grp_tbl[].
   (pt_vendor_size_init ---> pt_reg_grp_size_init)

4. small bug fix on the decision logic for checking unused BAR in
   pt_pci_write_config().

5. add printf message to show overlapped device in pt_chk_bar_overlap().

6. modify pt_bar_mapping() to prevent guest software mapping memory
   resource to 00000000h

7. modify pt_bar_mapping() to map resource even if overlapping is
   detected.

I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  Without
"pci=nomsi", guest OS can use the assigned NIC and can communicate
with external machine.

Additionally I assigned UHCI Controller to guest domain. Guest OS can
use USB-HDD and USB-Mouse.

Could you test the patch?


I am going to remove list.h and enable MSI.

Thanks.

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

--
Yuji Shimada

On Mon, 30 Jun 2008 17:29:38 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:

> I'm using x86_64 c/s 17888: 6ace85eb96c0, and assigning a 82541PI Gigabit Etherer NIC to guest.
> I also tried  "pci=nomsi" for Dom0, and the issus is still there. 
> When the issue happens, eth0 doesn't occur in /proc/interrupt though the device driver module is loaded.
> The issue doesn't happen every time. Really strange...
> 
> Thanks,
> -- Dexuan
> 
> 
> -----Original Message-----
> From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp] 
> Sent: 2008夏・花可30科苛 16:15
> To: Cui, Dexuan
> Cc: Ian Jackson; xen-devel@lists.xensource.com; Dong, Eddie; Keir Fraser
> Subject: Re: [Xen-devel] [PATCH][RFC] Support more Capability StructuresandDevice Specific
> 
> Hi Dexuan,
> 
> I've tested my patch with CentOS 5.1 and PCI/PCIe NIC.  In my test
> environment (with "pci=nomsi" set for Dom0 boot parameter), guest
> OS can use the assigned NIC and can communicate with external machine.
> 
> Does guest OS recieve interrupt? You can check via /proc/interrupts.
> 
> Thanks.
> 
> --
> Yuji Shimada
> 
> > Hi Yuji,
> > I looked at the patch.  It seems pretty good. 
> > Except for the (temporary) absence of MSI/MSI-X stuff, looks the passthrough policy in the patch is almost the same as what is discussed in the PDF file Eddie posted.
> > 
> > I also made some tests against the patch, and found there may be some unstable issues:
> > I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi" parameter)), it can easily (30%~80% probable) stay for a very long (i.e., >40s) at "Starting udev:", and after I login in shell, the NIC seems not present (the guest has no network available), but "lspci" shows the NIC is there.
> > If I use the Qemu without your patch, the issue disappears at once, and NIC in guest works well.
> > 
> > I haven't found issue in your patch yet. :)
> > 
> > Thanks,
> > -- Dexuan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-01 23:23                 ` Dong, Eddie
@ 2008-07-02 10:30                   ` Ian Jackson
  2008-07-02 11:17                     ` Alan Cox
  2008-07-03  1:38                     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
  0 siblings, 2 replies; 33+ messages in thread
From: Ian Jackson @ 2008-07-02 10:30 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: Yuji Shimada, xen-devel, Keir Fraser

Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> Per current data, pass through get many known bug fixed as the case
> Dexuan mentioned. But we didn't see a HW damaging host. Some know issue
> could be a device issuing tons of PCIe traffic, absorbing extra power,
> issuing interrupt storm etc, but right now we didn't see issues yet.

Most people doing PCI passthrough appear to be under the impression
that the guest cannot escape and cannot damage the host.  (Even those
currently doing PCI passthrough with current production hardware
without an iommu!)

I think it is fine to have a passthrough option which doesn't properly
protect the host from the guest - this is a useful setup in many
situations.  But it should not be enabled by default, surely ?

Note that this is a _security_ problem.  So `data' about `issues'
which you have `seen' is irrelevant.  Just because you haven't
actually observed any misbehaviour with non-malicious guests doesn't
mean that a malicious guest couldn't cause the hardware to melt.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-02 10:30                   ` Ian Jackson
@ 2008-07-02 11:17                     ` Alan Cox
  2008-07-03  1:46                       ` Dong, Eddie
  2008-07-03  1:38                     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
  1 sibling, 1 reply; 33+ messages in thread
From: Alan Cox @ 2008-07-02 11:17 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Yuji Shimada, xen-devel, Dong, Eddie, Keir Fraser

> I think it is fine to have a passthrough option which doesn't properly
> protect the host from the guest - this is a useful setup in many
> situations.  But it should not be enabled by default, surely ?

Agreed entirely. Note also that some implementations of an IOMMU will not
save you as they don't fence between individual PCI devices (PCIE is
obviously a bit easier). Not fencing between devices allows you for
example to use a fairly flexible SCSI controller to reprogram another
device. 

In the general case there are also some really nasty dirty attacks you
can't stop with an IOMMU one of which is to reflash the BIOS of the
graphics card to which you were given unrestricted access so that you
compromise the entire system next boot. These attacks appear well
understood except by IOMMU marketing people ;)

IOMMU is great for system correctness and flexibility, using it for
safely providing hardware direct access is a very very hairy business with
a complex device.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-02 10:30                   ` Ian Jackson
  2008-07-02 11:17                     ` Alan Cox
@ 2008-07-03  1:38                     ` Dong, Eddie
  1 sibling, 0 replies; 33+ messages in thread
From: Dong, Eddie @ 2008-07-03  1:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Yuji Shimada, xen-devel, Dong, Eddie, Keir Fraser

Ian Jackson wrote:
> Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support
> more Capability Structures andDevice Specific"): 
>> Per current data, pass through get many known bug fixed
>> as the case Dexuan mentioned. But we didn't see a HW
>> damaging host. Some know issue could be a device issuing
>> tons of PCIe traffic, absorbing extra power, issuing
>> interrupt storm etc, but right now we didn't see issues
>> yet.  
> 
> Most people doing PCI passthrough appear to be under the
> impression 
> that the guest cannot escape and cannot damage the host. 
> (Even those 
> currently doing PCI passthrough with current production
> hardware 
> without an iommu!)

What I am aware is only QoS, I didn't know how can a guest program the
device to crash host. Interrupt storm can be blocked by hypervisor at
certain situation. Competing for unnecessary PCIe traffic is never
related to if we pass through guest setting or not. Can you give me a
specific example how host will be crashed?

> 
> I think it is fine to have a passthrough option which
> doesn't properly 
> protect the host from the guest - this is a useful setup
> in many 
> situations.  But it should not be enabled by default,
> surely ? 

Same reason as above.

> 
> Note that this is a _security_ problem.  So `data' about
> `issues' 
> which you have `seen' is irrelevant.  Just because you
> haven't 
> actually observed any misbehaviour with non-malicious
> guests doesn't 
> mean that a malicious guest couldn't cause the hardware
> to melt. 

Examples even in theory?
NOTE here, current pass through logic only support devices under root
port.

> 
> Ian.

Thx, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-02 11:17                     ` Alan Cox
@ 2008-07-03  1:46                       ` Dong, Eddie
  2008-07-03  9:50                         ` Ian Jackson
  0 siblings, 1 reply; 33+ messages in thread
From: Dong, Eddie @ 2008-07-03  1:46 UTC (permalink / raw)
  To: Alan Cox, Ian Jackson; +Cc: Yuji Shimada, xen-devel, Dong, Eddie, Keir Fraser

Alan Cox wrote:
>> I think it is fine to have a passthrough option which
>> doesn't properly protect the host from the guest - this
>> is a useful setup in many situations.  But it should not
>> be enabled by default, surely ? 
> 
> Agreed entirely. Note also that some implementations of
> an IOMMU will not save you as they don't fence between
> individual PCI devices (PCIE is obviously a bit easier).

IOMMU, at least Intel's IOMMU, doesn't support pure PCI device, only
PCIe devices can be DMA protected.

> Not fencing between devices allows you for example to use
> a fairly flexible SCSI controller to reprogram another
> device. 

Again, at least for Intel IOMMU, devices under root endpoint can never
escape from IOMMU DMA protection, right now we don't support PCIe
devices under a switch to do assignement, but with future ATS or ACS is
implemented, we can assign devices under a switch, where ether the
switch disable peer to peer transaction or always pass up "untranslated"
traffic to upstream.

So your concern is a not real IMO, not? Or do u mean AMD IOMMU may have
different implementation? 

> 
> In the general case there are also some really nasty
> dirty attacks you can't stop with an IOMMU one of which
> is to reflash the BIOS of the graphics card to which you
> were given unrestricted access so that you compromise the
> entire system next boot. These attacks appear well
> understood except by IOMMU marketing people ;) 

Same with above, this is already protected by IOMMU, peer to peer DMA is
not supported right now.

> 
> IOMMU is great for system correctness and flexibility,
> using it for safely providing hardware direct access is a
> very very hairy business with a complex device.
> 
Agree, that is why we are here :)

Thx, eddie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-07-02  1:03                 ` Yuji Shimada
  2008-07-02  2:07                   ` Cui, Dexuan
@ 2008-07-03  1:49                   ` Dong, Eddie
  1 sibling, 0 replies; 33+ messages in thread
From: Dong, Eddie @ 2008-07-03  1:49 UTC (permalink / raw)
  To: Yuji Shimada, Cui, Dexuan
  Cc: Dong, Eddie, xen-devel, Ian Jackson, Keir Fraser

Acked-by: Eddie Dong <eddie.dong@intel.com>

Yuji Shimada wrote:
> I've done some bug fixes as follows.
> 
> 1. correct the size calculation of MSI Capability
>    Structure in pt_msi_size_init(). The next capability
>    might be hidden due to wrong large size of MSI.
> 
> 2. modify the decision logic for determining unused Exp
>    ROM BAR in pt_bar_reg_parse(). Use PCIIORegion table
>    instead of parsing BAR itself.
> 
> 3. bug fix on .size_init func for PCI Express Capability
>    Structure in pt_emu_reg_grp_tbl[].
>    (pt_vendor_size_init ---> pt_reg_grp_size_init)
> 
> 4. small bug fix on the decision logic for checking
>    unused BAR in pt_pci_write_config().
> 
> 5. add printf message to show overlapped device in
> pt_chk_bar_overlap(). 
> 
> 6. modify pt_bar_mapping() to prevent guest software
>    mapping memory resource to 00000000h
> 
> 7. modify pt_bar_mapping() to map resource even if
>    overlapping is detected.
> 
> I've tested my patch with CentOS 5.1 and PCI/PCIe NIC. 
> Without "pci=nomsi", guest OS can use the assigned NIC
> and can communicate 
> with external machine.
> 
> Additionally I assigned UHCI Controller to guest domain.
> Guest OS can use USB-HDD and USB-Mouse.
> 
> Could you test the patch?
> 
> 
> I am going to remove list.h and enable MSI.
> 
> Thanks.
> 
> Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
> 
> 
>> I'm using x86_64 c/s 17888: 6ace85eb96c0, and assigning
>> a 82541PI Gigabit Etherer NIC to guest. 
>> I also tried  "pci=nomsi" for Dom0, and the issus is
>> still there. 
>> When the issue happens, eth0 doesn't occur in
>> /proc/interrupt though the device driver module is
>> loaded. The issue doesn't happen every time. Really
>> strange...  
>> 
>> Thanks,
>> -- Dexuan
>> 
>> 
>> -----Original Message-----
>> From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp]
>> Sent: 2008夏・花可30科苛 16:15
>> To: Cui, Dexuan
>> Cc: Ian Jackson; xen-devel@lists.xensource.com; Dong,
>> Eddie; Keir Fraser 
>> Subject: Re: [Xen-devel] [PATCH][RFC] Support more
>> Capability StructuresandDevice Specific 
>> 
>> Hi Dexuan,
>> 
>> I've tested my patch with CentOS 5.1 and PCI/PCIe NIC. 
>> In my test 
>> environment (with "pci=nomsi" set for Dom0 boot
>> parameter), guest 
>> OS can use the assigned NIC and can communicate with
>> external machine. 
>> 
>> Does guest OS recieve interrupt? You can check via
>> /proc/interrupts. 
>> 
>> Thanks.
>> 
>> --
>> Yuji Shimada
>> 
>>> Hi Yuji,
>>> I looked at the patch.  It seems pretty good.
>>> Except for the (temporary) absence of MSI/MSI-X stuff,
>>> looks the passthrough policy in the patch is almost the
>>> same as what is discussed in the PDF file Eddie posted.
>>> 
>>> I also made some tests against the patch, and found
>>> there may be some unstable issues: 
>>> I.e., when I boot a 32e RHEL5u1 (I add the "pci=nomsi"
>>> parameter)), it can easily (30%~80% probable) stay for
>>> a very long (i.e., >40s) at "Starting udev:", and after
>>> I login in shell, the NIC seems not present (the guest
>>> has no network available), but "lspci" shows the NIC is
>>> there. If I use the Qemu without your patch, the issue
>>> disappears at once, and NIC in guest works well.     
>>> 
>>> I haven't found issue in your patch yet. :)
>>> 
>>> Thanks,
>>> -- Dexuan
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability Structures andDevice Specific
  2008-07-03  1:46                       ` Dong, Eddie
@ 2008-07-03  9:50                         ` Ian Jackson
  2008-07-03 23:03                           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Dong, Eddie
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2008-07-03  9:50 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: Yuji Shimada, xen-devel, Keir Fraser, Alan Cox

Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support more Capability Structures andDevice Specific"):
> Alan Cox wrote:
> > In the general case there are also some really nasty
> > dirty attacks you can't stop with an IOMMU one of which
> > is to reflash the BIOS of the graphics card to which you
> > were given unrestricted access so that you compromise the
> > entire system next boot. These attacks appear well
> > understood except by IOMMU marketing people ;) 
> 
> Same with above, this is already protected by IOMMU, peer to peer DMA is
> not supported right now.

You have evidently completely misunderstood Alan's point.

I was going to explain it again but I'm not sure I know how to say it
more clearly.  Alan's scenario doesn't involve any peer to peer DMA.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH][RFC] Support more Capability StructuresandDevice Specific
  2008-07-03  9:50                         ` Ian Jackson
@ 2008-07-03 23:03                           ` Dong, Eddie
  0 siblings, 0 replies; 33+ messages in thread
From: Dong, Eddie @ 2008-07-03 23:03 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Yuji Shimada, xen-devel, Dong, Eddie, Keir Fraser, Alan Cox

Ian Jackson wrote:
> Dong, Eddie writes ("RE: [Xen-devel] [PATCH][RFC] Support
> more Capability Structures andDevice Specific"): 
>> Alan Cox wrote:
>>> In the general case there are also some really nasty
>>> dirty attacks you can't stop with an IOMMU one of which
>>> is to reflash the BIOS of the graphics card to which you
>>> were given unrestricted access so that you compromise
>>> the entire system next boot. These attacks appear well
>>> understood except by IOMMU marketing people ;)
>> 
>> Same with above, this is already protected by IOMMU,
>> peer to peer DMA is not supported right now.
> 
> You have evidently completely misunderstood Alan's point.
> 
> I was going to explain it again but I'm not sure I know
> how to say it 
> more clearly.  Alan's scenario doesn't involve any peer
> to peer DMA. 
> 
> Ian.
> 
Ok, if it means guest direct MMIO to flash bios, then yes. But it is not
related with our discussion, i.e. no matter we pass through CFGS
registers or not, it may happen.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2008-07-03 23:03 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-27  7:38 [PATCH][RFC] Support more Capability Structures and Device Specific Yuji Shimada
2008-06-27 10:14 ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
2008-06-27 10:19   ` Keir Fraser
2008-06-27 10:25     ` Dong, Eddie
2008-06-27 13:34       ` Ian Jackson
2008-06-30  4:31         ` Yuji Shimada
2008-06-30  5:48           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
2008-06-30  8:14             ` Yuji Shimada
2008-06-30  9:29               ` Cui, Dexuan
2008-07-02  1:03                 ` Yuji Shimada
2008-07-02  2:07                   ` Cui, Dexuan
2008-07-03  1:49                   ` Dong, Eddie
2008-07-01  2:27           ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
2008-07-01  8:00             ` Yuji Shimada
2008-07-01  9:54               ` Ian Jackson
2008-07-01 23:23                 ` Dong, Eddie
2008-07-02 10:30                   ` Ian Jackson
2008-07-02 11:17                     ` Alan Cox
2008-07-03  1:46                       ` Dong, Eddie
2008-07-03  9:50                         ` Ian Jackson
2008-07-03 23:03                           ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Dong, Eddie
2008-07-03  1:38                     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Dong, Eddie
2008-07-01  2:12         ` Dong, Eddie
2008-06-30  7:14       ` Yuji Shimada
2008-06-30  9:02         ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
2008-06-27 13:27     ` [PATCH][RFC] Support more Capability Structures andDevice Specific Ian Jackson
2008-06-27 13:55       ` Ian Jackson
2008-06-30  8:00         ` Yuji Shimada
2008-06-30 16:50           ` Ian Jackson
2008-07-01  2:25             ` [PATCH][RFC] Support more Capability StructuresandDevice Specific Cui, Dexuan
2008-06-27 13:51 ` [PATCH][RFC] Support more Capability Structures and Device Specific Samuel Thibault
2008-06-30  7:12   ` Yuji Shimada
2008-06-30 10:22     ` Samuel Thibault

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.