* [PATCH] Improve the current FLR logic
@ 2008-07-12 12:37 Cui, Dexuan
2008-07-14 17:52 ` Espen Skoglund
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-12 12:37 UTC (permalink / raw)
To: Keir Fraser, xen-devel
[-- Attachment #1: Type: text/plain, Size: 2021 bytes --]
Hi, all,
The attached patches try to improve the current FLR logic.
The idea is: removing the FLR logic from hypervisor and adding the
improved logic in Control Panel.
The current FLR logic in hypervisor has some issues: 1) Dstate
transition is not guaranteed to properly clear the device state; 2) the
current code for PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND
doesn't mean the completion of FLR; according to the PCIe spec, after
issuing FLR, we should wait at least 100ms.
To make it easier to improve the FLR logic, and to make the hypervisor
thin, I think we might as well move the logic to Control Panel for the
time being. In the long run, the essential logic may be implemented in
the pciback driver of Dom0 instead.
[PATCH1] sysmnt_cleanup.patch: this is only a small cleanup.
[PATCH2] remove_FLR_in_xen.patch: remove the FLR logic in Xen.
[PATCH3] do_FLR_in_control_panel.patch: the improved FLR logic in
Control Panel: 1) If the device is PCIe endpoint and supports PCIe FLR,
we use that; 2) Else, if the device is PCIe endpoint, and all functions
on the device are assigned to the same guest, we use the immediate
parent bus's Secondary Bus Reset to reset all functions of the device
(here, actually we require all the functions of the device be assigned
to the same guest); 3) Else, if the device is PCI endpoint and is on a
host bus (e.g. integrated devices) and if the device supports PCI
Advanced Capabilities, we use that for FLR; 4) Else, we use the
Secondary Bus Reset (if the PCI device is behind a PCI/PCI-X bridge,
then all devices behind the uppermost such PCI/PCI-X bridge above this
device must be co-assigned)
[PATCH4] list_assignable_devices.patch: a command "xm
pci-list-assignable-devices" which can list all the assignable devices
in the system. It is useful for end users.
I made some tests on my hosts. Looks the patches work well.
I'd like to ask for your comments, and test feedbacks. Thank you very
much!
Thanks,
-- Dexuan
[-- Attachment #2: sysmnt_cleanup.patch --]
[-- Type: application/octet-stream, Size: 3312 bytes --]
A small cleanup to the find_sysfs_mnt() of pci.py
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
diff -r 54dec90cbea9 -r da5b2c2e9f9d tools/python/xen/util/pci.py
--- a/tools/python/xen/util/pci.py Fri Jul 11 15:37:31 2008 +0100
+++ b/tools/python/xen/util/pci.py Sat Jul 12 18:29:16 2008 +0800
@@ -40,6 +40,9 @@ MSIX_SIZE_MASK = 0x7ff
# Global variable to store information from lspci
lspci_info = None
+# Global variable to store the sysfs mount point
+sysfs_mnt_point = None
+
#Calculate PAGE_SHIFT: number of bits to shift an address to get the page number
PAGE_SIZE = resource.getpagesize()
PAGE_SHIFT = 0
@@ -63,27 +66,28 @@ def parse_hex(val):
return None
def find_sysfs_mnt():
- mounts_file = open(PROC_MNT_PATH,'r')
-
- for line in mounts_file:
- sline = line.split()
- if len(sline)<3:
- continue
-
- if sline[2]=='sysfs':
- return sline[1]
-
+ global sysfs_mnt_point
+ if not sysfs_mnt_point is None:
+ return sysfs_mnt_point
+
+ try:
+ mounts_file = open(PROC_MNT_PATH,'r')
+
+ for line in mounts_file:
+ sline = line.split()
+ if len(sline)<3:
+ continue
+ if sline[2]=='sysfs':
+ sysfs_mnt_point= sline[1]
+ return sysfs_mnt_point
+ except IOError, (errno, strerr):
+ raise PciDeviceParseError(('Failed to locate sysfs mount: %s: %s (%d)'%
+ (PROC_PCI_PATH, strerr, errno)))
return None
def get_all_pci_names():
- try:
- sysfs_mnt = find_sysfs_mnt()
- except IOError, (errno, strerr):
- raise PciDeviceParseError(('Failed to locate sysfs mount: %s (%d)' %
- (PROC_PCI_PATH, strerr, errno)))
-
+ sysfs_mnt = find_sysfs_mnt()
pci_names = os.popen('ls ' + sysfs_mnt + SYSFS_PCI_DEVS_PATH).read().split()
-
return pci_names
def get_all_pci_devices():
@@ -175,12 +179,7 @@ class PciDevice:
self.get_info_from_lspci()
def find_capability(self, type):
- try:
- sysfs_mnt = find_sysfs_mnt()
- except IOError, (errno, strerr):
- raise PciDeviceParseError(('Failed to locate sysfs mount: %s (%d)' %
- (PROC_PCI_PATH, strerr, errno)))
-
+ sysfs_mnt = find_sysfs_mnt()
if sysfs_mnt == None:
return False
path = sysfs_mnt+SYSFS_PCI_DEVS_PATH+'/'+ \
@@ -218,7 +217,7 @@ class PciDevice:
self.pba_offset = self.pba_offset & ~MSIX_BIR_MASK
break
except IOError, (errno, strerr):
- raise PciDeviceParseError(('Failed to locate sysfs mount: %s (%d)' %
+ raise PciDeviceParseError(('Failed to locate sysfs mount: %s: %s (%d)' %
(PROC_PCI_PATH, strerr, errno)))
def remove_msix_iomem(self, index, start, size):
@@ -237,12 +236,7 @@ class PciDevice:
def get_info_from_sysfs(self):
self.find_capability(0x11)
- try:
- sysfs_mnt = find_sysfs_mnt()
- except IOError, (errno, strerr):
- raise PciDeviceParseError(('Failed to locate sysfs mount: %s (%d)' %
- (PROC_PCI_PATH, strerr, errno)))
-
+ sysfs_mnt = find_sysfs_mnt()
if sysfs_mnt == None:
return False
[-- Attachment #3: remove_FLR_in_xen.patch --]
[-- Type: application/octet-stream, Size: 4518 bytes --]
Remove the FLR logic in Xen.
The current simple logic has some issues: 1) Dstate transition is not
guaranteed to properly clear the device state; 2) the current code for
PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND doesn't mean the completion
of FLR; according to the PCIe spec, after issuing FLR, we should wait at least
100ms.
The improved FLR logic will be added into Control Panel.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
diff -r da5b2c2e9f9d -r ba31588eaac1 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c Sat Jul 12 18:29:16 2008 +0800
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c Sat Jul 12 18:30:14 2008 +0800
@@ -507,7 +507,6 @@ static int reassign_device( struct domai
if ( !pdev )
return -ENODEV;
- pdev_flr(pdev);
bdf = (bus << 8) | devfn;
/* supported device? */
iommu = (bdf < ivrs_bdf_entries) ?
diff -r da5b2c2e9f9d -r ba31588eaac1 xen/drivers/passthrough/pci.c
--- a/xen/drivers/passthrough/pci.c Sat Jul 12 18:29:16 2008 +0800
+++ b/xen/drivers/passthrough/pci.c Sat Jul 12 18:30:14 2008 +0800
@@ -161,71 +161,6 @@ void pci_release_devices(struct domain *
}
}
-#define PCI_D3hot (3)
-#define PCI_CONFIG_DWORD_SIZE (64)
-#define PCI_EXP_DEVCAP_FLR (1 << 28)
-#define PCI_EXP_DEVCTL_FLR (1 << 15)
-
-void pdev_flr(struct pci_dev *pdev)
-{
- u8 pos;
- u32 dev_cap, dev_status, pm_ctl;
- int flr = 0;
- u8 dev = PCI_SLOT(pdev->devfn);
- u8 func = PCI_FUNC(pdev->devfn);
-
- pos = pci_find_cap_offset(pdev->bus, dev, func, PCI_CAP_ID_EXP);
- if ( pos != 0 )
- {
- dev_cap = pci_conf_read32(pdev->bus, dev, func, pos + PCI_EXP_DEVCAP);
- if ( dev_cap & PCI_EXP_DEVCAP_FLR )
- {
- pci_conf_write32(pdev->bus, dev, func,
- pos + PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_FLR);
- do {
- dev_status = pci_conf_read32(pdev->bus, dev, func,
- pos + PCI_EXP_DEVSTA);
- } while ( dev_status & PCI_EXP_DEVSTA_TRPND );
-
- flr = 1;
- }
- }
-
- /* If this device doesn't support function level reset,
- * program device from D0 t0 D3hot, and then return to D0
- * to implement function level reset
- */
- if ( flr == 0 )
- {
- pos = pci_find_cap_offset(pdev->bus, dev, func, PCI_CAP_ID_PM);
- if ( pos != 0 )
- {
- int i;
- u32 config[PCI_CONFIG_DWORD_SIZE];
- for ( i = 0; i < PCI_CONFIG_DWORD_SIZE; i++ )
- config[i] = pci_conf_read32(pdev->bus, dev, func, i*4);
-
- /* Enter D3hot without soft reset */
- pm_ctl = pci_conf_read32(pdev->bus, dev, func, pos + PCI_PM_CTRL);
- pm_ctl |= PCI_PM_CTRL_NO_SOFT_RESET;
- pm_ctl &= ~PCI_PM_CTRL_STATE_MASK;
- pm_ctl |= PCI_D3hot;
- pci_conf_write32(pdev->bus, dev, func, pos + PCI_PM_CTRL, pm_ctl);
- mdelay(10);
-
- /* From D3hot to D0 */
- pci_conf_write32(pdev->bus, dev, func, pos + PCI_PM_CTRL, 0);
- mdelay(10);
-
- /* Write saved configurations to device */
- for ( i = 0; i < PCI_CONFIG_DWORD_SIZE; i++ )
- pci_conf_write32(pdev->bus, dev, func, i*4, config[i]);
-
- flr = 1;
- }
- }
-}
-
static void dump_pci_devices(unsigned char ch)
{
struct pci_dev *pdev;
diff -r da5b2c2e9f9d -r ba31588eaac1 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c Sat Jul 12 18:29:16 2008 +0800
+++ b/xen/drivers/passthrough/vtd/iommu.c Sat Jul 12 18:30:14 2008 +0800
@@ -1382,7 +1382,6 @@ static int reassign_device_ownership(
if ( !(pdev = pci_lock_domain_pdev(source, bus, devfn)) )
return -ENODEV;
- pdev_flr(pdev);
drhd = acpi_find_matched_drhd_unit(bus, devfn);
pdev_iommu = drhd->iommu;
domain_context_unmap(bus, devfn);
diff -r da5b2c2e9f9d -r ba31588eaac1 xen/include/xen/pci.h
--- a/xen/include/xen/pci.h Sat Jul 12 18:29:16 2008 +0800
+++ b/xen/include/xen/pci.h Sat Jul 12 18:30:14 2008 +0800
@@ -56,7 +56,6 @@ struct pci_dev *pci_lock_pdev(int bus, i
struct pci_dev *pci_lock_pdev(int bus, int devfn);
struct pci_dev *pci_lock_domain_pdev(struct domain *d, int bus, int devfn);
-void pdev_flr(struct pci_dev *pdev);
void pci_release_devices(struct domain *d);
int pci_add_device(u8 bus, u8 devfn);
int pci_remove_device(u8 bus, u8 devfn);
[-- Attachment #4: do_FLR_in_control_panel.patch --]
[-- Type: application/octet-stream, Size: 24540 bytes --]
The improved FLR logic in Control Panel:
1) If the device is PCIe endpoint and supports PCIe FLR, we use that;
2) Else, if the device is PCIe endpoint, and all functions on the device are
assigned to the same guest, we use the immediate parent bus's Secondary Bus
Reset to reset all functions of the device (here, actually we require all the
functions of the device be assigned to the same guest); 3) Else, if the device
is PCI endpoint and is on a host bus (e.g. integrated devices), and if the
device supports PCI Advanced Capabilities, we use that for FLR; 4) Else, we
use the Secondary Bus Reset (if the PCI device is behind a PCI/PCI-X bridge,
then all devices behind the uppermost such PCI/PCI-X bridge above this device
must be co-assigned).
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
diff -r ba31588eaac1 -r 9287873bb56c tools/python/xen/util/pci.py
--- a/tools/python/xen/util/pci.py Sat Jul 12 18:30:14 2008 +0800
+++ b/tools/python/xen/util/pci.py Sat Jul 12 18:31:00 2008 +0800
@@ -10,6 +10,8 @@ import resource
import resource
import re
import types
+import struct
+import time
PROC_MNT_PATH = '/proc/mounts'
PROC_PCI_PATH = '/proc/bus/pci/devices'
@@ -28,8 +30,44 @@ SYSFS_PCI_DEV_CLASS_PATH = '/class'
LSPCI_CMD = 'lspci'
+PCI_DEV_REG_EXPRESS_STR = r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}."+ \
+ r"[0-9a-fA-F]{1}"
+PCI_DEV_FORMAT_STR = '%04x:%02x:%02x.%01x'
+
+DEV_TYPE_PCIe_ENDPOINT = 0
+DEV_TYPE_PCIe_BRIDGE = 1
+DEV_TYPE_PCI_BRIDGE = 2
+DEV_TYPE_PCI = 3
+
+PCI_STATUS = 0x6
+PCI_CLASS_DEVICE = 0x0a
+PCI_CLASS_BRIDGE_PCI = 0x0604
+
+PCI_CAPABILITY_LIST = 0x34
+PCI_CB_BRIDGE_CONTROL = 0x3e
+PCI_BRIDGE_CTL_BUS_RESET= 0x40
+
+PCI_CAP_ID_EXP = 0x10
+PCI_EXP_FLAGS = 0x2
+PCI_EXP_FLAGS_TYPE = 0x00f0
+PCI_EXP_TYPE_PCI_BRIDGE = 0x7
+PCI_EXP_DEVCAP = 0x4
+PCI_EXP_DEVCAP_FLR = (0x1 << 28)
+PCI_EXP_DEVCTL = 0x8
+PCI_EXP_DEVCTL_FLR = (0x1 << 15)
+
+PCI_CAP_ID_AF = 0x13
+PCI_AF_CAPs = 0x3
+PCI_AF_CAPs_TP_FLR = 0x3
+PCI_AF_CTL = 0x4
+PCI_AF_CTL_FLR = 0x1
+
+PCI_BAR_0 = 0x10
+PCI_BAR_5 = 0x24
+PCI_BAR_SPACE = 0x01
PCI_BAR_IO = 0x01
PCI_BAR_IO_MASK = ~0x03
+PCI_BAR_MEM = 0x00
PCI_BAR_MEM_MASK = ~0x0f
PCI_STATUS_CAP_MASK = 0x10
PCI_STATUS_OFFSET = 0x6
@@ -64,6 +102,17 @@ def parse_hex(val):
return val
except ValueError:
return None
+
+def parse_pci_name(pci_name_string):
+ # Format: xxxx:xx:xx:x
+ s = pci_name_string
+ s = s.split(':')
+ dom = parse_hex(s[0])
+ bus = parse_hex(s[1])
+ s = s[2].split('.')
+ dev = parse_hex(s[0])
+ func = parse_hex(s[1])
+ return (dom, bus, dev, func)
def find_sysfs_mnt():
global sysfs_mnt_point
@@ -134,13 +183,40 @@ def create_lspci_info():
if device_name is not None:
lspci_info[device_name] = device_info
+def save_pci_conf_space(devs_string):
+ pci_list = []
+ cfg_list = []
+ sysfs_mnt = find_sysfs_mnt()
+ for pci_str in devs_string:
+ pci_path = sysfs_mnt + SYSFS_PCI_DEVS_PATH + '/' + pci_str + \
+ SYSFS_PCI_DEV_CONFIG_PATH
+ fd = os.open(pci_path, os.O_RDONLY)
+ configs = []
+ for i in range(0, 256, 4):
+ configs = configs + [os.read(fd,4)]
+ os.close(fd)
+ pci_list = pci_list + [pci_path]
+ cfg_list = cfg_list + [configs]
+ return (pci_list, cfg_list)
+
+def restore_pci_conf_space(pci_cfg_list):
+ pci_list = pci_cfg_list[0]
+ cfg_list = pci_cfg_list[1]
+ for i in range(0, len(pci_list)):
+ pci_path = pci_list[i]
+ configs = cfg_list[i]
+ fd = os.open(pci_path, os.O_WRONLY)
+ for dw in configs:
+ os.write(fd, dw)
+ os.close(fd)
+
class PciDeviceNotFoundError(Exception):
def __init__(self,domain,bus,slot,func):
self.domain = domain
self.bus = bus
self.slot = slot
self.func = func
- self.name = "%04x:%02x:%02x.%01x"%(domain, bus, slot, func)
+ self.name = PCI_DEV_FORMAT_STR %(domain, bus, slot, func)
def __str__(self):
return ('PCI Device %s Not Found' % (self.name))
@@ -150,6 +226,20 @@ class PciDeviceParseError(Exception):
self.message = msg
def __str__(self):
return 'Error Parsing PCI Device Info: '+self.message
+
+class PciDeviceAssignmentError(Exception):
+ def __init__(self,msg):
+ self.message = msg
+ def __str__(self):
+ return 'pci: impproper device assignment spcified: ' + \
+ self.message
+
+class PciDeviceFlrError(PciDeviceAssignmentError):
+ def __init__(self,msg):
+ self.message = msg
+ def __str__(self):
+ return 'Can not find a suitable FLR method for the device(s): ' + \
+ self.message
class PciDevice:
def __init__(self, domain, bus, slot, func):
@@ -157,7 +247,9 @@ class PciDevice:
self.bus = bus
self.slot = slot
self.func = func
- self.name = "%04x:%02x:%02x.%01x"%(domain, bus, slot, func)
+ self.name = PCI_DEV_FORMAT_STR % (domain, bus, slot, func)
+ self.cfg_space_path = find_sysfs_mnt()+SYSFS_PCI_DEVS_PATH+'/'+ \
+ self.name + SYSFS_PCI_DEV_CONFIG_PATH
self.irq = 0
self.iomem = []
self.ioports = []
@@ -175,8 +267,316 @@ class PciDevice:
self.classname = ""
self.subvendorname = ""
self.subdevicename = ""
+ self.dev_type = None
+ self.has_non_page_aligned_bar = False
+ self.pcie_flr = False
+ self.pci_af_flr = False
+ self.detect_dev_info()
self.get_info_from_sysfs()
self.get_info_from_lspci()
+
+ def find_parent(self):
+ # i.e., /sys/bus/pci/devices/0000:00:19.0 or
+ # /sys/bus/pci/devices/0000:03:04.0
+ path = find_sysfs_mnt()+SYSFS_PCI_DEVS_PATH+'/'+ self.name
+ # i.e., ../../../devices/pci0000:00/0000:00:19.0
+ # ../../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:04.0
+ try:
+ target = os.readlink(path)
+ lst = target.split('/')
+ parent = lst[len(lst)-2]
+ if parent[0:3] == 'pci':
+ parent = parent[3:]
+ lst = parent.split(':')
+ dom = int(lst[0], 16)
+ bus = int(lst[1], 16)
+ dev = 0
+ func = 0
+ else:
+ lst = parent.split(':')
+ dom = int(lst[0], 16)
+ bus = int(lst[1], 16)
+ lst = lst[2]
+ lst = lst.split('.')
+ dev = int(lst[0], 16)
+ func = int(lst[1], 16)
+ return (dom, bus, dev, func)
+ except OSError, (errno, strerr):
+ raise PciDeviceParseError('Can not locate the parent of %s',
+ self.name)
+
+ def find_the_uppermost_pci_bridge(self):
+ # Find the uppermost PCI/PCI-X bridge
+ (dom, b, d, f) = self.find_parent()
+ dev = dev_parent = PciDevice(dom, b, d, f)
+ while dev_parent.dev_type != DEV_TYPE_PCIe_BRIDGE:
+ (dom, b, d, f) = dev_parent.find_parent()
+ dev = dev_parent
+ dev_parent = PciDevice(dom, b, d, f)
+ return dev
+
+ def find_all_devices_behind_the_bridge(self, ignore_bridge):
+ sysfs_mnt = find_sysfs_mnt()
+ self_path = sysfs_mnt + SYSFS_PCI_DEVS_PATH + '/' + self.name
+ pci_names = os.popen('ls ' + self_path).read()
+ dev_list = re.findall(PCI_DEV_REG_EXPRESS_STR, pci_names)
+
+ list = [self.name]
+ for pci_str in dev_list:
+ (dom, b, d, f) = parse_pci_name(pci_str)
+ dev = PciDevice(dom, b, d, f)
+ if dev.dev_type == DEV_TYPE_PCI_BRIDGE or \
+ dev.dev_type == DEV_TYPE_PCIe_BRIDGE:
+ sub_list_including_self = \
+ dev.find_all_devices_behind_the_bridge(ignore_bridge)
+ if ignore_bridge:
+ del sub_list_including_self[0]
+ list = list + [sub_list_including_self]
+ else:
+ list = list + [dev.name]
+ return list
+
+ def find_coassigned_devices(self, ignore_bridge = True):
+ ''' Here'self' is a PCI device, we need find the uppermost PCI/PCI-X
+ bridge, and all devices behind it must be co-assigned to the same
+ guest.
+
+ Parameter:
+ [ignore_bridge]: if set, the returned result doesn't include
+ any bridge behind the uppermost PCI/PCI-X bridge.
+
+ Note: The first element of the return value is the uppermost
+ PCI/PCI-X bridge. If the caller doesn't need the first
+ element, the caller itself can remove it explicitly.
+ '''
+ dev = self.find_the_uppermost_pci_bridge()
+ dev_list = dev.find_all_devices_behind_the_bridge(ignore_bridge)
+ dev_list = re.findall(PCI_DEV_REG_EXPRESS_STR, '%s' % dev_list)
+ return dev_list
+
+ def do_secondary_bus_reset(self, target_bus, devs):
+ # Save the config spaces of all the devices behind the bus.
+ (pci_list, cfg_list) = save_pci_conf_space(devs)
+
+ #Do the Secondary Bus Reset
+ sysfs_mnt = find_sysfs_mnt()
+ parent_path = sysfs_mnt + SYSFS_PCI_DEVS_PATH + '/' + \
+ target_bus + SYSFS_PCI_DEV_CONFIG_PATH
+ fd = os.open(parent_path, os.O_WRONLY)
+ # Assert Secondary Bus Reset
+ os.lseek(fd, PCI_CB_BRIDGE_CONTROL, 0)
+ os.write(fd, struct.pack('I', PCI_BRIDGE_CTL_BUS_RESET))
+ time.sleep(0.200)
+ # De-assert Secondary Bus Reset
+ os.lseek(fd, 0x3e, 0)
+ os.write(fd, struct.pack('I', 0x00))
+ time.sleep(0.200)
+ os.close(fd)
+
+ # Restore the config spaces
+ restore_pci_conf_space((pci_list, cfg_list))
+
+ def find_all_the_multi_functions(self):
+ sysfs_mnt = find_sysfs_mnt()
+ pci_names = os.popen('ls ' + sysfs_mnt + SYSFS_PCI_DEVS_PATH).read()
+ p = self.name
+ p = p[0 : p.rfind('.')] + '.[0-7]'
+ funcs = re.findall(p, pci_names)
+ return funcs
+
+ def find_cap_offset(self, cap):
+ path = find_sysfs_mnt()+SYSFS_PCI_DEVS_PATH+'/'+ \
+ self.name+SYSFS_PCI_DEV_CONFIG_PATH
+
+ pos = PCI_CAPABILITY_LIST
+
+ try:
+ fd = os.open(path, os.O_RDONLY)
+ os.lseek(fd, PCI_STATUS, 0)
+ status = struct.unpack('H', os.read(fd, 2))[0]
+ if (status & 0x10) == 0:
+ # The device doesn't support PCI_STATUS_CAP_LIST
+ return 0
+
+ max_cap = 48
+ while max_cap > 0:
+ os.lseek(fd, pos, 0)
+ pos = ord(os.read(fd, 1))
+ if pos < 0x40:
+ pos = 0
+ break;
+ os.lseek(fd, pos + 0, 0)
+ id = ord(os.read(fd, 1))
+ if id == 0xff:
+ pos = 0
+ break;
+
+ # Found the capability
+ if id == cap:
+ break;
+
+ # Test the next one
+ pos = pos + 1
+ max_cap = max_cap - 1;
+
+ os.close(fd)
+ except OSError, (errno, strerr):
+ raise PciDeviceParseError(('Error when accessing sysfs: %s (%d)' %
+ (strerr, errno)))
+ return pos
+
+ def pci_conf_read8(self, pos):
+ fd = os.open(self.cfg_space_path, os.O_RDONLY)
+ os.lseek(fd, pos, 0)
+ str = os.read(fd, 1)
+ os.close(fd)
+ val = struct.unpack('B', str)[0]
+ return val
+
+ def pci_conf_read16(self, pos):
+ fd = os.open(self.cfg_space_path, os.O_RDONLY)
+ os.lseek(fd, pos, 0)
+ str = os.read(fd, 2)
+ os.close(fd)
+ val = struct.unpack('H', str)[0]
+ return val
+
+ def pci_conf_read32(self, pos):
+ fd = os.open(self.cfg_space_path, os.O_RDONLY)
+ os.lseek(fd, pos, 0)
+ str = os.read(fd, 4)
+ os.close(fd)
+ val = struct.unpack('I', str)[0]
+ return val
+
+ def pci_conf_write8(self, pos, val):
+ str = struct.pack('B', val)
+ fd = os.open(self.cfg_space_path, os.O_WRONLY)
+ os.lseek(fd, pos, 0)
+ os.write(fd, str)
+ os.close(fd)
+
+ def pci_conf_write16(self, pos, val):
+ str = struct.pack('H', val)
+ fd = os.open(self.cfg_space_path, os.O_WRONLY)
+ os.lseek(fd, pos, 0)
+ os.write(fd, str)
+ os.close(fd)
+
+ def pci_conf_write32(self, pos, val):
+ str = struct.pack('I', val)
+ fd = os.open(self.cfg_space_path, os.O_WRONLY)
+ os.lseek(fd, pos, 0)
+ os.write(fd, str)
+ os.close(fd)
+
+ def detect_dev_info(self):
+ class_dev = self.pci_conf_read16(PCI_CLASS_DEVICE)
+ pos = self.find_cap_offset(PCI_CAP_ID_EXP)
+ if class_dev == PCI_CLASS_BRIDGE_PCI:
+ if pos == 0:
+ self.dev_type = DEV_TYPE_PCI_BRIDGE
+ else:
+ creg = self.pci_conf_read16(pos + PCI_EXP_FLAGS)
+ if ((creg & PCI_EXP_TYPE_PCI_BRIDGE) >> 4) == \
+ PCI_EXP_TYPE_PCI_BRIDGE:
+ self.dev_type = DEV_TYPE_PCI_BRIDGE
+ else:
+ self.dev_type = DEV_TYPE_PCIe_BRIDGE
+ else:
+ if pos != 0:
+ self.dev_type = DEV_TYPE_PCIe_ENDPOINT
+ else:
+ self.dev_type = DEV_TYPE_PCI
+
+ # Force 0000:00:00.0 to be DEV_TYPE_PCIe_BRIDGE
+ if self.name == '0000:00:00.0':
+ self.dev_type = DEV_TYPE_PCIe_BRIDGE
+
+ if (self.dev_type == DEV_TYPE_PCI_BRIDGE) or \
+ (self.dev_type == DEV_TYPE_PCIe_BRIDGE):
+ return
+
+ # Try to findthe PCIe FLR capability
+ if self.dev_type == DEV_TYPE_PCIe_ENDPOINT:
+ dev_cap = self.pci_conf_read32(pos + PCI_EXP_DEVCAP)
+ if dev_cap & PCI_EXP_DEVCAP_FLR:
+ self.pcie_flr = True
+ elif self.dev_type == DEV_TYPE_PCI:
+ # Try to find the "PCI Advanced Capabilities"
+ pos = self.find_cap_offset(PCI_CAP_ID_AF)
+ if pos != 0:
+ af_cap = self.pci_conf_read8(pos + PCI_AF_CAPs)
+ if (af_cap & PCI_AF_CAPs_TP_FLR) == PCI_AF_CAPs_TP_FLR:
+ self.pci_af_flr = True
+
+ bar_addr = PCI_BAR_0
+ while bar_addr <= PCI_BAR_5:
+ bar = self.pci_conf_read32(bar_addr)
+ if (bar & PCI_BAR_SPACE) == PCI_BAR_MEM:
+ bar = bar & PCI_BAR_MEM_MASK
+ bar = bar & ~PAGE_MASK
+ if bar != 0:
+ self.has_non_page_aligned_bar = True
+ break
+ bar_addr = bar_addr + 4
+
+ def devs_check_driver(self, devs):
+ if len(devs) == 0:
+ return
+ for pci_dev in devs:
+ (dom, b, d, f) = parse_pci_name(pci_dev)
+ dev = PciDevice(dom, b, d, f)
+ if dev.driver == 'pciback':
+ continue
+ err_msg = 'pci: %s must be co-assigned to the same guest with %s' + \
+ ', but it is not owned by pciback.'
+ raise PciDeviceAssignmentError(err_msg % (pci_dev, self.name))
+
+ def do_FLR(self):
+ """ Perform FLR (Functional Level Reset) for the device.
+ """
+ if self.dev_type == DEV_TYPE_PCIe_ENDPOINT:
+ # If PCIe device supports FLR, we use it.
+ if self.pcie_flr:
+ (pci_list, cfg_list) = save_pci_conf_space([self.name])
+ pos = self.find_cap_offset(PCI_CAP_ID_EXP)
+ self.pci_conf_write32(pos + PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_FLR)
+ # We must sleep at least 100ms for the completion of FLR
+ time.sleep(0.200)
+ restore_pci_conf_space((pci_list, cfg_list))
+ else:
+ funcs = self.find_all_the_multi_functions()
+ self.devs_check_driver(funcs)
+
+ parent = '%04x:%02x:%02x.%01x' % self.find_parent()
+
+ # Do Secondary Bus Reset.
+ self.do_secondary_bus_reset(parent, funcs)
+ # PCI devices
+ else:
+ # For PCI device on host bus, we test "PCI Advanced Capabilities".
+ if self.bus == 0 and self.pci_af_flr:
+ (pci_list, cfg_list) = save_pci_conf_space([self.name])
+ # We use Advanced Capability to do FLR.
+ pos = self.find_cap_offset(PCI_CAP_ID_AF)
+ self.pci_conf_write8(pos + PCI_AF_CTL, PCI_AF_CTL_FLR)
+ time.sleep(0.200)
+ restore_pci_conf_space((pci_list, cfg_list))
+ else:
+ if self.bus == 0:
+ err_msg = 'pci: %s is not assignable: it is on bus 0, '+ \
+ 'but it has no PCI Advanced Capabilities.'
+ raise PciDeviceFlrError(err_msg % self.name)
+ else:
+ devs = self.find_coassigned_devices(False)
+ # Remove the element 0 which is a bridge
+ target_bus = devs[0]
+ del devs[0]
+ self.devs_check_driver(devs)
+
+ # Do Secondary Bus Reset.
+ self.do_secondary_bus_reset(target_bus, devs)
def find_capability(self, type):
sysfs_mnt = find_sysfs_mnt()
@@ -364,7 +764,7 @@ class PciDevice:
def main():
if len(sys.argv)<5:
- print "Usage: %s <domain> <bus> <slot> <func>\n"
+ print "Usage: %s <domain> <bus> <slot> <func>\n" % sys.argv[0]
sys.exit(2)
dev = PciDevice(int(sys.argv[1],16), int(sys.argv[2],16),
diff -r ba31588eaac1 -r 9287873bb56c tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py Sat Jul 12 18:30:14 2008 +0800
+++ b/tools/python/xen/xend/XendDomainInfo.py Sat Jul 12 18:31:00 2008 +0800
@@ -701,7 +701,7 @@ class XendDomainInfo:
vslt = x['vslt']
break
if vslt == '0x0':
- raise VmError("Device %04x:%02x:%02x.%02x is not connected"
+ raise VmError("Device %04x:%02x:%02x.%01x is not connected"
% (int(dev['domain'],16), int(dev['bus'],16),
int(dev['slot'],16), int(dev['func'],16)))
self.hvm_destroyPCIDevice(int(vslt, 16))
diff -r ba31588eaac1 -r 9287873bb56c tools/python/xen/xend/server/DevController.py
--- a/tools/python/xen/xend/server/DevController.py Sat Jul 12 18:30:14 2008 +0800
+++ b/tools/python/xen/xend/server/DevController.py Sat Jul 12 18:31:00 2008 +0800
@@ -223,6 +223,12 @@ class DevController:
raise VmError('%s devices may not be reconfigured' % self.deviceClass)
+ def cleanupDeviceOnDomainDestroy(self, devid):
+ """ Some devices may need special cleanup when the guest domain
+ is destroyed.
+ """
+ return
+
def destroyDevice(self, devid, force):
"""Destroy the specified device.
@@ -238,6 +244,8 @@ class DevController:
"""
dev = self.convertToDeviceNumber(devid)
+
+ self.cleanupDeviceOnDomainDestroy(dev)
# Modify online status /before/ updating state (latter is watched by
# drivers, so this ordering avoids a race).
diff -r ba31588eaac1 -r 9287873bb56c tools/python/xen/xend/server/pciif.py
--- a/tools/python/xen/xend/server/pciif.py Sat Jul 12 18:30:14 2008 +0800
+++ b/tools/python/xen/xend/server/pciif.py Sat Jul 12 18:31:00 2008 +0800
@@ -28,7 +28,7 @@ from xen.xend.server.DevController impor
import xen.lowlevel.xc
-from xen.util.pci import PciDevice
+from xen.util.pci import *
import resource
import re
@@ -74,7 +74,7 @@ class PciController(DevController):
if vslt is not None:
vslots = vslots + vslt + ";"
- back['dev-%i' % pcidevid] = "%04x:%02x:%02x.%02x" % \
+ back['dev-%i' % pcidevid] = "%04x:%02x:%02x.%01x" % \
(domain, bus, slot, func)
back['uuid-%i' % pcidevid] = pci_config.get('uuid', '')
pcidevid += 1
@@ -284,7 +284,12 @@ class PciController(DevController):
"bind your slot/device to the PCI backend using sysfs" \
)%(dev.name))
+ if dev.has_non_page_aligned_bar:
+ raise VmError("pci: %: non-page-aligned MMIO BAR found." % dev.name)
+
self.CheckSiblingDevices(fe_domid, dev)
+
+ dev.do_FLR()
PCIQuirk(dev.vendor, dev.device, dev.subvendor, dev.subdevice, domain,
bus, slot, func)
@@ -352,11 +357,48 @@ class PciController(DevController):
def setupDevice(self, config):
"""Setup devices from config
"""
+ pci_str_list = []
+ pci_dev_list = []
for pci_config in config.get('devs', []):
domain = parse_hex(pci_config.get('domain', 0))
bus = parse_hex(pci_config.get('bus', 0))
slot = parse_hex(pci_config.get('slot', 0))
func = parse_hex(pci_config.get('func', 0))
+ pci_str = '%04x:%02x:%02x.%01x' % (domain, bus, slot, func)
+ pci_str_list = pci_str_list + [pci_str]
+ pci_dev_list = pci_dev_list + [(domain, bus, slot, func)]
+
+ for (domain, bus, slot, func) in pci_dev_list:
+ try:
+ dev = PciDevice(domain, bus, slot, func)
+ except Exception, e:
+ raise VmError("pci: failed to locate device and "+
+ "parse it's resources - "+str(e))
+ if (dev.dev_type == DEV_TYPE_PCIe_ENDPOINT) and not dev.pcie_flr:
+ funcs = dev.find_all_the_multi_functions()
+ for f in funcs:
+ if not f in pci_str_list:
+ err_msg = 'pci: % must be co-assigned to guest with %s'
+ raise VmError(err_msg % (f, dev.name))
+ elif dev.dev_type == DEV_TYPE_PCI:
+ if dev.bus == 0:
+ if not dev.pci_af_flr:
+ err_msg = 'pci: %s is not assignable: it is on ' + \
+ 'bus 0, but lacks of FLR capability'
+ raise VmError(err_msg % dev.name)
+ else:
+ # All devices behind the uppermost PCI/PCI-X bridge must be\
+ # co-assigned to the same guest.
+ devs_str = dev.find_coassigned_devices(True)
+ # Remove the element 0 which is a bridge
+ del devs_str[0]
+
+ for s in devs_str:
+ if not s in pci_str_list:
+ err_msg = 'pci: %s must be co-assigned to guest with %s'
+ raise VmError(err_msg % (s, dev.name))
+
+ for (domain, bus, slot, func) in pci_dev_list:
self.setupOneDevice(domain, bus, slot, func)
return
@@ -419,6 +461,7 @@ class PciController(DevController):
if rc<0:
raise VmError(('pci: failed to configure irq on device '+
'%s - errno=%d')%(dev.name,rc))
+ dev.do_FLR()
def cleanupDevice(self, devid):
""" Detach I/O resources for device and cleanup xenstore nodes
@@ -471,6 +514,22 @@ class PciController(DevController):
return new_num_devs
+ def cleanupDeviceOnDomainDestroy(self, devid):
+ num_devs = int(self.readBackend(devid, 'num_devs'))
+ dev_str_list = []
+ for i in range(num_devs):
+ dev_str = self.readBackend(devid, 'dev-%i' % i)
+ dev_str_list = dev_str_list + [dev_str]
+
+ for dev_str in dev_str_list:
+ (dom, b, d, f) = parse_pci_name(dev_str)
+ try:
+ dev = PciDevice(dom, b, d, f)
+ except Exception, e:
+ raise VmError("pci: failed to locate device and "+
+ "parse it's resources - "+str(e))
+ dev.do_FLR()
+
def waitForBackend(self,devid):
return (0, "ok - no hotplug")
[-- Attachment #5: list_assignable_devices.patch --]
[-- Type: application/octet-stream, Size: 6940 bytes --]
Add a command "xm pci-list-assignable-devices" to list all the assignable
devices.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
diff -r 9287873bb56c tools/python/xen/util/pci.py
--- a/tools/python/xen/util/pci.py Sat Jul 12 18:31:00 2008 +0800
+++ b/tools/python/xen/util/pci.py Sat Jul 12 18:31:37 2008 +0800
@@ -27,6 +27,7 @@ SYSFS_PCI_DEV_SUBVENDOR_PATH = '/subsyst
SYSFS_PCI_DEV_SUBVENDOR_PATH = '/subsystem_vendor'
SYSFS_PCI_DEV_SUBDEVICE_PATH = '/subsystem_device'
SYSFS_PCI_DEV_CLASS_PATH = '/class'
+SYSFS_PCIBACK_PATH = '/bus/pci/drivers/pciback/'
LSPCI_CMD = 'lspci'
@@ -209,6 +210,109 @@ def restore_pci_conf_space(pci_cfg_list)
for dw in configs:
os.write(fd, dw)
os.close(fd)
+
+def find_all_devices_owned_by_pciback():
+ sysfs_mnt = find_sysfs_mnt()
+ pciback_path = sysfs_mnt + SYSFS_PCIBACK_PATH
+ pci_names = os.popen('ls ' + pciback_path).read()
+ pci_list = re.findall(PCI_DEV_REG_EXPRESS_STR, pci_names)
+ dev_list = []
+ for pci in pci_list:
+ (dom, b, d, f) = parse_pci_name(pci)
+ dev = PciDevice(dom, b, d, f)
+ dev_list = dev_list + [dev]
+ return dev_list
+
+def transform_list(target, src):
+ ''' src: its element is pci string (Format: xxxx:xx:xx:x).
+ target: its element is pci string, or a list of pci string.
+
+ If all the elements in src are in target, we remove them from target
+ and add src into target; otherwise, we remove from target all the
+ elements that also appear in src.
+ '''
+ result = []
+ target_contains_src = True
+ for e in src:
+ if not e in target:
+ target_contains_src = False
+ break
+
+ if target_contains_src:
+ result = result + [src]
+ for e in target:
+ if not e in src:
+ result = result + [e]
+ return result
+
+def check_FLR_capability(dev_list):
+ i = len(dev_list)
+ if i == 0:
+ return []
+ i = i - 1;
+ while i >= 0:
+ dev = dev_list[i]
+ if dev.bus == 0:
+ if dev.dev_type == DEV_TYPE_PCIe_ENDPOINT and not dev.pcie_flr:
+ del dev_list[i]
+ elif dev.dev_type == DEV_TYPE_PCI and not dev.pci_af_flr:
+ del dev_list[i]
+ i = i - 1
+
+ pci_list = []
+ pci_dev_dict = {}
+ for dev in dev_list:
+ pci_list = pci_list + [dev.name]
+ pci_dev_dict[dev.name] = dev
+
+ while True:
+ need_transform = False
+ for pci in pci_list:
+ if isinstance(pci, types.StringTypes):
+ dev = pci_dev_dict[pci]
+ if dev.dev_type == DEV_TYPE_PCIe_ENDPOINT and not dev.pcie_flr:
+ coassigned_pci_list = dev.find_all_the_multi_functions()
+ need_transform = True
+ elif dev.dev_type == DEV_TYPE_PCI and not dev.pci_af_flr:
+ coassigned_pci_list = dev.find_coassigned_devices(True)
+ del coassigned_pci_list[0]
+ need_transform = True
+
+ if need_transform:
+ pci_list = transform_list(pci_list, coassigned_pci_list)
+ if not need_transform:
+ break
+
+ if len(pci_list) == 0:
+ return []
+
+ for i in range(0, len(pci_list)):
+ if isinstance(pci_list[i], types.StringTypes):
+ pci_list[i] = [pci_list[i]]
+
+ # Now every element in pci_list is a list of pci string.
+
+ result = []
+ for pci_names in pci_list:
+ devs = []
+ for pci in pci_names:
+ devs = devs + [pci_dev_dict[pci]]
+ result = result + [devs]
+ return result
+
+def check_mmio_bar(devs_list):
+ result = []
+
+ for dev_list in devs_list:
+ non_aligned_bar_found = False
+ for dev in dev_list:
+ if dev.has_non_page_aligned_bar:
+ non_aligned_bar_found = True
+ break
+ if not non_aligned_bar_found:
+ result = result + [dev_list]
+
+ return result
class PciDeviceNotFoundError(Exception):
def __init__(self,domain,bus,slot,func):
diff -r 9287873bb56c tools/python/xen/xm/main.py
--- a/tools/python/xen/xm/main.py Sat Jul 12 18:31:00 2008 +0800
+++ b/tools/python/xen/xm/main.py Sat Jul 12 18:31:37 2008 +0800
@@ -38,6 +38,7 @@ import xml.dom.minidom
import xml.dom.minidom
from xen.util.blkif import blkdev_name_to_number
from xen.util import vscsi_util
+from xen.util.pci import *
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
@@ -55,6 +56,9 @@ from xen.util.acmpolicy import ACM_LABEL
from xen.util.acmpolicy import ACM_LABEL_UNLABELED_DISPLAY
import XenAPI
+
+import xen.lowlevel.xc
+xc = xen.lowlevel.xc.xc()
import inspect
from xen.xend import XendOptions
@@ -183,6 +187,7 @@ SUBCOMMAND_HELP = {
'Remove a domain\'s pass-through pci device.'),
'pci-list' : ('<Domain>',
'List pass-through pci devices for a domain.'),
+ 'pci-list-assignable-devices' : ('', 'List all the assignable pci devices'),
'scsi-attach' : ('<Domain> <PhysDevice> <VirtDevice> [BackDomain]',
'Attach a new SCSI device.'),
'scsi-detach' : ('<Domain> <VirtDevice>',
@@ -355,6 +360,7 @@ device_commands = [
"pci-attach",
"pci-detach",
"pci-list",
+ "pci-list-assignable-devices",
"scsi-attach",
"scsi-detach",
"scsi-list",
@@ -2115,6 +2121,35 @@ def xm_pci_list(args):
print (hdr_str)
hdr = 1
print ( fmt_str % x )
+
+def xm_pci_list_assignable_devices(args):
+ # Each element of dev_list is a PciDevice
+ dev_list = find_all_devices_owned_by_pciback()
+
+ # Each element of devs_list is a list of PciDevice
+ devs_list = check_FLR_capability(dev_list)
+
+ devs_list = check_mmio_bar(devs_list)
+
+ # Check if the devices have been assigned to guests.
+ final_devs_list = []
+ for dev_list in devs_list:
+ available = True
+ for d in dev_list:
+ pci_str = '0x%x,0x%x,0x%x,0x%x' %(d.domain, d.bus, d.slot, d.func)
+ # Xen doesn't care what the domid is, so we pass 0 here...
+ domid = 0
+ bdf = xc.test_assign_device(domid, pci_str)
+ if bdf != 0:
+ available = False
+ break
+ if available:
+ final_devs_list = final_devs_list + [dev_list]
+
+ for dev_list in final_devs_list:
+ for d in dev_list:
+ print d.name,
+ print
def vscsi_convert_sxp_to_dict(dev_sxp):
dev_dict = {}
@@ -2645,6 +2680,7 @@ commands = {
"pci-attach": xm_pci_attach,
"pci-detach": xm_pci_detach,
"pci-list": xm_pci_list,
+ "pci-list-assignable-devices": xm_pci_list_assignable_devices,
# vscsi
"scsi-attach": xm_scsi_attach,
"scsi-detach": xm_scsi_detach,
[-- Attachment #6: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] Improve the current FLR logic
2008-07-12 12:37 [PATCH] Improve the current FLR logic Cui, Dexuan
@ 2008-07-14 17:52 ` Espen Skoglund
2008-07-17 10:21 ` Cui, Dexuan
2008-07-15 4:05 ` Yosuke Iwamatsu
2008-07-17 9:33 ` Yuji Shimada
2 siblings, 1 reply; 27+ messages in thread
From: Espen Skoglund @ 2008-07-14 17:52 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
Maybe I've got this wrong, but it looks like you're doing the FLR
after the device has been deassigned (i.e., given back to dom0).
Shouldn't you do the FLR before you actually deassign the device
instead? If the device is currently set up for doing DMA transactions
to host memory, the pending transactions will end up in dom0's memory
space. This is fine if GMFN == MFN since the guest memory has still
not been released, but for GMFN != MFN you'll end up corrupting
arbitrary dom0 memory.
The right procedure for deassigning devices should be:
- Revoke device resources.
- FLR.
- Reassign device to dom0.
Likewise, for assigning devices we should do:
- FLR.
- Assign device to guest.
- Grant device resources to guest.
Another option is to completely deassign the devices from the IOMMU
rather than reassigning them to dom0. Devices bound to pciback need
not be assigned to dom0's IOMMU tables anyway.
Further, your patch probably breaks things when running old dom0/xend
on a new hypervisor since you'll end up not doing any FLR at all. I
recently experienced the same thing with some of my own PCI cleanup
patches.
eSk
[Dexuan Cui]
> Hi, all,
> The attached patches try to improve the current FLR logic.
> The idea is: removing the FLR logic from hypervisor and adding the
> improved logic in Control Panel.
> The current FLR logic in hypervisor has some issues: 1) Dstate
> transition is not guaranteed to properly clear the device state; 2) the
> current code for PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND
> doesn't mean the completion of FLR; according to the PCIe spec, after
> issuing FLR, we should wait at least 100ms.
> To make it easier to improve the FLR logic, and to make the hypervisor
> thin, I think we might as well move the logic to Control Panel for the
> time being. In the long run, the essential logic may be implemented in
> the pciback driver of Dom0 instead.
> [...]
> I made some tests on my hosts. Looks the patches work well.
> I'd like to ask for your comments, and test feedbacks. Thank you
> very much!
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-14 17:52 ` Espen Skoglund
@ 2008-07-17 10:21 ` Cui, Dexuan
2008-07-17 12:50 ` Espen Skoglund
0 siblings, 1 reply; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-17 10:21 UTC (permalink / raw)
To: Espen Skoglund; +Cc: xen-devel, Keir Fraser, Jiang, Yunhong
Yes, when a domain is destroyed, the place where I do FLR is late. Thanks for pointing out this.
Looks the situation here is a little tricky. I'm trying to find a better place.
When a device is not assigned to other domain, since Dom0 might dynamically rebind the device to another driver rather than pciback, "completely deassign the devices from the IOMMU" may not be a good idea?
For the old xend running on a new hypervisor, FLR is actually skipped.
I'm not sure if this is unacceptable. I'd like to hear more comments.
Thanks,
-- Dexuan
-----Original Message-----
From: Espen Skoglund [mailto:espen.skoglund@netronome.com]
Sent: 2008年7月15日 1:52
To: Cui, Dexuan
Cc: Keir Fraser; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH] Improve the current FLR logic
Maybe I've got this wrong, but it looks like you're doing the FLR
after the device has been deassigned (i.e., given back to dom0).
Shouldn't you do the FLR before you actually deassign the device
instead? If the device is currently set up for doing DMA transactions
to host memory, the pending transactions will end up in dom0's memory
space. This is fine if GMFN == MFN since the guest memory has still
not been released, but for GMFN != MFN you'll end up corrupting
arbitrary dom0 memory.
The right procedure for deassigning devices should be:
- Revoke device resources.
- FLR.
- Reassign device to dom0.
Likewise, for assigning devices we should do:
- FLR.
- Assign device to guest.
- Grant device resources to guest.
Another option is to completely deassign the devices from the IOMMU
rather than reassigning them to dom0. Devices bound to pciback need
not be assigned to dom0's IOMMU tables anyway.
Further, your patch probably breaks things when running old dom0/xend
on a new hypervisor since you'll end up not doing any FLR at all. I
recently experienced the same thing with some of my own PCI cleanup
patches.
eSk
[Dexuan Cui]
> Hi, all,
> The attached patches try to improve the current FLR logic.
> The idea is: removing the FLR logic from hypervisor and adding the
> improved logic in Control Panel.
> The current FLR logic in hypervisor has some issues: 1) Dstate
> transition is not guaranteed to properly clear the device state; 2) the
> current code for PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND
> doesn't mean the completion of FLR; according to the PCIe spec, after
> issuing FLR, we should wait at least 100ms.
> To make it easier to improve the FLR logic, and to make the hypervisor
> thin, I think we might as well move the logic to Control Panel for the
> time being. In the long run, the essential logic may be implemented in
> the pciback driver of Dom0 instead.
> [...]
> I made some tests on my hosts. Looks the patches work well.
> I'd like to ask for your comments, and test feedbacks. Thank you
> very much!
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-17 10:21 ` Cui, Dexuan
@ 2008-07-17 12:50 ` Espen Skoglund
2008-07-18 5:31 ` Jiang, Yunhong
0 siblings, 1 reply; 27+ messages in thread
From: Espen Skoglund @ 2008-07-17 12:50 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Espen Skoglund, Keir Fraser, Jiang, Yunhong
What I was thinking of with deassigning it completely was that pciback
performs a deassign when it is binds to a device, and an assign to
dom0 when it unbinds a device. The latter part is not strictly
necessary though, since Linux will also unregister the device from Xen
when pciback performs unbind. This implies a deassign and a
subsequent register and assign to dom0 once the device is bound to
another driver.
Doing a complete deassing instead of reassign to dom0 would ensure
that the whole FLR/deassign order problem is circumvented.
eSk
-----Original Message-----
From: "Cui, Dexuan" <dexuan.cui@intel.com>
Subject: RE: [Xen-devel] [PATCH] Improve the current FLR logic
Date: Thu, 17 Jul 2008 18:21:36 +0800
Yes, when a domain is destroyed, the place where I do FLR is late.
Thanks for pointing out this.
Looks the situation here is a little tricky. I'm trying to find a better
place.
When a device is not assigned to other domain, since Dom0 might
dynamically rebind the device to another driver rather than pciback,
"completely deassign the devices from the IOMMU" may not be a good idea?
For the old xend running on a new hypervisor, FLR is actually skipped.
I'm not sure if this is unacceptable. I'd like to hear more comments.
Thanks,
-- Dexuan
-----Original Message-----
From: Espen Skoglund [mailto:espen.skoglund@netronome.com]=20
Sent: 2008=C4=EA7=D4=C215=C8=D5 1:52
To: Cui, Dexuan
Cc: Keir Fraser; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH] Improve the current FLR logic
Maybe I've got this wrong, but it looks like you're doing the FLR
after the device has been deassigned (i.e., given back to dom0).
Shouldn't you do the FLR before you actually deassign the device
instead? If the device is currently set up for doing DMA transactions
to host memory, the pending transactions will end up in dom0's memory
space. This is fine if GMFN =3D=3D MFN since the guest memory has still
not been released, but for GMFN !=3D MFN you'll end up corrupting
arbitrary dom0 memory.
The right procedure for deassigning devices should be:
- Revoke device resources.
- FLR.
- Reassign device to dom0.
Likewise, for assigning devices we should do:
- FLR.
- Assign device to guest.
- Grant device resources to guest.
Another option is to completely deassign the devices from the IOMMU
rather than reassigning them to dom0. Devices bound to pciback need
not be assigned to dom0's IOMMU tables anyway.
Further, your patch probably breaks things when running old dom0/xend
on a new hypervisor since you'll end up not doing any FLR at all. I
recently experienced the same thing with some of my own PCI cleanup
patches.
eSk
[Dexuan Cui]
> Hi, all,
> The attached patches try to improve the current FLR logic.
> The idea is: removing the FLR logic from hypervisor and adding the
> improved logic in Control Panel.
> The current FLR logic in hypervisor has some issues: 1) Dstate
> transition is not guaranteed to properly clear the device state; 2) =
the
> current code for PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND
> doesn't mean the completion of FLR; according to the PCIe spec, after
> issuing FLR, we should wait at least 100ms.
> To make it easier to improve the FLR logic, and to make the hypervisor
> thin, I think we might as well move the logic to Control Panel for the
> time being. In the long run, the essential logic may be implemented in
> the pciback driver of Dom0 instead.
> [...]
> I made some tests on my hosts. Looks the patches work well.
> I'd like to ask for your comments, and test feedbacks. Thank you
> very much!
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-17 12:50 ` Espen Skoglund
@ 2008-07-18 5:31 ` Jiang, Yunhong
0 siblings, 0 replies; 27+ messages in thread
From: Jiang, Yunhong @ 2008-07-18 5:31 UTC (permalink / raw)
To: Espen Skoglund, Cui, Dexuan; +Cc: xen-devel, Keir Fraser
Will this method impact PV domU when iommu_pv is not specified?
In fact, I have a look on the tools\python\xen\xend\server\pciif.py, I
noticed it is much bigger than other file (like tpmif.py or vscsiif.py)
in the same directory. and some pci logic in XendDomainInfo.py also. So
are there any rules when decide a function should be placed in control
panel or in backend?
As for this order issue, I think it is ok for HVM domain, since
domain_destroy should happen after control panel do the cleanup work (We
may need do that before qemu is destroed). But I'm not sure if PV domain
will have such opporunity.
Thanks
Yunhong Jiang
Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
> What I was thinking of with deassigning it completely was that pciback
> performs a deassign when it is binds to a device, and an assign to
> dom0 when it unbinds a device. The latter part is not strictly
> necessary though, since Linux will also unregister the device from Xen
> when pciback performs unbind. This implies a deassign and a
> subsequent register and assign to dom0 once the device is bound to
another
> driver.
>
> Doing a complete deassing instead of reassign to dom0 would ensure
> that the whole FLR/deassign order problem is circumvented.
>
> eSk
>
>
>
> -----Original Message-----
> From: "Cui, Dexuan" <dexuan.cui@intel.com>
> Subject: RE: [Xen-devel] [PATCH] Improve the current FLR logic
> Date: Thu, 17 Jul 2008 18:21:36 +0800
>
> Yes, when a domain is destroyed, the place where I do FLR is late.
Thanks
> for pointing out this. Looks the situation here is a little tricky.
I'm
> trying to
> find a better
> place.
>
> When a device is not assigned to other domain, since Dom0 might
> dynamically rebind the device to another driver rather than pciback,
> "completely deassign the devices from the IOMMU" may not be a
> good idea?
>
> For the old xend running on a new hypervisor, FLR is actually skipped.
> I'm not sure if this is unacceptable. I'd like to hear more comments.
>
> Thanks,
> -- Dexuan
>
>
> -----Original Message-----
> From: Espen Skoglund [mailto:espen.skoglund@netronome.com]=20
> Sent: 2008=C4=EA7=D4=C215=C8=D5 1:52
> To: Cui, Dexuan
> Cc: Keir Fraser; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH] Improve the current FLR logic
>
> Maybe I've got this wrong, but it looks like you're doing the FLR
> after the device has been deassigned (i.e., given back to dom0).
> Shouldn't you do the FLR before you actually deassign the device
> instead? If the device is currently set up for doing DMA transactions
> to host memory, the pending transactions will end up in dom0's memory
> space. This is fine if GMFN =3D=3D MFN since the guest memory
> has still
> not been released, but for GMFN !=3D MFN you'll end up corrupting
arbitrary
> dom0 memory.
>
> The right procedure for deassigning devices should be:
>
> - Revoke device resources.
> - FLR.
> - Reassign device to dom0.
>
> Likewise, for assigning devices we should do:
>
> - FLR.
> - Assign device to guest.
> - Grant device resources to guest.
>
> Another option is to completely deassign the devices from the IOMMU
> rather than reassigning them to dom0. Devices bound to pciback need
> not be assigned to dom0's IOMMU tables anyway.
>
> Further, your patch probably breaks things when running old dom0/xend
> on a new hypervisor since you'll end up not doing any FLR at all. I
> recently experienced the same thing with some of my own PCI cleanup
patches.
>
> eSk
>
>
> [Dexuan Cui]
>> Hi, all,
>> The attached patches try to improve the current FLR logic.
>> The idea is: removing the FLR logic from hypervisor and adding the
>> improved logic in Control Panel.
>
>> The current FLR logic in hypervisor has some issues: 1) Dstate
>> transition is not guaranteed to properly clear the device state; 2) =
the
>> current code for PCIe FLR is actually buggy: PCI_EXP_DEVSTA_TRPND
>> doesn't mean the completion of FLR; according to the PCIe spec, after
>> issuing FLR, we should wait at least 100ms.
>
>> To make it easier to improve the FLR logic, and to make the
hypervisor
>> thin, I think we might as well move the logic to Control Panel for
the
>> time being. In the long run, the essential logic may be implemented
in
>> the pciback driver of Dom0 instead.
>
>> [...]
>
>> I made some tests on my hosts. Looks the patches work well.
>
>> I'd like to ask for your comments, and test feedbacks. Thank you
>> very much!
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-12 12:37 [PATCH] Improve the current FLR logic Cui, Dexuan
2008-07-14 17:52 ` Espen Skoglund
@ 2008-07-15 4:05 ` Yosuke Iwamatsu
2008-07-17 10:21 ` Cui, Dexuan
2008-07-17 9:33 ` Yuji Shimada
2 siblings, 1 reply; 27+ messages in thread
From: Yosuke Iwamatsu @ 2008-07-15 4:05 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
Hi,
Cui, Dexuan wrote:
> [PATCH3] do_FLR_in_control_panel.patch: the improved FLR logic in
> Control Panel: 1) If the device is PCIe endpoint and supports PCIe FLR,
> we use that; 2) Else, if the device is PCIe endpoint, and all functions
> on the device are assigned to the same guest, we use the immediate
> parent bus's Secondary Bus Reset to reset all functions of the device
> (here, actually we require all the functions of the device be assigned
> to the same guest); 3) Else, if the device is PCI endpoint and is on a
> host bus (e.g. integrated devices) and if the device supports PCI
> Advanced Capabilities, we use that for FLR; 4) Else, we use the
> Secondary Bus Reset (if the PCI device is behind a PCI/PCI-X bridge,
> then all devices behind the uppermost such PCI/PCI-X bridge above this
> device must be co-assigned)
In case 2) or 4), I think multfunction devices or sibling devices need
not be actually assigned to the same guest, if they are just owned by
pciback and not used by other guests.
> [PATCH4] list_assignable_devices.patch: a command "xm
> pci-list-assignable-devices" which can list all the assignable devices
> in the system. It is useful for end users.
It looks like xm_pci_list_assignable_devices() directly accesses to
sysfs and invokes hypercall without going through xend. Since xm
commands can be executed by a remote server through xmlrpc, most of
this function should be implemented in xend.
Also please note that test_assign_device() excludes the case of
non-iommu device assignment for pv.
Thanks,
-- Yosuke Iwamatsu
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-15 4:05 ` Yosuke Iwamatsu
@ 2008-07-17 10:21 ` Cui, Dexuan
2008-07-17 11:06 ` Yosuke Iwamatsu
0 siblings, 1 reply; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-17 10:21 UTC (permalink / raw)
To: Yosuke Iwamatsu; +Cc: xen-devel, Keir Fraser
Thanks for your comments!
For case 2) or 4), I think we might as well force the domain config file specifying all the related devices.
xm_pci_list_assignable_devices() does access to sysfs and invoke hypercall directly.
With respect to xmlrpc, I'm trying to move most of the function to xend.
I think xm_pci_list_assignable_devices only considers iommu devices. Maybe we can extend it to non-iommu devices in future.
Thanks,
-- Dexuan
-----Original Message-----
From: Yosuke Iwamatsu [mailto:y-iwamatsu@ab.jp.nec.com]
Sent: 2008年7月15日 12:06
To: Cui, Dexuan
Cc: Keir Fraser; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH] Improve the current FLR logic
Hi,
Cui, Dexuan wrote:
> [PATCH3] do_FLR_in_control_panel.patch: the improved FLR logic in
> Control Panel: 1) If the device is PCIe endpoint and supports PCIe FLR,
> we use that; 2) Else, if the device is PCIe endpoint, and all functions
> on the device are assigned to the same guest, we use the immediate
> parent bus's Secondary Bus Reset to reset all functions of the device
> (here, actually we require all the functions of the device be assigned
> to the same guest); 3) Else, if the device is PCI endpoint and is on a
> host bus (e.g. integrated devices) and if the device supports PCI
> Advanced Capabilities, we use that for FLR; 4) Else, we use the
> Secondary Bus Reset (if the PCI device is behind a PCI/PCI-X bridge,
> then all devices behind the uppermost such PCI/PCI-X bridge above this
> device must be co-assigned)
In case 2) or 4), I think multfunction devices or sibling devices need
not be actually assigned to the same guest, if they are just owned by
pciback and not used by other guests.
> [PATCH4] list_assignable_devices.patch: a command "xm
> pci-list-assignable-devices" which can list all the assignable devices
> in the system. It is useful for end users.
It looks like xm_pci_list_assignable_devices() directly accesses to
sysfs and invokes hypercall without going through xend. Since xm
commands can be executed by a remote server through xmlrpc, most of
this function should be implemented in xend.
Also please note that test_assign_device() excludes the case of
non-iommu device assignment for pv.
Thanks,
-- Yosuke Iwamatsu
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-17 10:21 ` Cui, Dexuan
@ 2008-07-17 11:06 ` Yosuke Iwamatsu
2008-07-18 2:13 ` Cui, Dexuan
0 siblings, 1 reply; 27+ messages in thread
From: Yosuke Iwamatsu @ 2008-07-17 11:06 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
Cui, Dexuan wrote:
> Thanks for your comments!
>
> For case 2) or 4), I think we might as well force the domain config file specifying all the related devices.
Specifying all the related devices means allowing the domain to use them
all, but there may be a case that people want to limit devices available
to guests.
(Suppose there is a device that has several functions without FLR and
the administrator wants to let the guest domain use only one function,
for example.)
>
> xm_pci_list_assignable_devices() does access to sysfs and invoke hypercall directly.
> With respect to xmlrpc, I'm trying to move most of the function to xend.
>
> I think xm_pci_list_assignable_devices only considers iommu devices. Maybe we can extend it to non-iommu devices in future.
>
Thank you,
-- Yosuke
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-17 11:06 ` Yosuke Iwamatsu
@ 2008-07-18 2:13 ` Cui, Dexuan
0 siblings, 0 replies; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-18 2:13 UTC (permalink / raw)
To: Yosuke Iwamatsu; +Cc: xen-devel, Keir Fraser
Yosuke Iwamatsu wrote:
> Cui, Dexuan wrote:
>> Thanks for your comments!
>>
>> For case 2) or 4), I think we might as well force the domain config
>> file specifying all the related devices.
>
> Specifying all the related devices means allowing the domain to use
> them all, but there may be a case that people want to limit devices
> available to guests.
> (Suppose there is a device that has several functions without FLR and
> the administrator wants to let the guest domain use only one function,
> for example.)
Oh, I see. I should relax the checking here.
Thank you, Yosuke!
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-12 12:37 [PATCH] Improve the current FLR logic Cui, Dexuan
2008-07-14 17:52 ` Espen Skoglund
2008-07-15 4:05 ` Yosuke Iwamatsu
@ 2008-07-17 9:33 ` Yuji Shimada
2008-07-17 10:22 ` Cui, Dexuan
` (2 more replies)
2 siblings, 3 replies; 27+ messages in thread
From: Yuji Shimada @ 2008-07-17 9:33 UTC (permalink / raw)
To: Cui, Dexuan, Keir Fraser, xen-devel
On Sat, 12 Jul 2008 20:37:34 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:
> I'd like to ask for your comments, and test feedbacks. Thank you very
> much!
There is a problem with the device reconfiguring logic. xend saves all
Configuration Register's values before reset. And xend writes the
values to the registers after reset. This means the resister's values
which are configured by guest software are restore. Guest software
setting should be cleared.
I think the following Configuration Resister should be reconfigured to
correct values after reset. And other registers should not be
reconfigured after reset if there is no reason.
- It is necessary to write the base address of the resource allocated
by the dom0 kernel to the following resister.
- Base Address Register
- When dom0 starts, the values configured by firmware should be saved,
and the values should be written to the following resisters after
reset. The reason is that firmware configures them to archive system
specific functions. Especially, the registers relating error reporting
are configured to collect error information. If this configured values
is changed, some functions might be lost.
Instead of save/restore, it is good way to write the value from _HPX
method to registers, I think. _HPX method returns the value to write
to registers.
- Cache-line size Register
- Latency timer Register
- Enable SERR Bit/Enable PERR Bit in Device Control Register
- Uncorrectable Error Mask Register
- Uncorrectable Error Severity Register
- Correctable Error Mask Register
- Advanced Error Capabilities and Control Register
- Device Control Register
- Link Control Register
- Secondary Uncorrectable Error Severity Register
- Secondary Uncorrectable Error Mask Register
- Device Control 2 Register
- Link Control 2 Register
- The following resister should be configured to "0".
- PME Enable Bit/PME Status Bit in PCI Power Management
Control/Status Register
I would like to discuss this logic more.
Thanks.
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread* RE: [PATCH] Improve the current FLR logic
2008-07-17 9:33 ` Yuji Shimada
@ 2008-07-17 10:22 ` Cui, Dexuan
2008-07-18 6:37 ` Cui, Dexuan
2008-10-01 6:53 ` Neo Jia
2 siblings, 0 replies; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-17 10:22 UTC (permalink / raw)
To: Yuji Shimada, Keir Fraser, xen-devel
Thanks for the comments!
Yes, the current simple save/restore has issues, and you have pointed some.
Let me think over all the issues and work out a better solution.
Thanks,
-- Dexuan
-----Original Message-----
From: Yuji Shimada [mailto:shimada-yxb@necst.nec.co.jp]
Sent: 2008年7月17日 17:34
To: Cui, Dexuan; Keir Fraser; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH] Improve the current FLR logic
On Sat, 12 Jul 2008 20:37:34 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:
> I'd like to ask for your comments, and test feedbacks. Thank you very
> much!
There is a problem with the device reconfiguring logic. xend saves all
Configuration Register's values before reset. And xend writes the
values to the registers after reset. This means the resister's values
which are configured by guest software are restore. Guest software
setting should be cleared.
I think the following Configuration Resister should be reconfigured to
correct values after reset. And other registers should not be
reconfigured after reset if there is no reason.
- It is necessary to write the base address of the resource allocated
by the dom0 kernel to the following resister.
- Base Address Register
- When dom0 starts, the values configured by firmware should be saved,
and the values should be written to the following resisters after
reset. The reason is that firmware configures them to archive system
specific functions. Especially, the registers relating error reporting
are configured to collect error information. If this configured values
is changed, some functions might be lost.
Instead of save/restore, it is good way to write the value from _HPX
method to registers, I think. _HPX method returns the value to write
to registers.
- Cache-line size Register
- Latency timer Register
- Enable SERR Bit/Enable PERR Bit in Device Control Register
- Uncorrectable Error Mask Register
- Uncorrectable Error Severity Register
- Correctable Error Mask Register
- Advanced Error Capabilities and Control Register
- Device Control Register
- Link Control Register
- Secondary Uncorrectable Error Severity Register
- Secondary Uncorrectable Error Mask Register
- Device Control 2 Register
- Link Control 2 Register
- The following resister should be configured to "0".
- PME Enable Bit/PME Status Bit in PCI Power Management
Control/Status Register
I would like to discuss this logic more.
Thanks.
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread* RE: [PATCH] Improve the current FLR logic
2008-07-17 9:33 ` Yuji Shimada
2008-07-17 10:22 ` Cui, Dexuan
@ 2008-07-18 6:37 ` Cui, Dexuan
2008-07-18 9:34 ` Yuji Shimada
2008-10-01 6:53 ` Neo Jia
2 siblings, 1 reply; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-18 6:37 UTC (permalink / raw)
To: Yuji Shimada, Keir Fraser, xen-devel
Yuji Shimada wrote:
> On Sat, 12 Jul 2008 20:37:34 +0800
> "Cui, Dexuan" <dexuan.cui@intel.com> wrote:
>
>> I'd like to ask for your comments, and test feedbacks. Thank you very
>> much!
>
> There is a problem with the device reconfiguring logic. xend saves all
> Configuration Register's values before reset. And xend writes the
> values to the registers after reset. This means the resister's values
> which are configured by guest software are restore. Guest software
> setting should be cleared.
Now xend saves and restores all the 256-byte space -- this is not
suitable as you pointed.
How about only saving/restoring the header (the first 64-byte)?
I think this may be a good idea.
I tend to make the least change to the current code. :-)
Thanks,
-- Dexuan
>
> I think the following Configuration Resister should be reconfigured to
> correct values after reset. And other registers should not be
> reconfigured after reset if there is no reason.
>
> - It is necessary to write the base address of the resource allocated
> by the dom0 kernel to the following resister.
> - Base Address Register
>
> - When dom0 starts, the values configured by firmware should be saved,
> and the values should be written to the following resisters after
> reset. The reason is that firmware configures them to archive system
> specific functions. Especially, the registers relating error
> reporting are configured to collect error information. If this
> configured values is changed, some functions might be lost.
> Instead of save/restore, it is good way to write the value from _HPX
> method to registers, I think. _HPX method returns the value to write
> to registers.
> - Cache-line size Register
> - Latency timer Register
> - Enable SERR Bit/Enable PERR Bit in Device Control Register
> - Uncorrectable Error Mask Register
> - Uncorrectable Error Severity Register
> - Correctable Error Mask Register
> - Advanced Error Capabilities and Control Register
> - Device Control Register
> - Link Control Register
> - Secondary Uncorrectable Error Severity Register
> - Secondary Uncorrectable Error Mask Register
> - Device Control 2 Register
> - Link Control 2 Register
>
> - The following resister should be configured to "0".
> - PME Enable Bit/PME Status Bit in PCI Power Management
> Control/Status Register
>
> I would like to discuss this logic more.
>
> Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-18 6:37 ` Cui, Dexuan
@ 2008-07-18 9:34 ` Yuji Shimada
2008-07-18 10:43 ` Cui, Dexuan
0 siblings, 1 reply; 27+ messages in thread
From: Yuji Shimada @ 2008-07-18 9:34 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
On Fri, 18 Jul 2008 14:37:30 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:
> Now xend saves and restores all the 256-byte space -- this is not
> suitable as you pointed.
> How about only saving/restoring the header (the first 64-byte)?
I think following registers in PCI Express Capability Structure should be
restored. The reason is that host firmware might configure them.
- Uncorrectable Error Mask Register
- Uncorrectable Error Severity Register
- Correctable Error Mask Register
- Advanced Error Capabilities and Control Register
- Device Control Register
- Link Control Register
- Secondary Uncorrectable Error Severity Register
- Secondary Uncorrectable Error Mask Register
- Device Control 2 Register
- Link Control 2 Register
What do you think about following method?
1. create the table, and fill offset, size, etc. of registers to
save/restore into the table.
2. save/restore registers based on the table.
Thanks.
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread* RE: [PATCH] Improve the current FLR logic
2008-07-18 9:34 ` Yuji Shimada
@ 2008-07-18 10:43 ` Cui, Dexuan
2008-07-22 10:16 ` Yuji Shimada
0 siblings, 1 reply; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-18 10:43 UTC (permalink / raw)
To: Yuji Shimada; +Cc: xen-devel, Keir Fraser
Yuji Shimada wrote:
> On Fri, 18 Jul 2008 14:37:30 +0800
> "Cui, Dexuan" <dexuan.cui@intel.com> wrote:
>
>> Now xend saves and restores all the 256-byte space -- this is not
>> suitable as you pointed. How about only saving/restoring the header
>> (the first 64-byte)?
>
> I think following registers in PCI Express Capability Structure
> should be restored. The reason is that host firmware might configure
> them.
>
> - Uncorrectable Error Mask Register
> - Uncorrectable Error Severity Register
> - Correctable Error Mask Register
> - Advanced Error Capabilities and Control Register
> - Device Control Register
> - Link Control Register
> - Secondary Uncorrectable Error Severity Register
> - Secondary Uncorrectable Error Mask Register
> - Device Control 2 Register
> - Link Control 2 Register
Hi Yuji,
Thanks for your advice!
To restore the values of the PCIe registers to the original values the
firmware (and Dom0?) configues, we need to store the original values
somewhere.
Maybe we can restore the info into xenstore?
> What do you think about following method?
>
> 1. create the table, and fill offset, size, etc. of registers to
> save/restore into the table.
>
> 2. save/restore registers based on the table.
This sounds nice.
Where should we store the table?
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-18 10:43 ` Cui, Dexuan
@ 2008-07-22 10:16 ` Yuji Shimada
2008-07-22 10:48 ` Keir Fraser
2008-07-23 3:16 ` Cui, Dexuan
0 siblings, 2 replies; 27+ messages in thread
From: Yuji Shimada @ 2008-07-22 10:16 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
On Fri, 18 Jul 2008 18:43:49 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:
> Hi Yuji,
> Thanks for your advice!
> To restore the values of the PCIe registers to the original values the
> firmware (and Dom0?) configues, we need to store the original values
> somewhere.
> Maybe we can restore the info into xenstore?
Hi Cui,
Xenstore is OK, but pciback is better place, I think.
Because device driver should configure devices.
If we do it in pciback, the coding might be difficult. But the
whole architecture will be simple.
I think the following saving/restoring timing is good.
- saving : after pciback bind device
- restoring : after xend writes SBDF into xenstore
before pciback removes backend device
>> What do you think about following method?
>>
>> 1. create the table, and fill offset, size, etc. of registers to
>> save/restore into the table.
>>
>> 2. save/restore registers based on the table.
> This sounds nice.
> Where should we store the table?
I don't think device specific tables are required.
So we only have to create one table.
xend (or pciback if you implement saving/restoring in pciback) is good
place, I think.
Thanks.
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] Improve the current FLR logic
2008-07-22 10:16 ` Yuji Shimada
@ 2008-07-22 10:48 ` Keir Fraser
2008-07-23 3:16 ` Cui, Dexuan
1 sibling, 0 replies; 27+ messages in thread
From: Keir Fraser @ 2008-07-22 10:48 UTC (permalink / raw)
To: Yuji Shimada, Cui, Dexuan; +Cc: xen-devel
On 22/7/08 11:16, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> Xenstore is OK, but pciback is better place, I think.
> Because device driver should configure devices.
> If we do it in pciback, the coding might be difficult. But the
> whole architecture will be simple.
I find the model of pciback owning pass-thru'ed devices, configuring and
resetting them, and ultimately handing them off to domU guests via the PV
front-back interface or via qemu-dm emulation, is quite attractive at least
conceptually. As you say, I'm not sure what devils may lie in the details.
Still, it seems nicer than hacking logic into xend. :-)
-- Keir
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH] Improve the current FLR logic
2008-07-22 10:16 ` Yuji Shimada
2008-07-22 10:48 ` Keir Fraser
@ 2008-07-23 3:16 ` Cui, Dexuan
2008-07-24 7:53 ` Yuji Shimada
1 sibling, 1 reply; 27+ messages in thread
From: Cui, Dexuan @ 2008-07-23 3:16 UTC (permalink / raw)
To: Yuji Shimada; +Cc: xen-devel, Keir Fraser
Yuji Shimada wrote:
> On Fri, 18 Jul 2008 18:43:49 +0800
> "Cui, Dexuan" <dexuan.cui@intel.com> wrote:
>> Hi Yuji,
>> Thanks for your advice!
>> To restore the values of the PCIe registers to the original values
>> the firmware (and Dom0?) configues, we need to store the original
>> values somewhere. Maybe we can restore the info into xenstore?
>
> Hi Cui,
> Xenstore is OK, but pciback is better place, I think.
> Because device driver should configure devices.
I think xenstore seems to be a little ugly place to remember the proper
values of all or part of registers of all the devices in the system.
I agree pciback is a good place.
> If we do it in pciback, the coding might be difficult. But the
> whole architecture will be simple.
Yes. Now I realize pciback seems to be the best place to do most of the
FLR logic.
But this needs modify pciback and add interfaces for Control Panel.
I'm not sure the changes can be small enought to be in 3.3.
>
> I think the following saving/restoring timing is good.
> - saving : after pciback bind device
> - restoring : after xend writes SBDF into xenstore
> before pciback removes backend device
>
>>> What do you think about following method?
>>>
>>> 1. create the table, and fill offset, size, etc. of registers
>>> to save/restore into the table.
>>>
>>> 2. save/restore registers based on the table.
>> This sounds nice.
>> Where should we store the table?
>
> I don't think device specific tables are required.
> So we only have to create one table.
> xend (or pciback if you implement saving/restoring in pciback) is good
> place, I think.
>
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-23 3:16 ` Cui, Dexuan
@ 2008-07-24 7:53 ` Yuji Shimada
[not found] ` <FE7BBCFBB500984A9A7922EBC95F516E0178C213@pdsmsx414.ccr.corp.intel.com>
0 siblings, 1 reply; 27+ messages in thread
From: Yuji Shimada @ 2008-07-24 7:53 UTC (permalink / raw)
To: Cui, Dexuan; +Cc: xen-devel, Keir Fraser
On Wed, 23 Jul 2008 11:16:31 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:
> I think xenstore seems to be a little ugly place to remember the proper
> values of all or part of registers of all the devices in the system.
> I agree pciback is a good place.
> Yes. Now I realize pciback seems to be the best place to do most of the
> FLR logic.
> But this needs modify pciback and add interfaces for Control Panel.
> I'm not sure the changes can be small enought to be in 3.3.
Hi Cui,
What do you think about the following idea?
- For 3.3, modify xend to have limited saving/restoring and resetting
functions.
- For 3.4, modify pciback to have proper saving/restoring and
resetting functions.
The purpose of saving is to save the values configured by firmware. So
the best timing of saving the value is when dom0 starts. But if saving
the values when dom0 starts, pciback is the best place to save/restore.
If pciback saves/restores the values, we need many codings. So it is
difficult to achieve it in 3.3.
So for 3.3, there is a way that saving the value just before resetting.
It is possible xend saves/restores the value, I think. In this case,
xend don't need saving the values in xenstore. Because interval between
saving and restoring is small.
My idea is summarized as follows:
Xen Timing Where Difficulty
3.3 before resetting xend easy
3.4 when Dom0 starts pciback hard
Thanks.
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-07-17 9:33 ` Yuji Shimada
2008-07-17 10:22 ` Cui, Dexuan
2008-07-18 6:37 ` Cui, Dexuan
@ 2008-10-01 6:53 ` Neo Jia
2008-10-01 8:21 ` Yuji Shimada
2 siblings, 1 reply; 27+ messages in thread
From: Neo Jia @ 2008-10-01 6:53 UTC (permalink / raw)
To: Yuji Shimada; +Cc: xen-devel, Keir Fraser, Cui, Dexuan
Yuji,
Could you show me how to use the hotplug method to restore the
register? Also, in the current implementation, we just read 4 bytes
each time and write them from 0 to 256 - 4. Will this cause any side
effect?
Thanks,
Neo
On Thu, Jul 17, 2008 at 2:33 AM, Yuji Shimada
<shimada-yxb@necst.nec.co.jp> wrote:
> On Sat, 12 Jul 2008 20:37:34 +0800
> "Cui, Dexuan" <dexuan.cui@intel.com> wrote:
>
>> I'd like to ask for your comments, and test feedbacks. Thank you very
>> much!
>
> There is a problem with the device reconfiguring logic. xend saves all
> Configuration Register's values before reset. And xend writes the
> values to the registers after reset. This means the resister's values
> which are configured by guest software are restore. Guest software
> setting should be cleared.
>
> I think the following Configuration Resister should be reconfigured to
> correct values after reset. And other registers should not be
> reconfigured after reset if there is no reason.
>
> - It is necessary to write the base address of the resource allocated
> by the dom0 kernel to the following resister.
> - Base Address Register
>
> - When dom0 starts, the values configured by firmware should be saved,
> and the values should be written to the following resisters after
> reset. The reason is that firmware configures them to archive system
> specific functions. Especially, the registers relating error reporting
> are configured to collect error information. If this configured values
> is changed, some functions might be lost.
> Instead of save/restore, it is good way to write the value from _HPX
> method to registers, I think. _HPX method returns the value to write
> to registers.
> - Cache-line size Register
> - Latency timer Register
> - Enable SERR Bit/Enable PERR Bit in Device Control Register
> - Uncorrectable Error Mask Register
> - Uncorrectable Error Severity Register
> - Correctable Error Mask Register
> - Advanced Error Capabilities and Control Register
> - Device Control Register
> - Link Control Register
> - Secondary Uncorrectable Error Severity Register
> - Secondary Uncorrectable Error Mask Register
> - Device Control 2 Register
> - Link Control 2 Register
>
> - The following resister should be configured to "0".
> - PME Enable Bit/PME Status Bit in PCI Power Management
> Control/Status Register
>
> I would like to discuss this logic more.
>
> Thanks.
>
> --
> Yuji Shimada
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
--
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve the current FLR logic
2008-10-01 6:53 ` Neo Jia
@ 2008-10-01 8:21 ` Yuji Shimada
0 siblings, 0 replies; 27+ messages in thread
From: Yuji Shimada @ 2008-10-01 8:21 UTC (permalink / raw)
To: Neo Jia; +Cc: xen-devel, Keir Fraser, Cui, Dexuan
Hi, Neo,
On Tue, 30 Sep 2008 23:53:22 -0700
"Neo Jia" <neojia@gmail.com> wrote:
> Yuji,
>
> Could you show me how to use the hotplug method to restore the
> register?
Following code will help you to know how to use the hotplug
method(_HPP/_HPX method).
linux-2.6.18-xen.hg/drivers/pci/hotplug/pciehp_pci.c:program_fw_provided_values
Note: using the hotplug method is alternative approach to reconfigure
device after reset, without saving/restoring.
> Also, in the current implementation, we just read 4 bytes
> each time and write them from 0 to 256 - 4. Will this cause any side
> effect?
I'm not sure this cause any side effect.
But there is a case where memory enable bit is set to 1, while base
address registers are not programmed. Some devices might be confused.
Rich saving/restoring logic or using the hotplug method are needed for
following reasons.
- Most of the registers should not be restored, because there is a case
where guest software programs configuration registers to invalid value.
- We need to program FW provided value to some registers to use
machine's RAS function. For example, if system support SERR, SERR
enable bit should be set to 1, so that device reports error.
Thanks,
--
Yuji Shimada
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2008-10-01 8:21 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-12 12:37 [PATCH] Improve the current FLR logic Cui, Dexuan
2008-07-14 17:52 ` Espen Skoglund
2008-07-17 10:21 ` Cui, Dexuan
2008-07-17 12:50 ` Espen Skoglund
2008-07-18 5:31 ` Jiang, Yunhong
2008-07-15 4:05 ` Yosuke Iwamatsu
2008-07-17 10:21 ` Cui, Dexuan
2008-07-17 11:06 ` Yosuke Iwamatsu
2008-07-18 2:13 ` Cui, Dexuan
2008-07-17 9:33 ` Yuji Shimada
2008-07-17 10:22 ` Cui, Dexuan
2008-07-18 6:37 ` Cui, Dexuan
2008-07-18 9:34 ` Yuji Shimada
2008-07-18 10:43 ` Cui, Dexuan
2008-07-22 10:16 ` Yuji Shimada
2008-07-22 10:48 ` Keir Fraser
2008-07-23 3:16 ` Cui, Dexuan
2008-07-24 7:53 ` Yuji Shimada
[not found] ` <FE7BBCFBB500984A9A7922EBC95F516E0178C213@pdsmsx414.ccr.corp.intel.com>
2008-07-24 8:38 ` Yuji Shimada
2008-07-24 10:19 ` Cui, Dexuan
2008-07-25 3:02 ` Yuji Shimada
2008-07-25 3:21 ` Cui, Dexuan
2008-09-19 6:45 ` Neo Jia
2008-09-19 6:58 ` Cui, Dexuan
2008-10-01 6:55 ` Neo Jia
2008-10-01 6:53 ` Neo Jia
2008-10-01 8:21 ` Yuji Shimada
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.