All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
@ 2015-03-19 21:53 Boris Ostrovsky
  2015-03-19 21:53 ` [PATCH v5 1/8] numa: __node_distance() should return u8 Boris Ostrovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:53 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

Changes in v5:
* Make CPU topology and NUMA info sysctls behave more like XEN_DOMCTL_get_vcpu_msrs
  when passed NULL buffers. This required toolstack changes as well
* Don't use 8-bit data types in interfaces
* Fold interface version update into patch#3

Changes in v4:
* Split cputopology and NUMA info changes into separate patches
* Added patch#1 (partly because patch#4 needs to know when when distance is invalid,
  i.e. NUMA_NO_DISTANCE)
* Split sysctl version update into a separate patch
* Other changes are listed in each patch
* NOTE: I did not test python's xc changes since I don't think I know how.

Changes in v3:
* Added patch #1 to more consistently define nodes as a u8 and properly
  use NUMA_NO_NODE.
* Make changes to xen_sysctl_numainfo, similar to those made to
  xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
  patch #3 to avoid bumping interface version twice. Perhaps it's better
  to split it into two?)
* Instead of copying data for each loop index allocate a buffer and copy
  once for all three queries in sysctl.c.
* Move hypercall buffer management from libxl to libxc (as requested by
  Dario, patches #5 and #6).
* Report topology info for offlined CPUs as well
* Added LIBXL_HAVE_PCITOPO macro

Changes in v2:
* Split topology sysctls into two --- one for CPU topology and the other
  for devices
* Avoid long loops in the hypervisor by using continuations. (I am not
  particularly happy about using first_dev in the interface, suggestions
  for a better interface would be appreciated)
* Use proper libxl conventions for interfaces
* Avoid hypervisor stack corruption when copying PXM data from guest


A few patches that add interface for querying hypervisor about device
topology and allow 'xl info -n' display this information if PXM object
is provided by ACPI.

This series also makes some optimizations and cleanup of current CPU
topology and NUMA sysctl queries.



Boris Ostrovsky (8):
  numa: __node_distance() should return u8
  pci: Stash device's PXM information in struct pci_dev
  sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer
    management to libxc
  libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management
    to libxc
  libxl: Add interface for querying hypervisor about PCI topology

 tools/libxc/include/xenctrl.h     |   12 ++-
 tools/libxc/xc_misc.c             |  102 ++++++++++++++++---
 tools/libxl/libxl.c               |  183 ++++++++++++++++------------------
 tools/libxl/libxl.h               |   12 ++
 tools/libxl/libxl_freebsd.c       |   12 ++
 tools/libxl/libxl_internal.h      |    5 +
 tools/libxl/libxl_linux.c         |   69 +++++++++++++
 tools/libxl/libxl_netbsd.c        |   12 ++
 tools/libxl/libxl_types.idl       |    7 ++
 tools/libxl/libxl_utils.c         |    8 ++
 tools/libxl/xl_cmdimpl.c          |   40 ++++++--
 tools/misc/xenpm.c                |  101 ++++++++----------
 tools/python/xen/lowlevel/xc/xc.c |  105 +++++++-------------
 xen/arch/x86/physdev.c            |   23 ++++-
 xen/arch/x86/srat.c               |   13 ++-
 xen/common/page_alloc.c           |    4 +-
 xen/common/sysctl.c               |  200 +++++++++++++++++++++++++++----------
 xen/drivers/passthrough/pci.c     |   13 ++-
 xen/include/asm-x86/numa.h        |    2 +-
 xen/include/public/physdev.h      |    6 +
 xen/include/public/sysctl.h       |  138 ++++++++++++++++---------
 xen/include/xen/numa.h            |    3 +-
 xen/include/xen/pci.h             |    5 +-
 23 files changed, 715 insertions(+), 360 deletions(-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5 1/8] numa: __node_distance() should return u8
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
@ 2015-03-19 21:53 ` Boris Ostrovsky
  2015-03-20 14:25   ` Konrad Rzeszutek Wilk
  2015-03-19 21:53 ` [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev Boris Ostrovsky
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:53 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

SLIT values are byte-sized and some of them (0-9 and 255) have
special meaning. Adjust __node_distance() to reflect this and
modify scrub_heap_pages() and do_sysctl() to deal with
__node_distance() returning an invalid SLIT entry.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Changes in v5:
* XEN_SYSCTL_numainfo knows about NUMA_NO_DISTANCE
* Cleaner changes in __node_distance()

 xen/arch/x86/srat.c        |   13 ++++++++++---
 xen/common/page_alloc.c    |    4 ++--
 xen/common/sysctl.c        |    6 +++++-
 xen/include/asm-x86/numa.h |    2 +-
 xen/include/xen/numa.h     |    3 ++-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index dfabba3..92c89a5 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -496,14 +496,21 @@ static unsigned node_to_pxm(nodeid_t n)
 	return 0;
 }
 
-int __node_distance(nodeid_t a, nodeid_t b)
+u8 __node_distance(nodeid_t a, nodeid_t b)
 {
-	int index;
+	unsigned index;
+	u8 slit_val;
 
 	if (!acpi_slit)
 		return a == b ? 10 : 20;
 	index = acpi_slit->locality_count * node_to_pxm(a);
-	return acpi_slit->entry[index + node_to_pxm(b)];
+	slit_val = acpi_slit->entry[index + node_to_pxm(b)];
+
+	/* ACPI defines 0xff as an unreachable node and 0-9 are undefined */
+	if ((slit_val == 0xff) || (slit_val <= 9))
+		return NUMA_NO_DISTANCE;
+	else
+		return slit_val;
 }
 
 EXPORT_SYMBOL(__node_distance);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index d999296..bfb356e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1434,13 +1434,13 @@ void __init scrub_heap_pages(void)
         /* Figure out which NODE CPUs are close. */
         for_each_online_node ( j )
         {
-            int distance;
+            u8 distance;
 
             if ( cpumask_empty(&node_to_cpumask(j)) )
                 continue;
 
             distance = __node_distance(i, j);
-            if ( distance < last_distance )
+            if ( (distance < last_distance) && (distance != NUMA_NO_DISTANCE) )
             {
                 last_distance = distance;
                 best_node = j;
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 0cb6ee1..6fdd029 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -304,7 +304,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 {
                     uint32_t distance = ~0u;
                     if ( node_online(i) && node_online(j) )
-                        distance = __node_distance(i, j);
+                    {
+                        u8 d = __node_distance(i, j);
+                        if ( d != NUMA_NO_DISTANCE )
+                            distance = d;
+                    }
                     if ( copy_to_guest_offset(
                         ni->node_to_node_distance,
                         i*(max_node_index+1) + j, &distance, 1) )
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index cc5b5d1..7a489d3 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -85,6 +85,6 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 #endif
 
 void srat_parse_regions(u64 addr);
-extern int __node_distance(nodeid_t a, nodeid_t b);
+extern u8 __node_distance(nodeid_t a, nodeid_t b);
 
 #endif
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index ac4b391..7aef1a8 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -7,7 +7,8 @@
 #define NODES_SHIFT     0
 #endif
 
-#define NUMA_NO_NODE    0xFF
+#define NUMA_NO_NODE     0xFF
+#define NUMA_NO_DISTANCE 0xFF
 
 #define MAX_NUMNODES    (1 << NODES_SHIFT)
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
  2015-03-19 21:53 ` [PATCH v5 1/8] numa: __node_distance() should return u8 Boris Ostrovsky
@ 2015-03-19 21:53 ` Boris Ostrovsky
  2015-03-20 14:01   ` Konrad Rzeszutek Wilk
  2015-03-19 21:53 ` [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:53 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

If ACPI provides PXM data for IO devices then dom0 will pass it to
hypervisor during PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We will store this information (in the form of nodeID) in pci_dev
structure so that we can provide it, for example, to the toolstack
when it adds support (in the following patches) for querying the
hypervisor about device topology

We will also print it when user requests device information dump.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Changes in v5:
* Replace u8 with nodeid_t

 xen/arch/x86/physdev.c        |   23 ++++++++++++++++++++---
 xen/drivers/passthrough/pci.c |   13 +++++++++----
 xen/include/public/physdev.h  |    6 ++++++
 xen/include/xen/pci.h         |    5 ++++-
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 1be1d50..57b7800 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
             break;
 
-        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+                             NULL, NUMA_NO_NODE);
         break;
     }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
         ret = pci_add_device(0, manage_pci_ext.bus,
                              manage_pci_ext.devfn,
-                             &pdev_info);
+                             &pdev_info, NUMA_NO_NODE);
         break;
     }
 
     case PHYSDEVOP_pci_device_add: {
         struct physdev_pci_device_add add;
         struct pci_dev_info pdev_info;
+        nodeid_t node;
 
         ret = -EFAULT;
         if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         }
         else
             pdev_info.is_virtfn = 0;
-        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+        if ( add.flags & XEN_PCI_DEV_PXM )
+        {
+            uint32_t pxm;
+            size_t optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+                                sizeof(add.optarr[0]);
+
+            if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
+                break;
+
+            node = pxm_to_node(pxm);
+        }
+        else
+            node = NUMA_NO_NODE;
+
+        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
         break;
     }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4b83583..ecd061e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *info, nodeid_t node)
 {
     struct pci_seg *pseg;
     struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
         spin_unlock(&pcidevs_lock);
         if ( !pdev )
-            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+                           NULL, node);
         pdev_type = "virtual function";
     }
     else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     if ( !pdev )
         goto out;
 
+    pdev->node = node;
+
     if ( info )
         pdev->info = *info;
     else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     {
-        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
                pseg->nr, pdev->bus,
                PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-               pdev->domain ? pdev->domain->domain_id : -1);
+               pdev->domain ? pdev->domain->domain_id : -1,
+               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
         list_for_each_entry ( msi, &pdev->msi_list, list )
                printk("%d ", msi->irq);
         printk(">\n");
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index 2683719..f33845d 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -293,6 +293,12 @@ struct physdev_pci_device_add {
         uint8_t bus;
         uint8_t devfn;
     } physfn;
+
+    /*
+     * Optional parameters array.
+     * First element ([0]) is PXM domain associated with the device (if
+     * XEN_PCI_DEV_PXM is set)
+     */
 #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
     uint32_t optarr[];
 #elif defined(__GNUC__)
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3988ee6..b281790 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -57,6 +57,8 @@ struct pci_dev {
 
     u8 phantom_stride;
 
+    nodeid_t node; /* NUMA node */
+
     enum pdev_type {
         DEV_TYPE_PCI_UNKNOWN,
         DEV_TYPE_PCIe_ENDPOINT,
@@ -103,7 +105,8 @@ void setup_hwdom_pci_devices(struct domain *,
 int pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
 const unsigned long *pci_get_ro_map(u16 seg);
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *, nodeid_t node);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
 int pci_ro_device(int seg, int bus, int devfn);
 void arch_pci_ro_device(int seg, int bdf);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
  2015-03-19 21:53 ` [PATCH v5 1/8] numa: __node_distance() should return u8 Boris Ostrovsky
  2015-03-19 21:53 ` [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev Boris Ostrovsky
@ 2015-03-19 21:53 ` Boris Ostrovsky
  2015-03-20 13:25   ` Ian Campbell
  2015-03-25 16:13   ` Andrew Cooper
  2015-03-19 21:54 ` [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo " Boris Ostrovsky
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:53 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

A number of changes to XEN_SYSCTL_topologyinfo interface:

* Instead of copying data for each field in xen_sysctl_topologyinfo separately
  put cpu/socket/node into a single structure and do a single copy for each
  processor.
* A NULL cputopo handle passed is a request for maximum number of CPUs
  (num_cpus). If cputopo is valid and num_cpus is smaller than the number of
  CPUs in the system then -ENOBUFS is returned (and correct num_cpus is provided)
* Do not use max_cpu_index, which is almost always used for calculating number
  CPUs (thus requiring adding or subtracting one), replace it with num_cpus.
* There is no need to copy whole op in sysctl to user at the end, we only need
  num_cpus.
* Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
  that these are used for CPU topology. Subsequent patch will add support for
  PCI topology sysctl.
* Replace INVALID_TOPOLOGY_ID with "XEN_"-prefixed macros for each invalid type
  (core, socket, node).

Update sysctl version to 0x0000000C

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Changes in v5:
* Make XEN_SYSCTL_cputopoinfo treat passed in NULL handles as requests
  for array size (and is too small size passed in results in -ENOBUFS)
* Make node in xen_sysctl_cputopo a uint32
* On the toolstack side use NULL handle to determine array size in 
  libxl_get_cpu_topology() and use dynamic arrays in python's xc.c and
  xenpm.c

 tools/libxc/include/xenctrl.h     |    4 +-
 tools/libxc/xc_misc.c             |   10 ++--
 tools/libxl/libxl.c               |   55 +++++++------------
 tools/misc/xenpm.c                |  106 ++++++++++++++++++-------------------
 tools/python/xen/lowlevel/xc/xc.c |   57 +++++++-------------
 xen/common/sysctl.c               |   68 ++++++++++++++++--------
 xen/include/public/sysctl.h       |   57 +++++++++++---------
 7 files changed, 177 insertions(+), 180 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index df18292..f64f815 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_topologyinfo_t xc_topologyinfo_t;
+typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_topologyinfo(xc_interface *xch, xc_topologyinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..be68291 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,20 +177,20 @@ int xc_physinfo(xc_interface *xch,
     return 0;
 }
 
-int xc_topologyinfo(xc_interface *xch,
-                xc_topologyinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch,
+                   xc_cputopoinfo_t *put_info)
 {
     int ret;
     DECLARE_SYSCTL;
 
-    sysctl.cmd = XEN_SYSCTL_topologyinfo;
+    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
-    memcpy(&sysctl.u.topologyinfo, put_info, sizeof(*put_info));
+    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
 
     if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
         return ret;
 
-    memcpy(put_info, &sysctl.u.topologyinfo, sizeof(*put_info));
+    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
 
     return 0;
 }
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 94b4d59..c989abf 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5054,64 +5054,51 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
-    xc_topologyinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
+    xc_cputopoinfo_t tinfo;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
     libxl_cputopology *ret = NULL;
     int i;
-    int max_cpus;
 
-    max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus < 0)
+    /* Setting buffer to NULL makes the hypercall return number of CPUs */
+    set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
+    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
         ret = NULL;
         goto out;
     }
 
-    coremap = xc_hypercall_buffer_alloc
-        (ctx->xch, coremap, sizeof(*coremap) * max_cpus);
-    socketmap = xc_hypercall_buffer_alloc
-        (ctx->xch, socketmap, sizeof(*socketmap) * max_cpus);
-    nodemap = xc_hypercall_buffer_alloc
-        (ctx->xch, nodemap, sizeof(*nodemap) * max_cpus);
-    if ((coremap == NULL) || (socketmap == NULL) || (nodemap == NULL)) {
+    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
+                                        sizeof(*cputopo) * tinfo.num_cpus);
+    if (cputopo == NULL) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
-    tinfo.max_cpu_index = max_cpus - 1;
-    if (xc_topologyinfo(ctx->xch, &tinfo) != 0) {
-        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "Topology info hypercall failed");
+    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
+        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
         goto fail;
     }
 
-    if (tinfo.max_cpu_index < max_cpus - 1)
-        max_cpus = tinfo.max_cpu_index + 1;
+    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * tinfo.num_cpus);
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
-
-    for (i = 0; i < max_cpus; i++) {
-#define V(map, i) (map[i] == INVALID_TOPOLOGY_ID) ? \
-    LIBXL_CPUTOPOLOGY_INVALID_ENTRY : map[i]
-        ret[i].core = V(coremap, i);
-        ret[i].socket = V(socketmap, i);
-        ret[i].node = V(nodemap, i);
+    for (i = 0; i < tinfo.num_cpus; i++) {
+#define V(map, i, invalid) ( cputopo[i].map == invalid) ? \
+   LIBXL_CPUTOPOLOGY_INVALID_ENTRY : cputopo[i].map
+        ret[i].core = V(core, i, XEN_INVALID_CORE_ID);
+        ret[i].socket = V(socket, i, XEN_INVALID_SOCKET_ID);
+        ret[i].node = V(node, i, XEN_INVALID_NODE_ID);
 #undef V
     }
 
 fail:
-    xc_hypercall_buffer_free(ctx->xch, coremap);
-    xc_hypercall_buffer_free(ctx->xch, socketmap);
-    xc_hypercall_buffer_free(ctx->xch, nodemap);
+    xc_hypercall_buffer_free(ctx->xch, cputopo);
 
     if (ret)
-        *nb_cpu_out = max_cpus;
+        *nb_cpu_out = tinfo.num_cpus;
+
  out:
     GC_FREE;
     return ret;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index e43924c..a5d07de 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,16 +355,21 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
+    set_xen_guest_handle(info.cputopo, HYPERCALL_BUFFER_NULL);
+    if ( xc_cputopoinfo(xc_handle, &info) != 0 )
+    {
+        fprintf(stderr, "failed to discover number of CPUs: %s\n",
+                strerror(errno));
+        goto out;
+    }
 
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * info.num_cpus);
+
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
@@ -448,47 +453,44 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
-    info.max_cpu_index = MAX_NR_CPU - 1;
+    set_xen_guest_handle(info.cputopo, cputopo);
 
-    if ( cx_cap && !xc_topologyinfo(xc_handle, &info) )
+    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
         uint32_t core_ids[MAX_NR_CPU];
         uint32_t socket_nr = 0;
         uint32_t core_nr = 0;
 
-        if ( info.max_cpu_index > MAX_NR_CPU - 1 )
-            info.max_cpu_index = MAX_NR_CPU - 1;
+        if ( info.num_cpus > MAX_NR_CPU )
+            info.num_cpus = MAX_NR_CPU;
         /* check validity */
-        for ( i = 0; i <= info.max_cpu_index; i++ )
+        for ( i = 0; i < info.num_cpus; i++ )
         {
-            if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID ||
-                 cpu_to_socket[i] == INVALID_TOPOLOGY_ID )
+            if ( cputopo[i].core == XEN_INVALID_CORE_ID ||
+                 cputopo[i].socket == XEN_INVALID_SOCKET_ID )
                 break;
         }
-        if ( i > info.max_cpu_index )
+        if ( i >= info.num_cpus )
         {
             /* find socket nr & core nr per socket */
-            for ( i = 0; i <= info.max_cpu_index; i++ )
+            for ( i = 0; i < info.num_cpus; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
-                    if ( cpu_to_socket[i] == socket_ids[j] )
+                    if ( cputopo[i].socket == socket_ids[j] )
                         break;
                 if ( j == socket_nr )
                 {
-                    socket_ids[j] = cpu_to_socket[i];
+                    socket_ids[j] = cputopo[i].socket;
                     socket_nr++;
                 }
 
                 for ( j = 0; j < core_nr; j++ )
-                    if ( cpu_to_core[i] == core_ids[j] )
+                    if ( cputopo[i].core == core_ids[j] )
                         break;
                 if ( j == core_nr )
                 {
-                    core_ids[j] = cpu_to_core[i];
+                    core_ids[j] = cputopo[i].core;
                     core_nr++;
                 }
             }
@@ -499,9 +501,9 @@ static void signal_int_handler(int signo)
                 unsigned int n;
                 uint64_t res;
 
-                for ( j = 0; j <= info.max_cpu_index; j++ )
+                for ( j = 0; j < info.num_cpus; j++ )
                 {
-                    if ( cpu_to_socket[j] == socket_ids[i] )
+                    if ( cputopo[j].socket == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
@@ -518,10 +520,10 @@ static void signal_int_handler(int signo)
                 }
                 for ( k = 0; k < core_nr; k++ )
                 {
-                    for ( j = 0; j <= info.max_cpu_index; j++ )
+                    for ( j = 0; j < info.num_cpus; j++ )
                     {
-                        if ( cpu_to_socket[j] == socket_ids[i] &&
-                             cpu_to_core[j] == core_ids[k] )
+                        if ( cputopo[j].socket == socket_ids[i] &&
+                             cputopo[j].core == core_ids[k] )
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
@@ -556,9 +558,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -965,28 +965,29 @@ void scaling_governor_func(int argc, char *argv[])
 
 void cpu_topology_func(int argc, char *argv[])
 {
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
     int i, rc = ENOMEM;
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
+    set_xen_guest_handle(info.cputopo, HYPERCALL_BUFFER_NULL);
+    if ( xc_cputopoinfo(xc_handle, &info) )
+    {
+        rc = errno;
+        fprintf(stderr, "failed to discover number of CPUs (%d - %s)\n",
+                errno, strerror(errno));
+        goto out;
+    }
 
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * info.num_cpus);
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
-    info.max_cpu_index = MAX_NR_CPU-1;
-
-    if ( xc_topologyinfo(xc_handle, &info) )
+    set_xen_guest_handle(info.cputopo, cputopo);
+    if ( xc_cputopoinfo(xc_handle, &info) )
     {
         rc = errno;
         fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
@@ -994,22 +995,17 @@ void cpu_topology_func(int argc, char *argv[])
         goto out;
     }
 
-    if ( info.max_cpu_index > (MAX_NR_CPU-1) )
-        info.max_cpu_index = MAX_NR_CPU-1;
-
     printf("CPU\tcore\tsocket\tnode\n");
-    for ( i = 0; i <= info.max_cpu_index; i++ )
+    for ( i = 0; i < info.num_cpus; i++ )
     {
-        if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == XEN_INVALID_CORE_ID )
             continue;
         printf("CPU%d\t %d\t %d\t %d\n",
-               i, cpu_to_core[i], cpu_to_socket[i], cpu_to_node[i]);
+               i, cputopo[i].core, cputopo[i].socket, cputopo[i].node);
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 2aa0dc7..5e81c4a 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1220,78 +1220,66 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
 
 static PyObject *pyxc_topologyinfo(XcObject *self)
 {
-#define MAX_CPU_INDEX 255
-    xc_topologyinfo_t tinfo = { 0 };
-    int i, max_cpu_index;
+    xc_cputopoinfo_t tinfo = { 0 };
+    unsigned i;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 
-    coremap = xc_hypercall_buffer_alloc(self->xc_handle, coremap, sizeof(*coremap) * (MAX_CPU_INDEX+1));
-    if ( coremap == NULL )
-        goto out;
-    socketmap = xc_hypercall_buffer_alloc(self->xc_handle, socketmap, sizeof(*socketmap) * (MAX_CPU_INDEX+1));
-    if ( socketmap == NULL  )
-        goto out;
-    nodemap = xc_hypercall_buffer_alloc(self->xc_handle, nodemap, sizeof(*nodemap) * (MAX_CPU_INDEX+1));
-    if ( nodemap == NULL )
+    set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
+    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
         goto out;
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
-    tinfo.max_cpu_index = MAX_CPU_INDEX;
+    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo,
+                                        sizeof(*cputopo) * tinfo.num_cpus);
+    if ( cputopo == NULL )
+    	goto out;
 
-    if ( xc_topologyinfo(self->xc_handle, &tinfo) != 0 )
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
+    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
         goto out;
 
-    max_cpu_index = tinfo.max_cpu_index;
-    if ( max_cpu_index > MAX_CPU_INDEX )
-        max_cpu_index = MAX_CPU_INDEX;
-
     /* Construct cpu-to-* lists. */
     cpu_to_core_obj = PyList_New(0);
     cpu_to_socket_obj = PyList_New(0);
     cpu_to_node_obj = PyList_New(0);
-    for ( i = 0; i <= max_cpu_index; i++ )
+    for ( i = 0; i < tinfo.num_cpus; i++ )
     {
-        if ( coremap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == XEN_INVALID_CORE_ID )
         {
             PyList_Append(cpu_to_core_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(coremap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].core);
             PyList_Append(cpu_to_core_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( socketmap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].socket == XEN_INVALID_SOCKET_ID )
         {
             PyList_Append(cpu_to_socket_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(socketmap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].socket);
             PyList_Append(cpu_to_socket_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( nodemap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].node == XEN_INVALID_NODE_ID )
         {
             PyList_Append(cpu_to_node_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(nodemap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].node);
             PyList_Append(cpu_to_node_obj, pyint);
             Py_DECREF(pyint);
         }
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", max_cpu_index);
+    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", tinfo.num_cpus + 1);
 
     PyDict_SetItemString(ret_obj, "cpu_to_core", cpu_to_core_obj);
     Py_DECREF(cpu_to_core_obj);
@@ -1303,11 +1291,8 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, coremap);
-    xc_hypercall_buffer_free(self->xc_handle, socketmap);
-    xc_hypercall_buffer_free(self->xc_handle, nodemap);
+    xc_hypercall_buffer_free(self->xc_handle, cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
-#undef MAX_CPU_INDEX
 }
 
 static PyObject *pyxc_numainfo(XcObject *self)
@@ -1375,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
         for ( j = 0; j <= max_node_index; j++ )
         {
             uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
-            if ( dist == INVALID_TOPOLOGY_ID )
+            if ( dist == ~0u )
             {
                 PyList_Append(node_to_node_dist_obj, Py_None);
             }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 6fdd029..b2cfe11 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -324,39 +324,63 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
     }
     break;
 
-    case XEN_SYSCTL_topologyinfo:
+    case XEN_SYSCTL_cputopoinfo:
     {
-        uint32_t i, max_cpu_index, last_online_cpu;
-        xen_sysctl_topologyinfo_t *ti = &op->u.topologyinfo;
+        uint32_t i, num_cpus;
+        xen_sysctl_cputopoinfo_t *ti = &op->u.cputopoinfo;
 
-        last_online_cpu = cpumask_last(&cpu_online_map);
-        max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
-        ti->max_cpu_index = last_online_cpu;
-
-        for ( i = 0; i <= max_cpu_index; i++ )
+        num_cpus = cpumask_last(&cpu_online_map) + 1;
+        if ( !guest_handle_is_null(ti->cputopo) )
         {
-            if ( !guest_handle_is_null(ti->cpu_to_core) )
+            if ( ti->num_cpus < num_cpus )
             {
-                uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) )
-                    break;
+                ret = -ENOBUFS;
+                i = num_cpus;
             }
-            if ( !guest_handle_is_null(ti->cpu_to_socket) )
+
+            for ( i = 0; i < num_cpus; i++ )
             {
-                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
+                xen_sysctl_cputopo_t cputopo;
+
+                if ( cpu_present(i) )
+                {
+                    cputopo.core = cpu_to_core(i);
+                    if ( cputopo.core == BAD_APICID )
+                        cputopo.core = XEN_INVALID_CORE_ID;
+                    cputopo.socket = cpu_to_socket(i);
+                    if ( cputopo.socket == BAD_APICID )
+                        cputopo.socket = XEN_INVALID_SOCKET_ID;
+                    cputopo.node = cpu_to_node(i);
+                    if ( cputopo.node == NUMA_NO_NODE )
+                        cputopo.node = XEN_INVALID_NODE_ID;
+                }
+                else
+                {
+                    cputopo.core = XEN_INVALID_CORE_ID;
+                    cputopo.socket = XEN_INVALID_SOCKET_ID;
+                    cputopo.node = XEN_INVALID_NODE_ID;
+                }
+
+                if ( copy_to_guest_offset(ti->cputopo, i, &cputopo, 1) )
+                {
+                    ret = -EFAULT;
                     break;
+                }
             }
-            if ( !guest_handle_is_null(ti->cpu_to_node) )
+        }
+        else
+            i = num_cpus;
+
+        if ( (!ret || (ret == -ENOBUFS)) && (ti->num_cpus != i) )
+        {
+            ti->num_cpus = i;
+            if ( __copy_field_to_guest(u_sysctl, op,
+                                       u.cputopoinfo.num_cpus) )
             {
-                uint32_t node = cpu_online(i) ? cpu_to_node(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_node, i, &node, 1) )
-                    break;
+                ret = -EFAULT;
+                break;
             }
         }
-
-        ret = ((i <= max_cpu_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
     }
     break;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 8552dc6..711441f 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
 /*
  * Read console content from Xen buffer ring.
@@ -462,31 +462,36 @@ struct xen_sysctl_lockprof_op {
 typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
 
-/* XEN_SYSCTL_topologyinfo */
-#define INVALID_TOPOLOGY_ID  (~0U)
-struct xen_sysctl_topologyinfo {
-    /*
-     * IN: maximum addressable entry in the caller-provided arrays.
-     * OUT: largest cpu identifier in the system.
-     * If OUT is greater than IN then the arrays are truncated!
-     * If OUT is leass than IN then the array tails are not written by sysctl.
-     */
-    uint32_t max_cpu_index;
+/* XEN_SYSCTL_cputopoinfo */
+#define XEN_INVALID_CORE_ID     (~0U)
+#define XEN_INVALID_SOCKET_ID   (~0U)
+#define XEN_INVALID_NODE_ID     (~0U)
 
-    /*
-     * If not NULL, these arrays are filled with core/socket/node identifier
-     * for each cpu.
-     * If a cpu has no core/socket/node information (e.g., cpu not present) 
-     * then the sentinel value ~0u is written to each array.
-     * The number of array elements written by the sysctl is:
-     *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
-     */
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
+struct xen_sysctl_cputopo {
+    uint32_t core;
+    uint32_t socket;
+    uint32_t node;
+};
+typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
+
+/*
+ * IN:
+ *  - a NULL 'cputopo' handle is a request for maximun 'num_cpus'.
+ *  - otherwise it's the number of entries in 'cputopo'
+ *
+ * OUT:
+ *  - If 'num_cpus' is less than the number Xen needs to write, -ENOBUFS shall
+ *    be returned and 'num_cpus' updated to reflect the intended number.
+ *  - On success, 'num_cpus' shall indicate the number of entries written, which
+ *    may be less than the maximum.
+ */
+struct xen_sysctl_cputopoinfo {
+    uint32_t num_cpus;
+    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
 };
-typedef struct xen_sysctl_topologyinfo xen_sysctl_topologyinfo_t;
-DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t);
+typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
 /* XEN_SYSCTL_numainfo */
 #define INVALID_NUMAINFO_ID (~0U)
@@ -672,7 +677,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pm_op                         12
 #define XEN_SYSCTL_page_offline_op               14
 #define XEN_SYSCTL_lockprof_op                   15
-#define XEN_SYSCTL_topologyinfo                  16 
+#define XEN_SYSCTL_cputopoinfo                   16
 #define XEN_SYSCTL_numainfo                      17
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
@@ -683,7 +688,7 @@ struct xen_sysctl {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
-        struct xen_sysctl_topologyinfo      topologyinfo;
+        struct xen_sysctl_cputopoinfo       cputopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2015-03-19 21:53 ` [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
@ 2015-03-19 21:54 ` Boris Ostrovsky
  2015-03-20 13:26   ` Ian Campbell
  2015-03-20 16:15   ` Jan Beulich
  2015-03-19 21:54 ` [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:54 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

A number of changes to XEN_SYSCTL_numainfo interface:

* Make sysctl NUMA topology query use fewer copies by combining some
  fields into a single structure and copying distances for each node
  in a single copy.
* NULL meminfo and distance handles are a request for maximum number
  of nodes (num_nodes). If those handles are valid and num_nodes is
  is smaller than the number of nodes in the system then -ENOBUFS is
  returned (and correct num_nodes is provided)
* Instead of using max_node_index for passing number of nodes keep this
  value in num_nodes: almost all uses of max_node_index required adding
  or subtracting one to eventually get to number of nodes anyway.
* Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add
  XEN_INVALID_NODE_DIST.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Similar to 3/8 patch:
  * Make XEN_SYSCTL_numainfo treat passed in NULL handles as requests
    for array size (and is too small size passed in results in -ENOBUFS)
  * Make distance in xen_sysctl_numainfo a uint32
  * On the toolstack side use NULL handles to determine array size in
    libxl_get_numainfo() and use dynamic arrays in python's xc.c


 tools/libxl/libxl.c               |   65 +++++++++++++++-----------------
 tools/python/xen/lowlevel/xc/xc.c |   58 +++++++++++++---------------
 xen/common/sysctl.c               |   75 ++++++++++++++++++++----------------
 xen/include/public/sysctl.h       |   53 +++++++++++++++-----------
 4 files changed, 129 insertions(+), 122 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c989abf..0234e36 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5108,65 +5108,60 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
     xc_numainfo_t ninfo;
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
     libxl_numainfo *ret = NULL;
-    int i, j, max_nodes;
+    int i, j;
 
-    max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes < 0)
+    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
+    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
+    if ( xc_numainfo(ctx->xch, &ninfo) != 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
         ret = NULL;
         goto out;
     }
 
-    memsize = xc_hypercall_buffer_alloc
-        (ctx->xch, memsize, sizeof(*memsize) * max_nodes);
-    memfree = xc_hypercall_buffer_alloc
-        (ctx->xch, memfree, sizeof(*memfree) * max_nodes);
-    node_dists = xc_hypercall_buffer_alloc
-        (ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes);
-    if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) {
+    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
+                                        sizeof(*meminfo) * ninfo.num_nodes);
+    distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
+                                         sizeof(*distance) *
+                                         ninfo.num_nodes * ninfo.num_nodes);
+    if ((meminfo == NULL) || (distance == NULL)) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
 
-    set_xen_guest_handle(ninfo.node_to_memsize, memsize);
-    set_xen_guest_handle(ninfo.node_to_memfree, memfree);
-    set_xen_guest_handle(ninfo.node_to_node_distance, node_dists);
-    ninfo.max_node_index = max_nodes - 1;
+    set_xen_guest_handle(ninfo.meminfo, meminfo);
+    set_xen_guest_handle(ninfo.distance, distance);
     if (xc_numainfo(ctx->xch, &ninfo) != 0) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
         goto fail;
     }
 
-    if (ninfo.max_node_index < max_nodes - 1)
-        max_nodes = ninfo.max_node_index + 1;
+    *nr = ninfo.num_nodes;
 
-    *nr = max_nodes;
+    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes);
+    for (i = 0; i < ninfo.num_nodes; i++)
+        ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance));
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
-    for (i = 0; i < max_nodes; i++)
-        ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists));
-
-    for (i = 0; i < max_nodes; i++) {
-#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \
-    LIBXL_NUMAINFO_INVALID_ENTRY : mem[i]
-        ret[i].size = V(memsize, i);
-        ret[i].free = V(memfree, i);
-        ret[i].num_dists = max_nodes;
-        for (j = 0; j < ret[i].num_dists; j++)
-            ret[i].dists[j] = V(node_dists, i * max_nodes + j);
+    for (i = 0; i < ninfo.num_nodes; i++) {
+#define V(val, invalid) (val == invalid) ? \
+       LIBXL_NUMAINFO_INVALID_ENTRY : val
+        ret[i].size = V(meminfo[i].memsize, XEN_INVALID_MEM_SZ);
+        ret[i].free = V(meminfo[i].memfree, XEN_INVALID_MEM_SZ);
+        ret[i].num_dists = ninfo.num_nodes;
+        for (j = 0; j < ret[i].num_dists; j++) {
+            unsigned idx = i * ninfo.num_nodes + j;
+            ret[i].dists[j] = V(distance[idx], XEN_INVALID_NODE_DIST);
+        }
 #undef V
     }
 
  fail:
-    xc_hypercall_buffer_free(ctx->xch, memsize);
-    xc_hypercall_buffer_free(ctx->xch, memfree);
-    xc_hypercall_buffer_free(ctx->xch, node_dists);
+    xc_hypercall_buffer_free(ctx->xch, meminfo);
+    xc_hypercall_buffer_free(ctx->xch, distance);
 
  out:
     GC_FREE;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 5e81c4a..ba66d55 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1297,55 +1297,52 @@ out:
 
 static PyObject *pyxc_numainfo(XcObject *self)
 {
-#define MAX_NODE_INDEX 31
     xc_numainfo_t ninfo = { 0 };
-    int i, j, max_node_index;
+    unsigned i, j;
     uint64_t free_heap;
     PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
     PyObject *node_to_memsize_obj, *node_to_memfree_obj;
     PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, node_memsize);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, node_memfree);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_node_dist_t, nodes_dist);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
 
-    node_memsize = xc_hypercall_buffer_alloc(self->xc_handle, node_memsize, sizeof(*node_memsize)*(MAX_NODE_INDEX+1));
-    if ( node_memsize == NULL )
+    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
+    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
+    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
         goto out;
-    node_memfree = xc_hypercall_buffer_alloc(self->xc_handle, node_memfree, sizeof(*node_memfree)*(MAX_NODE_INDEX+1));
-    if ( node_memfree == NULL )
+
+    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo,
+                                        sizeof(*meminfo) * ninfo.num_nodes);
+    if ( meminfo == NULL )
         goto out;
-    nodes_dist = xc_hypercall_buffer_alloc(self->xc_handle, nodes_dist, sizeof(*nodes_dist)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
-    if ( nodes_dist == NULL )
+    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance,
+                                         sizeof(*distance) *
+                                         ninfo.num_nodes * ninfo.num_nodes);
+    if ( distance == NULL )
         goto out;
 
-    set_xen_guest_handle(ninfo.node_to_memsize, node_memsize);
-    set_xen_guest_handle(ninfo.node_to_memfree, node_memfree);
-    set_xen_guest_handle(ninfo.node_to_node_distance, nodes_dist);
-    ninfo.max_node_index = MAX_NODE_INDEX;
-
+    set_xen_guest_handle(ninfo.meminfo, meminfo);
+    set_xen_guest_handle(ninfo.distance, distance);
     if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
         goto out;
 
-    max_node_index = ninfo.max_node_index;
-    if ( max_node_index > MAX_NODE_INDEX )
-        max_node_index = MAX_NODE_INDEX;
-
     /* Construct node-to-* lists. */
     node_to_memsize_obj = PyList_New(0);
     node_to_memfree_obj = PyList_New(0);
     node_to_dma32_mem_obj = PyList_New(0);
     node_to_node_dist_list_obj = PyList_New(0);
-    for ( i = 0; i <= max_node_index; i++ )
+    for ( i = 0; i < ninfo.num_nodes; i++ )
     {
         PyObject *pyint;
+        unsigned invalid_node;
 
         /* Total Memory */
-        pyint = PyInt_FromLong(node_memsize[i] >> 20); /* MB */
+        pyint = PyInt_FromLong(meminfo[i].memsize >> 20); /* MB */
         PyList_Append(node_to_memsize_obj, pyint);
         Py_DECREF(pyint);
 
         /* Free Memory */
-        pyint = PyInt_FromLong(node_memfree[i] >> 20); /* MB */
+        pyint = PyInt_FromLong(meminfo[i].memfree >> 20); /* MB */
         PyList_Append(node_to_memfree_obj, pyint);
         Py_DECREF(pyint);
 
@@ -1357,10 +1354,11 @@ static PyObject *pyxc_numainfo(XcObject *self)
 
         /* Node to Node Distance */
         node_to_node_dist_obj = PyList_New(0);
-        for ( j = 0; j <= max_node_index; j++ )
+        invalid_node = (meminfo[i].memsize == XEN_INVALID_MEM_SZ);
+        for ( j = 0; j < ninfo.num_nodes; j++ )
         {
-            uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
-            if ( dist == ~0u )
+            uint32_t dist = distance[i * ninfo.num_nodes + j];
+            if ( invalid_node || (dist == XEN_INVALID_NODE_DIST) )
             {
                 PyList_Append(node_to_node_dist_obj, Py_None);
             }
@@ -1375,7 +1373,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
         Py_DECREF(node_to_node_dist_obj);
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_node_index", max_node_index);
+    ret_obj = Py_BuildValue("{s:i}", "max_node_index", ninfo.num_nodes + 1);
 
     PyDict_SetItemString(ret_obj, "node_memsize", node_to_memsize_obj);
     Py_DECREF(node_to_memsize_obj);
@@ -1391,11 +1389,9 @@ static PyObject *pyxc_numainfo(XcObject *self)
     Py_DECREF(node_to_node_dist_list_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, node_memsize);
-    xc_hypercall_buffer_free(self->xc_handle, node_memfree);
-    xc_hypercall_buffer_free(self->xc_handle, nodes_dist);
+    xc_hypercall_buffer_free(self->xc_handle, meminfo);
+    xc_hypercall_buffer_free(self->xc_handle, distance);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
-#undef MAX_NODE_INDEX
 }
 
 static PyObject *pyxc_xeninfo(XcObject *self)
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index b2cfe11..acaeeb2 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -274,53 +274,62 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 
     case XEN_SYSCTL_numainfo:
     {
-        uint32_t i, j, max_node_index, last_online_node;
+        uint32_t i, j, num_nodes;
         xen_sysctl_numainfo_t *ni = &op->u.numainfo;
 
-        last_online_node = last_node(node_online_map);
-        max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
-        ni->max_node_index = last_online_node;
+        num_nodes = last_node(node_online_map) + 1;
 
-        for ( i = 0; i <= max_node_index; i++ )
+        if ( !guest_handle_is_null(ni->meminfo) &&
+             !guest_handle_is_null(ni->distance) )
         {
-            if ( !guest_handle_is_null(ni->node_to_memsize) )
+            if ( ni->num_nodes < num_nodes )
             {
-                uint64_t memsize = node_online(i) ?
-                                   node_spanned_pages(i) << PAGE_SHIFT : 0ul;
-                if ( copy_to_guest_offset(ni->node_to_memsize, i, &memsize, 1) )
-                    break;
-            }
-            if ( !guest_handle_is_null(ni->node_to_memfree) )
-            {
-                uint64_t memfree = node_online(i) ?
-                                   avail_node_heap_pages(i) << PAGE_SHIFT : 0ul;
-                if ( copy_to_guest_offset(ni->node_to_memfree, i, &memfree, 1) )
-                    break;
+                ret = -ENOBUFS;
+                i = num_nodes;
             }
 
-            if ( !guest_handle_is_null(ni->node_to_node_distance) )
+            for ( i = 0; i < num_nodes; i++ )
             {
-                for ( j = 0; j <= max_node_index; j++)
+                xen_sysctl_meminfo_t meminfo;
+                uint32_t distance[MAX_NUMNODES];
+
+                if ( node_online(i) )
                 {
-                    uint32_t distance = ~0u;
-                    if ( node_online(i) && node_online(j) )
-                    {
-                        u8 d = __node_distance(i, j);
-                        if ( d != NUMA_NO_DISTANCE )
-                            distance = d;
-                    }
-                    if ( copy_to_guest_offset(
-                        ni->node_to_node_distance,
-                        i*(max_node_index+1) + j, &distance, 1) )
-                        break;
+                    meminfo.memsize = node_spanned_pages(i) << PAGE_SHIFT;
+                    meminfo.memfree = avail_node_heap_pages(i) << PAGE_SHIFT;
                 }
-                if ( j <= max_node_index )
+                else
+                    meminfo.memsize = meminfo.memfree = XEN_INVALID_MEM_SZ;
+
+                for ( j = 0; j < num_nodes; j++ )
+                {
+                    distance[j] = __node_distance(i, j);
+                    if ( distance[j] == NUMA_NO_DISTANCE )
+                        distance[j] = XEN_INVALID_NODE_DIST;
+                }
+
+                if ( copy_to_guest_offset(ni->distance, i * num_nodes,
+                                          distance, num_nodes) ||
+                     copy_to_guest_offset(ni->meminfo, i, &meminfo, 1) )
+                {
+                    ret = -EFAULT;
                     break;
+                }
             }
         }
+        else
+            i = num_nodes;
 
-        ret = ((i <= max_node_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
+        if ( (!ret || (ret = -ENOBUFS)) && (ni->num_nodes != i) )
+        {
+            ni->num_nodes = i;
+            if ( __copy_field_to_guest(u_sysctl, op,
+                                       u.numainfo.num_nodes) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+        }
     }
     break;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 711441f..7e0d5fe 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -494,34 +494,41 @@ typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
 /* XEN_SYSCTL_numainfo */
-#define INVALID_NUMAINFO_ID (~0U)
+#define XEN_INVALID_MEM_SZ     (~0U)
+#define XEN_INVALID_NODE_DIST  (~0U)
+
+struct xen_sysctl_meminfo {
+    uint64_t memsize;
+    uint64_t memfree;
+};
+typedef struct xen_sysctl_meminfo xen_sysctl_meminfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_meminfo_t);
+
+/*
+ * IN:
+ *  - NULL 'meminfo' and 'distance'  handles is a request for maximun
+ *    'num_nodes'.
+ *  - otherwise it's the number of entries in 'meminfo' (and square root
+ *    of number of entries in 'distance')
+ *
+ * OUT:
+ *  - If 'num_nodes' is less than the number Xen needs to write, -ENOBUFS shall
+ *    be returned and 'num_nodes' updated to reflect the intended number.
+ *  - On success, 'num_nodes' shall indicate the number of entries written, which
+ *    may be less than the maximum.
+ */
+
 struct xen_sysctl_numainfo {
-    /*
-     * IN: maximum addressable entry in the caller-provided arrays.
-     * OUT: largest node identifier in the system.
-     * If OUT is greater than IN then the arrays are truncated!
-     */
-    uint32_t max_node_index;
+    uint32_t num_nodes;
 
-    /* NB. Entries are 0 if node is not present. */
-    XEN_GUEST_HANDLE_64(uint64) node_to_memsize;
-    XEN_GUEST_HANDLE_64(uint64) node_to_memfree;
+    XEN_GUEST_HANDLE_64(xen_sysctl_meminfo_t) meminfo;
 
     /*
-     * Array, of size (max_node_index+1)^2, listing memory access distances
-     * between nodes. If an entry has no node distance information (e.g., node 
-     * not present) then the value ~0u is written.
-     * 
-     * Note that the array rows must be indexed by multiplying by the minimum 
-     * of the caller-provided max_node_index and the returned value of
-     * max_node_index. That is, if the largest node index in the system is
-     * smaller than the caller can handle, a smaller 2-d array is constructed
-     * within the space provided by the caller. When this occurs, trailing
-     * space provided by the caller is not modified. If the largest node index
-     * in the system is larger than the caller can handle, then a 2-d array of
-     * the maximum size handleable by the caller is constructed.
+     * Distance between nodes 'i' and 'j' is stored in index 'i*N + j',
+     * where N is the number of nodes that will be returned in 'num_nodes'
+     * (i.e. not 'num_nodes' provided by the caller)
      */
-    XEN_GUEST_HANDLE_64(uint32) node_to_node_distance;
+    XEN_GUEST_HANDLE_64(uint32) distance;
 };
 typedef struct xen_sysctl_numainfo xen_sysctl_numainfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_numainfo_t);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2015-03-19 21:54 ` [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo " Boris Ostrovsky
@ 2015-03-19 21:54 ` Boris Ostrovsky
  2015-03-20 14:21   ` Konrad Rzeszutek Wilk
  2015-03-20 16:26   ` Jan Beulich
  2015-03-19 21:54 ` [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:54 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Changes in v5:
* Increment ti->first_dev in the loop
* Make node in xen_sysctl_pcitopoinfo a uint32
* Move sysctl to follow hearder file's order
* Update comments in sysctl.h 

 xen/common/sysctl.c         |   61 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h |   28 +++++++++++++++++++
 2 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index acaeeb2..c73dfc9 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         break;
 #endif
 
+#ifdef HAS_PCI
+    case XEN_SYSCTL_pcitopoinfo:
+    {
+        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+
+        if ( guest_handle_is_null(ti->devs) ||
+             guest_handle_is_null(ti->nodes) ||
+             (ti->first_dev > ti->num_devs) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        while ( ti->first_dev < ti->num_devs )
+        {
+            physdev_pci_device_t dev;
+            uint32_t node;
+            struct pci_dev *pdev;
+
+            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            spin_lock(&pcidevs_lock);
+            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
+            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
+                node = XEN_INVALID_NODE_ID;
+            else
+                node = pdev->node;
+            spin_unlock(&pcidevs_lock);
+
+            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            ti->first_dev++;
+
+            if ( hypercall_preempt_check() )
+                break;
+        }
+
+        if ( !ret )
+        {
+            if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ti->first_dev < ti->num_devs )
+                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
+                                                    "h", u_sysctl);
+        }
+    }
+    break;
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7e0d5fe..ceb8ac8 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"
 
 #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
@@ -668,6 +669,31 @@ struct xen_sysctl_psr_cmt_op {
 typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);
 
+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopoinfo {
+    /* IN: Number of elements in 'pcitopo' and 'nodes' arrays */
+    uint32_t num_devs;
+
+    /*
+     * IN/OUT: First element of pcitopo array that needs to be processed by
+     * hypervisor.
+     * This is used by hypercall continuations, callers must set it to zero.
+     */
+    uint32_t first_dev;
+
+    /* IN: list of devices */
+    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
+
+    /*
+     * OUT: node identifier for each device.
+     * If information for a particular device is not avalable then set
+     * to XEN_INVALID_NODE_ID.
+     */
+    XEN_GUEST_HANDLE_64(uint32) nodes;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -690,12 +716,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
 #define XEN_SYSCTL_psr_cmt_op                    21
+#define XEN_SYSCTL_pcitopoinfo                   22
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
         struct xen_sysctl_cputopoinfo       cputopoinfo;
+        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2015-03-19 21:54 ` [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-03-19 21:54 ` Boris Ostrovsky
  2015-03-20 13:54   ` Ian Campbell
  2015-03-19 21:54 ` [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:54 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

xc_cputopoinfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error reporting macros.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Adjust to new interface (as described in changelog for patch 3/8)

 tools/libxc/include/xenctrl.h     |    5 ++-
 tools/libxc/xc_misc.c             |   28 +++++++++++++++++++------
 tools/libxl/libxl.c               |   37 ++++++++++----------------------
 tools/misc/xenpm.c                |   41 +++++++++++++++---------------------
 tools/python/xen/lowlevel/xc/xc.c |   20 +++++++----------
 5 files changed, 61 insertions(+), 70 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f64f815..14d22ce 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
+typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,8 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+                   xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index be68291..411128e 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,22 +177,36 @@ int xc_physinfo(xc_interface *xch,
     return 0;
 }
 
-int xc_cputopoinfo(xc_interface *xch,
-                   xc_cputopoinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+                   xc_cputopo_t *cputopo)
 {
     int ret;
     DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
+    if (cputopo) {
+        if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
+            goto out;
+
+        sysctl.u.cputopoinfo.num_cpus = *max_cpus;
+        set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo);
+    } else
+        set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo,
+                             HYPERCALL_BUFFER_NULL);
 
-    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
+    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
     if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
-        return ret;
+        goto out;
 
-    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
+    *max_cpus = sysctl.u.cputopoinfo.num_cpus;
 
-    return 0;
+out:
+    if (cputopo)
+        xc_hypercall_bounce_post(xch, cputopo);
+
+    return ret;
 }
 
 int xc_numainfo(xc_interface *xch,
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0234e36..2b7d19c 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5054,37 +5054,28 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
-    xc_cputopoinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopo_t *cputopo;
     libxl_cputopology *ret = NULL;
     int i;
+    unsigned num_cpus;
 
-    /* Setting buffer to NULL makes the hypercall return number of CPUs */
-    set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
-    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0)
+    /* Setting buffer to NULL makes the call return number of CPUs */
+    if (xc_cputopoinfo(ctx->xch, &num_cpus, NULL))
     {
-        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
-        ret = NULL;
+        LOGEV(ERROR, errno, "Unable to determine number of CPUS");
         goto out;
     }
 
-    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
-                                        sizeof(*cputopo) * tinfo.num_cpus);
-    if (cputopo == NULL) {
-        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-                            "Unable to allocate hypercall arguments");
-        goto fail;
-    }
-    set_xen_guest_handle(tinfo.cputopo, cputopo);
+    cputopo = libxl__zalloc(gc, sizeof(*cputopo) * num_cpus);
 
-    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
-        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
-        goto fail;
+    if (xc_cputopoinfo(ctx->xch, &num_cpus, cputopo)) {
+        LOGEV(ERROR, errno, "CPU topology info hypercall failed");
+        goto out;
     }
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * tinfo.num_cpus);
+    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * num_cpus);
 
-    for (i = 0; i < tinfo.num_cpus; i++) {
+    for (i = 0; i < num_cpus; i++) {
 #define V(map, i, invalid) ( cputopo[i].map == invalid) ? \
    LIBXL_CPUTOPOLOGY_INVALID_ENTRY : cputopo[i].map
         ret[i].core = V(core, i, XEN_INVALID_CORE_ID);
@@ -5093,11 +5084,7 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 #undef V
     }
 
-fail:
-    xc_hypercall_buffer_free(ctx->xch, cputopo);
-
-    if (ret)
-        *nb_cpu_out = tinfo.num_cpus;
+    *nb_cpu_out = num_cpus;
 
  out:
     GC_FREE;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index a5d07de..e000234 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,20 +355,17 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
-    xc_cputopoinfo_t info = { 0 };
+    xc_cputopo_t *cputopo;
+    unsigned max_cpus;
 
-    set_xen_guest_handle(info.cputopo, HYPERCALL_BUFFER_NULL);
-    if ( xc_cputopoinfo(xc_handle, &info) != 0 )
+    if ( xc_cputopoinfo(xc_handle, &max_cpus, NULL) != 0 )
     {
         fprintf(stderr, "failed to discover number of CPUs: %s\n",
                 strerror(errno));
         goto out;
     }
 
-    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
-                                        sizeof(*cputopo) * info.num_cpus);
-
+    cputopo = calloc(max_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
@@ -453,28 +450,26 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cputopo, cputopo);
-
-    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
+    if ( cx_cap && !xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
         uint32_t core_ids[MAX_NR_CPU];
         uint32_t socket_nr = 0;
         uint32_t core_nr = 0;
 
-        if ( info.num_cpus > MAX_NR_CPU )
-            info.num_cpus = MAX_NR_CPU;
+        if ( max_cpus > MAX_NR_CPU )
+            max_cpus = MAX_NR_CPU;
         /* check validity */
-        for ( i = 0; i < info.num_cpus; i++ )
+        for ( i = 0; i < max_cpus; i++ )
         {
             if ( cputopo[i].core == XEN_INVALID_CORE_ID ||
                  cputopo[i].socket == XEN_INVALID_SOCKET_ID )
                 break;
         }
-        if ( i >= info.num_cpus )
+        if ( i >= max_cpus )
         {
             /* find socket nr & core nr per socket */
-            for ( i = 0; i < info.num_cpus; i++ )
+            for ( i = 0; i < max_cpus; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
                     if ( cputopo[i].socket == socket_ids[j] )
@@ -501,7 +496,7 @@ static void signal_int_handler(int signo)
                 unsigned int n;
                 uint64_t res;
 
-                for ( j = 0; j < info.num_cpus; j++ )
+                for ( j = 0; j < max_cpus; j++ )
                 {
                     if ( cputopo[j].socket == socket_ids[i] )
                         break;
@@ -520,7 +515,7 @@ static void signal_int_handler(int signo)
                 }
                 for ( k = 0; k < core_nr; k++ )
                 {
-                    for ( j = 0; j < info.num_cpus; j++ )
+                    for ( j = 0; j < max_cpus; j++ )
                     {
                         if ( cputopo[j].socket == socket_ids[i] &&
                              cputopo[j].core == core_ids[k] )
@@ -558,7 +553,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cputopo);
+    free(cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -978,16 +973,14 @@ void cpu_topology_func(int argc, char *argv[])
         goto out;
     }
 
-    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
-                                        sizeof(*cputopo) * info.num_cpus);
+    cputopo = calloc(max_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cputopo, cputopo);
-    if ( xc_cputopoinfo(xc_handle, &info) )
+    if ( xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
     {
         rc = errno;
         fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
@@ -996,7 +989,7 @@ void cpu_topology_func(int argc, char *argv[])
     }
 
     printf("CPU\tcore\tsocket\tnode\n");
-    for ( i = 0; i < info.num_cpus; i++ )
+    for ( i = 0; i < max_cpus; i++ )
     {
         if ( cputopo[i].core == XEN_INVALID_CORE_ID )
             continue;
@@ -1005,7 +998,7 @@ void cpu_topology_func(int argc, char *argv[])
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cputopo);
+    free(cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index ba66d55..21ba57d 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1220,30 +1220,26 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
 
 static PyObject *pyxc_topologyinfo(XcObject *self)
 {
-    xc_cputopoinfo_t tinfo = { 0 };
-    unsigned i;
+    xc_cputopo_t *cputopo = NULL;
+    unsigned i, num_cpus;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 
-    set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
-    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
+    if ( xc_cputopoinfo(self->xc_handle, &num_cpus, NULL) != 0 )
         goto out;
 
-    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo,
-                                        sizeof(*cputopo) * tinfo.num_cpus);
+    cputopo = calloc(num_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
     	goto out;
 
-    set_xen_guest_handle(tinfo.cputopo, cputopo);
-    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
+    if ( xc_cputopoinfo(self->xc_handle, &num_cpus, cputopo) != 0 )
         goto out;
 
     /* Construct cpu-to-* lists. */
     cpu_to_core_obj = PyList_New(0);
     cpu_to_socket_obj = PyList_New(0);
     cpu_to_node_obj = PyList_New(0);
-    for ( i = 0; i < tinfo.num_cpus; i++ )
+    for ( i = 0; i < num_cpus; i++ )
     {
         if ( cputopo[i].core == XEN_INVALID_CORE_ID )
         {
@@ -1279,7 +1275,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
         }
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", tinfo.num_cpus + 1);
+    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", num_cpus + 1);
 
     PyDict_SetItemString(ret_obj, "cpu_to_core", cpu_to_core_obj);
     Py_DECREF(cpu_to_core_obj);
@@ -1291,7 +1287,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, cputopo);
+    free(cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2015-03-19 21:54 ` [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
@ 2015-03-19 21:54 ` Boris Ostrovsky
  2015-03-20 13:49   ` Konrad Rzeszutek Wilk
  2015-03-20 13:56   ` Ian Campbell
  2015-03-19 21:54 ` [PATCH v5 8/8] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
  2015-03-23 12:42 ` [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Julien Grall
  8 siblings, 2 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:54 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

xc_numainfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error logging macros.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Adjust to new interface (as described in changelog for patch 4/8)

 tools/libxc/include/xenctrl.h     |    4 ++-
 tools/libxc/xc_misc.c             |   41 ++++++++++++++++++++++++-----
 tools/libxl/libxl.c               |   52 ++++++++++++-------------------------
 tools/python/xen/lowlevel/xc/xc.c |   38 ++++++++++-----------------
 4 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 14d22ce..54540e7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
+typedef xen_sysctl_meminfo_t xc_meminfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t;
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
                    xc_cputopo_t *cputopo);
-int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+                xc_meminfo_t *meminfo, uint32_t *distance);
 
 int xc_sched_id(xc_interface *xch,
                 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 411128e..607ae61 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -209,22 +209,49 @@ out:
     return ret;
 }
 
-int xc_numainfo(xc_interface *xch,
-                xc_numainfo_t *put_info)
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+                xc_meminfo_t *meminfo, uint32_t *distance)
 {
     int ret;
     DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(distance,
+                             *max_nodes * *max_nodes * sizeof(*distance),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-    sysctl.cmd = XEN_SYSCTL_numainfo;
+    if (meminfo && distance) {
+        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
+            goto out;
+        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
+            goto out;
 
-    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
+        sysctl.u.numainfo.num_nodes = *max_nodes;
+        set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
+        set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
+    } else if (meminfo || distance) {
+        errno = EINVAL;
+        return -1;
+    } else {
+        set_xen_guest_handle(sysctl.u.numainfo.meminfo,
+                             HYPERCALL_BUFFER_NULL);
+        set_xen_guest_handle(sysctl.u.numainfo.distance,
+                             HYPERCALL_BUFFER_NULL);
+    }
 
+    sysctl.cmd = XEN_SYSCTL_numainfo;
     if ((ret = do_sysctl(xch, &sysctl)) != 0)
-        return ret;
+        goto out;
 
-    memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
+    *max_nodes = sysctl.u.numainfo.num_nodes;
 
-    return 0;
+out:
+    if (meminfo && distance) {
+        xc_hypercall_bounce_post(xch, meminfo);
+        xc_hypercall_bounce_post(xch, distance);
+    }
+
+    return ret;
 }
 
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2b7d19c..ee97a54 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5094,62 +5094,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
-    xc_numainfo_t ninfo;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
+    xc_meminfo_t *meminfo;
+    uint32_t *distance;
     libxl_numainfo *ret = NULL;
     int i, j;
+    unsigned num_nodes;
 
-    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
-    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
-    if ( xc_numainfo(ctx->xch, &ninfo) != 0)
-    {
-        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
-        ret = NULL;
+    if (xc_numainfo(ctx->xch, &num_nodes, NULL, NULL)) {
+        LOGEV(ERROR, errno, "Unable to determine number of nodes");
         goto out;
     }
 
-    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
-                                        sizeof(*meminfo) * ninfo.num_nodes);
-    distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
-                                         sizeof(*distance) *
-                                         ninfo.num_nodes * ninfo.num_nodes);
-    if ((meminfo == NULL) || (distance == NULL)) {
-        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-                            "Unable to allocate hypercall arguments");
-        goto fail;
-    }
+    meminfo = libxl__zalloc(gc, sizeof(*meminfo) * num_nodes);
+    distance = libxl__zalloc(gc, sizeof(*distance) * num_nodes * num_nodes);
 
-    set_xen_guest_handle(ninfo.meminfo, meminfo);
-    set_xen_guest_handle(ninfo.distance, distance);
-    if (xc_numainfo(ctx->xch, &ninfo) != 0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
-        goto fail;
+    if (xc_numainfo(ctx->xch, &num_nodes, meminfo, distance)) {
+        LOGEV(ERROR, errno, "getting numainfo");
+        goto out;
     }
 
-    *nr = ninfo.num_nodes;
+    *nr = num_nodes;
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes);
-    for (i = 0; i < ninfo.num_nodes; i++)
-        ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance));
+    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * num_nodes);
+    for (i = 0; i < num_nodes; i++)
+        ret[i].dists = libxl__calloc(NOGC, num_nodes, sizeof(*distance));
 
-    for (i = 0; i < ninfo.num_nodes; i++) {
+    for (i = 0; i < num_nodes; i++) {
 #define V(val, invalid) (val == invalid) ? \
        LIBXL_NUMAINFO_INVALID_ENTRY : val
         ret[i].size = V(meminfo[i].memsize, XEN_INVALID_MEM_SZ);
         ret[i].free = V(meminfo[i].memfree, XEN_INVALID_MEM_SZ);
-        ret[i].num_dists = ninfo.num_nodes;
+        ret[i].num_dists = num_nodes;
         for (j = 0; j < ret[i].num_dists; j++) {
-            unsigned idx = i * ninfo.num_nodes + j;
+            unsigned idx = i * num_nodes + j;
             ret[i].dists[j] = V(distance[idx], XEN_INVALID_NODE_DIST);
         }
 #undef V
     }
 
- fail:
-    xc_hypercall_buffer_free(ctx->xch, meminfo);
-    xc_hypercall_buffer_free(ctx->xch, distance);
-
  out:
     GC_FREE;
     return ret;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 21ba57d..fbd93db 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1293,33 +1293,23 @@ out:
 
 static PyObject *pyxc_numainfo(XcObject *self)
 {
-    xc_numainfo_t ninfo = { 0 };
-    unsigned i, j;
+    unsigned i, j, num_nodes;
     uint64_t free_heap;
     PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
     PyObject *node_to_memsize_obj, *node_to_memfree_obj;
     PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
+    xc_meminfo_t *meminfo = NULL;
+    uint32_t *distance = NULL;
 
-    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
-    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
-    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
+    if ( xc_numainfo(self->xc_handle, &num_nodes, NULL, NULL) != 0 )
         goto out;
 
-    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo,
-                                        sizeof(*meminfo) * ninfo.num_nodes);
-    if ( meminfo == NULL )
-        goto out;
-    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance,
-                                         sizeof(*distance) *
-                                         ninfo.num_nodes * ninfo.num_nodes);
-    if ( distance == NULL )
+    meminfo = calloc(num_nodes, sizeof(*meminfo));
+    distance = calloc(num_nodes * num_nodes, sizeof(*distance));
+    if ( (meminfo == NULL) || (distance == NULL) )
         goto out;
 
-    set_xen_guest_handle(ninfo.meminfo, meminfo);
-    set_xen_guest_handle(ninfo.distance, distance);
-    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
+    if ( xc_numainfo(self->xc_handle, &num_nodes, meminfo, distance) != 0 )
         goto out;
 
     /* Construct node-to-* lists. */
@@ -1327,7 +1317,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
     node_to_memfree_obj = PyList_New(0);
     node_to_dma32_mem_obj = PyList_New(0);
     node_to_node_dist_list_obj = PyList_New(0);
-    for ( i = 0; i < ninfo.num_nodes; i++ )
+    for ( i = 0; i < num_nodes; i++ )
     {
         PyObject *pyint;
         unsigned invalid_node;
@@ -1351,9 +1341,9 @@ static PyObject *pyxc_numainfo(XcObject *self)
         /* Node to Node Distance */
         node_to_node_dist_obj = PyList_New(0);
         invalid_node = (meminfo[i].memsize == XEN_INVALID_MEM_SZ);
-        for ( j = 0; j < ninfo.num_nodes; j++ )
+        for ( j = 0; j < num_nodes; j++ )
         {
-            uint32_t dist = distance[i * ninfo.num_nodes + j];
+            uint32_t dist = distance[i * num_nodes + j];
             if ( invalid_node || (dist == XEN_INVALID_NODE_DIST) )
             {
                 PyList_Append(node_to_node_dist_obj, Py_None);
@@ -1369,7 +1359,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
         Py_DECREF(node_to_node_dist_obj);
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_node_index", ninfo.num_nodes + 1);
+    ret_obj = Py_BuildValue("{s:i}", "max_node_index", num_nodes + 1);
 
     PyDict_SetItemString(ret_obj, "node_memsize", node_to_memsize_obj);
     Py_DECREF(node_to_memsize_obj);
@@ -1385,8 +1375,8 @@ static PyObject *pyxc_numainfo(XcObject *self)
     Py_DECREF(node_to_node_dist_list_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, meminfo);
-    xc_hypercall_buffer_free(self->xc_handle, distance);
+    free(meminfo);
+    free(distance);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 8/8] libxl: Add interface for querying hypervisor about PCI topology
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2015-03-19 21:54 ` [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
@ 2015-03-19 21:54 ` Boris Ostrovsky
  2015-03-23 12:42 ` [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Julien Grall
  8 siblings, 0 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 21:54 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel
  Cc: boris.ostrovsky

.. and use this new interface to display it along with CPU topology
and NUMA information when 'xl info -n' command is issued

The output will look like
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
...
device topology        :
device           node
0000:00:00.0      0
0000:00:01.0      0
...

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/include/xenctrl.h |    3 ++
 tools/libxc/xc_misc.c         |   31 ++++++++++++++++++
 tools/libxl/libxl.c           |   42 +++++++++++++++++++++++++
 tools/libxl/libxl.h           |   12 +++++++
 tools/libxl/libxl_freebsd.c   |   12 +++++++
 tools/libxl/libxl_internal.h  |    5 +++
 tools/libxl/libxl_linux.c     |   69 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c    |   12 +++++++
 tools/libxl/libxl_types.idl   |    7 ++++
 tools/libxl/libxl_utils.c     |    8 +++++
 tools/libxl/xl_cmdimpl.c      |   40 +++++++++++++++++++----
 11 files changed, 234 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 54540e7..f1207fa 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1229,6 +1229,7 @@ typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 typedef xen_sysctl_meminfo_t xc_meminfo_t;
+typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1242,6 +1243,8 @@ int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
                    xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
                 xc_meminfo_t *meminfo, uint32_t *distance);
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+                   physdev_pci_device_t *devs, uint32_t *nodes);
 
 int xc_sched_id(xc_interface *xch,
                 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 607ae61..6e10429 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -254,6 +254,37 @@ out:
     return ret;
 }
 
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+                   physdev_pci_device_t *devs,
+                   uint32_t *nodes)
+{
+    int ret;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(devs, num_devs * sizeof(*devs),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(nodes, num_devs* sizeof(*nodes),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ((ret = xc_hypercall_bounce_pre(xch, devs)))
+        goto out;
+    if ((ret = xc_hypercall_bounce_pre(xch, nodes)))
+        goto out;
+
+    sysctl.u.pcitopoinfo.first_dev = 0;
+    sysctl.u.pcitopoinfo.num_devs = num_devs;
+    set_xen_guest_handle(sysctl.u.pcitopoinfo.devs, devs);
+    set_xen_guest_handle(sysctl.u.pcitopoinfo.nodes, nodes);
+
+    sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
+
+    ret = do_sysctl(xch, &sysctl);
+
+ out:
+    xc_hypercall_bounce_post(xch, devs);
+    xc_hypercall_bounce_post(xch, nodes);
+
+    return ret;
+}
 
 int xc_sched_id(xc_interface *xch,
                 int *sched_id)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ee97a54..1c51c53 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5091,6 +5091,48 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     return ret;
 }
 
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
+{
+    GC_INIT(ctx);
+    physdev_pci_device_t *devs;
+    uint32_t *nodes;
+    libxl_pcitopology *ret = NULL;
+    int i;
+
+    *num_devs = libxl__pci_numdevs(gc);
+    if (*num_devs < 0) {
+        LOG(ERROR, "Unable to determine number of PCI devices");
+        goto out;
+    }
+
+    devs = libxl__zalloc(gc, sizeof(*devs) * *num_devs);
+    nodes = libxl__zalloc(gc, sizeof(*nodes) * *num_devs);
+
+    if (libxl__pci_topology_init(gc, devs, *num_devs)) {
+        LOGE(ERROR, "Cannot initialize PCI hypercall structure");
+        goto out;
+    }
+
+    if (xc_pcitopoinfo(ctx->xch, *num_devs, devs, nodes) != 0) {
+        LOGE(ERROR, "PCI topology info hypercall failed");
+        goto out;
+    }
+
+    ret = libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * *num_devs);
+
+    for (i = 0; i < *num_devs; i++) {
+        ret[i].seg = devs[i].seg;
+        ret[i].bus = devs[i].bus;
+        ret[i].devfn = devs[i].devfn;
+        ret[i].node = (nodes[i] == XEN_INVALID_NODE_ID) ?
+            LIBXL_PCITOPOLOGY_INVALID_ENTRY : nodes[i];
+    }
+
+ out:
+    GC_FREE;
+    return ret;
+}
+
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5eec092..c447636 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -720,6 +720,14 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
 #define LIBXL_HAVE_PSR_MBM 1
 #endif
 
+/*
+ * LIBXL_HAVE_PCITOPOLOGY
+ *
+ * If this is defined, then interface to query hypervisor about PCI device
+ * topology is available.
+ */
+#define LIBXL_HAVE_PCITOPOLOGY 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1110,6 +1118,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
 void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
 
+#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs);
+void libxl_pcitopology_list_free(libxl_pcitopology *, int num_devs);
+
 #define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr);
 void libxl_numainfo_list_free(libxl_numainfo *, int nr);
diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
index e8b88b3..47c3391 100644
--- a/tools/libxl/libxl_freebsd.c
+++ b/tools/libxl/libxl_freebsd.c
@@ -131,3 +131,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 934465a..42942b2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1170,6 +1170,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
 
 _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
 
+_hidden int libxl__pci_numdevs(libxl__gc *gc);
+_hidden int libxl__pci_topology_init(libxl__gc *gc,
+                                     physdev_pci_device_t *devs,
+                                     int num_devs);
+
 /* from libxl_pci */
 
 _hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int starting);
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index b51930c..be76c6f 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -279,3 +279,72 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    DIR *dir;
+    struct dirent *entry;
+    int num_devs = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    while ((entry = readdir(dir))) {
+        if (entry->d_name[0] == '.')
+            continue;
+        num_devs++;
+    }
+    closedir(dir);
+
+    return num_devs;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+
+    DIR *dir;
+    struct dirent *entry;
+    int i, err = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    i = 0;
+    while ((entry = readdir(dir))) {
+        unsigned int dom, bus, dev, func;
+
+        if (entry->d_name[0] == '.')
+            continue;
+
+        if (i == num_devs) {
+            LOGE(ERROR, "Too many devices\n");
+            err = ERROR_FAIL;
+            goto out;
+        }
+
+        if (sscanf(entry->d_name, "%x:%x:%x.%d", &dom, &bus, &dev, &func) < 4) {
+            LOGEV(ERROR, errno, "Error processing /sys/bus/pci/devices");
+            err = ERROR_FAIL;
+            goto out;
+        }
+
+        devs[i].seg = dom;
+        devs[i].bus = bus;
+        devs[i].devfn = ((dev & 0x1f) << 3) | (func & 7);
+
+        i++;
+    }
+
+ out:
+    closedir(dir);
+
+    return err;
+}
diff --git a/tools/libxl/libxl_netbsd.c b/tools/libxl/libxl_netbsd.c
index 898e160..a2a962e 100644
--- a/tools/libxl/libxl_netbsd.c
+++ b/tools/libxl/libxl_netbsd.c
@@ -95,3 +95,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
 }
+
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47af340..d3b5061 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,13 @@ libxl_cputopology = Struct("cputopology", [
     ("node", uint32),
     ], dir=DIR_OUT)
 
+libxl_pcitopology = Struct("pcitopology", [
+    ("seg", uint16),
+    ("bus", uint8),
+    ("devfn", uint8),
+    ("node", uint32),
+    ], dir=DIR_OUT)
+
 libxl_sched_credit_params = Struct("sched_credit_params", [
     ("tslice_ms", integer),
     ("ratelimit_us", integer),
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 9053b27..9108b56 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -882,6 +882,14 @@ void libxl_cputopology_list_free(libxl_cputopology *list, int nr)
     free(list);
 }
 
+void libxl_pcitopology_list_free(libxl_pcitopology *list, int nr)
+{
+    int i;
+    for (i = 0; i < nr; i++)
+        libxl_pcitopology_dispose(&list[i]);
+    free(list);
+}
+
 void libxl_numainfo_list_free(libxl_numainfo *list, int nr)
 {
     int i;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 04faf98..f152152 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5200,12 +5200,15 @@ static void output_numainfo(void)
 
 static void output_topologyinfo(void)
 {
-    libxl_cputopology *info;
+    libxl_cputopology *cpuinfo;
     int i, nr;
+    libxl_pcitopology *pciinfo;
+    int valid_devs;
 
-    info = libxl_get_cpu_topology(ctx, &nr);
-    if (info == NULL) {
-        fprintf(stderr, "libxl_get_topologyinfo failed.\n");
+
+    cpuinfo = libxl_get_cpu_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_cpu_topology failed.\n");
         return;
     }
 
@@ -5213,12 +5216,35 @@ static void output_topologyinfo(void)
     printf("cpu:    core    socket     node\n");
 
     for (i = 0; i < nr; i++) {
-        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+        if (cpuinfo[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
             printf("%3d:    %4d     %4d     %4d\n", i,
-                   info[i].core, info[i].socket, info[i].node);
+                   cpuinfo[i].core, cpuinfo[i].socket, cpuinfo[i].node);
+    }
+
+    libxl_cputopology_list_free(cpuinfo, nr);
+
+    pciinfo = libxl_get_pci_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_pci_topology failed.\n");
+        return;
     }
 
-    libxl_cputopology_list_free(info, nr);
+    printf("device topology        :\n");
+    printf("device           node\n");
+    for (i = 0; i < nr; i++) {
+        if (pciinfo[i].node != LIBXL_PCITOPOLOGY_INVALID_ENTRY) {
+            printf("%04x:%02x:%02x.%01x      %d\n", pciinfo[i].seg,
+                   pciinfo[i].bus,
+                   ((pciinfo[i].devfn >> 3) & 0x1f), (pciinfo[i].devfn & 7),
+                   pciinfo[i].node);
+            valid_devs++;
+        }
+    }
+
+    if (valid_devs == 0)
+        printf("No device topology data available\n");
+
+    libxl_pcitopology_list_free(pciinfo, nr);
 
     return;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-03-19 21:53 ` [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
@ 2015-03-20 13:25   ` Ian Campbell
  2015-03-20 14:00     ` Boris Ostrovsky
  2015-03-25 16:13   ` Andrew Cooper
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 13:25 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Thu, 2015-03-19 at 17:53 -0400, Boris Ostrovsky wrote:
[...]
> * Replace INVALID_TOPOLOGY_ID with "XEN_"-prefixed macros for each invalid type
>   (core, socket, node).

But not for distance:

> @@ -1375,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
>          for ( j = 0; j <= max_node_index; j++ )
>          {
>              uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
> -            if ( dist == INVALID_TOPOLOGY_ID )
> +            if ( dist == ~0u )

?

Other than that the tools side here looks ok.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  2015-03-19 21:54 ` [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo " Boris Ostrovsky
@ 2015-03-20 13:26   ` Ian Campbell
  2015-03-20 16:15   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 13:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
> A number of changes to XEN_SYSCTL_numainfo interface:
> 
> * Make sysctl NUMA topology query use fewer copies by combining some
>   fields into a single structure and copying distances for each node
>   in a single copy.
> * NULL meminfo and distance handles are a request for maximum number
>   of nodes (num_nodes). If those handles are valid and num_nodes is
>   is smaller than the number of nodes in the system then -ENOBUFS is
>   returned (and correct num_nodes is provided)
> * Instead of using max_node_index for passing number of nodes keep this
>   value in num_nodes: almost all uses of max_node_index required adding
>   or subtracting one to eventually get to number of nodes anyway.
> * Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add
>   XEN_INVALID_NODE_DIST.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-19 21:54 ` [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
@ 2015-03-20 13:49   ` Konrad Rzeszutek Wilk
  2015-03-20 14:10     ` Boris Ostrovsky
  2015-03-20 13:56   ` Ian Campbell
  1 sibling, 1 reply; 45+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-20 13:49 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	wei.liu2

On Thu, Mar 19, 2015 at 05:54:03PM -0400, Boris Ostrovsky wrote:
> xc_numainfo() is not expected to be used on a hot path and therefore
> hypercall buffer management can be pushed into libxc. This will simplify
> life for callers.
> 
> Also update error logging macros.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v5:
> * Adjust to new interface (as described in changelog for patch 4/8)
> 
>  tools/libxc/include/xenctrl.h     |    4 ++-
>  tools/libxc/xc_misc.c             |   41 ++++++++++++++++++++++++-----
>  tools/libxl/libxl.c               |   52 ++++++++++++-------------------------
>  tools/python/xen/lowlevel/xc/xc.c |   38 ++++++++++-----------------
>  4 files changed, 68 insertions(+), 67 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 14d22ce..54540e7 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
>  typedef xen_sysctl_physinfo_t xc_physinfo_t;
>  typedef xen_sysctl_cputopo_t xc_cputopo_t;
>  typedef xen_sysctl_numainfo_t xc_numainfo_t;
> +typedef xen_sysctl_meminfo_t xc_meminfo_t;
>  
>  typedef uint32_t xc_cpu_to_node_t;
>  typedef uint32_t xc_cpu_to_socket_t;
> @@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t;
>  int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
>  int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
>                     xc_cputopo_t *cputopo);
> -int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
> +                xc_meminfo_t *meminfo, uint32_t *distance);
>  
>  int xc_sched_id(xc_interface *xch,
>                  int *sched_id);
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index 411128e..607ae61 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -209,22 +209,49 @@ out:
>      return ret;
>  }
>  
> -int xc_numainfo(xc_interface *xch,
> -                xc_numainfo_t *put_info)
> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
> +                xc_meminfo_t *meminfo, uint32_t *distance)
>  {
>      int ret;
>      DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_HYPERCALL_BOUNCE(distance,
> +                             *max_nodes * *max_nodes * sizeof(*distance),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>  
> -    sysctl.cmd = XEN_SYSCTL_numainfo;
> +    if (meminfo && distance) {

The syntax in xc_misc.c looks mostly to be of the
'if ( meminfo && distance )
 {
' type.

> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
> +            goto out;
> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
> +            goto out;
>  
> -    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
> +        sysctl.u.numainfo.num_nodes = *max_nodes;
> +        set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
> +        set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
> +    } else if (meminfo || distance) {
> +        errno = EINVAL;
> +        return -1;
> +    } else {
> +        set_xen_guest_handle(sysctl.u.numainfo.meminfo,
> +                             HYPERCALL_BUFFER_NULL);
> +        set_xen_guest_handle(sysctl.u.numainfo.distance,
> +                             HYPERCALL_BUFFER_NULL);
> +    }
>  
> +    sysctl.cmd = XEN_SYSCTL_numainfo;
>      if ((ret = do_sysctl(xch, &sysctl)) != 0)
> -        return ret;
> +        goto out;
>  
> -    memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
> +    *max_nodes = sysctl.u.numainfo.num_nodes;
>  
> -    return 0;
> +out:
> +    if (meminfo && distance) {
> +        xc_hypercall_bounce_post(xch, meminfo);
> +        xc_hypercall_bounce_post(xch, distance);
> +    }
> +
> +    return ret;
>  }
>  
>  
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 2b7d19c..ee97a54 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5094,62 +5094,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>  libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>  {
>      GC_INIT(ctx);
> -    xc_numainfo_t ninfo;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
> -    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
> +    xc_meminfo_t *meminfo;
> +    uint32_t *distance;
>      libxl_numainfo *ret = NULL;
>      int i, j;
> +    unsigned num_nodes;
>  
> -    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
> -    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
> -    if ( xc_numainfo(ctx->xch, &ninfo) != 0)
> -    {
> -        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
> -        ret = NULL;
> +    if (xc_numainfo(ctx->xch, &num_nodes, NULL, NULL)) {

I think for this one you need 'if ( xc_numa.. )
 {'

> +        LOGEV(ERROR, errno, "Unable to determine number of nodes");
>          goto out;
>      }
>  
> -    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
> -                                        sizeof(*meminfo) * ninfo.num_nodes);
> -    distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
> -                                         sizeof(*distance) *
> -                                         ninfo.num_nodes * ninfo.num_nodes);
> -    if ((meminfo == NULL) || (distance == NULL)) {
> -        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
> -                            "Unable to allocate hypercall arguments");
> -        goto fail;
> -    }
> +    meminfo = libxl__zalloc(gc, sizeof(*meminfo) * num_nodes);
> +    distance = libxl__zalloc(gc, sizeof(*distance) * num_nodes * num_nodes);
>  
> -    set_xen_guest_handle(ninfo.meminfo, meminfo);
> -    set_xen_guest_handle(ninfo.distance, distance);
> -    if (xc_numainfo(ctx->xch, &ninfo) != 0) {
> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
> -        goto fail;
> +    if (xc_numainfo(ctx->xch, &num_nodes, meminfo, distance)) {
> +        LOGEV(ERROR, errno, "getting numainfo");

Ditto
> +        goto out;
>      }
>  
> -    *nr = ninfo.num_nodes;
> +    *nr = num_nodes;
>  
> -    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes);
> -    for (i = 0; i < ninfo.num_nodes; i++)
> -        ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance));
> +    ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * num_nodes);
> +    for (i = 0; i < num_nodes; i++)
> +        ret[i].dists = libxl__calloc(NOGC, num_nodes, sizeof(*distance));
>  
> -    for (i = 0; i < ninfo.num_nodes; i++) {
> +    for (i = 0; i < num_nodes; i++) {
>  #define V(val, invalid) (val == invalid) ? \
>         LIBXL_NUMAINFO_INVALID_ENTRY : val
>          ret[i].size = V(meminfo[i].memsize, XEN_INVALID_MEM_SZ);
>          ret[i].free = V(meminfo[i].memfree, XEN_INVALID_MEM_SZ);
> -        ret[i].num_dists = ninfo.num_nodes;
> +        ret[i].num_dists = num_nodes;
>          for (j = 0; j < ret[i].num_dists; j++) {
> -            unsigned idx = i * ninfo.num_nodes + j;
> +            unsigned idx = i * num_nodes + j;
>              ret[i].dists[j] = V(distance[idx], XEN_INVALID_NODE_DIST);
>          }
>  #undef V
>      }
>  
> - fail:
> -    xc_hypercall_buffer_free(ctx->xch, meminfo);
> -    xc_hypercall_buffer_free(ctx->xch, distance);
> -
>   out:
>      GC_FREE;
>      return ret;
> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index 21ba57d..fbd93db 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1293,33 +1293,23 @@ out:
>  
>  static PyObject *pyxc_numainfo(XcObject *self)
>  {
> -    xc_numainfo_t ninfo = { 0 };
> -    unsigned i, j;
> +    unsigned i, j, num_nodes;
>      uint64_t free_heap;
>      PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
>      PyObject *node_to_memsize_obj, *node_to_memfree_obj;
>      PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
> -    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
> +    xc_meminfo_t *meminfo = NULL;
> +    uint32_t *distance = NULL;
>  
> -    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
> -    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
> -    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
> +    if ( xc_numainfo(self->xc_handle, &num_nodes, NULL, NULL) != 0 )
>          goto out;
>  
> -    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo,
> -                                        sizeof(*meminfo) * ninfo.num_nodes);
> -    if ( meminfo == NULL )
> -        goto out;
> -    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance,
> -                                         sizeof(*distance) *
> -                                         ninfo.num_nodes * ninfo.num_nodes);
> -    if ( distance == NULL )
> +    meminfo = calloc(num_nodes, sizeof(*meminfo));
> +    distance = calloc(num_nodes * num_nodes, sizeof(*distance));
> +    if ( (meminfo == NULL) || (distance == NULL) )
>          goto out;
>  
> -    set_xen_guest_handle(ninfo.meminfo, meminfo);
> -    set_xen_guest_handle(ninfo.distance, distance);
> -    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
> +    if ( xc_numainfo(self->xc_handle, &num_nodes, meminfo, distance) != 0 )
>          goto out;
>  
>      /* Construct node-to-* lists. */
> @@ -1327,7 +1317,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
>      node_to_memfree_obj = PyList_New(0);
>      node_to_dma32_mem_obj = PyList_New(0);
>      node_to_node_dist_list_obj = PyList_New(0);
> -    for ( i = 0; i < ninfo.num_nodes; i++ )
> +    for ( i = 0; i < num_nodes; i++ )
>      {
>          PyObject *pyint;
>          unsigned invalid_node;
> @@ -1351,9 +1341,9 @@ static PyObject *pyxc_numainfo(XcObject *self)
>          /* Node to Node Distance */
>          node_to_node_dist_obj = PyList_New(0);
>          invalid_node = (meminfo[i].memsize == XEN_INVALID_MEM_SZ);
> -        for ( j = 0; j < ninfo.num_nodes; j++ )
> +        for ( j = 0; j < num_nodes; j++ )
>          {
> -            uint32_t dist = distance[i * ninfo.num_nodes + j];
> +            uint32_t dist = distance[i * num_nodes + j];
>              if ( invalid_node || (dist == XEN_INVALID_NODE_DIST) )
>              {
>                  PyList_Append(node_to_node_dist_obj, Py_None);
> @@ -1369,7 +1359,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
>          Py_DECREF(node_to_node_dist_obj);
>      }
>  
> -    ret_obj = Py_BuildValue("{s:i}", "max_node_index", ninfo.num_nodes + 1);
> +    ret_obj = Py_BuildValue("{s:i}", "max_node_index", num_nodes + 1);
>  
>      PyDict_SetItemString(ret_obj, "node_memsize", node_to_memsize_obj);
>      Py_DECREF(node_to_memsize_obj);
> @@ -1385,8 +1375,8 @@ static PyObject *pyxc_numainfo(XcObject *self)
>      Py_DECREF(node_to_node_dist_list_obj);
>  
>  out:
> -    xc_hypercall_buffer_free(self->xc_handle, meminfo);
> -    xc_hypercall_buffer_free(self->xc_handle, distance);
> +    free(meminfo);
> +    free(distance);
>      return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
>  }
>  
> -- 
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-03-19 21:54 ` [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
@ 2015-03-20 13:54   ` Ian Campbell
  2015-03-20 14:24     ` Boris Ostrovsky
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 13:54 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
> +    if (cputopo) {
> +        if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))

I think this guy will tolerate a NULL here (and on post), so I
think/hope you can get away without this conditional stuff. Can you give
it a go please.

The rest looked ok.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-19 21:54 ` [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
  2015-03-20 13:49   ` Konrad Rzeszutek Wilk
@ 2015-03-20 13:56   ` Ian Campbell
  2015-03-20 14:31     ` Boris Ostrovsky
  1 sibling, 1 reply; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 13:56 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index 411128e..607ae61 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -209,22 +209,49 @@ out:
>      return ret;
>  }
>  
> -int xc_numainfo(xc_interface *xch,
> -                xc_numainfo_t *put_info)
> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
> +                xc_meminfo_t *meminfo, uint32_t *distance)
>  {
>      int ret;
>      DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_HYPERCALL_BOUNCE(distance,
> +                             *max_nodes * *max_nodes * sizeof(*distance),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>  
> -    sysctl.cmd = XEN_SYSCTL_numainfo;
> +    if (meminfo && distance) {
> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
> +            goto out;
> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
> +            goto out;

Same comment about handling NULL as before.

In addition what if only one of meminfo and distance is NULL? Is that
valid or do you need a !!meminfo ^ !!distance check?

Rests looks ok.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-03-20 13:25   ` Ian Campbell
@ 2015-03-20 14:00     ` Boris Ostrovsky
  2015-03-20 14:10       ` Ian Campbell
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 14:00 UTC (permalink / raw)
  To: Ian Campbell
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On 03/20/2015 09:25 AM, Ian Campbell wrote:
> On Thu, 2015-03-19 at 17:53 -0400, Boris Ostrovsky wrote:
> [...]
>> * Replace INVALID_TOPOLOGY_ID with "XEN_"-prefixed macros for each invalid type
>>    (core, socket, node).
> But not for distance:
>
>> @@ -1375,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
>>           for ( j = 0; j <= max_node_index; j++ )
>>           {
>>               uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
>> -            if ( dist == INVALID_TOPOLOGY_ID )
>> +            if ( dist == ~0u )
> ?

This patch replaces INVALID_TOPOLOGY_ID with 
XEN_INVALID_CORE/SOCKET/NODE_ID for CPU topology. Neither of those would 
be appropriate for distance.

OTOH, the hypervisor never actually returns INVALID_TOPOLOGY_ID for 
"bad" distance -- it explicitly sets the value to ~0u (see 
XEN_SYSCTL_numainfo in do_sysctl()). Thing work since 
INVALID_TOPOLOGY_ID happens to be defined as ~0u, but only by accident.

The next patch (4/8) introduces XEN_INVALID_NODE_DIST and updates the 
hypervisor to report this value when distance is not available. And the 
hunk above is updated to use it as well.

Andrew noticed this too when reviewing previous version so it may be 
worth mentioning in the commit message.

-boris

>
> Other than that the tools side here looks ok.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev
  2015-03-19 21:53 ` [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev Boris Ostrovsky
@ 2015-03-20 14:01   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 45+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-20 14:01 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	wei.liu2

On Thu, Mar 19, 2015 at 05:53:58PM -0400, Boris Ostrovsky wrote:
> If ACPI provides PXM data for IO devices then dom0 will pass it to
> hypervisor during PHYSDEVOP_pci_device_add call. This information,
> however, is currently ignored.
> 
> We will store this information (in the form of nodeID) in pci_dev
> structure so that we can provide it, for example, to the toolstack
> when it adds support (in the following patches) for querying the
> hypervisor about device topology
> 
> We will also print it when user requests device information dump.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> 
> Changes in v5:
> * Replace u8 with nodeid_t
> 
>  xen/arch/x86/physdev.c        |   23 ++++++++++++++++++++---
>  xen/drivers/passthrough/pci.c |   13 +++++++++----
>  xen/include/public/physdev.h  |    6 ++++++
>  xen/include/xen/pci.h         |    5 ++++-
>  4 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index 1be1d50..57b7800 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
>              break;
>  
> -        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
> +        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
> +                             NULL, NUMA_NO_NODE);
>          break;
>      }
>  
> @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
>          ret = pci_add_device(0, manage_pci_ext.bus,
>                               manage_pci_ext.devfn,
> -                             &pdev_info);
> +                             &pdev_info, NUMA_NO_NODE);
>          break;
>      }
>  
>      case PHYSDEVOP_pci_device_add: {
>          struct physdev_pci_device_add add;
>          struct pci_dev_info pdev_info;
> +        nodeid_t node;
>  
>          ret = -EFAULT;
>          if ( copy_from_guest(&add, arg, 1) != 0 )
> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          }
>          else
>              pdev_info.is_virtfn = 0;
> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +        if ( add.flags & XEN_PCI_DEV_PXM )
> +        {
> +            uint32_t pxm;
> +            size_t optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
> +                                sizeof(add.optarr[0]);
> +
> +            if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
> +                break;
> +
> +            node = pxm_to_node(pxm);
> +        }
> +        else
> +            node = NUMA_NO_NODE;
> +
> +        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
>          break;
>      }
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 4b83583..ecd061e 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
>      pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *info, nodeid_t node)
>  {
>      struct pci_seg *pseg;
>      struct pci_dev *pdev;
> @@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
>          pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
>          spin_unlock(&pcidevs_lock);
>          if ( !pdev )
> -            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
> +            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
> +                           NULL, node);
>          pdev_type = "virtual function";
>      }
>      else
> @@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
>      if ( !pdev )
>          goto out;
>  
> +    pdev->node = node;
> +
>      if ( info )
>          pdev->info = *info;
>      else if ( !pdev->vf_rlen[0] )
> @@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
>  
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>      {
> -        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
> +        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
>                 pseg->nr, pdev->bus,
>                 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
> -               pdev->domain ? pdev->domain->domain_id : -1);
> +               pdev->domain ? pdev->domain->domain_id : -1,
> +               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
>          list_for_each_entry ( msi, &pdev->msi_list, list )
>                 printk("%d ", msi->irq);
>          printk(">\n");
> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> index 2683719..f33845d 100644
> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -293,6 +293,12 @@ struct physdev_pci_device_add {
>          uint8_t bus;
>          uint8_t devfn;
>      } physfn;
> +
> +    /*
> +     * Optional parameters array.
> +     * First element ([0]) is PXM domain associated with the device (if
> +     * XEN_PCI_DEV_PXM is set)
> +     */
>  #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
>      uint32_t optarr[];
>  #elif defined(__GNUC__)
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 3988ee6..b281790 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -57,6 +57,8 @@ struct pci_dev {
>  
>      u8 phantom_stride;
>  
> +    nodeid_t node; /* NUMA node */
> +
>      enum pdev_type {
>          DEV_TYPE_PCI_UNKNOWN,
>          DEV_TYPE_PCIe_ENDPOINT,
> @@ -103,7 +105,8 @@ void setup_hwdom_pci_devices(struct domain *,
>  int pci_release_devices(struct domain *d);
>  int pci_add_segment(u16 seg);
>  const unsigned long *pci_get_ro_map(u16 seg);
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *, nodeid_t node);
>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>  int pci_ro_device(int seg, int bus, int devfn);
>  void arch_pci_ro_device(int seg, int bdf);
> -- 
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-20 13:49   ` Konrad Rzeszutek Wilk
@ 2015-03-20 14:10     ` Boris Ostrovsky
  2015-03-20 14:21       ` Ian Campbell
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 14:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	wei.liu2

On 03/20/2015 09:49 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Mar 19, 2015 at 05:54:03PM -0400, Boris Ostrovsky wrote:
>> xc_numainfo() is not expected to be used on a hot path and therefore
>> hypercall buffer management can be pushed into libxc. This will simplify
>> life for callers.
>>
>> Also update error logging macros.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>> Changes in v5:
>> * Adjust to new interface (as described in changelog for patch 4/8)
>>
>>   tools/libxc/include/xenctrl.h     |    4 ++-
>>   tools/libxc/xc_misc.c             |   41 ++++++++++++++++++++++++-----
>>   tools/libxl/libxl.c               |   52 ++++++++++++-------------------------
>>   tools/python/xen/lowlevel/xc/xc.c |   38 ++++++++++-----------------
>>   4 files changed, 68 insertions(+), 67 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 14d22ce..54540e7 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
>>   typedef xen_sysctl_physinfo_t xc_physinfo_t;
>>   typedef xen_sysctl_cputopo_t xc_cputopo_t;
>>   typedef xen_sysctl_numainfo_t xc_numainfo_t;
>> +typedef xen_sysctl_meminfo_t xc_meminfo_t;
>>   
>>   typedef uint32_t xc_cpu_to_node_t;
>>   typedef uint32_t xc_cpu_to_socket_t;
>> @@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t;
>>   int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
>>   int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
>>                      xc_cputopo_t *cputopo);
>> -int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
>> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
>> +                xc_meminfo_t *meminfo, uint32_t *distance);
>>   
>>   int xc_sched_id(xc_interface *xch,
>>                   int *sched_id);
>> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
>> index 411128e..607ae61 100644
>> --- a/tools/libxc/xc_misc.c
>> +++ b/tools/libxc/xc_misc.c
>> @@ -209,22 +209,49 @@ out:
>>       return ret;
>>   }
>>   
>> -int xc_numainfo(xc_interface *xch,
>> -                xc_numainfo_t *put_info)
>> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
>> +                xc_meminfo_t *meminfo, uint32_t *distance)
>>   {
>>       int ret;
>>       DECLARE_SYSCTL;
>> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +    DECLARE_HYPERCALL_BOUNCE(distance,
>> +                             *max_nodes * *max_nodes * sizeof(*distance),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>>   
>> -    sysctl.cmd = XEN_SYSCTL_numainfo;
>> +    if (meminfo && distance) {
> The syntax in xc_misc.c looks mostly to be of the
> 'if ( meminfo && distance )
>   {
> ' type.
>
>> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
>> +            goto out;
>> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
>> +            goto out;
>>   
>> -    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
>> +        sysctl.u.numainfo.num_nodes = *max_nodes;
>> +        set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
>> +        set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
>> +    } else if (meminfo || distance) {
>> +        errno = EINVAL;
>> +        return -1;
>> +    } else {
>> +        set_xen_guest_handle(sysctl.u.numainfo.meminfo,
>> +                             HYPERCALL_BUFFER_NULL);
>> +        set_xen_guest_handle(sysctl.u.numainfo.distance,
>> +                             HYPERCALL_BUFFER_NULL);
>> +    }
>>   
>> +    sysctl.cmd = XEN_SYSCTL_numainfo;
>>       if ((ret = do_sysctl(xch, &sysctl)) != 0)
>> -        return ret;
>> +        goto out;
>>   
>> -    memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
>> +    *max_nodes = sysctl.u.numainfo.num_nodes;
>>   
>> -    return 0;
>> +out:
>> +    if (meminfo && distance) {
>> +        xc_hypercall_bounce_post(xch, meminfo);
>> +        xc_hypercall_bounce_post(xch, distance);
>> +    }
>> +
>> +    return ret;
>>   }
>>   
>>   
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 2b7d19c..ee97a54 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -5094,62 +5094,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>>   libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>>   {
>>       GC_INIT(ctx);
>> -    xc_numainfo_t ninfo;
>> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
>> -    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
>> +    xc_meminfo_t *meminfo;
>> +    uint32_t *distance;
>>       libxl_numainfo *ret = NULL;
>>       int i, j;
>> +    unsigned num_nodes;
>>   
>> -    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
>> -    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
>> -    if ( xc_numainfo(ctx->xch, &ninfo) != 0)
>> -    {
>> -        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
>> -        ret = NULL;
>> +    if (xc_numainfo(ctx->xch, &num_nodes, NULL, NULL)) {
> I think for this one you need 'if ( xc_numa.. )
>   {'

I actually think it's the other way around: when I made the first change 
in 4/8 I added spaces, which this file's style generally doesn't use.

Which, conveniently, is different from the style used by xc_misc.c

And I am quite confused about curly bracket positions. They are all over 
the place (pun intended).

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-03-20 14:00     ` Boris Ostrovsky
@ 2015-03-20 14:10       ` Ian Campbell
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 14:10 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Fri, 2015-03-20 at 10:00 -0400, Boris Ostrovsky wrote:
> On 03/20/2015 09:25 AM, Ian Campbell wrote:
> > On Thu, 2015-03-19 at 17:53 -0400, Boris Ostrovsky wrote:
> > [...]
> >> * Replace INVALID_TOPOLOGY_ID with "XEN_"-prefixed macros for each invalid type
> >>    (core, socket, node).
> > But not for distance:
> >
> >> @@ -1375,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
> >>           for ( j = 0; j <= max_node_index; j++ )
> >>           {
> >>               uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
> >> -            if ( dist == INVALID_TOPOLOGY_ID )
> >> +            if ( dist == ~0u )
> > ?
> 
> This patch replaces INVALID_TOPOLOGY_ID with 
> XEN_INVALID_CORE/SOCKET/NODE_ID for CPU topology. Neither of those would 
> be appropriate for distance.

Right, I was asking for INVALID_DIST, but...
[...]
> The next patch (4/8) introduces XEN_INVALID_NODE_DIST

You already did it!

So, Acked-by: Ian Campbell <ian.campbell@citrix.com>

(you may as well mention in the commit log next time too)

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-19 21:54 ` [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-03-20 14:21   ` Konrad Rzeszutek Wilk
  2015-03-20 16:26   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-20 14:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	wei.liu2

On Thu, Mar 19, 2015 at 05:54:01PM -0400, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> 
> Changes in v5:
> * Increment ti->first_dev in the loop
> * Make node in xen_sysctl_pcitopoinfo a uint32
> * Move sysctl to follow hearder file's order
> * Update comments in sysctl.h 
> 
>  xen/common/sysctl.c         |   61 +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h |   28 +++++++++++++++++++
>  2 files changed, 89 insertions(+), 0 deletions(-)
> 
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index acaeeb2..c73dfc9 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>          break;
>  #endif
>  
> +#ifdef HAS_PCI
> +    case XEN_SYSCTL_pcitopoinfo:
> +    {
> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
> +
> +        if ( guest_handle_is_null(ti->devs) ||
> +             guest_handle_is_null(ti->nodes) ||
> +             (ti->first_dev > ti->num_devs) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        while ( ti->first_dev < ti->num_devs )
> +        {
> +            physdev_pci_device_t dev;
> +            uint32_t node;
> +            struct pci_dev *pdev;
> +
> +            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            spin_lock(&pcidevs_lock);
> +            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
> +                node = XEN_INVALID_NODE_ID;
> +            else
> +                node = pdev->node;
> +            spin_unlock(&pcidevs_lock);
> +
> +            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            ti->first_dev++;
> +
> +            if ( hypercall_preempt_check() )
> +                break;
> +        }
> +
> +        if ( !ret )
> +        {
> +            if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            if ( ti->first_dev < ti->num_devs )
> +                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
> +                                                    "h", u_sysctl);
> +        }
> +    }
> +    break;
> +#endif
> +
>      default:
>          ret = arch_do_sysctl(op, u_sysctl);
>          copyback = 0;
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 7e0d5fe..ceb8ac8 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -33,6 +33,7 @@
>  
>  #include "xen.h"
>  #include "domctl.h"
> +#include "physdev.h"
>  
>  #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
>  
> @@ -668,6 +669,31 @@ struct xen_sysctl_psr_cmt_op {
>  typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);
>  
> +/* XEN_SYSCTL_pcitopoinfo */
> +struct xen_sysctl_pcitopoinfo {
> +    /* IN: Number of elements in 'pcitopo' and 'nodes' arrays */
> +    uint32_t num_devs;
> +
> +    /*
> +     * IN/OUT: First element of pcitopo array that needs to be processed by
> +     * hypervisor.
> +     * This is used by hypercall continuations, callers must set it to zero.
> +     */
> +    uint32_t first_dev;
> +
> +    /* IN: list of devices */
> +    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
> +
> +    /*
> +     * OUT: node identifier for each device.
> +     * If information for a particular device is not avalable then set
> +     * to XEN_INVALID_NODE_ID.
> +     */
> +    XEN_GUEST_HANDLE_64(uint32) nodes;
> +};
> +typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
> +
>  struct xen_sysctl {
>      uint32_t cmd;
>  #define XEN_SYSCTL_readconsole                    1
> @@ -690,12 +716,14 @@ struct xen_sysctl {
>  #define XEN_SYSCTL_scheduler_op                  19
>  #define XEN_SYSCTL_coverage_op                   20
>  #define XEN_SYSCTL_psr_cmt_op                    21
> +#define XEN_SYSCTL_pcitopoinfo                   22
>      uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
>      union {
>          struct xen_sysctl_readconsole       readconsole;
>          struct xen_sysctl_tbuf_op           tbuf_op;
>          struct xen_sysctl_physinfo          physinfo;
>          struct xen_sysctl_cputopoinfo       cputopoinfo;
> +        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
>          struct xen_sysctl_numainfo          numainfo;
>          struct xen_sysctl_sched_id          sched_id;
>          struct xen_sysctl_perfc_op          perfc_op;
> -- 
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-20 14:10     ` Boris Ostrovsky
@ 2015-03-20 14:21       ` Ian Campbell
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 14:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, keir, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2

On Fri, 2015-03-20 at 10:10 -0400, Boris Ostrovsky wrote:
> I actually think it's the other way around: when I made the first change 
> in 4/8 I added spaces, which this file's style generally doesn't use.
> 
> Which, conveniently, is different from the style used by xc_misc.c

Right, libxc uses the Xen coding style (top-level CODING_STYLE) while
libxl uses its own (tools/libxc/CODING_STYLE).

In some cases for very loose values of "uses" though, sorry.

> And I am quite confused about curly bracket positions. They are all over 
> the place (pun intended).

In libxc they go on the next line (Xen style) in libxl they go on the
same line (libxl style).

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-03-20 13:54   ` Ian Campbell
@ 2015-03-20 14:24     ` Boris Ostrovsky
  2015-03-20 14:30       ` Ian Campbell
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 14:24 UTC (permalink / raw)
  To: Ian Campbell
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On 03/20/2015 09:54 AM, Ian Campbell wrote:
> On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
>> +    if (cputopo) {
>> +        if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
> I think this guy will tolerate a NULL here (and on post), so I
> think/hope you can get away without this conditional stuff. Can you give
> it a go please.

I'll see if this works. (I can't mentally unwrap all these macros, I'll 
need to test it. Or at least run it through pre-processor)

-boris

>
> The rest looked ok.
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/8] numa: __node_distance() should return u8
  2015-03-19 21:53 ` [PATCH v5 1/8] numa: __node_distance() should return u8 Boris Ostrovsky
@ 2015-03-20 14:25   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 45+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-20 14:25 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	wei.liu2

On Thu, Mar 19, 2015 at 05:53:57PM -0400, Boris Ostrovsky wrote:
> SLIT values are byte-sized and some of them (0-9 and 255) have
> special meaning. Adjust __node_distance() to reflect this and
> modify scrub_heap_pages() and do_sysctl() to deal with
> __node_distance() returning an invalid SLIT entry.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> 
> Changes in v5:
> * XEN_SYSCTL_numainfo knows about NUMA_NO_DISTANCE
> * Cleaner changes in __node_distance()
> 
>  xen/arch/x86/srat.c        |   13 ++++++++++---
>  xen/common/page_alloc.c    |    4 ++--
>  xen/common/sysctl.c        |    6 +++++-
>  xen/include/asm-x86/numa.h |    2 +-
>  xen/include/xen/numa.h     |    3 ++-
>  5 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index dfabba3..92c89a5 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -496,14 +496,21 @@ static unsigned node_to_pxm(nodeid_t n)
>  	return 0;
>  }
>  
> -int __node_distance(nodeid_t a, nodeid_t b)
> +u8 __node_distance(nodeid_t a, nodeid_t b)
>  {
> -	int index;
> +	unsigned index;
> +	u8 slit_val;
>  
>  	if (!acpi_slit)
>  		return a == b ? 10 : 20;
>  	index = acpi_slit->locality_count * node_to_pxm(a);
> -	return acpi_slit->entry[index + node_to_pxm(b)];
> +	slit_val = acpi_slit->entry[index + node_to_pxm(b)];
> +
> +	/* ACPI defines 0xff as an unreachable node and 0-9 are undefined */
> +	if ((slit_val == 0xff) || (slit_val <= 9))
> +		return NUMA_NO_DISTANCE;
> +	else
> +		return slit_val;
>  }
>  
>  EXPORT_SYMBOL(__node_distance);
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index d999296..bfb356e 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1434,13 +1434,13 @@ void __init scrub_heap_pages(void)
>          /* Figure out which NODE CPUs are close. */
>          for_each_online_node ( j )
>          {
> -            int distance;
> +            u8 distance;
>  
>              if ( cpumask_empty(&node_to_cpumask(j)) )
>                  continue;
>  
>              distance = __node_distance(i, j);
> -            if ( distance < last_distance )
> +            if ( (distance < last_distance) && (distance != NUMA_NO_DISTANCE) )
>              {
>                  last_distance = distance;
>                  best_node = j;
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index 0cb6ee1..6fdd029 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -304,7 +304,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>                  {
>                      uint32_t distance = ~0u;
>                      if ( node_online(i) && node_online(j) )
> -                        distance = __node_distance(i, j);
> +                    {
> +                        u8 d = __node_distance(i, j);
> +                        if ( d != NUMA_NO_DISTANCE )
> +                            distance = d;
> +                    }
>                      if ( copy_to_guest_offset(
>                          ni->node_to_node_distance,
>                          i*(max_node_index+1) + j, &distance, 1) )
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index cc5b5d1..7a489d3 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -85,6 +85,6 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>  #endif
>  
>  void srat_parse_regions(u64 addr);
> -extern int __node_distance(nodeid_t a, nodeid_t b);
> +extern u8 __node_distance(nodeid_t a, nodeid_t b);
>  
>  #endif
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index ac4b391..7aef1a8 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -7,7 +7,8 @@
>  #define NODES_SHIFT     0
>  #endif
>  
> -#define NUMA_NO_NODE    0xFF
> +#define NUMA_NO_NODE     0xFF
> +#define NUMA_NO_DISTANCE 0xFF
>  
>  #define MAX_NUMNODES    (1 << NODES_SHIFT)
>  
> -- 
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-03-20 14:24     ` Boris Ostrovsky
@ 2015-03-20 14:30       ` Ian Campbell
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 14:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Fri, 2015-03-20 at 10:24 -0400, Boris Ostrovsky wrote:
> On 03/20/2015 09:54 AM, Ian Campbell wrote:
> > On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
> >> +    if (cputopo) {
> >> +        if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
> > I think this guy will tolerate a NULL here (and on post), so I
> > think/hope you can get away without this conditional stuff. Can you give
> > it a go please.
> 
> I'll see if this works. (I can't mentally unwrap all these macros, I'll 
> need to test it. Or at least run it through pre-processor)

The important function is xc__hypercall_bounce_pre which handles
b->ubuf==NULL.

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-20 13:56   ` Ian Campbell
@ 2015-03-20 14:31     ` Boris Ostrovsky
  2015-03-20 14:46       ` Ian Campbell
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 14:31 UTC (permalink / raw)
  To: Ian Campbell
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On 03/20/2015 09:56 AM, Ian Campbell wrote:
> On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
>> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
>> index 411128e..607ae61 100644
>> --- a/tools/libxc/xc_misc.c
>> +++ b/tools/libxc/xc_misc.c
>> @@ -209,22 +209,49 @@ out:
>>       return ret;
>>   }
>>   
>> -int xc_numainfo(xc_interface *xch,
>> -                xc_numainfo_t *put_info)
>> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
>> +                xc_meminfo_t *meminfo, uint32_t *distance)
>>   {
>>       int ret;
>>       DECLARE_SYSCTL;
>> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +    DECLARE_HYPERCALL_BOUNCE(distance,
>> +                             *max_nodes * *max_nodes * sizeof(*distance),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>>   
>> -    sysctl.cmd = XEN_SYSCTL_numainfo;
>> +    if (meminfo && distance) {
>> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
>> +            goto out;
>> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
>> +            goto out;
> Same comment about handling NULL as before.
>
> In addition what if only one of meminfo and distance is NULL? Is that
> valid or do you need a !!meminfo ^ !!distance check?

I want to treat this as as an error here, which is why I have
         } else if (meminfo || distance) {
             errno = EINVAL;
             return -1;
         }

because the hypervisor will only attempt to copy numainfo things when 
both are valid. Otherwise (i.e. even if only one is a NULL) it will 
assume that this is a request for size. The alternative would be to add 
another error there, which I decided not to do.

-boris

>
> Rests looks ok.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-20 14:31     ` Boris Ostrovsky
@ 2015-03-20 14:46       ` Ian Campbell
  2015-03-20 14:59         ` Boris Ostrovsky
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Campbell @ 2015-03-20 14:46 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On Fri, 2015-03-20 at 10:31 -0400, Boris Ostrovsky wrote:
> On 03/20/2015 09:56 AM, Ian Campbell wrote:
> > On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
> >> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> >> index 411128e..607ae61 100644
> >> --- a/tools/libxc/xc_misc.c
> >> +++ b/tools/libxc/xc_misc.c
> >> @@ -209,22 +209,49 @@ out:
> >>       return ret;
> >>   }
> >>   
> >> -int xc_numainfo(xc_interface *xch,
> >> -                xc_numainfo_t *put_info)
> >> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
> >> +                xc_meminfo_t *meminfo, uint32_t *distance)
> >>   {
> >>       int ret;
> >>       DECLARE_SYSCTL;
> >> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
> >> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> >> +    DECLARE_HYPERCALL_BOUNCE(distance,
> >> +                             *max_nodes * *max_nodes * sizeof(*distance),
> >> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> >>   
> >> -    sysctl.cmd = XEN_SYSCTL_numainfo;
> >> +    if (meminfo && distance) {
> >> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
> >> +            goto out;
> >> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
> >> +            goto out;
> > Same comment about handling NULL as before.
> >
> > In addition what if only one of meminfo and distance is NULL? Is that
> > valid or do you need a !!meminfo ^ !!distance check?
> 
> I want to treat this as as an error here, which is why I have
>          } else if (meminfo || distance) {
>              errno = EINVAL;
>              return -1;
>          }
> 
> because the hypervisor will only attempt to copy numainfo things when 
> both are valid. Otherwise (i.e. even if only one is a NULL) it will 
> assume that this is a request for size. The alternative would be to add 
> another error there, which I decided not to do.

Sorry, I'd trimmed that bit before I thought of the issue. Yes, what you
had was right (although in general it is nicer to test for parameter
validity first).

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-03-20 14:46       ` Ian Campbell
@ 2015-03-20 14:59         ` Boris Ostrovsky
  0 siblings, 0 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 14:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: elena.ufimtseva, wei.liu2, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir

On 03/20/2015 10:46 AM, Ian Campbell wrote:
> On Fri, 2015-03-20 at 10:31 -0400, Boris Ostrovsky wrote:
>> On 03/20/2015 09:56 AM, Ian Campbell wrote:
>>> On Thu, 2015-03-19 at 17:54 -0400, Boris Ostrovsky wrote:
>>>> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
>>>> index 411128e..607ae61 100644
>>>> --- a/tools/libxc/xc_misc.c
>>>> +++ b/tools/libxc/xc_misc.c
>>>> @@ -209,22 +209,49 @@ out:
>>>>        return ret;
>>>>    }
>>>>    
>>>> -int xc_numainfo(xc_interface *xch,
>>>> -                xc_numainfo_t *put_info)
>>>> +int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
>>>> +                xc_meminfo_t *meminfo, uint32_t *distance)
>>>>    {
>>>>        int ret;
>>>>        DECLARE_SYSCTL;
>>>> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
>>>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>>>> +    DECLARE_HYPERCALL_BOUNCE(distance,
>>>> +                             *max_nodes * *max_nodes * sizeof(*distance),
>>>> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>>>>    
>>>> -    sysctl.cmd = XEN_SYSCTL_numainfo;
>>>> +    if (meminfo && distance) {
>>>> +        if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
>>>> +            goto out;
>>>> +        if ((ret = xc_hypercall_bounce_pre(xch, distance)))
>>>> +            goto out;
>>> Same comment about handling NULL as before.
>>>
>>> In addition what if only one of meminfo and distance is NULL? Is that
>>> valid or do you need a !!meminfo ^ !!distance check?
>> I want to treat this as as an error here, which is why I have
>>           } else if (meminfo || distance) {
>>               errno = EINVAL;
>>               return -1;
>>           }
>>
>> because the hypervisor will only attempt to copy numainfo things when
>> both are valid. Otherwise (i.e. even if only one is a NULL) it will
>> assume that this is a request for size. The alternative would be to add
>> another error there, which I decided not to do.
> Sorry, I'd trimmed that bit before I thought of the issue. Yes, what you
> had was right (although in general it is nicer to test for parameter
> validity first).

I'll flip the test order.

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  2015-03-19 21:54 ` [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo " Boris Ostrovsky
  2015-03-20 13:26   ` Ian Campbell
@ 2015-03-20 16:15   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2015-03-20 16:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

>>> On 19.03.15 at 22:54, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -274,53 +274,62 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>  
>      case XEN_SYSCTL_numainfo:
>      {
> -        uint32_t i, j, max_node_index, last_online_node;
> +        uint32_t i, j, num_nodes;

unsigned int please.

>          xen_sysctl_numainfo_t *ni = &op->u.numainfo;
>  
> -        last_online_node = last_node(node_online_map);
> -        max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
> -        ni->max_node_index = last_online_node;
> +        num_nodes = last_node(node_online_map) + 1;
>  
> -        for ( i = 0; i <= max_node_index; i++ )
> +        if ( !guest_handle_is_null(ni->meminfo) &&
> +             !guest_handle_is_null(ni->distance) )
>          {
> -            if ( !guest_handle_is_null(ni->node_to_memsize) )
> +            if ( ni->num_nodes < num_nodes )
>              {
> -                uint64_t memsize = node_online(i) ?
> -                                   node_spanned_pages(i) << PAGE_SHIFT : 0ul;
> -                if ( copy_to_guest_offset(ni->node_to_memsize, i, &memsize, 1) )
> -                    break;
> -            }
> -            if ( !guest_handle_is_null(ni->node_to_memfree) )
> -            {
> -                uint64_t memfree = node_online(i) ?
> -                                   avail_node_heap_pages(i) << PAGE_SHIFT : 0ul;
> -                if ( copy_to_guest_offset(ni->node_to_memfree, i, &memfree, 1) )
> -                    break;
> +                ret = -ENOBUFS;
> +                i = num_nodes;
>              }
>  
> -            if ( !guest_handle_is_null(ni->node_to_node_distance) )
> +            for ( i = 0; i < num_nodes; i++ )
>              {
> -                for ( j = 0; j <= max_node_index; j++)
> +                xen_sysctl_meminfo_t meminfo;
> +                uint32_t distance[MAX_NUMNODES];

I don't think it is a good idea to put such an array on the stack. Since
this is serialized with a lock anyway, please make this static.


> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -494,34 +494,41 @@ typedef struct xen_sysctl_cputopoinfo 
> xen_sysctl_cputopoinfo_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
>  
>  /* XEN_SYSCTL_numainfo */
> -#define INVALID_NUMAINFO_ID (~0U)
> +#define XEN_INVALID_MEM_SZ     (~0U)
> +#define XEN_INVALID_NODE_DIST  (~0U)
> +
> +struct xen_sysctl_meminfo {
> +    uint64_t memsize;
> +    uint64_t memfree;
> +};
> +typedef struct xen_sysctl_meminfo xen_sysctl_meminfo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_meminfo_t);
> +
> +/*
> + * IN:
> + *  - NULL 'meminfo' and 'distance'  handles is a request for maximun
> + *    'num_nodes'.

Please bring code and comment in line (the "and" here contradicts
the implicit "or" in the code).

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-19 21:54 ` [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
  2015-03-20 14:21   ` Konrad Rzeszutek Wilk
@ 2015-03-20 16:26   ` Jan Beulich
  2015-03-20 20:01     ` Boris Ostrovsky
  1 sibling, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2015-03-20 16:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

>>> On 19.03.15 at 22:54, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>          break;
>  #endif
>  
> +#ifdef HAS_PCI
> +    case XEN_SYSCTL_pcitopoinfo:
> +    {
> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
> +
> +        if ( guest_handle_is_null(ti->devs) ||
> +             guest_handle_is_null(ti->nodes) ||
> +             (ti->first_dev > ti->num_devs) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        while ( ti->first_dev < ti->num_devs )
> +        {
> +            physdev_pci_device_t dev;
> +            uint32_t node;
> +            struct pci_dev *pdev;
> +
> +            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            spin_lock(&pcidevs_lock);
> +            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
> +                node = XEN_INVALID_NODE_ID;

I really think the two cases folded here should be distinguishable
by the caller.

> +            else
> +                node = pdev->node;
> +            spin_unlock(&pcidevs_lock);
> +
> +            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            ti->first_dev++;
> +
> +            if ( hypercall_preempt_check() )
> +                break;
> +        }
> +
> +        if ( !ret )
> +        {
> +            if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            if ( ti->first_dev < ti->num_devs )
> +                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
> +                                                    "h", u_sysctl);

Considering this is a tools only interface, enforcing a not too high
limit on num_devs would seem better than this not really clean
continuation mechanism. The (tool stack) caller(s) can be made
iterate.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-20 16:26   ` Jan Beulich
@ 2015-03-20 20:01     ` Boris Ostrovsky
  2015-03-23  8:15       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-20 20:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

On 03/20/2015 12:26 PM, Jan Beulich wrote:
>>>> On 19.03.15 at 22:54, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/sysctl.c
>> +++ b/xen/common/sysctl.c
>> @@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>           break;
>>   #endif
>>   
>> +#ifdef HAS_PCI
>> +    case XEN_SYSCTL_pcitopoinfo:
>> +    {
>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>> +
>> +        if ( guest_handle_is_null(ti->devs) ||
>> +             guest_handle_is_null(ti->nodes) ||
>> +             (ti->first_dev > ti->num_devs) )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        while ( ti->first_dev < ti->num_devs )
>> +        {
>> +            physdev_pci_device_t dev;
>> +            uint32_t node;
>> +            struct pci_dev *pdev;
>> +
>> +            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            spin_lock(&pcidevs_lock);
>> +            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>> +                node = XEN_INVALID_NODE_ID;
> I really think the two cases folded here should be distinguishable
> by the caller.

How about making  ti->devs array an IN/OUT argument and updating the 
entry with -1s (which I think is an invalid PCI device)? This will make 
the original deviceID disappear though so the callers would be expected 
to stash the array before making the call if they want to know which 
devices were not reported.

Alternatively, since node is 32-bit value while nodeid_t is 8-bit, I can 
add another token that signifies an invalid device. The main problem 
with this approach is that logically we use 'nodes' array for passing 
nodeIDs, not information about devices.

>> +            else
>> +                node = pdev->node;
>> +            spin_unlock(&pcidevs_lock);
>> +
>> +            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            ti->first_dev++;
>> +
>> +            if ( hypercall_preempt_check() )
>> +                break;
>> +        }
>> +
>> +        if ( !ret )
>> +        {
>> +            if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            if ( ti->first_dev < ti->num_devs )
>> +                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
>> +                                                    "h", u_sysctl);
> Considering this is a tools only interface, enforcing a not too high
> limit on num_devs would seem better than this not really clean
> continuation mechanism. The (tool stack) caller(s) can be made
> iterate.

What's a reasonable limit per call? 100?

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-20 20:01     ` Boris Ostrovsky
@ 2015-03-23  8:15       ` Jan Beulich
  2015-03-23 13:58         ` Boris Ostrovsky
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2015-03-23  8:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

>>> On 20.03.15 at 21:01, <boris.ostrovsky@oracle.com> wrote:
> On 03/20/2015 12:26 PM, Jan Beulich wrote:
>>>>> On 19.03.15 at 22:54, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/common/sysctl.c
>>> +++ b/xen/common/sysctl.c
>>> @@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>           break;
>>>   #endif
>>>   
>>> +#ifdef HAS_PCI
>>> +    case XEN_SYSCTL_pcitopoinfo:
>>> +    {
>>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>>> +
>>> +        if ( guest_handle_is_null(ti->devs) ||
>>> +             guest_handle_is_null(ti->nodes) ||
>>> +             (ti->first_dev > ti->num_devs) )
>>> +        {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        while ( ti->first_dev < ti->num_devs )
>>> +        {
>>> +            physdev_pci_device_t dev;
>>> +            uint32_t node;
>>> +            struct pci_dev *pdev;
>>> +
>>> +            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
>>> +            {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            spin_lock(&pcidevs_lock);
>>> +            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
>>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>>> +                node = XEN_INVALID_NODE_ID;
>> I really think the two cases folded here should be distinguishable
>> by the caller.
> 
> How about making  ti->devs array an IN/OUT argument and updating the 
> entry with -1s (which I think is an invalid PCI device)? This will make 
> the original deviceID disappear though so the callers would be expected 
> to stash the array before making the call if they want to know which 
> devices were not reported.

Sadly all ones in physdev_pci_device_t still could be a valid device.

> Alternatively, since node is 32-bit value while nodeid_t is 8-bit, I can 
> add another token that signifies an invalid device. The main problem 
> with this approach is that logically we use 'nodes' array for passing 
> nodeIDs, not information about devices.

I realize that. I wonder whether passing in a bad device shouldn't
simply result in -ENODEV, perhaps with first_dev pointing at the
bad slot?

>>> +            else
>>> +                node = pdev->node;
>>> +            spin_unlock(&pcidevs_lock);
>>> +
>>> +            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
>>> +            {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            ti->first_dev++;
>>> +
>>> +            if ( hypercall_preempt_check() )
>>> +                break;
>>> +        }
>>> +
>>> +        if ( !ret )
>>> +        {
>>> +            if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
>>> +            {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            if ( ti->first_dev < ti->num_devs )
>>> +                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
>>> +                                                    "h", u_sysctl);
>> Considering this is a tools only interface, enforcing a not too high
>> limit on num_devs would seem better than this not really clean
>> continuation mechanism. The (tool stack) caller(s) can be made
>> iterate.
> 
> What's a reasonable limit per call? 100?

Commonly we use powers of two for these, even if not strictly
needed to be that way. Hence I'd suggest 64. But please be sure
not to make this implementation detail part of the ABI.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2015-03-19 21:54 ` [PATCH v5 8/8] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
@ 2015-03-23 12:42 ` Julien Grall
  2015-03-23 13:47   ` Boris Ostrovsky
  2015-03-23 14:30   ` Jan Beulich
  8 siblings, 2 replies; 45+ messages in thread
From: Julien Grall @ 2015-03-23 12:42 UTC (permalink / raw)
  To: Boris Ostrovsky, andrew.cooper3, jbeulich, keir, ian.jackson,
	stefano.stabellini, ian.campbell, wei.liu2, dario.faggioli,
	elena.ufimtseva, xen-devel

Hi,

On 19/03/15 21:53, Boris Ostrovsky wrote:
> A few patches that add interface for querying hypervisor about device
> topology and allow 'xl info -n' display this information if PXM object
> is provided by ACPI.
> 
> This series also makes some optimizations and cleanup of current CPU
> topology and NUMA sysctl queries.

I saw that some of these patches was pushed upstream last week. It
actually breaks compilation on ARM.

While the first error is trivial to fix (it's a missing include
xen/numa.h in xen/pci.h ). The second one is more difficult

sysctl.c: In function ‘do_sysctl’:
sysctl.c:349:42: error: ‘BAD_APICID’ undeclared (first use in this function)
                     if ( cputopo.core == BAD_APICID )
                                          ^
The value BAD_APICID doesn't have any meaning on ARM. Therefore the
usage in common code looks wrong to me. I'm not sure what should be the
testing value here given that cpu_to_core is not yet correctly
implemented on ARM.

It would be nice to at least build test it ARM/ARM64 before pushing
patches which modify common code.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 12:42 ` [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Julien Grall
@ 2015-03-23 13:47   ` Boris Ostrovsky
  2015-03-23 14:30     ` George Dunlap
                       ` (2 more replies)
  2015-03-23 14:30   ` Jan Beulich
  1 sibling, 3 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-23 13:47 UTC (permalink / raw)
  To: Julien Grall, andrew.cooper3, jbeulich, keir, ian.jackson,
	stefano.stabellini, ian.campbell, wei.liu2, dario.faggioli,
	elena.ufimtseva, xen-devel

On 03/23/2015 08:42 AM, Julien Grall wrote:
> Hi,
>
> On 19/03/15 21:53, Boris Ostrovsky wrote:
>> A few patches that add interface for querying hypervisor about device
>> topology and allow 'xl info -n' display this information if PXM object
>> is provided by ACPI.
>>
>> This series also makes some optimizations and cleanup of current CPU
>> topology and NUMA sysctl queries.
> I saw that some of these patches was pushed upstream last week. It
> actually breaks compilation on ARM.
>
> While the first error is trivial to fix (it's a missing include
> xen/numa.h in xen/pci.h ). The second one is more difficult
>
> sysctl.c: In function ‘do_sysctl’:
> sysctl.c:349:42: error: ‘BAD_APICID’ undeclared (first use in this function)
>                       if ( cputopo.core == BAD_APICID )
>                                            ^
> The value BAD_APICID doesn't have any meaning on ARM. Therefore the
> usage in common code looks wrong to me. I'm not sure what should be the
> testing value here given that cpu_to_core is not yet correctly
> implemented on ARM.

How about this (only x86 compile-tested). And perhaps, while at it, fix 
types for cpu_core_id and phys_proc_id.

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index c73dfc9..b319be7 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -354,10 +354,10 @@ long 
do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                  if ( cpu_present(i) )
                  {
                      cputopo.core = cpu_to_core(i);
-                    if ( cputopo.core == BAD_APICID )
+                    if ( cputopo.core == INVALID_COREID )
                          cputopo.core = XEN_INVALID_CORE_ID;
                      cputopo.socket = cpu_to_socket(i);
-                    if ( cputopo.socket == BAD_APICID )
+                    if ( cputopo.socket == INVALID_SOCKETID )
                          cputopo.socket = XEN_INVALID_SOCKET_ID;
                      cputopo.node = cpu_to_node(i);
                      if ( cputopo.node == NUMA_NO_NODE )
diff --git a/xen/include/asm-x86/processor.h 
b/xen/include/asm-x86/processor.h
index 87d80ff..07cee2a 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -214,8 +214,19 @@ extern void detect_extended_topology(struct 
cpuinfo_x86 *c);

  extern void detect_ht(struct cpuinfo_x86 *c);

-#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
-#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
+inline int cpu_to_core(unsigned cpu)
+{
+    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
+        return INVALID_COREID;
+    return cpu_data[cpu].cpu_core_id;
+}
+
+inline int cpu_to_socket(unsigned cpu)
+{
+    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
+        return INVALID_SOCKETID;
+    return cpu_data[cpu].phys_proc_id;
+}

  unsigned int apicid_to_socket(unsigned int);

diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
index 6febb56..0b4c660 100644
--- a/xen/include/xen/smp.h
+++ b/xen/include/xen/smp.h
@@ -3,6 +3,9 @@

  #include <asm/smp.h>

+#define INVALID_COREID   -1
+#define INVALID_SOCKETID -1
+
  /*
   * stops all CPUs but the current one:
   */



> It would be nice to at least build test it ARM/ARM64 before pushing
> patches which modify common code.

I unfortunately have no ability to build on ARM (but I should have 
realized that APICID has no meaning there).


-boris

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology
  2015-03-23  8:15       ` Jan Beulich
@ 2015-03-23 13:58         ` Boris Ostrovsky
  0 siblings, 0 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-23 13:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

On 03/23/2015 04:15 AM, Jan Beulich wrote:
>>>> On 20.03.15 at 21:01, <boris.ostrovsky@oracle.com> wrote:
>> On 03/20/2015 12:26 PM, Jan Beulich wrote:
>>>>>> On 19.03.15 at 22:54, <boris.ostrovsky@oracle.com> wrote:
>>>> --- a/xen/common/sysctl.c
>>>> +++ b/xen/common/sysctl.c
>>>> @@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>>            break;
>>>>    #endif
>>>>    
>>>> +#ifdef HAS_PCI
>>>> +    case XEN_SYSCTL_pcitopoinfo:
>>>> +    {
>>>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>>>> +
>>>> +        if ( guest_handle_is_null(ti->devs) ||
>>>> +             guest_handle_is_null(ti->nodes) ||
>>>> +             (ti->first_dev > ti->num_devs) )
>>>> +        {
>>>> +            ret = -EINVAL;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        while ( ti->first_dev < ti->num_devs )
>>>> +        {
>>>> +            physdev_pci_device_t dev;
>>>> +            uint32_t node;
>>>> +            struct pci_dev *pdev;
>>>> +
>>>> +            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
>>>> +            {
>>>> +                ret = -EFAULT;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            spin_lock(&pcidevs_lock);
>>>> +            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
>>>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>>>> +                node = XEN_INVALID_NODE_ID;
>>> I really think the two cases folded here should be distinguishable
>>> by the caller.
>> How about making  ti->devs array an IN/OUT argument and updating the
>> entry with -1s (which I think is an invalid PCI device)? This will make
>> the original deviceID disappear though so the callers would be expected
>> to stash the array before making the call if they want to know which
>> devices were not reported.
> Sadly all ones in physdev_pci_device_t still could be a valid device.
>
>> Alternatively, since node is 32-bit value while nodeid_t is 8-bit, I can
>> add another token that signifies an invalid device. The main problem
>> with this approach is that logically we use 'nodes' array for passing
>> nodeIDs, not information about devices.
> I realize that. I wonder whether passing in a bad device shouldn't
> simply result in -ENODEV, perhaps with first_dev pointing at the
> bad slot?

Yes, this will work. The caller can then make a note of the bad device, 
increment first_dev and retry the call.

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 13:47   ` Boris Ostrovsky
@ 2015-03-23 14:30     ` George Dunlap
  2015-03-23 14:33       ` Boris Ostrovsky
  2015-03-23 14:42       ` Ian Campbell
  2015-03-23 15:26     ` Julien Grall
  2015-03-23 15:27     ` Jan Beulich
  2 siblings, 2 replies; 45+ messages in thread
From: George Dunlap @ 2015-03-23 14:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Dario Faggioli, Julien Grall, Ian Jackson,
	xen-devel@lists.xen.org, Jan Beulich, Wei Liu

On Mon, Mar 23, 2015 at 1:47 PM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 03/23/2015 08:42 AM, Julien Grall wrote:
>>
>> Hi,
>>
>> On 19/03/15 21:53, Boris Ostrovsky wrote:
>>>
>>> A few patches that add interface for querying hypervisor about device
>>> topology and allow 'xl info -n' display this information if PXM object
>>> is provided by ACPI.
>>>
>>> This series also makes some optimizations and cleanup of current CPU
>>> topology and NUMA sysctl queries.
>>
>> I saw that some of these patches was pushed upstream last week. It
>> actually breaks compilation on ARM.
>>
>> While the first error is trivial to fix (it's a missing include
>> xen/numa.h in xen/pci.h ). The second one is more difficult
>>
>> sysctl.c: In function ‘do_sysctl’:
>> sysctl.c:349:42: error: ‘BAD_APICID’ undeclared (first use in this
>> function)
>>                       if ( cputopo.core == BAD_APICID )
>>                                            ^
>> The value BAD_APICID doesn't have any meaning on ARM. Therefore the
>> usage in common code looks wrong to me. I'm not sure what should be the
>> testing value here given that cpu_to_core is not yet correctly
>> implemented on ARM.
>
>
> How about this (only x86 compile-tested). And perhaps, while at it, fix
> types for cpu_core_id and phys_proc_id.
>
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index c73dfc9..b319be7 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -354,10 +354,10 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t)
> u_sysctl)
>                  if ( cpu_present(i) )
>                  {
>                      cputopo.core = cpu_to_core(i);
> -                    if ( cputopo.core == BAD_APICID )
> +                    if ( cputopo.core == INVALID_COREID )
>                          cputopo.core = XEN_INVALID_CORE_ID;
>                      cputopo.socket = cpu_to_socket(i);
> -                    if ( cputopo.socket == BAD_APICID )
> +                    if ( cputopo.socket == INVALID_SOCKETID )
>                          cputopo.socket = XEN_INVALID_SOCKET_ID;
>                      cputopo.node = cpu_to_node(i);
>                      if ( cputopo.node == NUMA_NO_NODE )
> diff --git a/xen/include/asm-x86/processor.h
> b/xen/include/asm-x86/processor.h
> index 87d80ff..07cee2a 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -214,8 +214,19 @@ extern void detect_extended_topology(struct cpuinfo_x86
> *c);
>
>  extern void detect_ht(struct cpuinfo_x86 *c);
>
> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
> +inline int cpu_to_core(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
> +        return INVALID_COREID;
> +    return cpu_data[cpu].cpu_core_id;
> +}
> +
> +inline int cpu_to_socket(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
> +        return INVALID_SOCKETID;
> +    return cpu_data[cpu].phys_proc_id;
> +}
>
>  unsigned int apicid_to_socket(unsigned int);
>
> diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
> index 6febb56..0b4c660 100644
> --- a/xen/include/xen/smp.h
> +++ b/xen/include/xen/smp.h
> @@ -3,6 +3,9 @@
>
>  #include <asm/smp.h>
>
> +#define INVALID_COREID   -1
> +#define INVALID_SOCKETID -1
> +
>  /*
>   * stops all CPUs but the current one:
>   */
>
>
>
>> It would be nice to at least build test it ARM/ARM64 before pushing
>> patches which modify common code.
>
>
> I unfortunately have no ability to build on ARM (but I should have realized
> that APICID has no meaning there).

Does gcc have a tolerable cross-compiler for ARM?  Particularly just
for building the hypervisor.  The resulting code doesn't need to be
that great, it just needs to catch this sort of thing.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 12:42 ` [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Julien Grall
  2015-03-23 13:47   ` Boris Ostrovsky
@ 2015-03-23 14:30   ` Jan Beulich
  2015-03-23 14:38     ` George Dunlap
  1 sibling, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2015-03-23 14:30 UTC (permalink / raw)
  To: Julien Grall, Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, keir

>>> On 23.03.15 at 13:42, <julien.grall@linaro.org> wrote:
> It would be nice to at least build test it ARM/ARM64 before pushing
> patches which modify common code.

I try to remember that, but do not always succeed. Really I'd expect
people to not even submit such patches.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 14:30     ` George Dunlap
@ 2015-03-23 14:33       ` Boris Ostrovsky
  2015-03-23 14:42       ` Ian Campbell
  1 sibling, 0 replies; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-23 14:33 UTC (permalink / raw)
  To: George Dunlap
  Cc: elena.ufimtseva, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Dario Faggioli, Julien Grall, Ian Jackson,
	xen-devel@lists.xen.org, Jan Beulich, Wei Liu

On 03/23/2015 10:30 AM, George Dunlap wrote:
> On Mon, Mar 23, 2015 at 1:47 PM, Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> I unfortunately have no ability to build on ARM (but I should have realized
>> that APICID has no meaning there).
> Does gcc have a tolerable cross-compiler for ARM?  Particularly just
> for building the hypervisor.  The resulting code doesn't need to be
> that great, it just needs to catch this sort of thing.

Yes. Konrad pointed out to me that we do have ARM cross-compiler set up 
on our test harness.

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 14:30   ` Jan Beulich
@ 2015-03-23 14:38     ` George Dunlap
  2015-03-23 15:17       ` Ian Campbell
  2015-03-23 15:21       ` Julien Grall
  0 siblings, 2 replies; 45+ messages in thread
From: George Dunlap @ 2015-03-23 14:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: elena.ufimtseva, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Dario Faggioli, Julien Grall, Ian Jackson,
	xen-devel@lists.xen.org, Keir Fraser, Boris Ostrovsky

On Mon, Mar 23, 2015 at 2:30 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.03.15 at 13:42, <julien.grall@linaro.org> wrote:
>> It would be nice to at least build test it ARM/ARM64 before pushing
>> patches which modify common code.
>
> I try to remember that, but do not always succeed. Really I'd expect
> people to not even submit such patches.

To be fair, this is the reason we have a push gate; people who want
only non-broken builds should use master rather than staging.

But given how long the test cycles take, it would be better if people
submitting patches did some smoke-testing first.

 -George

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 14:30     ` George Dunlap
  2015-03-23 14:33       ` Boris Ostrovsky
@ 2015-03-23 14:42       ` Ian Campbell
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-23 14:42 UTC (permalink / raw)
  To: George Dunlap
  Cc: elena.ufimtseva, Keir Fraser, Stefano Stabellini, Andrew Cooper,
	Dario Faggioli, Julien Grall, Ian Jackson,
	xen-devel@lists.xen.org, Jan Beulich, Wei Liu, Boris Ostrovsky

On Mon, 2015-03-23 at 14:30 +0000, George Dunlap wrote:

> Does gcc have a tolerable cross-compiler for ARM?  Particularly just
> for building the hypervisor.  The resulting code doesn't need to be
> that great, it just needs to catch this sort of thing.

It's pretty easy for the hypervisor using Linaro's toolchain (which I
use every day):

http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions#Cross_Compiling

Cross building the tools takes a bit more preparation and is enough faff
that I wouldn't expect a non-arm contributor to do it.

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 14:38     ` George Dunlap
@ 2015-03-23 15:17       ` Ian Campbell
  2015-03-23 15:21       ` Julien Grall
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Campbell @ 2015-03-23 15:17 UTC (permalink / raw)
  To: George Dunlap
  Cc: elena.ufimtseva, Wei Liu, Stefano Stabellini, Andrew Cooper,
	Dario Faggioli, Julien Grall, Ian Jackson,
	xen-devel@lists.xen.org, Jan Beulich, Keir Fraser,
	Boris Ostrovsky

On Mon, 2015-03-23 at 14:38 +0000, George Dunlap wrote:
> On Mon, Mar 23, 2015 at 2:30 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>> On 23.03.15 at 13:42, <julien.grall@linaro.org> wrote:
> >> It would be nice to at least build test it ARM/ARM64 before pushing
> >> patches which modify common code.
> >
> > I try to remember that, but do not always succeed. Really I'd expect
> > people to not even submit such patches.
> 
> To be fair, this is the reason we have a push gate; people who want
> only non-broken builds should use master rather than staging.

Agreed.

> But given how long the test cycles take, it would be better if people
> submitting patches did some smoke-testing first.

Also agreed, although it can be hard to think "ARM" when you see a
diffstat containing xen/common/foo.c unless you actually work on ARM day
to day.

Ian.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 14:38     ` George Dunlap
  2015-03-23 15:17       ` Ian Campbell
@ 2015-03-23 15:21       ` Julien Grall
  1 sibling, 0 replies; 45+ messages in thread
From: Julien Grall @ 2015-03-23 15:21 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: elena.ufimtseva, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Dario Faggioli, Ian Jackson,
	xen-devel@lists.xen.org, Keir Fraser, Boris Ostrovsky

On 23/03/15 14:38, George Dunlap wrote:
> On Mon, Mar 23, 2015 at 2:30 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.03.15 at 13:42, <julien.grall@linaro.org> wrote:
>>> It would be nice to at least build test it ARM/ARM64 before pushing
>>> patches which modify common code.
>>
>> I try to remember that, but do not always succeed. Really I'd expect
>> people to not even submit such patches.
> 
> To be fair, this is the reason we have a push gate; people who want
> only non-broken builds should use master rather than staging.

I'm using staging because it often contains bug fixes which is necessary
for booting Xen on my boards.

Using master means having least a week old Xen and cherry-pick some
patches from staging.

The latter may be difficult when the patch has dependency.

> But given how long the test cycles take, it would be better if people
> submitting patches did some smoke-testing first.

I think it's a good solution, building Xen ARM take less than 2 minutes

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 13:47   ` Boris Ostrovsky
  2015-03-23 14:30     ` George Dunlap
@ 2015-03-23 15:26     ` Julien Grall
  2015-03-23 15:27     ` Jan Beulich
  2 siblings, 0 replies; 45+ messages in thread
From: Julien Grall @ 2015-03-23 15:26 UTC (permalink / raw)
  To: Boris Ostrovsky, andrew.cooper3, jbeulich, keir, ian.jackson,
	stefano.stabellini, ian.campbell, wei.liu2, dario.faggioli,
	elena.ufimtseva, xen-devel

Hi,

On 23/03/15 13:47, Boris Ostrovsky wrote:
> On 03/23/2015 08:42 AM, Julien Grall wrote:
>> On 19/03/15 21:53, Boris Ostrovsky wrote:
>>> A few patches that add interface for querying hypervisor about device
>>> topology and allow 'xl info -n' display this information if PXM object
>>> is provided by ACPI.
>>>
>>> This series also makes some optimizations and cleanup of current CPU
>>> topology and NUMA sysctl queries.
>> I saw that some of these patches was pushed upstream last week. It
>> actually breaks compilation on ARM.
>>
>> While the first error is trivial to fix (it's a missing include
>> xen/numa.h in xen/pci.h ). The second one is more difficult
>>
>> sysctl.c: In function ‘do_sysctl’:
>> sysctl.c:349:42: error: ‘BAD_APICID’ undeclared (first use in this
>> function)
>>                       if ( cputopo.core == BAD_APICID )
>>                                            ^
>> The value BAD_APICID doesn't have any meaning on ARM. Therefore the
>> usage in common code looks wrong to me. I'm not sure what should be the
>> testing value here given that cpu_to_core is not yet correctly
>> implemented on ARM.
> 
> How about this (only x86 compile-tested). And perhaps, while at it, fix
> types for cpu_core_id and phys_proc_id.

This patch build on ARM/ARM64.

> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index c73dfc9..b319be7 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -354,10 +354,10 @@ long
> do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>                  if ( cpu_present(i) )
>                  {
>                      cputopo.core = cpu_to_core(i);
> -                    if ( cputopo.core == BAD_APICID )
> +                    if ( cputopo.core == INVALID_COREID )
>                          cputopo.core = XEN_INVALID_CORE_ID;
>                      cputopo.socket = cpu_to_socket(i);
> -                    if ( cputopo.socket == BAD_APICID )
> +                    if ( cputopo.socket == INVALID_SOCKETID )
>                          cputopo.socket = XEN_INVALID_SOCKET_ID;
>                      cputopo.node = cpu_to_node(i);
>                      if ( cputopo.node == NUMA_NO_NODE )
> diff --git a/xen/include/asm-x86/processor.h
> b/xen/include/asm-x86/processor.h
> index 87d80ff..07cee2a 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -214,8 +214,19 @@ extern void detect_extended_topology(struct
> cpuinfo_x86 *c);
> 
>  extern void detect_ht(struct cpuinfo_x86 *c);
> 
> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
> +inline int cpu_to_core(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
> +        return INVALID_COREID;
> +    return cpu_data[cpu].cpu_core_id;
> +}
> +
> +inline int cpu_to_socket(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
> +        return INVALID_SOCKETID;
> +    return cpu_data[cpu].phys_proc_id;
> +}
> 
>  unsigned int apicid_to_socket(unsigned int);
> 
> diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
> index 6febb56..0b4c660 100644
> --- a/xen/include/xen/smp.h
> +++ b/xen/include/xen/smp.h
> @@ -3,6 +3,9 @@
> 
>  #include <asm/smp.h>
> 
> +#define INVALID_COREID   -1
> +#define INVALID_SOCKETID -1
> +
>  /*
>   * stops all CPUs but the current one:
>   */

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 13:47   ` Boris Ostrovsky
  2015-03-23 14:30     ` George Dunlap
  2015-03-23 15:26     ` Julien Grall
@ 2015-03-23 15:27     ` Jan Beulich
  2015-03-23 15:33       ` Boris Ostrovsky
  2 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2015-03-23 15:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, Julien Grall, ian.jackson,
	xen-devel, keir

>>> On 23.03.15 at 14:47, <boris.ostrovsky@oracle.com> wrote:
> How about this (only x86 compile-tested). And perhaps, while at it, fix 
> types for cpu_core_id and phys_proc_id.
> 
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index c73dfc9..b319be7 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -354,10 +354,10 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>                   if ( cpu_present(i) )
>                   {
>                       cputopo.core = cpu_to_core(i);
> -                    if ( cputopo.core == BAD_APICID )
> +                    if ( cputopo.core == INVALID_COREID )
>                           cputopo.core = XEN_INVALID_CORE_ID;
>                       cputopo.socket = cpu_to_socket(i);
> -                    if ( cputopo.socket == BAD_APICID )
> +                    if ( cputopo.socket == INVALID_SOCKETID )
>                           cputopo.socket = XEN_INVALID_SOCKET_ID;

Why not use XEN_INVALID_CORE_ID / XEN_INVALID_SOCKET_ID
for those return values right away?

> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -214,8 +214,19 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
> 
>   extern void detect_ht(struct cpuinfo_x86 *c);
> 
> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
> +inline int cpu_to_core(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
> +        return INVALID_COREID;
> +    return cpu_data[cpu].cpu_core_id;
> +}
> +
> +inline int cpu_to_socket(unsigned cpu)
> +{
> +    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
> +        return INVALID_SOCKETID;
> +    return cpu_data[cpu].phys_proc_id;
> +}

Apart from them needing to be static, I don't think we want the
extra conditionals in x86 code. Hence I think you rather should
introduce wrappers for the specific us in sysctl.c.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 15:27     ` Jan Beulich
@ 2015-03-23 15:33       ` Boris Ostrovsky
  2015-03-23 15:46         ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Boris Ostrovsky @ 2015-03-23 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, Julien Grall, ian.jackson,
	xen-devel, keir

On 03/23/2015 11:27 AM, Jan Beulich wrote:
>>>> On 23.03.15 at 14:47, <boris.ostrovsky@oracle.com> wrote:
>> How about this (only x86 compile-tested). And perhaps, while at it, fix
>> types for cpu_core_id and phys_proc_id.
>>
>> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
>> index c73dfc9..b319be7 100644
>> --- a/xen/common/sysctl.c
>> +++ b/xen/common/sysctl.c
>> @@ -354,10 +354,10 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>                    if ( cpu_present(i) )
>>                    {
>>                        cputopo.core = cpu_to_core(i);
>> -                    if ( cputopo.core == BAD_APICID )
>> +                    if ( cputopo.core == INVALID_COREID )
>>                            cputopo.core = XEN_INVALID_CORE_ID;
>>                        cputopo.socket = cpu_to_socket(i);
>> -                    if ( cputopo.socket == BAD_APICID )
>> +                    if ( cputopo.socket == INVALID_SOCKETID )
>>                            cputopo.socket = XEN_INVALID_SOCKET_ID;
>
> Why not use XEN_INVALID_CORE_ID / XEN_INVALID_SOCKET_ID
> for those return values right away?
>
>> --- a/xen/include/asm-x86/processor.h
>> +++ b/xen/include/asm-x86/processor.h
>> @@ -214,8 +214,19 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
>>
>>    extern void detect_ht(struct cpuinfo_x86 *c);
>>
>> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
>> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
>> +inline int cpu_to_core(unsigned cpu)
>> +{
>> +    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
>> +        return INVALID_COREID;
>> +    return cpu_data[cpu].cpu_core_id;
>> +}
>> +
>> +inline int cpu_to_socket(unsigned cpu)
>> +{
>> +    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
>> +        return INVALID_SOCKETID;
>> +    return cpu_data[cpu].phys_proc_id;
>> +}
>
> Apart from them needing to be static, I don't think we want the
> extra conditionals in x86 code. Hence I think you rather should
> introduce wrappers for the specific us in sysctl.c.


If we use XEN_INVALID_CORE_ID/XEN_INVALID_SOCKET_ID in 
cpu_data.cpu_core_id/phys_proc_id then no conditionals would be needed.

(I didn't do it in the above patch because these two fields are 
currently signed. I'll make them unsigned).

-boris

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)
  2015-03-23 15:33       ` Boris Ostrovsky
@ 2015-03-23 15:46         ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2015-03-23 15:46 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: elena.ufimtseva, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, dario.faggioli, Julien Grall, ian.jackson,
	xen-devel, keir

>>> On 23.03.15 at 16:33, <boris.ostrovsky@oracle.com> wrote:
> On 03/23/2015 11:27 AM, Jan Beulich wrote:
>>>>> On 23.03.15 at 14:47, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -214,8 +214,19 @@ extern void detect_extended_topology(struct cpuinfo_x86 *c);
>>>
>>>    extern void detect_ht(struct cpuinfo_x86 *c);
>>>
>>> -#define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
>>> -#define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
>>> +inline int cpu_to_core(unsigned cpu)
>>> +{
>>> +    if ( cpu_data[cpu].cpu_core_id == BAD_APICID )
>>> +        return INVALID_COREID;
>>> +    return cpu_data[cpu].cpu_core_id;
>>> +}
>>> +
>>> +inline int cpu_to_socket(unsigned cpu)
>>> +{
>>> +    if ( cpu_data[cpu].phys_proc_id == BAD_APICID )
>>> +        return INVALID_SOCKETID;
>>> +    return cpu_data[cpu].phys_proc_id;
>>> +}
>>
>> Apart from them needing to be static, I don't think we want the
>> extra conditionals in x86 code. Hence I think you rather should
>> introduce wrappers for the specific us in sysctl.c.
> 
> 
> If we use XEN_INVALID_CORE_ID/XEN_INVALID_SOCKET_ID in 
> cpu_data.cpu_core_id/phys_proc_id then no conditionals would be needed.
> 
> (I didn't do it in the above patch because these two fields are 
> currently signed. I'll make them unsigned).

Right - that's not where I meant them to be used; I suggested
having the inline wrappers use those.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-03-19 21:53 ` [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
  2015-03-20 13:25   ` Ian Campbell
@ 2015-03-25 16:13   ` Andrew Cooper
  1 sibling, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2015-03-25 16:13 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, dario.faggioli, elena.ufimtseva,
	xen-devel

On 19/03/15 22:53, Boris Ostrovsky wrote:
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -324,39 +324,63 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>       }
>       break;
>   
> -    case XEN_SYSCTL_topologyinfo:
> +    case XEN_SYSCTL_cputopoinfo:
>       {
> -        uint32_t i, max_cpu_index, last_online_cpu;
> -        xen_sysctl_topologyinfo_t *ti = &op->u.topologyinfo;
> +        uint32_t i, num_cpus;
> +        xen_sysctl_cputopoinfo_t *ti = &op->u.cputopoinfo;
>   
> -        last_online_cpu = cpumask_last(&cpu_online_map);
> -        max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
> -        ti->max_cpu_index = last_online_cpu;
> -
> -        for ( i = 0; i <= max_cpu_index; i++ )
> +        num_cpus = cpumask_last(&cpu_online_map) + 1;
> +        if ( !guest_handle_is_null(ti->cputopo) )
>           {
> -            if ( !guest_handle_is_null(ti->cpu_to_core) )
> +            if ( ti->num_cpus < num_cpus )
>               {
> -                uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u;
> -                if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) )
> -                    break;
> +                ret = -ENOBUFS;
> +                i = num_cpus;
>               }
> -            if ( !guest_handle_is_null(ti->cpu_to_socket) )
> +
> +            for ( i = 0; i < num_cpus; i++ )

Observe that the "i = 0" clobbers the -ENOBUFS detection, meaning that 
Xen will always write num_cpus into the userspace array, writing past 
the end of the array if it is too short.

As this patch has already been committed, please fix as a matter of 
priority (or I can if you are overly busy).

~Andrew

(Also, you have introduced a mixed tab/space into 
tools/python/xen/lowlevel/xc/xc.c on the "goto out;" line)

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-03-25 16:13 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-19 21:53 [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
2015-03-19 21:53 ` [PATCH v5 1/8] numa: __node_distance() should return u8 Boris Ostrovsky
2015-03-20 14:25   ` Konrad Rzeszutek Wilk
2015-03-19 21:53 ` [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev Boris Ostrovsky
2015-03-20 14:01   ` Konrad Rzeszutek Wilk
2015-03-19 21:53 ` [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
2015-03-20 13:25   ` Ian Campbell
2015-03-20 14:00     ` Boris Ostrovsky
2015-03-20 14:10       ` Ian Campbell
2015-03-25 16:13   ` Andrew Cooper
2015-03-19 21:54 ` [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo " Boris Ostrovsky
2015-03-20 13:26   ` Ian Campbell
2015-03-20 16:15   ` Jan Beulich
2015-03-19 21:54 ` [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
2015-03-20 14:21   ` Konrad Rzeszutek Wilk
2015-03-20 16:26   ` Jan Beulich
2015-03-20 20:01     ` Boris Ostrovsky
2015-03-23  8:15       ` Jan Beulich
2015-03-23 13:58         ` Boris Ostrovsky
2015-03-19 21:54 ` [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
2015-03-20 13:54   ` Ian Campbell
2015-03-20 14:24     ` Boris Ostrovsky
2015-03-20 14:30       ` Ian Campbell
2015-03-19 21:54 ` [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
2015-03-20 13:49   ` Konrad Rzeszutek Wilk
2015-03-20 14:10     ` Boris Ostrovsky
2015-03-20 14:21       ` Ian Campbell
2015-03-20 13:56   ` Ian Campbell
2015-03-20 14:31     ` Boris Ostrovsky
2015-03-20 14:46       ` Ian Campbell
2015-03-20 14:59         ` Boris Ostrovsky
2015-03-19 21:54 ` [PATCH v5 8/8] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
2015-03-23 12:42 ` [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup) Julien Grall
2015-03-23 13:47   ` Boris Ostrovsky
2015-03-23 14:30     ` George Dunlap
2015-03-23 14:33       ` Boris Ostrovsky
2015-03-23 14:42       ` Ian Campbell
2015-03-23 15:26     ` Julien Grall
2015-03-23 15:27     ` Jan Beulich
2015-03-23 15:33       ` Boris Ostrovsky
2015-03-23 15:46         ` Jan Beulich
2015-03-23 14:30   ` Jan Beulich
2015-03-23 14:38     ` George Dunlap
2015-03-23 15:17       ` Ian Campbell
2015-03-23 15:21       ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.