[RFC][PATCH] Per-cpu xentrace buffers

* [RFC][PATCH] Per-cpu xentrace buffers
@ 2010-01-07 15:13 George Dunlap
  2010-01-20 17:38 ` George Dunlap
  0 siblings, 1 reply; 5+ messages in thread
From: George Dunlap @ 2010-01-07 15:13 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3624 bytes --]

In the current xentrace configuration, xentrace buffers are all
allocated in a single contiguous chunk, and then divided among logical
cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
limited, in my experience about 128 pages (512KiB).  As the number of
logical cpus increases, this means a much smaller maximum trace buffer
per cpu; on my dual-socket quad-core Nehalem box with hyperthreading
(16 logical cpus), that comes to 8 pages per logical cpu.
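
The arithmetic behind that 8-page figure can be sketched as below.  This
is only an illustration, assuming 4KiB pages; pages_per_cpu() and
bytes_per_cpu() are hypothetical helper names, not part of xentrace.

```c
#include <assert.h>
#include <stddef.h>

/* Page size assumed to be 4KiB, as on x86. */
#define PAGE_SIZE_BYTES 4096u

/*
 * Pages available to each cpu when one contiguous allocation of
 * chunk_pages is divided evenly among nr_cpus logical cpus.
 */
static unsigned int pages_per_cpu(unsigned int chunk_pages,
                                  unsigned int nr_cpus)
{
    assert(nr_cpus > 0);
    return chunk_pages / nr_cpus;
}

/* The same quantity expressed in bytes. */
static size_t bytes_per_cpu(unsigned int chunk_pages, unsigned int nr_cpus)
{
    return (size_t)pages_per_cpu(chunk_pages, nr_cpus) * PAGE_SIZE_BYTES;
}
```

With a 128-page chunk and 16 logical cpus, pages_per_cpu(128, 16) gives
8 pages, i.e. 32KiB of trace buffer per cpu.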

The attached patch addresses this issue by allocating per-cpu buffers
separately.  This allows larger trace buffers; however, it requires an
interface change to xentrace, which is why I'm making a Request For
Comments.  (I'm not expecting this patch to be included in the 4.0
release.)

The old interface to get trace buffers was fairly simple: you ask for
the info, and it gives you:
* the mfn of the first page in the buffer allocation
* the total size of the trace buffer

The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
buffers were, and went on to consume records from them.
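
As a rough sketch of that calculation, mirroring the arithmetic in the
init_bufs_ptrs() function that the patch removes: each per-cpu buffer
starts at a fixed multiple of the per-cpu size from the base of the
mapping.  old_cpu_buffer() is a hypothetical name used only here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Old layout: one contiguous mapping holding one buffer of `size`
 * bytes per cpu.  Returns the start of the buffer belonging to `cpu`.
 */
static unsigned char *old_cpu_buffer(unsigned char *mapped_base,
                                     size_t size, unsigned int cpu)
{
    assert(mapped_base != NULL);
    return mapped_base + size * cpu;
}
```

This only works because the whole allocation is contiguous, which is
exactly the constraint the new interface removes.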

-- Interface --

The proposed interface works as follows.

* XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
changes to the library).  However, these now describe a trace buffer
info area (t_info), allocated once at boot time.  The trace buffer
info area contains mfns of the per-pcpu buffers.
* The t_info struct contains an array of "offset pointers", one per
pcpu.  Each is an offset (in units of uint32_t) into the t_info data
area of the array of mfns for that pcpu.  So logically, the layout
looks like this:
struct {
 int16_t tbuf_size; /* Number of pages per cpu */
 int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
 uint32_t mfn[NR_CPUS][TBUF_SIZE];
};

So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
struct {
 int16_t tbuf_size; /* Number of pages per cpu */
 int16_t offset[16]; /* Offset into the t_info area of the array */
 uint32_t p0_mfn_list[32];
 uint32_t p1_mfn_list[32];
  ...
 uint32_t p15_mfn_list[32];
};
* So the new way to map trace buffers is as follows:
 + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map it.
 + Get the number of cpus
 + For each cpu:
  - Calculate the offset into the t_info area thus: uint32_t
*mfn_list = ((uint32_t *)t_info) + t_info->mfn_offset[cpu]
  - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
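
The per-cpu lookup in the steps above can be sketched without any Xen
calls.  The struct matches the t_info declaration in the attached patch;
cpu_mfn_list() and copy_cpu_mfns() are hypothetical helpers, and
unsigned long merely stands in for xen_pfn_t.  The actual mapping call
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header of the t_info area, as declared in the patch's trace.h. */
struct t_info {
    uint16_t tbuf_size;     /* size of each trace buffer, in pages */
    uint16_t mfn_offset[];  /* per-cpu list offset, in uint32_t units */
};

/*
 * Locate the MFN list for one cpu.  Offsets count 32-bit words from
 * the start of the t_info area, so the area is indexed as uint32_t.
 */
static const uint32_t *cpu_mfn_list(const struct t_info *t_info,
                                    unsigned int cpu)
{
    assert(t_info != NULL);
    return (const uint32_t *)t_info + t_info->mfn_offset[cpu];
}

/*
 * Widen one cpu's MFNs into the frame-number type a mapping call would
 * take.  Copies at most out_len entries; returns the number copied.
 */
static size_t copy_cpu_mfns(const struct t_info *t_info, unsigned int cpu,
                            unsigned long *out, size_t out_len)
{
    const uint32_t *list = cpu_mfn_list(t_info, cpu);
    size_t i, n = t_info->tbuf_size;

    if (n > out_len)
        n = out_len;
    for (i = 0; i < n; i++)
        out[i] = list[i];
    return n;
}
```

A consumer would call copy_cpu_mfns() once per cpu and hand the
resulting array, with tbuf_size entries, to the batch mapping call.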

In the current implementation, the t_info size is fixed at 2 pages,
allowing about 2000 pages total to be mapped.  For a 32-way system,
this would allow up to 63 pages per cpu (252KiB).  Bumping this up to
4 pages would allow even larger systems if required.
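
The capacity figure can be checked with a short calculation.  This is a
sketch under stated assumptions: 4KiB pages, one uint32_t per MFN, and a
header of one uint16_t size field plus one uint16_t offset per cpu slot,
rounded up to a word boundary.  header_words() and max_pages_per_cpu()
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

/*
 * 32-bit words occupied by the header: tbuf_size plus one uint16_t
 * offset per cpu slot, rounded up so MFN lists start word-aligned.
 */
static size_t header_words(unsigned int nr_cpu_slots)
{
    size_t bytes = sizeof(uint16_t) * ((size_t)nr_cpu_slots + 1);

    return (bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}

/* Largest per-cpu buffer, in pages, whose MFN lists fit in the area. */
static size_t max_pages_per_cpu(unsigned int t_info_pages,
                                unsigned int nr_cpu_slots,
                                unsigned int nr_cpus)
{
    size_t total_words =
        (size_t)t_info_pages * PAGE_SIZE_BYTES / sizeof(uint32_t);

    assert(nr_cpus > 0 && nr_cpus <= nr_cpu_slots);
    assert(total_words >= header_words(nr_cpu_slots));
    return (total_words - header_words(nr_cpu_slots)) / nr_cpus;
}
```

For 2 pages and 32 cpus this gives 63 pages per cpu, i.e. 252KiB each
and 2016 pages (a little under 8MiB) in total; with 4 pages the same
system could describe 127 pages per cpu.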

The current implementation also allocates each trace buffer
contiguously, since that's the easiest way to get contiguous virtual
address space.  But this interface allows Xen the flexibility, in the
future, to allocate buffers in several chunks if necessary, without
having to change the interface again.

-- Implementation notes --

The t_info area is allocated once at boot.  Trace buffers are
allocated either at boot (if a parameter is passed) or when
TBUFOP_set_size is called.  Due to the complexity of tracking pages
mapped by dom0, unmapping or resizing trace buffers is not supported.

I introduced a new per-cpu spinlock guarding trace data and buffers.
This allows per-cpu data to be safely accessed and modified without
racing with concurrent tracing events.  The per-cpu spinlock is grabbed
whenever a trace event is generated; but in the (very very very)
common case, the lock should be in the cache already.
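
The locking pattern can be illustrated in plain C11.  This is a
user-space sketch only: an atomic_flag stands in for Xen's spinlock,
interrupts are not disabled as spin_lock_irqsave() would do, and all of
the names (struct cpu_trace, cpu_trace_emit(), and so on) are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Per-cpu trace state; the flag stands in for the per-cpu t_lock. */
struct cpu_trace {
    atomic_flag lock;
    unsigned char *buf;   /* NULL until a buffer has been installed */
    size_t size;          /* capacity of buf in bytes */
    size_t prod;          /* next free byte in buf */
};

static void cpu_trace_lock(struct cpu_trace *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void cpu_trace_unlock(struct cpu_trace *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void cpu_trace_init(struct cpu_trace *t)
{
    atomic_flag_clear(&t->lock);
    t->buf = NULL;
    t->size = t->prod = 0;
}

/* Install a buffer; safe against a concurrent cpu_trace_emit(). */
static void cpu_trace_install(struct cpu_trace *t, unsigned char *buf,
                              size_t size)
{
    assert(buf != NULL);
    cpu_trace_lock(t);
    t->buf = buf;
    t->size = size;
    t->prod = 0;
    cpu_trace_unlock(t);
}

/* Record one byte.  Returns 1 if stored, 0 if no buffer or no room. */
static int cpu_trace_emit(struct cpu_trace *t, unsigned char byte)
{
    int stored = 0;

    cpu_trace_lock(t);
    if (t->buf != NULL && t->prod < t->size) {
        t->buf[t->prod++] = byte;
        stored = 1;
    }
    cpu_trace_unlock(t);
    return stored;
}
```

Because the buffer pointer is only read and written under the lock, an
event generated before the buffer exists is simply dropped, which
corresponds to the !buf check the patch adds to the trace path.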

Feedback welcome.

 -George

[-- Attachment #2: 20100106-unstable-xentrace-interface.diff --]
[-- Type: text/x-patch, Size: 14817 bytes --]

diff -r a1d0a575b4ba tools/xentrace/xentrace.c
--- a/tools/xentrace/xentrace.c	Wed Jan 06 10:13:55 2010 +0000
+++ b/tools/xentrace/xentrace.c	Wed Jan 06 18:14:16 2010 +0000
@@ -61,6 +61,12 @@
         disable_tracing:1;
 } settings_t;
 
+struct t_struct {
+    struct t_info *t_info;  /* Structure with information about individual buffers */
+    struct t_buf **meta;    /* Pointers to trace buffer metadata */
+    unsigned char **data;   /* Pointers to trace buffer data areas */
+};
+
 settings_t opts;
 
 int interrupted = 0; /* gets set if we get a SIGHUP */
@@ -446,22 +452,61 @@
  *
  * Maps the Xen trace buffers them into process address space.
  */
-static struct t_buf *map_tbufs(unsigned long tbufs_mfn, unsigned int num,
-                        unsigned long size)
+static struct t_struct *map_tbufs(unsigned long tbufs_mfn, unsigned int num,
+                                  unsigned long tinfo_size)
 {
-    struct t_buf *tbufs_mapped;
+    static struct t_struct tbufs = { 0 };
+    int i;
 
-    tbufs_mapped = xc_map_foreign_range(xc_handle, DOMID_XEN,
-                                        size * num, PROT_READ | PROT_WRITE,
+    /* Map t_info metadata structure */
+    tbufs.t_info = xc_map_foreign_range(xc_handle, DOMID_XEN,
+                                        tinfo_size, PROT_READ | PROT_WRITE,
                                         tbufs_mfn);
 
-    if ( tbufs_mapped == 0 ) 
+    if ( tbufs.t_info == 0 ) 
     {
         PERROR("Failed to mmap trace buffers");
         exit(EXIT_FAILURE);
     }
 
-    return tbufs_mapped;
+    if ( tbufs.t_info->tbuf_size == 0 )
+    {
+        fprintf(stderr, "%s: tbuf_size 0!\n", __func__);
+        exit(EXIT_FAILURE);
+    }
+
+    /* Map per-cpu buffers */
+    tbufs.meta = (struct t_buf **)calloc(num, sizeof(struct t_buf *));
+    tbufs.data = (unsigned char **)calloc(num, sizeof(unsigned char *));
+    if ( tbufs.meta == NULL || tbufs.data == NULL )
+    {
+        PERROR( "Failed to allocate memory for buffer pointers\n");
+        exit(EXIT_FAILURE);
+    }
+
+    for(i=0; i<num; i++)
+    {
+        
+        uint32_t *mfn_list = ((uint32_t *)tbufs.t_info) + tbufs.t_info->mfn_offset[i];
+        int j;
+        xen_pfn_t pfn_list[tbufs.t_info->tbuf_size];
+
+        for ( j=0; j<tbufs.t_info->tbuf_size; j++)
+            pfn_list[j] = (xen_pfn_t)mfn_list[j];
+
+        tbufs.meta[i] = xc_map_foreign_batch(xc_handle, DOMID_XEN,
+                                             PROT_READ | PROT_WRITE,
+                                             pfn_list,
+                                             tbufs.t_info->tbuf_size);
+        if ( tbufs.meta[i] == NULL )
+        {
+            PERROR("Failed to map cpu buffer!");
+            exit(EXIT_FAILURE);
+        }
+        tbufs.data[i] = (unsigned char *)(tbufs.meta[i]+1);
+    }
+
+    return &tbufs;
 }
 
 /**
@@ -490,66 +535,6 @@
 }
 
 /**
- * init_bufs_ptrs - initialises an array of pointers to the trace buffers
- * @bufs_mapped:    the userspace address where the trace buffers are mapped
- * @num:            number of trace buffers
- * @size:           trace buffer size
- *
- * Initialises an array of pointers to individual trace buffers within the
- * mapped region containing all trace buffers.
- */
-static struct t_buf **init_bufs_ptrs(void *bufs_mapped, unsigned int num,
-                              unsigned long size)
-{
-    int i;
-    struct t_buf **user_ptrs;
-
-    user_ptrs = (struct t_buf **)calloc(num, sizeof(struct t_buf *));
-    if ( user_ptrs == NULL )
-    {
-        PERROR( "Failed to allocate memory for buffer pointers\n");
-        exit(EXIT_FAILURE);
-    }
-    
-    /* initialise pointers to the trace buffers - given the size of a trace
-     * buffer and the value of bufs_maped, we can easily calculate these */
-    for ( i = 0; i<num; i++ )
-        user_ptrs[i] = (struct t_buf *)((unsigned long)bufs_mapped + size * i);
-
-    return user_ptrs;
-}
-
-
-/**
- * init_rec_ptrs - initialises data area pointers to locations in user space
- * @tbufs_mfn:     base mfn of the trace buffer area
- * @tbufs_mapped:  user virtual address of base of trace buffer area
- * @meta:          array of user-space pointers to struct t_buf's of metadata
- * @num:           number of trace buffers
- *
- * Initialises data area pointers to the locations that data areas have been
- * mapped in user space.  Note that the trace buffer metadata contains machine
- * pointers - the array returned allows more convenient access to them.
- */
-static unsigned char **init_rec_ptrs(struct t_buf **meta, unsigned int num)
-{
-    int i;
-    unsigned char **data;
-    
-    data = calloc(num, sizeof(unsigned char *));
-    if ( data == NULL )
-    {
-        PERROR("Failed to allocate memory for data pointers\n");
-        exit(EXIT_FAILURE);
-    }
-
-    for ( i = 0; i < num; i++ )
-        data[i] = (unsigned char *)(meta[i] + 1);
-
-    return data;
-}
-
-/**
  * get_num_cpus - get the number of logical CPUs
  */
 static unsigned int get_num_cpus(void)
@@ -638,12 +623,13 @@
 {
     int i;
 
-    void *tbufs_mapped;          /* pointer to where the tbufs are mapped    */
+    struct t_struct *tbufs;      /* Pointer to hypervisor maps */
     struct t_buf **meta;         /* pointers to the trace buffer metadata    */
     unsigned char **data;        /* pointers to the trace buffer data areas
                                   * where they are mapped into user space.   */
     unsigned long tbufs_mfn;     /* mfn of the tbufs                         */
     unsigned int  num;           /* number of trace buffers / logical CPUS   */
+    unsigned long tinfo_size;    /* size of t_info metadata map */
     unsigned long size;          /* size of a single trace buffer            */
 
     unsigned long data_size;
@@ -655,14 +641,15 @@
     num = get_num_cpus();
 
     /* setup access to trace buffers */
-    get_tbufs(&tbufs_mfn, &size);
-    tbufs_mapped = map_tbufs(tbufs_mfn, num, size);
+    get_tbufs(&tbufs_mfn, &tinfo_size);
+    tbufs = map_tbufs(tbufs_mfn, num, tinfo_size);
+
+    size = tbufs->t_info->tbuf_size * PAGE_SIZE;
 
     data_size = size - sizeof(struct t_buf);
 
-    /* build arrays of convenience ptrs */
-    meta  = init_bufs_ptrs(tbufs_mapped, num, size);
-    data  = init_rec_ptrs(meta, num);
+    meta = tbufs->meta;
+    data = tbufs->data;
 
     if ( opts.discard )
         for ( i = 0; i < num; i++ )
diff -r a1d0a575b4ba xen/common/trace.c
--- a/xen/common/trace.c	Wed Jan 06 10:13:55 2010 +0000
+++ b/xen/common/trace.c	Wed Jan 06 18:14:16 2010 +0000
@@ -46,8 +46,11 @@
 integer_param("tbuf_size", opt_tbuf_size);
 
 /* Pointers to the meta-data objects for all system trace buffers */
+static struct t_info *t_info;
+#define T_INFO_PAGES 2  /* Size fixed at 2 pages for now. */
 static DEFINE_PER_CPU_READ_MOSTLY(struct t_buf *, t_bufs);
 static DEFINE_PER_CPU_READ_MOSTLY(unsigned char *, t_data);
+static DEFINE_PER_CPU_READ_MOSTLY(spinlock_t, t_lock);
 static int data_size;
 
 /* High water mark for trace buffers; */
@@ -80,41 +83,104 @@
  */
 static int alloc_trace_bufs(void)
 {
-    int           i, order;
+    int           i, cpu, order;
     unsigned long nr_pages;
-    char         *rawbuf;
-    struct t_buf *buf;
+    /* MFN lists start after the header, rounded up to a uint32_t boundary */
+    uint32_t *t_info_mfn_list = (uint32_t *)t_info;
+    int offset = (sizeof(uint16_t) * (NR_CPUS + 1) + 3) / sizeof(uint32_t);
 
     if ( opt_tbuf_size == 0 )
         return -EINVAL;
 
-    nr_pages = num_online_cpus() * opt_tbuf_size;
-    order    = get_order_from_pages(nr_pages);
-    data_size  = (opt_tbuf_size * PAGE_SIZE - sizeof(struct t_buf));
-    
-    if ( (rawbuf = alloc_xenheap_pages(order, 0)) == NULL )
+    if ( !t_info )
     {
-        printk("Xen trace buffers: memory allocation failed\n");
-        opt_tbuf_size = 0;
+        printk("%s: t_info not allocated, cannot allocate trace buffers!\n",
+               __func__);
         return -EINVAL;
     }
 
-    /* Share pages so that xentrace can map them. */
-    for ( i = 0; i < nr_pages; i++ )
-        share_xen_page_with_privileged_guests(
-            virt_to_page(rawbuf) + i, XENSHARE_writable);
+    t_info->tbuf_size = opt_tbuf_size;
+    printk("tbuf_size %d\n", t_info->tbuf_size);
 
-    for_each_online_cpu ( i )
+    nr_pages = opt_tbuf_size;
+    order = get_order_from_pages(nr_pages);
+
+    /*
+     * First, allocate buffers for all of the cpus.  If any
+     * fails, deallocate what you have so far and exit. 
+     */
+    for_each_online_cpu(cpu)
     {
-        buf = per_cpu(t_bufs, i) = (struct t_buf *)
-            &rawbuf[i*opt_tbuf_size*PAGE_SIZE];
+        unsigned long flags;
+        char         *rawbuf;
+        struct t_buf *buf;
+
+        if ( (rawbuf = alloc_xenheap_pages(order, 0)) == NULL )
+        {
+            printk("Xen trace buffers: memory allocation failed\n");
+            opt_tbuf_size = 0;
+            goto out_dealloc;
+        }
+
+        spin_lock_irqsave(&per_cpu(t_lock, cpu), flags);
+
+        buf = per_cpu(t_bufs, cpu) = (struct t_buf *)rawbuf;
         buf->cons = buf->prod = 0;
-        per_cpu(t_data, i) = (unsigned char *)(buf + 1);
+        per_cpu(t_data, cpu) = (unsigned char *)(buf + 1);
+
+        spin_unlock_irqrestore(&per_cpu(t_lock, cpu), flags);
+
     }
 
+    /*
+     * Now share the pages so xentrace can map them, and write them in
+     * the global t_info structure.
+     */
+    for_each_online_cpu(cpu)
+    {
+        /* Share pages so that xentrace can map them. */
+        char         *rawbuf;
+
+        if ( (rawbuf = (char *)per_cpu(t_bufs, cpu)) )
+        {
+            struct page_info *p = virt_to_page(rawbuf);
+            uint32_t mfn = virt_to_mfn(rawbuf);
+
+            for ( i = 0; i < nr_pages; i++ )
+            {
+                share_xen_page_with_privileged_guests(
+                    p + i, XENSHARE_writable);
+            
+                t_info_mfn_list[offset + i]=mfn + i;
+            }
+            /* Write list first, then write per-cpu offset. */
+            wmb();
+            t_info->mfn_offset[cpu]=offset;
+            printk("p%d mfn %"PRIx32" offset %d\n",
+                   cpu, mfn, offset);
+            offset+=i;
+        }
+    }
+
+    data_size  = (opt_tbuf_size * PAGE_SIZE - sizeof(struct t_buf));
     t_buf_highwater = data_size >> 1; /* 50% high water */
 
     return 0;
+out_dealloc:
+    for_each_online_cpu(cpu)
+    {
+        unsigned long flags;
+        char * rawbuf;
+
+        spin_lock_irqsave(&per_cpu(t_lock, cpu), flags);
+        if ( (rawbuf = (char *)per_cpu(t_bufs, cpu)) )
+        {
+            ASSERT(!(virt_to_page(rawbuf)->count_info & PGC_allocated));
+            free_xenheap_pages(rawbuf, order);
+        }
+        spin_unlock_irqrestore(&per_cpu(t_lock, cpu), flags);
+    }
+    return -EINVAL;
 }
 
 
@@ -181,6 +247,26 @@
  */
 void __init init_trace_bufs(void)
 {
+    int i;
+    /* t_info size fixed at 2 pages for now.  That should be big enough / small enough
+     * until it's worth making it dynamic. */
+    t_info = alloc_xenheap_pages(1, 0);
+
+    if ( t_info == NULL )
+    {
+        printk("Xen trace buffers: t_info allocation failed!  Tracing disabled.\n");
+        return;
+    }
+
+    for(i = 0; i < NR_CPUS; i++)
+        spin_lock_init(&per_cpu(t_lock, i));
+
+    for(i=0; i<T_INFO_PAGES; i++)
+        share_xen_page_with_privileged_guests(
+            virt_to_page(t_info) + i, XENSHARE_writable);
+
+
+
     if ( opt_tbuf_size == 0 )
     {
         printk("Xen trace buffers: disabled\n");
@@ -210,8 +296,8 @@
     {
     case XEN_SYSCTL_TBUFOP_get_info:
         tbc->evt_mask   = tb_event_mask;
-        tbc->buffer_mfn = opt_tbuf_size ? virt_to_mfn(per_cpu(t_bufs, 0)) : 0;
-        tbc->size       = opt_tbuf_size * PAGE_SIZE;
+        tbc->buffer_mfn = t_info ? virt_to_mfn(t_info) : 0;
+        tbc->size = T_INFO_PAGES * PAGE_SIZE;
         break;
     case XEN_SYSCTL_TBUFOP_set_cpu_mask:
         xenctl_cpumap_to_cpumask(&tb_cpu_mask, &tbc->cpu_mask);
@@ -220,7 +306,7 @@
         tb_event_mask = tbc->evt_mask;
         break;
     case XEN_SYSCTL_TBUFOP_set_size:
-        rc = !tb_init_done ? tb_set_size(tbc->size) : -EINVAL;
+        rc = tb_set_size(tbc->size);
         break;
     case XEN_SYSCTL_TBUFOP_enable:
         /* Enable trace buffers. Check buffers are already allocated. */
@@ -428,7 +514,7 @@
     unsigned long flags, bytes_to_tail, bytes_to_wrap;
     int rec_size, total_size;
     int extra_word;
-    int started_below_highwater;
+    int started_below_highwater = 0;
 
     if( !tb_init_done )
         return;
@@ -462,9 +548,12 @@
     /* Read tb_init_done /before/ t_bufs. */
     rmb();
 
+    spin_lock_irqsave(&this_cpu(t_lock), flags);
+
     buf = this_cpu(t_bufs);
 
-    local_irq_save(flags);
+    if ( unlikely(!buf) )
+        goto unlock;
 
     started_below_highwater = (calc_unconsumed_bytes(buf) < t_buf_highwater);
 
@@ -511,8 +600,8 @@
     {
         if ( ++this_cpu(lost_records) == 1 )
             this_cpu(lost_records_first_tsc)=(u64)get_cycles();
-        local_irq_restore(flags);
-        return;
+        started_below_highwater = 0;
+        goto unlock;
     }
 
     /*
@@ -541,7 +630,8 @@
     /* Write the original record */
     __insert_record(buf, event, extra, cycles, rec_size, extra_data);
 
-    local_irq_restore(flags);
+unlock:
+    spin_unlock_irqrestore(&this_cpu(t_lock), flags);
 
     /* Notify trace buffer consumer that we've crossed the high water mark. */
     if ( started_below_highwater &&
diff -r a1d0a575b4ba xen/include/public/sysctl.h
--- a/xen/include/public/sysctl.h	Wed Jan 06 10:13:55 2010 +0000
+++ b/xen/include/public/sysctl.h	Wed Jan 06 18:14:16 2010 +0000
@@ -75,7 +75,7 @@
     uint32_t             evt_mask;
     /* OUT variables */
     uint64_aligned_t buffer_mfn;
-    uint32_t size;
+    uint32_t size;  /* Also an IN variable! */
 };
 typedef struct xen_sysctl_tbuf_op xen_sysctl_tbuf_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tbuf_op_t);
diff -r a1d0a575b4ba xen/include/public/trace.h
--- a/xen/include/public/trace.h	Wed Jan 06 10:13:55 2010 +0000
+++ b/xen/include/public/trace.h	Wed Jan 06 18:14:16 2010 +0000
@@ -195,6 +195,16 @@
     /*  Records follow immediately after the meta-data header.    */
 };
 
+/* Structure used to pass MFNs to the trace buffers back to trace consumers.
+ * Offset is an offset into the mapped structure where the mfn list will be held.
+ * MFNs will be at ((uint32_t *)(t_info)) + t_info->mfn_offset[cpu].
+ */
+struct t_info {
+    uint16_t tbuf_size; /* Size in pages of each trace buffer */
+    uint16_t mfn_offset[];  /* Offset within t_info structure of the page list per cpu */
+    /* MFN lists immediately after the header */
+};
+
 #endif /* __XEN_PUBLIC_TRACE_H__ */
 
 /*

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: [RFC][PATCH] Per-cpu xentrace buffers
  2010-01-07 15:13 [RFC][PATCH] Per-cpu xentrace buffers George Dunlap
@ 2010-01-20 17:38 ` George Dunlap
  2010-01-20 17:50   ` Keir Fraser
  0 siblings, 1 reply; 5+ messages in thread
From: George Dunlap @ 2010-01-20 17:38 UTC (permalink / raw)
  To: xen-devel, Keir Fraser

Keir, would you mind commenting on this new design in the next few
days?  If it looks like a good design, I'd like to do some more
testing and get this into our next XenServer release.

 -George

* Re: [RFC][PATCH] Per-cpu xentrace buffers
  2010-01-20 17:38 ` George Dunlap
@ 2010-01-20 17:50   ` Keir Fraser
  2010-01-20 18:06     ` George Dunlap
  0 siblings, 1 reply; 5+ messages in thread
From: Keir Fraser @ 2010-01-20 17:50 UTC (permalink / raw)
  To: George Dunlap, xen-devel@lists.xensource.com

Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
actually plenty is going in for rc2. What do you think?

 -- Keir

On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:

> Keir, would you mind commenting on this new design in the next few
> days?  If it looks like a good design, I'd like to do some more
> testing and get this into our next XenServer release.
> 
>  -George

* Re: [RFC][PATCH] Per-cpu xentrace buffers
  2010-01-20 17:50   ` Keir Fraser
@ 2010-01-20 18:06     ` George Dunlap
  2010-01-20 18:34       ` Keir Fraser
  0 siblings, 1 reply; 5+ messages in thread
From: George Dunlap @ 2010-01-20 18:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com

How long between rc2 and expected release (if no other candidates are 
considered)?  It's more of a debugging feature, so it's not going to 
screw over any production systems if it's got some subtle bugs.  (The 
"tb_init_done" flag that turns it on or off is exactly the same.)  I 
could try to put it through its paces this week and early next week, and 
if nothing turns up, it's probably fine to go in.

It will definitely require a tools rebuild if anyone's using xentrace, 
which people may not expect. :-)

 -George

Keir Fraser wrote:
> Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
> actually plenty is going in for rc2. What do you think?
>
>  -- Keir
>
> On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>
>   
>> Keir, would you mind commenting on this new design in the next few
>> days?  If it looks like a good design, I'd like to do some more
>> testing and get this into our next XenServer release.
>>
>>  -George
>>
>> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@umich.edu> wrote:
>>     
>>> In the current xentrace configuration, xentrace buffers are all
>>> allocated in a single contiguous chunk, and then divided among logical
>>> cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
>>> limited, in my experience about 128 pages (512KiB).  As the number of
>>> logical cores increase, this means a much smaller maximum per-cpu
>>> trace buffer per cpu; on my dual-socket quad-core nehalem box with
>>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>>> cpu.
>>>
>>> The attached patch addresses this issue by allocating per-cpu buffers
>>> separately.  This allows larger trace buffers; however, it requires an
>>> interface change to xentrace, which is why I'm making a Request For
>>> Comments.  (I'm not expecting this patch to be included in the 4.0
>>> release.)
>>>
>>> The old interface to get trace buffers was fairly simple: you ask for
>>> the info, and it gives you:
>>> * the mfn of the first page in the buffer allocation
>>> * the total size of the trace buffer
>>>
>>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>>> buffers were, and went on to consume records from them.
>>>
>>> -- Interface --
>>>
>>> The proposed interface works as follows.
>>>
>>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>>> changes to the library).  However, this new are is to a trace buffer
>>> info area  (t_info), allocated once at boot time.  The trace buffer
>>> info area contains mfns of the per-pcpu buffers.
>>> * The t_info struct contains an array of "offset pointers", one per
>>> pcpu.  These are an offset into the t_info data area of an array of
>>> mfns for that pcpu.  So logically, the layout looks like this:
>>> struct {
>>>  int16_t tbuf_size; /* Number of pages per cpu */
>>>  int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>>  uint32_t mfn[NR_CPUS][TBUF_SIZE];
>>> };
>>>
>>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>>> struct {
>>>  int16_t tbuf_size; /* Number of pages per cpu */
>>>  int16_t offset[16]; /* Offset into the t_info area of the array */
>>>  uint32_t p0_mfn_list[32];
>>>  uint32_t p1_mfn_list[32];
>>>  ...
>>>  uint32_t p15_mfn_list[32];
>>> };
>>> * So the new way to map trace buffers is as follows:
>>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map
>>> it.
>>>  + Get the number of cpus
>>>  + For each cpu:
>>>    - Calculate the offset into the t_info area thus:
>>>      uint32_t *mfn_list = ((uint32_t *)t_info) + t_info->offset[cpu];
>>>    - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
>>>
>>> In the current implementation, the t_info size is fixed at 2 pages,
>>> allowing about 2000 pages total to be mapped.  For a 32-way system,
>>> this would allow up to 63 pages per cpu (252KiB).  Bumping this up to
>>> 4 pages would allow even larger systems if required.
>>>
>>> The current implementation also allocates each trace buffer
>>> contiguously, since that's the easiest way to get contiguous virtual
>>> address space.  But this interface allows Xen the flexibility, in the
>>> future, to allocate buffers in several chunks if necessary, without
>>> having to change the interface again.
>>>
>>> -- Implementation notes --
>>>
>>> The t_info area is allocated once at boot.  Trace buffers are
>>> allocated either at boot (if a parameter is passed) or when
>>> TBUFOP_set_size is called.  Due to the complexity of tracking pages
>>> mapped by dom0, unmapping or resizing trace buffers is not supported.
>>>
>>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>>> This allows per-cpu data to be safely accessed and modified without
>>> racing with concurrent tracing events.  The per-cpu spinlock is grabbed
>>> whenever a trace event is generated; but in the (very very very)
>>> common case, the lock should be in the cache already.
>>>
>>> Feedback welcome.
>>>
>>>  -George
>>>
>>>       
>
>
>   

* Re: [RFC][PATCH] Per-cpu xentrace buffers
  2010-01-20 18:06     ` George Dunlap
@ 2010-01-20 18:34       ` Keir Fraser
  0 siblings, 0 replies; 5+ messages in thread
From: Keir Fraser @ 2010-01-20 18:34 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xensource.com

Final release is still a few weeks away. It should probably go in for rc2
then.

 -- Keir

On 20/01/2010 18:06, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:

> How long between rc2 and expected release (if no other candidates are
> considered)?  It's more of a debugging feature, so it's not going to
> screw over any production systems if it's got some subtle bugs.  (The
> "tb_init_done" flag that turns it on or off is exactly the same.)  I
> could try to put it through its paces this week and early next week, and
> if nothing turns up, it's probably fine to go in.
> 
> It will definitely require a tools rebuild if anyone's using xentrace,
> which people may not expect. :-)
> 
>  -George
> 
> Keir Fraser wrote:
>> Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
>> actually plenty is going in for rc2. What do you think?
>> 
>>  -- Keir
>> 
>> On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>> 
>>   
>>> Keir, would you mind commenting on this new design in the next few
>>> days?  If it looks like a good design, I'd like to do some more
>>> testing and get this into our next XenServer release.
>>> 
>>>  -George
>>> 
>>> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@umich.edu> wrote:
>>>     
>>>> In the current xentrace configuration, xentrace buffers are all
>>>> allocated in a single contiguous chunk, and then divided among logical
>>>> cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
>>>> limited, in my experience about 128 pages (512KiB).  As the number of
>>>> logical cores increases, this means a much smaller maximum trace
>>>> buffer per cpu; on my dual-socket quad-core Nehalem box with
>>>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>>>> cpu.
>>>> 
>>>> The attached patch addresses this issue by allocating per-cpu buffers
>>>> separately.  This allows larger trace buffers; however, it requires an
>>>> interface change to xentrace, which is why I'm making a Request For
>>>> Comments.  (I'm not expecting this patch to be included in the 4.0
>>>> release.)
>>>> 
>>>> The old interface to get trace buffers was fairly simple: you ask for
>>>> the info, and it gives you:
>>>> * the mfn of the first page in the buffer allocation
>>>> * the total size of the trace buffer
>>>> 
>>>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>>>> buffers were, and went on to consume records from them.
>>>> 
>>>> -- Interface --
>>>> 
>>>> The proposed interface works as follows.
>>>> 
>>>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>>>> changes to the library).  However, these now describe a trace buffer
>>>> info area (t_info), allocated once at boot time.  The trace buffer
>>>> info area contains mfns of the per-pcpu buffers.
>>>> * The t_info struct contains an array of "offset pointers", one per
>>>> pcpu.  Each is an offset into the t_info data area of the array of
>>>> mfns for that pcpu.  So logically, the layout looks like this:
>>>> struct {
>>>>  int16_t tbuf_size; /* Number of pages per cpu */
>>>>  int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>>>  uint32_t mfn[NR_CPUS][TBUF_SIZE];
>>>> };
>>>> 
>>>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>>>> struct {
>>>>  int16_t tbuf_size; /* Number of pages per cpu */
>>>>  int16_t offset[16]; /* Offset into the t_info area of the array */
>>>>  uint32_t p0_mfn_list[32];
>>>>  uint32_t p1_mfn_list[32];
>>>>  ...
>>>>  uint32_t p15_mfn_list[32];
>>>> };
>>>> * So the new way to map trace buffers is as follows:
>>>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map
>>>> it.
>>>>  + Get the number of cpus
>>>>  + For each cpu:
>>>>    - Calculate the offset into the t_info area thus:
>>>>      uint32_t *mfn_list = ((uint32_t *)t_info) + t_info->offset[cpu];
>>>>    - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
>>>> 
>>>> In the current implementation, the t_info size is fixed at 2 pages,
>>>> allowing about 2000 pages total to be mapped.  For a 32-way system,
>>>> this would allow up to 63 pages per cpu (252KiB).  Bumping this up to
>>>> 4 pages would allow even larger systems if required.
>>>> 
>>>> The current implementation also allocates each trace buffer
>>>> contiguously, since that's the easiest way to get contiguous virtual
>>>> address space.  But this interface allows Xen the flexibility, in the
>>>> future, to allocate buffers in several chunks if necessary, without
>>>> having to change the interface again.
>>>> 
>>>> -- Implementation notes --
>>>> 
>>>> The t_info area is allocated once at boot.  Trace buffers are
>>>> allocated either at boot (if a parameter is passed) or when
>>>> TBUFOP_set_size is called.  Due to the complexity of tracking pages
>>>> mapped by dom0, unmapping or resizing trace buffers is not supported.
>>>> 
>>>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>>>> This allows per-cpu data to be safely accessed and modified without
>>>> racing with concurrent tracing events.  The per-cpu spinlock is grabbed
>>>> whenever a trace event is generated; but in the (very very very)
>>>> common case, the lock should be in the cache already.
>>>> 
>>>> Feedback welcome.
>>>> 
>>>>  -George
>>>> 
>>>>       
>> 
>> 
>>   
> 

Thread overview: 5+ messages
2010-01-07 15:13 [RFC][PATCH] Per-cpu xentrace buffers George Dunlap
2010-01-20 17:38 ` George Dunlap
2010-01-20 17:50   ` Keir Fraser
2010-01-20 18:06     ` George Dunlap
2010-01-20 18:34       ` Keir Fraser
