From: Keir Fraser <keir.fraser@eu.citrix.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [RFC][PATCH] Per-cpu xentrace buffers
Date: Wed, 20 Jan 2010 17:50:05 +0000
Message-ID: <C77CF2CD.6D94%keir.fraser@eu.citrix.com>
In-Reply-To: <de76405a1001200938j8210aadkeaf5b64e6833cea9@mail.gmail.com>

Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
actually plenty is going in for rc2. What do you think?

 -- Keir

On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:

> Keir, would you mind commenting on this new design in the next few
> days?  If it looks like a good design, I'd like to do some more
> testing and get this into our next XenServer release.
> 
>  -George
> 
> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@umich.edu> wrote:
>> In the current xentrace configuration, xentrace buffers are all
>> allocated in a single contiguous chunk, and then divided among logical
>> cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
>> limited, in my experience about 128 pages (512KiB).  As the number of
>> logical cores increases, this means a much smaller maximum trace
>> buffer per cpu; on my dual-socket quad-core nehalem box with
>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>> cpu.
>> 
>> The attached patch addresses this issue by allocating per-cpu buffers
>> separately.  This allows larger trace buffers; however, it requires an
>> interface change to xentrace, which is why I'm making a Request For
>> Comments.  (I'm not expecting this patch to be included in the 4.0
>> release.)
>> 
>> The old interface to get trace buffers was fairly simple: you ask for
>> the info, and it gives you:
>> * the mfn of the first page in the buffer allocation
>> * the total size of the trace buffer
>> 
>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>> buffers were, and went on to consume records from them.
>> 
>> -- Interface --
>> 
>> The proposed interface works as follows.
>> 
>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>> changes to the library).  However, the mfn and size now refer to a
>> trace buffer info area (t_info), allocated once at boot time.  The
>> trace buffer info area contains mfns of the per-pcpu buffers.
>> * The t_info struct contains an array of "offset pointers", one per
>> pcpu.  Each is an offset into the t_info data area, locating the array
>> of mfns for that pcpu.  So logically, the layout looks like this:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>  uint32_t mfn[NR_CPUS][TBUF_SIZE];
>> };
>> 
>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[16]; /* Offset into the t_info area of the array */
>>  uint32_t p0_mfn_list[32];
>>  uint32_t p1_mfn_list[32];
>>  ...
>>  uint32_t p15_mfn_list[32];
>> };
>> * So the new way to map trace buffers is as follows:
>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area, and map
>> it.
>>  + Get the number of cpus
>>  + For each cpu:
>>  - Calculate the offset into the t_info area thus: unsigned long
>> *mfn_list = ((unsigned long*)t_info) + t_info->cpu_offset[cpu];
>>  - Map t_info->tbuf_size mfns from mfn_list using xc_map_foreign_batch()
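A minimal sketch of the lookup step described above, run against a fake t_info area in ordinary memory so that no hypervisor is needed. Everything here is an assumption for illustration: the helper names (`t_info_build`, `t_info_tbuf_size`, `t_info_mfn`) are hypothetical, the header follows the logical struct in the mail (16-bit fields, 32-bit MFN entries), and offsets are taken to be counted in 32-bit words from the start of the area, which the mail does not specify. The actual mapping call (xc_map_foreign_batch()) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a fake t_info area: tbuf_size, one 16-bit offset per cpu, then
 * one list of 32-bit MFNs per cpu. Returns bytes used, or 0 on failure.
 * MFNs are made up: consecutive numbers starting at first_mfn. */
static size_t t_info_build(unsigned char *area, size_t area_bytes,
                           unsigned int ncpus, int16_t pages,
                           uint32_t first_mfn)
{
    size_t hdr_bytes = sizeof(int16_t) * (1 + (size_t)ncpus);
    /* Round the header up so the MFN lists start on a 32-bit boundary. */
    size_t first_word = (hdr_bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
    size_t total_words;

    if (pages <= 0 || ncpus == 0)
        return 0;
    total_words = first_word + (size_t)ncpus * (size_t)pages;
    if (total_words * sizeof(uint32_t) > area_bytes || total_words > INT16_MAX)
        return 0;

    memcpy(area, &pages, sizeof pages);
    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        size_t off = first_word + (size_t)cpu * (size_t)pages;
        int16_t off16 = (int16_t)off;

        /* offset[cpu] lives right after tbuf_size in the header. */
        memcpy(area + sizeof(int16_t) * (1 + (size_t)cpu), &off16, sizeof off16);
        for (size_t p = 0; p < (size_t)pages; p++) {
            uint32_t mfn = first_mfn + (uint32_t)(off - first_word + p);
            memcpy(area + (off + p) * sizeof(uint32_t), &mfn, sizeof mfn);
        }
    }
    return total_words * sizeof(uint32_t);
}

/* Number of pages per cpu, read from the start of the area. */
static int16_t t_info_tbuf_size(const unsigned char *area)
{
    int16_t pages;
    memcpy(&pages, area, sizeof pages);
    return pages;
}

/* MFN of page `page` of cpu `cpu`, found through that cpu's offset. */
static uint32_t t_info_mfn(const unsigned char *area, unsigned int cpu,
                           unsigned int page)
{
    int16_t off;
    uint32_t mfn;

    memcpy(&off, area + sizeof(int16_t) * (1 + (size_t)cpu), sizeof off);
    assert(off > 0);  /* a valid list can never start inside the header */
    memcpy(&mfn, area + ((size_t)off + page) * sizeof(uint32_t), sizeof mfn);
    return mfn;
}
```

A consumer would read `t_info_tbuf_size()` once, then for each cpu collect that many MFNs via `t_info_mfn()` and hand them to the mapping call. Because each cpu is reached through its own offset, the lists need not be adjacent or equally spaced, which is the flexibility the interface is after.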
>> 
>> In the current implementation, the t_info size is fixed at 2 pages,
>> allowing about 2000 pages total to be mapped.  For a 32-way system,
>> this would allow up to 63 pages per cpu (252KiB per cpu).  Bumping
>> this up to 4 pages would allow even larger systems if required.
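The capacity figures above can be checked with a little arithmetic. This is only a sketch under assumptions not stated in the mail: 4KiB pages, 4-byte MFN entries, and a header of 16-bit fields (tbuf_size plus one offset per cpu) padded to a 4-byte boundary. The names `max_pages_per_cpu`, `pages_to_kib` and `SKETCH_PAGE_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size in bytes */

/* Largest per-cpu buffer (in pages) whose MFN list still fits in a
 * t_info area of t_info_pages pages shared by ncpus cpus. */
static unsigned int max_pages_per_cpu(unsigned int t_info_pages,
                                      unsigned int ncpus)
{
    size_t area_bytes, hdr_bytes, list_words;

    assert(ncpus > 0);
    area_bytes = (size_t)t_info_pages * SKETCH_PAGE_SIZE;
    /* Header: tbuf_size plus one offset per cpu, padded to 4 bytes. */
    hdr_bytes = sizeof(int16_t) * (1 + (size_t)ncpus);
    hdr_bytes = (hdr_bytes + sizeof(uint32_t) - 1)
                / sizeof(uint32_t) * sizeof(uint32_t);
    if (hdr_bytes >= area_bytes)
        return 0;
    /* Whatever is left holds 4-byte MFN entries, split evenly. */
    list_words = (area_bytes - hdr_bytes) / sizeof(uint32_t);
    return (unsigned int)(list_words / ncpus);
}

/* Size in KiB of a buffer made of `pages` pages. */
static size_t pages_to_kib(unsigned int pages)
{
    return (size_t)pages * (SKETCH_PAGE_SIZE / 1024u);
}
```

Under these assumptions, `max_pages_per_cpu(2, 32)` gives 63 pages, i.e. `pages_to_kib(63)` = 252KiB per cpu, and a 4-page t_info area gives 127 pages per cpu for 32 cpus or 63 pages per cpu for 64 cpus.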
>> 
>> The current implementation also allocates each trace buffer
>> contiguously, since that's the easiest way to get contiguous virtual
>> address space.  But this interface allows Xen the flexibility, in the
>> future, to allocate buffers in several chunks if necessary, without
>> having to change the interface again.
>> 
>> -- Implementation notes --
>> 
>> The t_info area is allocated once at boot.  Trace buffers are
>> allocated either at boot (if a parameter is passed) or when
>> TBUFOP_set_size is called.  Due to the complexity of tracking pages
>> mapped by dom0, unmapping or resizing trace buffers is not supported.
>> 
>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>> This allows per-cpu data to be safely accessed and modified without
>> racing with concurrent tracing events.  The per-cpu spinlock is grabbed
>> whenever a trace event is generated; but in the (very very very)
>> common case, the lock should be in the cache already.
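To illustrate the locking idea, here is a rough userspace sketch using C11 atomics in place of Xen's spinlocks. It is not the patch's code: the structure, the function names and `DEMO_NR_CPUS` are all hypothetical. It only shows the pattern of one lock per cpu, taken both when an event is recorded and when that cpu's buffer is installed, so that the two cannot race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4  /* hypothetical cpu count, for the sketch only */

struct cpu_trace {
    atomic_flag lock;   /* stands in for the per-cpu spinlock */
    uint32_t *buf;      /* NULL until a buffer has been installed */
    size_t cap;         /* capacity in records */
    size_t used;        /* records written so far */
};

static struct cpu_trace per_cpu_trace[DEMO_NR_CPUS];

static void trace_lock(struct cpu_trace *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;  /* spin; normally uncontended since only this cpu traces here */
}

static void trace_unlock(struct cpu_trace *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Put every per-cpu slot into a known, unlocked, buffer-less state. */
static void trace_setup(void)
{
    for (size_t i = 0; i < DEMO_NR_CPUS; i++) {
        atomic_flag_clear(&per_cpu_trace[i].lock);
        per_cpu_trace[i].buf = NULL;
        per_cpu_trace[i].cap = 0;
        per_cpu_trace[i].used = 0;
    }
}

/* Install a buffer for one cpu; safe against a concurrent trace_event(). */
static void trace_set_buffer(unsigned int cpu, uint32_t *buf, size_t cap)
{
    struct cpu_trace *t;

    assert(cpu < DEMO_NR_CPUS);
    t = &per_cpu_trace[cpu];
    trace_lock(t);
    t->buf = buf;
    t->cap = cap;
    t->used = 0;
    trace_unlock(t);
}

/* Record one event; returns 1 if stored, 0 if dropped (no buffer or full). */
static int trace_event(unsigned int cpu, uint32_t event)
{
    struct cpu_trace *t;
    int stored = 0;

    assert(cpu < DEMO_NR_CPUS);
    t = &per_cpu_trace[cpu];
    trace_lock(t);
    if (t->buf != NULL && t->used < t->cap) {
        t->buf[t->used++] = event;
        stored = 1;
    }
    trace_unlock(t);
    return stored;
}
```

Since each cpu normally touches only its own lock, the lock word tends to stay in that cpu's cache, which is the common case the paragraph above relies on.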
>> 
>> Feedback welcome.
>> 
>>  -George
>> 
