From: Keir Fraser
Subject: Re: [RFC][PATCH] Per-cpu xentrace buffers
Date: Wed, 20 Jan 2010 17:50:05 +0000
To: George Dunlap, xen-devel@lists.xensource.com

Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
actually plenty is going in for rc2. What do you think?

 -- Keir

On 20/01/2010 17:38, "George Dunlap" wrote:

> Keir, would you mind commenting on this new design in the next few
> days? If it looks like a good design, I'd like to do some more
> testing and get this into our next XenServer release.
>
> -George
>
> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap wrote:
>> In the current xentrace configuration, xentrace buffers are all
>> allocated in a single contiguous chunk, and then divided among logical
>> cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
>> limited, in my experience about 128 pages (512KiB).  As the number of
>> logical cores increases, the maximum trace buffer per cpu shrinks; on
>> my dual-socket quad-core Nehalem box with hyperthreading (16 logical
>> cpus), that comes to 8 pages per logical cpu.
>>
>> The attached patch addresses this issue by allocating per-cpu buffers
>> separately.  This allows larger trace buffers; however, it requires an
>> interface change to xentrace, which is why I'm making a Request For
>> Comments.  (I'm not expecting this patch to be included in the 4.0
>> release.)
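As a rough illustration of the arithmetic above (not code from the patch), the old scheme divides one contiguous chunk evenly among the logical cpus. The helper names `old_pages_per_cpu` and `pages_to_kib` and the constant `PAGE_SIZE_KIB` are hypothetical, and 4KiB pages are assumed:

```c
#define PAGE_SIZE_KIB 4u /* assumption: 4 KiB pages */

/* Hypothetical helper modelling the old scheme: one contiguous chunk of
 * chunk_pages pages split evenly among nr_cpus logical cpus. In this
 * sketch any remainder is simply dropped. */
static unsigned int old_pages_per_cpu(unsigned int chunk_pages,
                                      unsigned int nr_cpus)
{
    return nr_cpus ? chunk_pages / nr_cpus : 0;
}

/* Hypothetical helper: convert a page count to KiB. */
static unsigned int pages_to_kib(unsigned int pages)
{
    return pages * PAGE_SIZE_KIB;
}
```

With the figures quoted above (about 128 allocatable pages, 2 sockets x 4 cores x 2 threads), `old_pages_per_cpu(128, 16)` gives 8 pages, or 32KiB, per logical cpu.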
>>
>> The old interface to get trace buffers was fairly simple: you ask for
>> the info, and it gives you:
>> * the mfn of the first page in the buffer allocation
>> * the total size of the trace buffer
>>
>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>> buffers were, and went on to consume records from them.
>>
>> -- Interface --
>>
>> The proposed interface works as follows.
>>
>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>> changes to the library).  However, these now refer to a trace buffer
>> info area (t_info), allocated once at boot time.  The trace buffer
>> info area contains the mfns of the per-pcpu buffers.
>> * The t_info struct contains an array of "offset pointers", one per
>> pcpu.  Each is the offset, within the t_info area, of the array of
>> mfns for that pcpu.  So logically, the layout looks like this:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>  uint32_t mfn[NR_CPUS][TBUF_SIZE];
>> };
>>
>> So if NR_CPUS was 16, and TBUF_SIZE was 32, we'd have:
>> struct {
>>  int16_t tbuf_size; /* Number of pages per cpu */
>>  int16_t offset[16]; /* Offset into the t_info area of the array */
>>  uint32_t p0_mfn_list[32];
>>  uint32_t p1_mfn_list[32];
>>  ...
>>  uint32_t p15_mfn_list[32];
>> };
>> * So the new way to map trace buffers is as follows:
>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area,
>>    and map it.
>>  + Get the number of cpus.
>>  + For each cpu:
>>   - Calculate the location of its mfn list in the t_info area thus:
>>     uint32_t *mfn_list = ((uint32_t *)t_info) + t_info->offset[cpu];
>>   - Map t_info->tbuf_size mfns from mfn_list using
>>     xc_map_foreign_batch().
>>
>> In the current implementation, the t_info size is fixed at 2 pages,
>> allowing about 2000 pages total to be mapped.
>> For a 32-way system, this would allow up to 63 pages per cpu (about
>> 252KiB each).  Bumping this up to 4 pages would allow even larger
>> systems if required.
>>
>> The current implementation also allocates each trace buffer
>> contiguously, since that's the easiest way to get contiguous virtual
>> address space.  But this interface allows Xen the flexibility, in the
>> future, to allocate buffers in several chunks if necessary, without
>> having to change the interface again.
>>
>> -- Implementation notes --
>>
>> The t_info area is allocated once at boot.  Trace buffers are
>> allocated either at boot (if a parameter is passed) or when
>> TBUFOP_set_size is called.  Due to the complexity of tracking pages
>> mapped by dom0, unmapping or resizing trace buffers is not supported.
>>
>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>> This allows per-cpu data to be safely accessed and modified without
>> racing with concurrent tracing events.  The per-cpu spinlock is
>> grabbed whenever a trace event is generated; but in the (very very
>> very) common case, the lock should be in the cache already.
>>
>> Feedback welcome.
>>
>>  -George
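A user-space sketch of the proposed t_info layout may make the offset arithmetic and the capacity figures easier to check. It is not the patch's code: every name here (`struct t_info`, `t_info_header_words`, `t_info_max_pages_per_cpu`, `t_info_mfn_list`, `t_info_build`, `XEN_PAGE_SIZE`) is hypothetical. It assumes 4KiB pages, 32-bit mfns as in the struct sketch, and offsets counted in uint32_t words from the start of t_info. It only builds and reads the area in ordinary memory and does not map anything with xc_map_foreign_batch():

```c
#include <stdint.h>
#include <stdlib.h>

#define XEN_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical header of the t_info area, following the RFC's sketch.
 * The per-cpu mfn lists follow the header inside the same area. */
struct t_info {
    int16_t tbuf_size; /* trace-buffer pages per cpu */
    int16_t offset[];  /* per-cpu offset of its mfn list, counted in
                          uint32_t words from the start of t_info */
};

/* Number of 32-bit words taken by the header, rounded up. */
static unsigned int t_info_header_words(unsigned int nr_cpus)
{
    size_t bytes = sizeof(int16_t) * (1 + (size_t)nr_cpus);
    return (unsigned int)((bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t));
}

/* Largest per-cpu buffer, in pages, whose mfns fit in t_info_pages. */
static unsigned int t_info_max_pages_per_cpu(unsigned int t_info_pages,
                                             unsigned int nr_cpus)
{
    size_t words = (size_t)t_info_pages * (XEN_PAGE_SIZE / sizeof(uint32_t));
    size_t hdr = t_info_header_words(nr_cpus);

    if (nr_cpus == 0 || hdr >= words)
        return 0;
    return (unsigned int)((words - hdr) / nr_cpus);
}

/* Consumer side: find the mfn list for one cpu inside a mapped t_info. */
static const uint32_t *t_info_mfn_list(const struct t_info *t_info,
                                       unsigned int cpu)
{
    return (const uint32_t *)t_info + t_info->offset[cpu];
}

/* Producer side, for testing only: build a zeroed t_info area with the
 * mfn lists packed back to back after the header. mfns[cpu * tbuf_size + i]
 * is page i of that cpu's buffer. Returns NULL if the lists do not fit;
 * the caller releases the result with free(). */
static struct t_info *t_info_build(unsigned int t_info_pages,
                                   unsigned int nr_cpus,
                                   unsigned int tbuf_size,
                                   const uint32_t *mfns)
{
    size_t words_total = (size_t)t_info_pages * (XEN_PAGE_SIZE / sizeof(uint32_t));
    struct t_info *t;
    uint32_t *words;
    unsigned int next, cpu, i;

    /* Reject empty areas, offsets that would overflow int16_t, a header
     * that does not fit, and buffers too large for the area. */
    if (nr_cpus == 0 || words_total == 0 || words_total > INT16_MAX ||
        t_info_header_words(nr_cpus) >= words_total ||
        tbuf_size > t_info_max_pages_per_cpu(t_info_pages, nr_cpus))
        return NULL;

    t = calloc(t_info_pages, XEN_PAGE_SIZE);
    if (t == NULL)
        return NULL;

    words = (uint32_t *)t;
    next = t_info_header_words(nr_cpus);
    t->tbuf_size = (int16_t)tbuf_size;
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        t->offset[cpu] = (int16_t)next; /* where this cpu's list starts */
        for (i = 0; i < tbuf_size; i++)
            words[next + i] = mfns[cpu * tbuf_size + i];
        next += tbuf_size;
    }
    return t;
}
```

Under these assumptions, `t_info_max_pages_per_cpu(2, 32)` returns 63, matching the figure given for a 32-way system, and `t_info_max_pages_per_cpu(4, 32)` returns 127, so doubling t_info roughly doubles the per-cpu limit.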