From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: heap_lock optimizations?
Date: Mon, 15 Jul 2013 11:15:25 -0400
Message-ID: <20130715151525.GG4817@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: tim@xen.org, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hey Tim,

I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
boxes with 128 CPUs. Naively, I wrote code that sets up a tasklet on
each CPU and has each one scrub a swath of MFNs. Unfortunately, even on
8-VCPU machines the end result was a slower boot time!

The culprit looks to be the heap_lock, which is taken and released on
every MFN. (For fun I added a bit of code to do batches of 32 MFNs,
iterating over those 32 MFNs while holding the lock. That did make it a
bit faster, but not by much.)

What I am wondering is:

 - Have you ever thought about optimizing this? If so, how?

 - Another idea to potentially make this faster was to separate the
   scrubbing into two stages:
    1) (under the heap_lock) reserve/take a giant set of MFN pages
       (perhaps also consulting the NUMA affinity). This would be
       usurping the whole heap[zone].
    2) Give it out to the CPUs to scrub (this would be done without
       holding a spinlock). The heap[zone] would be split equally
       amongst the CPUs.
    3) Goto 1 until done.

 - Look for examples in the Linux kernel to see how it does it.

Thanks!
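
To make the two-stage idea above concrete, here is a minimal user-space sketch. It is not Xen code. It assumes POSIX threads as a stand-in for per-CPU tasklets and a pthread mutex as a stand-in for the heap_lock spinlock. Every name in it (heap, reserve_chunk, scrub_slice, scrub_heap, CHUNK_PAGES and so on) is hypothetical and chosen only for illustration. NUMA affinity is ignored.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096
#define NR_PAGES    1024   /* toy heap size, not a real heap[zone] */
#define CHUNK_PAGES 256    /* pages reserved per pass under the lock */
#define MAX_WORKERS 8

/* Hypothetical stand-ins for the heap, its allocation cursor and heap_lock. */
static unsigned char heap[NR_PAGES][PAGE_SIZE];
static size_t next_free;
static unsigned long lock_acquisitions;
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

struct slice { size_t start, count; };

/* Stage 1: take the lock once and reserve up to CHUNK_PAGES pages. */
static struct slice reserve_chunk(void)
{
    struct slice s;
    size_t left;

    pthread_mutex_lock(&heap_lock);
    lock_acquisitions++;
    left = NR_PAGES - next_free;
    s.start = next_free;
    s.count = left < CHUNK_PAGES ? left : CHUNK_PAGES;
    next_free += s.count;
    pthread_mutex_unlock(&heap_lock);
    return s;
}

/* Stage 2: a worker scrubs its private slice without holding the lock. */
static void *scrub_slice(void *arg)
{
    const struct slice *s = arg;

    for (size_t i = 0; i < s->count; i++)
        memset(heap[s->start + i], 0, PAGE_SIZE);
    return NULL;
}

/* Scrub the whole toy heap; returns the number of pages still dirty. */
static size_t scrub_heap(unsigned int nr_workers)
{
    pthread_t tid[MAX_WORKERS];
    struct slice part[MAX_WORKERS];
    int started[MAX_WORKERS];
    size_t dirty = 0;

    if (nr_workers == 0 || nr_workers > MAX_WORKERS)
        nr_workers = MAX_WORKERS;

    memset(heap, 0xAA, sizeof(heap));   /* mark every page dirty */
    next_free = 0;
    lock_acquisitions = 0;

    for (;;) {
        struct slice chunk = reserve_chunk();
        size_t per, extra, start;

        if (chunk.count == 0)
            break;                       /* stage 3: nothing left, done */

        /* Split the reserved chunk as evenly as possible among workers. */
        per = chunk.count / nr_workers;
        extra = chunk.count % nr_workers;
        start = chunk.start;
        for (unsigned int w = 0; w < nr_workers; w++) {
            part[w].start = start;
            part[w].count = per + (w < extra ? 1 : 0);
            start += part[w].count;
            started[w] = pthread_create(&tid[w], NULL, scrub_slice, &part[w]) == 0;
            if (!started[w])
                scrub_slice(&part[w]);   /* fall back to scrubbing inline */
        }
        for (unsigned int w = 0; w < nr_workers; w++)
            if (started[w])
                pthread_join(tid[w], NULL);
    }

    /* Count pages that still contain any non-zero byte. */
    for (size_t p = 0; p < NR_PAGES; p++)
        for (size_t b = 0; b < PAGE_SIZE; b++)
            if (heap[p][b] != 0) {
                dirty++;
                break;
            }
    return dirty;
}
```

Calling scrub_heap(4) from a main() scrubs the toy heap with four workers. With these constants the lock is taken NR_PAGES / CHUNK_PAGES + 1 times (the last reservation comes back empty) instead of once per page, which is the point of the scheme. Whether that helps on real hardware would still need measuring.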