* heap_lock optimizations?
From: Konrad Rzeszutek Wilk @ 2013-07-15 15:15 UTC
To: tim, xen-devel
Hey Tim,
I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
boxes with 128 CPUs. Naively, I wrote code that set up a tasklet
on each CPU to scrub a swath of MFNs. Unfortunately, even on 8-VCPU
machines the end result was a slower boot time!
The culprit looks to be the heap_lock that is taken and released
on every MFN (for fun I added a bit of code to do batches of
32 MFNs and to iterate over those 32 MFNs while holding the lock - that
did make it a bit faster, but not by much).
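Roughly, the batched variant looked like this (a simplified sketch, not
the exact code that was benchmarked; scrub_mfn_range() is an illustrative
name, and it assumes it lives in xen/common/page_alloc.c next to heap_lock):

/* Take heap_lock once per batch of 32 MFNs instead of once per MFN. */
#define SCRUB_BATCH 32

static void scrub_mfn_range(unsigned long start_mfn, unsigned long end_mfn)
{
    unsigned long mfn = start_mfn;

    while ( mfn < end_mfn )
    {
        unsigned long stop = min(mfn + SCRUB_BATCH, end_mfn);

        spin_lock(&heap_lock);
        for ( ; mfn < stop; mfn++ )
        {
            struct page_info *pg = mfn_to_page(mfn);

            /* Only pages still sitting free in the heap need scrubbing. */
            if ( page_state_is(pg, free) )
                scrub_one_page(pg);
        }
        spin_unlock(&heap_lock);
    }
}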
What I am wondering is:
 - Have you ever thought about optimizing this? If so, how?
 - Another idea to potentially make this faster was to separate the
   scrubbing into two stages (a rough sketch follows after this list):
   1) (under the heap_lock) reserve/take a giant set of MFN pages
      (perhaps also consulting NUMA affinity). This would be
      usurping the whole heap[zone].
   2) Give it out to the CPUs to scrub (this would be done without
      holding a spinlock). The heap[zone] would be split equally
      amongst the CPUs.
   3) Goto 1 until done.
 - Look for examples in the Linux kernel to see how it does it.
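In rough code, the two-stage idea would look something like this
(reserve_unscrubbed_chunk() and per_cpu_scrub() are made-up helpers,
not existing Xen functions):

static void scrub_zone_parallel(unsigned int zone)
{
    unsigned long start, count;

    for ( ; ; )
    {
        /* Stage 1: under heap_lock, take the next run of free pages
         * from heap[zone] (NUMA affinity could be considered here). */
        spin_lock(&heap_lock);
        count = reserve_unscrubbed_chunk(zone, &start);
        spin_unlock(&heap_lock);

        if ( !count )
            break;              /* Stage 3: the zone is fully scrubbed. */

        /* Stage 2: split [start, start + count) evenly across the CPUs
         * and scrub with no lock held (via tasklets or IPIs). */
        per_cpu_scrub(start, count);
    }
}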
Thanks!
* Re: heap_lock optimizations?
From: Malcolm Crossley @ 2013-07-15 15:46 UTC
To: xen-devel
On 15/07/13 16:15, Konrad Rzeszutek Wilk wrote:
> Hey Tim,
>
> I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
> boxes with 128 CPUs. Naively, I wrote code that set up a tasklet
> on each CPU to scrub a swath of MFNs. Unfortunately, even on 8-VCPU
> machines the end result was a slower boot time!
>
> The culprit looks to be the heap_lock that is taken and released
> on every MFN (for fun I added a bit of code to do batches of
> 32 MFNs and to iterate over those 32 MFNs while holding the lock - that
> did make it a bit faster, but not by much).
>
> What I am wondering is:
> - Have you ever thought about optimizing this? If so, how?
> - Another idea to potentially make this faster was to separate the
>   scrubbing into two stages:
>   1) (under the heap_lock) reserve/take a giant set of MFN pages
>      (perhaps also consulting NUMA affinity). This would be
>      usurping the whole heap[zone].
>   2) Give it out to the CPUs to scrub (this would be done without
>      holding a spinlock). The heap[zone] would be split equally
>      amongst the CPUs.
>   3) Goto 1 until done.
> - Look for examples in the Linux kernel to see how it does it.
>
> Thanks!
Hi Konrad,
Did you see a patch I posted for this last year?
http://lists.xen.org/archives/html/xen-devel/2012-05/msg00701.html
Unfortunately I made some minor errors and it didn't apply cleanly but
I'll fix it up now and repost so you can test it.
Malcolm
* Re: heap_lock optimizations?
From: Tim Deegan @ 2013-07-15 16:09 UTC
To: Konrad Rzeszutek Wilk; +Cc: Malcolm Crossley, xen-devel
Hi,
At 11:15 -0400 on 15 Jul (1373886925), Konrad Rzeszutek Wilk wrote:
> Hey Tim,
>
> I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
> boxes with 128 CPUs. Naively, I wrote code that set up a tasklet
> on each CPU to scrub a swath of MFNs. Unfortunately, even on 8-VCPU
> machines the end result was a slower boot time!
>
> The culprit looks to be the heap_lock that is taken and released
> on every MFN (for fun I added a bit of code to do batches of
> 32 MFNs and to iterate over those 32 MFNs while holding the lock - that
> did make it a bit faster, but not by much).
>
> What I am wondering is:
> - Have you ever thought about optimizing this? If so, how?
Malcolm Crossley posted an RFC patch a while ago to do this kind of
stuff -- it parcelled out RAM to socket-local CPUs and IIRC took the
heap_lock once for all of them on the coordinating CPU.
http://lists.xen.org/archives/html/xen-devel/2012-05/msg00701.html
AIUI he's going to send a v2 now that 4.3 is done.
Tim.
* Re: heap_lock optimizations?
From: Konrad Rzeszutek Wilk @ 2013-07-15 16:14 UTC
To: Malcolm Crossley; +Cc: xen-devel
On Mon, Jul 15, 2013 at 04:46:37PM +0100, Malcolm Crossley wrote:
> On 15/07/13 16:15, Konrad Rzeszutek Wilk wrote:
> >Hey Tim,
> >
> >I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
> >boxes with 128 CPUs. Naively, I wrote code that set up a tasklet
> >on each CPU to scrub a swath of MFNs. Unfortunately, even on 8-VCPU
> >machines the end result was a slower boot time!
> >
> >The culprit looks to be the heap_lock that is taken and released
> >on every MFN (for fun I added a bit of code to do batches of
> >32 MFNs and to iterate over those 32 MFNs while holding the lock - that
> >did make it a bit faster, but not by much).
> >
> >What I am wondering is:
> > - Have you ever thought about optimizing this? If so, how?
> > - Another idea to potentially make this faster was to separate the
> >   scrubbing into two stages:
> >   1) (under the heap_lock) reserve/take a giant set of MFN pages
> >      (perhaps also consulting NUMA affinity). This would be
> >      usurping the whole heap[zone].
> >   2) Give it out to the CPUs to scrub (this would be done without
> >      holding a spinlock). The heap[zone] would be split equally
> >      amongst the CPUs.
> >   3) Goto 1 until done.
> > - Look for examples in the Linux kernel to see how it does it.
> >
> >Thanks!
> Hi Konrad,
>
> Did you see a patch I posted for this last year?
> http://lists.xen.org/archives/html/xen-devel/2012-05/msg00701.html
I did not.
>
> Unfortunately I made some minor errors and it didn't apply cleanly
> but I'll fix it up now and repost so you can test it.
Ah, it follows similar logic to mine. I used tasklets but you are using
IPIs. That might be better.
I'll wait for your patch and test it out. Thanks!
* Re: heap_lock optimizations?
From: Konrad Rzeszutek Wilk @ 2013-07-16 16:41 UTC
To: Malcolm Crossley; +Cc: xen-devel
On Mon, Jul 15, 2013 at 04:46:37PM +0100, Malcolm Crossley wrote:
> On 15/07/13 16:15, Konrad Rzeszutek Wilk wrote:
> >Hey Tim,
> >
> >I was looking at making the 'Scrubbing Free RAM:' code faster on 1TB
> >boxes with 128 CPUs. Naively, I wrote code that set up a tasklet
> >on each CPU to scrub a swath of MFNs. Unfortunately, even on 8-VCPU
> >machines the end result was a slower boot time!
> >
> >The culprit looks to be the heap_lock that is taken and released
> >on every MFN (for fun I added a bit of code to do batches of
> >32 MFNs and to iterate over those 32 MFNs while holding the lock - that
> >did make it a bit faster, but not by much).
> >
> >What I am wondering is:
> > - Have you ever thought about optimizing this? If so, how?
> > - Another idea to potentially make this faster was to separate the
> >   scrubbing into two stages:
> >   1) (under the heap_lock) reserve/take a giant set of MFN pages
> >      (perhaps also consulting NUMA affinity). This would be
> >      usurping the whole heap[zone].
> >   2) Give it out to the CPUs to scrub (this would be done without
> >      holding a spinlock). The heap[zone] would be split equally
> >      amongst the CPUs.
> >   3) Goto 1 until done.
> > - Look for examples in the Linux kernel to see how it does it.
> >
> >Thanks!
> Hi Konrad,
>
> Did you see a patch I posted for this last year?
> http://lists.xen.org/archives/html/xen-devel/2012-05/msg00701.html
>
> Unfortunately I made some minor errors and it didn't apply cleanly
> but I'll fix it up now and repost so you can test it.
I took a stab at it (your updated one), and this is what I found
(this is on a 4-CPU box; the numbers are cycle counts):
14112560772 <- original
14006409540 <- Mine (tasklet) - using the old per-MFN heap_lock
 1331412384 <- Malcolm's IPI (heap_lock held for a long time)
 1374497324 <- Mine (tasklet) - heap_lock held for a long time
Meaning that your use of IPIs is superior. The heap_lock is held for
chunk_size pages at a time across all of the CPUs, and that looks OK to me.
Looking forward to seeing you post the patch. Thanks!