* [PATCH] CPUIDLE: shorten hpet spin_lock holding time
@ 2010-04-20 5:39 Wei, Gang
2010-04-20 12:49 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Wei, Gang @ 2010-04-20 5:39 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Keir Fraser
[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]
CPUIDLE: shorten hpet spin_lock holding time
Try to reduce spin_lock overhead for deep C state entry/exit. This will benefit systems with a lot of cpus which need the hpet broadcast to wakeup from deep C state.
Signed-off-by: Wei Gang <gang.wei@intel.com>
diff -r 7ee8bb40200a xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/arch/x86/hpet.c Fri Apr 16 15:05:28 2010 +0800
@@ -186,6 +186,9 @@ static void handle_hpet_broadcast(struct
again:
ch->next_event = STIME_MAX;
+
+ spin_unlock_irq(&ch->lock);
+
next_event = STIME_MAX;
mask = (cpumask_t)CPU_MASK_NONE;
now = NOW();
@@ -204,10 +207,14 @@ again:
if ( next_event != STIME_MAX )
{
- if ( reprogram_hpet_evt_channel(ch, next_event, now, 0) )
+ spin_lock_irq(&ch->lock);
+
+ if ( next_event < ch->next_event &&
+ reprogram_hpet_evt_channel(ch, next_event, now, 0) )
goto again;
- }
- spin_unlock_irq(&ch->lock);
+
+ spin_unlock_irq(&ch->lock);
+ }
}
static void hpet_interrupt_handler(int irq, void *data,
@@ -656,10 +663,15 @@ void hpet_broadcast_enter(void)
BUG_ON( !ch );
ASSERT(!local_irq_is_enabled());
- spin_lock(&ch->lock);
if ( hpet_attach_channel )
+ {
+ spin_lock(&ch->lock);
+
hpet_attach_channel(cpu, ch);
+
+ spin_unlock(&ch->lock);
+ }
/* Cancel any outstanding LAPIC timer event and disable interrupts. */
reprogram_timer(0);
@@ -667,6 +679,8 @@ void hpet_broadcast_enter(void)
cpu_set(cpu, ch->cpumask);
+ spin_lock(&ch->lock);
+
/* reprogram if current cpu expire time is nearer */
if ( this_cpu(timer_deadline_end) < ch->next_event )
reprogram_hpet_evt_channel(ch, this_cpu(timer_deadline_end), NOW(), 1);
@@ -683,8 +697,6 @@ void hpet_broadcast_exit(void)
return;
BUG_ON( !ch );
-
- spin_lock_irq(&ch->lock);
if ( cpu_test_and_clear(cpu, ch->cpumask) )
{
@@ -693,14 +705,22 @@ void hpet_broadcast_exit(void)
if ( !reprogram_timer(this_cpu(timer_deadline_start)) )
raise_softirq(TIMER_SOFTIRQ);
+ spin_lock_irq(&ch->lock);
+
if ( cpus_empty(ch->cpumask) && ch->next_event != STIME_MAX )
reprogram_hpet_evt_channel(ch, STIME_MAX, 0, 0);
+
+ spin_unlock_irq(&ch->lock);
}
if ( hpet_detach_channel )
+ {
+ spin_lock_irq(&ch->lock);
+
hpet_detach_channel(cpu);
- spin_unlock_irq(&ch->lock);
+ spin_unlock_irq(&ch->lock);
+ }
}
int hpet_broadcast_is_available(void)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread

* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 5:39 [PATCH] CPUIDLE: shorten hpet spin_lock holding time Wei, Gang
@ 2010-04-20 12:49 ` Keir Fraser
  2010-04-20 14:04 ` Wei, Gang
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-20 12:49 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

Is this a measurable win? The newer locking looks like it could be dodgy on
32-bit Xen: the 64-bit reads of timer_deadline_{start,end} will be
non-atomic and unsynchronised so you can read garbage. Even on 64-bit Xen
you can read stale values. I'll be surprised if you got a performance win
from chopping up critical regions in individual functions like that anyway.

 -- Keir

On 20/04/2010 06:39, "Wei, Gang" <gang.wei@intel.com> wrote:

> CPUIDLE: shorten hpet spin_lock holding time
>
> Try to reduce spin_lock overhead for deep C state entry/exit. This will
> benefit systems with a lot of cpus which need the hpet broadcast to wakeup
> from deep C state.
>
> Signed-off-by: Wei Gang <gang.wei@intel.com>
>
> [...]

^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 12:49 ` Keir Fraser
@ 2010-04-20 14:04 ` Wei, Gang
  2010-04-20 14:21 ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Wei, Gang @ 2010-04-20 14:04 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Tuesday, 2010-4-20 8:49 PM, Keir Fraser wrote:
> Is this a measurable win? The newer locking looks like it could be
> dodgy on 32-bit Xen: the 64-bit reads of timer_deadline_{start,end}
> will be non-atomic and unsynchronised so you can read garbage. Even
> on 64-bit Xen you can read stale values. I'll be surprised if you got
> a performance win from chopping up critical regions in individual
> functions like that anyway.

First of all, this is a measurable power win for the cpu overcommitment
idle case (vcpu:pcpu > 2:1, pcpu >= 32, guests are non-tickless kernels).

As to the non-atomic access to timer_deadline_{start,end}, it should
already be there before this patch. It is not protected by the hpet lock.
Shall we add a rw_lock for each timer_deadline_{start,end}? This can be
done separately.

Jimmy

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 14:04 ` Wei, Gang
@ 2010-04-20 14:21 ` Keir Fraser
  2010-04-20 15:20 ` Wei, Gang
  2010-04-20 16:05 ` Wei, Gang
  0 siblings, 2 replies; 19+ messages in thread
From: Keir Fraser @ 2010-04-20 14:21 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 20/04/2010 15:04, "Wei, Gang" <gang.wei@intel.com> wrote:

> First of all, this is a measurable power win for the cpu overcommitment
> idle case (vcpu:pcpu > 2:1, pcpu >= 32, guests are non-tickless kernels).

So lots of short sleep periods, and possibly only a very few HPET channels
to share? How prevalent is always-running APIC timer now, and is that going
to be supported in future processors?

> As to the non-atomic access to timer_deadline_{start,end}, it should
> already be there before this patch. It is not protected by the hpet lock.
> Shall we add a rw_lock for each timer_deadline_{start,end}? This can be
> done separately.

The bug isn't previously there, since the fields will not be read unless
the cpu is in ch->cpumask, which (was) protected by ch->lock. That was
sufficient since a CPU would not modify timer_deadline_{start,end} between
hpet_broadcast_enter and hpet_broadcast_exit. After your patch,
handle_hpet_broadcast is no longer fully synchronised against those
functions.

 -- Keir

^ permalink raw reply [flat|nested] 19+ messages in thread

* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 14:21 ` Keir Fraser
@ 2010-04-20 15:20 ` Wei, Gang
  0 siblings, 0 replies; 19+ messages in thread
From: Wei, Gang @ 2010-04-20 15:20 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Tuesday, 2010-4-20 10:22 PM, Keir Fraser wrote:
> So lots of short sleep periods, and possibly only a very few HPET
> channels to share?

Yes.

> How prevalent is always-running APIC timer now, and is that going to be
> supported in future processors?

Always-running APIC timer just started from Westmere. And I personally
guess it should be supported in future processors. There are a lot of
existing systems which still need hpet broadcast wakeup.

> The bug isn't previously there, since the fields will not be read
> unless the cpu is in ch->cpumask, which (was) protected by ch->lock.
> That was sufficient since a CPU would not modify
> timer_deadline_{start,end} between hpet_broadcast_enter and
> hpet_broadcast_exit. After your patch, handle_hpet_broadcast is no
> longer fully synchronised against those functions.

Ok, you are right. So we may just need to make sure the cpu is in
ch->cpumask while reading the timer_deadline_{start,end}. I can make some
changes to my patch.

Jimmy

^ permalink raw reply [flat|nested] 19+ messages in thread
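The rule Jimmy settles on here, namely only reading a CPU's per-cpu deadline while that CPU is verifiably still in ch->cpumask, can be modelled in a few lines. This is an illustrative single-threaded sketch, not the Xen code: the lock/unlock points appear only as comments, a plain bitmask stands in for cpumask_t, the expired-CPU wakeup mask is omitted, and the names (scan_next_event(), attach(), detach()) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;
#define STIME_MAX INT64_MAX
#define NCPUS 4

/* A CPU only rewrites its deadline after clearing its bit on exit, so the
 * deadline is safe to read exactly while the bit is set in the mask.      */
static uint32_t ch_cpumask;
static s_time_t deadline_end[NCPUS];

/* Model of the handler's scan: re-test membership (under ch->lock in the
 * real code) before each per-cpu read, and track the earliest deadline.   */
static s_time_t scan_next_event(void)
{
    s_time_t next = STIME_MAX;
    for ( int cpu = 0; cpu < NCPUS; cpu++ )
    {
        /* spin_lock_irq(&ch->lock) in the real code */
        if ( ch_cpumask & (1u << cpu) )            /* cpumask_test_cpu() */
            if ( deadline_end[cpu] < next )
                next = deadline_end[cpu];
        /* spin_unlock_irq(&ch->lock) */
    }
    return next;
}

/* hpet_broadcast_enter(): publish deadline, then join the mask. */
static void attach(int cpu, s_time_t dl)
{
    deadline_end[cpu] = dl;
    ch_cpumask |= 1u << cpu;
}

/* hpet_broadcast_exit(): leave the mask; the deadline may change freely
 * afterwards, but the scan will no longer read it.                        */
static void detach(int cpu)
{
    ch_cpumask &= ~(1u << cpu);
    deadline_end[cpu] = -1;
}
```

A detached CPU's (now meaningless) deadline is never consulted, which is the property Keir pointed out the v1 patch had lost.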
* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 14:21 ` Keir Fraser
  2010-04-20 15:20 ` Wei, Gang
@ 2010-04-20 16:05 ` Wei, Gang
  2010-04-21 8:09 ` Keir Fraser
  2010-04-22 8:21 ` Keir Fraser
  1 sibling, 2 replies; 19+ messages in thread
From: Wei, Gang @ 2010-04-20 16:05 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

Resend.

CPUIDLE: shorten hpet spin_lock holding time

Try to reduce spin_lock overhead for deep C state entry/exit. This will
benefit systems with a lot of cpus which need the hpet broadcast to wakeup
from deep C state.

Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r dbf0fd95180f xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c Tue Apr 20 14:32:53 2010 +0100
+++ b/xen/arch/x86/hpet.c Tue Apr 20 23:48:19 2010 +0800
@@ -186,6 +186,9 @@ static void handle_hpet_broadcast(struct
again:
ch->next_event = STIME_MAX;
+
+ spin_unlock_irq(&ch->lock);
+
next_event = STIME_MAX;
mask = (cpumask_t)CPU_MASK_NONE;
now = NOW();
@@ -193,10 +196,17 @@ again:
/* find all expired events */
for_each_cpu_mask(cpu, ch->cpumask)
{
- if ( per_cpu(timer_deadline_start, cpu) <= now )
- cpu_set(cpu, mask);
- else if ( per_cpu(timer_deadline_end, cpu) < next_event )
- next_event = per_cpu(timer_deadline_end, cpu);
+ spin_lock_irq(&ch->lock);
+
+ if ( cpumask_test_cpu(cpu, ch->cpumask) )
+ {
+ if ( per_cpu(timer_deadline_start, cpu) <= now )
+ cpu_set(cpu, mask);
+ else if ( per_cpu(timer_deadline_end, cpu) < next_event )
+ next_event = per_cpu(timer_deadline_end, cpu);
+ }
+
+ spin_unlock_irq(&ch->lock);
}
/* wakeup the cpus which have an expired event. */
@@ -204,10 +214,14 @@ again:
if ( next_event != STIME_MAX )
{
- if ( reprogram_hpet_evt_channel(ch, next_event, now, 0) )
+ spin_lock_irq(&ch->lock);
+
+ if ( next_event < ch->next_event &&
+ reprogram_hpet_evt_channel(ch, next_event, now, 0) )
goto again;
- }
- spin_unlock_irq(&ch->lock);
+
+ spin_unlock_irq(&ch->lock);
+ }
}
static void hpet_interrupt_handler(int irq, void *data,
@@ -656,17 +670,23 @@ void hpet_broadcast_enter(void)
BUG_ON( !ch );
ASSERT(!local_irq_is_enabled());
- spin_lock(&ch->lock);
if ( hpet_attach_channel )
+ {
+ spin_lock(&ch->lock);
+
hpet_attach_channel(cpu, ch);
+
+ spin_unlock(&ch->lock);
+ }
/* Cancel any outstanding LAPIC timer event and disable interrupts. */
reprogram_timer(0);
disable_APIC_timer();
+ spin_lock(&ch->lock);
+
cpu_set(cpu, ch->cpumask);
-
/* reprogram if current cpu expire time is nearer */
if ( this_cpu(timer_deadline_end) < ch->next_event )
reprogram_hpet_evt_channel(ch, this_cpu(timer_deadline_end), NOW(), 1);
@@ -684,23 +704,28 @@ void hpet_broadcast_exit(void)
BUG_ON( !ch );
+ /* Reprogram the deadline; trigger timer work now if it has passed. */
+ enable_APIC_timer();
+ if ( !reprogram_timer(this_cpu(timer_deadline_start)) )
+ raise_softirq(TIMER_SOFTIRQ);
+
spin_lock_irq(&ch->lock);
- if ( cpu_test_and_clear(cpu, ch->cpumask) )
- {
- /* Reprogram the deadline; trigger timer work now if it has passed. */
- enable_APIC_timer();
- if ( !reprogram_timer(this_cpu(timer_deadline_start)) )
- raise_softirq(TIMER_SOFTIRQ);
-
- if ( cpus_empty(ch->cpumask) && ch->next_event != STIME_MAX )
- reprogram_hpet_evt_channel(ch, STIME_MAX, 0, 0);
- }
+ cpu_clear(cpu, ch->cpumask);
+ if ( cpus_empty(ch->cpumask) && ch->next_event != STIME_MAX )
+ reprogram_hpet_evt_channel(ch, STIME_MAX, 0, 0);
+
+ spin_unlock_irq(&ch->lock);
+
if ( hpet_detach_channel )
+ {
+ spin_lock_irq(&ch->lock);
+
hpet_detach_channel(cpu);
- spin_unlock_irq(&ch->lock);
+ spin_unlock_irq(&ch->lock);
+ }
}
int hpet_broadcast_is_available(void)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 16:05 ` Wei, Gang
@ 2010-04-21 8:09 ` Keir Fraser
  2010-04-21 9:06 ` Wei, Gang
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-21 8:09 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 20/04/2010 17:05, "Wei, Gang" <gang.wei@intel.com> wrote:

> Resend.
>
> CPUIDLE: shorten hpet spin_lock holding time
>
> Try to reduce spin_lock overhead for deep C state entry/exit. This will
> benefit systems with a lot of cpus which need the hpet broadcast to wakeup
> from deep C state.
>
> Signed-off-by: Wei Gang <gang.wei@intel.com>

It fixes the unsafe accesses to timer_deadline_{start,end} but I still
think this optimisation is misguided and also unsafe. There is nothing to
stop new CPUs being added to ch->cpumask after you start scanning
ch->cpumask. For example, some new CPU which has a timer_deadline_end
greater than ch->next_event, so it does not reprogram the HPET. But
handle_hpet_broadcast is already mid-scan and misses this new CPU, so it
does not reprogram the HPET either. Hence no timer fires for the new CPU
and it misses its deadline.

Really I think a better approach than something like this patch would be
to better advertise the timer_slop=xxx Xen boot parameter for power-saving
scenarios. I wonder what your numbers look like if you re-run your
benchmark with (say) timer_slop=10000000 (i.e., 10ms slop) on the Xen
command line?

 -- Keir

^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 8:09 ` Keir Fraser
@ 2010-04-21 9:06 ` Wei, Gang
  2010-04-21 9:25 ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Wei, Gang @ 2010-04-21 9:06 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Wednesday, 2010-4-21 4:10 PM, Keir Fraser wrote:
> It fixes the unsafe accesses to timer_deadline_{start,end} but I
> still think this optimisation is misguided and also unsafe. There is
> nothing to stop new CPUs being added to ch->cpumask after you start
> scanning ch->cpumask. For example, some new CPU which has a
> timer_deadline_end greater than ch->next_event, so it does not
> reprogram the HPET. But handle_hpet_broadcast is already mid-scan and
> misses this new CPU, so it does not reprogram the HPET either. Hence
> no timer fires for the new CPU and it misses its deadline.

This will not happen. ch->next_event has already been set to STIME_MAX
before we start scanning ch->cpumask, so the new CPU with the smallest
timer_deadline_end will reprogram the HPET successfully.

> Really I think a better approach than something like this patch would
> be to better advertise the timer_slop=xxx Xen boot parameter for
> power-saving scenarios. I wonder what your numbers look like if you
> re-run your benchmark with (say) timer_slop=10000000 (i.e., 10ms
> slop) on the Xen command line?

I think it is another story. Enlarging timer_slop is one way to align and
reduce break events; it does help save power, and possibly brings larger
latency. What I am trying to address here is how to reduce spin_lock
overheads in the idle entry/exit path. The spin_lock overheads, along with
other overheads, in a system with 32 pcpus/64 vcpus caused >25% cpu
utilization while all guests are idle.

Jimmy

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 9:06 ` Wei, Gang
@ 2010-04-21 9:25 ` Keir Fraser
  2010-04-21 9:36 ` Wei, Gang
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-21 9:25 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 21/04/2010 10:06, "Wei, Gang" <gang.wei@intel.com> wrote:

> This will not happen. ch->next_event has already been set to STIME_MAX
> before we start scanning ch->cpumask, so the new CPU with the smallest
> timer_deadline_end will reprogram the HPET successfully.

Okay, then CPU A executes hpet_broadcast_enter() and programs the HPET
channel for its timeout X. Meanwhile concurrently executing
handle_hpet_broadcast misses CPU A but finds some other CPU B with timeout
Y much later than X, and erroneously programs the HPET channel with Y,
causing CPU A to miss its deadline by an arbitrary amount.

I dare say I can carry on finding races. :-)

> What I am trying to address here is how to reduce spin_lock overheads in
> the idle entry/exit path. The spin_lock overheads, along with other
> overheads, in a system with 32 pcpus/64 vcpus caused >25% cpu utilization
> while all guests are idle.

So far it's looked to me like a correctness/performance tradeoff. :-D

 -- Keir

^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 9:25 ` Keir Fraser
@ 2010-04-21 9:36 ` Wei, Gang
  2010-04-21 9:52 ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Wei, Gang @ 2010-04-21 9:36 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Wednesday, 2010-4-21 5:26 PM, Keir Fraser wrote:
> Okay, then CPU A executes hpet_broadcast_enter() and programs the HPET
> channel for its timeout X. Meanwhile concurrently executing
> handle_hpet_broadcast misses CPU A but finds some other CPU B with
> timeout Y much later than X, and erroneously programs the HPET
> channel with Y, causing CPU A to miss its deadline by an arbitrary
> amount.

It is also not possible. handle_hpet_broadcast reprograms the HPET only
while next_event < ch->next_event. Here next_event is Y, and ch->next_event
is X. :-)

> I dare say I can carry on finding races. :-)

Please go on... The two race conditions you mentioned were already
considered before I resent the patch. :-D

^ permalink raw reply [flat|nested] 19+ messages in thread
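The invariant Jimmy appeals to in this exchange — the handler retakes the lock and reprograms the channel only if the deadline it found is still earlier than ch->next_event — can be checked with a small model. This is a single-threaded sketch under stated assumptions, not the Xen implementation: the struct and function names (channel_model, handler_reprogram(), cpu_enter_broadcast()) are illustrative, and ch->next_event simply stands for "what the hardware comparator is programmed to".

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;
#define STIME_MAX INT64_MAX

/* ch->next_event, guarded by ch->lock in the real code. */
struct channel_model { s_time_t next_event; };
static struct channel_model mch = { STIME_MAX };

/* Model of the v2 handler tail: having rescanned with the lock dropped,
 * reprogram only if the candidate deadline is still the earliest known.
 * Returns 1 if the hardware would be reprogrammed, 0 if skipped.          */
static int handler_reprogram(s_time_t found)
{
    if ( found != STIME_MAX && found < mch.next_event )
    {
        mch.next_event = found;   /* reprogram_hpet_evt_channel() */
        return 1;
    }
    return 0;
}

/* Model of hpet_broadcast_enter(): a sleeping CPU programs its own
 * deadline if it is nearer than whatever is currently programmed.         */
static void cpu_enter_broadcast(s_time_t deadline)
{
    if ( deadline < mch.next_event )
        mch.next_event = deadline;
}
```

In Keir's scenario, CPU A programs X while the handler is mid-scan; when the handler later proposes the later deadline Y, the `found < mch.next_event` test fails and X survives, which is exactly the argument made above.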
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 9:36 ` Wei, Gang
@ 2010-04-21 9:52 ` Keir Fraser
  2010-04-21 10:03 ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-21 9:52 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 21/04/2010 10:36, "Wei, Gang" <gang.wei@intel.com> wrote:

> It is also not possible. handle_hpet_broadcast reprograms the HPET only
> while next_event < ch->next_event. Here next_event is Y, and
> ch->next_event is X. :-)

Ah yes, I spotted that just after I replied! Okay I'm almost convinced
now, but...

> Please go on... The two race conditions you mentioned were already
> considered before I resent the patch. :-D

Okay, one concern I still have is over possible races around
cpuidle_wakeup_mwait(). It makes use of a cpumask cpuidle_mwait_flags,
avoiding an IPI to cpus in the mask. However, there is nothing to stop the
CPU having cleared itself from that cpumask before cpuidle does the write
to softirq_pending. In that case, even assuming the CPU is now non-idle
and so wakeup is spurious, a subsequent attempt to
raise_softirq(TIMER_SOFTIRQ) will incorrectly not IPI because the flag is
already set in softirq_pending?

This race would be benign in the current locking strategy (without your
patch) because the CPU that has left mwait_idle_with_hints() cannot get
further than hpet_broadcast_exit() because it will spin on ch->lock, and
there is a guaranteed check of softirq_pending at some point after that.
However your patch removes that synchronisation because ch->lock is not
held across cpuidle_wakeup_mwait(). What do you think to that?

 -- Keir

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 9:52 ` Keir Fraser
@ 2010-04-21 10:03 ` Keir Fraser
  2010-04-22 3:59 ` Wei, Gang
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-21 10:03 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 21/04/2010 10:52, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Okay, one concern I still have is over possible races around
> cpuidle_wakeup_mwait(). It makes use of a cpumask cpuidle_mwait_flags,
> avoiding an IPI to cpus in the mask. However, there is nothing to stop
> the CPU having cleared itself from that cpumask before cpuidle does the
> write to softirq_pending. In that case, even assuming the CPU is now
> non-idle and so wakeup is spurious, a subsequent attempt to
> raise_softirq(TIMER_SOFTIRQ) will incorrectly not IPI because the flag
> is already set in softirq_pending?

Oh, another one, which can **even occur without your patch**: CPU A adds
itself to cpuidle_mwait_flags while cpuidle_wakeup_mwait() is running.
That function doesn't see CPU A in its first read of the cpumask so it
does not set TIMER_SOFTIRQ for CPU A. But it then *does* see CPU A in its
final read of the cpumask, and hence clears A from the caller's mask.
Hence the caller does not pass CPU A to cpumask_raise_softirq(). Hence
CPU A is erroneously not woken.

Isn't the synchronisation around those mwait/monitor functions just
inherently broken, even without your patch, and your patch just makes it
worse? :-)

 -- Keir

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-21 10:03                     ` Keir Fraser
@ 2010-04-22  3:59                       ` Wei, Gang
  2010-04-22  7:22                         ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Wei, Gang @ 2010-04-22 3:59 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Wednesday, 2010-4-21 6:03 PM, Keir Fraser wrote:
> On 21/04/2010 10:52, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> Okay, one concern I still have is over possible races around
>> cpuidle_wakeup_mwait(). It makes use of a cpumask
>> cpuidle_mwait_flags, avoiding an IPI to cpus in the mask. However,
>> there is nothing to stop the CPU having cleared itself from that
>> cpumask before cpuidle does the write to softirq_pending. In that
>> case, even assuming the CPU is now non-idle and so wakeup is
>> spurious, a subsequent attempt to raise_softirq(TIMER_SOFTIRQ) will
>> incorrectly not IPI because the flag is already set in
>> softirq_pending?

If a CPU has cleared itself from cpuidle_mwait_flags, then that CPU didn't
need an IPI to be woken. And one useless write to softirq_pending doesn't
have any side effect, so this case should be acceptable.

> Oh, another one, which can *even occur without your patch*: CPU A
> adds itself to cpuidle_mwait_flags while cpuidle_wakeup_mwait() is
> running. That function doesn't see CPU A in its first read of the
> cpumask so it does not set TIMER_SOFTIRQ for CPU A. But it then
> *does* see CPU A in its final read of the cpumask, and hence clears A
> from the caller's mask. Hence the caller does not pass CPU A to
> cpumask_raise_softirq(). Hence CPU A is erroneously not woken.

Nice shot. This issue can be resolved by keeping a snapshot of
cpuidle_mwait_flags in cpuidle_wakeup_mwait() and using that snapshot both
for the wakeup writes and for clearing the caller's mask.

> Isn't the synchronisation around those mwait/monitor functions just
> inherently broken, even without your patch, and your patch just makes
> it worse? :-)

How about now, after fixing the second concern you mentioned?

Jimmy

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-22  3:59                       ` Wei, Gang
@ 2010-04-22  7:22                         ` Keir Fraser
  2010-04-22  8:19                           ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-22 7:22 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 22/04/2010 04:59, "Wei, Gang" <gang.wei@intel.com> wrote:

>>> Okay, one concern I still have is over possible races around
>>> cpuidle_wakeup_mwait(). It makes use of a cpumask
>>> cpuidle_mwait_flags, avoiding an IPI to cpus in the mask. However,
>>> there is nothing to stop the CPU having cleared itself from that
>>> cpumask before cpuidle does the write to softirq_pending. In that
>>> case, even assuming the CPU is now non-idle and so wakeup is
>>> spurious, a subsequent attempt to raise_softirq(TIMER_SOFTIRQ) will
>>> incorrectly not IPI because the flag is already set in
>>> softirq_pending?
>
> If a CPU has cleared itself from cpuidle_mwait_flags, then that CPU didn't
> need an IPI to be woken. And one useless write to softirq_pending doesn't
> have any side effect, so this case should be acceptable.

That's not totally convincing. The write to softirq_pending has one extra
side effect: it is possible that the next time TIMER_SOFTIRQ really needs to
be raised on that CPU, it will not receive notification via IPI, because the
flag is already set in its softirq_pending mask.

Hm, let me see if I can come up with a patch for this and post it for you.

 -- Keir

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-22  7:22                         ` Keir Fraser
@ 2010-04-22  8:19                           ` Keir Fraser
  2010-04-22  8:23                             ` Keir Fraser
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-22 8:19 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

On 22/04/2010 08:22, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> If a CPU has cleared itself from cpuidle_mwait_flags, then that CPU didn't
>> need an IPI to be woken. And one useless write to softirq_pending doesn't
>> have any side effect, so this case should be acceptable.
>
> That's not totally convincing. The write to softirq_pending has one extra
> side effect: it is possible that the next time TIMER_SOFTIRQ really needs to
> be raised on that CPU, it will not receive notification via IPI, because the
> flag is already set in its softirq_pending mask.
>
> Hm, let me see if I can come up with a patch for this and post it for you.

How about the attached patch? It MWAITs on a completely new flag field,
avoiding the IPI-avoidance semantics of softirq_pending. It also folds in
your patch. It also does wakeup-waiting checks on timer_deadline_start, that
being the field that initiates wakeup via the MONITORed memory region.

 -- Keir

[-- Attachment #2: 00-mwait --]
[-- Type: application/octet-stream, Size: 1941 bytes --]

diff -r b0562b298d73 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c	Wed Apr 21 12:51:53 2010 +0100
+++ b/xen/arch/x86/acpi/cpu_idle.c	Thu Apr 22 09:17:36 2010 +0100
@@ -153,40 +153,45 @@
 /*
  * The bit is set iff cpu use monitor/mwait to enter C state
  * with this flag set, CPU can be waken up from C state
- * by writing to specific memory address, instead of sending IPI
+ * by writing to specific memory address, instead of sending an IPI.
  */
 static cpumask_t cpuidle_mwait_flags;
+static DEFINE_PER_CPU(bool_t, cpuidle_mwait_wakeup);
 
 void cpuidle_wakeup_mwait(cpumask_t *mask)
 {
     cpumask_t target;
-    int cpu;
+    unsigned int cpu;
 
     cpus_and(target, *mask, cpuidle_mwait_flags);
 
-    /* cpu is 'mwait'ing at softirq_pending,
-       writing to it will wake up CPU */
+    /* CPU is MWAITing on the cpuidle_mwait_wakeup flag. */
     for_each_cpu_mask(cpu, target)
-        set_bit(TIMER_SOFTIRQ, &softirq_pending(cpu));
+        per_cpu(cpuidle_mwait_wakeup, cpu) = 0;
 
-    cpus_andnot(*mask, *mask, cpuidle_mwait_flags);
+    cpus_andnot(*mask, *mask, target);
 }
 
 static void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
     int cpu = smp_processor_id();
 
-    __monitor((void *)&softirq_pending(cpu), 0, 0);
+    __monitor((void *)&this_cpu(cpuidle_mwait_wakeup), 0, 0);
+    smp_mb();
 
-    smp_mb();
-    if (!softirq_pending(cpu))
+    /*
+     * Timer deadline passing is the event on which we will be woken via
+     * cpuidle_mwait_wakeup. So check it now that the location is armed.
+     */
+    if ( per_cpu(timer_deadline_start, cpu) > NOW() )
     {
         cpu_set(cpu, cpuidle_mwait_flags);
-
         __mwait(eax, ecx);
-
         cpu_clear(cpu, cpuidle_mwait_flags);
     }
+
+    if ( per_cpu(timer_deadline_start, cpu) <= NOW() )
+        raise_softirq(TIMER_SOFTIRQ);
 }
 
 static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-22  8:19                           ` Keir Fraser
@ 2010-04-22  8:23                             ` Keir Fraser
  2010-04-29 11:08                               ` Wei, Gang
  0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-22 8:23 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 22/04/2010 09:19, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Hm, let me see if I can come up with a patch for this and post it for you.
>
> How about the attached patch? It MWAITs on a completely new flag
> field, avoiding the IPI-avoidance semantics of softirq_pending. It
> also folds in your patch. It also does wakeup-waiting checks on
> timer_deadline_start, that being the field that initiates wakeup via
> the MONITORed memory region.

...If you do think it looks okay, could you also test it out on relevant
hardware? :-)

Thanks,
Keir

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-22  8:23                             ` Keir Fraser
@ 2010-04-29 11:08                               ` Wei, Gang
  0 siblings, 0 replies; 19+ messages in thread
From: Wei, Gang @ 2010-04-29 11:08 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

On Thursday, 2010-4-22 4:23 PM, Keir Fraser wrote:
> On 22/04/2010 09:19, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> How about the attached patch? It MWAITs on a completely new flag
>> field, avoiding the IPI-avoidance semantics of softirq_pending. It
>> also folds in your patch. It also does wakeup-waiting checks on
>> timer_deadline_start, that being the field that initiates wakeup via
>> the MONITORed memory region.
>
> ...If you do think it looks okay, could you also test it out on
> relevant hardware? :-)

I made two modifications to this patch: moved the per-CPU mwait_wakeup flag
into irq_cpustat_t to make it __cacheline_aligned, and added a check for the
timer_deadline_start == 0 case, which means no timer is queued (it took me
quite a lot of time to find out this check is necessary). Tested OK.

Jimmy

[-- Attachment #2: 00-mwait-v2.patch --]
[-- Type: application/octet-stream, Size: 3318 bytes --]

CPUIDLE: re-implement mwait wakeup process

It MWAITs on a completely new flag field, avoiding the IPI-avoidance
semantics of softirq_pending. It also does wakeup-waiting checks on
timer_deadline_start, that being the field that initiates wakeup via the
MONITORed memory region.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r 6e72f5485836 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c	Thu Apr 29 16:49:54 2010 +0800
+++ b/xen/arch/x86/acpi/cpu_idle.c	Thu Apr 29 18:35:38 2010 +0800
@@ -153,40 +153,45 @@ static void acpi_safe_halt(void)
 /*
  * The bit is set iff cpu use monitor/mwait to enter C state
  * with this flag set, CPU can be waken up from C state
- * by writing to specific memory address, instead of sending IPI
+ * by writing to specific memory address, instead of sending an IPI.
  */
 static cpumask_t cpuidle_mwait_flags;
 
 void cpuidle_wakeup_mwait(cpumask_t *mask)
 {
     cpumask_t target;
-    int cpu;
+    unsigned int cpu;
 
     cpus_and(target, *mask, cpuidle_mwait_flags);
 
-    /* cpu is 'mwait'ing at softirq_pending,
-       writing to it will wake up CPU */
+    /* CPU is MWAITing on the mwait_wakeup flag. */
     for_each_cpu_mask(cpu, target)
-        set_bit(TIMER_SOFTIRQ, &softirq_pending(cpu));
-
-    cpus_andnot(*mask, *mask, cpuidle_mwait_flags);
+        mwait_wakeup(cpu) = 0;
+
+    cpus_andnot(*mask, *mask, target);
 }
 
 static void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
-    int cpu = smp_processor_id();
-
-    __monitor((void *)&softirq_pending(cpu), 0, 0);
-
-    smp_mb();
-    if (!softirq_pending(cpu))
+    unsigned int cpu = smp_processor_id();
+    s_time_t expires = per_cpu(timer_deadline_start, cpu);
+
+    __monitor((void *)&mwait_wakeup(cpu), 0, 0);
+    smp_mb();
+
+    /*
+     * Timer deadline passing is the event on which we will be woken via
+     * cpuidle_mwait_wakeup. So check it now that the location is armed.
+     */
+    if ( expires > NOW() || expires == 0 )
     {
         cpu_set(cpu, cpuidle_mwait_flags);
-
         __mwait(eax, ecx);
-
         cpu_clear(cpu, cpuidle_mwait_flags);
     }
+
+    if ( expires <= NOW() && expires > 0 )
+        raise_softirq(TIMER_SOFTIRQ);
 }
 
 static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
diff -r 6e72f5485836 xen/include/asm-x86/hardirq.h
--- a/xen/include/asm-x86/hardirq.h	Thu Apr 29 16:49:54 2010 +0800
+++ b/xen/include/asm-x86/hardirq.h	Thu Apr 29 16:49:56 2010 +0800
@@ -8,6 +8,7 @@ typedef struct {
 	unsigned long __softirq_pending;
 	unsigned int __local_irq_count;
 	unsigned int __nmi_count;
+	bool_t __mwait_wakeup;
 } __cacheline_aligned irq_cpustat_t;
 
 #include <xen/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
diff -r 6e72f5485836 xen/include/xen/irq_cpustat.h
--- a/xen/include/xen/irq_cpustat.h	Thu Apr 29 16:49:54 2010 +0800
+++ b/xen/include/xen/irq_cpustat.h	Thu Apr 29 16:49:56 2010 +0800
@@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
 #define softirq_pending(cpu)	__IRQ_STAT((cpu), __softirq_pending)
 #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
 #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
+#define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
 
 #endif	/* __irq_cpustat_h */

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-20 16:05         ` Wei, Gang
  2010-04-21  8:09           ` Keir Fraser
@ 2010-04-22  8:21                           ` Keir Fraser
  2010-04-29 11:14                             ` Wei, Gang
  1 sibling, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2010-04-22 8:21 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com

On 20/04/2010 17:05, "Wei, Gang" <gang.wei@intel.com> wrote:

> Resend.
>
> CPUIDLE: shorten hpet spin_lock holding time
>
> Try to reduce spin_lock overhead for deep C state entry/exit. This will
> benefit systems with a lot of cpus which need the hpet broadcast to wakeup
> from deep C state.
>
> Signed-off-by: Wei Gang <gang.wei@intel.com>

I was going to ask: do you still get the decent power-saving win from this
new version of the patch? You acquire/release the lock a lot more
potentially in this version, since you do so inside the loop over the
cpumask in handle_hpet_broadcast().

 -- Keir

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] CPUIDLE: shorten hpet spin_lock holding time
  2010-04-22  8:21                           ` Keir Fraser
@ 2010-04-29 11:14                             ` Wei, Gang
  0 siblings, 0 replies; 19+ messages in thread
From: Wei, Gang @ 2010-04-29 11:14 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com

On Thursday, 2010-4-22 4:21 PM, Keir Fraser wrote:
> On 20/04/2010 17:05, "Wei, Gang" <gang.wei@intel.com> wrote:
>
>> Resend.
>>
>> CPUIDLE: shorten hpet spin_lock holding time
>>
>> Try to reduce spin_lock overhead for deep C state entry/exit. This
>> will benefit systems with a lot of cpus which need the hpet
>> broadcast to wakeup from deep C state.
>>
>> Signed-off-by: Wei Gang <gang.wei@intel.com>
>
> I was going to ask: do you still get the decent power-saving win from
> this new version of the patch? You acquire/release the lock a lot more
> potentially in this version, since you do so inside the loop over
> cpumask in handle_hpet_broadcast().

Yes, I tested it and could still get the decent power-saving win from it.
And it should not have any race issue after the mwait-wakeup
re-implementation patch.

Jimmy

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2010-04-29 11:14 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-20  5:39 [PATCH] CPUIDLE: shorten hpet spin_lock holding time Wei, Gang
2010-04-20 12:49 ` Keir Fraser
2010-04-20 14:04   ` Wei, Gang
2010-04-20 14:21     ` Keir Fraser
2010-04-20 15:20       ` Wei, Gang
2010-04-20 16:05         ` Wei, Gang
2010-04-21  8:09           ` Keir Fraser
2010-04-21  9:06             ` Wei, Gang
2010-04-21  9:25               ` Keir Fraser
2010-04-21  9:36                 ` Wei, Gang
2010-04-21  9:52                   ` Keir Fraser
2010-04-21 10:03                     ` Keir Fraser
2010-04-22  3:59                       ` Wei, Gang
2010-04-22  7:22                         ` Keir Fraser
2010-04-22  8:19                           ` Keir Fraser
2010-04-22  8:23                             ` Keir Fraser
2010-04-29 11:08                               ` Wei, Gang
2010-04-22  8:21                           ` Keir Fraser
2010-04-29 11:14                             ` Wei, Gang