[PATCH] C6 state with EOI issue fix for some Intel processors

xen-devel.lists.xenproject.org archive mirror

* [PATCH] C6 state with EOI issue fix for some Intel processors
@ 2010-09-15  7:10 Sheng Yang
  2010-09-15  7:18 ` Sheng Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sheng Yang @ 2010-09-15  7:10 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com

[-- Attachment #1: Type: Text/Plain, Size: 583 bytes --]

There is an erratum in some Intel processors.

AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During
an Interrupt Service Routine

If core C6 is entered after the start of an interrupt service routine but before
a write to the APIC EOI register, the core may not send an EOI transaction (if
needed) and further interrupts from the same priority level or lower may be
blocked.
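
To illustrate why a missing EOI blocks "the same priority level or lower", here is a minimal sketch of the local APIC priority rule, not taken from the patch. It assumes the task priority register is zero and a single in-service vector; blocked_by_in_service() and priority_class() are hypothetical names used only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of an interrupt vector: bits 7:4 of the vector number. */
static unsigned int priority_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Hypothetical helper (not part of the patch): returns true if an interrupt
 * with vector new_vec cannot be delivered while in_service_vec is still
 * marked in service, i.e. while its EOI has not taken effect.
 * Assumes the task priority register is 0.
 */
static bool blocked_by_in_service(uint8_t in_service_vec, uint8_t new_vec)
{
    /* Delivery requires a strictly higher priority class. */
    return priority_class(new_vec) <= priority_class(in_service_vec);
}
```

With vector 0x51 stuck in service, vectors 0x38 and 0x5f would stay blocked under this rule, while 0x61 could still be delivered.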

This patch fixes the issue by checking whether any interrupt is still in
service (an ISR bit is set) before entering a deep Cx state. If so, it uses
power->safe_state instead of the deep Cx state, so that the problem above
cannot occur.
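
The decision described above can be sketched outside the hypervisor as follows. This is only an illustration under stated assumptions: a plain array stands in for the eight 32-bit APIC ISR registers that the patch reads with apic_read(), and isr_pending(), pick_cstate() and enum cstate are hypothetical names, not Xen symbols.

```c
#include <stdbool.h>
#include <stdint.h>

#define ISR_WORDS 8 /* 256 vectors / 32 bits per ISR register */

/* Simplified stand-in for the ACPI C-state types. */
enum cstate { C1 = 1, C2 = 2, C3 = 3 };

/*
 * True if any in-service bit is set. The loop starts at word 1, mirroring
 * the loop bounds used in the patch.
 */
static bool isr_pending(const uint32_t isr[ISR_WORDS])
{
    for (int i = 1; i < ISR_WORDS; i++)
        if (isr[i] != 0)
            return true;
    return false;
}

/*
 * Fall back to the safe state only when the CPU is affected, the requested
 * state is the deep (C3-type) one, and an interrupt is still in service.
 */
static enum cstate pick_cstate(enum cstate wanted, enum cstate safe,
                               bool cpu_affected,
                               const uint32_t isr[ISR_WORDS])
{
    if (cpu_affected && wanted == C3 && isr_pending(isr))
        return safe;
    return wanted;
}
```

Ordering the cheap checks first means the ISR registers are only scanned when a deep state is about to be entered on an affected CPU, which matches the order of the conditions in the patch.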

[-- Attachment #2: c6_eoi_fix.patch --]
[-- Type: text/x-patch, Size: 1429 bytes --]

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -367,6 +367,28 @@
     return atomic_read(&this_cpu(schedule_data).urgent_count);
 }
 
+static int cpu_has_isr_pending(void)
+{
+    int i;
+
+    for ( i = 1; i < 8; i++ )
+        if ( apic_read(APIC_ISR + (i << 4)) != 0 )
+            return 1;
+    return 0;
+}
+
+int errata_c6_eoi_fix_needed(void)
+{
+    int model = boot_cpu_data.x86_model;
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+         boot_cpu_data.x86 == 6 &&
+         (model == 0x1a || model == 0x1e || model == 0x1f ||
+          model == 0x25 || model == 0x2c || model == 0x2f) &&
+         !directed_eoi_enabled )
+        return 1;
+    return 0;
+}
+
 static void acpi_processor_idle(void)
 {
     struct acpi_processor_power *power = processor_powers[smp_processor_id()];
@@ -417,6 +439,16 @@
         return;
     }
 
+    /*
+     * Erratum on some Core i7 processors: the EOI transaction may not be
+     * sent if software enters core C6 during an interrupt service
+     * routine. So do not enter a deep Cx state while any interrupt is
+     * still in service.
+     */
+    if ( cpu_has_apic && errata_c6_eoi_fix_needed() &&
+           cx->type == ACPI_STATE_C3 && cpu_has_isr_pending() )
+        cx = power->safe_state;
+
     power->last_state = cx;
 
     /*
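
The family/model test in errata_c6_eoi_fix_needed() relies on values Xen has already decoded into boot_cpu_data. As a rough sketch of where such values come from, the helpers below decode the family and model from a raw CPUID leaf 1 EAX value and compare them with the model list in the patch. The function names are hypothetical, and the vendor and directed-EOI checks from the patch are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Family: base family, plus the extended family when the base is 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;

    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Model: extended model bits are prepended for base family 6 or 0xF. */
static unsigned int cpuid_model(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;
    unsigned int model = (eax >> 4) & 0xf;

    if (family == 0x6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Hypothetical helper: is this family/model pair in the patch's list? */
static bool model_in_patch_list(unsigned int family, unsigned int model)
{
    static const unsigned int affected[] = {
        0x1a, 0x1e, 0x1f, 0x25, 0x2c, 0x2f
    };

    if (family != 6)
        return false;
    for (size_t i = 0; i < sizeof(affected) / sizeof(affected[0]); i++)
        if (model == affected[i])
            return true;
    return false;
}
```

For example, an EAX value of 0x000106A5 decodes to family 6, model 0x1a, which is in the list; 0x000306A9 decodes to model 0x3a, which is not.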

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: [PATCH] C6 state with EOI issue fix for some Intel processors
  2010-09-15  7:10 [PATCH] C6 state with EOI issue fix for some Intel processors Sheng Yang
@ 2010-09-15  7:18 ` Sheng Yang
  2010-09-15  7:32 ` Keir Fraser
  2010-09-15  8:03 ` Keir Fraser
  2 siblings, 0 replies; 6+ messages in thread
From: Sheng Yang @ 2010-09-15  7:18 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser

On Wednesday 15 September 2010 15:10:43 Sheng Yang wrote:
> There is an erratum in some Intel processors.
> 
> AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During
> an Interrupt Service Routine
> 
> If core C6 is entered after the start of an interrupt service routine but
> before a write to the APIC EOI register, the core may not send an EOI
> transaction (if needed) and further interrupts from the same priority
> level or lower may be blocked.
> 
> This patch fixes the issue by checking whether any interrupt is still in
> service (an ISR bit is set) before entering a deep Cx state. If so, it uses
> power->safe_state instead of the deep Cx state, so that the problem above
> cannot occur.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>

--
regards
Yang, Sheng


* Re: [PATCH] C6 state with EOI issue fix for some Intel processors
  2010-09-15  7:10 [PATCH] C6 state with EOI issue fix for some Intel processors Sheng Yang
  2010-09-15  7:18 ` Sheng Yang
@ 2010-09-15  7:32 ` Keir Fraser
  2010-09-15  8:03 ` Keir Fraser
  2 siblings, 0 replies; 6+ messages in thread
From: Keir Fraser @ 2010-09-15  7:32 UTC (permalink / raw)
  To: Sheng Yang; +Cc: xen-devel@lists.xensource.com

Aieee! :-)

 K.

On 15/09/2010 08:10, "Sheng Yang" <sheng@linux.intel.com> wrote:

> There is an erratum in some Intel processors.
>
> AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During
> an Interrupt Service Routine
>
> If core C6 is entered after the start of an interrupt service routine but
> before a write to the APIC EOI register, the core may not send an EOI
> transaction (if needed) and further interrupts from the same priority
> level or lower may be blocked.
>
> This patch fixes the issue by checking whether any interrupt is still in
> service (an ISR bit is set) before entering a deep Cx state. If so, it uses
> power->safe_state instead of the deep Cx state, so that the problem above
> cannot occur.


* Re: [PATCH] C6 state with EOI issue fix for some Intel processors
  2010-09-15  7:10 [PATCH] C6 state with EOI issue fix for some Intel processors Sheng Yang
  2010-09-15  7:18 ` Sheng Yang
  2010-09-15  7:32 ` Keir Fraser
@ 2010-09-15  8:03 ` Keir Fraser
  2010-09-15 13:42   ` Andreas Kinzler
  2 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2010-09-15  8:03 UTC (permalink / raw)
  To: Sheng Yang; +Cc: xen-devel@lists.xensource.com

On 15/09/2010 08:10, "Sheng Yang" <sheng@linux.intel.com> wrote:

> This patch fixes the issue by checking whether any interrupt is still in
> service (an ISR bit is set) before entering a deep Cx state. If so, it uses
> power->safe_state instead of the deep Cx state, so that the problem above
> cannot occur.

Thanks. I reworked this patch substantially and applied as
xen-unstable:22160 and xen-4.0-testing:21348.

 -- Keir


* Re: Re: [PATCH] C6 state with EOI issue fix for some Intel processors
  2010-09-15  8:03 ` Keir Fraser
@ 2010-09-15 13:42   ` Andreas Kinzler
  2010-09-16  0:23     ` Sheng Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Kinzler @ 2010-09-15 13:42 UTC (permalink / raw)
  To: Keir Fraser, sheng; +Cc: xen-devel

On 15.09.2010 10:03, Keir Fraser wrote:
>> This patch fixes the issue by checking whether any interrupt is still in
>> service (an ISR bit is set) before entering a deep Cx state. If so, it
>> uses power->safe_state instead of the deep Cx state, so that the problem
>> above cannot occur.
> Thanks. I reworked this patch substantially and applied as
> xen-unstable:22160 and xen-4.0-testing:21348.

I tested the patch on vanilla 4.0.1 and it does help a bit. Uptime was 
now over 100 minutes instead of under 3 minutes. But problems still 
occurred (aacraid reset, eth reset).

With my patch from 
(http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html) 
the machine uptime was over 10 days when I stopped the test.

Regards Andreas

Sep 15 14:55:19 virt kernel: ------------[ cut here ]------------
Sep 15 14:55:19 virt kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x220/0x230()
Sep 15 14:55:19 virt kernel: Hardware name: X8SIL
Sep 15 14:55:19 virt kernel: NETDEV WATCHDOG: peth0 (e1000e): transmit queue 0 timed out
Sep 15 14:55:19 virt kernel: Modules linked in: bridge stp llc iptable_filter xt_MARK xt_mark xt_iprange xt_conntrack nf_conntrack ip_tables x_tables tun loop e1000e
Sep 15 14:55:19 virt kernel: Pid: 4088, comm: blkback.1.hdc Not tainted 2.6.32.18-pvops0-ak3 #1
Sep 15 14:55:19 virt kernel: Call Trace:
Sep 15 14:55:19 virt kernel: <IRQ>  [<ffffffff810458f6>] warn_slowpath_common+0x76/0xb0
Sep 15 14:55:19 virt kernel: [<ffffffff8104598c>] warn_slowpath_fmt+0x3c/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff812f1b70>] dev_watchdog+0x220/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff81050820>] ? mod_timer+0x110/0x180
Sep 15 14:55:19 virt kernel: [<ffffffff81091c40>] ? sync_supers_timer_fn+0x0/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff812f1950>] ? dev_watchdog+0x0/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff810502fc>] run_timer_softirq+0x14c/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff8104b72f>] __do_softirq+0xaf/0x140
Sep 15 14:55:19 virt kernel: [<ffffffff811c5c09>] ? __xen_evtchn_do_upcall+0x219/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff8101357c>] call_softirq+0x1c/0x30
Sep 15 14:55:19 virt kernel: [<ffffffff81015675>] do_softirq+0x65/0xa0
Sep 15 14:55:19 virt kernel: [<ffffffff8104b3fd>] irq_exit+0x8d/0x90
Sep 15 14:55:19 virt kernel: [<ffffffff811c5cdd>] xen_evtchn_do_upcall+0x3d/0x60
Sep 15 14:55:19 virt kernel: [<ffffffff810135ce>] xen_do_hypervisor_callback+0x1e/0x30
Sep 15 14:55:19 virt kernel: <EOI>  [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1010
Sep 15 14:55:19 virt kernel: [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1010
Sep 15 14:55:19 virt kernel: [<ffffffff8100ed7d>] ? xen_force_evtchn_callback+0xd/0x10
Sep 15 14:55:19 virt kernel: [<ffffffff8100f712>] ? check_events+0x12/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff8100f6b9>] ? xen_irq_enable_direct_end+0x0/0x7
Sep 15 14:55:19 virt kernel: [<ffffffff8135e9dd>] ? _spin_unlock_irq+0xd/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff81151955>] ? generic_unplug_device+0x35/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff811cf456>] ? unplug_queue+0x26/0x50
Sep 15 14:55:19 virt kernel: [<ffffffff811d001e>] ? blkif_schedule+0xde/0x320
Sep 15 14:55:19 virt kernel: [<ffffffff8105c530>] ? autoremove_wake_function+0x0/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff8135ea42>] ? _spin_unlock_irqrestore+0x32/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff811cff40>] ? blkif_schedule+0x0/0x320
Sep 15 14:55:19 virt kernel: [<ffffffff8105c24e>] ? kthread+0x8e/0xa0
Sep 15 14:55:19 virt kernel: [<ffffffff8101347a>] ? child_rip+0xa/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff81012626>] ? int_ret_from_sys_call+0x7/0x1b
Sep 15 14:55:19 virt kernel: [<ffffffff81012de1>] ? retint_restore_args+0x5/0x6
Sep 15 14:55:19 virt kernel: [<ffffffff81013470>] ? child_rip+0x0/0x20
Sep 15 14:55:19 virt kernel: ---[ end trace 6548e737c4c22ec9 ]---
Sep 15 14:55:19 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 14:55:19 virt kernel: eth0: port 1(peth0) entering disabled state
Sep 15 14:55:19 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 14:55:22 virt kernel: e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Sep 15 14:55:22 virt kernel: eth0: port 1(peth0) entering forwarding state
Sep 15 15:16:29 virt kernel: hrtimer: interrupt took 10082426 ns
Sep 15 15:24:06 virt kernel: aacraid: Host adapter abort request (0,0,1,0)
Sep 15 15:24:06 virt kernel: aacraid: Host adapter abort request (0,0,1,0)
Sep 15 15:24:06 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Sep 15 15:24:06 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 15:24:06 virt kernel: eth0: port 1(peth0) entering disabled state
Sep 15 15:24:06 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 15:24:09 virt kernel: e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None


* Re: Re: [PATCH] C6 state with EOI issue fix for some Intel processors
  2010-09-15 13:42   ` Andreas Kinzler
@ 2010-09-16  0:23     ` Sheng Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Sheng Yang @ 2010-09-16  0:23 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel, Keir Fraser

On Wednesday 15 September 2010 21:42:53 Andreas Kinzler wrote:
> On 15.09.2010 10:03, Keir Fraser wrote:
> >> This patch fixes the issue by checking whether any interrupt is still
> >> in service (an ISR bit is set) before entering a deep Cx state. If so,
> >> it uses power->safe_state instead of the deep Cx state, so that the
> >> problem above cannot occur.
> > 
> > Thanks. I reworked this patch substantially and applied as
> > xen-unstable:22160 and xen-4.0-testing:21348.
> 
> I tested the patch on vanilla 4.0.1 and it does help a bit. Uptime was
> now over 100 minutes instead of under 3 minutes. But problems still
> occurred (aacraid reset, eth reset).

To determine whether the issue is caused by the erratum, you can try disabling
the C6 state in the BIOS. The erratum only occurs when the C6 state is involved.

--
regards
Yang, Sheng

