xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* IRQ: issues with directed EOI and IO-APIC ack methods
@ 2012-02-13 16:03 Andrew Cooper
  2012-02-13 16:53 ` Keir Fraser
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2012-02-13 16:03 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich

Hello,

XenServer6.0 (Xen 4.1.1) has had a support escalation against it for
Cisco C210 M2 servers.  I do not have access to any of these servers, so
cant debug the issue myself.

The pcpu LAPICs support EOI Broadcast suppression and Xen enabled it. 
In arch/x86/apic.c:verify_local_APIC, there is a comment stating that
directed EOI support must use the old IO-APIC ack method.

A hypervisor with this check disabled (i.e. never checking for, or
enabling directed EOI) seems to make the system stable again (5 days
stable now, as opposed to a hang due to lost interrupts once every few
hours before).

First of all, I have discovered that forcing "ioapic_ack=new" does not
have the indented effect, because verify_local_APIC trashes it, even if
the user has specified the ack method.  I intend to send a patch to fix
this in due course.

However, as for the main issue, I cant work out any logical reason why
directed EOI would not work with the new ack mode.  I am still trying to
work out the differences in the code path incase I have missed something
subtle, but I wondered if anyone on the list has more knowledge of these
intricacies than me?  Either way, it appears that there is a bug on the
codepath with directed EOI and old ack method.

Thanks in advance,

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: IRQ: issues with directed EOI and IO-APIC ack methods
  2012-02-13 16:03 IRQ: issues with directed EOI and IO-APIC ack methods Andrew Cooper
@ 2012-02-13 16:53 ` Keir Fraser
  2012-02-13 18:02   ` Andrew Cooper
  2012-02-14 10:45   ` Jan Beulich
  0 siblings, 2 replies; 4+ messages in thread
From: Keir Fraser @ 2012-02-13 16:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel@lists.xensource.com, Jan Beulich; +Cc: edwin.zhai

On 13/02/2012 16:03, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

> Hello,
> 
> XenServer6.0 (Xen 4.1.1) has had a support escalation against it for
> Cisco C210 M2 servers.  I do not have access to any of these servers, so
> cant debug the issue myself.
> 
> The pcpu LAPICs support EOI Broadcast suppression and Xen enabled it.
> In arch/x86/apic.c:verify_local_APIC, there is a comment stating that
> directed EOI support must use the old IO-APIC ack method.

Well, it's not surprising that some systems won't like this method. Firstly,
calling the LAPIC feature 'directed EOI' is misleading. The feature is 'EOI
broadcast suppression' -- specifically, EOI to the LAPIC does not cause EOI
to the IO-APIC, instead the IO-APIC has to be manually EOIed as a separate
operation.

Now, not all IO-APICs directly support this. See io_apic.c:__io_apic_eoi()
-- if the IO-APIC does not have an EOI register, then an EOI is forced in a
slightly gross way. I wonder how reliable that is across a broad range of
chipsets; reliable enough to rely on it for *every* interrupt? ;-)

Cc'ing the patch author Edwin Zhai. If it can't be resolved with Intel, I'm
personally quite happy to see the original patch reverted.

 -- Keir

> A hypervisor with this check disabled (i.e. never checking for, or
> enabling directed EOI) seems to make the system stable again (5 days
> stable now, as opposed to a hang due to lost interrupts once every few
> hours before).
> 
> First of all, I have discovered that forcing "ioapic_ack=new" does not
> have the indented effect, because verify_local_APIC trashes it, even if
> the user has specified the ack method.  I intend to send a patch to fix
> this in due course.
> 
> However, as for the main issue, I cant work out any logical reason why
> directed EOI would not work with the new ack mode.  I am still trying to
> work out the differences in the code path incase I have missed something
> subtle, but I wondered if anyone on the list has more knowledge of these
> intricacies than me?  Either way, it appears that there is a bug on the
> codepath with directed EOI and old ack method.
> 
> Thanks in advance,

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: IRQ: issues with directed EOI and IO-APIC ack methods
  2012-02-13 16:53 ` Keir Fraser
@ 2012-02-13 18:02   ` Andrew Cooper
  2012-02-14 10:45   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2012-02-13 18:02 UTC (permalink / raw)
  To: Keir Fraser
  Cc: xen-devel@lists.xensource.com, edwin.zhai@intel.com, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 3469 bytes --]

On 13/02/12 16:53, Keir Fraser wrote:
> On 13/02/2012 16:03, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>
>> Hello,
>>
>> XenServer6.0 (Xen 4.1.1) has had a support escalation against it for
>> Cisco C210 M2 servers.  I do not have access to any of these servers, so
>> cant debug the issue myself.
>>
>> The pcpu LAPICs support EOI Broadcast suppression and Xen enabled it.
>> In arch/x86/apic.c:verify_local_APIC, there is a comment stating that
>> directed EOI support must use the old IO-APIC ack method.
> Well, it's not surprising that some systems won't like this method. Firstly,
> calling the LAPIC feature 'directed EOI' is misleading. The feature is 'EOI
> broadcast suppression' -- specifically, EOI to the LAPIC does not cause EOI
> to the IO-APIC, instead the IO-APIC has to be manually EOIed as a separate
> operation.

Yes - I had noticed the naming discrepancy but decided that fixing the
issue was more important than arguing over naming at this point.

> Now, not all IO-APICs directly support this. See io_apic.c:__io_apic_eoi()
> -- if the IO-APIC does not have an EOI register, then an EOI is forced in a
> slightly gross way. I wonder how reliable that is across a broad range of
> chipsets; reliable enough to rely on it for *every* interrupt? ;-)

When I wrote __io_apic_eoi(), it was based on the comment about an
erratum on the 82093AA chipset.  The comment appears in
io_apic.c:mask_and_ack_level_ioapic_irq and end_level_ioapic_irq. 
Before my patch, Xen assumed that every IOAPIC had an EOI register which
was causing issues on older chipsets.  However, it was still using this
hack about flipping the trigger mode because of the erratum, which is
why Xen was only encountering problems as a race condition when
migrating pirqs.

The IO-APICs in question advertise their version as 0x20 so should an
EOI register.  However, I can't find a chip number reference so I cant
find a definite specification for the chip in question.

It appears that Citrix does have an almost identical server so I am
currently negotiating for access to it so I can debug this issue properly.

I have attached a patch which prevents the advertisement of EOI
Broadcast Suppression from overriding what the user specifies on the
command line.

~Andrew

> Cc'ing the patch author Edwin Zhai. If it can't be resolved with Intel, I'm
> personally quite happy to see the original patch reverted.
>
>  -- Keir
>
>> A hypervisor with this check disabled (i.e. never checking for, or
>> enabling directed EOI) seems to make the system stable again (5 days
>> stable now, as opposed to a hang due to lost interrupts once every few
>> hours before).
>>
>> First of all, I have discovered that forcing "ioapic_ack=new" does not
>> have the indented effect, because verify_local_APIC trashes it, even if
>> the user has specified the ack method.  I intend to send a patch to fix
>> this in due course.
>>
>> However, as for the main issue, I cant work out any logical reason why
>> directed EOI would not work with the new ack mode.  I am still trying to
>> work out the differences in the code path incase I have missed something
>> subtle, but I wondered if anyone on the list has more knowledge of these
>> intricacies than me?  Either way, it appears that there is a bug on the
>> codepath with directed EOI and old ack method.
>>
>> Thanks in advance,
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: ioapic-ack-fix.patch --]
[-- Type: text/x-patch, Size: 2427 bytes --]

# HG changeset patch
# Parent 9ad1e42c341bc78463b6f6610a6300f75b535fbb
IO-APIC: Prevent using EOI broadcast suppression if user specified
ioapic_ack=new on the command line.

Currently, if EOI broadcast suppression is advertised on the BSP LAPIC, Xen will
discard any user specified option regarding IO-APIC ack mode.

This patch introduces a check which prevents EOI Broadcast suppression from
forcing the IO-APIC ack mode to old if the user has explicitly asked for the new
ack mode on the command line.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 9ad1e42c341b xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -424,9 +424,15 @@ int __init verify_local_APIC(void)
      */
     if ( reg0 & APIC_LVR_DIRECTED_EOI )
     {
-        ioapic_ack_new = 0;
-        directed_eoi_enabled = 1;
-        printk("Enabled directed EOI with ioapic_ack_old on!\n");
+        if ( ioapic_ack_new == 1 && ioapic_ack_forced == 1 )
+            printk("Not enabling directed EOI because ioapic_ack_new has been "
+                   "forced on the command line\n");
+        else
+        {
+            ioapic_ack_new = 0;
+            directed_eoi_enabled = 1;
+            printk("Enabled directed EOI with ioapic_ack_old on!\n");
+        }
     }
 
     /*
diff -r 9ad1e42c341b xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -44,6 +44,7 @@ static DEFINE_SPINLOCK(ioapic_lock);
 
 bool_t __read_mostly skip_ioapic_setup;
 bool_t __read_mostly ioapic_ack_new = 1;
+bool_t __read_mostly ioapic_ack_forced = 0;
 
 #ifndef sis_apic_bug
 /*
@@ -1543,9 +1544,15 @@ static unsigned int startup_level_ioapic
 static void __init setup_ioapic_ack(char *s)
 {
     if ( !strcmp(s, "old") )
+    {
         ioapic_ack_new = 0;
+        ioapic_ack_forced = 1;
+    }
     else if ( !strcmp(s, "new") )
+    {
         ioapic_ack_new = 1;
+        ioapic_ack_forced = 1;
+    }
     else
         printk("Unknown ioapic_ack value specified: '%s'\n", s);
 }
diff -r 9ad1e42c341b xen/include/asm-x86/io_apic.h
--- a/xen/include/asm-x86/io_apic.h
+++ b/xen/include/asm-x86/io_apic.h
@@ -180,6 +180,7 @@ static inline void io_apic_modify(unsign
 /* 1 if "noapic" boot option passed */
 extern bool_t skip_ioapic_setup;
 extern bool_t ioapic_ack_new;
+extern bool_t ioapic_ack_forced;
 
 #ifdef CONFIG_ACPI_BOOT
 extern int io_apic_get_unique_id (int ioapic, int apic_id);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: IRQ: issues with directed EOI and IO-APIC ack methods
  2012-02-13 16:53 ` Keir Fraser
  2012-02-13 18:02   ` Andrew Cooper
@ 2012-02-14 10:45   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2012-02-14 10:45 UTC (permalink / raw)
  To: Andrew Cooper, Keir Fraser; +Cc: xen-devel@lists.xensource.com, edwin.zhai

>>> On 13.02.12 at 17:53, Keir Fraser <keir@xen.org> wrote:
> On 13/02/2012 16:03, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>> The pcpu LAPICs support EOI Broadcast suppression and Xen enabled it.
>> In arch/x86/apic.c:verify_local_APIC, there is a comment stating that
>> directed EOI support must use the old IO-APIC ack method.

The reason is that with this LAPIC feature you want to send the ack
to the LAPIC immediately (mask_and_ack_level_ioapic_irq()), which
matched the old method.

With the various conditionals there and in end_level_ioapic_irq() it
may be a good first step to create three distinct hw_irq_controller
instances (old, new, and "directed EOI") for level triggered interrupts,
and have them do just what is needed for the specific case. Then it
may become easier to spot what's going wrong.

> Well, it's not surprising that some systems won't like this method. Firstly,
> calling the LAPIC feature 'directed EOI' is misleading. The feature is 'EOI
> broadcast suppression' -- specifically, EOI to the LAPIC does not cause EOI
> to the IO-APIC, instead the IO-APIC has to be manually EOIed as a separate
> operation.
> 
> Now, not all IO-APICs directly support this. See io_apic.c:__io_apic_eoi()
> -- if the IO-APIC does not have an EOI register, then an EOI is forced in a
> slightly gross way. I wonder how reliable that is across a broad range of
> chipsets; reliable enough to rely on it for *every* interrupt? ;-)

With the EOI register pre-dating this mis-named CPU feature, I
wouldn't expect anyone to have built a system with CPU(s) this new
but an old chipset. But it's certainly worth verifying the IO-APIC
version on the affected system(s).

> Cc'ing the patch author Edwin Zhai. If it can't be resolved with Intel, I'm
> personally quite happy to see the original patch reverted.

Let's rather try understanding the issue than reverting something that
ought to help performance.

Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-02-14 10:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-13 16:03 IRQ: issues with directed EOI and IO-APIC ack methods Andrew Cooper
2012-02-13 16:53 ` Keir Fraser
2012-02-13 18:02   ` Andrew Cooper
2012-02-14 10:45   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).