* [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions
@ 2007-04-25 11:03 Fernando Luis Vázquez Cao
2007-04-25 11:13 ` [PATCH 1/10] safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
` (10 more replies)
0 siblings, 11 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:03 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Keith Owens, akpm, kexec, linux-kernel, ak, vgoyal, horms, mbligh
--- Background
On i386 and x86_64, in the event of a crash, kdump attempts to stop all
other CPUs before proceeding to boot into the dump capture kernel.
Depending on the sub-architecture several implementations exist as
detailed below.
- i386
smp_send_nmi_allbutself
send_IPI_mask(mask, NMI_VECTOR)
* mach-default
send_IPI_mask
-->send_IPI_mask_bitmask
apic_wait_icr_idle
* mach-bigsmp, mach-es7000, mach-numaq, mach-summit
send_IPI_mask
-->send_IPI_mask_sequence
apic_wait_icr_idle
-- x86_64
smp_send_nmi_allbutself
send_IPI_allbutself(NMI_VECTOR)
* flat APIC
send_IPI_allbutself=flat_send_IPI_allbutself
-->flat_send_IPI_mask
apic_wait_icr_idle
* clustered APIC, physflat APIC
send_IPI_allbutself=(cluster_send_IPI_allbutself/physflat_send_IPI_allbutself)
(cluster_send_IPI_mask/physflat_send_IPI_mask)
-->send_IPI_mask_sequence
apic_wait_icr_idle
As the pseudo-code above shows all the sub-architectures end up calling
apic_wait_icr_idle, which looks like this:
static __inline__ void apic_wait_icr_idle(void)
{
while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. As an example,
when the other CPUs are locked-up inside the NMI handler the CPU that
sends the IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.
Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.
A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."
Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.
--- Possible solutions
* Solution A: Implement the time-out mechanism in apic_wait_icr_idle.
The problem with this approach is that introduces a performance penalty
that may not be acceptable for some callers of apic_wait_icr_idle.
Besides, during normal operation delivery errors should not occur. This
brings us to solution B.
* Solution B: Create a new function that implements the time-out
mechanism.
Create a new function (what about safe_apic_wait_icr_idle?) that
provides the apic_wait_icr_idle functionality but also implements a
time-out mechanism. This function would look like this:
static __inline__ int safe_apic_wait_icr_idle(void)
{
unsigned long send_status;
int timeout;
timeout = 0;
do {
udelay(100);
send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
} while (send_status && (timeout++ < 1000));
return send_status;
}
Once we have this function we still have to decide how to use it:
Solution B-1: Create send_IPI_mask and send_IPI_allbutself replacements
that use the new safe_apic_wait_icr_idle instead of apic_wait_icr_idle.
The problem with this approach is that it may be overkill, since they
probably would not have many users besides kdump.
Solution B-2: Use safe_apic_wait_icr_idle when the vector is NMI_VECTOR
and apic_wait_icr_icr_idle otherwise. This solution has the advantage
that it is very simple, but it probably isn't very elegant.
--- About this patch set
The kdump-related stuff is included in patches 9 and 10, but even this
part is rejected I think that patches 1 through 8 include some necessary
cleanups.
Patches 1 and 2: implement safe_apic_wait_icr_idle for i386 and x86_64
respectively.
Patches 3 through 6: remove some ugly cut and pasting from the smp code.
Patches 7 and 8: Cleanup *send_IPI*.
Patches 9 and 10: Implement solution B-2.
I would appreciate your feedback on these patches.
- Fernando
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/10] safe_apic_wait_icr_idle - i386
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
@ 2007-04-25 11:13 ` Fernando Luis Vázquez Cao
2007-04-25 12:55 ` Keith Owens
2007-04-25 11:15 ` [PATCH 2/10] safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
` (9 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:13 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
apic_wait_icr_idle looks like this:
static __inline__ void apic_wait_icr_idle(void)
{
while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.
Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.
A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."
Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.
Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/include/asm-i386/apic.h linux-2.6.21-rc7/include/asm-i386/apic.h
--- linux-2.6.21-rc7-orig/include/asm-i386/apic.h 2007-04-18 15:27:50.000000000 +0900
+++ linux-2.6.21-rc7/include/asm-i386/apic.h 2007-04-18 15:28:36.000000000 +0900
@@ -2,6 +2,7 @@
#define __ASM_APIC_H
#include <linux/pm.h>
+#include <linux/delay.h>
#include <asm/fixmap.h>
#include <asm/apicdef.h>
#include <asm/processor.h>
@@ -66,10 +67,24 @@ static __inline fastcall unsigned long n
static __inline__ void apic_wait_icr_idle(void)
{
- while ( apic_read( APIC_ICR ) & APIC_ICR_BUSY )
+ while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
+static __inline__ unsigned long safe_apic_wait_icr_idle(void)
+{
+ unsigned long send_status;
+ int timeout;
+
+ timeout = 0;
+ do {
+ udelay(100);
+ send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
+ } while (send_status && (timeout++ < 1000));
+
+ return send_status;
+}
+
int get_physical_broadcast(void);
#ifdef CONFIG_X86_GOOD_APIC
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/10] safe_apic_wait_icr_idle - x86_64
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
2007-04-25 11:13 ` [PATCH 1/10] safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 11:15 ` Fernando Luis Vázquez Cao
2007-04-25 12:26 ` Andi Kleen
2007-04-25 11:19 ` [PATCH 3/10] smpboot: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
` (8 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:15 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
apic_wait_icr_idle looks like this:
static __inline__ void apic_wait_icr_idle(void)
{
while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.
Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.
A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."
Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.
Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/include/asm-x86_64/apic.h linux-2.6.21-rc7/include/asm-x86_64/apic.h
--- linux-2.6.21-rc7-orig/include/asm-x86_64/apic.h 2007-04-18 13:47:06.000000000 +0900
+++ linux-2.6.21-rc7/include/asm-x86_64/apic.h 2007-04-18 15:24:30.000000000 +0900
@@ -2,6 +2,7 @@
#define __ASM_APIC_H
#include <linux/pm.h>
+#include <linux/delay.h>
#include <asm/fixmap.h>
#include <asm/apicdef.h>
#include <asm/system.h>
@@ -49,10 +50,24 @@ static __inline unsigned int apic_read(u
static __inline__ void apic_wait_icr_idle(void)
{
- while (apic_read( APIC_ICR ) & APIC_ICR_BUSY)
+ while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
+static __inline__ unsigned int safe_apic_wait_icr_idle(void)
+{
+ unsigned int send_status;
+ int timeout;
+
+ timeout = 0;
+ do {
+ udelay(100);
+ send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
+ } while (send_status && (timeout++ < 1000));
+
+ return send_status;
+}
+
static inline void ack_APIC_irq(void)
{
/*
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 3/10] smpboot: use safe_apic_wait_icr_idle - i386
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
2007-04-25 11:13 ` [PATCH 1/10] safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
2007-04-25 11:15 ` [PATCH 2/10] safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 11:19 ` Fernando Luis Vázquez Cao
2007-04-25 11:21 ` [PATCH 4/10] smpboot: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
` (7 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:19 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
The functionality provided by the new safe_apic_wait_icr_idle is being
open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
instead to consolidate code and ease maintenance.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/i386/kernel/smpboot.c linux-2.6.21-rc7/arch/i386/kernel/smpboot.c
--- linux-2.6.21-rc7-orig/arch/i386/kernel/smpboot.c 2007-04-18 15:27:48.000000000 +0900
+++ linux-2.6.21-rc7/arch/i386/kernel/smpboot.c 2007-04-18 15:44:41.000000000 +0900
@@ -568,8 +568,8 @@ static inline void __inquire_remote_apic
static int __devinit
wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip)
{
- unsigned long send_status = 0, accept_status = 0;
- int timeout, maxlvt;
+ unsigned long send_status, accept_status = 0;
+ int maxlvt;
/* Target chip */
apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(logical_apicid));
@@ -579,12 +579,7 @@ wakeup_secondary_cpu(int logical_apicid,
apic_write_around(APIC_ICR, APIC_DM_NMI | APIC_DEST_LOGICAL);
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
/*
* Give the other CPU some time to accept the IPI.
@@ -614,8 +609,8 @@ wakeup_secondary_cpu(int logical_apicid,
static int __devinit
wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
{
- unsigned long send_status = 0, accept_status = 0;
- int maxlvt, timeout, num_starts, j;
+ unsigned long send_status, accept_status = 0;
+ int maxlvt, num_starts, j;
/*
* Be paranoid about clearing APIC errors.
@@ -640,12 +635,7 @@ wakeup_secondary_cpu(int phys_apicid, un
| APIC_DM_INIT);
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
mdelay(10);
@@ -658,12 +648,7 @@ wakeup_secondary_cpu(int phys_apicid, un
apic_write_around(APIC_ICR, APIC_INT_LEVELTRIG | APIC_DM_INIT);
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
atomic_set(&init_deasserted, 1);
@@ -719,12 +704,7 @@ wakeup_secondary_cpu(int phys_apicid, un
Dprintk("Startup point 1.\n");
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
/*
* Give the other CPU some time to accept the IPI.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 4/10] smpboot: use safe_apic_wait_icr_idle - x86_64
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (2 preceding siblings ...)
2007-04-25 11:19 ` [PATCH 3/10] smpboot: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 11:21 ` Fernando Luis Vázquez Cao
2007-04-25 11:26 ` [PATCH 5/10] __inquire_remote_apic: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
` (6 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:21 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
The functionality provided by the new safe_apic_wait_icr_idle is being
open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
instead to consolidate code and ease maintenance.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/x86_64/kernel/smpboot.c linux-2.6.21-rc7/arch/x86_64/kernel/smpboot.c
--- linux-2.6.21-rc7-orig/arch/x86_64/kernel/smpboot.c 2007-04-18 15:27:48.000000000 +0900
+++ linux-2.6.21-rc7/arch/x86_64/kernel/smpboot.c 2007-04-18 15:52:08.000000000 +0900
@@ -430,8 +430,8 @@ static void inquire_remote_apic(int apic
*/
static int __cpuinit wakeup_secondary_via_INIT(int phys_apicid, unsigned int start_rip)
{
- unsigned long send_status = 0, accept_status = 0;
- int maxlvt, timeout, num_starts, j;
+ unsigned long send_status, accept_status = 0;
+ int maxlvt, num_starts, j;
Dprintk("Asserting INIT.\n");
@@ -447,12 +447,7 @@ static int __cpuinit wakeup_secondary_vi
| APIC_DM_INIT);
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
mdelay(10);
@@ -465,12 +460,7 @@ static int __cpuinit wakeup_secondary_vi
apic_write(APIC_ICR, APIC_INT_LEVELTRIG | APIC_DM_INIT);
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
mb();
atomic_set(&init_deasserted, 1);
@@ -509,12 +499,7 @@ static int __cpuinit wakeup_secondary_vi
Dprintk("Startup point 1.\n");
Dprintk("Waiting for send to finish...\n");
- timeout = 0;
- do {
- Dprintk("+");
- udelay(100);
- send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
- } while (send_status && (timeout++ < 1000));
+ send_status = safe_apic_wait_icr_idle();
/*
* Give the other CPU some time to accept the IPI.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 5/10] __inquire_remote_apic: use safe_apic_wait_icr_idle - i386
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (3 preceding siblings ...)
2007-04-25 11:21 ` [PATCH 4/10] smpboot: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 11:26 ` Fernando Luis Vázquez Cao
2007-04-25 11:27 ` [PATCH 6/10] inquire_remote_apic: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
` (5 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:26 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
__inquire_remote_apic is used for APIC debugging, so use
safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible
lockups when APIC delivery fails.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/i386/kernel/smpboot.c linux-2.6.21-rc7/arch/i386/kernel/smpboot.c
--- linux-2.6.21-rc7-orig/arch/i386/kernel/smpboot.c 2007-04-18 15:49:33.000000000 +0900
+++ linux-2.6.21-rc7/arch/i386/kernel/smpboot.c 2007-04-18 16:36:27.000000000 +0900
@@ -526,7 +526,8 @@ static inline void __inquire_remote_apic
{
int i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 };
char *names[] = { "ID", "VERSION", "SPIV" };
- int timeout, status;
+ int timeout;
+ unsigned long status;
printk("Inquiring remote APIC #%d...\n", apicid);
@@ -536,7 +537,9 @@ static inline void __inquire_remote_apic
/*
* Wait for idle.
*/
- apic_wait_icr_idle();
+ status = safe_apic_wait_icr_idle();
+ if (status)
+ printk("a previous APIC delivery may have failed\n");
apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(apicid));
apic_write_around(APIC_ICR, APIC_DM_REMRD | regs[i]);
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 6/10] inquire_remote_apic: use safe_apic_wait_icr_idle - x86_64
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (4 preceding siblings ...)
2007-04-25 11:26 ` [PATCH 5/10] __inquire_remote_apic: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 11:27 ` Fernando Luis Vázquez Cao
2007-04-25 11:37 ` [PATCH 7/10] __send_IPI_dest_field - i386 Fernando Luis Vázquez Cao
` (4 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:27 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
inquire_remote_apic is used for APIC debugging, so use
safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible
lockups when APIC delivery fails.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/x86_64/kernel/smpboot.c linux-2.6.21-rc7/arch/x86_64/kernel/smpboot.c
--- linux-2.6.21-rc7-orig/arch/x86_64/kernel/smpboot.c 2007-04-18 15:53:11.000000000 +0900
+++ linux-2.6.21-rc7/arch/x86_64/kernel/smpboot.c 2007-04-18 16:39:20.000000000 +0900
@@ -392,7 +392,8 @@ static void inquire_remote_apic(int apic
{
unsigned i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 };
char *names[] = { "ID", "VERSION", "SPIV" };
- int timeout, status;
+ int timeout;
+ unsigned int status;
printk(KERN_INFO "Inquiring remote APIC #%d...\n", apicid);
@@ -402,7 +403,9 @@ static void inquire_remote_apic(int apic
/*
* Wait for idle.
*/
- apic_wait_icr_idle();
+ status = safe_apic_wait_icr_idle();
+ if (status)
+ printk("a previous APIC delivery may have failed\n");
apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(apicid));
apic_write(APIC_ICR, APIC_DM_REMRD | regs[i]);
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 7/10] __send_IPI_dest_field - i386
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (5 preceding siblings ...)
2007-04-25 11:27 ` [PATCH 6/10] inquire_remote_apic: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 11:37 ` Fernando Luis Vázquez Cao
2007-04-25 11:39 ` [PATCH 8/10] __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
` (3 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:37 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/i386/kernel/smp.c linux-2.6.21-rc7/arch/i386/kernel/smp.c
--- linux-2.6.21-rc7-orig/arch/i386/kernel/smp.c 2007-04-18 15:27:48.000000000 +0900
+++ linux-2.6.21-rc7/arch/i386/kernel/smp.c 2007-04-23 16:29:24.000000000 +0900
@@ -165,16 +165,13 @@ void fastcall send_IPI_self(int vector)
}
/*
- * This is only used on smaller machines.
+ * This is used to send an IPI with no shorthand notation (the destination is
+ * specified in bits 56 to 63 of the ICR).
*/
-void send_IPI_mask_bitmask(cpumask_t cpumask, int vector)
+static inline void __send_IPI_dest_field(unsigned long mask, int vector)
{
- unsigned long mask = cpus_addr(cpumask)[0];
unsigned long cfg;
- unsigned long flags;
- local_irq_save(flags);
- WARN_ON(mask & ~cpus_addr(cpu_online_map)[0]);
/*
* Wait for idle.
*/
@@ -195,13 +192,25 @@ void send_IPI_mask_bitmask(cpumask_t cpu
* Send the IPI. The write to APIC_ICR fires this off.
*/
apic_write_around(APIC_ICR, cfg);
+}
+/*
+ * This is only used on smaller machines.
+ */
+void send_IPI_mask_bitmask(cpumask_t cpumask, int vector)
+{
+ unsigned long mask = cpus_addr(cpumask)[0];
+ unsigned long flags;
+
+ local_irq_save(flags);
+ WARN_ON(mask & ~cpus_addr(cpu_online_map)[0]);
+ __send_IPI_dest_field(mask, vector);
local_irq_restore(flags);
}
void send_IPI_mask_sequence(cpumask_t mask, int vector)
{
- unsigned long cfg, flags;
+ unsigned long flags;
unsigned int query_cpu;
/*
@@ -211,30 +220,10 @@ void send_IPI_mask_sequence(cpumask_t ma
*/
local_irq_save(flags);
-
for (query_cpu = 0; query_cpu < NR_CPUS; ++query_cpu) {
if (cpu_isset(query_cpu, mask)) {
-
- /*
- * Wait for idle.
- */
- apic_wait_icr_idle();
-
- /*
- * prepare target chip field
- */
- cfg = __prepare_ICR2(cpu_to_logical_apicid(query_cpu));
- apic_write_around(APIC_ICR2, cfg);
-
- /*
- * program the ICR
- */
- cfg = __prepare_ICR(0, vector);
-
- /*
- * Send the IPI. The write to APIC_ICR fires this off.
- */
- apic_write_around(APIC_ICR, cfg);
+ __send_IPI_dest_field(cpu_to_logical_apicid(query_cpu),
+ vector);
}
}
local_irq_restore(flags);
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 8/10] __send_IPI_dest_field - x86_64
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (6 preceding siblings ...)
2007-04-25 11:37 ` [PATCH 7/10] __send_IPI_dest_field - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 11:39 ` Fernando Luis Vázquez Cao
2007-04-25 11:49 ` [PATCH 9/10] Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
` (2 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:39 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/x86_64/kernel/genapic_flat.c linux-2.6.21-rc7/arch/x86_64/kernel/genapic_flat.c
--- linux-2.6.21-rc7-orig/arch/x86_64/kernel/genapic_flat.c 2007-02-05 03:44:54.000000000 +0900
+++ linux-2.6.21-rc7/arch/x86_64/kernel/genapic_flat.c 2007-04-23 17:19:14.000000000 +0900
@@ -60,31 +60,10 @@ static void flat_init_apic_ldr(void)
static void flat_send_IPI_mask(cpumask_t cpumask, int vector)
{
unsigned long mask = cpus_addr(cpumask)[0];
- unsigned long cfg;
unsigned long flags;
local_irq_save(flags);
-
- /*
- * Wait for idle.
- */
- apic_wait_icr_idle();
-
- /*
- * prepare target chip field
- */
- cfg = __prepare_ICR2(mask);
- apic_write(APIC_ICR2, cfg);
-
- /*
- * program the ICR
- */
- cfg = __prepare_ICR(0, vector, APIC_DEST_LOGICAL);
-
- /*
- * Send the IPI. The write to APIC_ICR fires this off.
- */
- apic_write(APIC_ICR, cfg);
+ __send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
local_irq_restore(flags);
}
diff -urNp linux-2.6.21-rc7-orig/include/asm-x86_64/ipi.h linux-2.6.21-rc7/include/asm-x86_64/ipi.h
--- linux-2.6.21-rc7-orig/include/asm-x86_64/ipi.h 2007-02-05 03:44:54.000000000 +0900
+++ linux-2.6.21-rc7/include/asm-x86_64/ipi.h 2007-04-23 17:11:04.000000000 +0900
@@ -76,10 +76,39 @@ static inline void __send_IPI_shortcut(u
apic_write(APIC_ICR, cfg);
}
+/*
+ * This is used to send an IPI with no shorthand notation (the destination is
+ * specified in bits 56 to 63 of the ICR).
+ */
+static inline void __send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest)
+{
+ unsigned long cfg;
+
+ /*
+ * Wait for idle.
+ */
+ apic_wait_icr_idle();
+
+ /*
+ * prepare target chip field
+ */
+ cfg = __prepare_ICR2(mask);
+ apic_write(APIC_ICR2, cfg);
+
+ /*
+ * program the ICR
+ */
+ cfg = __prepare_ICR(0, vector, dest);
+
+ /*
+ * Send the IPI. The write to APIC_ICR fires this off.
+ */
+ apic_write(APIC_ICR, cfg);
+}
static inline void send_IPI_mask_sequence(cpumask_t mask, int vector)
{
- unsigned long cfg, flags;
+ unsigned long flags;
unsigned long query_cpu;
/*
@@ -88,28 +117,9 @@ static inline void send_IPI_mask_sequenc
* - mbligh
*/
local_irq_save(flags);
-
for_each_cpu_mask(query_cpu, mask) {
- /*
- * Wait for idle.
- */
- apic_wait_icr_idle();
-
- /*
- * prepare target chip field
- */
- cfg = __prepare_ICR2(x86_cpu_to_apicid[query_cpu]);
- apic_write(APIC_ICR2, cfg);
-
- /*
- * program the ICR
- */
- cfg = __prepare_ICR(0, vector, APIC_DEST_PHYSICAL);
-
- /*
- * Send the IPI. The write to APIC_ICR fires this off.
- */
- apic_write(APIC_ICR, cfg);
+ __send_IPI_dest_field(x86_cpu_to_apicid[query_cpu],
+ vector, APIC_DEST_PHYSICAL);
}
local_irq_restore(flags);
}
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 9/10] Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (7 preceding siblings ...)
2007-04-25 11:39 ` [PATCH 8/10] __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 11:49 ` Fernando Luis Vázquez Cao
2007-04-25 11:51 ` [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
2007-04-26 6:48 ` [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Vivek Goyal
10 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:49 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/arch/i386/kernel/smp.c linux-2.6.21-rc7/arch/i386/kernel/smp.c
--- linux-2.6.21-rc7-orig/arch/i386/kernel/smp.c 2007-04-23 16:32:58.000000000 +0900
+++ linux-2.6.21-rc7/arch/i386/kernel/smp.c 2007-04-25 19:06:40.000000000 +0900
@@ -175,7 +175,10 @@ static inline void __send_IPI_dest_field
/*
* Wait for idle.
*/
- apic_wait_icr_idle();
+ if (unlikely(vector == NMI_VECTOR))
+ safe_apic_wait_icr_idle();
+ else
+ apic_wait_icr_idle();
/*
* prepare target chip field
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (8 preceding siblings ...)
2007-04-25 11:49 ` [PATCH 9/10] Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 11:51 ` Fernando Luis Vázquez Cao
2007-04-25 12:33 ` Andi Kleen
2007-04-26 6:48 ` [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Vivek Goyal
10 siblings, 1 reply; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 11:51 UTC (permalink / raw)
To: Eric W. Biederman
Cc: horms, kexec, linux-kernel, ak, vgoyal, mbligh, Keith Owens, akpm
Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.21-rc7-orig/include/asm-x86_64/ipi.h linux-2.6.21-rc7/include/asm-x86_64/ipi.h
--- linux-2.6.21-rc7-orig/include/asm-x86_64/ipi.h 2007-04-23 17:34:56.000000000 +0900
+++ linux-2.6.21-rc7/include/asm-x86_64/ipi.h 2007-04-25 19:15:01.000000000 +0900
@@ -87,7 +87,10 @@ static inline void __send_IPI_dest_field
/*
* Wait for idle.
*/
- apic_wait_icr_idle();
+ if (unlikely(vector == NMI_VECTOR))
+ safe_apic_wait_icr_idle();
+ else
+ apic_wait_icr_idle();
/*
* prepare target chip field
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/10] safe_apic_wait_icr_idle - x86_64
2007-04-25 11:15 ` [PATCH 2/10] safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 12:26 ` Andi Kleen
2007-04-25 12:55 ` Fernando Luis Vázquez Cao
0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2007-04-25 12:26 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Eric W. Biederman, horms, kexec, linux-kernel, vgoyal, mbligh,
Keith Owens, akpm
> static __inline__ void apic_wait_icr_idle(void)
> {
> - while (apic_read( APIC_ICR ) & APIC_ICR_BUSY)
> + while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> cpu_relax();
> }
>
> +static __inline__ unsigned int safe_apic_wait_icr_idle(void)
This should be probably not inline -- too large
-Andi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64
2007-04-25 11:51 ` [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-25 12:33 ` Andi Kleen
2007-04-25 12:52 ` Fernando Luis Vázquez Cao
0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2007-04-25 12:33 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Eric W. Biederman, horms, kexec, linux-kernel, vgoyal, mbligh,
Keith Owens, akpm
On Wednesday 25 April 2007 13:51:12 Fernando Luis Vázquez Cao wrote:
> Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
> NMI_VECTOR to avoid potential hangups in the event of crash when kdump
> tries to stop the other CPUs.
But what happens then when this fails? Won't this give another hang?
Have you tested this?
I applied the patches for now.
-Andi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64
2007-04-25 12:33 ` Andi Kleen
@ 2007-04-25 12:52 ` Fernando Luis Vázquez Cao
0 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 12:52 UTC (permalink / raw)
To: Andi Kleen
Cc: Eric W. Biederman, horms, kexec, linux-kernel, vgoyal, mbligh,
Keith Owens, akpm
On Wed, 2007-04-25 at 14:33 +0200, Andi Kleen wrote:
> On Wednesday 25 April 2007 13:51:12 Fernando Luis Vázquez Cao wrote:
> > Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
> > NMI_VECTOR to avoid potential hangups in the event of crash when kdump
> > tries to stop the other CPUs.
>
> But what happens then when this fails? Won't this give another hang?
> Have you tested this?
In kdump the crashing CPU (i.e. the CPU that called crash_kexec) is the
one in charge of rebooting into and executing the dump capture kernel.
But before doing this it attempts to stop the other CPUs sending a IPI
using NMI_VECTOR as the vector. The problem is that sometimes delivery
seems to fail and the crashing CPU gets stuck waiting for the ICR status
bit to be cleared, which will never happen.
With this patch, when safe_apic_wait_icr_idle times out the CPU will
continue executing and try to hand over control to the dump capture
kernel as usual. After applying this patch I have not seen hangs in the
reboot path to second kernel showing the symptoms mentioned before, but
perhaps I am just being lucky and there is something else going on.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/10] safe_apic_wait_icr_idle - i386
2007-04-25 11:13 ` [PATCH 1/10] safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
@ 2007-04-25 12:55 ` Keith Owens
2007-04-25 12:59 ` Fernando Luis Vázquez Cao
2007-04-25 13:16 ` Andi Kleen
0 siblings, 2 replies; 22+ messages in thread
From: Keith Owens @ 2007-04-25 12:55 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Eric W. Biederman, horms, kexec, linux-kernel, ak, vgoyal, mbligh,
akpm
Fernando Luis =?ISO-8859-1?Q?V=E1zquez?= Cao (on Wed, 25 Apr 2007 20:13:28 +0900) wrote:
>+static __inline__ unsigned long safe_apic_wait_icr_idle(void)
>+{
>+ unsigned long send_status;
>+ int timeout;
>+
>+ timeout = 0;
>+ do {
>+ udelay(100);
>+ send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
>+ } while (send_status && (timeout++ < 1000));
>+
>+ return send_status;
>+}
>+
safe_apic_wait_icr_idle() as coded guarantees a minimum 100 usec delay
before sending the IPI, this extra delay is unnecessary. Change it to
do {
send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
if (send_status)
break;
udelay(100);
} while (timeout++ < 1000);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/10] safe_apic_wait_icr_idle - x86_64
2007-04-25 12:26 ` Andi Kleen
@ 2007-04-25 12:55 ` Fernando Luis Vázquez Cao
2007-04-25 13:11 ` Andi Kleen
0 siblings, 1 reply; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 12:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Eric W. Biederman, horms, kexec, linux-kernel, vgoyal, mbligh,
Keith Owens, akpm
On Wed, 2007-04-25 at 14:26 +0200, Andi Kleen wrote:
> > static __inline__ void apic_wait_icr_idle(void)
> > {
> > - while (apic_read( APIC_ICR ) & APIC_ICR_BUSY)
> > + while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> > cpu_relax();
> > }
> >
> > +static __inline__ unsigned int safe_apic_wait_icr_idle(void)
>
> This should be probably not inline -- too large
Hello Andi,
Thank you for reviewing the patches. Do you want me to resend the whole
patch or should I cook a new one that un-inlines the function instead?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/10] safe_apic_wait_icr_idle - i386
2007-04-25 12:55 ` Keith Owens
@ 2007-04-25 12:59 ` Fernando Luis Vázquez Cao
2007-04-25 13:16 ` Andi Kleen
1 sibling, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 12:59 UTC (permalink / raw)
To: Keith Owens
Cc: Eric W. Biederman, horms, kexec, linux-kernel, ak, vgoyal, mbligh,
akpm
On Wed, 2007-04-25 at 22:55 +1000, Keith Owens wrote:
> Fernando Luis =?ISO-8859-1?Q?V=E1zquez?= Cao (on Wed, 25 Apr 2007 20:13:28 +0900) wrote:
> >+static __inline__ unsigned long safe_apic_wait_icr_idle(void)
> >+{
> >+ unsigned long send_status;
> >+ int timeout;
> >+
> >+ timeout = 0;
> >+ do {
> >+ udelay(100);
> >+ send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> >+ } while (send_status && (timeout++ < 1000));
> >+
> >+ return send_status;
> >+}
> >+
>
> safe_apic_wait_icr_idle() as coded guarantees a minimum 100 usec delay
> before sending the IPI, this extra delay is unnecessary. Change it to
Hi Keith,
Thank you for the feedback!
> do {
> send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> if (send_status)
> break;
> udelay(100);
> } while (timeout++ < 1000);
Oops, that is a good point. I will resend this patch either tonight or
tomorrow morning.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/10] safe_apic_wait_icr_idle - x86_64
2007-04-25 12:55 ` Fernando Luis Vázquez Cao
@ 2007-04-25 13:11 ` Andi Kleen
2007-04-25 13:32 ` Fernando Luis Vázquez Cao
0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2007-04-25 13:11 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Eric W. Biederman, horms, kexec, linux-kernel, vgoyal, mbligh,
Keith Owens, akpm
On Wednesday 25 April 2007 14:55:12 Fernando Luis Vázquez Cao wrote:
> On Wed, 2007-04-25 at 14:26 +0200, Andi Kleen wrote:
> > > static __inline__ void apic_wait_icr_idle(void)
> > > {
> > > - while (apic_read( APIC_ICR ) & APIC_ICR_BUSY)
> > > + while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> > > cpu_relax();
> > > }
> > >
> > > +static __inline__ unsigned int safe_apic_wait_icr_idle(void)
> >
> > This should be probably not inline -- too large
> Hello Andi,
>
> Thank you for reviewing the patches. Do you want me to resend the whole
> patch or should I cook a new one that un-inlines the function instead?
I already did that.
Also will apply Keith's suggestion.
-Andi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/10] safe_apic_wait_icr_idle - i386
2007-04-25 12:55 ` Keith Owens
2007-04-25 12:59 ` Fernando Luis Vázquez Cao
@ 2007-04-25 13:16 ` Andi Kleen
1 sibling, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2007-04-25 13:16 UTC (permalink / raw)
To: Keith Owens
Cc: Fernando Luis Vázquez Cao, Eric W. Biederman, horms, kexec,
linux-kernel, vgoyal, mbligh, akpm
>
> do {
> send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> if (send_status)
Surely if (!send_status)
> break;
> udelay(100);
> } while (timeout++ < 1000);
-Andi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/10] safe_apic_wait_icr_idle - x86_64
2007-04-25 13:11 ` Andi Kleen
@ 2007-04-25 13:32 ` Fernando Luis Vázquez Cao
0 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-25 13:32 UTC (permalink / raw)
To: Andi Kleen
Cc: vgoyal, kexec, linux-kernel, horms, Eric W. Biederman, mbligh,
Keith Owens, akpm
On Wed, 2007-04-25 at 15:11 +0200, Andi Kleen wrote:
> On Wednesday 25 April 2007 14:55:12 Fernando Luis Vázquez Cao wrote:
> > On Wed, 2007-04-25 at 14:26 +0200, Andi Kleen wrote:
> > > > static __inline__ void apic_wait_icr_idle(void)
> > > > {
> > > > - while (apic_read( APIC_ICR ) & APIC_ICR_BUSY)
> > > > + while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> > > > cpu_relax();
> > > > }
> > > >
> > > > +static __inline__ unsigned int safe_apic_wait_icr_idle(void)
> > >
> > > This should be probably not inline -- too large
> > Hello Andi,
> >
> > Thank you for reviewing the patches. Do you want me to resend the whole
> > patch or should I cook a new one that un-inlines the function instead?
>
> I already did that.
>
> Also will apply Keith's suggestion.
Thank you Andi. Sorry for the trouble.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
` (9 preceding siblings ...)
2007-04-25 11:51 ` [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
@ 2007-04-26 6:48 ` Vivek Goyal
2007-04-26 7:20 ` Fernando Luis Vázquez Cao
10 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2007-04-26 6:48 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Eric W. Biederman, Keith Owens, akpm, kexec, linux-kernel, ak,
horms, mbligh
On Wed, Apr 25, 2007 at 08:03:04PM +0900, Fernando Luis Vázquez Cao wrote:
>
> static __inline__ void apic_wait_icr_idle(void)
> {
> while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> cpu_relax();
> }
>
> The busy loop in this function would not be problematic if the
> corresponding status bit in the ICR were always updated, but that does
> not seem to be the case under certain crash scenarios. As an example,
> when the other CPUs are locked-up inside the NMI handler the CPU that
> sends the IPI will end up looping forever in the ICR check, effectively
> hard-locking the whole system.
>
> Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
>
> "A local APIC unit indicates successful dispatch of an IPI by
> resetting the Delivery Status bit in the Interrupt Command
> Register (ICR). The operating system polls the delivery status
> bit after sending an INIT or STARTUP IPI until the command has
> been dispatched.
>
> A period of 20 microseconds should be sufficient for IPI dispatch
> to complete under normal operating conditions. If the IPI is not
> successfully dispatched, the operating system can abort the
> command. Alternatively, the operating system can retry the IPI by
> writing the lower 32-bit double word of the ICR. This “time-out”
> mechanism can be implemented through an external interrupt, if
> interrupts are enabled on the processor, or through execution of
> an instruction or time-stamp counter spin loop."
>
> Intel's documentation suggests the implementation of a time-out
> mechanism, which, by the way, is already being open-coded in some parts
> of the kernel that tinker with ICR.
>
> --- Possible solutions
>
> * Solution A: Implement the time-out mechanism in apic_wait_icr_idle.
>
> The problem with this approach is that introduces a performance penalty
> that may not be acceptable for some callers of apic_wait_icr_idle.
> Besides, during normal operation delivery errors should not occur. This
> brings us to solution B.
>
Hi Fernando,
How much is the performance penalty? Is it really significant. My point
is that, to me changing apic_wait_icr_dle() itself seems to be the simple
approach instead of introducing another function.
Original implementation is:
static __inline__ void apic_wait_icr_idle(void)
{
while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
And new one will look something like.
do {
send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
if (!send_status)
break;
udelay(100);
} while (timeout++ < 1000);
There will be at max 100 microsecond delay before you realize that IPI has
been dispatched. To optimize it further we can change it to 10 microsecond
delay
do {
send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
if (!send_status)
break;
udelay(10);
} while (timeout++ < 10000);
or may be
do {
send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
if (!send_status)
break;
udelay(1);
} while (timeout++ < 100000);
I don't know if 1 micro second delay is supported. I do see it being
used in kernel/hpet.c
Is it too much of performance overhead? Somebody who knows more about it
needs to tell. To me changing apic_wait_icr_idle() seems simple instead
of introducing a new function and then making a special case for NMI.
Thanks
Vivek
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions
2007-04-26 6:48 ` [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Vivek Goyal
@ 2007-04-26 7:20 ` Fernando Luis Vázquez Cao
0 siblings, 0 replies; 22+ messages in thread
From: Fernando Luis Vázquez Cao @ 2007-04-26 7:20 UTC (permalink / raw)
To: vgoyal
Cc: Eric W. Biederman, Keith Owens, akpm, kexec, linux-kernel, ak,
horms, mbligh
On Thu, 2007-04-26 at 12:18 +0530, Vivek Goyal wrote:
> On Wed, Apr 25, 2007 at 08:03:04PM +0900, Fernando Luis Vázquez Cao wrote:
> >
> > static __inline__ void apic_wait_icr_idle(void)
> > {
> > while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> > cpu_relax();
> > }
> >
> > The busy loop in this function would not be problematic if the
> > corresponding status bit in the ICR were always updated, but that does
> > not seem to be the case under certain crash scenarios. As an example,
> > when the other CPUs are locked-up inside the NMI handler the CPU that
> > sends the IPI will end up looping forever in the ICR check, effectively
> > hard-locking the whole system.
> >
> > Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
> >
> > "A local APIC unit indicates successful dispatch of an IPI by
> > resetting the Delivery Status bit in the Interrupt Command
> > Register (ICR). The operating system polls the delivery status
> > bit after sending an INIT or STARTUP IPI until the command has
> > been dispatched.
> >
> > A period of 20 microseconds should be sufficient for IPI dispatch
> > to complete under normal operating conditions. If the IPI is not
> > successfully dispatched, the operating system can abort the
> > command. Alternatively, the operating system can retry the IPI by
> > writing the lower 32-bit double word of the ICR. This “time-out”
> > mechanism can be implemented through an external interrupt, if
> > interrupts are enabled on the processor, or through execution of
> > an instruction or time-stamp counter spin loop."
> >
> > Intel's documentation suggests the implementation of a time-out
> > mechanism, which, by the way, is already being open-coded in some parts
> > of the kernel that tinker with ICR.
> >
> > --- Possible solutions
> >
> > * Solution A: Implement the time-out mechanism in apic_wait_icr_idle.
> >
> > The problem with this approach is that introduces a performance penalty
> > that may not be acceptable for some callers of apic_wait_icr_idle.
> > Besides, during normal operation delivery errors should not occur. This
> > brings us to solution B.
> >
>
> Hi Fernando,
Hi Vivek,
Thank you for your feedback!
> How much is the performance penalty? Is it really significant. My point
> is that, to me changing apic_wait_icr_dle() itself seems to be the simple
> approach instead of introducing another function.
That was what my gut feel said at first, too. But...
> Original implementation is:
>
> static __inline__ void apic_wait_icr_idle(void)
> {
> while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
> cpu_relax();
> }
>
> And new one will look something like.
>
> do {
> send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> if (!send_status)
> break;
> udelay(100);
> } while (timeout++ < 1000);
>
> There will be at max 100 microsecond delay before you realize that IPI has
> been dispatched. To optimize it further we can change it to 10 microsecond
> delay
... I noticed that the maximum theoretical delay you mention has to be
multiplied by the number of CPUs on big systems, because on those
machines send_IPI_mask is implemented as a series of unicasts to each
CPU. It is for this reason that I thought this approach may be
considered unacceptable (I have to admit that I have not performed any
micro-benchmarks, though).
> do {
> send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> if (!send_status)
> break;
> udelay(10);
> } while (timeout++ < 10000);
>
> or may be
>
> do {
> send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
> if (!send_status)
> break;
> udelay(1);
> } while (timeout++ < 100000);
>
> I don't know if 1 micro second delay is supported. I do see it being
> used in kernel/hpet.c
Please notice that such high-precision timers are not available in all
machines.
> Is it too much of performance overhead? Somebody who knows more about it
> needs to tell. To me changing apic_wait_icr_idle() seems simple instead
> of introducing a new function and then making a special case for NMI.
As I mentioned above the performance overhead depends on several
factors. I hope it makes sense.
- Fernando
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2007-04-26 7:20 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-25 11:03 [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Fernando Luis Vázquez Cao
2007-04-25 11:13 ` [PATCH 1/10] safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
2007-04-25 12:55 ` Keith Owens
2007-04-25 12:59 ` Fernando Luis Vázquez Cao
2007-04-25 13:16 ` Andi Kleen
2007-04-25 11:15 ` [PATCH 2/10] safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
2007-04-25 12:26 ` Andi Kleen
2007-04-25 12:55 ` Fernando Luis Vázquez Cao
2007-04-25 13:11 ` Andi Kleen
2007-04-25 13:32 ` Fernando Luis Vázquez Cao
2007-04-25 11:19 ` [PATCH 3/10] smpboot: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
2007-04-25 11:21 ` [PATCH 4/10] smpboot: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
2007-04-25 11:26 ` [PATCH 5/10] __inquire_remote_apic: use safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
2007-04-25 11:27 ` [PATCH 6/10] inquire_remote_apic: use safe_apic_wait_icr_idle - x86_64 Fernando Luis Vázquez Cao
2007-04-25 11:37 ` [PATCH 7/10] __send_IPI_dest_field - i386 Fernando Luis Vázquez Cao
2007-04-25 11:39 ` [PATCH 8/10] __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
2007-04-25 11:49 ` [PATCH 9/10] Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 Fernando Luis Vázquez Cao
2007-04-25 11:51 ` [PATCH 10/10] Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Fernando Luis Vázquez Cao
2007-04-25 12:33 ` Andi Kleen
2007-04-25 12:52 ` Fernando Luis Vázquez Cao
2007-04-26 6:48 ` [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Vivek Goyal
2007-04-26 7:20 ` Fernando Luis Vázquez Cao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox