* [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-18 22:36 Pallipadi, Venkatesh
2002-12-18 23:26 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-18 22:36 UTC
To: Linux Kernel
Cc: Martin Bligh, John Stultz, Nakajima, Jun, jamesclv,
Mallick, Asit K, Saxena, Sunil
[-- Attachment #1: Type: text/plain, Size: 4237 bytes --]
2.4.21-pre1 (i386) based patch to fix the issues with systems having more than
8 CPUs, in a generic way.
Motivation:
The current APIC destination mode ("Flat Logical") used in the Linux kernel has
an upper limit of 8 CPUs. For more than 8 CPUs, either "Clustered Logical" or
"Physical" mode has to be used.
There is already some code in the current kernel (2.4.21-pre1) to support such
configurations. Specifically, IBM Summit uses Physical mode and IBM NUMAQ
uses clustered mode. But there are some issues too:
- it is not generic enough to support other systems with more than 8 CPUs
out of the box. Supporting different systems may need more hacks in the code.
- clustered mode setup is tightly coupled with NUMA systems, whereas in reality
we can just as well have logical clusters in a non-NUMA system.
- physical mode setup is somewhat tightly coupled with xAPIC. But xAPIC doesn't
necessarily imply physical mode. You can just as well have clustered mode with xAPIC.
- APIC destination mode is selected based on a particular OEM string.
These reasons together led to panics on other OEM systems with > 8 CPUS. The
patch tries to fix this issue in a generic way (in place of having multiple
hacks for different OEMs). Note, the patch only intends to change the
initialization of systems with more than 8 CPUs and it will not affect
other systems (apart from possible bugs in my code itself).
Description:
The basic idea is to use the number of processors detected on the system to
decide which APIC destination mode is to be used. Once all the CPU info is
collected, either from ACPI or from the MP table, we can check the total number
of processors in the system.
If the number of processors is less than or equal to 8,
then no change is required, and we can use the default "Flat Logical" setup.
If the number of processors is more than 8,
we can switch to the clustered logical setup.
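The selection rule above can be sketched in a few lines of C. This is only an illustration: choose_dest_mode() and enum apic_dest_mode are hypothetical names that do not appear in the patch, while FLAT_APIC_CPU_MAX mirrors the constant the patch adds to smpboot.h.

```c
/*
 * Sketch of the destination-mode selection rule described above.
 * choose_dest_mode() and enum apic_dest_mode are hypothetical names
 * used only for this illustration.
 */
enum apic_dest_mode {
	DEST_FLAT_LOGICAL,	/* one bit per CPU in the logical ID: at most 8 CPUs */
	DEST_CLUSTERED_LOGICAL	/* cluster number plus a per-cluster member mask */
};

#define FLAT_APIC_CPU_MAX 8	/* same value the patch defines */

static enum apic_dest_mode choose_dest_mode(unsigned int num_processors)
{
	if (num_processors > FLAT_APIC_CPU_MAX)
		return DEST_CLUSTERED_LOGICAL;
	return DEST_FLAT_LOGICAL;
}
```

In the patch itself the equivalent decision is taken once, after the ACPI or MP tables have been parsed and num_processors is final.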
The logical clusters are set up as follows:
Cluster 0 (CPU 0-3), Cluster 1 (CPU 4-7), Cluster 2 (CPU 8-11), and so on.
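As a rough sketch of that mapping, the following mirrors the non-NUMA branch of physical_to_logical_apicid() in the patch. The helper names cluster_of() and logical_apicid() are hypothetical, and the sketch assumes physical APIC IDs are numbered contiguously from 0, which real firmware may not guarantee.

```c
/*
 * Non-NUMA cluster mapping: four CPUs per cluster, cluster number in
 * the high nibble of the logical ID, a one-hot member bit in the low
 * nibble. Helper names are hypothetical.
 */
static unsigned int cluster_of(unsigned int phys_apicid)
{
	return phys_apicid >> 2;	/* 4 CPUs per cluster */
}

static unsigned int logical_apicid(unsigned int phys_apicid)
{
	unsigned int member = 1u << (phys_apicid & 0x3);	/* one-hot bit */

	return (cluster_of(phys_apicid) << 4) | member;
}
```

For example, physical APIC ID 5 lands in cluster 1 as member bit 1, giving logical ID 0x12.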
The other things that are done in the patch include:
- Separate out the NUMA specific stuff from APIC setup in cluster mode. Also,
NUMA has its own way of setting up the clusters, and doesn't follow the
logical cluster mapping defined above.
- Separate out xAPIC stuff from APIC destination setup. And the availability of
xAPIC support can actually be determined from the LAPIC version.
- physical mode support _removed_, as we can use the clustered logical setup to
support up to a maximum of 60 CPUs. This is mainly because of the
advantage of being able to set up IOAPICs in LowestPrio when using clustered mode.
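Two of the points above lend themselves to a short sketch: detecting xAPIC from the local APIC version, and where the figure of 60 CPUs comes from. The helper names are hypothetical. The sketch assumes the version lives in the low byte of the version register (as GET_APIC_VERSION in apicdef.h extracts it), uses the 0x14 threshold from the patch, and assumes cluster 0xF is unavailable because an all-ones destination is the broadcast address.

```c
/* Sketch only; lapic_is_xapic() and max_clustered_cpus() are hypothetical. */
#define GET_APIC_VERSION(x)	((x) & 0xFF)	/* as in asm-i386/apicdef.h */
#define XAPIC_MIN_VERSION	0x14		/* threshold used by the patch */

static int lapic_is_xapic(unsigned long lvr)
{
	/* Compare only the version field, not the whole register. */
	return GET_APIC_VERSION(lvr) >= XAPIC_MIN_VERSION;
}

#define CLUSTER_ID_BITS		4	/* high nibble of the logical ID */
#define CPUS_PER_CLUSTER	4	/* low nibble, one bit per CPU */

static unsigned int max_clustered_cpus(void)
{
	/* Cluster 0xF is assumed reserved for broadcast, leaving 15. */
	unsigned int usable_clusters = (1u << CLUSTER_ID_BITS) - 1;

	return usable_clusters * CPUS_PER_CLUSTER;
}
```

With 15 usable clusters of 4 CPUs each, the limit works out to 60.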
The whole thing is protected by the 'Clustered APIC (> 8 CPUs) support
(CONFIG_X86_APIC_CLUSTER)' config option under Processor Type and Features.
But going forward, we can actually make this the default, as it doesn't
affect systems with 8 or fewer CPUs (apart from a minor increase
in code size and a couple of additional checks during boot), but gives
default support to systems with more than 8 CPUs.
Please let me know your comments/criticisms about this.
I was able to test this successfully on the 8-way HT (16 logical CPU)
systems that I have access to. But I haven't tested it on x440 or NUMAQ
systems. Would love to hear about the effect of this patch on these systems.
Thanks,
-Venkatesh
> -----Original Message-----
> From: Nakajima, Jun
> Sent: Thursday, December 12, 2002 7:06 PM
> To: jamesclv@us.ibm.com; Zwane Mwaikambo
> Cc: Martin Bligh; John Stultz; Linux Kernel
> Subject: RE: [PATCH][2.5][RFC] Using xAPIC apic address space
> on !Summit
>
>
> BTW, we are working on a xAPIC patch that supports more than
> 8 CPUs in a
> generic fashion (don't use hardcode OEM checking). We already
> tested it on
> two OEM systems with 16 CPUs.
> - It uses clustered mode. We don't want to use physical mode
> because it does
> not support lowest priority delivery mode.
> - We also check the version of the local APIC if it's xAPIC
> or not. It's
> possible that some other x86 architecture (other than P4P) uses xAPIC.
>
> Stay tuned.
>
> Jun
[-- Attachment #2: cluster-2.4.21-pre1.patch --]
[-- Type: application/octet-stream, Size: 25995 bytes --]
diff -urN linux-2.4.21-pre1.org/Documentation/Configure.help linux-test1/Documentation/Configure.help
--- linux-2.4.21-pre1.org/Documentation/Configure.help 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/Documentation/Configure.help 2002-12-14 15:00:39.000000000 -0800
@@ -246,6 +246,13 @@
If unsure, say N.
+Clustered APIC support for x86 systems
+CONFIG_X86_APIC_CLUSTER
+ This option is used for getting Linux to run on systems with more than 8
+ CPUs. It dynamically changes the way that processors are bootstrapped,
+ and uses Clustered Logical APIC addressing mode instead of Flat Logical on
+ such systems. It doesn't affect systems with 8 or fewer CPUs.
+
Multiquad support for NUMAQ systems
CONFIG_X86_NUMAQ
This option is used for getting Linux to run on a (IBM/Sequent) NUMA
diff -urN linux-2.4.21-pre1.org/arch/i386/config.in linux-test1/arch/i386/config.in
--- linux-2.4.21-pre1.org/arch/i386/config.in 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/config.in 2002-12-14 14:59:22.000000000 -0800
@@ -216,17 +216,13 @@
define_bool CONFIG_X86_IO_APIC y
fi
else
- bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
- if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ bool 'Clustered APIC (> 8 CPUs) support' CONFIG_X86_APIC_CLUSTER
+ if [ "$CONFIG_X86_APIC_CLUSTER" = "y" ]; then
+ define_bool CONFIG_X86_CLUSTERED_APIC y
#Platform Choices
bool ' Multiquad (IBM/Sequent) NUMAQ support' CONFIG_X86_NUMAQ
if [ "$CONFIG_X86_NUMAQ" = "y" ]; then
- define_bool CONFIG_X86_CLUSTERED_APIC y
- define_bool CONFIG_MULTIQUAD y
- fi
- bool ' IBM x440 (Summit/EXA) support' CONFIG_X86_SUMMIT
- if [ "$CONFIG_X86_SUMMIT" = "y" ]; then
- define_bool CONFIG_X86_CLUSTERED_APIC y
+ define_bool CONFIG_MULTIQUAD y
fi
fi
fi
diff -urN linux-2.4.21-pre1.org/arch/i386/defconfig linux-test1/arch/i386/defconfig
--- linux-2.4.21-pre1.org/arch/i386/defconfig 2002-11-28 15:53:09.000000000 -0800
+++ linux-test1/arch/i386/defconfig 2002-12-14 14:59:52.000000000 -0800
@@ -62,6 +62,7 @@
# CONFIG_MATH_EMULATION is not set
# CONFIG_MTRR is not set
CONFIG_SMP=y
+CONFIG_X86_APIC_CLUSTER=y
# CONFIG_MULTIQUAD is not set
CONFIG_HAVE_DEC_LOCK=y
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/acpitable.c linux-test1/arch/i386/kernel/acpitable.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/acpitable.c 2002-08-02 17:39:42.000000000 -0700
+++ linux-test1/arch/i386/kernel/acpitable.c 2002-12-13 23:25:09.000000000 -0800
@@ -314,12 +314,15 @@
int have_acpi_tables;
extern void __init MP_processor_info(struct mpc_config_processor *);
+extern unsigned int xapic_support;
static void __init
acpi_parse_lapic(struct acpi_table_lapic *local_apic)
{
struct mpc_config_processor proc_entry;
int ix = 0;
+ static unsigned long apic_ver;
+ static int first_time = 1;
if (!local_apic)
return;
@@ -357,7 +360,16 @@
proc_entry.mpc_featureflag = boot_cpu_data.x86_capability[0];
proc_entry.mpc_reserved[0] = 0;
proc_entry.mpc_reserved[1] = 0;
- proc_entry.mpc_apicver = 0x10; /* integrated APIC */
+ if (first_time) {
+ first_time = 0;
+ set_fixmap(FIX_APIC_BASE, APIC_DEFAULT_PHYS_BASE);
+ Dprintk("Local APIC ID %lx\n", apic_read(APIC_ID));
+ apic_ver = apic_read(APIC_LVR);
+ Dprintk("Local APIC Version %lx\n", apic_ver);
+ if (APIC_XAPIC_SUPPORT(apic_ver))
+ xapic_support = 1;
+ }
+ proc_entry.mpc_apicver = apic_ver;
MP_processor_info(&proc_entry);
} else {
printk(" disabled");
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/apic.c linux-test1/arch/i386/kernel/apic.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/apic.c 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/kernel/apic.c 2002-12-13 23:25:09.000000000 -0800
@@ -264,8 +264,8 @@
static unsigned long calculate_ldr(unsigned long old)
{
unsigned long id;
- if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- id = physical_to_logical_apicid(hard_smp_processor_id());
+ if(clustered_apic_mode)
+ id = cpu_2_logical_apicid[smp_processor_id()];
else
id = 1UL << smp_processor_id();
return (old & ~APIC_LDR_MASK)|SET_APIC_LOGICAL_ID(id);
@@ -302,22 +302,26 @@
* an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel
* document number 292116). So here it goes...
*/
- if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ) {
+ if (configured_platform_type != CONFIGURED_PLATFORM_NUMA) {
+ unsigned int dfr;
/*
* For NUMA-Q (clustered apic logical), the firmware does this
* for us. Otherwise put the APIC into clustered or flat
* delivery mode. Must be "all ones" explicitly for 82489DX.
*/
- if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- apic_write_around(APIC_DFR, APIC_DFR_CLUSTER);
+ if(clustered_apic_mode)
+ dfr = APIC_DFR_CLUSTER;
else
- apic_write_around(APIC_DFR, APIC_DFR_FLAT);
+ dfr = APIC_DFR_FLAT;
+ apic_write_around(APIC_DFR, dfr);
/*
* Set up the logical destination ID.
*/
value = apic_read(APIC_LDR);
apic_write_around(APIC_LDR, calculate_ldr(value));
+ Dprintk("setup_local_APIC: CPU#%d LDR 0x%lx\n", smp_processor_id(), calculate_ldr(value));
+ Dprintk("setup_local_APIC: CPU#%d DFR 0x%lx\n", smp_processor_id(), dfr);
}
/*
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/io_apic.c linux-test1/arch/i386/kernel/io_apic.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/io_apic.c 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/kernel/io_apic.c 2002-12-13 23:25:09.000000000 -0800
@@ -40,8 +40,9 @@
static spinlock_t ioapic_lock = SPIN_LOCK_UNLOCKED;
-unsigned int int_dest_addr_mode = APIC_DEST_LOGICAL;
-unsigned char int_delivery_mode = dest_LowestPrio;
+extern unsigned int int_dest_addr_mode; /* Default = APIC_DEST_LOGICAL */
+extern unsigned char int_delivery_mode; /* Default = dest_LowestPrio */
+extern unsigned int xapic_support;
/*
@@ -658,7 +659,8 @@
* skip adding the timer int on secondary nodes, which causes
* a small but painful rift in the time-space continuum
*/
- if ((clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA)
&& (apic != 0) && (irq == 0))
continue;
else
@@ -1067,7 +1069,8 @@
old_id = mp_ioapics[apic].mpc_apicid;
- if (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id) {
+ if ( !xapic_support &&
+ (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id)) {
printk(KERN_ERR "BIOS bug, IO-APIC#%d ID is %d in the MPC table!...\n",
apic, mp_ioapics[apic].mpc_apicid);
printk(KERN_ERR "... fixing up to %d. (tell your hw vendor)\n",
@@ -1081,22 +1084,23 @@
* 'stuck on smp_invalidate_needed IPI wait' messages.
* I/O APIC IDs no longer have any meaning for xAPICs and SAPICs.
*/
- if ((clustered_apic_mode != CLUSTERED_APIC_XAPIC) &&
- (phys_id_present_map & (1 << mp_ioapics[apic].mpc_apicid))) {
- printk(KERN_ERR "BIOS bug, IO-APIC#%d ID %d is already used!...\n",
- apic, mp_ioapics[apic].mpc_apicid);
- for (i = 0; i < 0xf; i++)
- if (!(phys_id_present_map & (1 << i)))
- break;
- if (i >= apic_broadcast_id)
- panic("Max APIC ID exceeded!\n");
- printk(KERN_ERR "... fixing up to %d. (tell your hw vendor)\n",
- i);
- phys_id_present_map |= 1 << i;
- mp_ioapics[apic].mpc_apicid = i;
- } else {
- printk("Setting %d in the phys_id_present_map\n", mp_ioapics[apic].mpc_apicid);
- phys_id_present_map |= 1 << mp_ioapics[apic].mpc_apicid;
+ if ( !xapic_support ) {
+ if (phys_id_present_map & (1 << mp_ioapics[apic].mpc_apicid)) {
+ printk(KERN_ERR "BIOS bug, IO-APIC#%d ID %d is already used!...\n",
+ apic, mp_ioapics[apic].mpc_apicid);
+ for (i = 0; i < 0xf; i++)
+ if (!(phys_id_present_map & (1 << i)))
+ break;
+ if (i >= 0xf)
+ panic("Max APIC ID exceeded!\n");
+ printk(KERN_ERR "... fixing up to %d. (tell your hw vendor)\n",
+ i);
+ phys_id_present_map |= 1 << i;
+ mp_ioapics[apic].mpc_apicid = i;
+ } else {
+ printk("Setting %d in the phys_id_present_map\n", mp_ioapics[apic].mpc_apicid);
+ phys_id_present_map |= 1 << mp_ioapics[apic].mpc_apicid;
+ }
}
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/mpparse.c linux-test1/arch/i386/kernel/mpparse.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/mpparse.c 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/kernel/mpparse.c 2002-12-13 23:25:09.000000000 -0800
@@ -10,6 +10,8 @@
* Alan Cox : Added EBDA scanning
* Ingo Molnar : various cleanups and rewrites
* Maciej W. Rozycki : Bits for default MP configurations
+ * Venkatesh Pallipadi : Added generic support for Clustered
+ * int. dest. modes
*/
#include <linux/mm.h>
@@ -67,11 +69,17 @@
unsigned long phys_cpu_present_map;
unsigned long logical_cpu_present_map;
+/* Default values are for Logical Flat destination set up */
#ifdef CONFIG_X86_CLUSTERED_APIC
unsigned char esr_disable = 0;
-unsigned char clustered_apic_mode = CLUSTERED_APIC_NONE;
+unsigned char clustered_apic_mode = CONFIGURED_APIC_NONE;
+unsigned char configured_platform_type = CONFIGURED_PLATFORM_NONE;
unsigned int apic_broadcast_id = APIC_BROADCAST_ID_APIC;
#endif
+unsigned int xapic_support=0;
+unsigned int int_dest_addr_mode = APIC_DEST_LOGICAL;
+unsigned char int_delivery_mode = dest_LowestPrio;
+
unsigned char raw_phys_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
/*
@@ -156,14 +164,15 @@
return;
logical_apicid = m->mpc_apicid;
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {
quad = translation_table[mpc_record]->trans_quad;
logical_apicid = (quad << 4) +
(m->mpc_apicid ? m->mpc_apicid << 1 : 1);
printk("Processor #%d %s APIC version %d (quad %d, apic %d)\n",
m->mpc_apicid,
mpc_family((m->mpc_cpufeature & CPU_FAMILY_MASK)>>8 ,
- (m->mpc_cpufeature & CPU_MODEL_MASK)>>4),
+ (m->mpc_cpufeature & CPU_MODEL_MASK)>>4),
m->mpc_apicver, quad, logical_apicid);
} else {
printk("Processor #%d %s APIC version %d\n",
@@ -236,6 +245,8 @@
return;
}
ver = m->mpc_apicver;
+ if (APIC_XAPIC_SUPPORT(ver))
+ xapic_support = 1;
logical_cpu_present_map |= 1 << (num_processors-1);
phys_cpu_present_map |= apicid_to_phys_cpu_present(m->mpc_apicid);
@@ -259,7 +270,8 @@
memcpy(str, m->mpc_bustype, 6);
str[6] = 0;
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {
quad = translation_table[mpc_record]->trans_quad;
mp_bus_id_to_node[m->mpc_busid] = quad;
mp_bus_id_to_local[m->mpc_busid] = translation_table[mpc_record]->trans_local;
@@ -446,7 +458,9 @@
if (!have_acpi_tables)
mp_lapic_addr = mpc->mpc_lapic;
- if ((clustered_apic_mode == CLUSTERED_APIC_NUMAQ) && mpc->mpc_oemptr) {
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA) &&
+ mpc->mpc_oemptr) {
/* We need to process the oem mpc tables to tell us which quad things are in ... */
mpc_record = 0;
smp_read_mpc_oem((struct mp_config_oemtable *) mpc->mpc_oemptr, mpc->mpc_oemsize);
@@ -516,20 +530,11 @@
++mpc_record;
}
- if (clustered_apic_mode){
+ if (clustered_apic_mode &&
+ (configured_platform_type==CONFIGURED_PLATFORM_NUMA)){
phys_cpu_present_map = logical_cpu_present_map;
}
-
- printk("Enabling APIC mode: ");
- if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
- printk("Clustered Logical. ");
- else if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- printk("Physical. ");
- else
- printk("Flat. ");
- printk("Using %d I/O APICs\n",nr_ioapics);
-
if (!num_processors)
printk(KERN_ERR "SMP mptable: no processors registered!\n");
return num_processors;
@@ -712,7 +717,7 @@
*/
config_acpi_tables();
#endif
-
+
printk("Intel MultiProcessor Specification v1.%d\n", mpf->mpf_specification);
if (mpf->mpf_feature2 & (1<<7)) {
printk(" IMCR and PIC compatibility mode.\n");
@@ -764,6 +769,29 @@
BUG();
printk("Processors: %d\n", num_processors);
+
+#ifdef CONFIG_X86_CLUSTERED_APIC
+ /* Default is Logical Flat destination mode */
+ apic_broadcast_id = xapic_support?APIC_BROADCAST_ID_XAPIC:APIC_BROADCAST_ID_APIC;
+ if ((clustered_apic_mode == CONFIGURED_APIC_NONE) &&
+ (num_processors > FLAT_APIC_CPU_MAX)) {
+ /*Clustered Logical destination mode*/
+ configured_platform_type = CONFIGURED_PLATFORM_NONE;
+ clustered_apic_mode = CONFIGURED_APIC_CLUSTERED;
+ int_dest_addr_mode = APIC_DEST_LOGICAL;
+ int_delivery_mode = dest_LowestPrio;
+ esr_disable = 1;
+ }
+#endif /* CONFIG_X86_CLUSTERED_APIC */
+
+ printk("Enabling APIC mode: ");
+ if(clustered_apic_mode == CONFIGURED_APIC_CLUSTERED)
+ printk("Clustered Logical. ");
+ else
+ printk("Flat Logical. ");
+ printk("Using %d I/O APICs\n",nr_ioapics);
+ printk("xAPIC support %s\n", (xapic_support?"Enabled":"Disabled"));
+
/*
* Only use the first configuration found.
*/
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/pci-pc.c linux-test1/arch/i386/kernel/pci-pc.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/pci-pc.c 2002-11-28 15:53:09.000000000 -0800
+++ linux-test1/arch/i386/kernel/pci-pc.c 2002-12-13 23:25:09.000000000 -0800
@@ -478,7 +478,7 @@
#ifdef CONFIG_MULTIQUAD
/* Multi-Quad has an extended PCI Conf1 */
- if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
+ if(configured_platform_type == CONFIGURED_PLATFORM_NUMA)
return &pci_direct_mq_conf1;
#endif
return &pci_direct_conf1;
@@ -1407,7 +1407,7 @@
printk(KERN_INFO "PCI: Probing PCI hardware\n");
pci_root_bus = pci_scan_bus(0, pci_root_ops, NULL);
- if (clustered_apic_mode && (numnodes > 1)) {
+ if ( (configured_platform_type==CONFIGURED_PLATFORM_NUMA) && (numnodes > 1)) {
for (quad = 1; quad < numnodes; ++quad) {
printk("Scanning PCI bus %d for quad %d\n",
QUADLOCAL2BUS(quad,0), quad);
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/smp.c linux-test1/arch/i386/kernel/smp.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/smp.c 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/kernel/smp.c 2002-12-13 23:25:09.000000000 -0800
@@ -214,10 +214,7 @@
/*
* prepare target chip field
*/
- if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- cfg = __prepare_ICR2(cpu_to_physical_apicid(query_cpu));
- else
- cfg = __prepare_ICR2(cpu_to_logical_apicid(query_cpu));
+ cfg = __prepare_ICR2(cpu_to_logical_apicid(query_cpu));
apic_write_around(APIC_ICR2, cfg);
/*
diff -urN linux-2.4.21-pre1.org/arch/i386/kernel/smpboot.c linux-test1/arch/i386/kernel/smpboot.c
--- linux-2.4.21-pre1.org/arch/i386/kernel/smpboot.c 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/arch/i386/kernel/smpboot.c 2002-12-13 23:25:09.000000000 -0800
@@ -30,6 +30,8 @@
* Tigran Aivazian : fixed "0.00 in /proc/uptime on SMP" bug.
* Maciej W. Rozycki : Bits for genuine 82489DX APICs
* Martin J. Bligh : Added support for multi-quad systems
+ * Venkatesh Pallipadi : Added generic support for Clustered
+ * int. dest. modes
*/
#include <linux/config.h>
@@ -358,7 +360,7 @@
* our local APIC. We have to wait for the IPI or we'll
* lock up on an APIC access.
*/
- if (!clustered_apic_mode)
+ if (configured_platform_type != CONFIGURED_PLATFORM_NUMA)
while (!atomic_read(&init_deasserted));
/*
@@ -412,7 +414,7 @@
* Because we use NMIs rather than the INIT-STARTUP sequence to
* bootstrap the CPUs, the APIC may be in a wierd state. Kick it.
*/
- if (clustered_apic_mode)
+ if (configured_platform_type == CONFIGURED_PLATFORM_NUMA)
clear_local_APIC();
setup_local_APIC();
@@ -540,9 +542,10 @@
* else physical apic ids
*/
{
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
- logical_apicid_2_cpu[apicid] = cpu;
+ Dprintk("cpu %d, apicid %d, clustered %d\n", cpu, apicid, clustered_apic_mode);
+ if (clustered_apic_mode) {
cpu_2_logical_apicid[cpu] = apicid;
+ logical_apicid_2_cpu[apicid] = cpu;
} else {
physical_apicid_2_cpu[apicid] = cpu;
cpu_2_physical_apicid[cpu] = apicid;
@@ -555,7 +558,7 @@
* else physical apic ids
*/
{
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (clustered_apic_mode) {
logical_apicid_2_cpu[apicid] = BAD_APICID;
cpu_2_logical_apicid[cpu] = BAD_APICID;
} else {
@@ -702,6 +705,7 @@
* Determine this based on the APIC version.
* If we don't have an integrated APIC, don't send the STARTUP IPIs.
*/
+ Dprintk("###phys_apicid: %d.\n", phys_apicid);
if (APIC_INTEGRATED(apic_version[phys_apicid]))
num_starts = 2;
else
@@ -778,7 +782,7 @@
static void __init do_boot_cpu (int apicid)
/*
* NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
- * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
+ * (ie NUMA ), this is a LOGICAL apic ID.
*/
{
struct task_struct *idle;
@@ -806,7 +810,17 @@
idle->processor = cpu;
idle->cpus_runnable = 1 << cpu; /* we schedule the first task manually */
- map_cpu_to_boot_apicid(cpu, apicid);
+ /*
+ * In Clustered mode interrupt delivery, without NUMA, we
+ * initialize all the CPUs in normal IPI fashion and use their
+ * logical_apicid while setting the local apics. That's when clustering
+ * is enabled. So, setup their logical_apicid here.
+ */
+ if (clustered_apic_mode &&
+ (configured_platform_type != CONFIGURED_PLATFORM_NUMA))
+ map_cpu_to_boot_apicid(cpu, physical_to_logical_apicid(apicid));
+ else
+ map_cpu_to_boot_apicid(cpu, apicid);
idle->thread.eip = (unsigned long) start_secondary;
@@ -830,7 +844,7 @@
Dprintk("Setting warm reset code and vector.\n");
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (configured_platform_type == CONFIGURED_PLATFORM_NUMA) {
/* stash the current NMI vector, so we can put things back */
nmi_high = *((volatile unsigned short *) TRAMPOLINE_HIGH);
nmi_low = *((volatile unsigned short *) TRAMPOLINE_LOW);
@@ -847,7 +861,7 @@
/*
* Be paranoid about clearing APIC errors.
*/
- if (!clustered_apic_mode && APIC_INTEGRATED(apic_version[apicid])) {
+ if ((configured_platform_type!=CONFIGURED_PLATFORM_NUMA) && APIC_INTEGRATED(apic_version[apicid])) {
apic_read_around(APIC_SPIV);
apic_write(APIC_ESR, 0);
apic_read(APIC_ESR);
@@ -862,7 +876,7 @@
* Starting actual IPI sequence...
*/
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
+ if (configured_platform_type == CONFIGURED_PLATFORM_NUMA)
boot_error = wakeup_secondary_via_NMI(apicid);
else
boot_error = wakeup_secondary_via_INIT(apicid, start_eip);
@@ -917,7 +931,7 @@
/* mark "stuck" area as not stuck */
*((volatile unsigned long *)phys_to_virt(8192)) = 0;
- if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if(configured_platform_type == CONFIGURED_PLATFORM_NUMA) {
printk("Restoring NMI vector\n");
*((volatile unsigned short *) TRAMPOLINE_HIGH) = nmi_high;
*((volatile unsigned short *) TRAMPOLINE_LOW) = nmi_low;
@@ -981,7 +995,7 @@
{
int apicid, cpu, bit;
- if ((clustered_apic_mode == CLUSTERED_APIC_NUMAQ) && (numnodes > 1)) {
+ if ((configured_platform_type == CONFIGURED_PLATFORM_NUMA) && (numnodes > 1)) {
printk("Remapping cross-quad port I/O for %d quads\n",
numnodes);
printk("xquad_portio vaddr 0x%08lx, len %08lx\n",
@@ -1019,11 +1033,22 @@
* We have the boot CPU online for sure.
*/
set_bit(0, &cpu_online_map);
- if (clustered_apic_mode == CLUSTERED_APIC_XAPIC)
+ if (clustered_apic_mode)
boot_cpu_logical_apicid = physical_to_logical_apicid(boot_cpu_physical_apicid);
else
boot_cpu_logical_apicid = logical_smp_processor_id();
- map_cpu_to_boot_apicid(0, boot_cpu_apicid);
+
+ /*
+ * In Clustered mode interrupt delivery, without NUMA, we
+ * initialize all the CPUs in normal IPI fashion and use their
+ * logical_apicid while setting the local apics. That's when clustering
+ * is enabled. So, setup their logical_apicid here.
+ */
+ if (clustered_apic_mode &&
+ (configured_platform_type != CONFIGURED_PLATFORM_NUMA))
+ map_cpu_to_boot_apicid(0, physical_to_logical_apicid(boot_cpu_apicid));
+ else
+ map_cpu_to_boot_apicid(0, boot_cpu_apicid);
global_irq_holder = 0;
current->processor = 0;
@@ -1111,6 +1136,7 @@
/*
* Don't even attempt to start the boot CPU!
*/
+ Dprintk("apicid %d, boot_cpu_apicid %d, bit %d\n", apicid, boot_cpu_apicid, bit);
if (apicid == boot_cpu_apicid)
continue;
@@ -1201,7 +1227,7 @@
}
}
}
-
+
#ifndef CONFIG_VISWS
/*
* Here we can be sure that there is an IO-APIC in the system. Let's
diff -urN linux-2.4.21-pre1.org/include/asm-i386/apicdef.h linux-test1/include/asm-i386/apicdef.h
--- linux-2.4.21-pre1.org/include/asm-i386/apicdef.h 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/include/asm-i386/apicdef.h 2002-12-13 23:26:22.000000000 -0800
@@ -18,6 +18,7 @@
#define GET_APIC_VERSION(x) ((x)&0xFF)
#define GET_APIC_MAXLVT(x) (((x)>>16)&0xFF)
#define APIC_INTEGRATED(x) ((x)&0xF0)
+#define APIC_XAPIC_SUPPORT(x) (GET_APIC_VERSION(x) >= 0x14)
#define APIC_TASKPRI 0x80
#define APIC_TPRI_MASK 0xFF
#define APIC_ARBPRI 0x90
diff -urN linux-2.4.21-pre1.org/include/asm-i386/smpboot.h linux-test1/include/asm-i386/smpboot.h
--- linux-2.4.21-pre1.org/include/asm-i386/smpboot.h 2002-12-13 17:53:58.000000000 -0800
+++ linux-test1/include/asm-i386/smpboot.h 2002-12-13 23:26:40.000000000 -0800
@@ -3,14 +3,22 @@
/*emum for clustered_apic_mode values*/
enum{
- CLUSTERED_APIC_NONE = 0,
- CLUSTERED_APIC_XAPIC,
- CLUSTERED_APIC_NUMAQ
+ CONFIGURED_APIC_NONE = 0,
+ CONFIGURED_APIC_CLUSTERED
};
+/*enum for configured_platform_type values*/
+enum{
+ CONFIGURED_PLATFORM_NONE = 0,
+ CONFIGURED_PLATFORM_NUMA
+};
+
+#define FLAT_APIC_CPU_MAX 8
+
#ifdef CONFIG_X86_CLUSTERED_APIC
extern unsigned int apic_broadcast_id;
extern unsigned char clustered_apic_mode;
+extern unsigned char configured_platform_type;
extern unsigned char esr_disable;
extern unsigned char int_delivery_mode;
extern unsigned int int_dest_addr_mode;
@@ -20,14 +28,15 @@
* Can't recognize Summit xAPICs at present, so use the OEM ID.
*/
if (!strncmp(oem, "IBM ENSW", 8) && !strncmp(prod, "VIGIL SMP", 9)){
- clustered_apic_mode = CLUSTERED_APIC_XAPIC;
- apic_broadcast_id = APIC_BROADCAST_ID_XAPIC;
- int_dest_addr_mode = APIC_DEST_PHYSICAL;
- int_delivery_mode = dest_Fixed;
+ clustered_apic_mode = CONFIGURED_APIC_CLUSTERED;
+ apic_broadcast_id = APIC_BROADCAST_ID_APIC;
+ int_dest_addr_mode = APIC_DEST_LOGICAL;
+ int_delivery_mode = dest_LowestPrio;
esr_disable = 1;
}
else if (!strncmp(oem, "IBM NUMA", 8)){
- clustered_apic_mode = CLUSTERED_APIC_NUMAQ;
+ configured_platform_type = CONFIGURED_PLATFORM_NUMA;
+ clustered_apic_mode = CONFIGURED_APIC_CLUSTERED;
apic_broadcast_id = APIC_BROADCAST_ID_APIC;
int_dest_addr_mode = APIC_DEST_LOGICAL;
int_delivery_mode = dest_LowestPrio;
@@ -38,7 +47,8 @@
#define INT_DELIVERY_MODE (int_delivery_mode)
#else /* CONFIG_X86_CLUSTERED_APIC */
#define apic_broadcast_id (APIC_BROADCAST_ID_APIC)
-#define clustered_apic_mode (CLUSTERED_APIC_NONE)
+#define configured_platform_type (CONFIGURED_PLATFORM_NONE)
+#define clustered_apic_mode (CONFIGURED_APIC_NONE)
#define esr_disable (0)
#define detect_clustered_apic(x,y)
#define INT_DEST_ADDR_MODE (APIC_DEST_LOGICAL) /* logical delivery */
@@ -46,10 +56,10 @@
#endif /* CONFIG_X86_CLUSTERED_APIC */
#define BAD_APICID 0xFFu
-#define TRAMPOLINE_LOW phys_to_virt((clustered_apic_mode == CLUSTERED_APIC_NUMAQ)?0x8:0x467)
-#define TRAMPOLINE_HIGH phys_to_virt((clustered_apic_mode == CLUSTERED_APIC_NUMAQ)?0xa:0x469)
+#define TRAMPOLINE_LOW phys_to_virt((configured_platform_type == CONFIGURED_PLATFORM_NUMA)?0x8:0x467)
+#define TRAMPOLINE_HIGH phys_to_virt((configured_platform_type == CONFIGURED_PLATFORM_NUMA)?0xa:0x469)
-#define boot_cpu_apicid ((clustered_apic_mode == CLUSTERED_APIC_NUMAQ)?boot_cpu_logical_apicid:boot_cpu_physical_apicid)
+#define boot_cpu_apicid ((configured_platform_type == CONFIGURED_PLATFORM_NUMA)? boot_cpu_logical_apicid: boot_cpu_physical_apicid)
extern unsigned char raw_phys_apicid[NR_CPUS];
@@ -58,21 +68,30 @@
*/
static inline int cpu_present_to_apicid(int mps_cpu)
{
- if (clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- return raw_phys_apicid[mps_cpu];
- if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA))
return (mps_cpu/4)*16 + (1<<(mps_cpu%4));
return mps_cpu;
}
static inline unsigned long apicid_to_phys_cpu_present(int apicid)
{
- if(clustered_apic_mode)
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA))
return 1UL << (((apicid >> 4) << 2) + (apicid & 0x3));
return 1UL << apicid;
}
-#define physical_to_logical_apicid(phys_apic) ( (1ul << (phys_apic & 0x3)) | (phys_apic & 0xF0u) )
+static inline unsigned long physical_to_logical_apicid(int apicid)
+{
+ if (clustered_apic_mode) {
+ if (configured_platform_type == CONFIGURED_PLATFORM_NUMA)
+ return ((1ul << (apicid & 0x3)) | (apicid & 0xF0u));
+ else
+ return ((1ul << (apicid & 0x3)) + ((apicid&(~0x3))<<2));
+ }
+ return apicid;
+}
/*
* Mappings between logical cpu number and logical / physical apicid
@@ -100,13 +119,12 @@
{
static int cpu;
switch(clustered_apic_mode){
- case CLUSTERED_APIC_NUMAQ:
+ case CONFIGURED_APIC_CLUSTERED:
/* Broadcast intrs to local quad only. */
- return APIC_BROADCAST_ID_APIC;
- case CLUSTERED_APIC_XAPIC:
- /*round robin the interrupts*/
- cpu = (cpu+1)%smp_num_cpus;
- return cpu_to_physical_apicid(cpu);
+#define APIC_BROADCAST_CLUSTER 0xf
+ if (configured_platform_type == CONFIGURED_PLATFORM_NUMA)
+ return APIC_BROADCAST_CLUSTER;
+ return (cpu_online_map&APIC_BROADCAST_CLUSTER);
default:
}
return cpu_online_map;
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
From: Christoph Hellwig @ 2002-12-18 23:26 UTC
To: Pallipadi, Venkatesh
Cc: Linux Kernel, Martin Bligh, John Stultz, Nakajima, Jun, jamesclv,
Mallick, Asit K, Saxena, Sunil
On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> These reasons together led to panics on other OEM systems with > 8 CPUS. The
> patch tries to fix this issue in a generic way (in place of having multiple
> hacks for different OEMs). Note, the patch only intends to change the
> initialization of systems with more than 8 CPUs and it will not affect
> other systems (apart from possible bugs in my code itself).
Any pointers to these systems?
> - Separate out xAPIC stuff from APIC destination setup. And the availability of
> xAPIC support can actually be determined from the LAPIC version.
Are you sure? IIRC some of the early summit boxens didn't report the
right versions..
> - physical mode support _removed_, as we can use the clustered logical setup to
> support up to a maximum of 60 CPUs. This is mainly because of the
> advantage of being able to set up IOAPICs in LowestPrio when using clustered mode.
Does this really not break anything in the fragile Summit setups?
- bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
- if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ bool 'Clustered APIC (> 8 CPUs) support' CONFIG_X86_APIC_CLUSTER
+ if [ "$CONFIG_X86_APIC_CLUSTER" = "y" ]; then
+ define_bool CONFIG_X86_CLUSTERED_APIC y
Do we really need CONFIG_X86_APIC_CLUSTER _and_ CONFIG_X86_CLUSTERED_APIC?
unsigned long id;
- if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
- id = physical_to_logical_apicid(hard_smp_processor_id());
+ if(clustered_apic_mode)
+ id = cpu_2_logical_apicid[smp_processor_id()];
else
Okay, this was wrong before, but any chance you could use proper
style here (i.e. "if (" with a space after the keyword)?
id = 1UL << smp_processor_id();
- if (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id) {
+ if ( !xapic_support &&
+ (mp_ioapics[apic].mpc_apicid >= apic_broadcast_id)) {
if (!xapic_support &&
(mp_ioapics[apic].mpc_apicid >= apic_broadcast_id)) {
+ if ( !xapic_support ) {
Again.
- if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
+ if (clustered_apic_mode &&
+ (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {
Doesn't configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
clustered_apic_mode? And it should be at least CONFIGURED_PLATFORM_NUMAQ,
btw. Probably better something short like SUBARCH_NUMAQ..
Apart from that the patch looks fine, but IMHO something like that should
get testing in 2.5 first.
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-18 23:26 ` Christoph Hellwig
@ 2002-12-18 23:41 ` William Lee Irwin III
2002-12-18 23:59 ` Martin J. Bligh
1 sibling, 0 replies; 20+ messages in thread
From: William Lee Irwin III @ 2002-12-18 23:41 UTC (permalink / raw)
To: Christoph Hellwig, Pallipadi, Venkatesh, Linux Kernel,
Martin Bligh, John Stultz, Nakajima, Jun, jamesclv,
Mallick, Asit K, Saxena, Sunil
On Wed, Dec 18, 2002 at 11:26:40PM +0000, Christoph Hellwig wrote:
> Except of that the patch looks fine, but IMHO something like that should
> get testing in 2.5 first.
Yes, I'd prefer this happen in 2.5 first and that I and the rest of
everyone in our lab test the living daylights out of it on NUMA-Q.
Bill
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-18 23:26 ` Christoph Hellwig
2002-12-18 23:41 ` William Lee Irwin III
@ 2002-12-18 23:59 ` Martin J. Bligh
1 sibling, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-12-18 23:59 UTC (permalink / raw)
To: Christoph Hellwig, Pallipadi, Venkatesh
Cc: Linux Kernel, John Stultz, Nakajima, Jun, jamesclv,
Mallick, Asit K, Saxena, Sunil
> - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> + if (clustered_apic_mode &&
> + (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {
Arrrggh - no. Let's not create even more of an unholy mess than is there
already. The above is just vile.
> Except of that the patch looks fine, but IMHO something like that should
> get testing in 2.5 first.
Do it under subarch, in 2.5, and please wait until I merge the NUMA-Q
and Summit support that's working as is first. I'll send it out within
a week.
M.
PS. if people could change the email headers when replying to other
branches of this thread from mbligh@us.ibm.com to mbligh@aracnet.com,
I'd much appreciate it.
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-18 22:36 Pallipadi, Venkatesh
2002-12-18 23:26 ` Christoph Hellwig
@ 2002-12-19 0:24 ` Martin J. Bligh
2002-12-20 2:04 ` James Cleverdon
2 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-12-19 0:24 UTC (permalink / raw)
To: Pallipadi, Venkatesh, Linux Kernel
Cc: John Stultz, Nakajima, Jun, jamesclv, Mallick, Asit K,
Saxena, Sunil
First thing, can you split this into much smaller pieces, each of
which performs one code change ... then it might be more feasible
to read it.
> - bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
> - if [ "$CONFIG_X86_NUMA" = "y" ]; then
> + bool 'Clustered APIC (> 8 CPUs) support' CONFIG_X86_APIC_CLUSTER
> + if [ "$CONFIG_X86_APIC_CLUSTER" = "y" ]; then
> + define_bool CONFIG_X86_CLUSTERED_APIC y
> #Platform Choices
> bool ' Multiquad (IBM/Sequent) NUMAQ support' CONFIG_X86_NUMAQ
> if [ "$CONFIG_X86_NUMAQ" = "y" ]; then
> - define_bool CONFIG_X86_CLUSTERED_APIC y
> - define_bool CONFIG_MULTIQUAD y
> - fi
> - bool ' IBM x440 (Summit/EXA) support' CONFIG_X86_SUMMIT
> - if [ "$CONFIG_X86_SUMMIT" = "y" ]; then
> - define_bool CONFIG_X86_CLUSTERED_APIC y
> + define_bool CONFIG_MULTIQUAD y
You seem to have lost turning on CONFIG_X86_NUMA.
> --- linux-2.4.21-pre1.org/arch/i386/defconfig 2002-11-28 15:53:09.000000000 -0800
> +++ linux-test1/arch/i386/defconfig 2002-12-14 14:59:52.000000000 -0800
> @@ -62,6 +62,7 @@
> # CONFIG_MATH_EMULATION is not set
> # CONFIG_MTRR is not set
> CONFIG_SMP=y
> +CONFIG_X86_APIC_CLUSTER=y
> # CONFIG_MULTIQUAD is not set
> CONFIG_HAVE_DEC_LOCK=y
Errrm ... on by default?
> - if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
> - id = physical_to_logical_apicid(hard_smp_processor_id());
> + if(clustered_apic_mode)
> + id = cpu_2_logical_apicid[smp_processor_id()];
Don't use those arrays directly, use the macros.
And that was off before for NUMA-Q ... you seem to have turned it on.
Unless you've inverted the meaning of clustered_apic_mode, which is
going to confuse the hell out of everyone?
> - if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ) {
> + if (configured_platform_type != CONFIGURED_PLATFORM_NUMA) {
OK, what exactly are your switching rules here? Before:
if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) -> numaq only
if (clustered_apic_mode == CLUSTERED_APIC_XAPIC) -> x440
if (clustered_apic_mode) -> numaq or x440
Make sure you match that switching logic in whatever you do.
For instance, this whole section gets skipped for NUMA-Q, but not
other NUMA machines.
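For reference, the "before" switching rules listed above can be written
down as three predicates. This is only a sketch: the constant names come
from the thread, but the numeric values below are an assumption (chosen so
that the "no clustering" case is zero), and the helper function names are
hypothetical.

```c
/* Stand-ins for the 2.4 constants named in the thread.
 * ASSUMPTION: the numeric values are illustrative; all that matters
 * for the rules below is that CLUSTERED_APIC_NONE is zero. */
enum clustered_apic {
    CLUSTERED_APIC_NONE  = 0,
    CLUSTERED_APIC_XAPIC = 1,
    CLUSTERED_APIC_NUMAQ = 2
};

/* "if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ)" -> numaq only */
int is_numaq_only(enum clustered_apic mode)
{
    return mode == CLUSTERED_APIC_NUMAQ;
}

/* "if (clustered_apic_mode == CLUSTERED_APIC_XAPIC)" -> x440 */
int is_x440(enum clustered_apic mode)
{
    return mode == CLUSTERED_APIC_XAPIC;
}

/* "if (clustered_apic_mode)" -> numaq or x440 */
int is_numaq_or_x440(enum clustered_apic mode)
{
    return mode != CLUSTERED_APIC_NONE;
}
```

Any replacement scheme has to give the same answers for each of the three
existing machine types, whatever the new variables end up being called.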
> /* Multi-Quad has an extended PCI Conf1 */
> - if(clustered_apic_mode == CLUSTERED_APIC_NUMAQ)
> + if(configured_platform_type == CONFIGURED_PLATFORM_NUMA)
If that's the direct substitution you're trying to make, don't misname
NUMAQ stuff as NUMA - very confusing ...
OK ... I give up trying to read the rest of it until you explain the
switching rules you're trying to use ... perhaps they're just confusingly
named, but it looks all wrong to me ...
M.
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-19 1:05 Pallipadi, Venkatesh
2002-12-19 1:32 ` James Cleverdon
0 siblings, 1 reply; 20+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-19 1:05 UTC (permalink / raw)
To: Linux Kernel, Christoph Hellwig
Cc: Martin Bligh, John Stultz, Nakajima, Jun, jamesclv,
Mallick, Asit K, Saxena, Sunil
I have started working on a similar patch for 2.5. The other thing on my todo list is to
split this patch up into chunks.
Other comments inlined below.
> From: Christoph Hellwig [mailto:hch@infradead.org]
> On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> > xAPIC support can actually be determined from the LAPIC version.
>
> Are you sure? IIRC some of the early summit boxens didn't report the
> right versions..
> does this really not break anything in the fragile summit setups?
I am not really sure about the local APIC versions in summit. What I remember seeing on
lkml was that summit has an older IOAPIC version. Can someone clarify this?
> Okay, this was wrong before, but any chance you could use proper
> style here (i.e. if ()
> Again.
Oops.. I somehow missed these 'if' coding style issues. Changing them immediately.
> > + define_bool CONFIG_X86_CLUSTERED_APIC y
> Do we really need CONFIG_X86_APIC_CLUSTER _and_ CONFIG_X86_CLUSTERED_APIC?
I will also eliminate CONFIG_X86_APIC_CLUSTER and use CONFIG_X86_CLUSTERED_APIC directly.
>
> - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> + if (clustered_apic_mode &&
> + (configured_platform_type == CONFIGURED_PLATFORM_NUMA) ) {
>
> Doesn't configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
> clustered_apic_mode? and it should be at least CONFIGURED_PLATFORM_NUMAQ,
> btw. Probably better something short like SUBARCH_NUMAQ..
Yes, right now CONFIGURED_PLATFORM_NUMA implies clustered_apic_mode, and one of the
checks in the above 'if' is redundant. Will do a search and replace of NUMA by NUMAQ.
Thanks,
Venkatesh
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-19 1:05 Pallipadi, Venkatesh
@ 2002-12-19 1:32 ` James Cleverdon
0 siblings, 0 replies; 20+ messages in thread
From: James Cleverdon @ 2002-12-19 1:32 UTC (permalink / raw)
To: Pallipadi, Venkatesh, Linux Kernel, Christoph Hellwig
Cc: Martin Bligh, John Stultz, Nakajima, Jun, Mallick, Asit K,
Saxena, Sunil
On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> I have started working on a similar patch for 2.5. Other thing in my todo
> list is to split this patch up into chunks.
>
> Other comments inlined below.
>
> > From: Christoph Hellwig [mailto:hch@infradead.org]
> >
> > On Wed, Dec 18, 2002 at 02:36:20PM -0800, Pallipadi, Venkatesh wrote:
> > > xAPIC support can actually be determined from the LAPIC version.
> >
> > Are you sure? IIRC some of the early summit boxens didn't report the
> > right versions..
> > does this really not break anything in the fragile summit setups?
>
> I am not really sure about the local APIC versions in summit. What I
> remember seeing on lkml was summit has older IOAPIC version. Can someone
> clarify this?
Sure, I can verify it. The I/O APICs in shipped summit chipsets contain a
version ID of 0x11 instead of 0x14 to 0x1F. The high performance folks
claimed that Intel specified 0x14 for the local APICs, but left their orange
jacket docs saying 0x1X for I/O APICs until after the chips taped out.
Whatever. In any case, there are boxes in the field that contain those
version numbers. We can recognize them using the OEM and product strings in
the MPS and ACPI tables, so it's only an annoyance.
> > Okay, this was wrong before, but any chance you could use proper
> > style here (i.e. if ()
> > Again.
>
> oops.. I somehow missed these 'if' coding style stuff. changing it
> immediately.
>
> > > + define_bool CONFIG_X86_CLUSTERED_APIC y
> >
> > Do we really need CONFIG_X86_APIC_CLUSTER _and_
> > CONFIG_X86_CLUSTERED_APIC?
>
> I will also eliminate CONFIG_X86_APIC_CLUSTER and use
> CONFIG_X86_CLUSTERED_APIC directly.
>
> > - if (clustered_apic_mode == CLUSTERED_APIC_NUMAQ) {
> > + if (clustered_apic_mode &&
> > + (configured_platform_type ==
> > CONFIGURED_PLATFORM_NUMA) ) {
> >
> > Doesn;t configured_platform_type == CONFIGURED_PLATFORM_NUMA imply
> > clustered_apic_mode? and it should be at least
> > CONFIGURED_PLATFORM_NUMAQ,
> > btw. Probably better something short like SUBARCH_NUMAQ..
>
> Yes, right now CONFIGURED_PLATFORM_NUMA implies clustered_apic_mode, and
> one of the checks in the above 'if' is redundant. Will do a search and
> replace of NUMA by NUMAQ.
>
>
> Thanks,
> Venkatesh
--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-19 2:35 Pallipadi, Venkatesh
2002-12-19 3:10 ` Martin J. Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-19 2:35 UTC (permalink / raw)
To: Martin J. Bligh, Linux Kernel
Cc: John Stultz, Nakajima, Jun, jamesclv, Mallick, Asit K,
Saxena, Sunil
The rules that I am trying to follow:

                             numaq        summit/ other      all other
                                          >8 CPU system      systems
------------------------------------------------------------------------------
clustered_apic_mode          CLUSTERED    CLUSTERED          NONE
configured_platform_type     NUMAQ        NONE               NONE
------------------------------------------------------------------------------
Note that in the patch, wherever I said NUMA, I actually meant NUMAQ. I think I lost
that Q, while I was trying to reduce the length of this variable
(CONFIGURED_PLATFORM_NUMAQ) :). Sorry about all the resulting confusion. Doing a
search and replace of NUMA by NUMAQ on my patch right now.
Noticeable changes here are
- summit using CLUSTERED in place of XAPIC (Physical destination).
- use "configured_platform_type" to separate out numaq specific stuff
(like, waking up the CPUs through NMI), from the generic cluster apic support.
We are trying to use a common APIC destination mode for all systems with more
than 8 CPUs. This is done by grouping the CPUs into logical clusters. I am hoping that
this mode works fine on summit. Another option is to allow summit to continue using
physical mode, if there is any binding reason to do so. But anyway NUMAQ specific stuff
has to be separated from cluster APIC stuff.
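To make the rules concrete, here is a minimal sketch of how the two
variables would be chosen. It is not the code from the patch: the enum and
function names are hypothetical, and it assumes the middle column is
selected purely by CPU count (more than 8), which is the stated intent but
may not hold for every summit box.

```c
/* Hypothetical names; only the two variable names come from the patch. */
enum apic_mode     { APIC_MODE_NONE = 0, APIC_MODE_CLUSTERED };
enum platform_type { PLATFORM_NONE = 0, PLATFORM_NUMAQ };

struct apic_setup {
    enum apic_mode     clustered_apic_mode;
    enum platform_type configured_platform_type;
};

/* num_cpus: processors found via ACPI or the MP table.
 * is_numaq: nonzero when the kernel is configured for NUMAQ. */
struct apic_setup choose_apic_setup(int num_cpus, int is_numaq)
{
    struct apic_setup s = { APIC_MODE_NONE, PLATFORM_NONE };

    if (is_numaq) {
        /* column 1: clustered, plus the NUMAQ-only platform handling */
        s.clustered_apic_mode = APIC_MODE_CLUSTERED;
        s.configured_platform_type = PLATFORM_NUMAQ;
    } else if (num_cpus > 8) {
        /* column 2: summit or any other system with more than 8 CPUs */
        s.clustered_apic_mode = APIC_MODE_CLUSTERED;
    }
    /* column 3: everything else stays in flat logical mode */
    return s;
}
```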
Rest of the comments inlined below..
> From: Martin J. Bligh [mailto:mbligh@aracnet.com]
> > - define_bool CONFIG_X86_CLUSTERED_APIC y
> > + define_bool CONFIG_MULTIQUAD y
>
> You seem to have lost turning on CONFIG_X86_NUMA.
I don't see CONFIG_X86_NUMA getting used anywhere in 2.4.21-pre1. Am I missing
something here?
> > +CONFIG_X86_APIC_CLUSTER=y
> > # CONFIG_MULTIQUAD is not set
> > CONFIG_HAVE_DEC_LOCK=y
>
> Errrm ... on by default?
I was just trying to be a little ambitious :). Will remove that now..
> > - if(clustered_apic_mode == CLUSTERED_APIC_XAPIC)
> > - id = physical_to_logical_apicid(hard_smp_processor_id());
> > + if(clustered_apic_mode)
> > + id = cpu_2_logical_apicid[smp_processor_id()];
>
> Don't use those arrays directly, use the macros.
OK. Will change it.
> And that was off before for NUMA-Q ... you seem to have turned it on.
> Unless you've inverted the meaning of clustered_apic_mode, which is
> going to confuse the hell out of everyone?
No. This check happens inside the calculate_ldr() routine, and NUMAQ never reaches
calculate_ldr(), since (according to the comments) it is the BIOS that programs the
LDR on NUMAQ. So the only thing we have to worry about in calculate_ldr() is
non-NUMAQ systems.
> > - if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ) {
> > + if (configured_platform_type != CONFIGURED_PLATFORM_NUMA) {
>
> OK, what exactly are your switching rules here? Before:
I am trying to get the stuff which is specific _only_ to NUMAQ under the
"platform type" check. And the stuff specific to cluster APIC setup under the
"apic mode" check.
Can you please review the complete patch now? I am not sure whether my explanation
was clear enough. Let me know if you have any questions.
Thanks,
-Venkatesh
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-19 2:45 Pallipadi, Venkatesh
2002-12-19 4:14 ` James Cleverdon
0 siblings, 1 reply; 20+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-19 2:45 UTC (permalink / raw)
To: jamesclv, Linux Kernel, Christoph Hellwig
Cc: Martin Bligh, John Stultz, Nakajima, Jun, Mallick, Asit K,
Saxena, Sunil
> From: James Cleverdon [mailto:jamesclv@us.ibm.com]
> > On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> > I am not really sure about the local APIC versions in summit. What I
> > remember seeing on lkml was summit has older IOAPIC version. Can someone
> > clarify this?
>
> Sure, I can verify it. The I/O APICs in shipped summit chipsets contains a
> version ID of 0x11 instead of 0x14 to 0x1F. The high performance folks
> claimed that Intel specified 0x14 for the local APICs, but left their orange
> jacket docs saying 0x1X for I/O APICs until after the chips taped out.
>
> Whatever. In any case, there are boxes in the field that contain those
> version numbers. We can recognize them using the OEM and product strings in
> the MPS and ACPI tables, so it's only an annoyance.
>
OK. In my patch I am looking at local APIC version > 0x14, to check xAPIC support.
This should work on all systems irrespective of IOAPIC version.
And even if there are problems here for summit, we can work around them by simply
forcing xAPIC support at the already existing OEM string check.
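Roughly, the check plus the fallback would look like the sketch below. The
function and macro names are hypothetical and this is not the code from the
patch. In particular, whether 0x14 itself counts as xAPIC is an assumption
here (the sketch treats 0x14 and above as xAPIC, following the 0x14 value
quoted above for local APICs).

```c
/* ASSUMPTION: local APIC versions 0x14 and up are treated as xAPIC. */
#define XAPIC_MIN_LAPIC_VERSION 0x14

/* lapic_version:    version field read from the local APIC.
 * oem_forces_xapic: nonzero when the existing OEM string check has
 *                   already identified a known xAPIC platform. */
int xapic_supported(unsigned int lapic_version, int oem_forces_xapic)
{
    if (oem_forces_xapic)
        return 1;   /* workaround path for misreported versions */

    /* only the low 8 bits carry the version number */
    return (lapic_version & 0xff) >= XAPIC_MIN_LAPIC_VERSION;
}
```

The I/O APIC version never enters the decision, so a box reporting 0x11 in
its I/O APICs is unaffected as long as its local APICs report correctly or
the OEM check catches it.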
Thanks,
-Venkatesh
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-19 2:35 Pallipadi, Venkatesh
@ 2002-12-19 3:10 ` Martin J. Bligh
0 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-12-19 3:10 UTC (permalink / raw)
To: Pallipadi, Venkatesh, Linux Kernel
Cc: John Stultz, Nakajima, Jun, jamesclv, Mallick, Asit K,
Saxena, Sunil
Apologies if I'm somewhat jumpy - bear in mind that I've burnt a lot of hours
and torn out hair recently breaking up all the Summit patches (which I sent
you, but aren't generally published in their entirety yet). I've also got a
hell of a lot of time invested in making this area of code work as it is
now ;-) (as have others). Going through another iteration isn't top on my
list of favourite things to do right now ... what you're trying to do does
actually seem sane in general, I'm just not sure I like the method - probably
fairly easy to fix.
As a general approach thing, it would be much smoother if you took the path
of trying to make it totally obvious that whatever you're trying to change
doesn't hurt anyone else. Part of that is small easily readable patches,
part of that is choosing your constructs carefully, explaining what you're
doing and why it's safe, and not turning changes on by default ;-)
>                              numaq        summit/ other      all other
>                                           >8 CPU system      systems
> ------------------------------------------------------------------------------
> clustered_apic_mode          CLUSTERED    CLUSTERED          NONE
> configured_platform_type     NUMAQ        NONE               NONE
> ------------------------------------------------------------------------------
OK, so you still seem to have a tristate here. What does this gain us over
the existing scheme?
clustered_apic_mode == CLUSTERED_APIC_NUMAQ (equiv CLUSTERED / NUMAQ)
clustered_apic_mode == CLUSTERED_APIC_XAPIC (equiv CLUSTERED / NONE)
clustered_apic_mode == CLUSTERED_APIC_NONE (equiv NONE / NONE)
Or are there other situations you haven't outlined above?
> Note that in the patch, wherever I said NUMA, I actually meant
> NUMAQ. I think I lost that Q, while I was trying to reduce the length of
> this variable (CONFIGURED_PLATFORM_NUMAQ) :). Sorry about all the
> resulting confusion. Doing a search and replace of NUMA by NUMAQ on my
> patch right now.
Cool, that starts to make a little more sense to me now then ...
> Noticeable changes here are
> - summit using CLUSTERED in place of XAPIC(Physical destination).
> - use "configured_platform_type" it basically separate out numaq specific
> stuff (like, waking up the CPUs through NMI), from the generic cluster
> apic support.
Right ... that's all my fault. I started off with clustered_apic_mode as
a simple boolean switch to represent the P3's clustered apic mode ... then
abused it for anything NUMA-Q specific. Stuff that's numaq specific should
be changed to something like "if (x86_numaq)" instead of "if
(clustered_apic_mode == ...)".
If we only have one platform type, it's going to make much more readable
code to just use a boolean here. Somewhere in a header file (where
clustered_apic_mode is defined):
#ifdef CONFIG_X86_NUMAQ
#define x86_numaq (1)
#else
#define x86_numaq (0)
#endif
Just makes the resultant c-code easier to read, and shoves all the ifdefs in
a header file. I used to think that people complaining about ifdefs in code
was annoying, but having tried to read the results of ifdef hell ... I
rapidly came to the conclusion they're right ... the whole apic handling code
area is enough to make your head hurt even if it's as readable as possible ;-)
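As a sketch of what a caller then looks like (the function and enum names
below are hypothetical, picked only to illustrate the NMI wakeup case
mentioned earlier in the thread; the macro is the same shape as the header
snippet):

```c
/* Same shape as the header snippet; CONFIG_X86_NUMAQ would normally
 * come from the kernel configuration. */
#ifdef CONFIG_X86_NUMAQ
#define x86_numaq (1)
#else
#define x86_numaq (0)
#endif

/* Hypothetical names, for illustration only. */
enum wakeup_method { WAKE_VIA_INIT_IPI = 0, WAKE_VIA_NMI };

enum wakeup_method cpu_wakeup_method(void)
{
    /* No #ifdef at the call site: the compiler folds the constant
     * and drops the dead branch. */
    if (x86_numaq)
        return WAKE_VIA_NMI;        /* NUMA-Q specific path */
    return WAKE_VIA_INIT_IPI;       /* everyone else */
}
```

The point is just that the platform test reads as a plain boolean and no
longer says anything about which APIC destination mode is in use.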
I can't deny that the current code has a few problems re style and
cleanliness ... I've been off doing 2.5 for ages, and that will be pretty
clean after it's broken into subarch.
> We are trying to use a common APIC destination mode for all systems with
> more than 8 CPUs. This is by having the logical clusters of the CPUs. I
> am hoping that this mode works fine on summit. Another option is to allow
> summit to continue using physical mode, if there is any binding reason
> to do so. But anyway NUMAQ specific stuff has to be separated from
> cluster APIC stuff.
Look at the 2.5 summit stuff I sent you - I think 2.5 uses logical on
Summit. Historical split ... don't ask.
>> You seem to have lost turning on CONFIG_X86_NUMA.
>
> I dont see CONFIG_X86_NUMA getting used anywhere in 2.4.21-pre1. Am I
> missing something here??
I think it's used in some patches floating around ... as a general rule,
please don't delete stuff as cleanups at the same time as adding new
features, the resultant tangle is very hard to verify correct (I have the
scars from trying to untangle such things, and they're fresh and they hurt ;-))
>> Errrm ... on by default?
>
> I was just trying to be little ambitious :). Will remove that now..
Cool ;-)
>> And that was off before for NUMA-Q ... you seem to have turned it on.
>> Unless you've inverted the meaning of clustered_apic_mode, which is
>> going to confuse the hell out of everyone?
>
> NO. This check is happening inside calculate_ldr() routine, and NUMAQ
> never comes to calculate_ldr(), as (according to the comments), it is the
> BIOS that programs LDR in NUMAQ. So only thing we have to worry about in
> calculate_ldr() is non-NUMAQ systems.
I don't have a view in front of me, but the fact there's a
"if (clustered_apic_mode != CLUSTERED_APIC_NUMAQ)" in there right now
makes me suspect it's needed. It's *possible* it's unneeded, but I'm
suspicious.
> I am trying to get the stuff which are _only_ specific to NUMAQ, under
> "platform type" check. And the stuff specific to cluster APIC setup under
> "apic mode" check.
See above ... I'm not sure you need anything this invasive to do what you
want. However, I do have a better idea what you're trying to do now ;-)
> Can you please review the complete patch now. I am not sure whether my
> explaination was clear enough. Let me know if have any questions.
I'll try to go through the updated one when you send it. Hint: smaller
patches with explanations of why they're safe are much easier to read.
Andrew Morton has managed to beat this lesson into me after a while ;-)
M.
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-19 2:45 Pallipadi, Venkatesh
@ 2002-12-19 4:14 ` James Cleverdon
0 siblings, 0 replies; 20+ messages in thread
From: James Cleverdon @ 2002-12-19 4:14 UTC (permalink / raw)
To: Pallipadi, Venkatesh, Linux Kernel, Christoph Hellwig
Cc: Martin Bligh, John Stultz, Nakajima, Jun, Mallick, Asit K,
Saxena, Sunil
On Wednesday 18 December 2002 06:45 pm, Pallipadi, Venkatesh wrote:
> > From: James Cleverdon [mailto:jamesclv@us.ibm.com]
> >
> > > On Wednesday 18 December 2002 05:05 pm, Pallipadi, Venkatesh wrote:
> > > I am not really sure about the local APIC versions in summit. What I
> > > remember seeing on lkml was summit has older IOAPIC version. Can
> > > someone clarify this?
> >
> > Sure, I can verify it. The I/O APICs in shipped summit chipsets contains
> > a version ID of 0x11 instead of 0x14 to 0x1F. The high performance folks
> > claimed that Intel specified 0x14 for the local APICs, but left their
> > orange jacket docs saying 0x1X for I/O APICs until after the chips taped
> > out.
> >
> > Whatever. In any case, there are boxes in the field that contain those
> > version numbers. We can recognize them using the OEM and product strings
> > in the MPS and ACPI tables, so it's only an annoyance.
>
> OK. In my patch I am looking at local APIC version > 0x14, to check xAPIC
> support. This should work on all systems irrespective of IOAPIC version.
> And even if there are problems here for summit, we can workaround it, by
> simply forcing xAPIC support at already existing OEM string check.
>
>
> Thanks,
> -Venkatesh
Yes, such a scheme should work fine. (Had something like that in my patch,
but it was cut out to avoid any chance of breaking flat P4 boxen.)
Once you've determined that you have a system with xAPICs, how do you intend
to distinguish between those delivering interrupts through the serial bus vs.
the system bus? (Correct me if I'm wrong, but your patch didn't define the
new I/O APIC register that contains the serial/parallel status bit.)
How will you decide between a box that always must use clustered APIC mode,
like the x440, vs. one which can be operated in flat mode, like the x360? (I
only include the x360 in my summit patch to avoid having all the interrupts
hit CPU 0. Otherwise, it is a flat box that delivers interrupts through the
system bus.) What about other flat P4 systems?
--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-18 22:36 Pallipadi, Venkatesh
2002-12-18 23:26 ` Christoph Hellwig
2002-12-19 0:24 ` Martin J. Bligh
@ 2002-12-20 2:04 ` James Cleverdon
2002-12-20 8:00 ` Christoph Hellwig
2 siblings, 1 reply; 20+ messages in thread
From: James Cleverdon @ 2002-12-20 2:04 UTC (permalink / raw)
To: Pallipadi, Venkatesh, Linux Kernel
Cc: Martin Bligh, John Stultz, Nakajima, Jun, Mallick, Asit K,
Saxena, Sunil
On Wednesday 18 December 2002 02:36 pm, Pallipadi, Venkatesh wrote:
> 2.4.21-pre1 (i386) based patch to fix the issues with systems having more
> than 8 CPUs, in a generic way.
>
>
> Motivation:
> The current APIC destination mode ("Flat Logical") used in linux kernel has
> an upper limit of 8 CPUs. For more than 8 CPUs, either "Clustered Logical"
> or "Physical" mode has to be used.
> There is already some code in current kernel (2.4.21-pre1), to support such
> conditions. Specifically, IBM Summit, uses Physical mode, and IBM NUMAQ
> uses clustered mode. But, there are some issues too:
> - it is not generic enough to support any other more than 8 CPU system
> out of the box. Supporting different systems may need more hacks in the
> code.
> - clustered mode setup is tightly coupled with NUMA system. Whereas,
> in reality, we can as well have logical clusters in a non-NUMA system as
> well.
> - physical mode setup is somewhat tightly coupled with xAPIC. But,
> xAPIC doesn't necessarily imply physical mode. You can as well have
> clustered mode with xAPIC
> - APIC destination mode is selected based on particular OEM string.
>
> These reasons together led to panics on other OEM systems with > 8 CPUS.
> The patch tries to fix this issue in a generic way (in place of having
> multiple hacks for different OEMs). Note, the patch only intends to change
> the initialization of systems with more than 8 CPUs and it will not affect
> other systems (apart from possible bugs in my code itself).
Your goals are laudable and, in many ways, I share them. However, this is
2.4. Our goals for 2.4 have always been minimal changes and as little impact
as possible to the UP and flat SMP cases.
> Description:
> The basic idea is to use the number of processors detected on the system,
> to decide on which APIC destination mode is to be used. Once all the CPU
> info, is collected either from ACPI or MP table, we can check the total
> number of processors in the system.
> If the number of processors is less than or equal to 8,
> then no change is required, and we can use the default, "Flat Logical"
> set up. If the number of processors is more than 8
> we can switch to clustered logical setup.
> The logical clusters set up as follows.
> Cluster 0 (CPU 0-3), Cluster 1 (CPU 4-7), Cluster 2 (CPU 8-11) and so
> on..
>
> The other things that are done in the patch include:
> - Separate out the NUMA specific stuff from APIC setup in cluster mode.
> Also, NUMA has its own way of setting up the clusters, and doesn't follow
> the logical cluster mapping defined above.
> - Separate out xAPIC stuff from APIC destination setup. And the
> availability of xAPIC support can actually be determined from the LAPIC
> version.
> - physical mode support _removed_, as we can use clustered logical
> setup to support up to a max of 60 CPUs. This is mainly because
> of the advantage of being able to setup IOAPICs in LowestPrio, when using
> clustered mode.
See my 2.5 patches for an example of using solely logical mode interrupts.
The patches submitted to 2.4 are those that have been in Alan's tree since
August and running in SuSE 8.0+ since 8.0's release.
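For readers following along, the cluster layout quoted above (Cluster 0 =
CPUs 0-3, Cluster 1 = CPUs 4-7, and so on) and the 60 CPU ceiling both fall
out of the cluster-model logical ID format: the high nibble of the 8-bit
logical ID names the cluster and the low nibble is a one-bit-per-CPU mask.
Cluster 0xF is taken by broadcast, which leaves 15 clusters of 4 CPUs. The
sketch below assumes that sequential mapping; the function name is
hypothetical and this is not code from either patch.

```c
#define CPUS_PER_CLUSTER 4
#define MAX_CLUSTERS     15   /* cluster 0xF is reserved for broadcast */

/* Returns the 8-bit logical APIC ID for a CPU under the sequential
 * layout quoted above, or -1 if the CPU number does not fit
 * (15 clusters * 4 CPUs = 60 CPUs at most). */
int cpu_to_cluster_logical_id(int cpu)
{
    int cluster, bit;

    if (cpu < 0)
        return -1;

    cluster = cpu / CPUS_PER_CLUSTER;   /* which cluster */
    bit     = cpu % CPUS_PER_CLUSTER;   /* position inside it */

    if (cluster >= MAX_CLUSTERS)
        return -1;

    return (cluster << 4) | (1 << bit);
}
```

With this layout CPU 0 gets 0x01, CPU 5 gets 0x12 and CPU 59 gets 0xE8.
NUMA-Q does not follow this mapping, since its logical IDs are programmed
by the BIOS.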
> The whole stuff is protected by 'Clustered APIC (> 8 CPUs) support
> (CONFIG_X86_APIC_CLUSTER)' config option under Processor Type and Features.
> But going forward, we can actually make this as default, as this doesn't
> affect the systems with less than equal to 8 CPUs (Apart from minor
> increase in code size and couple of additional checks during boot up), but
> gives the default support to more than 8 CPU systems.
An x440 needs to use clustered APIC mode whenever more than two physical CPUs
are present. Consider scanning the list of physical APIC IDs to determine
whether clustered mode is necessary. (I had such code in my 2.5 patch, but
ripped it out when it falsely triggered on a couple oddball systems. You may
be able to write a more comprehensive analyzer.)
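To illustrate the kind of scan meant here, a deliberately naive sketch
follows. It is not the code that was ripped out of the 2.5 patch. The
function names are hypothetical, and the rule it applies (treat the high
nibble of each physical APIC ID as a cluster number and ask for clustered
mode when more than one cluster shows up, or when there are more than 8
CPUs) is an assumption that would need checking against real hardware.

```c
#include <stddef.h>

/* ASSUMPTION: the high nibble of a physical APIC ID names a cluster.
 * Returns 1 if the IDs fall into more than one such cluster. */
int apic_ids_span_clusters(const unsigned char *phys_apic_ids, size_t count)
{
    size_t i;

    for (i = 1; i < count; i++) {
        if ((phys_apic_ids[i] >> 4) != (phys_apic_ids[0] >> 4))
            return 1;
    }
    return 0;
}

/* Clustered mode if flat logical mode cannot address every CPU
 * (more than 8), or if the physical IDs already look clustered. */
int needs_clustered_mode(const unsigned char *phys_apic_ids, size_t count)
{
    return count > 8 || apic_ids_span_clusters(phys_apic_ids, count);
}
```

A heuristic this simple is exactly the sort of thing that can misfire on
boxes with unusual ID assignments, so treat it as a starting point only.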
> Please let me know your comments/criticisms about this.
> I was able to test this successfully on an 8-way with HT(16 logical)
> CPU systems that I have access to. But, I haven't tested it on x440, or
> NUMAQ systems. Would love to hear about the effect of this patch on these
> systems.
>
> Thanks,
> -Venkatesh
A generic patch should also support Unisys' new box, the ES7000 or some such.
> > -----Original Message-----
> > From: Nakajima, Jun
> > Sent: Thursday, December 12, 2002 7:06 PM
> > To: jamesclv@us.ibm.com; Zwane Mwaikambo
> > Cc: Martin Bligh; John Stultz; Linux Kernel
> > Subject: RE: [PATCH][2.5][RFC] Using xAPIC apic address space on !Summit
> >
> >
> > BTW, we are working on a xAPIC patch that supports more than 8 CPUs in a
> > generic fashion (don't use hardcode OEM checking). We already tested it
> > on two OEM systems with 16 CPUs.
> > - It uses clustered mode. We don't want to use physical mode because it
> > does not support lowest priority delivery mode.
> > - We also check the version of the local APIC if it's xAPIC or not. It's
> > possible that some other x86 architecture (other than P4P) uses xAPIC.
> >
> > Stay tuned.
> >
> > Jun
--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-20 2:04 ` James Cleverdon
@ 2002-12-20 8:00 ` Christoph Hellwig
2002-12-20 11:24 ` William Lee Irwin III
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2002-12-20 8:00 UTC (permalink / raw)
To: James Cleverdon
Cc: Pallipadi, Venkatesh, Linux Kernel, Martin Bligh, John Stultz,
Nakajima, Jun, Mallick, Asit K, Saxena, Sunil
On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
> A generic patch should also support Unisys' new box, the ES7000 or some such.
That box needs more changes than just the apic setup. Unfortunately
unisys thinks they shouldn't send their patches to lkml, but when you see
them e.g. in the suse tree it's a bit understandable that they don't want
anyone to really see their mess :)
And btw, the box isn't that new, but three years ago or so when they first
showed it on cebit they even refused to talk about linux due to their
restrictive agreements with Microsoft..
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-20 8:00 ` Christoph Hellwig
@ 2002-12-20 11:24 ` William Lee Irwin III
2002-12-20 11:29 ` William Lee Irwin III
0 siblings, 1 reply; 20+ messages in thread
From: William Lee Irwin III @ 2002-12-20 11:24 UTC (permalink / raw)
To: Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
Linux Kernel, Martin Bligh, John Stultz, Nakajima, Jun,
Mallick, Asit K, Saxena, Sunil
On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
>> A generic patch should also support Unisys' new box, the ES7000 or
>> some such.
On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
> That box needs more changes than just the apic setup. Unfortunately
> unisys thinks they shouldn't send their patches to lkml, but when you see
> them e.g. in the suse tree it's a bit understandable that they don't want
> anyone to really see their mess :)
> And btw, the box isn't that new, but three years ago or so when they first
> showed it on cebit they even refused to talk about linux due to their
> restrictive agreements with Microsoft..
Kevin, you're the only lkml-posting contact point I know of within Unisys.
Is there any chance you could flag down some of the ia32 crew there for
some commentary on this stuff? (or do so yourself if you're in it)
Thanks,
Bill
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-20 11:24 ` William Lee Irwin III
@ 2002-12-20 11:29 ` William Lee Irwin III
0 siblings, 0 replies; 20+ messages in thread
From: William Lee Irwin III @ 2002-12-20 11:29 UTC (permalink / raw)
To: Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
Linux Kernel, Martin Bligh, John Stultz, Nakajima, Jun,
Mallick, Asit K, Saxena, Sunil, kevin.vanmaren
On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
>>> A generic patch should also support Unisys' new box, the ES7000 or
>>> some such.
On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
>> That box needs more changes than just the apic setup. Unfortunately
>> unisys thinks they shouldn't send their patches to lkml, but when you see
>> them e.g. in the suse tree it's a bit understandable that they don't want
>> anyone to really see their mess :)
>> And btw, the box isn't that new, but three years ago or so when they first
>> showed it on cebit they even refused to talk about linux due to their
>> restrictive agreements with Microsoft..
On Fri, Dec 20, 2002 at 03:24:01AM -0800, William Lee Irwin III wrote:
> Kevin, you're the only lkml-posting contact point I know of within Unisys.
> Is there any chance you could flag down some of the ia32 crew there for
> some commentary on this stuff? (or do so yourself if you're in it)
Sorry for the meaningless post -- I forgot to add Kevin to the cc: list.
Bill
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-21 3:27 Nakajima, Jun
2002-12-21 4:43 ` Martin J. Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Nakajima, Jun @ 2002-12-21 3:27 UTC (permalink / raw)
To: Martin J. Bligh, Van Maren, Kevin, William Lee Irwin III,
Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
John Stultz, Mallick, Asit K, Saxena, Sunil, Linux Kernel
Cc: Protasevich, Natalie
I share the same concerns and comments with Martin.
As far as xAPIC mode is concerned, the changes for ES7000 in SuSe/United Linux are simply activating physical mode. And we are confident the patch we provided should work for the machine as well. Looks like ES7000 requires changes in other areas as well, though.
Since Martin already has code in place in 2.5, we should reuse his code as much as possible. And our current plan is:
For 2.5:
- Martin posts a new patch (that moves IBM-specific stuff to subarch, for example) next week.
- Venkatesh merges the generic cluster APIC support for systems (with more than 8 CPUs) to it, testing it on some OEM machines (I cannot tell which)
For 2.4:
- Venkatesh will post a confined patch to support APIC physical mode.
Thanks,
Jun
> -----Original Message-----
> From: Martin J. Bligh [mailto:mbligh@aracnet.com]
> Sent: Friday, December 20, 2002 8:30 AM
> To: Van Maren, Kevin; 'William Lee Irwin III'; Christoph Hellwig; James
> Cleverdon; Pallipadi, Venkatesh; Linux Kernel; John Stultz; Nakajima, Jun;
> Mallick, Asit K; Saxena, Sunil
> Cc: Protasevich, Natalie
> Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with
> more than 8 CPUs
>
> > Natalie is the engineer who added support for the ES7000 to Linux.
> > Fortunately she is in the cube next to me.
> >
> > She has sent the patches to SuSe/United Linux, and is in the final
> > process of testing them on 2.5.5x before submitting them to LKML for
> > comment.
>
> Are they under subarch or somehow abstracted this time, or are there
> going to be 10,000 ifdef's again? If the latter, I can predict what
> the first set of review comments you get are going to be ;-)
>
> M.
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2002-12-21 3:27 Nakajima, Jun
@ 2002-12-21 4:43 ` Martin J. Bligh
0 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-12-21 4:43 UTC (permalink / raw)
To: Nakajima, Jun, Van Maren, Kevin, William Lee Irwin III,
Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
John Stultz, Mallick, Asit K, Saxena, Sunil, Linux Kernel
Cc: Protasevich, Natalie
> I share the same concerns and comments with Martin.
>
> As far as xAPIC mode is concerned, the changes for ES7000 in SuSe/United
> Linux are simply activating physical mode. And we are confident the patch
> we provided should work for the machine as well. Looks like ES7000
> requires changes in other areas as well, though.
>
> Since Martin already has code in place in 2.5, we should reuse his code
> as much as possible. And our current plan is:
I should point out that James wrote most of the Summit code, not me (I did
the original NUMA-Q code) - I'm splitting it out into manageable chunks
and debugging it (it breaks NUMA-Q at the moment, but I think that's fixed
now it's in nice bite-sized pieces).
> For 2.5:
>
> - Martin posts a new patch (that moves IBM-specific stuff to subarch, for
>   example) next week.
> - Venkatesh merges the generic cluster APIC support for systems (with
>   more than 8 CPUs) to it, testing it on some OEM machines (I cannot
>   tell which)
Excellent - thank you for this. When it's abstracted out (as the patches
I'll send out as soon as I've got them tested do), it should be much
easier to merge things together.
> For 2.4:
> - Venkatesh will post a confined patch to support APIC physical mode.
That should be what the current 2.4 summit code uses ... oddly it's
physical in 2.4, and logical in 2.5 ... don't ask why ;-) If you meant
logical, that sounds like a good plan.
M.
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2003-01-07 22:42 Andrew Theurer
2003-01-11 4:28 ` James Cleverdon
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Theurer @ 2003-01-07 22:42 UTC (permalink / raw)
To: nitin.a.kamble; +Cc: Martin J. Bligh, linux-kernel, jamesclv
>Hi Martin,
> Would somebody get a chance to try the kirq patch out? If yes, please
>let me know, how the patch did on your systems. Did your guys find any
>issues with it? I will also appreciate more comments.
>
>Thanks & Regards,
>Nitin
Nitin,
I am seeing if I can get together a test for Netbench with/without your patch.
Looking at the patch, I would expect a slight increase in performance. I'll
let you know the results as soon as I have them.
I do have one question for you: Have you tested netperf using only one
gigabit adapter? If so, have you been able to max out the adapter when using
hyperthreading? If not, could you test this? So far I have not been able
to, while I can quite easily with no hyperthreading (in Netbench). This is
the case with both irq_balance and irq affinity. Having ints processed by
one and only one logical CPU at one time really seems to bottleneck network
throughput. I'm sure some of this has to do with sharing those resources
among 2 logical CPUs, but I also wonder if int processing is just a lot
slower than P3 overall.
I am bringing this up, because I recall James Cleverdon having some code which
allows interrupts to be dynamically routed to two CPU destinations, a pair of
CPUs with consecutive CPU ID's. Interrupts are dynamically routed to the
least loaded CPU, and if both are idle, to the CPU with the lower CPUID. I
like this idea, because when in HT, if consecutive logical CPU ID's map to
one physical core, we get to use the "whole" processor, and both
destinations share the cache. Anyway, just a thought.
-Andrew Theurer
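A minimal sketch of the pair-routing rule described above, written as plain user-space C rather than kernel code. The `cpu_load` structure, its `load` field, and `pick_irq_target()` are hypothetical names used only for illustration; in the scheme Andrew recalls, the choice is made by the APIC hardware, not by a software comparison like this one.

```c
#include <assert.h>

/* Hypothetical per-CPU record; a lower load value means a less busy CPU. */
struct cpu_load {
    int cpu_id;
    unsigned int load;
};

/*
 * Choose the interrupt destination from a pair of CPUs with consecutive
 * IDs: the less loaded CPU wins, and on a tie (for example, both idle)
 * the CPU with the lower ID wins.
 */
static int pick_irq_target(const struct cpu_load *a, const struct cpu_load *b)
{
    const struct cpu_load *lo = (a->cpu_id < b->cpu_id) ? a : b;
    const struct cpu_load *hi = (a->cpu_id < b->cpu_id) ? b : a;

    /* The rule is only described for a pair of consecutive CPU IDs. */
    assert(hi->cpu_id == lo->cpu_id + 1);

    return (hi->load < lo->load) ? hi->cpu_id : lo->cpu_id;
}
```

With both CPUs of a pair idle, `pick_irq_target()` returns the lower ID; once that CPU becomes busier than its sibling, the sibling is returned instead.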
* RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
@ 2003-01-08 3:19 Kamble, Nitin A
0 siblings, 0 replies; 20+ messages in thread
From: Kamble, Nitin A @ 2003-01-08 3:19 UTC (permalink / raw)
To: Andrew Theurer
Cc: Martin J. Bligh, linux-kernel, jamesclv, Nakajima, Jun,
Mallick, Asit K, Saxena, Sunil, Schlobohm, Bruce
Hi Andrew,
> I am seeing if I can get together a test for Netbench with/without your
> patch. Looking at the patch, I would expect a slight increase in
> performance. I'll let you know the results as soon as I have them.
>
[NK] Thanks for trying it out. The numbers from my post may be useful to
you.
> I do have one question for you: Have you tested netperf using only one
> gigabit adapter? If so, have you been able to max out the adapter when
> using hyperthreading? If not, could you test this? So far I have not been
> able to, while I can quite easily with no hyperthreading (in Netbench).
> This is the case with both irq_balance and irq affinity. having ints
> processed by one and only one logical CPU at one time really seems to
> bottleneck network throughput. I'm sure some of this has to do with
> sharing those resources among 2 logical CPUs, but I also wonder if int
> processing is just a lot slower than P3 overall.
[NK] While testing we had 4 100mbps NICs with a 4-way Intel P4 Xeon
1.6GHz. What is your system configuration? Some numbers from your
experiments would be useful in understanding the situation better. Also,
what do you mean by no hyper-threading: is HT disabled in the BIOS, or
is HT awareness disabled in the code? I need more details of your
setup to test it out.
>
> I am bringing this up, because I recall James Cleverdon having some code
> which allows interrupts to be dynamically routed to two CPU destinations, a
> pair of CPUs with consecutive CPU ID's. Interrupts are dynamically routed
> to the least loaded CPU, and if both are idle, to the CPU with the lower
> CPUID. I like this idea, because when in HT, if consecutive logical CPU
> ID's map to one physical core, we get to use "whole" processor, and both
> destinations share the cache. Anyway, just a thought.
[NK] This case will work well if the CPUs are lightly loaded. For
heavily loaded CPUs, the interrupts have to be moved out of the package
to let them perform their own tasks.
Thanks,
Nitin
>
> -Andrew Theurer
* Re: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs
2003-01-07 22:42 [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs Andrew Theurer
@ 2003-01-11 4:28 ` James Cleverdon
0 siblings, 0 replies; 20+ messages in thread
From: James Cleverdon @ 2003-01-11 4:28 UTC (permalink / raw)
To: Andrew Theurer, nitin.a.kamble; +Cc: Martin J. Bligh, linux-kernel, John Stultz
[-- Attachment #1: Type: text/plain, Size: 795 bytes --]
On Tuesday 07 January 2003 02:42 pm, Andrew Theurer wrote:
[ Snip! ]
>
> I am bringing this up, because I recall James Cleverdon having some code
> which allows interrupts to be dynamically routed to two CPU destinations, a
> pair of CPUs with consecutive CPU ID's. Interrupts are dynamically routed
> to the least loaded CPU, and if both are idle, to the CPU with the lower
> CPUID. I like this idea, because when in HT, if consecutive logical CPU
> ID's map to one physical core, we get to use "whole" processor, and both
> destinations share the cache. Anyway, just a thought.
>
> -Andrew Theurer
Here's a quick respin of my old TPR patch for 2.5.55.
--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
[-- Attachment #2: tpr_dyn-2003-01-10_2.5.55 --]
[-- Type: text/x-diff, Size: 3480 bytes --]
diff -pru j55/arch/i386/kernel/irq.c t55/arch/i386/kernel/irq.c
--- j55/arch/i386/kernel/irq.c Wed Jan 8 20:03:51 2003
+++ t55/arch/i386/kernel/irq.c Fri Jan 10 18:01:03 2003
@@ -329,6 +329,7 @@ asmlinkage unsigned int do_IRQ(struct pt
struct irqaction * action;
unsigned int status;
+ apic_adj_tpr(TPR_IRQ);
irq_enter();
#ifdef CONFIG_DEBUG_STACKOVERFLOW
@@ -406,6 +407,7 @@ out:
spin_unlock(&desc->lock);
irq_exit();
+ apic_adj_tpr(-TPR_IRQ);
return 1;
}
diff -pru j55/arch/i386/kernel/process.c t55/arch/i386/kernel/process.c
--- j55/arch/i386/kernel/process.c Wed Jan 8 20:03:48 2003
+++ t55/arch/i386/kernel/process.c Fri Jan 10 17:59:13 2003
@@ -143,7 +143,9 @@ void cpu_idle (void)
irq_stat[smp_processor_id()].idle_timestamp = jiffies;
while (!need_resched())
idle();
+ apic_set_tpr(TPR_TASK);
schedule();
+ apic_set_tpr(TPR_IDLE);
}
}
diff -pru j55/include/asm-i386/apic.h t55/include/asm-i386/apic.h
--- j55/include/asm-i386/apic.h Wed Jan 8 20:04:27 2003
+++ t55/include/asm-i386/apic.h Fri Jan 10 17:59:13 2003
@@ -64,6 +64,22 @@ static inline void ack_APIC_irq(void)
apic_write_around(APIC_EOI, 0);
}
+static inline void apic_set_tpr(unsigned long val)
+{
+ unsigned long value;
+
+ value = apic_read(APIC_TASKPRI);
+ apic_write_around(APIC_TASKPRI, (value & ~APIC_TPRI_MASK) + val);
+}
+
+static inline void apic_adj_tpr(long adj)
+{
+ unsigned long value;
+
+ value = apic_read(APIC_TASKPRI);
+ apic_write_around(APIC_TASKPRI, value + adj);
+}
+
extern int get_maxlvt(void);
extern void clear_local_APIC(void);
extern void connect_bsp_APIC (void);
@@ -96,6 +112,15 @@ extern unsigned int nmi_watchdog;
#define NMI_LOCAL_APIC 2
#define NMI_INVALID 3
+#else /* CONFIG_X86_LOCAL_APIC */
+#define apic_set_tpr(val)
+#define apic_adj_tpr(adj)
#endif /* CONFIG_X86_LOCAL_APIC */
+/* Priority values for apic_adj_tpr() and apic_set_tpr() */
+/* xAPICs only do priority comparisons on the upper nibble. */
+#define TPR_IDLE (0x00L)
+#define TPR_TASK (0x10L)
+#define TPR_IRQ (0x10L)
+
#endif /* __ASM_APIC_H */
diff -pru j55/include/asm-i386/mach-summit/mach_apic.h t55/include/asm-i386/mach-summit/mach_apic.h
--- j55/include/asm-i386/mach-summit/mach_apic.h Fri Jan 10 16:16:44 2003
+++ t55/include/asm-i386/mach-summit/mach_apic.h Fri Jan 10 19:24:52 2003
@@ -4,7 +4,7 @@
extern int x86_summit;
#define esr_disable (1)
-#define no_balance_irq (0)
+#define no_balance_irq (1)
#define XAPIC_DEST_CPUS_MASK 0x0Fu
#define XAPIC_DEST_CLUSTER_MASK 0xF0u
@@ -13,7 +13,7 @@ extern int x86_summit;
((phys_apic) & XAPIC_DEST_CLUSTER_MASK) )
#define APIC_DFR_VALUE (x86_summit ? APIC_DFR_CLUSTER : APIC_DFR_FLAT)
-#define TARGET_CPUS (x86_summit ? XAPIC_DEST_CPUS_MASK : cpu_online_map)
+#define TARGET_CPUS (x86_summit ? xapic_round_robin_cpu_apic_id() : cpu_online_map)
#define APIC_BROADCAST_ID (x86_summit ? 0xFF : 0x0F)
#define check_apicid_used(bitmap, apicid) (0)
@@ -106,4 +106,20 @@ static inline int check_phys_apicid_pres
return (1);
}
+/*
+ * xapic_round_robin_cpu_apic_id -- Distribute the interrupts using a simple
+ * round robin scheme.
+ */
+static inline int xapic_round_robin_cpu_apic_id(void)
+{
+ int val;
+ static unsigned next_cpu = 0;
+
+ if (next_cpu >= NR_CPUS || cpu_2_logical_apicid[next_cpu] == BAD_APICID)
+ next_cpu = 0;
+ val = cpu_to_logical_apicid(next_cpu) | XAPIC_DEST_CPUS_MASK;
+ ++next_cpu;
+ return (val);
+}
+
#endif /* __ASM_MACH_APIC_H */
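A rough user-space model of the priority bookkeeping in the patch above, for readers who want to trace the arithmetic. The `TPR_*` values are copied from the patch; `struct sim_apic`, the `sim_*` helpers, and `TPRI_MASK` (assumed here to be 0xFF, standing in for the kernel's `APIC_TPRI_MASK`) are hypothetical. Real lowest-priority arbitration is done by the APIC hardware; breaking ties by lowest index is an arbitrary choice for this sketch.

```c
#include <assert.h>

/* Priority values copied from the patch. */
#define TPR_IDLE (0x00L)
#define TPR_TASK (0x10L)
#define TPR_IRQ  (0x10L)

/* Assumption: stand-in for APIC_TPRI_MASK, taken to be the low 8 bits. */
#define TPRI_MASK 0xFFUL

/* Simulated task-priority register; the kernel reads the real one with apic_read(). */
struct sim_apic {
    unsigned long taskpri;
};

/* Mirrors apic_set_tpr(): replace the priority field with val. */
static void sim_set_tpr(struct sim_apic *a, unsigned long val)
{
    a->taskpri = (a->taskpri & ~TPRI_MASK) + val;
}

/* Mirrors apic_adj_tpr(): raise or lower the priority by adj. */
static void sim_adj_tpr(struct sim_apic *a, long adj)
{
    a->taskpri += adj;
}

/* Priority class: per the patch comment, only the upper nibble is compared. */
static unsigned int sim_prio_class(const struct sim_apic *a)
{
    return (unsigned int)((a->taskpri >> 4) & 0xF);
}

/* Index of the CPU with the lowest priority class (ties: lowest index). */
static int sim_lowest_prio(const struct sim_apic *cpus, int n)
{
    int best = 0;

    for (int i = 1; i < n; i++)
        if (sim_prio_class(&cpus[i]) < sim_prio_class(&cpus[best]))
            best = i;
    return best;
}
```

In this model an idle CPU sits at class 0, a CPU running a task at class 1, and each interrupt being handled adds one more class, so a lowest-priority pick favors idle CPUs first and CPUs already in an interrupt handler last.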
end of thread, other threads:[~2003-01-11 4:21 UTC | newest]
Thread overview: 20+ messages
2003-01-07 22:42 [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs Andrew Theurer
2003-01-11 4:28 ` James Cleverdon
-- strict thread matches above, loose matches on Subject: below --
2003-01-08 3:19 Kamble, Nitin A
2002-12-21 3:27 Nakajima, Jun
2002-12-21 4:43 ` Martin J. Bligh
2002-12-19 2:45 Pallipadi, Venkatesh
2002-12-19 4:14 ` James Cleverdon
2002-12-19 2:35 Pallipadi, Venkatesh
2002-12-19 3:10 ` Martin J. Bligh
2002-12-19 1:05 Pallipadi, Venkatesh
2002-12-19 1:32 ` James Cleverdon
2002-12-18 22:36 Pallipadi, Venkatesh
2002-12-18 23:26 ` Christoph Hellwig
2002-12-18 23:41 ` William Lee Irwin III
2002-12-18 23:59 ` Martin J. Bligh
2002-12-19 0:24 ` Martin J. Bligh
2002-12-20 2:04 ` James Cleverdon
2002-12-20 8:00 ` Christoph Hellwig
2002-12-20 11:24 ` William Lee Irwin III
2002-12-20 11:29 ` William Lee Irwin III