public inbox for linux-kernel@vger.kernel.org
* [PATCH 00/16] dyn_array and nr_irqs support v2
@ 2008-08-01  9:37 Yinghai Lu
  2008-08-01  9:37 ` [PATCH 01/16] x86: 64bit support more than 256 irq Yinghai Lu
  2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
  0 siblings, 2 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Please review the dyn_array and nr_irqs support for x86.

Thanks

Yinghai Lu



* [PATCH 01/16] x86: 64bit support more than 256 irq
  2008-08-01  9:37 [PATCH 00/16] dyn_array and nr_irqs support v2 Yinghai Lu
@ 2008-08-01  9:37 ` Yinghai Lu
  2008-08-01  9:37   ` [PATCH 02/16] x86: introduce nr_irqs for 64bit v3 Yinghai Lu
  2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
  1 sibling, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Dhaval Giani hit this BUG:
kernel BUG at arch/x86/kernel/io_apic_64.c:357!
invalid opcode: 0000 [1] SMP
CPU 24
...

His system (an x3950) has 8 IO-APICs and more than 256 IRQs.

This was caused by:
        commit 9b7dc567d03d74a1fbae84e88949b6a60d922d82
        Author: Thomas Gleixner <tglx@linutronix.de>
        Date:   Fri May 2 20:10:09 2008 +0200

           x86: unify interrupt vector defines

           The interrupt vector defines are copied 4 times around with minimal
           differences. Move them all into asm-x86/irq_vectors.h

On 64-bit, the same vector on different CPUs can serve different IRQs,
so NR_IRQS has to scale with NR_CPUS.

That array will need to be created dynamically later.
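
For illustration only, the sizing rule in the hunk below can be written as a
small standalone C function. This is a sketch, not kernel code: in the patch
these are preprocessor constants, and the names nr_irqs_for, BASE_IRQS and
IRQS_PER_CPU are made up for this example.

```c
#include <assert.h>

/* Hypothetical names; the values 224 and 32 come from the patched header. */
enum { BASE_IRQS = 224, IRQS_PER_CPU = 32 };

/* Returns the IRQ count the patched irq_vectors.h would pick
 * for the IO_APIC/PARAVIRT/VISWS case. */
static int nr_irqs_for(int nr_cpus, int is_64bit)
{
	assert(nr_cpus > 0);

	/* 64-bit: vectors are per CPU, so the IRQ space grows with NR_CPUS. */
	if (is_64bit)
		return IRQS_PER_CPU * nr_cpus + BASE_IRQS;

	/* 32-bit: stays at the fixed value. */
	return BASE_IRQS;
}
```

With NR_CPUS = 32 this gives 32 * 32 + 224 = 1248 on 64-bit, against 224 on
32-bit.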

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>

---
 include/asm-x86/irq_vectors.h |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
Index: linux-2.6/include/asm-x86/irq_vectors.h
===================================================================
--- linux-2.6.orig/include/asm-x86/irq_vectors.h
+++ linux-2.6/include/asm-x86/irq_vectors.h
@@ -113,28 +113,26 @@
 
 # if defined(CONFIG_X86_IO_APIC) || defined(CONFIG_PARAVIRT) || defined(CONFIG_X86_VISWS)
 
+#ifdef CONFIG_X86_64
+#  define NR_IRQS		(32 * NR_CPUS + 224)
+#else
 #  define NR_IRQS		224
-
-#  if (224 >= 32 * NR_CPUS)
-#   define NR_IRQ_VECTORS	NR_IRQS
-#  else
-#   define NR_IRQ_VECTORS	(32 * NR_CPUS)
-#  endif
+#endif
 
 # else /* IO_APIC || PARAVIRT */
 
 #  define NR_IRQS		16
-#  define NR_IRQ_VECTORS	NR_IRQS
 
 # endif
 
 #else /* !VISWS && !VOYAGER */
 
 # define NR_IRQS		224
-# define NR_IRQ_VECTORS		NR_IRQS
 
 #endif /* VISWS */
 
+#define NR_IRQ_VECTORS		NR_IRQS
+
 /* Voyager specific defines */
 /* These define the CPIs we use in linux */
 #define VIC_CPI_LEVEL0			0


* [PATCH 02/16] x86: introduce nr_irqs for 64bit v3
  2008-08-01  9:37 ` [PATCH 01/16] x86: 64bit support more than 256 irq Yinghai Lu
@ 2008-08-01  9:37   ` Yinghai Lu
  2008-08-01  9:37     ` [PATCH 03/16] add dyn_array support Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

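Replace NR_IRQS with a runtime nr_irqs variable in loop bounds and range
checks. DEFINE_DYN_ARRAY for dynamic array support is added in the
following patches (see v3 below).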

v2: other platforms will have nr_irqs = NR_IRQS
    for MAXSMP/UV: a smaller nr_irqs could be set in acpi_madt_oem_check in genx2_apic_uv_x
v3: separate DYN_ARRAY and enabling it on x86_64 into the following patches
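
As a rough sketch of the pattern this patch applies throughout the tree:
arrays stay sized by the compile-time NR_IRQS, while loops and range checks
compare against a runtime nr_irqs that defaults to NR_IRQS and may be lowered
by platform code. Everything prefixed demo_ is hypothetical, and the value
224 is only an example.

```c
#include <assert.h>

#define NR_IRQS 224                        /* compile-time bound (example value) */

struct demo_irq_desc { int action_count; };  /* stand-in for struct irq_desc */

static struct demo_irq_desc demo_desc[NR_IRQS];
static int nr_irqs = NR_IRQS;              /* runtime limit, defaults to NR_IRQS */

/* Platform setup may lower the limit, never raise it past the array size. */
static int demo_set_nr_irqs(int n)
{
	if (n <= 0 || n > NR_IRQS)
		return -1;
	nr_irqs = n;
	return 0;
}

/* Loops are bounded by nr_irqs instead of NR_IRQS. */
static int demo_count_active(void)
{
	int irq, active = 0;

	assert(nr_irqs <= NR_IRQS);
	for (irq = 0; irq < nr_irqs; irq++)
		if (demo_desc[irq].action_count > 0)
			active++;
	return active;
}
```

Lowering the limit with demo_set_nr_irqs(64) makes demo_count_active() skip
entries 64 and above, which is the effect a smaller nr_irqs would have on
the converted loops.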

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
 arch/x86/kernel/io_apic_32.c            |   26 +++++++++++++------------
 arch/x86/kernel/io_apic_64.c            |   33 ++++++++++++++++----------------
 arch/x86/kernel/irq_32.c                |    8 +++----
 arch/x86/kernel/irq_64.c                |    8 +++----
 arch/x86/kernel/irqinit_32.c            |    2 -
 arch/x86/kernel/irqinit_64.c            |    2 -
 drivers/char/hpet.c                     |    2 -
 drivers/char/random.c                   |    4 +--
 drivers/char/vr41xx_giu.c               |    2 -
 drivers/net/3c59x.c                     |    4 +--
 drivers/net/hamradio/baycom_ser_fdx.c   |    4 +--
 drivers/net/hamradio/scc.c              |    6 ++---
 drivers/net/wan/sbni.c                  |    2 -
 drivers/pci/intr_remapping.c            |   16 +++++++--------
 drivers/pcmcia/at91_cf.c                |    2 -
 drivers/pcmcia/vrc4171_card.c           |    2 -
 drivers/rtc/rtc-vr41xx.c                |    4 +--
 drivers/scsi/aha152x.c                  |    2 -
 drivers/serial/8250.c                   |    4 +--
 drivers/serial/amba-pl010.c             |    2 -
 drivers/serial/amba-pl011.c             |    2 -
 drivers/serial/cpm_uart/cpm_uart_core.c |    2 -
 drivers/serial/m32r_sio.c               |    4 +--
 drivers/serial/serial_core.c            |    2 -
 drivers/serial/serial_lh7a40x.c         |    2 -
 drivers/serial/sh-sci.c                 |    2 -
 drivers/serial/ucc_uart.c               |    2 -
 drivers/xen/events.c                    |   12 +++++------
 fs/proc/proc_misc.c                     |   10 ++++-----
 include/asm-x86/irq.h                   |    3 ++
 include/linux/irq.h                     |    2 +
 kernel/irq/autoprobe.c                  |   10 ++++-----
 kernel/irq/chip.c                       |   20 +++++++++----------
 kernel/irq/handle.c                     |    3 +-
 kernel/irq/manage.c                     |   16 +++++++--------
 kernel/irq/proc.c                       |    2 -
 kernel/irq/resend.c                     |    4 +--
 kernel/irq/spurious.c                   |    4 +--
 38 files changed, 123 insertions(+), 114 deletions(-)

Index: linux-2.6/arch/x86/kernel/io_apic_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_32.c
+++ linux-2.6/arch/x86/kernel/io_apic_32.c
@@ -70,6 +70,7 @@ int timer_through_8259 __initdata;
  */
 int sis_apic_bug = -1;
 
+int first_free_entry = NR_IRQS;
 /*
  * # of IRQ routing registers
  */
@@ -100,6 +101,8 @@ static int disable_timer_pin_1 __initdat
 #define MAX_PLUS_SHARED_IRQS NR_IRQS
 #define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + NR_IRQS)
 
+int pin_map_size = PIN_MAP_SIZE;
+
 /*
  * This is performance-critical, we want to do it O(1)
  *
@@ -213,7 +216,6 @@ static void ioapic_mask_entry(int apic,
  */
 static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 {
-	static int first_free_entry = NR_IRQS;
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
 	while (entry->next)
@@ -222,7 +224,7 @@ static void add_pin_to_irq(unsigned int
 	if (entry->pin != -1) {
 		entry->next = first_free_entry;
 		entry = irq_2_pin + entry->next;
-		if (++first_free_entry >= PIN_MAP_SIZE)
+		if (++first_free_entry >= pin_map_size)
 			panic("io_apic.c: whoops");
 	}
 	entry->apic = apic;
@@ -457,7 +459,7 @@ static inline void rotate_irqs_among_cpu
 	int i, j;
 
 	for_each_online_cpu(i) {
-		for (j = 0; j < NR_IRQS; j++) {
+		for (j = 0; j < nr_irqs; j++) {
 			if (!irq_desc[j].action)
 				continue;
 			/* Is it a significant load ?  */
@@ -492,7 +494,7 @@ static void do_irq_balance(void)
 		if (!cpu_online(i))
 			continue;
 		package_index = CPU_TO_PACKAGEINDEX(i);
-		for (j = 0; j < NR_IRQS; j++) {
+		for (j = 0; j < nr_irqs; j++) {
 			unsigned long value_now, delta;
 			/* Is this an active IRQ or balancing disabled ? */
 			if (!irq_desc[j].action || irq_balancing_disabled(j))
@@ -587,7 +589,7 @@ tryanotherirq:
 	 */
 	move_this_load = 0;
 	selected_irq = -1;
-	for (j = 0; j < NR_IRQS; j++) {
+	for (j = 0; j < nr_irqs; j++) {
 		/* Is this an active IRQ? */
 		if (!irq_desc[j].action)
 			continue;
@@ -664,7 +666,7 @@ static int balanced_irq(void *unused)
 	long time_remaining = balanced_irq_interval;
 
 	/* push everything to CPU 0 to give us a starting point.  */
-	for (i = 0 ; i < NR_IRQS ; i++) {
+	for (i = 0 ; i < nr_irqs ; i++) {
 		irq_desc[i].pending_mask = cpumask_of_cpu(0);
 		set_pending_irq(i, cpumask_of_cpu(0));
 	}
@@ -712,8 +714,8 @@ static int __init balanced_irq_init(void
 		physical_balance = 1;
 
 	for_each_online_cpu(i) {
-		irq_cpu_data[i].irq_delta = kzalloc(sizeof(unsigned long) * NR_IRQS, GFP_KERNEL);
-		irq_cpu_data[i].last_irq = kzalloc(sizeof(unsigned long) * NR_IRQS, GFP_KERNEL);
+		irq_cpu_data[i].irq_delta = kzalloc(sizeof(unsigned long) * nr_irqs, GFP_KERNEL);
+		irq_cpu_data[i].last_irq = kzalloc(sizeof(unsigned long) * nr_irqs, GFP_KERNEL);
 		if (irq_cpu_data[i].irq_delta == NULL || irq_cpu_data[i].last_irq == NULL) {
 			printk(KERN_ERR "balanced_irq_init: out of memory");
 			goto failed;
@@ -1445,7 +1447,7 @@ __apicdebuginit(void) print_IO_APIC(void
 	}
 	}
 	printk(KERN_DEBUG "IRQ to pin mappings:\n");
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		struct irq_pin_list *entry = irq_2_pin + i;
 		if (entry->pin < 0)
 			continue;
@@ -1625,7 +1627,7 @@ static void __init enable_IO_APIC(void)
 	int i, apic;
 	unsigned long flags;
 
-	for (i = 0; i < PIN_MAP_SIZE; i++) {
+	for (i = 0; i < pin_map_size; i++) {
 		irq_2_pin[i].pin = -1;
 		irq_2_pin[i].next = 0;
 	}
@@ -2009,7 +2011,7 @@ static inline void init_IO_APIC_traps(vo
 	 * Also, we've got to be careful not to trash gate
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
-	for (irq = 0; irq < NR_IRQS ; irq++) {
+	for (irq = 0; irq < nr_irqs ; irq++) {
 		if (IO_APIC_IRQ(irq) && !irq_vector[irq]) {
 			/*
 			 * Hmm.. We don't have an entry for this,
@@ -2453,7 +2455,7 @@ int create_irq(void)
 
 	irq = -ENOSPC;
 	spin_lock_irqsave(&vector_lock, flags);
-	for (new = (NR_IRQS - 1); new >= 0; new--) {
+	for (new = (nr_irqs - 1); new >= 0; new--) {
 		if (platform_legacy_irq(new))
 			continue;
 		if (irq_vector[new] != 0)
Index: linux-2.6/arch/x86/kernel/io_apic_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6/arch/x86/kernel/io_apic_64.c
@@ -132,6 +132,7 @@ DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BU
 #define MAX_PLUS_SHARED_IRQS NR_IRQS
 #define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + NR_IRQS)
 
+int pin_map_size = PIN_MAP_SIZE;
 /*
  * This is performance-critical, we want to do it O(1)
  *
@@ -224,7 +225,7 @@ static inline void io_apic_sync(unsigned
 	int pin;							\
 	struct irq_pin_list *entry = irq_2_pin + irq;			\
 									\
-	BUG_ON(irq >= NR_IRQS);						\
+	BUG_ON(irq >= nr_irqs);						\
 	for (;;) {							\
 		unsigned int reg;					\
 		pin = entry->pin;					\
@@ -301,7 +302,7 @@ static void __target_IO_APIC_irq(unsigne
 	int apic, pin;
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
-	BUG_ON(irq >= NR_IRQS);
+	BUG_ON(irq >= nr_irqs);
 	for (;;) {
 		unsigned int reg;
 		apic = entry->apic;
@@ -358,19 +359,19 @@ static void set_ioapic_affinity_irq(unsi
  * shared ISA-space IRQs, so we have to support them. We are super
  * fast in the common case, and fast for shared ISA-space IRQs.
  */
+int first_free_entry = NR_IRQS;
 static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 {
-	static int first_free_entry = NR_IRQS;
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
-	BUG_ON(irq >= NR_IRQS);
+	BUG_ON(irq >= nr_irqs);
 	while (entry->next)
 		entry = irq_2_pin + entry->next;
 
 	if (entry->pin != -1) {
 		entry->next = first_free_entry;
 		entry = irq_2_pin + entry->next;
-		if (++first_free_entry >= PIN_MAP_SIZE)
+		if (++first_free_entry >= pin_map_size)
 			panic("io_apic.c: ran out of irq_2_pin entries!");
 	}
 	entry->apic = apic;
@@ -634,7 +635,7 @@ int IO_APIC_get_PCI_irq_vector(int bus,
 				best_guess = irq;
 		}
 	}
-	BUG_ON(best_guess >= NR_IRQS);
+	BUG_ON(best_guess >= nr_irqs);
 	return best_guess;
 }
 
@@ -766,7 +767,7 @@ static int pin_2_irq(int idx, int apic,
 			irq += nr_ioapic_registers[i++];
 		irq += pin;
 	}
-	BUG_ON(irq >= NR_IRQS);
+	BUG_ON(irq >= nr_irqs);
 	return irq;
 }
 
@@ -788,7 +789,7 @@ static int __assign_irq_vector(int irq,
 	int cpu;
 	struct irq_cfg *cfg;
 
-	BUG_ON((unsigned)irq >= NR_IRQS);
+	BUG_ON((unsigned)irq >= nr_irqs);
 	cfg = &irq_cfg[irq];
 
 	/* Only try and allocate irqs on cpus that are present */
@@ -862,7 +863,7 @@ static void __clear_irq_vector(int irq)
 	cpumask_t mask;
 	int cpu, vector;
 
-	BUG_ON((unsigned)irq >= NR_IRQS);
+	BUG_ON((unsigned)irq >= nr_irqs);
 	cfg = &irq_cfg[irq];
 	BUG_ON(!cfg->vector);
 
@@ -882,7 +883,7 @@ static void __setup_vector_irq(int cpu)
 	int irq, vector;
 
 	/* Mark the inuse vectors */
-	for (irq = 0; irq < NR_IRQS; ++irq) {
+	for (irq = 0; irq < nr_irqs; ++irq) {
 		if (!cpu_isset(cpu, irq_cfg[irq].domain))
 			continue;
 		vector = irq_cfg[irq].vector;
@@ -1188,7 +1189,7 @@ __apicdebuginit(void) print_IO_APIC(void
 	}
 	}
 	printk(KERN_DEBUG "IRQ to pin mappings:\n");
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		struct irq_pin_list *entry = irq_2_pin + i;
 		if (entry->pin < 0)
 			continue;
@@ -1361,7 +1362,7 @@ void __init enable_IO_APIC(void)
 	int i, apic;
 	unsigned long flags;
 
-	for (i = 0; i < PIN_MAP_SIZE; i++) {
+	for (i = 0; i < pin_map_size; i++) {
 		irq_2_pin[i].pin = -1;
 		irq_2_pin[i].next = 0;
 	}
@@ -1655,7 +1656,7 @@ static void ir_irq_migration(struct work
 {
 	int irq;
 
-	for (irq = 0; irq < NR_IRQS; irq++) {
+	for (irq = 0; irq < nr_irqs; irq++) {
 		struct irq_desc *desc = irq_desc + irq;
 		if (desc->status & IRQ_MOVE_PENDING) {
 			unsigned long flags;
@@ -1704,7 +1705,7 @@ asmlinkage void smp_irq_move_cleanup_int
 		struct irq_desc *desc;
 		struct irq_cfg *cfg;
 		irq = __get_cpu_var(vector_irq)[vector];
-		if (irq >= NR_IRQS)
+		if (irq >= nr_irqs)
 			continue;
 
 		desc = irq_desc + irq;
@@ -1862,7 +1863,7 @@ static inline void init_IO_APIC_traps(vo
 	 * Also, we've got to be careful not to trash gate
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
-	for (irq = 0; irq < NR_IRQS ; irq++) {
+	for (irq = 0; irq < nr_irqs ; irq++) {
 		if (IO_APIC_IRQ(irq) && !irq_cfg[irq].vector) {
 			/*
 			 * Hmm.. We don't have an entry for this,
@@ -2276,7 +2277,7 @@ int create_irq(void)
 
 	irq = -ENOSPC;
 	spin_lock_irqsave(&vector_lock, flags);
-	for (new = (NR_IRQS - 1); new >= 0; new--) {
+	for (new = (nr_irqs - 1); new >= 0; new--) {
 		if (platform_legacy_irq(new))
 			continue;
 		if (irq_cfg[new].vector != 0)
Index: linux-2.6/arch/x86/kernel/irq_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/irq_32.c
+++ linux-2.6/arch/x86/kernel/irq_32.c
@@ -226,7 +226,7 @@ unsigned int do_IRQ(struct pt_regs *regs
 	int overflow, irq = ~regs->orig_ax;
 	struct irq_desc *desc = irq_desc + irq;
 
-	if (unlikely((unsigned)irq >= NR_IRQS)) {
+	if (unlikely((unsigned)irq >= nr_irqs)) {
 		printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
 					__func__, irq);
 		BUG();
@@ -271,7 +271,7 @@ int show_interrupts(struct seq_file *p,
 		seq_putc(p, '\n');
 	}
 
-	if (i < NR_IRQS) {
+	if (i < nr_irqs) {
 		unsigned any_count = 0;
 
 		spin_lock_irqsave(&irq_desc[i].lock, flags);
@@ -303,7 +303,7 @@ int show_interrupts(struct seq_file *p,
 		seq_putc(p, '\n');
 skip:
 		spin_unlock_irqrestore(&irq_desc[i].lock, flags);
-	} else if (i == NR_IRQS) {
+	} else if (i == nr_irqs) {
 		seq_printf(p, "NMI: ");
 		for_each_online_cpu(j)
 			seq_printf(p, "%10u ", nmi_count(j));
@@ -396,7 +396,7 @@ void fixup_irqs(cpumask_t map)
 	unsigned int irq;
 	static int warned;
 
-	for (irq = 0; irq < NR_IRQS; irq++) {
+	for (irq = 0; irq < nr_irqs; irq++) {
 		cpumask_t mask;
 		if (irq == 2)
 			continue;
Index: linux-2.6/arch/x86/kernel/irq_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/irq_64.c
+++ linux-2.6/arch/x86/kernel/irq_64.c
@@ -81,7 +81,7 @@ int show_interrupts(struct seq_file *p,
 		seq_putc(p, '\n');
 	}
 
-	if (i < NR_IRQS) {
+	if (i < nr_irqs) {
 		unsigned any_count = 0;
 
 		spin_lock_irqsave(&irq_desc[i].lock, flags);
@@ -112,7 +112,7 @@ int show_interrupts(struct seq_file *p,
 		seq_putc(p, '\n');
 skip:
 		spin_unlock_irqrestore(&irq_desc[i].lock, flags);
-	} else if (i == NR_IRQS) {
+	} else if (i == nr_irqs) {
 		seq_printf(p, "NMI: ");
 		for_each_online_cpu(j)
 			seq_printf(p, "%10u ", cpu_pda(j)->__nmi_count);
@@ -201,7 +201,7 @@ asmlinkage unsigned int do_IRQ(struct pt
 	stack_overflow_check(regs);
 #endif
 
-	if (likely(irq < NR_IRQS))
+	if (likely(irq < nr_irqs))
 		generic_handle_irq(irq);
 	else {
 		if (!disable_apic)
@@ -224,7 +224,7 @@ void fixup_irqs(cpumask_t map)
 	unsigned int irq;
 	static int warned;
 
-	for (irq = 0; irq < NR_IRQS; irq++) {
+	for (irq = 0; irq < nr_irqs; irq++) {
 		cpumask_t mask;
 		int break_affinity = 0;
 		int set_affinity = 1;
Index: linux-2.6/arch/x86/kernel/irqinit_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/irqinit_32.c
+++ linux-2.6/arch/x86/kernel/irqinit_32.c
@@ -91,7 +91,7 @@ void __init native_init_IRQ(void)
 	 */
 	for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
 		int vector = FIRST_EXTERNAL_VECTOR + i;
-		if (i >= NR_IRQS)
+		if (i >= nr_irqs)
 			break;
 		/* SYSCALL_VECTOR was reserved in trap_init. */
 		if (!test_bit(vector, used_vectors))
Index: linux-2.6/arch/x86/kernel/irqinit_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/irqinit_64.c
+++ linux-2.6/arch/x86/kernel/irqinit_64.c
@@ -142,7 +142,7 @@ static void __init init_ISA_irqs (void)
 	init_bsp_APIC();
 	init_8259A(0);
 
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		irq_desc[i].status = IRQ_DISABLED;
 		irq_desc[i].action = NULL;
 		irq_desc[i].depth = 1;
Index: linux-2.6/drivers/char/hpet.c
===================================================================
--- linux-2.6.orig/drivers/char/hpet.c
+++ linux-2.6/drivers/char/hpet.c
@@ -222,7 +222,7 @@ static void hpet_timer_set_irq(struct hp
 	for (irq = find_first_bit(&v, HPET_MAX_IRQ); irq < HPET_MAX_IRQ;
 		irq = find_next_bit(&v, HPET_MAX_IRQ, 1 + irq)) {
 
-		if (irq >= NR_IRQS) {
+		if (irq >= nr_irqs) {
 			irq = HPET_MAX_IRQ;
 			break;
 		}
Index: linux-2.6/drivers/char/random.c
===================================================================
--- linux-2.6.orig/drivers/char/random.c
+++ linux-2.6/drivers/char/random.c
@@ -647,7 +647,7 @@ EXPORT_SYMBOL_GPL(add_input_randomness);
 
 void add_interrupt_randomness(int irq)
 {
-	if (irq >= NR_IRQS || irq_timer_state[irq] == NULL)
+	if (irq >= nr_irqs || irq_timer_state[irq] == NULL)
 		return;
 
 	DEBUG_ENT("irq event %d\n", irq);
@@ -911,7 +911,7 @@ void rand_initialize_irq(int irq)
 {
 	struct timer_rand_state *state;
 
-	if (irq >= NR_IRQS || irq_timer_state[irq])
+	if (irq >= nr_irqs || irq_timer_state[irq])
 		return;
 
 	/*
Index: linux-2.6/drivers/char/vr41xx_giu.c
===================================================================
--- linux-2.6.orig/drivers/char/vr41xx_giu.c
+++ linux-2.6/drivers/char/vr41xx_giu.c
@@ -641,7 +641,7 @@ static int __devinit giu_probe(struct pl
 	}
 
 	irq = platform_get_irq(dev, 0);
-	if (irq < 0 || irq >= NR_IRQS)
+	if (irq < 0 || irq >= nr_irqs)
 		return -EBUSY;
 
 	return cascade_irq(irq, giu_get_irq);
Index: linux-2.6/drivers/net/3c59x.c
===================================================================
--- linux-2.6.orig/drivers/net/3c59x.c
+++ linux-2.6/drivers/net/3c59x.c
@@ -90,7 +90,7 @@ static int vortex_debug = 1;
 #include <linux/eisa.h>
 #include <linux/bitops.h>
 #include <linux/jiffies.h>
-#include <asm/irq.h>			/* For NR_IRQS only. */
+#include <asm/irq.h>			/* For nr_irqs only. */
 #include <asm/io.h>
 #include <asm/uaccess.h>
 
@@ -1221,7 +1221,7 @@ static int __devinit vortex_probe1(struc
 	if (print_info)
 		printk(", IRQ %d\n", dev->irq);
 	/* Tell them about an invalid IRQ. */
-	if (dev->irq <= 0 || dev->irq >= NR_IRQS)
+	if (dev->irq <= 0 || dev->irq >= nr_irqs)
 		printk(KERN_WARNING " *** Warning: IRQ %d is unlikely to work! ***\n",
 			   dev->irq);
 
Index: linux-2.6/drivers/net/hamradio/baycom_ser_fdx.c
===================================================================
--- linux-2.6.orig/drivers/net/hamradio/baycom_ser_fdx.c
+++ linux-2.6/drivers/net/hamradio/baycom_ser_fdx.c
@@ -416,10 +416,10 @@ static int ser12_open(struct net_device
 	if (!dev || !bc)
 		return -ENXIO;
 	if (!dev->base_addr || dev->base_addr > 0xffff-SER12_EXTENT ||
-	    dev->irq < 2 || dev->irq > NR_IRQS) {
+	    dev->irq < 2 || dev->irq > nr_irqs) {
 		printk(KERN_INFO "baycom_ser_fdx: invalid portnumber (max %u) "
 				"or irq (2 <= irq <= %d)\n",
-				0xffff-SER12_EXTENT, NR_IRQS);
+				0xffff-SER12_EXTENT, nr_irqs);
 		return -ENXIO;
 	}
 	if (bc->baud < 300 || bc->baud > 4800) {
Index: linux-2.6/drivers/net/hamradio/scc.c
===================================================================
--- linux-2.6.orig/drivers/net/hamradio/scc.c
+++ linux-2.6/drivers/net/hamradio/scc.c
@@ -1465,7 +1465,7 @@ static void z8530_init(void)
 	printk(KERN_INFO "Init Z8530 driver: %u channels, IRQ", Nchips*2);
 	
 	flag=" ";
-	for (k = 0; k < NR_IRQS; k++)
+	for (k = 0; k < nr_irqs; k++)
 		if (Ivec[k].used) 
 		{
 			printk("%s%d", flag, k);
@@ -1728,7 +1728,7 @@ static int scc_net_ioctl(struct net_devi
 
 			if (hwcfg.irq == 2) hwcfg.irq = 9;
 
-			if (hwcfg.irq < 0 || hwcfg.irq >= NR_IRQS)
+			if (hwcfg.irq < 0 || hwcfg.irq >= nr_irqs)
 				return -EINVAL;
 				
 			if (!Ivec[hwcfg.irq].used && hwcfg.irq)
@@ -2148,7 +2148,7 @@ static void __exit scc_cleanup_driver(vo
 		}
 		
 	/* To unload the port must be closed so no real IRQ pending */
-	for (k=0; k < NR_IRQS ; k++)
+	for (k=0; k < nr_irqs ; k++)
 		if (Ivec[k].used) free_irq(k, NULL);
 		
 	local_irq_enable();
Index: linux-2.6/drivers/net/wan/sbni.c
===================================================================
--- linux-2.6.orig/drivers/net/wan/sbni.c
+++ linux-2.6/drivers/net/wan/sbni.c
@@ -318,7 +318,7 @@ sbni_pci_probe( struct net_device  *dev
 				continue;
 		}
 
-		if( pci_irq_line <= 0  ||  pci_irq_line >= NR_IRQS )
+		if( pci_irq_line <= 0  ||  pci_irq_line >= nr_irqs )
 			printk( KERN_WARNING "  WARNING: The PCI BIOS assigned "
 				"this PCI card to IRQ %d, which is unlikely "
 				"to work!.\n"
Index: linux-2.6/drivers/pci/intr_remapping.c
===================================================================
--- linux-2.6.orig/drivers/pci/intr_remapping.c
+++ linux-2.6/drivers/pci/intr_remapping.c
@@ -22,7 +22,7 @@ static DEFINE_SPINLOCK(irq_2_ir_lock);
 
 int irq_remapped(int irq)
 {
-	if (irq > NR_IRQS)
+	if (irq > nr_irqs)
 		return 0;
 
 	if (!irq_2_iommu[irq].iommu)
@@ -35,7 +35,7 @@ int get_irte(int irq, struct irte *entry
 {
 	int index;
 
-	if (!entry || irq > NR_IRQS)
+	if (!entry || irq > nr_irqs)
 		return -1;
 
 	spin_lock(&irq_2_ir_lock);
@@ -126,7 +126,7 @@ int map_irq_to_irte_handle(int irq, u16
 	int index;
 
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || !irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
@@ -140,7 +140,7 @@ int map_irq_to_irte_handle(int irq, u16
 int set_irte_irq(int irq, struct intel_iommu *iommu, u16 index, u16 subhandle)
 {
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
@@ -158,7 +158,7 @@ int set_irte_irq(int irq, struct intel_i
 int clear_irte_irq(int irq, struct intel_iommu *iommu, u16 index)
 {
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || !irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
@@ -180,7 +180,7 @@ int modify_irte(int irq, struct irte *ir
 	struct intel_iommu *iommu;
 
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || !irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
@@ -205,7 +205,7 @@ int flush_irte(int irq)
 	struct intel_iommu *iommu;
 
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || !irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
@@ -248,7 +248,7 @@ int free_irte(int irq)
 	struct intel_iommu *iommu;
 
 	spin_lock(&irq_2_ir_lock);
-	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+	if (irq >= nr_irqs || !irq_2_iommu[irq].iommu) {
 		spin_unlock(&irq_2_ir_lock);
 		return -1;
 	}
Index: linux-2.6/drivers/pcmcia/at91_cf.c
===================================================================
--- linux-2.6.orig/drivers/pcmcia/at91_cf.c
+++ linux-2.6/drivers/pcmcia/at91_cf.c
@@ -273,7 +273,7 @@ static int __init at91_cf_probe(struct p
 			goto fail0d;
 		cf->socket.pci_irq = board->irq_pin;
 	} else
-		cf->socket.pci_irq = NR_IRQS + 1;
+		cf->socket.pci_irq = nr_irqs + 1;
 
 	/* pcmcia layer only remaps "real" memory not iospace */
 	cf->socket.io_offset = (unsigned long)
Index: linux-2.6/drivers/pcmcia/vrc4171_card.c
===================================================================
--- linux-2.6.orig/drivers/pcmcia/vrc4171_card.c
+++ linux-2.6/drivers/pcmcia/vrc4171_card.c
@@ -639,7 +639,7 @@ static int __devinit vrc4171_card_setup(
 		int irq;
 		options += 4;
 		irq = simple_strtoul(options, &options, 0);
-		if (irq >= 0 && irq < NR_IRQS)
+		if (irq >= 0 && irq < nr_irqs)
 			vrc4171_irq = irq;
 
 		if (*options != ',')
Index: linux-2.6/drivers/rtc/rtc-vr41xx.c
===================================================================
--- linux-2.6.orig/drivers/rtc/rtc-vr41xx.c
+++ linux-2.6/drivers/rtc/rtc-vr41xx.c
@@ -360,7 +360,7 @@ static int __devinit rtc_probe(struct pl
 	spin_unlock_irq(&rtc_lock);
 
 	aie_irq = platform_get_irq(pdev, 0);
-	if (aie_irq < 0 || aie_irq >= NR_IRQS) {
+	if (aie_irq < 0 || aie_irq >= nr_irqs) {
 		retval = -EBUSY;
 		goto err_device_unregister;
 	}
@@ -371,7 +371,7 @@ static int __devinit rtc_probe(struct pl
 		goto err_device_unregister;
 
 	pie_irq = platform_get_irq(pdev, 1);
-	if (pie_irq < 0 || pie_irq >= NR_IRQS)
+	if (pie_irq < 0 || pie_irq >= nr_irqs)
 		goto err_free_irq;
 
 	retval = request_irq(pie_irq, rtclong1_interrupt, IRQF_DISABLED,
Index: linux-2.6/drivers/scsi/aha152x.c
===================================================================
--- linux-2.6.orig/drivers/scsi/aha152x.c
+++ linux-2.6/drivers/scsi/aha152x.c
@@ -337,7 +337,7 @@ CMD_INC_RESID(struct scsi_cmnd *cmd, int
 #else
 #define IRQ_MIN 9
 #if defined(__PPC)
-#define IRQ_MAX (NR_IRQS-1)
+#define IRQ_MAX (nr_irqs-1)
 #else
 #define IRQ_MAX 12
 #endif
Index: linux-2.6/drivers/serial/8250.c
===================================================================
--- linux-2.6.orig/drivers/serial/8250.c
+++ linux-2.6/drivers/serial/8250.c
@@ -2433,7 +2433,7 @@ static void serial8250_config_port(struc
 static int
 serial8250_verify_port(struct uart_port *port, struct serial_struct *ser)
 {
-	if (ser->irq >= NR_IRQS || ser->irq < 0 ||
+	if (ser->irq >= nr_irqs || ser->irq < 0 ||
 	    ser->baud_base < 9600 || ser->type < PORT_UNKNOWN ||
 	    ser->type >= ARRAY_SIZE(uart_config) || ser->type == PORT_CIRRUS ||
 	    ser->type == PORT_STARTECH)
@@ -2964,7 +2964,7 @@ static int __init serial8250_init(void)
 		"%d ports, IRQ sharing %sabled\n", nr_uarts,
 		share_irqs ? "en" : "dis");
 
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		spin_lock_init(&irq_lists[i].lock);
 
 	ret = uart_register_driver(&serial8250_reg);
Index: linux-2.6/drivers/serial/amba-pl010.c
===================================================================
--- linux-2.6.orig/drivers/serial/amba-pl010.c
+++ linux-2.6/drivers/serial/amba-pl010.c
@@ -512,7 +512,7 @@ static int pl010_verify_port(struct uart
 	int ret = 0;
 	if (ser->type != PORT_UNKNOWN && ser->type != PORT_AMBA)
 		ret = -EINVAL;
-	if (ser->irq < 0 || ser->irq >= NR_IRQS)
+	if (ser->irq < 0 || ser->irq >= nr_irqs)
 		ret = -EINVAL;
 	if (ser->baud_base < 9600)
 		ret = -EINVAL;
Index: linux-2.6/drivers/serial/amba-pl011.c
===================================================================
--- linux-2.6.orig/drivers/serial/amba-pl011.c
+++ linux-2.6/drivers/serial/amba-pl011.c
@@ -572,7 +572,7 @@ static int pl010_verify_port(struct uart
 	int ret = 0;
 	if (ser->type != PORT_UNKNOWN && ser->type != PORT_AMBA)
 		ret = -EINVAL;
-	if (ser->irq < 0 || ser->irq >= NR_IRQS)
+	if (ser->irq < 0 || ser->irq >= nr_irqs)
 		ret = -EINVAL;
 	if (ser->baud_base < 9600)
 		ret = -EINVAL;
Index: linux-2.6/drivers/serial/cpm_uart/cpm_uart_core.c
===================================================================
--- linux-2.6.orig/drivers/serial/cpm_uart/cpm_uart_core.c
+++ linux-2.6/drivers/serial/cpm_uart/cpm_uart_core.c
@@ -589,7 +589,7 @@ static int cpm_uart_verify_port(struct u
 
 	if (ser->type != PORT_UNKNOWN && ser->type != PORT_CPM)
 		ret = -EINVAL;
-	if (ser->irq < 0 || ser->irq >= NR_IRQS)
+	if (ser->irq < 0 || ser->irq >= nr_irqs)
 		ret = -EINVAL;
 	if (ser->baud_base < 9600)
 		ret = -EINVAL;
Index: linux-2.6/drivers/serial/m32r_sio.c
===================================================================
--- linux-2.6.orig/drivers/serial/m32r_sio.c
+++ linux-2.6/drivers/serial/m32r_sio.c
@@ -922,7 +922,7 @@ static void m32r_sio_config_port(struct
 static int
 m32r_sio_verify_port(struct uart_port *port, struct serial_struct *ser)
 {
-	if (ser->irq >= NR_IRQS || ser->irq < 0 ||
+	if (ser->irq >= nr_irqs || ser->irq < 0 ||
 	    ser->baud_base < 9600 || ser->type < PORT_UNKNOWN ||
 	    ser->type >= ARRAY_SIZE(uart_config))
 		return -EINVAL;
@@ -1162,7 +1162,7 @@ static int __init m32r_sio_init(void)
 
 	printk(KERN_INFO "Serial: M32R SIO driver\n");
 
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		spin_lock_init(&irq_lists[i].lock);
 
 	ret = uart_register_driver(&m32r_sio_reg);
Index: linux-2.6/drivers/serial/serial_core.c
===================================================================
--- linux-2.6.orig/drivers/serial/serial_core.c
+++ linux-2.6/drivers/serial/serial_core.c
@@ -741,7 +741,7 @@ static int uart_set_info(struct uart_sta
 	if (port->ops->verify_port)
 		retval = port->ops->verify_port(port, &new_serial);
 
-	if ((new_serial.irq >= NR_IRQS) || (new_serial.irq < 0) ||
+	if ((new_serial.irq >= nr_irqs) || (new_serial.irq < 0) ||
 	    (new_serial.baud_base < 9600))
 		retval = -EINVAL;
 
Index: linux-2.6/drivers/serial/serial_lh7a40x.c
===================================================================
--- linux-2.6.orig/drivers/serial/serial_lh7a40x.c
+++ linux-2.6/drivers/serial/serial_lh7a40x.c
@@ -460,7 +460,7 @@ static int lh7a40xuart_verify_port (stru
 
 	if (ser->type != PORT_UNKNOWN && ser->type != PORT_LH7A40X)
 		ret = -EINVAL;
-	if (ser->irq < 0 || ser->irq >= NR_IRQS)
+	if (ser->irq < 0 || ser->irq >= nr_irqs)
 		ret = -EINVAL;
 	if (ser->baud_base < 9600) /* *** FIXME: is this true? */
 		ret = -EINVAL;
Index: linux-2.6/drivers/serial/sh-sci.c
===================================================================
--- linux-2.6.orig/drivers/serial/sh-sci.c
+++ linux-2.6/drivers/serial/sh-sci.c
@@ -1157,7 +1157,7 @@ static int sci_verify_port(struct uart_p
 {
 	struct sci_port *s = &sci_ports[port->line];
 
-	if (ser->irq != s->irqs[SCIx_TXI_IRQ] || ser->irq > NR_IRQS)
+	if (ser->irq != s->irqs[SCIx_TXI_IRQ] || ser->irq > nr_irqs)
 		return -EINVAL;
 	if (ser->baud_base < 2400)
 		/* No paper tape reader for Mitch.. */
Index: linux-2.6/drivers/serial/ucc_uart.c
===================================================================
--- linux-2.6.orig/drivers/serial/ucc_uart.c
+++ linux-2.6/drivers/serial/ucc_uart.c
@@ -1066,7 +1066,7 @@ static int qe_uart_verify_port(struct ua
 	if (ser->type != PORT_UNKNOWN && ser->type != PORT_CPM)
 		return -EINVAL;
 
-	if (ser->irq < 0 || ser->irq >= NR_IRQS)
+	if (ser->irq < 0 || ser->irq >= nr_irqs)
 		return -EINVAL;
 
 	if (ser->baud_base < 9600)
Index: linux-2.6/drivers/xen/events.c
===================================================================
--- linux-2.6.orig/drivers/xen/events.c
+++ linux-2.6/drivers/xen/events.c
@@ -139,7 +139,7 @@ static void init_evtchn_cpu_bindings(voi
 #ifdef CONFIG_SMP
 	int i;
 	/* By default all event channels notify CPU#0. */
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		irq_desc[i].affinity = cpumask_of_cpu(0);
 #endif
 
@@ -223,12 +223,12 @@ static int find_unbound_irq(void)
 	int irq;
 
 	/* Only allocate from dynirq range */
-	for (irq = 0; irq < NR_IRQS; irq++)
+	for (irq = 0; irq < nr_irqs; irq++)
 		if (irq_bindcount[irq] == 0)
 			break;
 
-	if (irq == NR_IRQS)
-		panic("No available IRQ to bind to: increase NR_IRQS!\n");
+	if (irq == nr_irqs)
+		panic("No available IRQ to bind to: increase nr_irqs!\n");
 
 	return irq;
 }
@@ -761,7 +761,7 @@ void xen_irq_resume(void)
 		mask_evtchn(evtchn);
 
 	/* No IRQ <-> event-channel mappings. */
-	for (irq = 0; irq < NR_IRQS; irq++)
+	for (irq = 0; irq < nr_irqs; irq++)
 		irq_info[irq].evtchn = 0; /* zap event-channel binding */
 
 	for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++)
@@ -793,7 +793,7 @@ void __init xen_init_IRQ(void)
 		mask_evtchn(i);
 
 	/* Dynamic IRQ space is currently unbound. Zero the refcnts. */
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		irq_bindcount[i] = 0;
 
 	irq_ctx_init(smp_processor_id());
Index: linux-2.6/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_misc.c
+++ linux-2.6/fs/proc/proc_misc.c
@@ -503,7 +503,7 @@ static int show_stat(struct seq_file *p,
 	struct timespec boottime;
 	unsigned int *per_irq_sum;
 
-	per_irq_sum = kzalloc(sizeof(unsigned int)*NR_IRQS, GFP_KERNEL);
+	per_irq_sum = kzalloc(sizeof(unsigned int)*nr_irqs, GFP_KERNEL);
 	if (!per_irq_sum)
 		return -ENOMEM;
 
@@ -525,7 +525,7 @@ static int show_stat(struct seq_file *p,
 		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
 		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
-		for (j = 0; j < NR_IRQS; j++) {
+		for (j = 0; j < nr_irqs; j++) {
 			unsigned int temp = kstat_cpu(i).irqs[j];
 			sum += temp;
 			per_irq_sum[j] += temp;
@@ -571,7 +571,7 @@ static int show_stat(struct seq_file *p,
 	}
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
 
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		seq_printf(p, " %u", per_irq_sum[i]);
 
 	seq_printf(p,
@@ -625,13 +625,13 @@ static const struct file_operations proc
  */
 static void *int_seq_start(struct seq_file *f, loff_t *pos)
 {
-	return (*pos <= NR_IRQS) ? pos : NULL;
+	return (*pos <= nr_irqs) ? pos : NULL;
 }
 
 static void *int_seq_next(struct seq_file *f, void *v, loff_t *pos)
 {
 	(*pos)++;
-	if (*pos > NR_IRQS)
+	if (*pos > nr_irqs)
 		return NULL;
 	return pos;
 }
Index: linux-2.6/include/asm-x86/irq.h
===================================================================
--- linux-2.6.orig/include/asm-x86/irq.h
+++ linux-2.6/include/asm-x86/irq.h
@@ -10,6 +10,9 @@
 #include <asm/apicdef.h>
 #include <asm/irq_vectors.h>
 
+extern int pin_map_size;
+extern int first_free_entry;
+
 static inline int irq_canonicalize(int irq)
 {
 	return ((irq == 2) ? 9 : irq);
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -24,6 +24,8 @@
 #include <asm/ptrace.h>
 #include <asm/irq_regs.h>
 
+extern int nr_irqs;
+
 struct irq_desc;
 typedef	void (*irq_flow_handler_t)(unsigned int irq,
 					    struct irq_desc *desc);
Index: linux-2.6/kernel/irq/autoprobe.c
===================================================================
--- linux-2.6.orig/kernel/irq/autoprobe.c
+++ linux-2.6/kernel/irq/autoprobe.c
@@ -38,7 +38,7 @@ unsigned long probe_irq_on(void)
 	 * something may have generated an irq long ago and we want to
 	 * flush such a longstanding irq before considering it as spurious.
 	 */
-	for (i = NR_IRQS-1; i > 0; i--) {
+	for (i = nr_irqs-1; i > 0; i--) {
 		desc = irq_desc + i;
 
 		spin_lock_irq(&desc->lock);
@@ -68,7 +68,7 @@ unsigned long probe_irq_on(void)
 	 * (we must startup again here because if a longstanding irq
 	 * happened in the previous stage, it may have masked itself)
 	 */
-	for (i = NR_IRQS-1; i > 0; i--) {
+	for (i = nr_irqs-1; i > 0; i--) {
 		desc = irq_desc + i;
 
 		spin_lock_irq(&desc->lock);
@@ -89,7 +89,7 @@ unsigned long probe_irq_on(void)
 	 * Now filter out any obviously spurious interrupts
 	 */
 	mask = 0;
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		unsigned int status;
 
 		desc = irq_desc + i;
@@ -130,7 +130,7 @@ unsigned int probe_irq_mask(unsigned lon
 	int i;
 
 	mask = 0;
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		struct irq_desc *desc = irq_desc + i;
 		unsigned int status;
 
@@ -173,7 +173,7 @@ int probe_irq_off(unsigned long val)
 {
 	int i, irq_found = 0, nr_irqs = 0;
 
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < nr_irqs; i++) {
 		struct irq_desc *desc = irq_desc + i;
 		unsigned int status;
 
Index: linux-2.6/kernel/irq/chip.c
===================================================================
--- linux-2.6.orig/kernel/irq/chip.c
+++ linux-2.6/kernel/irq/chip.c
@@ -27,7 +27,7 @@ void dynamic_irq_init(unsigned int irq)
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		WARN(1, KERN_ERR "Trying to initialize invalid IRQ%d\n", irq);
 		return;
 	}
@@ -60,7 +60,7 @@ void dynamic_irq_cleanup(unsigned int ir
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		WARN(1, KERN_ERR "Trying to cleanup invalid IRQ%d\n", irq);
 		return;
 	}
@@ -92,7 +92,7 @@ int set_irq_chip(unsigned int irq, struc
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		WARN(1, KERN_ERR "Trying to install chip for IRQ%d\n", irq);
 		return -EINVAL;
 	}
@@ -121,7 +121,7 @@ int set_irq_type(unsigned int irq, unsig
 	unsigned long flags;
 	int ret = -ENXIO;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR "Trying to set irq type for IRQ%d\n", irq);
 		return -ENODEV;
 	}
@@ -148,7 +148,7 @@ int set_irq_data(unsigned int irq, void
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR
 		       "Trying to install controller data for IRQ%d\n", irq);
 		return -EINVAL;
@@ -174,7 +174,7 @@ int set_irq_msi(unsigned int irq, struct
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR
 		       "Trying to install msi data for IRQ%d\n", irq);
 		return -EINVAL;
@@ -200,7 +200,7 @@ int set_irq_chip_data(unsigned int irq,
 	struct irq_desc *desc = irq_desc + irq;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS || !desc->chip) {
+	if (irq >= nr_irqs || !desc->chip) {
 		printk(KERN_ERR "BUG: bad set_irq_chip_data(IRQ#%d)\n", irq);
 		return -EINVAL;
 	}
@@ -544,7 +544,7 @@ __set_irq_handler(unsigned int irq, irq_
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR
 		       "Trying to install type control for IRQ%d\n", irq);
 		return;
@@ -609,7 +609,7 @@ void __init set_irq_noprobe(unsigned int
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR "Trying to mark IRQ%d non-probeable\n", irq);
 
 		return;
@@ -627,7 +627,7 @@ void __init set_irq_probe(unsigned int i
 	struct irq_desc *desc;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS) {
+	if (irq >= nr_irqs) {
 		printk(KERN_ERR "Trying to mark IRQ%d probeable\n", irq);
 
 		return;
Index: linux-2.6/kernel/irq/handle.c
===================================================================
--- linux-2.6.orig/kernel/irq/handle.c
+++ linux-2.6/kernel/irq/handle.c
@@ -47,6 +47,7 @@ handle_bad_irq(unsigned int irq, struct
  *
  * Controller mappings for all interrupt sources:
  */
+int nr_irqs = NR_IRQS;
 struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
 	[0 ... NR_IRQS-1] = {
 		.status = IRQ_DISABLED,
@@ -265,7 +266,7 @@ void early_init_irq_lock_class(void)
 {
 	int i;
 
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
 }
 
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -34,7 +34,7 @@ void synchronize_irq(unsigned int irq)
 	struct irq_desc *desc = irq_desc + irq;
 	unsigned int status;
 
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return;
 
 	do {
@@ -143,7 +143,7 @@ void disable_irq_nosync(unsigned int irq
 	struct irq_desc *desc = irq_desc + irq;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
@@ -171,7 +171,7 @@ void disable_irq(unsigned int irq)
 {
 	struct irq_desc *desc = irq_desc + irq;
 
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return;
 
 	disable_irq_nosync(irq);
@@ -214,7 +214,7 @@ void enable_irq(unsigned int irq)
 	struct irq_desc *desc = irq_desc + irq;
 	unsigned long flags;
 
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return;
 
 	spin_lock_irqsave(&desc->lock, flags);
@@ -290,7 +290,7 @@ int can_request_irq(unsigned int irq, un
 {
 	struct irqaction *action;
 
-	if (irq >= NR_IRQS || irq_desc[irq].status & IRQ_NOREQUEST)
+	if (irq >= nr_irqs || irq_desc[irq].status & IRQ_NOREQUEST)
 		return 0;
 
 	action = irq_desc[irq].action;
@@ -349,7 +349,7 @@ int setup_irq(unsigned int irq, struct i
 	int shared = 0;
 	int ret;
 
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return -EINVAL;
 
 	if (desc->chip == &no_irq_chip)
@@ -503,7 +503,7 @@ void free_irq(unsigned int irq, void *de
 	unsigned long flags;
 
 	WARN_ON(in_interrupt());
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return;
 
 	desc = irq_desc + irq;
@@ -617,7 +617,7 @@ int request_irq(unsigned int irq, irq_ha
 	 */
 	if ((irqflags & IRQF_SHARED) && !dev_id)
 		return -EINVAL;
-	if (irq >= NR_IRQS)
+	if (irq >= nr_irqs)
 		return -EINVAL;
 	if (irq_desc[irq].status & IRQ_NOREQUEST)
 		return -EINVAL;
Index: linux-2.6/kernel/irq/proc.c
===================================================================
--- linux-2.6.orig/kernel/irq/proc.c
+++ linux-2.6/kernel/irq/proc.c
@@ -234,7 +234,7 @@ void init_irq_proc(void)
 	/*
 	 * Create entries for all existing IRQs.
 	 */
-	for (i = 0; i < NR_IRQS; i++)
+	for (i = 0; i < nr_irqs; i++)
 		register_irq_proc(i);
 }
 
Index: linux-2.6/kernel/irq/resend.c
===================================================================
--- linux-2.6.orig/kernel/irq/resend.c
+++ linux-2.6/kernel/irq/resend.c
@@ -33,8 +33,8 @@ static void resend_irqs(unsigned long ar
 	struct irq_desc *desc;
 	int irq;
 
-	while (!bitmap_empty(irqs_resend, NR_IRQS)) {
-		irq = find_first_bit(irqs_resend, NR_IRQS);
+	while (!bitmap_empty(irqs_resend, nr_irqs)) {
+		irq = find_first_bit(irqs_resend, nr_irqs);
 		clear_bit(irq, irqs_resend);
 		desc = irq_desc + irq;
 		local_irq_disable();
Index: linux-2.6/kernel/irq/spurious.c
===================================================================
--- linux-2.6.orig/kernel/irq/spurious.c
+++ linux-2.6/kernel/irq/spurious.c
@@ -91,7 +91,7 @@ static int misrouted_irq(int irq)
 	int i;
 	int ok = 0;
 
-	for (i = 1; i < NR_IRQS; i++) {
+	for (i = 1; i < nr_irqs; i++) {
 		struct irq_desc *desc = irq_desc + i;
 
 		if (i == irq)	/* Already tried */
@@ -107,7 +107,7 @@ static int misrouted_irq(int irq)
 static void poll_spurious_irqs(unsigned long dummy)
 {
 	int i;
-	for (i = 1; i < NR_IRQS; i++) {
+	for (i = 1; i < nr_irqs; i++) {
 		struct irq_desc *desc = irq_desc + i;
 		unsigned int status;
 


* [PATCH 03/16] add dyn_array support
  2008-08-01  9:37   ` [PATCH 02/16] x86: introduce nr_irqs for 64bit v3 Yinghai Lu
@ 2008-08-01  9:37     ` Yinghai Lu
  2008-08-01  9:37       ` [PATCH 04/16] make irq_timer_state to use dyn_array Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Allow very large arrays to be allocated from bootmem during early init,
sized by a count that is only known at boot time.

Enable it by selecting CONFIG_HAVE_DYN_ARRAY.
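
For illustration only, here is a rough userspace sketch of the idea: each
array is described by a struct dyn_array, and one loop at init time sizes,
allocates and initializes all of them. Assumptions that are not part of
the patch: aligned_alloc() stands in for bootmem, an explicit table stands
in for the .dyn_array.init linker section, and irq_counts, demo_nr_irqs,
fill_minus_one, demo_table and demo_pre_alloc are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the descriptor added by this patch. */
struct dyn_array {
	void **name;			/* address of the pointer to fill in */
	unsigned long size;		/* size of one element */
	unsigned int *nr;		/* element count, read at allocation time */
	unsigned long align;		/* requested alignment (power of two) */
	void (*init_work)(void *);	/* optional hook, gets the descriptor */
};

/* Hypothetical example array: one int per irq. */
static int *irq_counts;
static unsigned int demo_nr_irqs = 16;

/* Array-specific hook: runs after the pointer has been filled in. */
static void fill_minus_one(void *data)
{
	struct dyn_array *da = data;
	unsigned int i;

	for (i = 0; i < *da->nr; i++)
		irq_counts[i] = -1;
}

static struct dyn_array irq_counts_da = {
	.name = (void **)&irq_counts,
	.size = sizeof(int),
	.nr = &demo_nr_irqs,
	.align = 64,
	.init_work = fill_minus_one,
};

/* Explicit table; the patch collects these pointers in a linker section. */
static struct dyn_array *demo_table[] = { &irq_counts_da };

/* Same shape as the loop in pre_alloc_dyn_array(). */
static int demo_pre_alloc(void)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		struct dyn_array *da = demo_table[i];
		unsigned long size = da->size * (*da->nr);
		void *p;

		/* aligned_alloc() wants the size to be a multiple of align */
		size = (size + da->align - 1) & ~(da->align - 1);
		p = aligned_alloc(da->align, size);
		if (!p)
			return -1;
		memset(p, 0, size);
		/* store the pointer without type-punning through void ** */
		memcpy(da->name, &p, sizeof(p));
		if (da->init_work)
			da->init_work(da);
	}
	return 0;
}
```

The element count is read through a pointer, so it can still change
between the time the descriptor is defined and the time the loop runs.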

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 include/asm-generic/vmlinux.lds.h |    7 +++++++
 include/linux/init.h              |   23 +++++++++++++++++++++++
 init/main.c                       |   25 +++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

Index: linux-2.6/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6/include/asm-generic/vmlinux.lds.h
@@ -214,6 +214,13 @@
  * All archs are supposed to use RO_DATA() */
 #define RODATA RO_DATA(4096)
 
+#define DYN_ARRAY_INIT(align)							\
+	. = ALIGN((align));						\
+	.dyn_array.init : AT(ADDR(.dyn_array.init) - LOAD_OFFSET) {	\
+		VMLINUX_SYMBOL(__dyn_array_start) = .;			\
+		*(.dyn_array.init)					\
+		VMLINUX_SYMBOL(__dyn_array_end) = .;			\
+	}
 #define SECURITY_INIT							\
 	.security_initcall.init : AT(ADDR(.security_initcall.init) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__security_initcall_start) = .;		\
Index: linux-2.6/include/linux/init.h
===================================================================
--- linux-2.6.orig/include/linux/init.h
+++ linux-2.6/include/linux/init.h
@@ -249,6 +249,29 @@ struct obs_kernel_param {
 
 /* Relies on boot_command_line being set */
 void __init parse_early_param(void);
+
+struct dyn_array {
+        void **name;
+        unsigned long size;
+        unsigned int *nr;
+        unsigned long align;
+        void (*init_work)(void *);
+};
+extern struct dyn_array *__dyn_array_start[], *__dyn_array_end[];
+
+#define DEFINE_DYN_ARRAY(nameX, sizeX, nrX, alignX, init_workX) \
+		static struct dyn_array __dyn_array_##nameX __initdata = \
+		{	.name = (void **)&nameX,\
+			.size = sizeX,\
+			.nr   = &nrX,\
+			.align = alignX,\
+			.init_work = init_workX,\
+		}; \
+		static struct dyn_array *__dyn_array_ptr_##nameX __used \
+		__attribute__((__section__(".dyn_array.init"))) = \
+			&__dyn_array_##nameX
+
+extern void pre_alloc_dyn_array(void);
 #endif /* __ASSEMBLY__ */
 
 /**
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -539,6 +539,29 @@ void __init __weak thread_info_cache_ini
 {
 }
 
+void __init pre_alloc_dyn_array(void)
+{
+#ifdef CONFIG_HAVE_DYN_ARRAY
+	unsigned long size, phys = 0;
+	struct dyn_array **daa;
+
+	for (daa = __dyn_array_start ; daa < __dyn_array_end; daa++) {
+		struct dyn_array *da = *daa;
+
+		size = da->size * (*da->nr);
+		print_fn_descriptor_symbol("dyna_array %s ", da->name);
+		printk(KERN_CONT "size:%#lx nr:%d align:%#lx",
+			da->size, *da->nr, da->align);
+		*da->name = __alloc_bootmem_nopanic(size, da->align, phys);
+		phys = virt_to_phys(*da->name);
+		printk(KERN_CONT " ==> [%#lx - %#lx]\n", phys, phys + size);
+
+		if (da->init_work)
+			da->init_work(da);
+	}
+#endif
+}
+
 asmlinkage void __init start_kernel(void)
 {
 	char * command_line;
@@ -576,6 +599,8 @@ asmlinkage void __init start_kernel(void
 	printk(KERN_NOTICE);
 	printk(linux_banner);
 	setup_arch(&command_line);
+	printk(KERN_INFO "nr_irqs: %d\n", nr_irqs);
+	pre_alloc_dyn_array();
 	mm_init_owner(&init_mm, &init_task);
 	setup_command_line(command_line);
 	unwind_setup();


* [PATCH 04/16] make irq_timer_state to use dyn_array
  2008-08-01  9:37     ` [PATCH 03/16] add dyn_array support Yinghai Lu
@ 2008-08-01  9:37       ` Yinghai Lu
  2008-08-01  9:37         ` [PATCH 05/16] make irq2_iommu " Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 drivers/char/random.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/char/random.c
===================================================================
--- linux-2.6.orig/drivers/char/random.c
+++ linux-2.6/drivers/char/random.c
@@ -558,7 +558,13 @@ struct timer_rand_state {
 };
 
 static struct timer_rand_state input_timer_state;
+
+#ifdef CONFIG_HAVE_DYN_ARRAY
+static struct timer_rand_state **irq_timer_state;
+DEFINE_DYN_ARRAY(irq_timer_state, sizeof(struct timer_rand_state *), nr_irqs, PAGE_SIZE, NULL);
+#else
 static struct timer_rand_state *irq_timer_state[NR_IRQS];
+#endif
 
 /*
  * This function adds entropy to the entropy "pool" by using timing


* [PATCH 05/16] make irq2_iommu to use dyn_array
  2008-08-01  9:37       ` [PATCH 04/16] make irq_timer_state to use dyn_array Yinghai Lu
@ 2008-08-01  9:37         ` Yinghai Lu
  2008-08-01  9:37           ` [PATCH 06/16] make irq_desc " Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 drivers/pci/intr_remapping.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/intr_remapping.c
===================================================================
--- linux-2.6.orig/drivers/pci/intr_remapping.c
+++ linux-2.6/drivers/pci/intr_remapping.c
@@ -11,12 +11,14 @@ static struct ioapic_scope ir_ioapic[MAX
 static int ir_ioapic_num;
 int intr_remapping_enabled;
 
-static struct {
+static struct irq_2_iommu {
 	struct intel_iommu *iommu;
 	u16 irte_index;
 	u16 sub_handle;
 	u8  irte_mask;
-} irq_2_iommu[NR_IRQS];
+} *irq_2_iommu;
+
+DEFINE_DYN_ARRAY(irq_2_iommu, sizeof(struct irq_2_iommu), nr_irqs, PAGE_SIZE, NULL);
 
 static DEFINE_SPINLOCK(irq_2_ir_lock);
 


* [PATCH 06/16] make irq_desc to use dyn_array
  2008-08-01  9:37         ` [PATCH 05/16] make irq2_iommu " Yinghai Lu
@ 2008-08-01  9:37           ` Yinghai Lu
  2008-08-01  9:37             ` [PATCH 07/16] x86: make 64bit support dyn_array Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 include/linux/irq.h |    4 ++++
 kernel/irq/handle.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -181,7 +181,11 @@ struct irq_desc {
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
 
+#ifdef CONFIG_HAVE_DYN_ARRAY
+extern struct irq_desc *irq_desc;
+#else
 extern struct irq_desc irq_desc[NR_IRQS];
+#endif
 
 /*
  * Migration helpers for obsolete names, they will go away:
Index: linux-2.6/kernel/irq/handle.c
===================================================================
--- linux-2.6.orig/kernel/irq/handle.c
+++ linux-2.6/kernel/irq/handle.c
@@ -48,6 +48,36 @@ handle_bad_irq(unsigned int irq, struct
  * Controller mappings for all interrupt sources:
  */
 int nr_irqs = NR_IRQS;
+
+#ifdef CONFIG_HAVE_DYN_ARRAY
+static struct irq_desc irq_desc_init = {
+	.status = IRQ_DISABLED,
+	.chip = &no_irq_chip,
+	.handle_irq = handle_bad_irq,
+	.depth = 1,
+	.lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
+#ifdef CONFIG_SMP
+	.affinity = CPU_MASK_ALL
+#endif
+};
+
+static void __init init_work(void *data)
+{
+	struct dyn_array *da = data;
+	int i;
+	struct  irq_desc *desc;
+
+	desc = *da->name;
+
+	for (i = 0; i < *da->nr; i++)
+		memcpy(&desc[i], &irq_desc_init, sizeof(struct irq_desc));
+}
+
+struct irq_desc *irq_desc;
+DEFINE_DYN_ARRAY(irq_desc, sizeof(struct irq_desc), nr_irqs, PAGE_SIZE, init_work);
+
+#else
+
 struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
 	[0 ... NR_IRQS-1] = {
 		.status = IRQ_DISABLED,
@@ -60,6 +90,7 @@ struct irq_desc irq_desc[NR_IRQS] __cach
 #endif
 	}
 };
+#endif
 
 /*
  * What should we do if we get a hw irq event on an illegal vector?


* [PATCH 07/16] x86: make 64bit support dyn_array
  2008-08-01  9:37           ` [PATCH 06/16] make irq_desc " Yinghai Lu
@ 2008-08-01  9:37             ` Yinghai Lu
  2008-08-01  9:37               ` [PATCH 08/16] serial: change remove NR_IRQS in 8250.c v2 Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Set nr_irqs according to nr_cpu_ids, so that a kernel configured for big
systems still has a small footprint on smaller ones.
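
As a quick illustration of the sizing rule in the setup.c hunk, a small
standalone sketch follows. DEMO_NR_IRQS is a made-up compile-time default
and struct irq_sizing / size_irqs() are hypothetical helpers; only the
arithmetic is taken from the patch.

```c
#include <assert.h>

#define DEMO_NR_IRQS 4352	/* hypothetical compile-time NR_IRQS */

struct irq_sizing {
	int nr_irqs;
	int pin_map_size;
	int first_free_entry;
};

/* Same arithmetic as the setup_arch() hunk. */
static struct irq_sizing size_irqs(int nr_irqs, int nr_cpu_ids)
{
	struct irq_sizing s;

	/* only override nr_irqs if it is still at the compile-time default */
	if (nr_irqs == DEMO_NR_IRQS)
		nr_irqs = 32 * nr_cpu_ids + 224;

	s.nr_irqs = nr_irqs;
	s.pin_map_size = nr_irqs * 2;
	s.first_free_entry = nr_irqs;
	return s;
}
```

With 4 possible cpus this gives nr_irqs = 352 and pin_map_size = 704; with
64 cpus it gives nr_irqs = 2272.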

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/Kconfig                     |    2 ++
 arch/x86/Kconfig                 |    1 +
 arch/x86/kernel/io_apic_64.c     |   28 +++++++++++++++++++++-------
 arch/x86/kernel/setup.c          |    6 ++++++
 arch/x86/kernel/vmlinux_64.lds.S |    3 +++
 5 files changed, 33 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -103,3 +103,5 @@ config HAVE_CLK
 	  The <linux/clk.h> calls support software clock gating and
 	  thus are a key power management tool on many systems.
 
+config HAVE_DYN_ARRAY
+	def_bool n
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -33,6 +33,7 @@ config X86
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_GENERIC_DMA_COHERENT if X86_32
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
+	select HAVE_DYN_ARRAY if X86_64
 
 config ARCH_DEFCONFIG
 	string
Index: linux-2.6/arch/x86/kernel/io_apic_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6/arch/x86/kernel/io_apic_64.c
@@ -66,7 +66,7 @@ struct irq_cfg {
 };
 
 /* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
-static struct irq_cfg irq_cfg[NR_IRQS] __read_mostly = {
+static struct irq_cfg irq_cfg_legacy[] __initdata = {
 	[0]  = { .domain = CPU_MASK_ALL, .vector = IRQ0_VECTOR,  },
 	[1]  = { .domain = CPU_MASK_ALL, .vector = IRQ1_VECTOR,  },
 	[2]  = { .domain = CPU_MASK_ALL, .vector = IRQ2_VECTOR,  },
@@ -85,6 +85,17 @@ static struct irq_cfg irq_cfg[NR_IRQS] _
 	[15] = { .domain = CPU_MASK_ALL, .vector = IRQ15_VECTOR, },
 };
 
+static struct irq_cfg *irq_cfg;
+
+static void __init init_work(void *data)
+{
+	struct dyn_array *da = data;
+
+	memcpy(*da->name, irq_cfg_legacy, sizeof(irq_cfg_legacy));
+}
+
+DEFINE_DYN_ARRAY(irq_cfg, sizeof(struct irq_cfg), nr_irqs, PAGE_SIZE, init_work);
+
 static int assign_irq_vector(int irq, cpumask_t mask);
 
 int first_system_vector = 0xfe;
@@ -129,10 +140,9 @@ DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BU
  * Rough estimation of how many shared IRQs there are, can
  * be changed anytime.
  */
-#define MAX_PLUS_SHARED_IRQS NR_IRQS
-#define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + NR_IRQS)
 
-int pin_map_size = PIN_MAP_SIZE;
+int pin_map_size;
+
 /*
  * This is performance-critical, we want to do it O(1)
  *
@@ -141,8 +151,12 @@ int pin_map_size = PIN_MAP_SIZE;
  */
 
 static struct irq_pin_list {
-	short apic, pin, next;
-} irq_2_pin[PIN_MAP_SIZE];
+	short apic, pin;
+	int next;
+} *irq_2_pin;
+
+DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct irq_pin_list), pin_map_size, sizeof(struct irq_pin_list), NULL);
+
 
 struct io_apic {
 	unsigned int index;
@@ -359,7 +373,7 @@ static void set_ioapic_affinity_irq(unsi
  * shared ISA-space IRQs, so we have to support them. We are super
  * fast in the common case, and fast for shared ISA-space IRQs.
  */
-int first_free_entry = NR_IRQS;
+int first_free_entry;
 static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 {
 	struct irq_pin_list *entry = irq_2_pin + irq;
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -856,7 +856,13 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	prefill_possible_map();
+
 #ifdef CONFIG_X86_64
+	/* need to wait for nr_cpu_ids settle down */
+	if (nr_irqs == NR_IRQS)
+		nr_irqs = 32 * nr_cpu_ids + 224;
+	pin_map_size = nr_irqs * 2;
+	first_free_entry = nr_irqs;
 	init_cpu_to_node();
 #endif
 
Index: linux-2.6/arch/x86/kernel/vmlinux_64.lds.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/vmlinux_64.lds.S
+++ linux-2.6/arch/x86/kernel/vmlinux_64.lds.S
@@ -174,6 +174,9 @@ SECTIONS
 	*(.x86cpuvendor.init)
   }
   __x86cpuvendor_end = .;
+
+  DYN_ARRAY_INIT(8)
+
   SECURITY_INIT
 
   . = ALIGN(8);


* [PATCH 08/16] serial: change remove NR_IRQS in 8250.c v2
  2008-08-01  9:37             ` [PATCH 07/16] x86: make 64bit support dyn_array Yinghai Lu
@ 2008-08-01  9:37               ` Yinghai Lu
  2008-08-01  9:37                 ` [PATCH 09/16] add per_cpu_dyn_array support Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

This replaces
	[PATCH] serial: change irq_lists to use dyn_array

Use a small array of slots, looked up by irq number, to hold the irq lock
for each serial port irq, instead of an array indexed by irq number.
Hopefully 32 slots are enough.

v2: as suggested by Eric, move irq_no into irq_info, and do not clear irq_no
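
To show the intended lookup behaviour, here is a small userspace sketch of
the slot table. It follows get_irq_info() from the patch, but struct slot,
get_slot() and the table size of 4 are stand-ins chosen for the example
(the patch uses NR_IRQ_INFO = 32), and the spinlock is left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_SLOTS 4		/* the patch uses NR_IRQ_INFO = 32 */

struct slot {
	int irq_no;		/* -1 means the slot is unused */
};

static struct slot slots[DEMO_NR_SLOTS] = {
	{ -1 }, { -1 }, { -1 }, { -1 },
};

/*
 * Return the slot already bound to irq.  If there is none and with_free
 * is set, claim the first unused slot; otherwise return NULL.
 */
static struct slot *get_slot(int irq, int with_free)
{
	int i, first_free = -1;

	for (i = 0; i < DEMO_NR_SLOTS; i++) {
		if (slots[i].irq_no == irq)
			return &slots[i];
		if (slots[i].irq_no == -1 && first_free == -1)
			first_free = i;
	}
	if (!with_free || first_free == -1)
		return NULL;

	slots[first_free].irq_no = irq;
	return &slots[first_free];
}
```

A slot is never released once claimed (irq_no is not cleared), so the
table only fills up with the number of distinct irqs ever used, not with
the number of ports.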

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 drivers/serial/8250.c |   45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/serial/8250.c
===================================================================
--- linux-2.6.orig/drivers/serial/8250.c
+++ linux-2.6/drivers/serial/8250.c
@@ -147,9 +147,39 @@ struct uart_8250_port {
 struct irq_info {
 	spinlock_t		lock;
 	struct list_head	*head;
+	int irq_no;
 };
 
-static struct irq_info irq_lists[NR_IRQS];
+#define NR_IRQ_INFO	32
+
+static struct irq_info irq_lists[NR_IRQ_INFO] = {
+	[0 ... NR_IRQ_INFO-1] = {
+		.irq_no = -1,
+	}
+};
+
+static struct irq_info *get_irq_info(int irq, int with_free)
+{
+	int i, first_free = -1;
+
+	for (i = 0; i < NR_IRQ_INFO; i++) {
+		if (irq_lists[i].irq_no == irq)
+			return &irq_lists[i];
+		if (irq_lists[i].irq_no == -1 && first_free == -1)
+			first_free = i;
+	}
+	if (!with_free)
+		return NULL;
+
+	if (first_free != -1) {
+		irq_lists[first_free].irq_no = irq;
+		return &irq_lists[first_free];
+	}
+
+	WARN(1, "NR_IRQ_INFO too small\n");
+
+	return NULL;
+}
 
 /*
  * Here we define the default xmit fifo size used for each type of UART.
@@ -1541,9 +1571,12 @@ static void serial_do_unlink(struct irq_
 
 static int serial_link_irq_chain(struct uart_8250_port *up)
 {
-	struct irq_info *i = irq_lists + up->port.irq;
+	struct irq_info *i = get_irq_info(up->port.irq, 1);
 	int ret, irq_flags = up->port.flags & UPF_SHARE_IRQ ? IRQF_SHARED : 0;
 
+	if (!i)
+		return -1;
+
 	spin_lock_irq(&i->lock);
 
 	if (i->head) {
@@ -1567,7 +1600,11 @@ static int serial_link_irq_chain(struct
 
 static void serial_unlink_irq_chain(struct uart_8250_port *up)
 {
-	struct irq_info *i = irq_lists + up->port.irq;
+	int irq_no = up->port.irq;
+	struct irq_info *i = get_irq_info(irq_no, 0);
+
+	if (!i)
+		return;
 
 	BUG_ON(i->head == NULL);
 
@@ -2951,7 +2988,7 @@ static int __init serial8250_init(void)
 		"%d ports, IRQ sharing %sabled\n", nr_uarts,
 		share_irqs ? "en" : "dis");
 
-	for (i = 0; i < nr_irqs; i++)
+	for (i = 0; i < NR_IRQ_INFO; i++)
 		spin_lock_init(&irq_lists[i].lock);
 
 	ret = uart_register_driver(&serial8250_reg);


* [PATCH 09/16] add per_cpu_dyn_array support
  2008-08-01  9:37               ` [PATCH 08/16] serial: change remove NR_IRQS in 8250.c v2 Yinghai Lu
@ 2008-08-01  9:37                 ` Yinghai Lu
  2008-08-01  9:37                   ` [PATCH 10/16] irq: make irqs in kernel stat use per_cpu_dyn_array Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

So that arrays living in the per_cpu area can be allocated dynamically too.
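
A rough sketch of the layout idea, under stated assumptions: the per cpu
area is grown by the total size of all registered arrays, and each array
then gets a fixed offset inside that extra tail. The names (struct
pcpu_array, round_up_to, pcpu_layout, demo) are hypothetical, and the
exact rounding done by the real helpers may differ.

```c
#include <assert.h>
#include <stddef.h>

struct pcpu_array {
	size_t size;	/* element size */
	size_t nr;	/* element count, known at boot */
	size_t align;	/* power-of-two alignment */
	size_t offset;	/* filled in: offset inside the dynamic tail */
};

static size_t round_up_to(size_t v, size_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Lay the arrays out back to back and return the extra bytes needed per
 * cpu.  Offsets are relative to the start of the tail, so the tail itself
 * is assumed to start at a suitably aligned address.
 */
static size_t pcpu_layout(struct pcpu_array *arr, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++) {
		total = round_up_to(total, arr[i].align);
		arr[i].offset = total;
		total += arr[i].size * arr[i].nr;
	}
	return total;
}

/* Two made-up arrays: 10 x 4 bytes, then 3 x 8 bytes aligned to 16. */
static struct pcpu_array demo[] = {
	{ .size = 4, .nr = 10, .align = 8 },
	{ .size = 8, .nr = 3,  .align = 16 },
};
```

For the demo table the first array sits at offset 0, the second at offset
48, and each cpu needs 72 extra bytes on top of the static per cpu size.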

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/x86/kernel/setup_percpu.c    |    7 +++-
 include/asm-generic/vmlinux.lds.h |    6 ++++
 include/linux/init.h              |   27 ++++++++++++++++--
 init/main.c                       |   57 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6/arch/x86/kernel/setup_percpu.c
@@ -140,7 +140,7 @@ static void __init setup_cpu_pda_map(voi
  */
 void __init setup_per_cpu_areas(void)
 {
-	ssize_t size = PERCPU_ENOUGH_ROOM;
+	ssize_t size, old_size;
 	char *ptr;
 	int cpu;
 
@@ -148,7 +148,8 @@ void __init setup_per_cpu_areas(void)
 	setup_cpu_pda_map();
 
 	/* Copy section for each CPU (we discard the original) */
-	size = PERCPU_ENOUGH_ROOM;
+	old_size = PERCPU_ENOUGH_ROOM;
+	size = old_size + per_cpu_dyn_array_size();
 	printk(KERN_INFO "PERCPU: Allocating %zd bytes of per cpu data\n",
 			  size);
 
@@ -176,6 +177,8 @@ void __init setup_per_cpu_areas(void)
 		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 
+		per_cpu_alloc_dyn_array(cpu, ptr + old_size);
+
 	}
 
 	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
Index: linux-2.6/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6/include/asm-generic/vmlinux.lds.h
@@ -220,6 +220,12 @@
 		VMLINUX_SYMBOL(__dyn_array_start) = .;			\
 		*(.dyn_array.init)					\
 		VMLINUX_SYMBOL(__dyn_array_end) = .;			\
+	}								\
+	. = ALIGN((align));						\
+	.per_cpu_dyn_array.init : AT(ADDR(.per_cpu_dyn_array.init) - LOAD_OFFSET) {	\
+		VMLINUX_SYMBOL(__per_cpu_dyn_array_start) = .;		\
+		*(.per_cpu_dyn_array.init)				\
+		VMLINUX_SYMBOL(__per_cpu_dyn_array_end) = .;		\
 	}
 #define SECURITY_INIT							\
 	.security_initcall.init : AT(ADDR(.security_initcall.init) - LOAD_OFFSET) { \
Index: linux-2.6/include/linux/init.h
===================================================================
--- linux-2.6.orig/include/linux/init.h
+++ linux-2.6/include/linux/init.h
@@ -258,12 +258,13 @@ struct dyn_array {
         void (*init_work)(void *);
 };
 extern struct dyn_array *__dyn_array_start[], *__dyn_array_end[];
+extern struct dyn_array *__per_cpu_dyn_array_start[], *__per_cpu_dyn_array_end[];
 
-#define DEFINE_DYN_ARRAY(nameX, sizeX, nrX, alignX, init_workX) \
+#define DEFINE_DYN_ARRAY_ADDR(nameX, addrX, sizeX, nrX, alignX, init_workX) \
 		static struct dyn_array __dyn_array_##nameX __initdata = \
-		{	.name = (void **)&nameX,\
+		{	.name = (void **)&(nameX),\
 			.size = sizeX,\
-			.nr   = &nrX,\
+			.nr   = &(nrX),\
 			.align = alignX,\
 			.init_work = init_workX,\
 		}; \
@@ -271,7 +272,27 @@ extern struct dyn_array *__dyn_array_sta
 		__attribute__((__section__(".dyn_array.init"))) = \
 			&__dyn_array_##nameX
 
+#define DEFINE_DYN_ARRAY(nameX, sizeX, nrX, alignX, init_workX) \
+	DEFINE_DYN_ARRAY_ADDR(nameX, nameX, sizeX, nrX, alignX, init_workX)
+
+#define DEFINE_PER_CPU_DYN_ARRAY_ADDR(nameX, addrX, sizeX, nrX, alignX, init_workX) \
+		static struct dyn_array __per_cpu_dyn_array_##nameX __initdata = \
+		{	.name = (void **)&(addrX),\
+			.size = sizeX,\
+			.nr   = &(nrX),\
+			.align = alignX,\
+			.init_work = init_workX,\
+		}; \
+		static struct dyn_array *__per_cpu_dyn_array_ptr_##nameX __used \
+		__attribute__((__section__(".per_cpu_dyn_array.init"))) = \
+			&__per_cpu_dyn_array_##nameX
+
+#define DEFINE_PER_CPU_DYN_ARRAY(nameX, sizeX, nrX, alignX, init_workX) \
+	DEFINE_PER_CPU_DYN_ARRAY_ADDR(nameX, nameX, sizeX, nrX, alignX, init_workX)
+
 extern void pre_alloc_dyn_array(void);
+extern unsigned long per_cpu_dyn_array_size(void);
+extern void per_cpu_alloc_dyn_array(int cpu, char *ptr);
 #endif /* __ASSEMBLY__ */
 
 /**
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -562,6 +562,63 @@ void pre_alloc_dyn_array(void)
 #endif
 }
 
+unsigned long per_cpu_dyn_array_size(void)
+{
+	unsigned long total_size = 0;
+#ifdef CONFIG_HAVE_DYN_ARRAY
+	unsigned long size;
+	struct dyn_array **daa;
+
+	for (daa = __per_cpu_dyn_array_start ; daa < __per_cpu_dyn_array_end; daa++) {
+		struct dyn_array *da = *daa;
+
+		size = da->size * (*da->nr);
+		print_fn_descriptor_symbol("per_cpu_dyna_array %s ", da->name);
+		printk(KERN_CONT "size:%#lx nr:%d align:%#lx\n",
+			da->size, *da->nr, da->align);
+		total_size += roundup(size, da->align);
+	}
+	if (total_size)
+		printk(KERN_DEBUG "per_cpu_dyna_array total_size: %#lx\n",
+			 total_size);
+#endif
+	return total_size;
+}
+
+void per_cpu_alloc_dyn_array(int cpu, char *ptr)
+{
+#ifdef CONFIG_HAVE_DYN_ARRAY
+	unsigned long size, phys;
+	struct dyn_array **daa;
+	unsigned long addr;
+	void **array;
+
+	phys = virt_to_phys(ptr);
+
+	for (daa = __per_cpu_dyn_array_start ; daa < __per_cpu_dyn_array_end; daa++) {
+		struct dyn_array *da = *daa;
+
+		size = da->size * (*da->nr);
+		print_fn_descriptor_symbol("per_cpu_dyna_array %s ", da->name);
+		printk(KERN_CONT "size:%#lx nr:%d align:%#lx",
+			da->size, *da->nr, da->align);
+
+		phys = roundup(phys, da->align);
+		addr = (unsigned long)da->name;
+		addr += per_cpu_offset(cpu);
+		array = (void **)addr;
+		*array = phys_to_virt(phys);
+		*da->name = *array; /* so init_work could use it directly */
+		printk(KERN_CONT " %p ==> [%#lx - %#lx]\n", array, phys, phys + size);
+		phys += size;
+
+		if (da->init_work) {
+			da->init_work(da);
+		}
+	}
+#endif
+}
+
 asmlinkage void __init start_kernel(void)
 {
 	char * command_line;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 10/16] irq: make irqs in kernel stat use per_cpu_dyn_array
  2008-08-01  9:37                 ` [PATCH 09/16] add per_cpu_dyn_array support Yinghai Lu
@ 2008-08-01  9:37                   ` Yinghai Lu
  2008-08-01  9:37                     ` [PATCH 11/16] x86 remove irq_vectors_limit.h Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 include/linux/kernel_stat.h |    4 ++++
 kernel/sched.c              |    5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/kernel_stat.h
===================================================================
--- linux-2.6.orig/include/linux/kernel_stat.h
+++ linux-2.6/include/linux/kernel_stat.h
@@ -28,7 +28,11 @@ struct cpu_usage_stat {
 
 struct kernel_stat {
 	struct cpu_usage_stat	cpustat;
+#ifdef CONFIG_HAVE_DYN_ARRAY
+	unsigned int *irqs;
+#else
 	unsigned int irqs[NR_IRQS];
+#endif
 };
 
 DECLARE_PER_CPU(struct kernel_stat, kstat);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4021,9 +4021,12 @@ static inline void idle_balance(int cpu,
 #endif
 
 DEFINE_PER_CPU(struct kernel_stat, kstat);
-
 EXPORT_PER_CPU_SYMBOL(kstat);
 
+#ifdef CONFIG_HAVE_DYN_ARRAY
+DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);
+#endif
+
 /*
  * Return p->sum_exec_runtime plus any more ns on the sched_clock
  * that have not yet been banked in case the task is currently running.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 11/16] x86 remove irq_vectors_limit.h
  2008-08-01  9:37                   ` [PATCH 10/16] irq: make irqs in kernel stat use per_cpu_dyn_array Yinghai Lu
@ 2008-08-01  9:37                     ` Yinghai Lu
  2008-08-01  9:37                       ` [PATCH 12/16] x86: make 32bit use dyn_array Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

No users remain for these headers, so remove both copies.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 include/asm-x86/mach-generic/irq_vectors_limits.h |   14 --------------
 include/asm-x86/summit/irq_vectors_limits.h       |   14 --------------
 2 files changed, 28 deletions(-)

Index: linux-2.6/include/asm-x86/mach-generic/irq_vectors_limits.h
===================================================================
--- linux-2.6.orig/include/asm-x86/mach-generic/irq_vectors_limits.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef ASM_X86__MACH_GENERIC__IRQ_VECTORS_LIMITS_H
-#define ASM_X86__MACH_GENERIC__IRQ_VECTORS_LIMITS_H
-
-/*
- * For Summit or generic (i.e. installer) kernels, we have lots of I/O APICs,
- * even with uni-proc kernels, so use a big array.
- *
- * This value should be the same in both the generic and summit subarches.
- * Change one, change 'em both.
- */
-#define NR_IRQS	224
-#define NR_IRQ_VECTORS	1024
-
-#endif /* ASM_X86__MACH_GENERIC__IRQ_VECTORS_LIMITS_H */
Index: linux-2.6/include/asm-x86/summit/irq_vectors_limits.h
===================================================================
--- linux-2.6.orig/include/asm-x86/summit/irq_vectors_limits.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef _ASM_IRQ_VECTORS_LIMITS_H
-#define _ASM_IRQ_VECTORS_LIMITS_H
-
-/*
- * For Summit or generic (i.e. installer) kernels, we have lots of I/O APICs,
- * even with uni-proc kernels, so use a big array.
- *
- * This value should be the same in both the generic and summit subarches.
- * Change one, change 'em both.
- */
-#define NR_IRQS	224
-#define NR_IRQ_VECTORS	1024
-
-#endif /* _ASM_IRQ_VECTORS_LIMITS_H */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 12/16] x86: make 32bit use dyn_array
  2008-08-01  9:37                     ` [PATCH 11/16] x86 remove irq_vectors_limit.h Yinghai Lu
@ 2008-08-01  9:37                       ` Yinghai Lu
  2008-08-01  9:37                         ` [PATCH 13/16] add per_cpu_dyn_array for arch percpu support Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/x86/Kconfig                 |    2 +-
 arch/x86/kernel/vmlinux_32.lds.S |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -33,7 +33,7 @@ config X86
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_GENERIC_DMA_COHERENT if X86_32
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
-	select HAVE_DYN_ARRAY if X86_64
+	select HAVE_DYN_ARRAY
 
 config ARCH_DEFCONFIG
 	string
Index: linux-2.6/arch/x86/kernel/vmlinux_32.lds.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/vmlinux_32.lds.S
+++ linux-2.6/arch/x86/kernel/vmlinux_32.lds.S
@@ -145,6 +145,7 @@ SECTIONS
 	*(.x86cpuvendor.init)
 	__x86cpuvendor_end = .;
   }
+  DYN_ARRAY_INIT(8)
   SECURITY_INIT
   . = ALIGN(4);
   .altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 13/16] add per_cpu_dyn_array for arch percpu support
  2008-08-01  9:37                       ` [PATCH 12/16] x86: make 32bit use dyn_array Yinghai Lu
@ 2008-08-01  9:37                         ` Yinghai Lu
  2008-08-01  9:37                           ` [PATCH 14/16] x86: get mp_irqs from madt Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 init/main.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -394,17 +394,19 @@ EXPORT_SYMBOL(__per_cpu_offset);
 
 static void __init setup_per_cpu_areas(void)
 {
-	unsigned long size, i;
+	unsigned long size, i, old_size;
 	char *ptr;
 	unsigned long nr_possible_cpus = num_possible_cpus();
 
 	/* Copy section for each CPU (we discard the original) */
-	size = ALIGN(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
+	old_size = PERCPU_ENOUGH_ROOM;
+	size = ALIGN(old_size + per_cpu_dyn_array_size(), PAGE_SIZE);
 	ptr = alloc_bootmem_pages(size * nr_possible_cpus);
 
 	for_each_possible_cpu(i) {
 		__per_cpu_offset[i] = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		per_cpu_alloc_dyn_array(i, ptr + old_size);
 		ptr += size;
 	}
 }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 14/16] x86: get mp_irqs from madt
  2008-08-01  9:37                         ` [PATCH 13/16] add per_cpu_dyn_array for arch percpu support Yinghai Lu
@ 2008-08-01  9:37                           ` Yinghai Lu
  2008-08-01  9:37                             ` [PATCH 15/16] x86: make 32bit more like with io_apic/dyn_array to 64 bit Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
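The sizing rule in get_nr_irqs_via_madt() can be sketched as a standalone
function: take the highest gsi_end over all ioapics, add one, double it, and
never go below 32. The function name and the plain int array are hypothetical
stand-ins for mp_ioapic_routing[], and the sketch assumes gsi_end holds the
last GSI served by each ioapic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space restatement of the heuristic in this patch.
 * gsi_end[] stands in for mp_ioapic_routing[idx].gsi_end.
 */
static int nr_irqs_from_gsi_end(const int *gsi_end, int nr_ioapics)
{
	int idx;
	int nr = 0;

	assert(nr_ioapics == 0 || gsi_end != NULL);

	/* find the highest GSI served by any ioapic */
	for (idx = 0; idx < nr_ioapics; idx++) {
		if (gsi_end[idx] > nr)
			nr = gsi_end[idx];
	}

	/* turn the highest index into a count */
	nr++;

	/* double it to leave room for hotplug, msi and nmi */
	nr <<= 1;

	/* floor, in case the table looked wrong */
	if (nr < 32)
		nr = 32;

	return nr;
}
```

Under that assumption, a single ioapic whose gsi_end is 23 gives 48, and a
second one ending at 48 gives 98.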
 arch/x86/kernel/acpi/boot.c |   30 ++++++++++++++++++++++++++++--
 include/asm-x86/mpspec.h    |    1 +
 2 files changed, 29 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -962,6 +962,29 @@ void __init mp_register_ioapic(int id, u
 	nr_ioapics++;
 }
 
+int get_nr_irqs_via_madt(void)
+{
+	int idx;
+	int nr = 0;
+
+	for (idx = 0; idx < nr_ioapics; idx ++) {
+		if (mp_ioapic_routing[idx].gsi_end > nr)
+			nr = mp_ioapic_routing[idx].gsi_end;
+	}
+
+	nr++;
+
+	/* double it for hotplug and msi and nmi */
+	nr <<= 1;
+
+	/* something wrong ? */
+	if (nr < 32)
+		nr = 32;
+
+	return nr;
+
+}
+
 static void assign_to_mp_irq(struct mp_config_intsrc *m,
 				    struct mp_config_intsrc *mp_irq)
 {
@@ -1259,9 +1282,12 @@ static int __init acpi_parse_madt_ioapic
 		return count;
 	}
 
+
+	nr_irqs = get_nr_irqs_via_madt();
+
 	count =
 	    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr,
-				  NR_IRQ_VECTORS);
+				  nr_irqs);
 	if (count < 0) {
 		printk(KERN_ERR PREFIX
 		       "Error parsing interrupt source overrides entry\n");
@@ -1281,7 +1307,7 @@ static int __init acpi_parse_madt_ioapic
 
 	count =
 	    acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_parse_nmi_src,
-				  NR_IRQ_VECTORS);
+				  nr_irqs);
 	if (count < 0) {
 		printk(KERN_ERR PREFIX "Error parsing NMI SRC entry\n");
 		/* TBD: Cleanup to allow fallback to MPS */
Index: linux-2.6/include/asm-x86/mpspec.h
===================================================================
--- linux-2.6.orig/include/asm-x86/mpspec.h
+++ linux-2.6/include/asm-x86/mpspec.h
@@ -59,6 +59,7 @@ extern void mp_override_legacy_irq(u8 bu
 				   u32 gsi);
 extern void mp_config_acpi_legacy_irqs(void);
 extern int mp_register_gsi(u32 gsi, int edge_level, int active_high_low);
+extern int get_nr_irqs_via_madt(void);
 #ifdef CONFIG_X86_IO_APIC
 extern int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
 				u32 gsi, int triggering, int polarity);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 15/16] x86: make 32bit more like with io_apic/dyn_array to 64 bit
  2008-08-01  9:37                           ` [PATCH 14/16] x86: get mp_irqs from madt Yinghai Lu
@ 2008-08-01  9:37                             ` Yinghai Lu
  2008-08-01  9:37                               ` [PATCH 16/16] x86: alloc dyn_array all alltogether Yinghai Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

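Convert the 32-bit io_apic arrays (irq_2_pin, balance_irq_affinity, irq_vector)
to dyn_array, as 64-bit already does.

Also remove NR_IRQ_VECTORS.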

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/x86/kernel/io_apic_32.c  |   53 +++++++++++++++++++++++++++++++++---------
 arch/x86/kernel/setup.c       |    4 +--
 include/asm-x86/irq_vectors.h |    2 -
 3 files changed, 44 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/kernel/io_apic_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_32.c
+++ linux-2.6/arch/x86/kernel/io_apic_32.c
@@ -70,7 +70,7 @@ int timer_through_8259 __initdata;
  */
 int sis_apic_bug = -1;
 
-int first_free_entry = NR_IRQS;
+int first_free_entry;
 /*
  * # of IRQ routing registers
  */
@@ -98,10 +98,7 @@ static int disable_timer_pin_1 __initdat
  * Rough estimation of how many shared IRQs there are, can
  * be changed anytime.
  */
-#define MAX_PLUS_SHARED_IRQS NR_IRQS
-#define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + NR_IRQS)
-
-int pin_map_size = PIN_MAP_SIZE;
+int pin_map_size;
 
 /*
  * This is performance-critical, we want to do it O(1)
@@ -112,7 +109,9 @@ int pin_map_size = PIN_MAP_SIZE;
 
 static struct irq_pin_list {
 	int apic, pin, next;
-} irq_2_pin[PIN_MAP_SIZE];
+} *irq_2_pin;
+
+DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct irq_pin_list), pin_map_size, 16, NULL);
 
 struct io_apic {
 	unsigned int index;
@@ -403,9 +402,27 @@ static struct irq_cpu_info {
 
 #define CPU_TO_PACKAGEINDEX(i) (first_cpu(per_cpu(cpu_sibling_map, i)))
 
-static cpumask_t balance_irq_affinity[NR_IRQS] = {
-	[0 ... NR_IRQS-1] = CPU_MASK_ALL
-};
+static cpumask_t balance_irq_affinity_init = CPU_MASK_ALL;
+
+static cpumask_t *balance_irq_affinity;
+
+
+static void __init irq_affinity_init_work(void *data)
+{
+        struct dyn_array *da = data;
+
+        int i;
+        struct  balance_irq_affinity *affinity;
+
+        affinity = *da->name;
+
+        for (i = 0; i < *da->nr; i++)
+                memcpy(&affinity[i], &balance_irq_affinity_init, sizeof(struct balance_irq_affinity));
+
+}
+
+DEFINE_DYN_ARRAY(balance_irq_affinity, sizeof(struct balance_irq_affinity), nr_irqs, PAGE_SIZE, irq_affinity_init_work);
+
 
 void set_balance_irq_affinity(unsigned int irq, cpumask_t mask)
 {
@@ -1170,14 +1187,28 @@ static inline int IO_APIC_irq_trigger(in
 }
 
 /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */
-static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 };
+static u8 irq_vector_init_first = FIRST_DEVICE_VECTOR;
+static u8 *irq_vector;
+
+static void __init irq_vector_init_work(void *data)
+{
+	struct dyn_array *da = data;
+
+	u8 *irq_vec;
+
+	irq_vec = *da->name;
+
+	irq_vec[0] = irq_vector_init_first;
+}
+
+DEFINE_DYN_ARRAY(irq_vector, sizeof(u8), nr_irqs, PAGE_SIZE, irq_vector_init_work);
 
 static int __assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, current_offset;
 	int vector, offset;
 
-	BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);
+	BUG_ON((unsigned)irq >= nr_irqs);
 
 	if (irq_vector[irq] > 0)
 		return irq_vector[irq];
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -861,10 +861,10 @@ void __init setup_arch(char **cmdline_p)
 	/* need to wait for nr_cpu_ids settle down */
 	if (nr_irqs == NR_IRQS)
 		nr_irqs = 32 * nr_cpu_ids + 224;
-	pin_map_size = nr_irqs * 2;
-	first_free_entry = nr_irqs;
 	init_cpu_to_node();
 #endif
+	pin_map_size = nr_irqs * 2;
+	first_free_entry = nr_irqs;
 
 	init_apic_mappings();
 	ioapic_init_mappings();
Index: linux-2.6/include/asm-x86/irq_vectors.h
===================================================================
--- linux-2.6.orig/include/asm-x86/irq_vectors.h
+++ linux-2.6/include/asm-x86/irq_vectors.h
@@ -131,8 +131,6 @@
 
 #endif /* VISWS */
 
-#define NR_IRQ_VECTORS		NR_IRQS
-
 /* Voyager specific defines */
 /* These define the CPIs we use in linux */
 #define VIC_CPI_LEVEL0			0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 16/16] x86: alloc dyn_array all alltogether
  2008-08-01  9:37                             ` [PATCH 15/16] x86: make 32bit more like with io_apic/dyn_array to 64 bit Yinghai Lu
@ 2008-08-01  9:37                               ` Yinghai Lu
  0 siblings, 0 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01  9:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, hpa, Eric Biederman, Dhaval Giani,
	Mike Travis, Andrew Morton
  Cc: linux-kernel, Yinghai Lu

Also tighten the alignment checking and print less output.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
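A note on sizing, as a sketch only: the first pass below sums
roundup(size, align) per entry, while the placement pass rounds the running
address up before each entry. The two totals can differ when an entry with a
small alignment comes before one with a larger alignment. One way to keep
them in agreement is to size with the same walk that places. The struct and
function names here are hypothetical and stand in for struct dyn_array and
the loops in pre_alloc_dyn_array().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dyn_array: element size, count, alignment. */
struct da_sketch {
	size_t size;
	size_t nr;
	size_t align;
	size_t offset;	/* filled in by da_layout() */
};

static size_t round_up_to(size_t x, size_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Assign each entry an offset from a base that is aligned to *max_align.
 * Returns the number of bytes the whole block needs.
 */
static size_t da_layout(struct da_sketch *da, size_t n, size_t *max_align)
{
	size_t off = 0;
	size_t i;

	*max_align = 1;
	for (i = 0; i < n; i++) {
		assert(da[i].align > 0);
		off = round_up_to(off, da[i].align);	/* same rounding as placement */
		da[i].offset = off;
		off += da[i].size * da[i].nr;
		if (da[i].align > *max_align)
			*max_align = da[i].align;
	}
	return off;
}
```

With a 3-byte entry aligned to 1 followed by a 16-byte entry aligned to 16,
this walk reports 32 bytes, where summing the rounded sizes would give 19.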
 arch/x86/kernel/setup_percpu.c |   16 ++++++----
 include/linux/init.h           |    2 -
 init/main.c                    |   65 +++++++++++++++++++++++++++++++----------
 3 files changed, 62 insertions(+), 21 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6/arch/x86/kernel/setup_percpu.c
@@ -140,26 +140,31 @@ static void __init setup_cpu_pda_map(voi
  */
 void __init setup_per_cpu_areas(void)
 {
-	ssize_t size, old_size;
+	ssize_t size, old_size, da_size;
 	char *ptr;
 	int cpu;
+	unsigned long align = 1;
 
 	/* Setup cpu_pda map */
 	setup_cpu_pda_map();
 
 	/* Copy section for each CPU (we discard the original) */
 	old_size = PERCPU_ENOUGH_ROOM;
-	size = old_size + per_cpu_dyn_array_size();
+	da_size = per_cpu_dyn_array_size(&align);
+	align = max_t(unsigned long, PAGE_SIZE, align);
+	size = roundup(old_size + da_size, align);
 	printk(KERN_INFO "PERCPU: Allocating %zd bytes of per cpu data\n",
 			  size);
 
 	for_each_possible_cpu(cpu) {
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-		ptr = alloc_bootmem_pages(size);
+		ptr = __alloc_bootmem_nopanic(size, align,
+				 __pa(MAX_DMA_ADDRESS));
 #else
 		int node = early_cpu_to_node(cpu);
 		if (!node_online(node) || !NODE_DATA(node)) {
-			ptr = alloc_bootmem_pages(size);
+			ptr = __alloc_bootmem_nopanic(size, align,
+					 __pa(MAX_DMA_ADDRESS));
 			printk(KERN_INFO
 			       "cpu %d has no node %d or node-local memory\n",
 				cpu, node);
@@ -168,7 +173,8 @@ void __init setup_per_cpu_areas(void)
 					 cpu, __pa(ptr));
 		}
 		else {
-			ptr = alloc_bootmem_pages_node(NODE_DATA(node), size);
+			ptr = __alloc_bootmem_node(NODE_DATA(node), size, align,
+							__pa(MAX_DMA_ADDRESS));
 			if (ptr)
 				printk(KERN_DEBUG "per cpu data for cpu%d on node%d at %016lx\n",
 					 cpu, node, __pa(ptr));
Index: linux-2.6/include/linux/init.h
===================================================================
--- linux-2.6.orig/include/linux/init.h
+++ linux-2.6/include/linux/init.h
@@ -291,7 +291,7 @@ extern struct dyn_array *__per_cpu_dyn_a
 	DEFINE_PER_CPU_DYN_ARRAY_ADDR(nameX, nameX, sizeX, nrX, alignX, init_workX)
 
 extern void pre_alloc_dyn_array(void);
-extern unsigned long per_cpu_dyn_array_size(void);
+extern unsigned long per_cpu_dyn_array_size(unsigned long *align);
 extern void per_cpu_alloc_dyn_array(int cpu, char *ptr);
 #endif /* __ASSEMBLY__ */
 
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -397,10 +397,14 @@ static void __init setup_per_cpu_areas(v
 	unsigned long size, i, old_size;
 	char *ptr;
 	unsigned long nr_possible_cpus = num_possible_cpus();
+	unsigned long align = 1;
+	unsigned da_size;
 
 	/* Copy section for each CPU (we discard the original) */
 	old_size = PERCPU_ENOUGH_ROOM;
-	size = ALIGN(old_size + per_cpu_dyn_array_size(), PAGE_SIZE);
+	da_size = per_cpu_dyn_array_size(&align);
+	align = max_t(unsigned long, PAGE_SIZE, align);
+	size = ALIGN(old_size + da_size, align);
 	ptr = alloc_bootmem_pages(size * nr_possible_cpus);
 
 	for_each_possible_cpu(i) {
@@ -544,45 +548,78 @@ void __init __weak thread_info_cache_ini
 void pre_alloc_dyn_array(void)
 {
 #ifdef CONFIG_HAVE_DYN_ARRAY
-	unsigned long size, phys = 0;
+	unsigned long total_size = 0, size, phys;
+	unsigned long max_align = 1;
 	struct dyn_array **daa;
+	char *ptr;
 
+	/* get the total size at first */
 	for (daa = __dyn_array_start ; daa < __dyn_array_end; daa++) {
 		struct dyn_array *da = *daa;
 
 		size = da->size * (*da->nr);
-		print_fn_descriptor_symbol("dyna_array %s ", da->name);
-		printk(KERN_CONT "size:%#lx nr:%d align:%#lx",
+		print_fn_descriptor_symbol("dyn_array %s ", da->name);
+		printk(KERN_CONT "size:%#lx nr:%d align:%#lx\n",
 			da->size, *da->nr, da->align);
-		*da->name = __alloc_bootmem_nopanic(size, da->align, phys);
-		phys = virt_to_phys(*da->name);
+		total_size += roundup(size, da->align);
+		if (da->align > max_align)
+			max_align = da->align;
+	}
+	if (total_size)
+		printk(KERN_DEBUG "dyn_array total_size: %#lx\n",
+			 total_size);
+	else
+		return;
+
+	/* allocate them all together */
+	max_align = max_t(unsigned long, max_align, PAGE_SIZE);
+	ptr = __alloc_bootmem_nopanic(total_size, max_align, 0);
+	if (!ptr)
+		panic("Cannot allocate dyn_array\n");
+
+	phys = virt_to_phys(ptr);
+	for (daa = __dyn_array_start ; daa < __dyn_array_end; daa++) {
+		struct dyn_array *da = *daa;
+
+		size = da->size * (*da->nr);
+		print_fn_descriptor_symbol("dyn_array %s ", da->name);
+
+		phys = roundup(phys, da->align);
+		*da->name = phys_to_virt(phys);
 		printk(KERN_CONT " ==> [%#lx - %#lx]\n", phys, phys + size);
 
+		phys += size;
+
 		if (da->init_work)
 			da->init_work(da);
 	}
 #endif
 }
 
-unsigned long per_cpu_dyn_array_size(void)
+unsigned long per_cpu_dyn_array_size(unsigned long *align)
 {
 	unsigned long total_size = 0;
 #ifdef CONFIG_HAVE_DYN_ARRAY
 	unsigned long size;
 	struct dyn_array **daa;
+	unsigned max_align = 1;
 
 	for (daa = __per_cpu_dyn_array_start ; daa < __per_cpu_dyn_array_end; daa++) {
 		struct dyn_array *da = *daa;
 
 		size = da->size * (*da->nr);
-		print_fn_descriptor_symbol("per_cpu_dyna_array %s ", da->name);
+		print_fn_descriptor_symbol("per_cpu_dyn_array %s ", da->name);
 		printk(KERN_CONT "size:%#lx nr:%d align:%#lx\n",
 			da->size, *da->nr, da->align);
 		total_size += roundup(size, da->align);
+		if (da->align > max_align)
+			max_align = da->align;
 	}
-	if (total_size)
-		printk(KERN_DEBUG "per_cpu_dyna_array total_size: %#lx\n",
+	if (total_size) {
+		printk(KERN_DEBUG "per_cpu_dyn_array total_size: %#lx\n",
 			 total_size);
+		*align = max_align;
+	}
 #endif
 	return total_size;
 }
@@ -596,14 +633,11 @@ void per_cpu_alloc_dyn_array(int cpu, ch
 	void **array;
 
 	phys = virt_to_phys(ptr);
-
 	for (daa = __per_cpu_dyn_array_start ; daa < __per_cpu_dyn_array_end; daa++) {
 		struct dyn_array *da = *daa;
 
 		size = da->size * (*da->nr);
-		print_fn_descriptor_symbol("per_cpu_dyna_array %s ", da->name);
-		printk(KERN_CONT "size:%#lx nr:%d align:%#lx",
-			da->size, *da->nr, da->align);
+		print_fn_descriptor_symbol("per_cpu_dyn_array %s ", da->name);
 
 		phys = roundup(phys, da->align);
 		addr = (unsigned long)da->name;
@@ -611,7 +645,8 @@ void per_cpu_alloc_dyn_array(int cpu, ch
 		array = (void **)addr;
 		*array = phys_to_virt(phys);
 		*da->name = *array; /* so init_work could use it directly */
-		printk(KERN_CONT " %p ==> [%#lx - %#lx]\n", array, phys, phys + size);
+		printk(KERN_CONT " ==> [%#lx - %#lx]\n", phys, phys + size);
+
 		phys += size;
 
 		if (da->init_work) {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01  9:37 [PATCH 00/16] dyn_array and nr_irqs support v2 Yinghai Lu
  2008-08-01  9:37 ` [PATCH 01/16] x86: 64bit support more than 256 irq Yinghai Lu
@ 2008-08-01 20:46 ` Eric W. Biederman
  2008-08-01 21:30   ` Yinghai Lu
                     ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-01 20:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

Yinghai Lu <yhlu.kernel@gmail.com> writes:

> Please check dyn_array support for x86

YH, you have not addressed any of my core concerns, and this exceeds my review limit.
Unfortunately, I don't feel like this is a productive process.

My core concerns are:
- You have not separated out and separately pushed the regression patch so that we can
  fix the current rc release.  Simply tuning NR_IRQS is all I feel comfortable with for
  fixing things in the post merge window period.

- The generic code has no business dealing with NR_IRQS sized arrays.
  Since we don't have a generic problem I don't see why we should have a generic dyn_array solution.

- The dyn_array infrastructure does not provide for per numa node allocation of
  irq_desc structures, limiting NUMA scalability.

- You appear to be papering over problems instead of digging in and actually fixing them.

YH, here is what I was suggesting when the topic of killing NR_IRQS came up a week or so
ago.
http://lkml.org/lkml/2008/7/10/439
http://lkml.org/lkml/2008/7/10/532

Which essentially boils down to:
- Removing NR_IRQS from the non-irq infrastructure code.
- Add a config option for architectures that are not going to use an array
- In the genirq code have a lookup function that goes from irq number to irq_desc *.
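A minimal sketch of the third point, assuming a sparse backing store: callers
only ever go through the lookup function, so whether the descriptors live in
an array or in a list is hidden from them. The struct and function names are
hypothetical stand-ins, not existing genirq interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct irq_desc; only what the sketch needs. */
struct irq_desc_sketch {
	unsigned int irq;
	struct irq_desc_sketch *next;
};

/* Sparse backing store: a singly linked list of registered descriptors. */
static struct irq_desc_sketch *irq_desc_list;

static void irq_desc_add(struct irq_desc_sketch *desc)
{
	assert(desc != NULL);
	desc->next = irq_desc_list;
	irq_desc_list = desc;
}

/* Map an irq number to its descriptor, or NULL if none is registered. */
static struct irq_desc_sketch *irq_to_desc_sketch(unsigned int irq)
{
	struct irq_desc_sketch *desc;

	for (desc = irq_desc_list; desc; desc = desc->next) {
		if (desc->irq == irq)
			return desc;
	}
	return NULL;
}
```

An architecture that keeps a flat array could implement the same lookup as a
bounds check plus an index, without any caller changing.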

The rest we should be able to handle in an arch-dependent fashion.

When we are done we should be able to create a stable irq number for msi interrupts
that is something like:  bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.
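As a rough illustration of that layout, assuming the fields are simply packed
from most to least significant in the order written (the helper names are
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Field widths as given above: bus 8, dev 5, fun 3, vector_no 12. */
#define MSI_BUS_BITS	8
#define MSI_DEV_BITS	5
#define MSI_FUN_BITS	3
#define MSI_VEC_BITS	12

/* Hypothetical helper: pack bus:dev:fun:vector_no into a 28-bit number. */
static uint32_t msi_irq_pack(unsigned int bus, unsigned int dev,
			     unsigned int fun, unsigned int vec)
{
	assert(bus < (1u << MSI_BUS_BITS));
	assert(dev < (1u << MSI_DEV_BITS));
	assert(fun < (1u << MSI_FUN_BITS));
	assert(vec < (1u << MSI_VEC_BITS));

	return ((uint32_t)bus << (MSI_DEV_BITS + MSI_FUN_BITS + MSI_VEC_BITS)) |
	       ((uint32_t)dev << (MSI_FUN_BITS + MSI_VEC_BITS)) |
	       ((uint32_t)fun << MSI_VEC_BITS) |
	       (uint32_t)vec;
}

/* Hypothetical helper: recover the vector_no field. */
static unsigned int msi_irq_vec(uint32_t irq)
{
	return irq & ((1u << MSI_VEC_BITS) - 1);
}
```

The number stays the same for a given device function and vector slot, which
is the point of calling it stable.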

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
@ 2008-08-01 21:30   ` Yinghai Lu
  2008-08-01 21:57     ` Yinghai Lu
                       ` (2 more replies)
  2008-08-01 21:47   ` Mike Travis
  2008-08-02  2:58   ` Yinghai Lu
  2 siblings, 3 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01 21:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yinghai Lu <yhlu.kernel@gmail.com> writes:
>
>> Please check dyn_array support for x86
>
> YH you have not addressed any of my core concerns and this exceeds my review limit.

i mean drivers/serial/8250.c

> Unfortunately I don't feel like this is a productive process.
>
> My core concerns are:
> - You have not separated out and separately pushed the regression patch.  So that we can
>  fix the current rc release.  Simply tuning NR_IRQS is all I feel comfortable with for
>  fixing things in the post merge window period.

Increase NR_IRQS to 512 for x86_64?

>
> - The generic code has no business with dealing with NR_IRQS sized arrays.
>  Since we don't have a generic problem I don't see why we should have a generic dyn_array solution.
Besides the arch/x86 users:

arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
irq_pin_list), pin_map_size, 16, NULL);
arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(balance_irq_affinity,
sizeof(struct balance_irq_affinity), nr_irqs, PAGE_SIZE,
irq_affinity_init_work);
arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_vector, sizeof(u8),
nr_irqs, PAGE_SIZE, irq_vector_init_work);
arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_cfg, sizeof(struct
irq_cfg), nr_irqs, PAGE_SIZE, init_work);
arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
irq_pin_list), pin_map_size, sizeof(struct irq_pin_list), NULL);

kernel/sched.c:DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs,
per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned
long), NULL);

kstat.irqs is the killer: every CPU has its own copy, so it is effectively [NR_CPUS][NR_IRQS].
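A back-of-the-envelope sketch of that cost. The helper names are hypothetical;
the 32 * nr_cpu_ids + 224 default is the one setup.c uses in this series, and
98 is taken as an example of a MADT-probed value.

```c
#include <assert.h>
#include <stddef.h>

/* Default nr_irqs used in this series when nothing better is known. */
static size_t nr_irqs_default(size_t nr_cpu_ids)
{
	return 32 * nr_cpu_ids + 224;
}

/* Total bytes of per-CPU irq counters: one unsigned int per (cpu, irq). */
static size_t kstat_irqs_bytes(size_t nr_cpus, size_t nr_irqs)
{
	assert(nr_cpus > 0);
	return nr_cpus * nr_irqs * sizeof(unsigned int);
}
```

For 64 CPUs the default gives 2272 irqs, so the counters cover 64 * 2272
entries; with nr_irqs probed as 98 they cover 64 * 98 entries.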

>
> - The dyn_array infrastructure does not provide for per numa node allocation of
>  irq_desc structures, limiting NUMA scalability.

Do you plan to move irq_desc when irq affinity is set to CPUs on another node?

Something like DEFINE_PER_NODE_DYN_ARRAY?

>
> - You appear to be papering over problems instead of digging in and actually fixing them.

Using dyn_array is less intrusive at this point, and the dyn_array related
code is not big.
It is only the NR_IRQS to nr_irqs conversion that makes the patches bigger, and that part is simple.

With acpi_madt probing, nr_irqs is much smaller, like 48 or 98, while the
current value is a macro, 224 or 256.

>
> YH  Here is what I was suggesting when the topic of killing NR_IRQs came up a week or so
> ago.
> http://lkml.org/lkml/2008/7/10/439
> http://lkml.org/lkml/2008/7/10/532
>
> Which essentially boils down to:
> - Removing NR_IRQS from the non-irq infrastructure code.
> - Add a config option for architectures that are not going to use an array
> - In the genirq code have a lookup function that goes from irq number to irq_desc *.

So we need a pointer array behind that lookup function? What would the
index size of that pointer array be?
Or would the lookup function use a list?

How about the percpu kstat.irqs?

>
> The rest we should be able to handle in a arch dependent fashion.
>
> When we are done we should be able to create a stable irq number for msi interrupts
> that is something like:  bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.

How about irq migration from one CPU to another with a different vector_no?

YH

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
  2008-08-01 21:30   ` Yinghai Lu
@ 2008-08-01 21:47   ` Mike Travis
  2008-08-02  2:58   ` Yinghai Lu
  2 siblings, 0 replies; 39+ messages in thread
From: Mike Travis @ 2008-08-01 21:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani,
	Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> Yinghai Lu <yhlu.kernel@gmail.com> writes:
> 
>> Please check dyn_array support for x86
> 
> YH you have not addressed any of my core concerns and this exceeds my review limit.
> Unfortunately I don't feel like this is a productive process.
> 
> My core concerns are:
> - You have not separated out and separately pushed the regression patch.  So that we can
>   fix the current rc release.  Simply tuning NR_IRQS is all I feel comfortable with for
>   fixing things in the post merge window period.
> 
> - The generic code has no business with dealing with NR_IRQS sized arrays.
>   Since we don't have a generic problem I don't see why we should have a generic dyn_array solution.
> 
> - The dyn_array infrastructure does not provide for per numa node allocation of
>   irq_desc structures, limiting NUMA scalability.
> 
> - You appear to be papering over problems instead of digging in and actually fixing them.
> 
> YH  Here is what I was suggesting when the topic of killing NR_IRQs came up a week or so
> ago.
> http://lkml.org/lkml/2008/7/10/439
> http://lkml.org/lkml/2008/7/10/532
> 
> Which essentially boils down to:
> - Removing NR_IRQS from the non-irq infrastructure code.
> - Add a config option for architectures that are not going to use an array
> - In the genirq code have a lookup function that goes from irq number to irq_desc *.
> 
> The rest we should be able to handle in a arch dependent fashion.
> 
> When we are done we should be able to create a stable irq number for msi interrupts
> that is something like:  bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.
> 
> Eric

Hi Eric,

Small nit:  domain:bus:dev:fun:vector_no ... an SGI UV system can have potentially
512 domains (NODES), each having some # of busses.  

Thanks,
Mike


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 21:30   ` Yinghai Lu
@ 2008-08-01 21:57     ` Yinghai Lu
  2008-08-01 22:45       ` Eric W. Biederman
  2008-08-01 22:10     ` Yinghai Lu
  2008-08-01 22:38     ` Eric W. Biederman
  2 siblings, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01 21:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

On Fri, Aug 1, 2008 at 2:30 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:

>> http://lkml.org/lkml/2008/7/10/439

you moved kstat_irqs into irq_desc, and it will not be numa-aware if
irq_desc does not go with every cpu.

YH


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 21:30   ` Yinghai Lu
  2008-08-01 21:57     ` Yinghai Lu
@ 2008-08-01 22:10     ` Yinghai Lu
  2008-08-01 22:38     ` Eric W. Biederman
  2 siblings, 0 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-01 22:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

On Fri, Aug 1, 2008 at 2:30 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:

>> http://lkml.org/lkml/2008/7/10/532

so using rcu to get irq_desc.

YH


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 21:30   ` Yinghai Lu
  2008-08-01 21:57     ` Yinghai Lu
  2008-08-01 22:10     ` Yinghai Lu
@ 2008-08-01 22:38     ` Eric W. Biederman
  2008-08-02  1:09       ` Yinghai Lu
  2 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-01 22:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Yinghai Lu <yhlu.kernel@gmail.com> writes:
>>
>>> Please check dyn_array support for x86
>>
>> YH you have not addressed any of my core concerns and this exceeds my review
> limit.
>
> i mean drivers/serial/8250.c

Still not based on UART_NR.  Although Alan said he would take a look at it
next week, because he thinks this is important work.

>> Unfortunately I don't feel like this is a productive process.
>>
>> My core concerns are:
>> - You have not separated out and separately pushed the regression patch.  So
> that we can
>> fix the current rc release.  Simply tuning NR_IRQS is all I feel comfortable
> with for
>>  fixing things in the post merge window period.
>
> Increase NR_IRQS to 512 for x86_64?

x86_32 has it set to 1024 so 512 is too small.  I think your patch
which essentially restores the old behavior is the right way to go for
this merge window.  I just want to carefully look at it and ensure we
are restoring the old heuristics.  On a lot of large machines we wind
up having irqs for pci slots that are never filled with cards.

>> - The generic code has no business with dealing with NR_IRQS sized arrays.
>> Since we don't have a generic problem I don't see why we should have a generic
> dyn_array solution.
> besides
>
> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
> irq_pin_list), pin_map_size, 16, NULL);
> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(balance_irq_affinity,
> sizeof(struct balance_irq_affinity), nr_irqs, PAGE_SIZE,
> irq_affinity_init_work);
> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_vector, sizeof(u8),
> nr_irqs, PAGE_SIZE, irq_vector_init_work);
> arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_cfg, sizeof(struct
> irq_cfg), nr_irqs, PAGE_SIZE, init_work);
> arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
> irq_pin_list), pin_map_size, sizeof(struct irq_pin_list), NULL);

You have noticed how many of those arrays I have collapsed into irq_cfg
on x86_64.  We can ultimately do the same on x86_32.  The
tricky one is irq_2_pin.  I believe the proper solution is to just
dynamically allocate entries and place a pointer in irq_cfg.  Although
we may be able to simply place a single entry in irq_cfg.

> kernel/sched.c:DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs,
> per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned
> long), NULL);
>
> and kstat.irqs is the killer... every cpu will have that. [NR_CPUS][NR_IRQS]...

Yes.  See my patch in the referenced lkml link.

>> - The dyn_array infrastructure does not provide for per numa node allocation
> of
>>  irq_desc structures, limiting NUMA scalability.
>
> you plan to move irq_desc when irq_affinity is set to cpus on other node?
> something like DEFINE_PER_NODE_DYN_ARRAY ?

Not when irq_affinity is set.  Rather, allocate it on the node where
the device that generates the irq, and the irq controller that the irq
goes through, are located.  That is where we should be handling the
irq if we want performance.

>> - You appear to be papering over problems instead of digging in and actually
> fixing them.
>
> use dyn_array is less intrusive at this point. and dyn_array related
> code is not big.
> just NR_IRQS to nr_irqs to make the patches more bigger. actually it is simple.
>
> with acpi_madt probing, nr_irqs is much small. like 48 or 98. and
> current one is MACRO 224 or 256.

I agree with your sentiment: if we can actually allocate the irqs on
demand instead of preallocating them based on worst case usage, we
should use much less memory.

I figure that by keeping any type of nr_irqs around you are requiring
us to estimate the worst case number of irqs we need to deal with.

The challenge is that we have hot plug devices with MSI-X capabilities
on them.  Just one of those could add 4K irqs (worst case).  256 or
so is what I have actually heard hardware guys talking about.

But even one msi vector on a pci card that doesn't have normal irqs could
mess up a tightly sized nr_irqs based solely on acpi_madt probing.

>> YH Here is what I was suggesting when the topic of killing NR_IRQs came up a
> week or so
>> ago.
>> http://lkml.org/lkml/2008/7/10/439
>> http://lkml.org/lkml/2008/7/10/532
>>
>> Which essentially boils down to:
>> - Removing NR_IRQS from the non-irq infrastructure code.
>> - Add a config option for architectures that are not going to use an array
>> - In the genirq code have a lookup function that goes from irq number to
> irq_desc *.
>
> so we need one pointer array with that lookup function? what is the
> pointer array index size?
> or use list in that lookup function?

Please read the articles I mentioned.  My first approximation would
be a linked list.  irq is always defined as "unsigned int irq"
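
For illustration, a minimal userspace sketch of a linked-list lookup from
irq number to descriptor, assuming descriptors are allocated on demand.
All names here (irq_desc_sketch, irq_to_desc_sketch, and so on) are
hypothetical and are not taken from the patches in the referenced
articles; the real struct irq_desc carries many more fields and would
need locking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down descriptor; not the kernel's struct irq_desc. */
struct irq_desc_sketch {
	unsigned int irq;               /* irq is always an unsigned int */
	struct irq_desc_sketch *next;   /* next allocated descriptor */
};

/* Head of the list of descriptors that have actually been allocated. */
static struct irq_desc_sketch *irq_desc_list;

/* Slow-path lookup: walk the list until the irq number matches. */
static struct irq_desc_sketch *irq_to_desc_sketch(unsigned int irq)
{
	struct irq_desc_sketch *desc;

	for (desc = irq_desc_list; desc; desc = desc->next)
		if (desc->irq == irq)
			return desc;
	return NULL;
}

/* Return the existing descriptor, or allocate one on first use. */
static struct irq_desc_sketch *irq_desc_get_or_alloc_sketch(unsigned int irq)
{
	struct irq_desc_sketch *desc = irq_to_desc_sketch(irq);

	if (desc)
		return desc;
	desc = calloc(1, sizeof(*desc));
	if (!desc)
		return NULL;
	desc->irq = irq;
	desc->next = irq_desc_list;     /* push onto the front of the list */
	irq_desc_list = desc;
	return desc;
}
```

With this shape there is no array sized by NR_IRQS or nr_irqs at all;
the cost is a list walk on lookup, which only matters if the lookup
ends up on a hot path.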

> how about percpu kstat.irqs?

Again in the referenced articles is my old patch that turns kstat.irqs
inside out.  Allowing us to handle that case with a normal percpu
allocation per irq.  Ultimately I think that is smaller.
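
As a rough worked comparison of the two layouts, the helpers below
compute the counter storage for a flat [NR_CPUS][NR_IRQS] array and for
a per-irq percpu allocation that only exists for irqs in use.  The
function names are hypothetical, and the sketch ignores allocator
overhead and padding.

```c
#include <stddef.h>

/* Flat layout: one counter for every (cpu, irq) pair, used or not. */
static size_t kstat_bytes_flat(size_t nr_cpus, size_t nr_irqs)
{
	return nr_cpus * nr_irqs * sizeof(unsigned int);
}

/* Inside-out layout: one block of per-cpu counters per irq in use. */
static size_t kstat_bytes_per_irq(size_t nr_cpus, size_t irqs_in_use)
{
	return irqs_in_use * nr_cpus * sizeof(unsigned int);
}
```

The two are equal when every irq slot is in use; the per-irq layout
only wins when far fewer irqs are allocated than the static bound
allows, for example 48 in use against a bound of 256.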

>> The rest we should be able to handle in a arch dependent fashion.
>>
>> When we are done we should be able to create a stable irq number for msi
> interrupts
>> that is something like: bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.
>
> how about irq migration from one cpu to another with different vector_no ?

Sorry I was referring to the MSI-X source vector number which is a 12
bit index into an array of MSI-X vectors on the pci device, not the
vector we receive the irq at on the pci card.

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 21:57     ` Yinghai Lu
@ 2008-08-01 22:45       ` Eric W. Biederman
  0 siblings, 0 replies; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-01 22:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Fri, Aug 1, 2008 at 2:30 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>> On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com>
> wrote:
>
>>> http://lkml.org/lkml/2008/7/10/439
>
> you moved kstat_irqs to irqdesc, and it will not numa-aware. if
> irq_desc is not go with every cpu.

That part is a limitation of the per cpu allocator that the sgi guys
are in the process of fixing.  Which is one of the follow-on goals
of folding the pda into a per cpu structure.

In practice it matters little as irqs only occur on one cpu at a time,
so we shouldn't have cache line contention.

I never got to the arch specific part of allocating irq_desc in a numa
aware fashion.  But I have always figured that if we move the work to
arch code it won't be too difficult, to do things appropriately.

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 22:38     ` Eric W. Biederman
@ 2008-08-02  1:09       ` Yinghai Lu
  2008-08-02  1:36         ` H. Peter Anvin
  2008-08-02  1:41         ` Eric W. Biederman
  0 siblings, 2 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-02  1:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

On Fri, Aug 1, 2008 at 3:38 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
>> On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Yinghai Lu <yhlu.kernel@gmail.com> writes:
>>>
>>>> Please check dyn_array support for x86
>>>
>>> YH you have not addressed any of my core concerns and this exceeds my review
>> limit.
>>
>> i mean drivers/serial/8250.c
>
> Still not based on UART_NR.  Although Alan said he would take a look at it
> next week, because he thinks this is important work.

change that to list?

>
>>> Unfortunately I don't feel like this is a productive process.
>>>
>>> My core concerns are:
>>> - You have not separated out and separately pushed the regression patch.  So
>> that we can
>>> fix the current rc release.  Simply tuning NR_IRQS is all I feel comfortable
>> with for
>>>  fixing things in the post merge window period.
>>
>> Increase NR_IRQS to 512 for x86_64?
>
> x86_32 has it set to 1024 so 512 is too small.  I think your patch
> which essentially restores the old behavior is the right way to go for
> this merge window.  I just want to carefully look at it and ensure we
> are restoring the old heuristics.  On a lot of large machines we wind
> up having irqs for pci slots that are never filled with cards.

it seems 32bit summit needs NR_IRQS=256, NR_IRQ_VECTOR=1024

>
>>> - The generic code has no business with dealing with NR_IRQS sized arrays.
>>> Since we don't have a generic problem I don't see why we should have a generic
>> dyn_array solution.
>> besides
>>
>> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
>> irq_pin_list), pin_map_size, 16, NULL);
>> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(balance_irq_affinity,
>> sizeof(struct balance_irq_affinity), nr_irqs, PAGE_SIZE,
>> irq_affinity_init_work);
>> arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_vector, sizeof(u8),
>> nr_irqs, PAGE_SIZE, irq_vector_init_work);
>> arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_cfg, sizeof(struct
>> irq_cfg), nr_irqs, PAGE_SIZE, init_work);
>> arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
>> irq_pin_list), pin_map_size, sizeof(struct irq_pin_list), NULL);
>
> You have noticed how much of those arrays I have collapsed into irq_cfg
> on x86_64.  We can ultimately do the same on x86_32.  The
> tricky one is irq_2_pin.  I believe the proper solution is to just
> dynamically allocate entries and place a pointer in irq_cfg.  Although
> we may be able to simply a place a single entry in irq_cfg.

so there will be irq_desc and irq_cfg lists?

I wonder if the helper to get irq_desc and irq_cfg for one irq_no could be a bottleneck?

PS: the cpumask_t domain in irq_cfg needs to be updated... it wastes 512 bytes
when NR_CPUS=4096.
we could change it to unsigned int: in logical mode (flat, x2apic logical) it is a mask,
and in physical mode (physical flat, x2apic physical) it is a cpu number.
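
The 512 byte figure can be checked with a small sketch that sizes a
bitmap of one bit per cpu, rounded up to whole unsigned longs.  The
helper name and macro below are hypothetical; this only mirrors the
usual bitmap sizing, it is not the kernel's cpumask code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap with one bit per possible cpu. */
static size_t cpumask_bytes_sketch(size_t nr_cpus)
{
	size_t longs = (nr_cpus + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;

	return longs * sizeof(unsigned long);
}
```

For nr_cpus = 4096 this gives 512 bytes per irq_cfg, against
sizeof(unsigned int) for the single-word alternative suggested above.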

>
>> kernel/sched.c:DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs,
>> per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned
>> long), NULL);
>>
>> and kstat.irqs is the killer... every cpu will have that. [NR_CPUS][NR_IRQS]...
>
> Yes.  See my patch in the referenced lkml link.
>
>>> - The dyn_array infrastructure does not provide for per numa node allocation
>> of
>>>  irq_desc structures, limiting NUMA scalability.
>>
>> you plan to move irq_desc when irq_affinity is set to cpus on other node?
>> something like DEFINE_PER_NODE_DYN_ARRAY ?
>
> Not when irq_affinity is set.  But rather allocate it with the on the
> node where the device that generates the irq and the node where the
> irq controller the irq goes through is located on.  Which is where we
> should be handling the irq if we want performance.
>
>>> - You appear to be papering over problems instead of digging in and actually
>> fixing them.
>>
>> use dyn_array is less intrusive at this point. and dyn_array related
>> code is not big.
>> just NR_IRQS to nr_irqs to make the patches more bigger. actually it is simple.
>>
>> with acpi_madt probing, nr_irqs is much small. like 48 or 98. and
>> current one is MACRO 224 or 256.
>
> I agree with your sentiment if we can actually allocate the irqs by
> demand instead of preallocating them based on worst case usage we
> should use much less memory.

yes.

>
> I figure that keeping any type of nr_irqs around you are requiring
> us to estimate the worst case number of irqs we need to deal with.

need to compromise between flexibility and performance..., or say, waste some
space to get some performance...

>
> The challenge is that we have hot plug devices with MSI-X capabilities
> on them.  Just one of those could add 4K irqs (worst case).  256 or
> so I have actually heard hardware guys talking about.
good to know. so one cpu handles one card? or do we need 16 cpus to serve one
card? or have they got a new cpu with NR_VECTORS extended to 32bit?

then we need to keep struct irq_desc; we can not put everything into it.

>
> But even one msi vector on a pci card that doesn't have normal irqs could
> mess up a tightly sized nr_irqs based solely on acpi_madt probing.

v2 doubles that last_gsi_end

>
>>> YH Here is what I was suggesting when the topic of killing NR_IRQs came up a
>> week or so
>>> ago.
>>> http://lkml.org/lkml/2008/7/10/439
>>> http://lkml.org/lkml/2008/7/10/532
>>>
>>> Which essentially boils down to:
>>> - Removing NR_IRQS from the non-irq infrastructure code.
>>> - Add a config option for architectures that are not going to use an array
>>> - In the genirq code have a lookup function that goes from irq number to
>> irq_desc *.
>>
>> so we need one pointer array with that lookup function? what is the
>> pointer array index size?
>> or use list in that lookup function?
>
> Please read the articles I mentioned.  My first approximation would
> be a linked list.  irq is always defined as "unsigned int irq"
>
>> how about percpu kstat.irqs?
>
> Again in the referenced articles is my old patch that turns kstat.irqs
> inside out.  Allowing us to handle that case with a normal percpu
> allocation per irq.  Ultimately I think that is smaller.
so it is
kstat_irqs[cpu][NR_IRQS] ==> irq_desc..kstat_irqs[nr_cpus]
>
>>> The rest we should be able to handle in a arch dependent fashion.
>>>
>>> When we are done we should be able to create a stable irq number for msi
>> interrupts
>>> that is something like: bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.
>>
>> how about irq migration from one cpu to another with different vector_no ?
>
> Sorry I was referring to the MSI-X source vector number which is a 12
> bit index into an array of MSI-X vectors on the pci device, not the
> vector we receive the irq at on the pci card.

is the cpu going to check those vectors in addition to the vectors in the IDT?

YH


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  1:09       ` Yinghai Lu
@ 2008-08-02  1:36         ` H. Peter Anvin
  2008-08-02  1:41         ` Eric W. Biederman
  1 sibling, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-02  1:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

Yinghai Lu wrote:
>> Sorry I was referring to the MSI-X source vector number which is a 12
>> bit index into an array of MSI-X vectors on the pci device, not the
>> vector we receive the irq at on the pci card.
> 
> cpu is going to check that vectors in addition to vectors in IDT?

No, but the mapping from MSI-X vectors to {CPU, IDT} is arbitrary, as
the MSI(-X) address and data registers contain the target CPU and 
destination vector, respectively.  However, we may have to manage the 
mappings directly, to re-use IDT entries and provide interrupt balancing.
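
A minimal sketch of what "address and data registers contain the target
CPU and destination vector" looks like on x86, assuming the classic
xAPIC MSI format with an 8-bit destination APIC ID, physical destination
mode, fixed delivery and edge trigger (so all other control bits are
zero).  The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the two values programmed into the device. */
struct msi_msg_sketch {
	uint32_t address_lo;   /* low 32 bits of the message address */
	uint32_t data;         /* message data */
};

/*
 * The target cpu goes into the address (destination ID, bits 19:12 under
 * the 0xFEExxxxx window); the IDT vector goes into the low byte of the data.
 */
static struct msi_msg_sketch msi_compose_sketch(uint8_t dest_apic_id,
						uint8_t vector)
{
	struct msi_msg_sketch msg;

	msg.address_lo = 0xfee00000u | ((uint32_t)dest_apic_id << 12);
	msg.data = vector;
	return msg;
}
```

Since each MSI-X table entry holds its own address/data pair, retargeting
one source to a different {CPU, IDT} pair amounts to rewriting that one
entry.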

	-hpa


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  1:09       ` Yinghai Lu
  2008-08-02  1:36         ` H. Peter Anvin
@ 2008-08-02  1:41         ` Eric W. Biederman
  2008-08-02  2:01           ` Yinghai Lu
  2008-08-04 12:57           ` Mike Travis
  1 sibling, 2 replies; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-02  1:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

>>> Increase NR_IRQS to 512 for x86_64?
>>
>> x86_32 has it set to 1024 so 512 is too small.  I think your patch
>> which essentially restores the old behavior is the right way to go for
>> this merge window.  I just want to carefully look at it and ensure we
>> are restoring the old heuristics.  On a lot of large machines we wind
>> up having irqs for pci slots that are never filled with cards.
>
> it seems 32bit summit need NR_IRQS=256, NR_IRQ_VECTOR=1024

Yes.  Which is 1024 irq sources/gsis with only 1/4 used, so they will fit into 256 irqs.

On x86_64 we have removed the confusing and brittle irq compression
code.  So to handle that many irqs we would need 1024 irqs.

I expect modern big systems that can only run x86_64 are larger still.

>> You have noticed how much of those arrays I have collapsed into irq_cfg
>> on x86_64.  We can ultimately do the same on x86_32.  The
>> tricky one is irq_2_pin.  I believe the proper solution is to just
>> dynamically allocate entries and place a pointer in irq_cfg.  Although
>> we may be able to simply a place a single entry in irq_cfg.

> so there will be irq_desc and irq_cfg lists?
Or we place irq_desc in irq_cfg.

> wonder if helper to get irq_desc and irq_cfg for one irq_no could be bottleneck?

Nah.  We look up whatever we need in the 256 entry vector_irq table.
I expect we can do the container_of trick beyond that.
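
A rough sketch of that idea, assuming the generic descriptor is embedded
in the arch structure and that the per-vector table stores descriptor
pointers.  Every name below is hypothetical; the existing vector_irq
table stores irq numbers, so the pointer table is an assumption made for
the example.

```c
#include <stddef.h>

/* Same idea as the kernel's container_of(), without the type checking. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic part. */
struct gen_desc_sketch {
	unsigned int irq;
};

/* Hypothetical arch part with the generic descriptor embedded in it. */
struct arch_cfg_sketch {
	unsigned char vector;
	struct gen_desc_sketch desc;
};

/* Fast path table: one slot per IDT vector. */
static struct gen_desc_sketch *vector_desc_sketch[256];

/* Recover the enclosing arch structure from the embedded descriptor. */
static struct arch_cfg_sketch *desc_to_cfg_sketch(struct gen_desc_sketch *desc)
{
	return container_of_sketch(desc, struct arch_cfg_sketch, desc);
}
```

On the interrupt path this is one table index plus a constant pointer
adjustment, so no search by irq number is needed there.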

If the helper, which we should only see on the slow path, is a bottleneck
we can easily organize irq_desc into a tree structure.  Ultimately
I think we want drivers to have a struct irq *irq pointer but we need
to get the arch backend working first.

> PS: cpumask_t domain in irq_cfg need to updated... it wast 512bytes
> when NR_CPUS=4096
> could change it to unsigned int. logical mode (flat, x2apic logical) it as mask
> and (physical flat mode, and x2apic physical) it is cpu number.

Certainly there is the potential to simplify things.

>> I agree with your sentiment if we can actually allocate the irqs by
>> demand instead of preallocating them based on worst case usage we
>> should use much less memory.
>
> yes.
>
>>
>> I figure that keeping any type of nr_irqs around you are requiring
>> us to estimate the worst case number of irqs we need to deal with.
>
> need to comprise flexibility and performance..., or say waste some
> space to get some performance...

The thing is there is no good upper bound on how many irqs we can see
short of NR_PCI_DEVICES*4096.

>> The challenge is that we have hot plug devices with MSI-X capabilities
>> on them.  Just one of those could add 4K irqs (worst case).  256 or
>> so I have actually heard hardware guys talking about.

> good know. so one cpu handle one card? or need 16 cpus serve one
> cards? or they got new cpu to NR_VECTORS  with 32bit?

Yes.  For the current worst case it requires 16 cpus.
The biggest I have heard of a card using at this point is 256 irqs.
A lot of the goal in those cards is so they can have 2 irqs per cpu:
1 rx irq and 1 tx irq, allowing them to implement per cpu queues.
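
The 16 cpu figure follows from dividing the 4K worst case by the 256
vectors a cpu's IDT can hold.  A small sketch of that arithmetic, with a
hypothetical helper name:

```c
/* Number of cpus needed if each cpu can take at most vectors_per_cpu irqs. */
static unsigned int cpus_needed_sketch(unsigned int nr_irqs,
				       unsigned int vectors_per_cpu)
{
	return (nr_irqs + vectors_per_cpu - 1) / vectors_per_cpu;
}
```

With all 256 vectors available, 4096 irqs need 16 cpus.  Assuming the
first 32 vectors stay reserved for exceptions, so that only 224 are
usable per cpu, the same card would need 19.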

> then need to keep struct irq_desc, can not put everything into it.

Yes.  But we can put all the arch specific code in irq_cfg, and put
irq_desc in irq_cfg.

>> But even one msi vector on a pci card that doesn't have normal irqs could
>> mess up a tightly sized nr_irqs based solely on acpi_madt probing.
>
> v2 double that last_gsi_end

Which is usable, but nowhere near as nice as not having a fixed upper bound.


>> Sorry I was referring to the MSI-X source vector number which is a 12
>> bit index into an array of MSI-X vectors on the pci device, not the
>> vector we receive the irq at on the pci card.
>
> cpu is going to check that vectors in addition to vectors in IDT?

No. The destination cpu and destination vector number are encoded in
the MSI message.  Each MSI-X source ``vector'' has a different MSI message.

So on my wish list is to stably encode the MSI interrupt numbers.  And
using a sparse irq address space I can.  As it only takes 28 bits to hold
the complete bus + device + function + msi source [ 0-4095 ].
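
One way such an encoding could look, as a sketch only: 8 bits of bus,
5 of device, 3 of function and 12 of MSI-X source index packed into the
low 28 bits of an unsigned int.  The field order, macro names and
function names are assumptions made for the example; nothing in this
thread fixes a layout, and pci segments/domains are left out.

```c
#include <stdint.h>

/* Hypothetical bit positions: bus[27:20] dev[19:15] fn[14:12] src[11:0]. */
#define MSI_IRQ_SRC_SHIFT  0
#define MSI_IRQ_FN_SHIFT   12
#define MSI_IRQ_DEV_SHIFT  15
#define MSI_IRQ_BUS_SHIFT  20

/* Pack a pci location and MSI-X source index into a 28 bit irq number. */
static uint32_t msi_irq_encode_sketch(unsigned int bus, unsigned int dev,
				      unsigned int fn, unsigned int src)
{
	return ((uint32_t)(bus & 0xff) << MSI_IRQ_BUS_SHIFT) |
	       ((uint32_t)(dev & 0x1f) << MSI_IRQ_DEV_SHIFT) |
	       ((uint32_t)(fn & 0x7) << MSI_IRQ_FN_SHIFT) |
	       ((uint32_t)(src & 0xfff) << MSI_IRQ_SRC_SHIFT);
}

/* Field extractors, the inverse of the packing above. */
static unsigned int msi_irq_bus_sketch(uint32_t irq)
{
	return (irq >> MSI_IRQ_BUS_SHIFT) & 0xff;
}

static unsigned int msi_irq_dev_sketch(uint32_t irq)
{
	return (irq >> MSI_IRQ_DEV_SHIFT) & 0x1f;
}

static unsigned int msi_irq_fn_sketch(uint32_t irq)
{
	return (irq >> MSI_IRQ_FN_SHIFT) & 0x7;
}

static unsigned int msi_irq_src_sketch(uint32_t irq)
{
	return (irq >> MSI_IRQ_SRC_SHIFT) & 0xfff;
}
```

The largest value this produces is 0x0fffffff, which leaves 4 spare bits
in a 32 bit irq number.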

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  1:41         ` Eric W. Biederman
@ 2008-08-02  2:01           ` Yinghai Lu
  2008-08-02  2:03             ` H. Peter Anvin
  2008-08-04 12:57           ` Mike Travis
  1 sibling, 1 reply; 39+ messages in thread
From: Yinghai Lu @ 2008-08-02  2:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Mike Travis,
	Andrew Morton, linux-kernel

>> cpu is going to check that vectors in addition to vectors in IDT?
>
> No. The destination cpu and destination vector number are encoded in
> the MSI message.  Each MSI-X source ``vector'' has a different MSI message.
>
> So on my wish list is to stably encode the MSI interrupt numbers.  And
> using a sparse irq address space I can.  As it only takes 28 bits to hold
> the complete bus + device + function + msi source [ 0-4095 ]
>

how about ioapic interrupt numbers...? they should keep the same
numbering as gsi?

and how about pci segments: that will need another 4 bits for AMD
systems.. aka 16 segments..

you will run out of 32bits...

BTW:
kstat_irqs patch is there.
How is the progress with the irq_cfg/irq_desc dynamic allocation patch?

YH


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  2:01           ` Yinghai Lu
@ 2008-08-02  2:03             ` H. Peter Anvin
  2008-08-02  2:39               ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-02  2:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

Yinghai Lu wrote:
>>> cpu is going to check that vectors in addition to vectors in IDT?
>> No. The destination cpu and destination vector number are encoded in
>> the MSI message.  Each MSI-X source ``vector'' has a different MSI message.
>>
>> So on my wish list is to stably encode the MSI interrupt numbers.  And
>> using a sparse irq address space I can.  As it only takes 28 bits to hold
>> the complete bus + device + function + msi source [ 0-4095 ]
> 
> how about ioapic interrupt numbers...? they should stay with same
> numbering with gsi?
> 
> and how about pci segments : that will need another 4 bits for AMD
> systems..aka 16 segments..
> 
> you will run out of 32bits...
> 

I also see little value in stably encoding IRQ numbers using 
geographical identifiers.  It seems that the only case where you care 
that an interrupt number is stable is when it is *not* tied to a 
geographically addressed entity, so why does it matter?

	-hpa


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  2:03             ` H. Peter Anvin
@ 2008-08-02  2:39               ` Eric W. Biederman
  2008-08-02  3:28                 ` H. Peter Anvin
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-02  2:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> I also see little value in stably encoding IRQ numbers using geographical
> identifiers.  It seems that the only case where you care that an interrupt
> number is stable is when it is *not* tied to a geographically addressed entity,
> so why does it matter?

In the case of msi it is minor. In the case of GSIs from ACPI it dramatically
simplified the code and improved its reliability, because then everyone including
ACPI was always using the same numbers.

So in general principle I think we should have stable irq numbers if we can.  Which
allows someone to say I have a problem with irq X.  And it will always be irq X on
their box.  An extra level of indirection makes debugging more difficult.

Having a human readable name like: eth0irq22 or hbairq5 is likely just
as good in the case of msi.  Still, all of the user interfaces today take numbers.
So we are stuck with dealing with numbers for a long time to come.

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
  2008-08-01 21:30   ` Yinghai Lu
  2008-08-01 21:47   ` Mike Travis
@ 2008-08-02  2:58   ` Yinghai Lu
  2 siblings, 0 replies; 39+ messages in thread
From: Yinghai Lu @ 2008-08-02  2:58 UTC (permalink / raw)
  To: Eric W. Biederman, Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani, Andrew Morton,
	linux-kernel

On Fri, Aug 1, 2008 at 1:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> My core concerns are:
> - The generic code has no business with dealing with NR_IRQS sized arrays.
>  Since we don't have a generic problem I don't see why we should have a generic dyn_array solution.

are there some NR_IRQS arrays left over that are not put into PER_CPU?

I wonder if we could use dyn_array with some of them.

YH


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  2:39               ` Eric W. Biederman
@ 2008-08-02  3:28                 ` H. Peter Anvin
  2008-08-02  4:42                   ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-02  3:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> 
> Having a human readable name like: eth0irq22 or hbairq5 is likely just
> as good in the case of msi.  Still all of the users interfaces today take numbers.
> So we are stuck with dealing with numbers for a long time to come.
> 

Long sparse numbers are messy, too, though.  It might be interesting to 
have a routine somewhere like "irq_name()" to output a human-readable 
IRQ name, which in case of MSI-X could contain the PCI device name.

	-hpa


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  3:28                 ` H. Peter Anvin
@ 2008-08-02  4:42                   ` Eric W. Biederman
  2008-08-02 15:41                     ` H. Peter Anvin
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-02  4:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>>
>> Having a human readable name like: eth0irq22 or hbairq5 is likely just
>> as good in the case of msi.  Still all of the users interfaces today take
> numbers.
>> So we are stuck with dealing with numbers for a long time to come.
>>
>
> Long sparse numbers are messy, too, though.  It might be interesting to have a
> routine somewhere like "irq_name()" to output a human-readable IRQ name, which
> in case of MSI-X could contain the PCI device name.

Yes.  I want the option of using those bits.  It might not be smart to
use them to encode a physical location and the irq number but just
having the option would be nice.

Making /proc/interrupts useful without breaking user space is going to be
an interesting challenge one of these days.

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  4:42                   ` Eric W. Biederman
@ 2008-08-02 15:41                     ` H. Peter Anvin
  2008-08-02 20:20                       ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-02 15:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> 
> Yes.  I want the option of using those bits.  It might not be smart to
> use them to encode a physical location and the irq number but just
> having the option would be nice.
> 

Urk!  First of all, there isn't enough space as we have already proven 
(on the machines where it actually matters there just aren't enough 
bits), but doing this kind of stuff *optionally* is going to hurt even 
worse.

Furthermore, this crap will break anyway the *next* time someone comes 
up with a new clever way to do interrupts -- and to truly get stable 
identifiers, we can't treat HyperTransport MSI as APICs anymore, yadda, 
yadda...

> Making /proc/interrupts useful without breaking user space is going to be
> an interesting challenge one of these days.

If changing to non-numbers in /proc/interrupts will break userspace, 
then userspace will have to deal with a numeric token in 
/proc/interrupts which will have to be looked up elsewhere (perhaps in a 
sysfs directory) to get a more meaningful index.

	-hpa


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02 15:41                     ` H. Peter Anvin
@ 2008-08-02 20:20                       ` Eric W. Biederman
  0 siblings, 0 replies; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-02 20:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Dhaval Giani,
	Mike Travis, Andrew Morton, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>>
>> Yes.  I want the option of using those bits.  It might not be smart to
>> use them to encode a physical location and the irq number but just
>> having the option would be nice.
>>
>
> Urk!  First of all, there isn't enough space as we have already proven (on the
> machines where it actually matters there just aren't enough bits), but doing
> this kind of stuff *optionally* is going to hurt even worse.

With respect to space we have shown: We create many more irq_desc
entries than we use in practice.  Which hurts us when it comes to
space.  Especially when compiling a single kernel for a wide range
of machines.

Which is why I ultimately want a list or a tree data structure holding
irq_desc entries instead of an array.  Arrays must be statically
oversized, wasting space and reducing our flexibility in
dealing with irqs at run time.

Which says to me the low level architecture code that actually knows
at run time how many irqs there are should do the allocation of
irq_desc entries, placing them on the appropriate NUMA node.

All of which should yield no fixed cap short of 32 bits for the irq
number at run time.  Not having an arbitrarily low cap is what I mean
by having the option of a sparsely allocated irq number. If we have a
nice data structure that is a side effect that comes essentially for
free.

Except for upgrading the genirq code to pass things internally and to
the arch code in terms of irq_desc * entries.  This should be very little
change from where we are today.

> Furthermore, this crap will break anyway the *next* time someone comes up with a
> new clever way to do interrupts -- and to truly get stable identifiers, we can't
> treat HyperTransport MSI as APICs anymore, yadda, yadda...

Yes.  There are those kinds of issues.  I don't think I have yet come up with
a usable stable mapping for msi interrupts.  Just something close.

I expect what is most likely to work is, after allocating the fixed irqs, to scan the
pci busses and for each pci device, if msi is supported, reserve 1 irq number.
If msi-X is supported reserve 4096 irq numbers.  If ht-irqs are supported reserve
1 irq number for each ht irq.  Hot plug slots that can ultimately have pci busses
plugged into them are going to be interesting.  But I think if we make an
effort msi irq numbers will stop flapping in the breeze and are likely to
remain the same, and fit in the number of bits we have.  While still not
requiring us to allocate storage for them.  Potentially we can even treat
GSIs the same way.  If we know that an ioapic line is simply not connected
we can reserve an irq number for it at boot but never allocate an irq_desc
structure for it.
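
As a rough illustration of the reservation pass described above (not code
from this series): walk the devices in scan order and hand each one a fixed
block of irq numbers, without allocating any per-irq storage.  The struct,
flag, and function names are hypothetical, and treating the msi, msi-X, and
ht-irq reservations as additive is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability flags for one PCI function (not a kernel API). */
enum sketch_irq_caps {
	SKETCH_CAP_MSI  = 1u << 0,
	SKETCH_CAP_MSIX = 1u << 1,
};

#define SKETCH_MSIX_MAX 4096u	/* worst case named in the mail */

struct sketch_pci_dev {
	unsigned int caps;	/* bitmask of enum sketch_irq_caps */
	unsigned int ht_irqs;	/* number of ht irqs, 0 if none */
	uint32_t irq_base;	/* first reserved irq number (output) */
	uint32_t irq_count;	/* how many numbers were reserved (output) */
};

/* How many irq numbers one device gets under the scheme in the mail. */
static uint32_t sketch_irqs_to_reserve(const struct sketch_pci_dev *dev)
{
	uint32_t n = dev->ht_irqs;

	if (dev->caps & SKETCH_CAP_MSI)
		n += 1;
	if (dev->caps & SKETCH_CAP_MSIX)
		n += SKETCH_MSIX_MAX;
	return n;
}

/*
 * Hand out consecutive ranges starting at first_free and return the next
 * unreserved irq number.  Only numbers are reserved; no irq_desc-like
 * storage is allocated here.
 */
static uint32_t sketch_reserve_irq_numbers(struct sketch_pci_dev *devs,
					   size_t ndevs, uint32_t first_free)
{
	uint32_t next = first_free;

	for (size_t i = 0; i < ndevs; i++) {
		uint32_t count = sketch_irqs_to_reserve(&devs[i]);

		assert(next + count >= next);	/* no 32-bit wraparound */
		devs[i].irq_base = next;
		devs[i].irq_count = count;
		next += count;
	}
	return next;
}
```

Because the ranges depend only on scan order and device capabilities, a
device keeps the same numbers across boots as long as the bus layout does
not change.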

What I mean by having the option to do a stable mapping is that we don't build
in unnecessary a priori limits to the maximum irq number.  Irq numbers have
always been sparsely allocated.  It was a rare ISA system that used all 16
of its irqs.  It was an even rarer ioapic based system that used all of its ioapic
inputs but we have always reserved irq numbers for all of those potential irqs.

So I ask to have a data structure that can potentially span the entire 32bit
range of irq numbers, and that instead of a dense and sparsely used array
we keep just the irq_desc entries that we need.
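
A minimal sketch of what such a structure could look like, assuming a small
chained hash table as a stand-in for the list or tree (the mail does not pick
one).  None of these names come from the patch series, and calloc() stands in
for whatever node-aware allocator the arch code would really use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct irq_desc; only what the sketch needs. */
struct sketch_irq_desc {
	uint32_t irq;			/* full 32-bit irq number, may be sparse */
	unsigned long count;		/* example per-irq state */
	struct sketch_irq_desc *next;	/* hash chain link */
};

#define SKETCH_IRQ_BUCKETS 64u

static struct sketch_irq_desc *sketch_irq_table[SKETCH_IRQ_BUCKETS];

/* Look up an existing descriptor; NULL if this irq was never set up. */
static struct sketch_irq_desc *sketch_irq_find(uint32_t irq)
{
	struct sketch_irq_desc *d = sketch_irq_table[irq % SKETCH_IRQ_BUCKETS];

	while (d && d->irq != irq)
		d = d->next;
	return d;
}

/* Return the descriptor for irq, allocating it on first use. */
static struct sketch_irq_desc *sketch_irq_get(uint32_t irq)
{
	struct sketch_irq_desc **head =
		&sketch_irq_table[irq % SKETCH_IRQ_BUCKETS];
	struct sketch_irq_desc *d = sketch_irq_find(irq);

	if (d)
		return d;

	/* Assumption: real code would allocate on the right NUMA node. */
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->irq = irq;
	d->next = *head;
	*head = d;
	assert(sketch_irq_find(irq) == d);
	return d;
}
```

Storage grows with the number of irqs actually in use, not with the largest
irq number, so irq 5 and irq 0xfffffff0 can coexist at the cost of two
entries.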

The only compile time option would be: has this architecture switched over
to a sparse irq array data structure?

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-02  1:41         ` Eric W. Biederman
  2008-08-02  2:01           ` Yinghai Lu
@ 2008-08-04 12:57           ` Mike Travis
  2008-08-05  2:38             ` H. Peter Anvin
  1 sibling, 1 reply; 39+ messages in thread
From: Mike Travis @ 2008-08-04 12:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, hpa, Dhaval Giani,
	Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> 
>>>> Increase NR_IRQS to 512 for x86_64?
>>> x86_32 has it set to 1024 so 512 is too small.  I think your patch
>>> which essentially restores the old behavior is the right way to go for
>>> this merge window.  I just want to carefully look at it and ensure we
>>> are restoring the old heuristics.  On a lot of large machines we wind
>>> up having irqs for pci slots that are never filled with cards.
>> it seems 32bit summit need NR_IRQS=256, NR_IRQ_VECTOR=1024
> 
> Yes.  Which is 1024 irq sources/gsis only 1/4 used so it will fit into 256 irqs.
> 
> On x86_64 we have removed the confusing and brittle irq compression
> code.  So to handle that many irqs we would need 1024 irqs.
> 
> I expect modern big systems that can only run x86_64 are larger still.
> 
>>> You have noticed how much of those arrays I have collapsed into irq_cfg
>>> on x86_64.  We can ultimately do the same on x86_32.  The
>>> tricky one is irq_2_pin.  I believe the proper solution is to just
>>> dynamically allocate entries and place a pointer in irq_cfg.  Although
>>> we may be able to simply a place a single entry in irq_cfg.
> 
>> so there will be irq_desc and irq_cfg lists?
> Or we place irq_desc in irq_cfg.
> 
>> wonder if helper to get irq_desc and irq_cfg for one irq_no could be bottleneck?
> 
> Nah.  We look up whatever we need in the 256 entry vector_irq table.
> I expect we can do the container_of trick beyond that.
> 
> If the helper which we should only see on the slow path is a bottleneck
> we can easily reorganize irq_desc into a tree structure.  Ultimately
> I think we want drivers to have a struct irq *irq pointer but we need
> to get the arch backend working first.
> 
>> PS: cpumask_t domain in irq_cfg needs to be updated... it wastes 512 bytes
>> when NR_CPUS=4096
>> could change it to unsigned int. In logical mode (flat, x2apic logical) it is a mask
>> and in physical mode (physical flat, x2apic physical) it is a cpu number.
> 
> Certainly there is the potential to simplify things.
> 
>>> I agree with your sentiment if we can actually allocate the irqs by
>>> demand instead of preallocating them based on worst case usage we
>>> should use much less memory.
>> yes.
>>
>>> I figure that keeping any type of nr_irqs around you are requiring
>>> us to estimate the worst case number of irqs we need to deal with.
>> need to compromise between flexibility and performance..., or say waste some
>> space to get some performance...
> 
> The thing is there is no good upper bound of how many irqs we can see
> short of NR_PCI_DEVICES*4096
> 
>>> The challenge is that we have hot plug devices with MSI-X capabilities
>>> on them.  Just one of those could add 4K irqs (worst case).  256 or
>>> so I have actually heard hardware guys talking about.
> 
>> good know. so one cpu handle one card? or need 16 cpus serve one
>> cards? or they got new cpu to NR_VECTORS  with 32bit?
> 
> Yes.  Currently for the current worst case it requires 16 cpus.
> The biggest I have heard a card using at this point is 256 irqs.
> A lot of the goal in those cards is so they can have 2 irqs per cpu.
> 1 rx irq and 1 tx irq.  Allowing them to implement per cpu queues.
> 
>> then need to keep struct irq_desc, can not put everything into it.
> 
> Yes.  But we can put all the arch specific code in irq_cfg, and put
> irq_desc in irq_cfg.
> 
>>> But even one msi vector on a pci card that doesn't have normal irqs could
>>> mess up a tightly sized nr_irqs based solely on acpi_madt probing.
>> v2 double that last_gsi_end
> 
> Which is usable, but nowhere near as nice as not having a fixed upper bound.
> 
> 
>>> Sorry I was referring to the MSI-X source vector number which is a 12
>>> bit index into an array of MSI-X vectors on the pci device, not the
>>> vector we receive the irq at on the pci card.
>> cpu is going to check that vectors in addition to vectors in IDT?
> 
> No. The destination cpu and destination vector number are encoded in
> the MSI message.  Each MSI-X source ``vector'' has a different MSI message.
> 
> So on my wish list is to stably encode the MSI interrupt numbers.  And
> using a sparse irq address space I can.  As it only takes 28 bits to hold
> the complete bus + device + function + msi source [ 0-4095 ] 
> 
> Eric

Don't you need "domain" (node) in the bus:device:function:vector combination?
(Or [hack] use a lot bigger field for bus with the node encoded into it.)

Thanks,
Mike


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-04 12:57           ` Mike Travis
@ 2008-08-05  2:38             ` H. Peter Anvin
  2008-08-05  3:40               ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-05  2:38 UTC (permalink / raw)
  To: Mike Travis
  Cc: Eric W. Biederman, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	Dhaval Giani, Andrew Morton, linux-kernel

Mike Travis wrote:
>>
>> So on my wish list is to stably encode the MSI interrupt numbers.  And
>> using a sparse irq address space I can.  As it only takes 28 bits to hold
>> the complete bus + device + function + msi source [ 0-4095 ] 
> 
> Don't you need "domain" (node) in the bus:device:function:vector combination?
> (Or [hack] use a lot bigger field for bus with the node encoded into it.)
> 

You definitely need domain, and that blows the 32-bit limit quite out of 
the water.
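
The arithmetic behind both statements can be sketched as follows.  The
28 bits are 8 (bus) + 5 (device) + 3 (function) + 12 (msi source 0-4095).
The 16-bit domain width is an assumption (it matches the PCI segment group
field in ACPI), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SRC_BITS    12	/* msi source index, 0-4095 */
#define SKETCH_FN_BITS      3
#define SKETCH_DEV_BITS     5
#define SKETCH_BUS_BITS     8
/* Assumption: 16-bit domain; with it the id needs 16 + 28 = 44 bits. */

/* Pack domain:bus:device:function:source into one stable identifier. */
static uint64_t sketch_msi_id(uint16_t domain, uint8_t bus, uint8_t dev,
			      uint8_t fn, uint16_t src)
{
	uint64_t id = domain;

	assert(dev < (1u << SKETCH_DEV_BITS));
	assert(fn < (1u << SKETCH_FN_BITS));
	assert(src < (1u << SKETCH_SRC_BITS));

	id = (id << SKETCH_BUS_BITS) | bus;
	id = (id << SKETCH_DEV_BITS) | dev;
	id = (id << SKETCH_FN_BITS) | fn;
	id = (id << SKETCH_SRC_BITS) | src;
	return id;
}

/* Nonzero if the identifier still fits a 32-bit irq number. */
static int sketch_msi_id_fits_u32(uint64_t id)
{
	return id <= UINT32_MAX;
}
```

With domain 0 every identifier fits in 28 bits, which leaves only 4 spare
bits in a 32-bit irq number; any domain of 16 or above no longer fits.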

	-hpa


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-05  2:38             ` H. Peter Anvin
@ 2008-08-05  3:40               ` Eric W. Biederman
  2008-08-05  3:48                 ` H. Peter Anvin
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2008-08-05  3:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	Dhaval Giani, Andrew Morton, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Mike Travis wrote:
>>>
>>> So on my wish list is to stably encode the MSI interrupt numbers.  And
>>> using a sparse irq address space I can.  As it only takes 28 bits to hold
>>> the complete bus + device + function + msi source [ 0-4095 ]
>>
>> Don't you need "domain" (node) in the bus:device:function:vector combination?
>> (Or [hack] use a lot bigger field for bus with the node encoded into it.)
>>
>
> You definitely need domain, and that blows the 32-bit limit quite out of the
> water.

Yes. Although when I dreamed it up, domain wasn't more than a twinkle in
someone's eye on x86.  I'm not certain it is much more than that now.

The interesting implication of this is that if you have the right hardware
and are absolutely loopy you can have more interrupt sources than can
be described in a 32bit unsigned int, and certainly more than any sane person
would allocate in a statically sized array.

Eric


* Re: [PATCH 00/16] dyn_array and nr_irqs support v2
  2008-08-05  3:40               ` Eric W. Biederman
@ 2008-08-05  3:48                 ` H. Peter Anvin
  0 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2008-08-05  3:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Mike Travis, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	Dhaval Giani, Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> 
> The interesting implication of this is that if you have the right hardware
> and are absolutely loopy you can have more interrupt sources than can
> be described in a 32bit unsigned int, and certainly more than any sane person
> would allocate in a statically sized array.
> 

Yes, I'm quite convinced that the statically sized array is a bad idea.

	-hpa


end of thread, other threads:[~2008-08-05  4:06 UTC | newest]

Thread overview: 39+ messages
2008-08-01  9:37 [PATCH 00/16] dyn_array and nr_irqs support v2 Yinghai Lu
2008-08-01  9:37 ` [PATCH 01/16] x86: 64bit support more than 256 irq Yinghai Lu
2008-08-01  9:37   ` [PATCH 02/16] x86: introduce nr_irqs for 64bit v3 Yinghai Lu
2008-08-01  9:37     ` [PATCH 03/16] add dyn_array support Yinghai Lu
2008-08-01  9:37       ` [PATCH 04/16] make irq_timer_state to use dyn_array Yinghai Lu
2008-08-01  9:37         ` [PATCH 05/16] make irq2_iommu " Yinghai Lu
2008-08-01  9:37           ` [PATCH 06/16] make irq_desc " Yinghai Lu
2008-08-01  9:37             ` [PATCH 07/16] x86: make 64bit support dyn_array Yinghai Lu
2008-08-01  9:37               ` [PATCH 08/16] serial: change remove NR_IRQS in 8250.c v2 Yinghai Lu
2008-08-01  9:37                 ` [PATCH 09/16] add per_cpu_dyn_array support Yinghai Lu
2008-08-01  9:37                   ` [PATCH 10/16] irq: make irqs in kernel stat use per_cpu_dyn_array Yinghai Lu
2008-08-01  9:37                     ` [PATCH 11/16] x86 remove irq_vectors_limit.h Yinghai Lu
2008-08-01  9:37                       ` [PATCH 12/16] x86: make 32bit use dyn_array Yinghai Lu
2008-08-01  9:37                         ` [PATCH 13/16] add per_cpu_dyn_array for arch percpu support Yinghai Lu
2008-08-01  9:37                           ` [PATCH 14/16] x86: get mp_irqs from madt Yinghai Lu
2008-08-01  9:37                             ` [PATCH 15/16] x86: make 32bit more like with io_apic/dyn_array to 64 bit Yinghai Lu
2008-08-01  9:37                               ` [PATCH 16/16] x86: alloc dyn_array all alltogether Yinghai Lu
2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
2008-08-01 21:30   ` Yinghai Lu
2008-08-01 21:57     ` Yinghai Lu
2008-08-01 22:45       ` Eric W. Biederman
2008-08-01 22:10     ` Yinghai Lu
2008-08-01 22:38     ` Eric W. Biederman
2008-08-02  1:09       ` Yinghai Lu
2008-08-02  1:36         ` H. Peter Anvin
2008-08-02  1:41         ` Eric W. Biederman
2008-08-02  2:01           ` Yinghai Lu
2008-08-02  2:03             ` H. Peter Anvin
2008-08-02  2:39               ` Eric W. Biederman
2008-08-02  3:28                 ` H. Peter Anvin
2008-08-02  4:42                   ` Eric W. Biederman
2008-08-02 15:41                     ` H. Peter Anvin
2008-08-02 20:20                       ` Eric W. Biederman
2008-08-04 12:57           ` Mike Travis
2008-08-05  2:38             ` H. Peter Anvin
2008-08-05  3:40               ` Eric W. Biederman
2008-08-05  3:48                 ` H. Peter Anvin
2008-08-01 21:47   ` Mike Travis
2008-08-02  2:58   ` Yinghai Lu
