* [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
@ 2006-06-20 22:24 Eric W. Biederman
From: Eric W. Biederman @ 2006-06-20 22:24 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
The following patchset is against 2.6.17-rc6-mm2.
It was the only easy place I could get the work of everyone who has
been touching the relevant code.
The primary aim of this patchset is to remove maintenance problems caused
by the irq infrastructure. The two big issues I address are an
artificially small cap on the number of irqs, and that MSI assumes
vector == irq. My primary focus is on x86_64 but I have touched
other architectures where necessary to keep them from breaking.
- To increase the number of irqs I modify the code to look at
the (cpu, vector) pair instead of just looking at the vector.
With a large number of irqs available, systems with a large irq
count no longer need to compress their irq numbers to fit,
which removes a lot of brittle special cases.
For the acpi guys the result is that irq == gsi.
- Addressing the fact that MSI assumes irq == vector takes a few more
patches. But suffice it to say when I am done none of the generic
irq code even knows what a vector is.
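To make the (cpu, vector) idea concrete, here is a minimal sketch, not
code from the patchset. All names and table sizes (sketch_vector_irq,
SKETCH_NR_CPUS and so on) are hypothetical. It only shows that indexing
by the pair lets the same vector number map to different irqs on
different cpus.

```c
#include <assert.h>

#define SKETCH_NR_CPUS		4	/* hypothetical sizes, illustration only */
#define SKETCH_NR_VECTORS	256
#define SKETCH_NO_IRQ		(-1)

/* One vector -> irq table per cpu, instead of a single global table. */
static int sketch_vector_irq[SKETCH_NR_CPUS][SKETCH_NR_VECTORS];

/* Mark every (cpu, vector) slot as unassigned. */
static void sketch_init(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		for (int vec = 0; vec < SKETCH_NR_VECTORS; vec++)
			sketch_vector_irq[cpu][vec] = SKETCH_NO_IRQ;
}

/* Bind an irq number to one (cpu, vector) slot. */
static void sketch_assign(int cpu, int vector, int irq)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	assert(vector >= 0 && vector < SKETCH_NR_VECTORS);
	sketch_vector_irq[cpu][vector] = irq;
}

/* Look up the irq for the vector that fired on the given cpu. */
static int sketch_lookup(int cpu, int vector)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	assert(vector >= 0 && vector < SKETCH_NR_VECTORS);
	return sketch_vector_irq[cpu][vector];
}
```

With a single global table the number of irqs is bounded by the number
of vectors; with one table per cpu the bound scales with the cpu count.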
In quick testing on a large Unisys x86_64 machine we stumbled over at
least one driver that assumed that NR_IRQS could always fit into an 8
bit number. This driver is clearly buggy today, but this has become
a class of bugs that is now much easier to hit.
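The failure mode is easy to reproduce outside the kernel. The sketch
below is hypothetical (struct sketch_dev is not the driver we hit); it
only assumes a driver that keeps its irq number in an 8 bit field.

```c
#include <stdint.h>

/* Hypothetical driver-private state that stores the irq in 8 bits. */
struct sketch_dev {
	uint8_t irq;	/* too narrow once irq numbers exceed 255 */
};

/* Store the irq and return what the driver will see later. */
static unsigned int sketch_store_irq(struct sketch_dev *dev, unsigned int irq)
{
	dev->irq = (uint8_t)irq;	/* silently keeps only the low 8 bits */
	return dev->irq;
}
```

Small irq numbers survive the round trip; an irq of 300 comes back as
300 - 256 = 44, so the driver would request or free the wrong irq.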
I've done my best, but it won't surprise me if this patchset isn't
perfect. Still, I'm pretty certain I have succeeded in the decoupling,
so any fixes should be small and well contained.
Eric
* [PATCH 1/25] irq: Convert the move_irq flag from a 32bit word to a single bit
From: Eric W. Biederman @ 2006-06-20 22:28 UTC
To: Andrew Morton
This is a minor space optimization. In practice I don't think this has
any effect because of our alignment constraints and the other fields,
but there is no point in chewing up an unnecessary word, and since we
already read the flag field this should improve the cache hit ratio of
the irq handler.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/irq.h    |    5 +++--
 kernel/irq/migration.c |    6 +++---
 2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index c684bab..1ad1acb 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -45,6 +45,9 @@ #define IRQ_NOREQUEST 1024 /* IRQ cannot
 #define IRQ_NOAUTOEN		2048	/* IRQ will not be enabled on request irq */
 #define IRQ_DELAYED_DISABLE \
 				4096	/* IRQ disable (masking) happens delayed. */
+#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
+# define IRQ_MOVE_PENDING	8192	/* need to re-target IRQ destination */
+#endif
 
 /*
  * IRQ types, see also include/linux/interrupt.h
@@ -130,7 +133,6 @@ #endif
  * @affinity:		IRQ affinity on SMP
  * @cpu:		cpu index useful for balancing
  * @pending_mask:	pending rebalanced interrupts
- * @move_irq:		need to re-target IRQ destination
  * @dir:		/proc/irq/ procfs entry
  * @affinity_entry:	/proc/irq/smp_affinity procfs entry on SMP
  *
@@ -156,7 +158,6 @@ #ifdef CONFIG_SMP
 #endif
 #if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
 	cpumask_t		pending_mask;
-	unsigned int		move_irq;	/* need to re-target IRQ dest */
 #endif
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry *dir;
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index a57ebe9..9b234df 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -7,7 +7,7 @@ void set_pending_irq(unsigned int irq, c
 	unsigned long flags;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	desc->move_irq = 1;
+	desc->status |= IRQ_MOVE_PENDING;
 	irq_desc[irq].pending_mask = mask;
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
@@ -17,7 +17,7 @@ void move_native_irq(int irq)
 	struct irq_desc *desc = irq_desc + irq;
 	cpumask_t tmp;
 
-	if (likely(!desc->move_irq))
+	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
 		return;
 
 	/*
@@ -28,7 +28,7 @@ void move_native_irq(int irq)
 		return;
 	}
 
-	desc->move_irq = 0;
+	desc->status &= ~IRQ_MOVE_PENDING;
 
 	if (unlikely(cpus_empty(irq_desc[irq].pending_mask)))
 		return;
--
1.4.0.gc07e
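The set / test / clear pattern this patch switches to can be tried in
isolation. The sketch below is a standalone illustration, not kernel
code: struct sketch_desc and the SKETCH_ names are hypothetical, and the
two flag values are simply copied from the hunk above.

```c
#define SKETCH_DELAYED_DISABLE	4096u	/* same value as IRQ_DELAYED_DISABLE */
#define SKETCH_MOVE_PENDING	8192u	/* same value as IRQ_MOVE_PENDING */

/* Hypothetical descriptor: one shared flag word, no separate move_irq. */
struct sketch_desc {
	unsigned int status;
};

/* Request a move: set one bit, leave every other flag alone. */
static void sketch_set_pending(struct sketch_desc *desc)
{
	desc->status |= SKETCH_MOVE_PENDING;
}

/* Returns 1 if a move is pending, 0 otherwise. */
static int sketch_is_pending(const struct sketch_desc *desc)
{
	return !!(desc->status & SKETCH_MOVE_PENDING);
}

/* The move has been handled: clear only that bit. */
static void sketch_clear_pending(struct sketch_desc *desc)
{
	desc->status &= ~SKETCH_MOVE_PENDING;
}
```

The other bits in the word are untouched by all three helpers, which is
what makes it safe to fold the flag into the existing status field.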
* [PATCH 2/25] irq: Add moved_masked_irq
From: Eric W. Biederman @ 2006-06-20 22:28 UTC
To: Andrew Morton
Currently move_native_irq disables and reenables the irq we are migrating
to ensure we don't take that irq when we are actually doing the
migration operation. Disabling the irq needs to happen, but sometimes
doing the work in move_native_irq is too late.
On x86 with ioapics the irq move sequence needs to be:
edge_triggered:
  mask irq.
  move irq.
  unmask irq.
  ack irq.
level_triggered:
  mask irq.
  ack irq.
  move irq.
  unmask irq.
We can easily perform the edge triggered sequence with the current
definition of move_native_irq. However the level triggered case does
not map well. For that I have added move_masked_irq, to allow me to
disable the irqs around both the ack and the move.
Q: Why have we not seen this problem earlier?
A: The only symptom I have been able to reproduce is that if we change
   the vector before acknowledging an irq the wrong irq is acknowledged.
   Since we currently are not reprogramming the irq vector during
   migration no problems show up.
We have to mask the irq before we acknowledge the irq or else we could
hit a window where an irq is asserted just before we acknowledge it.
Edge triggered irqs do not have this problem because acknowledgements
do not propagate in the same way.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/irq.h    |    6 ++++++
 kernel/irq/migration.c |   28 +++++++++++++++++++++-------
 2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1ad1acb..b79d178 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -200,6 +200,7 @@ #if defined(CONFIG_GENERIC_PENDING_IRQ)
 
 void set_pending_irq(unsigned int irq, cpumask_t mask);
 void move_native_irq(int irq);
+void move_masked_irq(int irq);
 
 #ifdef CONFIG_PCI_MSI
 /*
@@ -241,6 +242,10 @@ static inline void move_native_irq(int i
 {
 }
 
+static inline void move_masked_irq(int irq)
+{
+}
+
 static inline void set_pending_irq(unsigned int irq, cpumask_t mask)
 {
 }
@@ -256,6 +261,7 @@ #else /* CONFIG_SMP */
 
 #define move_irq(x)
 #define move_native_irq(x)
+#define move_masked_irq(x)
 
 #endif /* CONFIG_SMP */
 
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index 9b234df..4baa3bb 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -12,7 +12,7 @@ void set_pending_irq(unsigned int irq, c
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 
-void move_native_irq(int irq)
+void move_masked_irq(int irq)
 {
 	struct irq_desc *desc = irq_desc + irq;
 	cpumask_t tmp;
@@ -48,15 +48,29 @@ void move_native_irq(int irq)
 	 * when an active trigger is comming in. This could
 	 * cause some ioapics to mal-function.
 	 * Being paranoid i guess!
+	 *
+	 * For correct operation this depends on the caller
+	 * masking the irqs.
 	 */
 	if (likely(!cpus_empty(tmp))) {
-		if (likely(!(desc->status & IRQ_DISABLED)))
-			desc->chip->disable(irq);
-
 		desc->chip->set_affinity(irq,tmp);
-
-		if (likely(!(desc->status & IRQ_DISABLED)))
-			desc->chip->enable(irq);
 	}
 	cpus_clear(irq_desc[irq].pending_mask);
 }
+
+void move_native_irq(int irq)
+{
+	struct irq_desc *desc = irq_desc + irq;
+
+	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
+		return;
+
+	if (likely(!(desc->status & IRQ_DISABLED)))
+		desc->chip->disable(irq);
+
+	move_masked_irq(irq);
+
+	if (likely(!(desc->status & IRQ_DISABLED)))
+		desc->chip->enable(irq);
+}
+
--
1.4.0.gc07e
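The two orderings from the commit message can be written down as a toy
model. This is only a sketch: the sketch_ helpers are hypothetical and
record one letter per step (M = mask, A = ack, V = move, U = unmask)
instead of touching hardware.

```c
#include <stddef.h>
#include <string.h>

/* Records the order of operations, one letter per step. */
struct sketch_log {
	char ops[8];	/* NUL terminated string of steps */
	size_t n;
};

static void sketch_log_init(struct sketch_log *log)
{
	memset(log, 0, sizeof(*log));
}

static void sketch_step(struct sketch_log *log, char op)
{
	if (log->n + 1 < sizeof(log->ops)) {
		log->ops[log->n++] = op;
		log->ops[log->n] = '\0';
	}
}

/* Edge triggered: the move happens while masked, the ack comes last. */
static void sketch_move_edge(struct sketch_log *log)
{
	sketch_step(log, 'M');
	sketch_step(log, 'V');
	sketch_step(log, 'U');
	sketch_step(log, 'A');
}

/* Level triggered: the ack must come before the move, both while masked. */
static void sketch_move_level(struct sketch_log *log)
{
	sketch_step(log, 'M');
	sketch_step(log, 'A');
	sketch_step(log, 'V');
	sketch_step(log, 'U');
}
```

In the edge case the mask/move/unmask group is contiguous, which is why
move_native_irq alone can cover it. In the level case the ack sits
inside the masked window, so the caller has to own the masking and only
the move itself can be delegated to move_masked_irq.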
* [PATCH 3/25] x86_64 irq: Reenable migrating irqs to other cpus.
From: Eric W. Biederman @ 2006-06-20 22:28 UTC
To: Andrew Morton
In the latest changes the code for migrating x86_64 irqs was dropped.
This re-adds it in a fashion that will work even if we change the vector
on level triggered irqs when we migrate them.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/io_apic.c |   31 ++++++++++++++++++++++++++++---
 1 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 2e9f1cf..f50be45 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1572,18 +1572,43 @@ static int ioapic_retrigger_vector(unsig
  * races.
  */
 
-static void ack_apic(unsigned int vector)
+static void ack_apic(unsigned int irq)
 {
 	ack_APIC_irq();
 }
 
+static void ack_apic_edge(unsigned int irq)
+{
+	move_native_irq(irq);
+	ack_APIC_irq();
+}
+
+static void ack_apic_level(unsigned int irq)
+{
+	int do_unmask_irq = 0;
+	/* If we are moving the irq we need to mask it */
+	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
+		do_unmask_irq = 1;
+		mask_IO_APIC_irq(irq);
+	}
+	/* We must acknowledge the irq before we move it
+	 * or the acknowledge will not propogate properly.
+	 */
+	ack_APIC_irq();
+
+	/* Now we can move and renable the irq */
+	move_masked_irq(irq);
+	if (unlikely(do_unmask_irq))
+		unmask_IO_APIC_irq(irq);
+}
+
 static struct irq_chip ioapic_chip __read_mostly = {
 	.name		= "IO-APIC",
 	.startup	= startup_ioapic_vector,
 	.mask		= mask_ioapic_vector,
 	.unmask		= unmask_ioapic_vector,
-	.ack		= ack_apic,
-	.eoi		= ack_apic,
+	.ack		= ack_apic_edge,
+	.eoi		= ack_apic_level,
 #ifdef CONFIG_SMP
 	.set_affinity	= set_ioapic_affinity_vector,
 #endif
--
1.4.0.gc07e
* [PATCH 4/25] msi: Simplify msi enable and disable.
From: Eric W. Biederman @ 2006-06-20 22:28 UTC
To: Andrew Morton
The problem: because the disable routines leave the msi interrupts
in all sorts of half enabled states, the enable routines become
impossible to implement correctly and almost impossible to understand.
Simplifying this allows me to simply kill the buggy reroute_msix_table,
and generally makes the code more maintainable.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/pci/msi.c |  122 +++++++----------------------------------------------
 1 files changed, 16 insertions(+), 106 deletions(-)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 76d023d..c1c93f0 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -915,7 +915,6 @@ int pci_enable_msi(struct pci_dev* dev)
 {
 	struct pci_bus *bus;
 	int pos, temp, status = -EINVAL;
-	u16 control;
 
 	if (!pci_msi_enable || !dev)
 		return status;
@@ -937,27 +936,8 @@ int pci_enable_msi(struct pci_dev* dev)
 	if (!pos)
 		return -EINVAL;
 
-	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
-		/* Lookup Sucess */
-		unsigned long flags;
+	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSI));
 
-		pci_read_config_word(dev, msi_control_reg(pos), &control);
-		if (control & PCI_MSI_FLAGS_ENABLE)
-			return 0;	/* Already in MSI mode */
-		spin_lock_irqsave(&msi_lock, flags);
-		if (!vector_irq[dev->irq]) {
-			msi_desc[dev->irq]->msi_attrib.state = 0;
-			vector_irq[dev->irq] = -1;
-			nr_released_vectors--;
-			spin_unlock_irqrestore(&msi_lock, flags);
-			status = msi_register_init(dev, msi_desc[dev->irq]);
-			if (status == 0)
-				enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
-			return status;
-		}
-		spin_unlock_irqrestore(&msi_lock, flags);
-		dev->irq = temp;
-	}
 	/* Check whether driver already requested for MSI-X vectors */
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
 	if (pos > 0 && !msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
@@ -999,6 +979,8 @@ void pci_disable_msi(struct pci_dev* dev
 	if (!(control & PCI_MSI_FLAGS_ENABLE))
 		return;
 
+	disable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+
 	spin_lock_irqsave(&msi_lock, flags);
 	entry = msi_desc[dev->irq];
 	if (!entry || !entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI) {
@@ -1012,14 +994,12 @@ void pci_disable_msi(struct pci_dev* dev
 		       pci_name(dev), dev->irq);
 		BUG_ON(entry->msi_attrib.state > 0);
 	} else {
-		vector_irq[dev->irq] = 0; /* free it */
-		nr_released_vectors++;
 		default_vector = entry->msi_attrib.default_vector;
 		spin_unlock_irqrestore(&msi_lock, flags);
+		msi_free_vector(dev, dev->irq, 0);
+
 		/* Restore dev->irq to its default pin-assertion vector */
 		dev->irq = default_vector;
-		disable_msi_mode(dev, pci_find_capability(dev, PCI_CAP_ID_MSI),
-					PCI_CAP_ID_MSI);
 	}
 }
 
@@ -1067,57 +1047,6 @@ static int msi_free_vector(struct pci_de
 	return 0;
 }
 
-static int reroute_msix_table(int head, struct msix_entry *entries, int *nvec)
-{
-	int vector = head, tail = 0;
-	int i, j = 0, nr_entries = 0;
-	void __iomem *base;
-	unsigned long flags;
-
-	spin_lock_irqsave(&msi_lock, flags);
-	while (head != tail) {
-		nr_entries++;
-		tail = msi_desc[vector]->link.tail;
-		if (entries[0].entry == msi_desc[vector]->msi_attrib.entry_nr)
-			j = vector;
-		vector = tail;
-	}
-	if (*nvec > nr_entries) {
-		spin_unlock_irqrestore(&msi_lock, flags);
-		*nvec = nr_entries;
-		return -EINVAL;
-	}
-	vector = ((j > 0) ? j : head);
-	for (i = 0; i < *nvec; i++) {
-		j = msi_desc[vector]->msi_attrib.entry_nr;
-		msi_desc[vector]->msi_attrib.state = 0;	/* Mark it not active */
-		vector_irq[vector] = -1;		/* Mark it busy */
-		nr_released_vectors--;
-		entries[i].vector = vector;
-		if (j != (entries + i)->entry) {
-			base = msi_desc[vector]->mask_base;
-			msi_desc[vector]->msi_attrib.entry_nr =
-				(entries + i)->entry;
-			writel( readl(base + j * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET), base +
-				(entries + i)->entry * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-			writel( readl(base + j * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET), base +
-				(entries + i)->entry * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-			writel( (readl(base + j * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_DATA_OFFSET) & 0xff00) | vector,
-				base + (entries+i)->entry*PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_DATA_OFFSET);
-		}
-		vector = msi_desc[vector]->link.tail;
-	}
-	spin_unlock_irqrestore(&msi_lock, flags);
-
-	return 0;
-}
-
 /**
  * pci_enable_msix - configure device's MSI-X capability structure
  * @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -1160,9 +1089,6 @@ int pci_enable_msix(struct pci_dev* dev,
 		return -EINVAL;
 
 	pci_read_config_word(dev, msi_control_reg(pos), &control);
-	if (control & PCI_MSIX_FLAGS_ENABLE)
-		return -EINVAL;			/* Already in MSI-X mode */
-
 	nr_entries = multi_msix_capable(control);
 	if (nvec > nr_entries)
 		return -EINVAL;
@@ -1177,19 +1103,8 @@ int pci_enable_msix(struct pci_dev* dev,
 		}
 	}
 	temp = dev->irq;
-	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
-		/* Lookup Sucess */
-		nr_entries = nvec;
-		/* Reroute MSI-X table */
-		if (reroute_msix_table(dev->irq, entries, &nr_entries)) {
-			/* #requested > #previous-assigned */
-			dev->irq = temp;
-			return nr_entries;
-		}
-		dev->irq = temp;
-		enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
-		return 0;
-	}
+	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSIX));
+
 	/* Check whether driver already requested for MSI vector */
 	if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0 &&
 		!msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
@@ -1248,37 +1163,32 @@ void pci_disable_msix(struct pci_dev* de
 	if (!(control & PCI_MSIX_FLAGS_ENABLE))
 		return;
 
+	disable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+
 	temp = dev->irq;
 	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
 		int state, vector, head, tail = 0, warning = 0;
 		unsigned long flags;
 
 		vector = head = dev->irq;
-		spin_lock_irqsave(&msi_lock, flags);
+		dev->irq = temp;			/* Restore pin IRQ */
 		while (head != tail) {
+			spin_lock_irqsave(&msi_lock, flags);
 			state = msi_desc[vector]->msi_attrib.state;
+			tail = msi_desc[vector]->link.tail;
+			spin_unlock_irqrestore(&msi_lock, flags);
 			if (state)
 				warning = 1;
-			else {
-				vector_irq[vector] = 0; /* free it */
-				nr_released_vectors++;
-			}
-			tail = msi_desc[vector]->link.tail;
+			else if (vector != head)	/* Release MSI-X vector */
+				msi_free_vector(dev, vector, 0);
 			vector = tail;
 		}
-		spin_unlock_irqrestore(&msi_lock, flags);
+		msi_free_vector(dev, vector, 0);
 		if (warning) {
-			dev->irq = temp;
 			printk(KERN_WARNING "PCI: %s: pci_disable_msix() called without "
 			       "free_irq() on all MSI-X vectors\n",
 			       pci_name(dev));
 			BUG_ON(warning > 0);
-		} else {
-			dev->irq = temp;
-			disable_msi_mode(dev,
-				pci_find_capability(dev, PCI_CAP_ID_MSIX),
-				PCI_CAP_ID_MSIX);
-
 		}
 	}
 }
--
1.4.0.gc07e
* [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1.
From: Eric W. Biederman @ 2006-06-20 22:28 UTC
To: Andrew Morton
This allows the output of the msi tests to be stored directly
in a bit field. If you don't do this, a value greater than one
will be truncated and become 0, changing true to false with
bizarre consequences.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/pci/msi.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h
index 56951c3..9b31d4c 100644
--- a/drivers/pci/msi.h
+++ b/drivers/pci/msi.h
@@ -110,8 +110,8 @@ #define multi_msi_capable(control) \
 	(1 << ((control & PCI_MSI_FLAGS_QMASK) >> 1))
 #define multi_msi_enable(control, num) \
 	control |= (((num >> 1) << 4) & PCI_MSI_FLAGS_QSIZE);
-#define is_64bit_address(control)	(control & PCI_MSI_FLAGS_64BIT)
-#define is_mask_bit_support(control)	(control & PCI_MSI_FLAGS_MASKBIT)
+#define is_64bit_address(control)	(!!(control & PCI_MSI_FLAGS_64BIT))
+#define is_mask_bit_support(control)	(!!(control & PCI_MSI_FLAGS_MASKBIT))
 
 #define msi_enable(control, num) multi_msi_enable(control, num); \
 	control |= PCI_MSI_FLAGS_ENABLE
--
1.4.0.gc07e
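The truncation is easy to demonstrate in userspace. In the sketch below
the struct and helper names are hypothetical, and SKETCH_FLAGS_64BIT is
assumed to be 0x80 (the value I believe PCI_MSI_FLAGS_64BIT has); any
flag above bit 0 shows the same effect.

```c
#define SKETCH_FLAGS_64BIT 0x80u	/* assumed value, see lead-in */

/* Hypothetical attribute struct with a one bit wide field. */
struct sketch_attrib {
	unsigned int is_64 : 1;
};

/* Old style test: stores the raw masked value in the bit field. */
static unsigned int sketch_store_raw(unsigned int control)
{
	struct sketch_attrib attr;

	attr.is_64 = control & SKETCH_FLAGS_64BIT;	/* only bit 0 is kept */
	return attr.is_64;
}

/* New style test: !! collapses any non-zero value to exactly 1. */
static unsigned int sketch_store_normalized(unsigned int control)
{
	struct sketch_attrib attr;

	attr.is_64 = !!(control & SKETCH_FLAGS_64BIT);
	return attr.is_64;
}
```

With the flag set, the raw store reads back as 0 because bit 0 of 0x80
is clear, while the normalized store reads back as 1.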
* [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg. 2006-06-20 22:28 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Eric W. Biederman @ 2006-06-20 22:28 ` Eric W. Biederman 2006-06-20 22:28 ` [PATCH 7/25] msi: Refactor the msi_ops Eric W. Biederman 2006-06-21 1:04 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Rajesh Shah 2006-06-20 22:45 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Jeff Garzik 1 sibling, 2 replies; 50+ messages in thread From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap In support of this I also add a struct msi_msg that captures the the two address and one data field ina typical msi message, and I remember the pos and if the address is 64bit in struct msi_desc. This makes the code a little more readable and easier to maintain, and paves the way to further simplfications. Signed-off-by: Eric W. 
Biederman <ebiederm@xmission.com> --- drivers/pci/msi.c | 195 +++++++++++++++++++++++++-------------------------- drivers/pci/msi.h | 9 +- include/linux/pci.h | 6 ++ 3 files changed, 104 insertions(+), 106 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index c1c93f0..e9db6c5 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -94,63 +94,100 @@ static void msi_set_mask_bit(unsigned in } } -#ifdef CONFIG_SMP -static void set_msi_affinity(unsigned int vector, cpumask_t cpu_mask) +static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) { - struct msi_desc *entry; - u32 address_hi, address_lo; - unsigned int irq = vector; - unsigned int dest_cpu = first_cpu(cpu_mask); - - entry = (struct msi_desc *)msi_desc[vector]; - if (!entry || !entry->dev) - return; + switch(entry->msi_attrib.type) { + case PCI_CAP_ID_MSI: + { + struct pci_dev *dev = entry->dev; + int pos = entry->msi_attrib.pos; + u16 data; + + pci_read_config_dword(dev, msi_lower_address_reg(pos), + &msg->address_lo); + if (entry->msi_attrib.is_64) { + pci_read_config_dword(dev, msi_upper_address_reg(pos), + &msg->address_hi); + pci_read_config_word(dev, msi_data_reg(pos, 1), &data); + } else { + msg->address_hi = 0; + pci_read_config_word(dev, msi_data_reg(pos, 1), &data); + } + msg->data = data; + break; + } + case PCI_CAP_ID_MSIX: + { + void __iomem *base; + base = entry->mask_base + + entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; + + msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); + msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); + msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET); + break; + } + default: + BUG(); + } +} +static void write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) +{ switch (entry->msi_attrib.type) { case PCI_CAP_ID_MSI: { - int pos = pci_find_capability(entry->dev, PCI_CAP_ID_MSI); - - if (!pos) - return; - - pci_read_config_dword(entry->dev, msi_upper_address_reg(pos), - &address_hi); - 
pci_read_config_dword(entry->dev, msi_lower_address_reg(pos), - &address_lo); - - msi_ops->target(vector, dest_cpu, &address_hi, &address_lo); - - pci_write_config_dword(entry->dev, msi_upper_address_reg(pos), - address_hi); - pci_write_config_dword(entry->dev, msi_lower_address_reg(pos), - address_lo); - set_native_irq_info(irq, cpu_mask); + struct pci_dev *dev = entry->dev; + int pos = entry->msi_attrib.pos; + + pci_write_config_dword(dev, msi_lower_address_reg(pos), + msg->address_lo); + if (entry->msi_attrib.is_64) { + pci_write_config_dword(dev, msi_upper_address_reg(pos), + msg->address_hi); + pci_write_config_word(dev, msi_data_reg(pos, 1), + msg->data); + } else { + pci_write_config_word(dev, msi_data_reg(pos, 0), + msg->data); + } break; } case PCI_CAP_ID_MSIX: { - int offset_hi = - entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET; - int offset_lo = - entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET; - - address_hi = readl(entry->mask_base + offset_hi); - address_lo = readl(entry->mask_base + offset_lo); - - msi_ops->target(vector, dest_cpu, &address_hi, &address_lo); - - writel(address_hi, entry->mask_base + offset_hi); - writel(address_lo, entry->mask_base + offset_lo); - set_native_irq_info(irq, cpu_mask); + void __iomem *base; + base = entry->mask_base + + entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; + + writel(msg->address_lo, + base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); + writel(msg->address_hi, + base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); + writel(msg->data, base + PCI_MSIX_ENTRY_DATA_OFFSET); break; } default: - break; + BUG(); } } + +#ifdef CONFIG_SMP +static void set_msi_affinity(unsigned int vector, cpumask_t cpu_mask) +{ + struct msi_desc *entry; + struct msi_msg msg; + unsigned int irq = vector; + unsigned int dest_cpu = first_cpu(cpu_mask); + + entry = (struct msi_desc *)msi_desc[vector]; + if (!entry || !entry->dev) + return; + + read_msi_msg(entry, &msg); + 
msi_ops->target(vector, dest_cpu, &msg.address_hi, &msg.address_lo); + write_msi_msg(entry, &msg); + set_native_irq_info(irq, cpu_mask); +} #else #define set_msi_affinity NULL #endif /* CONFIG_SMP */ @@ -614,23 +651,10 @@ int pci_save_msix_state(struct pci_dev * vector = head = dev->irq; while (head != tail) { - int j; - void __iomem *base; struct msi_desc *entry; entry = msi_desc[vector]; - base = entry->mask_base; - j = entry->msi_attrib.entry_nr; - - entry->address_lo_save = - readl(base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); - entry->address_hi_save = - readl(base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); - entry->data_save = - readl(base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_DATA_OFFSET); + read_msi_msg(entry, &entry->msg_save); tail = msi_desc[vector]->link.tail; vector = tail; @@ -647,8 +671,6 @@ void pci_restore_msix_state(struct pci_d u16 save; int pos; int vector, head, tail = 0; - void __iomem *base; - int j; struct msi_desc *entry; int temp; struct pci_cap_saved_state *save_state; @@ -671,18 +693,7 @@ void pci_restore_msix_state(struct pci_d vector = head = dev->irq; while (head != tail) { entry = msi_desc[vector]; - base = entry->mask_base; - j = entry->msi_attrib.entry_nr; - - writel(entry->address_lo_save, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); - writel(entry->address_hi_save, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); - writel(entry->data_save, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_DATA_OFFSET); + write_msi_msg(entry, &entry->msg_save); tail = msi_desc[vector]->link.tail; vector = tail; @@ -697,29 +708,19 @@ #endif static int msi_register_init(struct pci_dev *dev, struct msi_desc *entry) { int status; - u32 address_hi; - u32 address_lo; - u32 data; + struct msi_msg msg; int pos, vector = dev->irq; u16 control; - pos = pci_find_capability(dev, PCI_CAP_ID_MSI); + pos = entry->msi_attrib.pos; pci_read_config_word(dev, 
msi_control_reg(pos), &control); /* Configure MSI capability structure */ - status = msi_ops->setup(dev, vector, &address_hi, &address_lo, &data); + status = msi_ops->setup(dev, vector, &msg.address_hi, &msg.address_lo, &msg.data); if (status < 0) return status; - pci_write_config_dword(dev, msi_lower_address_reg(pos), address_lo); - if (is_64bit_address(control)) { - pci_write_config_dword(dev, - msi_upper_address_reg(pos), address_hi); - pci_write_config_word(dev, - msi_data_reg(pos, 1), data); - } else - pci_write_config_word(dev, - msi_data_reg(pos, 0), data); + write_msi_msg(entry, &msg); if (entry->msi_attrib.maskbit) { unsigned int maskbits, temp; /* All MSIs are unmasked by default, Mask them all */ @@ -769,9 +770,11 @@ static int msi_capability_init(struct pc entry->link.tail = vector; entry->msi_attrib.type = PCI_CAP_ID_MSI; entry->msi_attrib.state = 0; /* Mark it not active */ + entry->msi_attrib.is_64 = is_64bit_address(control); entry->msi_attrib.entry_nr = 0; entry->msi_attrib.maskbit = is_mask_bit_support(control); entry->msi_attrib.default_vector = dev->irq; /* Save IOAPIC IRQ */ + entry->msi_attrib.pos = pos; dev->irq = vector; entry->dev = dev; if (is_mask_bit_support(control)) { @@ -809,9 +812,7 @@ static int msix_capability_init(struct p struct msix_entry *entries, int nvec) { struct msi_desc *head = NULL, *tail = NULL, *entry = NULL; - u32 address_hi; - u32 address_lo; - u32 data; + struct msi_msg msg; int status; int vector, pos, i, j, nr_entries, temp = 0; unsigned long phys_addr; @@ -848,9 +849,11 @@ static int msix_capability_init(struct p entries[i].vector = vector; entry->msi_attrib.type = PCI_CAP_ID_MSIX; entry->msi_attrib.state = 0; /* Mark it not active */ + entry->msi_attrib.is_64 = 1; entry->msi_attrib.entry_nr = j; entry->msi_attrib.maskbit = 1; entry->msi_attrib.default_vector = dev->irq; + entry->msi_attrib.pos = pos; entry->dev = dev; entry->mask_base = base; if (!head) { @@ -869,21 +872,13 @@ static int 
msix_capability_init(struct p irq_handler_init(PCI_CAP_ID_MSIX, vector, 1); /* Configure MSI-X capability structure */ status = msi_ops->setup(dev, vector, - &address_hi, - &address_lo, - &data); + &msg.address_hi, + &msg.address_lo, + &msg.data); if (status < 0) break; - writel(address_lo, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); - writel(address_hi, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); - writel(data, - base + j * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_DATA_OFFSET); + write_msi_msg(entry, &msg); attach_msi_entry(entry, vector); } if (i != nvec) { diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h index 9b31d4c..62f61b6 100644 --- a/drivers/pci/msi.h +++ b/drivers/pci/msi.h @@ -130,10 +130,10 @@ struct msi_desc { __u8 type : 5; /* {0: unused, 5h:MSI, 11h:MSI-X} */ __u8 maskbit : 1; /* mask-pending bit supported ? */ __u8 state : 1; /* {0: free, 1: busy} */ - __u8 reserved: 1; /* reserved */ + __u8 is_64 : 1; /* Address size: 0=32bit 1=64bit */ __u8 entry_nr; /* specific enabled entry */ __u8 default_vector; /* default pre-assigned vector */ - __u8 unused; /* formerly unused destination cpu*/ + __u8 pos; /* Location of the msi capability */ }msi_attrib; struct { @@ -146,10 +146,7 @@ struct msi_desc { #ifdef CONFIG_PM /* PM save area for MSIX address/data */ - - u32 address_hi_save; - u32 address_lo_save; - u32 data_save; + struct msi_msg msg_save; #endif }; diff --git a/include/linux/pci.h b/include/linux/pci.h index e36e3d5..c7be27b 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -591,6 +591,12 @@ struct msix_entry { u16 entry; /* driver uses to specify entry, OS writes */ }; +struct msi_msg { + u32 address_lo; /* low 32 bits of msi message address */ + u32 address_hi; /* high 32 bits of msi message address */ + u32 data; /* 16 bits of msi message data */ +}; + #ifndef CONFIG_PCI_MSI static inline void pci_scan_msi_device(struct pci_dev *dev) {} static inline int pci_enable_msi(struct 
pci_dev *dev) {return -1;} -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
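For readers following along, here is a minimal userspace sketch of what the write_msi_msg/read_msi_msg helpers from patch 6 have to do for a plain MSI capability. It is not the kernel code: struct fake_dev, the cfg_* accessors and the sk_* names are hypothetical stand-ins for a pci_dev and the pci config accessors, and values are stored in host byte order. The register offsets (address low at pos+4, address high at pos+8, data at pos+8 or pos+12) are the standard MSI capability layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the struct msi_msg that the patch adds to include/linux/pci.h. */
struct msi_msg {
	uint32_t address_lo;	/* low 32 bits of msi message address */
	uint32_t address_hi;	/* high 32 bits of msi message address */
	uint32_t data;		/* 16 bits of msi message data */
};

/* Hypothetical stand-in for a device's 256 byte PCI config space. */
struct fake_dev {
	uint8_t cfg[256];
};

/* Register offsets inside the MSI capability. */
enum {
	SK_MSI_ADDR_LO = 4,
	SK_MSI_ADDR_HI = 8,
	SK_MSI_DATA_32 = 8,	/* data register of a 32bit only capability */
	SK_MSI_DATA_64 = 12,	/* data register of a 64bit capable one */
};

/* Simulated config accessors (host byte order, bounds checked). */
static void cfg_write(struct fake_dev *dev, int off, const void *val, size_t len)
{
	assert(off >= 0 && (size_t)off + len <= sizeof(dev->cfg));
	memcpy(&dev->cfg[off], val, len);
}

static void cfg_read(const struct fake_dev *dev, int off, void *val, size_t len)
{
	assert(off >= 0 && (size_t)off + len <= sizeof(dev->cfg));
	memcpy(val, &dev->cfg[off], len);
}

/* Program a whole message; the data offset depends on the address size. */
static void sk_write_msi_msg(struct fake_dev *dev, int pos, int is_64,
			     const struct msi_msg *msg)
{
	uint16_t data = (uint16_t)msg->data;

	cfg_write(dev, pos + SK_MSI_ADDR_LO, &msg->address_lo, 4);
	if (is_64) {
		cfg_write(dev, pos + SK_MSI_ADDR_HI, &msg->address_hi, 4);
		cfg_write(dev, pos + SK_MSI_DATA_64, &data, 2);
	} else {
		cfg_write(dev, pos + SK_MSI_DATA_32, &data, 2);
	}
}

/* Read the message back; a 32bit capability has no upper address. */
static void sk_read_msi_msg(const struct fake_dev *dev, int pos, int is_64,
			    struct msi_msg *msg)
{
	uint16_t data;

	cfg_read(dev, pos + SK_MSI_ADDR_LO, &msg->address_lo, 4);
	if (is_64) {
		cfg_read(dev, pos + SK_MSI_ADDR_HI, &msg->address_hi, 4);
		cfg_read(dev, pos + SK_MSI_DATA_64, &data, 2);
	} else {
		msg->address_hi = 0;
		cfg_read(dev, pos + SK_MSI_DATA_32, &data, 2);
	}
	msg->data = data;
}
```

This is why the patch records is_64 and pos in msi_attrib: with those two fields the helper can find every register from the msi_desc alone.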
* [PATCH 7/25] msi: Refactor the msi_ops.
2006-06-20 22:28 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
2006-06-21 1:18 ` [PATCH 7/25] msi: Refactor the msi_ops Rajesh Shah
2006-06-21 1:04 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Rajesh Shah
1 sibling, 2 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

The current msi_ops are shortsighted in a number of ways; this patch
attempts to fix the glaring deficiencies.

- Report in msi_ops if a 64bit address is needed in the msi message,
  so we can fail 32bit only msi structures.

- Send and receive a full struct msi_msg in both setup and target.
  This is a little cleaner and allows for architectures that need to
  modify the data to retarget the msi interrupt to a different cpu.

- In target pass in the full cpu mask instead of just the first cpu,
  in case we can make use of the full cpu mask.

- Operate in terms of irqs and not vectors. Currently there is still
  a 1-1 relationship, but on architectures other than ia64 I expect
  this will change.

Signed-off-by: Eric W.
Biederman <ebiederm@xmission.com> --- drivers/pci/msi-altix.c | 49 +++++++++++++++++++------------------ drivers/pci/msi-apic.c | 36 ++++++++++++++------------- drivers/pci/msi.c | 22 ++++++++--------- drivers/pci/msi.h | 62 ----------------------------------------------- include/linux/pci.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 116 insertions(+), 115 deletions(-) diff --git a/drivers/pci/msi-altix.c b/drivers/pci/msi-altix.c index bed4183..7aedc2a 100644 --- a/drivers/pci/msi-altix.c +++ b/drivers/pci/msi-altix.c @@ -26,7 +26,7 @@ struct sn_msi_info { static struct sn_msi_info *sn_msi_info; static void -sn_msi_teardown(unsigned int vector) +sn_msi_teardown(unsigned int irq) { nasid_t nasid; int widget; @@ -36,7 +36,7 @@ sn_msi_teardown(unsigned int vector) struct pcibus_bussoft *bussoft; struct sn_pcibus_provider *provider; - sn_irq_info = sn_msi_info[vector].sn_irq_info; + sn_irq_info = sn_msi_info[irq].sn_irq_info; if (sn_irq_info == NULL || sn_irq_info->irq_int_bit >= 0) return; @@ -45,9 +45,9 @@ sn_msi_teardown(unsigned int vector) provider = SN_PCIDEV_BUSPROVIDER(pdev); (*provider->dma_unmap)(pdev, - sn_msi_info[vector].pci_addr, + sn_msi_info[irq].pci_addr, PCI_DMA_FROMDEVICE); - sn_msi_info[vector].pci_addr = 0; + sn_msi_info[irq].pci_addr = 0; bussoft = SN_PCIDEV_BUSSOFT(pdev); nasid = NASID_GET(bussoft->bs_base); @@ -56,14 +56,13 @@ sn_msi_teardown(unsigned int vector) SWIN_WIDGETNUM(bussoft->bs_base); sn_intr_free(nasid, widget, sn_irq_info); - sn_msi_info[vector].sn_irq_info = NULL; + sn_msi_info[irq].sn_irq_info = NULL; return; } int -sn_msi_setup(struct pci_dev *pdev, unsigned int vector, - u32 *addr_hi, u32 *addr_lo, u32 *data) +sn_msi_setup(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg) { int widget; int status; @@ -93,7 +92,7 @@ sn_msi_setup(struct pci_dev *pdev, unsig if (! 
sn_irq_info) return -ENOMEM; - status = sn_intr_alloc(nasid, widget, sn_irq_info, vector, -1, -1); + status = sn_intr_alloc(nasid, widget, sn_irq_info, irq, -1, -1); if (status) { kfree(sn_irq_info); return -ENOMEM; @@ -119,28 +118,27 @@ sn_msi_setup(struct pci_dev *pdev, unsig return -ENOMEM; } - sn_msi_info[vector].sn_irq_info = sn_irq_info; - sn_msi_info[vector].pci_addr = bus_addr; + sn_msi_info[irq].sn_irq_info = sn_irq_info; + sn_msi_info[irq].pci_addr = bus_addr; - *addr_hi = (u32)(bus_addr >> 32); - *addr_lo = (u32)(bus_addr & 0x00000000ffffffff); + msg->address_hi = (u32)(bus_addr >> 32); + msg->address_lo = (u32)(bus_addr & 0x00000000ffffffff); /* * In the SN platform, bit 16 is a "send vector" bit which * must be present in order to move the vector through the system. */ - *data = 0x100 + (unsigned int)vector; + msg->data = 0x100 + irq; #ifdef CONFIG_SMP - set_irq_affinity_info((vector & 0xff), sn_irq_info->irq_cpuid, 0); + set_irq_affinity_info(irq, sn_irq_info->irq_cpuid, 0); #endif return 0; } static void -sn_msi_target(unsigned int vector, unsigned int cpu, - u32 *addr_hi, u32 *addr_lo) +sn_msi_target(unsigned int irq, cpumask_t cpu_mask, struct msi_msg *msg) { int slice; nasid_t nasid; @@ -150,8 +148,10 @@ sn_msi_target(unsigned int vector, unsig struct sn_irq_info *sn_irq_info; struct sn_irq_info *new_irq_info; struct sn_pcibus_provider *provider; + unsigned int cpu; - sn_irq_info = sn_msi_info[vector].sn_irq_info; + cpu = first_cpu(cpu_mask); + sn_irq_info = sn_msi_info[irq].sn_irq_info; if (sn_irq_info == NULL || sn_irq_info->irq_int_bit >= 0) return; @@ -163,15 +163,15 @@ sn_msi_target(unsigned int vector, unsig pdev = sn_pdev->pdi_linux_pcidev; provider = SN_PCIDEV_BUSPROVIDER(pdev); - bus_addr = (u64)(*addr_hi) << 32 | (u64)(*addr_lo); + bus_addr = (u64)(msg->address_hi) << 32 | (u64)(msg->address_lo); (*provider->dma_unmap)(pdev, bus_addr, PCI_DMA_FROMDEVICE); - sn_msi_info[vector].pci_addr = 0; + sn_msi_info[irq].pci_addr = 0; nasid = 
cpuid_to_nasid(cpu); slice = cpuid_to_slice(cpu); new_irq_info = sn_retarget_vector(sn_irq_info, nasid, slice); - sn_msi_info[vector].sn_irq_info = new_irq_info; + sn_msi_info[irq].sn_irq_info = new_irq_info; if (new_irq_info == NULL) return; @@ -184,12 +184,13 @@ sn_msi_target(unsigned int vector, unsig sizeof(new_irq_info->irq_xtalkaddr), SN_DMA_MSI|SN_DMA_ADDR_XIO); - sn_msi_info[vector].pci_addr = bus_addr; - *addr_hi = (u32)(bus_addr >> 32); - *addr_lo = (u32)(bus_addr & 0x00000000ffffffff); + sn_msi_info[irq].pci_addr = bus_addr; + msg->address_hi = (u32)(bus_addr >> 32); + msg->address_lo = (u32)(bus_addr & 0x00000000ffffffff); } struct msi_ops sn_msi_ops = { + .needs_64bit_address = 1, .setup = sn_msi_setup, .teardown = sn_msi_teardown, #ifdef CONFIG_SMP @@ -201,7 +202,7 @@ int sn_msi_init(void) { sn_msi_info = - kzalloc(sizeof(struct sn_msi_info) * NR_VECTORS, GFP_KERNEL); + kzalloc(sizeof(struct sn_msi_info) * NR_IRQS, GFP_KERNEL); if (! sn_msi_info) return -ENOMEM; diff --git a/drivers/pci/msi-apic.c b/drivers/pci/msi-apic.c index 5ed798b..1ce2589 100644 --- a/drivers/pci/msi-apic.c +++ b/drivers/pci/msi-apic.c @@ -46,37 +46,36 @@ #define MSI_ADDR_REDIRECTION_LOWPRI static void -msi_target_apic(unsigned int vector, - unsigned int dest_cpu, - u32 *address_hi, /* in/out */ - u32 *address_lo) /* in/out */ +msi_target_apic(unsigned int irq, cpumask_t cpu_mask, struct msi_msg *msg) { - u32 addr = *address_lo; + u32 addr = msg->address_lo; addr &= MSI_ADDR_DESTID_MASK; - addr |= MSI_ADDR_DESTID_CPU(cpu_physical_id(dest_cpu)); + addr |= MSI_ADDR_DESTID_CPU(cpu_physical_id(first_cpu(cpu_mask))); - *address_lo = addr; + msg->address_lo = addr; } static int msi_setup_apic(struct pci_dev *pdev, /* unused in generic */ - unsigned int vector, - u32 *address_hi, - u32 *address_lo, - u32 *data) + unsigned int irq, + struct msi_msg *msg) { unsigned long dest_phys_id; + unsigned int vector; dest_phys_id = cpu_physical_id(first_cpu(cpu_online_map)); + vector = irq; - 
*address_hi = 0; - *address_lo = MSI_ADDR_HEADER | - MSI_ADDR_DESTMODE_PHYS | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DESTID_CPU(dest_phys_id); + msg->address_hi = 0; + msg->address_lo = + MSI_ADDR_HEADER | + MSI_ADDR_DESTMODE_PHYS | + MSI_ADDR_REDIRECTION_CPU | + MSI_ADDR_DESTID_CPU(dest_phys_id); - *data = MSI_DATA_TRIGGER_EDGE | + msg->data = + MSI_DATA_TRIGGER_EDGE | MSI_DATA_LEVEL_ASSERT | MSI_DATA_DELIVERY_FIXED | MSI_DATA_VECTOR(vector); @@ -85,7 +84,7 @@ msi_setup_apic(struct pci_dev *pdev, /* } static void -msi_teardown_apic(unsigned int vector) +msi_teardown_apic(unsigned int irq) { return; /* no-op */ } @@ -95,6 +94,7 @@ msi_teardown_apic(unsigned int vector) */ struct msi_ops msi_apic_ops = { + .needs_64bit_address = 0, .setup = msi_setup_apic, .teardown = msi_teardown_apic, .target = msi_target_apic, diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index e9db6c5..40499c0 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -172,19 +172,17 @@ static void write_msi_msg(struct msi_des } #ifdef CONFIG_SMP -static void set_msi_affinity(unsigned int vector, cpumask_t cpu_mask) +static void set_msi_affinity(unsigned int irq, cpumask_t cpu_mask) { struct msi_desc *entry; struct msi_msg msg; - unsigned int irq = vector; - unsigned int dest_cpu = first_cpu(cpu_mask); - entry = (struct msi_desc *)msi_desc[vector]; + entry = msi_desc[irq]; if (!entry || !entry->dev) return; read_msi_msg(entry, &msg); - msi_ops->target(vector, dest_cpu, &msg.address_hi, &msg.address_lo); + msi_ops->target(irq, cpu_mask, &msg); write_msi_msg(entry, &msg); set_native_irq_info(irq, cpu_mask); } @@ -709,14 +707,14 @@ static int msi_register_init(struct pci_ { int status; struct msi_msg msg; - int pos, vector = dev->irq; + int pos; u16 control; pos = entry->msi_attrib.pos; pci_read_config_word(dev, msi_control_reg(pos), &control); /* Configure MSI capability structure */ - status = msi_ops->setup(dev, vector, &msg.address_hi, &msg.address_lo, &msg.data); + status = 
msi_ops->setup(dev, dev->irq, &msg); if (status < 0) return status; @@ -871,10 +869,7 @@ static int msix_capability_init(struct p /* Replace with MSI-X handler */ irq_handler_init(PCI_CAP_ID_MSIX, vector, 1); /* Configure MSI-X capability structure */ - status = msi_ops->setup(dev, vector, - &msg.address_hi, - &msg.address_lo, - &msg.data); + status = msi_ops->setup(dev, vector, &msg); if (status < 0) break; @@ -910,6 +905,7 @@ int pci_enable_msi(struct pci_dev* dev) { struct pci_bus *bus; int pos, temp, status = -EINVAL; + u16 control; if (!pci_msi_enable || !dev) return status; @@ -931,6 +927,10 @@ int pci_enable_msi(struct pci_dev* dev) if (!pos) return -EINVAL; + pci_read_config_word(dev, msi_control_reg(pos), &control); + if (!is_64bit_address(control) && msi_ops->needs_64bit_address) + return -EINVAL; + BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSI)); /* Check whether driver already requested for MSI-X vectors */ diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h index 62f61b6..3519eca 100644 --- a/drivers/pci/msi.h +++ b/drivers/pci/msi.h @@ -6,68 +6,6 @@ #ifndef MSI_H #define MSI_H -/* - * MSI operation vector. Used by the msi core code (drivers/pci/msi.c) - * to abstract platform-specific tasks relating to MSI address generation - * and resource management. - */ -struct msi_ops { - /** - * setup - generate an MSI bus address and data for a given vector - * @pdev: PCI device context (in) - * @vector: vector allocated by the msi core (in) - * @addr_hi: upper 32 bits of PCI bus MSI address (out) - * @addr_lo: lower 32 bits of PCI bus MSI address (out) - * @data: MSI data payload (out) - * - * Description: The setup op is used to generate a PCI bus addres and - * data which the msi core will program into the card MSI capability - * registers. The setup routine is responsible for picking an initial - * cpu to target the MSI at. 
The setup routine is responsible for - * examining pdev to determine the MSI capabilities of the card and - * generating a suitable address/data. The setup routine is - * responsible for allocating and tracking any system resources it - * needs to route the MSI to the cpu it picks, and for associating - * those resources with the passed in vector. - * - * Returns 0 if the MSI address/data was successfully setup. - **/ - - int (*setup) (struct pci_dev *pdev, unsigned int vector, - u32 *addr_hi, u32 *addr_lo, u32 *data); - - /** - * teardown - release resources allocated by setup - * @vector: vector context for resources (in) - * - * Description: The teardown op is used to release any resources - * that were allocated in the setup routine associated with the passed - * in vector. - **/ - - void (*teardown) (unsigned int vector); - - /** - * target - retarget an MSI at a different cpu - * @vector: vector context for resources (in) - * @cpu: new cpu to direct vector at (in) - * @addr_hi: new value of PCI bus upper 32 bits (in/out) - * @addr_lo: new value of PCI bus lower 32 bits (in/out) - * - * Description: The target op is used to redirect an MSI vector - * at a different cpu. addr_hi/addr_lo coming in are the existing - * values that the MSI core has programmed into the card. The - * target code is responsible for freeing any resources (if any) - * associated with the old address, and generating a new PCI bus - * addr_hi/addr_lo that will redirect the vector at the indicated cpu. 
- **/ - - void (*target) (unsigned int vector, unsigned int cpu, - u32 *addr_hi, u32 *addr_lo); -}; - -extern int msi_register(struct msi_ops *ops); - #include <asm/msi.h> /* diff --git a/include/linux/pci.h b/include/linux/pci.h index c7be27b..1aa01aa 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -613,6 +613,68 @@ extern int pci_enable_msix(struct pci_de struct msix_entry *entries, int nvec); extern void pci_disable_msix(struct pci_dev *dev); extern void msi_remove_pci_irq_vectors(struct pci_dev *dev); + +/* + * MSI operation vector. Used by the msi core code (drivers/pci/msi.c) + * to abstract platform-specific tasks relating to MSI address generation + * and resource management. + */ +struct msi_ops { + int needs_64bit_address; + /** + * setup - generate an MSI bus address and data for a given vector + * @pdev: PCI device context (in) + * @irq: irq allocated by the msi core (in) + * @msg: PCI bus address and data for msi message (out) + * + * Description: The setup op is used to generate a PCI bus addres and + * data which the msi core will program into the card MSI capability + * registers. The setup routine is responsible for picking an initial + * cpu to target the MSI at. The setup routine is responsible for + * examining pdev to determine the MSI capabilities of the card and + * generating a suitable address/data. The setup routine is + * responsible for allocating and tracking any system resources it + * needs to route the MSI to the cpu it picks, and for associating + * those resources with the passed in vector. + * + * Returns 0 if the MSI address/data was successfully setup. + **/ + + int (*setup) (struct pci_dev *pdev, unsigned int irq, + struct msi_msg *msg); + + /** + * teardown - release resources allocated by setup + * @vector: vector context for resources (in) + * + * Description: The teardown op is used to release any resources + * that were allocated in the setup routine associated with the passed + * in vector. 
+ **/ + + void (*teardown) (unsigned int irq); + + /** + * target - retarget an MSI at a different cpu + * @vector: vector context for resources (in) + * @cpu: new cpu to direct vector at (in) + * @addr_hi: new value of PCI bus upper 32 bits (in/out) + * @addr_lo: new value of PCI bus lower 32 bits (in/out) + * + * Description: The target op is used to redirect an MSI vector + * at a different cpu. addr_hi/addr_lo coming in are the existing + * values that the MSI core has programmed into the card. The + * target code is responsible for freeing any resources (if any) + * associated with the old address, and generating a new PCI bus + * addr_hi/addr_lo that will redirect the vector at the indicated cpu. + **/ + + void (*target) (unsigned int irq, cpumask_t cpumask, + struct msi_msg *msg); +}; + +extern int msi_register(struct msi_ops *ops); + #endif extern void pci_block_user_cfg_access(struct pci_dev *dev); -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
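To make the refactored interface a little more concrete, here is a small userspace sketch of an ops table shaped like the new struct msi_ops, loosely modelled on msi_setup_apic/msi_target_apic above. Everything prefixed sk_ or SK_ is made up for illustration. Two simplifying assumptions are labelled in the code: the cpu mask is a plain 64 bit word rather than a cpumask_t, and a cpu number is used directly as its physical APIC id.

```c
#include <assert.h>
#include <stdint.h>

struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* x86 style message layout; the names are local to this sketch. */
#define SK_MSI_ADDR_HEADER		0xfee00000u
#define SK_MSI_ADDR_DESTID_SHIFT	12
#define SK_MSI_ADDR_DESTID_BITS		0x000ff000u
#define SK_MSI_DATA_LEVEL_ASSERT	(1u << 14)
#define SK_MSI_DATA_VECTOR_MASK		0xffu

/* Hypothetical ops table with the same shape as the new struct msi_ops. */
struct sk_msi_ops {
	int needs_64bit_address;
	int (*setup)(unsigned int irq, struct msi_msg *msg);
	void (*target)(unsigned int irq, uint64_t cpu_mask, struct msi_msg *msg);
};

/* Assumption: the mask is a 64 bit word, lowest set bit is the first cpu. */
static unsigned int sk_first_cpu(uint64_t mask)
{
	unsigned int cpu = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}

/* Build the initial message: physical mode, destination id 0, edge, fixed. */
static int sk_setup(unsigned int irq, struct msi_msg *msg)
{
	unsigned int vector = irq;	/* still 1-1 at this point in the series */

	msg->address_hi = 0;
	msg->address_lo = SK_MSI_ADDR_HEADER;
	msg->data = SK_MSI_DATA_LEVEL_ASSERT | (vector & SK_MSI_DATA_VECTOR_MASK);
	return 0;
}

/* Retarget by rewriting only the destination id field of the address. */
static void sk_target(unsigned int irq, uint64_t cpu_mask, struct msi_msg *msg)
{
	uint32_t addr = msg->address_lo;

	(void)irq;
	addr &= ~SK_MSI_ADDR_DESTID_BITS;
	/* Assumption: cpu number == physical APIC id. */
	addr |= (uint32_t)sk_first_cpu(cpu_mask) << SK_MSI_ADDR_DESTID_SHIFT;
	msg->address_lo = addr;
}

/* The gate pci_enable_msi applies before handing out an irq. */
static int sk_can_enable(int dev_is_64bit, const struct sk_msi_ops *ops)
{
	return !(ops->needs_64bit_address && !dev_is_64bit);
}
```

Because target receives the whole message, an architecture that routes by data rather than by address could rewrite msg->data here instead, which the old addr_hi/addr_lo signature did not allow.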
* [PATCH 8/25] msi: Simplify the msi irq limit policy.
2006-06-20 22:28 ` [PATCH 7/25] msi: Refactor the msi_ops Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman
` (2 more replies)
2006-06-21 1:18 ` [PATCH 7/25] msi: Refactor the msi_ops Rajesh Shah
1 sibling, 3 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

Currently we attempt to predict how many irqs we will be able to
allocate with msi using pci_vector_resources and some complicated
accounting, and then we only allow each device as many irqs as we
think are available on average.

Only the s2io driver even takes advantage of this feature; all other
drivers have a fixed number of irqs they need and bail if they can't
get them.

pci_vector_resources is inaccurate if anyone ever frees an irq.
The whole implementation is racy. The current irq limit policy does
not appear to make sense with current drivers. So I have simplified
things. We can revisit this when we need a more sophisticated policy.

Signed-off-by: Eric W.
Biederman <ebiederm@xmission.com> --- arch/i386/pci/irq.c | 30 ----------------------------- arch/ia64/pci/pci.c | 9 --------- drivers/pci/msi.c | 53 ++++++++------------------------------------------- drivers/pci/msi.h | 11 ----------- 4 files changed, 8 insertions(+), 95 deletions(-) diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c index 8ce6950..768584d 100644 --- a/arch/i386/pci/irq.c +++ b/arch/i386/pci/irq.c @@ -1170,33 +1170,3 @@ #endif } return 0; } - -int pci_vector_resources(int last, int nr_released) -{ - int count = nr_released; - - int next = last; - int offset = (last % 8); - - while (next < FIRST_SYSTEM_VECTOR) { - next += 8; -#ifdef CONFIG_X86_64 - if (next == IA32_SYSCALL_VECTOR) - continue; -#else - if (next == SYSCALL_VECTOR) - continue; -#endif - count++; - if (next >= FIRST_SYSTEM_VECTOR) { - if (offset%8) { - next = FIRST_DEVICE_VECTOR + offset; - offset++; - continue; - } - count--; - } - } - - return count; -} diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 5bef0e3..b0028fd 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -810,12 +810,3 @@ pcibios_prep_mwi (struct pci_dev *dev) } return rc; } - -int pci_vector_resources(int last, int nr_released) -{ - int count = nr_released; - - count += (IA64_LAST_DEVICE_VECTOR - last); - - return count; -} diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 40499c0..772f5b6 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -30,8 +30,6 @@ static kmem_cache_t* msi_cachep; static int pci_msi_enable = 1; static int last_alloc_vector; static int nr_released_vectors; -static int nr_reserved_vectors = NR_HP_RESERVED_VECTORS; -static int nr_msix_devices; #ifndef CONFIG_X86_IO_APIC int vector_irq[NR_VECTORS] = { [0 ... 
NR_VECTORS - 1] = -1}; @@ -542,11 +540,6 @@ void pci_scan_msi_device(struct pci_dev { if (!dev) return; - - if (pci_find_capability(dev, PCI_CAP_ID_MSIX) > 0) - nr_msix_devices++; - else if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0) - nr_reserved_vectors++; } #ifdef CONFIG_PM @@ -877,13 +870,19 @@ static int msix_capability_init(struct p attach_msi_entry(entry, vector); } if (i != nvec) { + int avail = i - 1; i--; for (; i >= 0; i--) { vector = (entries + i)->vector; msi_free_vector(dev, vector, 0); (entries + i)->vector = 0; } - return -EBUSY; + /* If we had some success report the number of irqs + * we succeeded in setting up. + */ + if (avail <= 0) + avail = -EBUSY; + return avail; } /* Set MSI-X enabled bits */ enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX); @@ -943,14 +942,6 @@ int pci_enable_msi(struct pci_dev* dev) return -EINVAL; } status = msi_capability_init(dev); - if (!status) { - if (!pos) - nr_reserved_vectors--; /* Only MSI capable */ - else if (nr_msix_devices > 0) - nr_msix_devices--; /* Both MSI and MSI-X capable, - but choose enabling MSI */ - } - return status; } @@ -1060,10 +1051,9 @@ static int msi_free_vector(struct pci_de int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) { struct pci_bus *bus; - int status, pos, nr_entries, free_vectors; + int status, pos, nr_entries; int i, j, temp; u16 control; - unsigned long flags; if (!pci_msi_enable || !dev || !entries) return -EINVAL; @@ -1109,34 +1099,7 @@ int pci_enable_msix(struct pci_dev* dev, dev->irq = temp; return -EINVAL; } - - spin_lock_irqsave(&msi_lock, flags); - /* - * msi_lock is provided to ensure that enough vectors resources are - * available before granting. 
- */ - free_vectors = pci_vector_resources(last_alloc_vector, - nr_released_vectors); - /* Ensure that each MSI/MSI-X device has one vector reserved by - default to avoid any MSI-X driver to take all available - resources */ - free_vectors -= nr_reserved_vectors; - /* Find the average of free vectors among MSI-X devices */ - if (nr_msix_devices > 0) - free_vectors /= nr_msix_devices; - spin_unlock_irqrestore(&msi_lock, flags); - - if (nvec > free_vectors) { - if (free_vectors > 0) - return free_vectors; - else - return -EBUSY; - } - status = msix_capability_init(dev, entries, nvec); - if (!status && nr_msix_devices > 0) - nr_msix_devices--; - return status; } diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h index 3519eca..6793241 100644 --- a/drivers/pci/msi.h +++ b/drivers/pci/msi.h @@ -8,19 +8,8 @@ #define MSI_H #include <asm/msi.h> -/* - * Assume the maximum number of hot plug slots supported by the system is about - * ten. The worstcase is that each of these slots is hot-added with a device, - * which has two MSI/MSI-X capable functions. To avoid any MSI-X driver, which - * attempts to request all available vectors, NR_HP_RESERVED_VECTORS is defined - * as below to ensure at least one message is assigned to each detected MSI/ - * MSI-X device function. - */ -#define NR_HP_RESERVED_VECTORS 20 - extern int vector_irq[NR_VECTORS]; extern void (*interrupt[NR_IRQS])(void); -extern int pci_vector_resources(int last, int nr_released); /* * MSI-X Address Register -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
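With the averaging policy gone, a driver that can live with fewer irqs than it asked for has to shrink its own request. A rough sketch of that retry loop follows. sk_enable_msix is a hypothetical stand-in for pci_enable_msix with the return convention discussed here (0 on success, a positive count when fewer irqs could be set up, a negative errno otherwise); the fixed pool of three irqs is purely an assumption for the demo.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical platform state: how many irqs are still free. */
static int sk_free_irqs = 3;

/*
 * Stand-in for pci_enable_msix(): returns 0 on success, a positive
 * count when fewer irqs than requested could be set up (nothing is
 * kept allocated in that case), or a negative errno.
 */
static int sk_enable_msix(int nvec)
{
	if (nvec <= 0)
		return -EINVAL;
	if (sk_free_irqs == 0)
		return -EBUSY;
	if (nvec > sk_free_irqs)
		return sk_free_irqs;
	sk_free_irqs -= nvec;
	return 0;
}

/*
 * Driver side policy: ask for what we want and fall back to whatever
 * count the core reports. Returns the number of irqs granted or a
 * negative errno.
 */
static int sk_request_irqs(int wanted)
{
	int nvec = wanted;
	int rc;

	while ((rc = sk_enable_msix(nvec)) > 0)
		nvec = rc;	/* retry with the reported count */

	return rc < 0 ? rc : nvec;
}
```

A driver with a fixed requirement would skip the loop and simply treat any nonzero return as failure, which is what most drivers are described as doing today.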
* [PATCH 9/25] irq: Add a dynamic irq creation API
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 10/25] ia64 irq: Dynamic irq support Eric W. Biederman
2006-06-20 23:56 ` [PATCH 9/25] irq: Add a dynamic irq creation API Benjamin Herrenschmidt
2006-06-21 1:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Rajesh Shah
2006-06-21 2:46 ` Roland Dreier
2 siblings, 2 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

With the msi support comes a new concept in irq handling:
irqs that are created dynamically at run time.

Currently the msi code allocates irqs backwards. First it
allocates a platform dependent routing value for an interrupt,
the ``vector'', and then it figures out from the vector which
irq you are on.

This backwards msi allocator suffers from two basic problems.
First, it is trying to do something that is architecture specific
in a generic way, making it brittle, inflexible, and tied too
tightly to the architecture implementation. Second, its very
backwards nature has tied things together that should have no
dependencies.

To solve the basic dynamic irq allocation problem two new
architecture specific functions are added: create_irq and
destroy_irq.

create_irq takes no input and returns an unused irq number that
won't be reused until it is returned to the free pool with
destroy_irq. The irq can then be used for any purpose, although
the only initial consumer is the msi code.

destroy_irq takes an irq number allocated with create_irq and
returns it to the free pool.

Making this functionality per architecture increases the
simplicity of the irq allocation code and increases its
flexibility.

dynamic_irq_init() and dynamic_irq_cleanup() are added to
automate the irq_desc initialization that should happen for
dynamic irqs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/irq.h |    9 +++++++-
 kernel/irq/chip.c   |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 1 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b79d178..6d1ad88 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -392,8 +392,15 @@ set_irq_chained_handler(unsigned int irq
 	__set_irq_handler(irq, handle, 1);
 }
 
-/* Set/get chip/data for an IRQ: */
+/* Handle dynamic irq creation and destruction */
+extern int create_irq(void);
+extern void destroy_irq(unsigned int irq);
+
+/* Dynamic irq helper functions */
+extern void dynamic_irq_init(unsigned int irq);
+extern void dynamic_irq_cleanup(unsigned int irq);
 
+/* Set/get chip/data for an IRQ: */
 extern int set_irq_chip(unsigned int irq, struct irq_chip *chip);
 extern int set_irq_data(unsigned int irq, void *data);
 extern int set_irq_chip_data(unsigned int irq, void *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 431e9d5..9c01e48 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -18,6 +18,62 @@ #include <linux/kernel_stat.h>
 #include "internals.h"
 
 /**
+ *	dynamic_irq_init - initialize a dynamically allocated irq
+ *	@irq:	irq number to initialize
+ */
+void dynamic_irq_init(unsigned int irq)
+{
+	struct irq_desc *desc;
+	unsigned long flags;
+
+	if (irq >= NR_IRQS) {
+		printk(KERN_ERR "Trying to initialize invalid IRQ%d\n", irq);
+		WARN_ON(1);
+		return;
+	}
+
+	/* Ensure we don't have left over values from a previous use of this irq */
+	desc = irq_desc + irq;
+	spin_lock_irqsave(&desc->lock, flags);
+	desc->status = IRQ_DISABLED;
+	desc->chip = &no_irq_chip;
+	desc->handle_irq = handle_bad_irq;
+	desc->depth = 1;
+	desc->handler_data = NULL;
+	desc->chip_data = NULL;
+	desc->action = NULL;
+	desc->irq_count = 0;
+	desc->irqs_unhandled = 0;
+#ifdef CONFIG_SMP
+	desc->affinity = CPU_MASK_ALL;
+#endif
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+/**
+ *	dynamic_irq_cleanup - cleanup a dynamically allocated irq
+ *	@irq:	irq number to clean up
+ */
+void dynamic_irq_cleanup(unsigned int irq)
+{
+	struct irq_desc *desc;
+	unsigned long flags;
+
+	if (irq >= NR_IRQS) {
+		printk(KERN_ERR "Trying to cleanup invalid IRQ%d\n", irq);
+		WARN_ON(1);
+		return;
+	}
+
+	desc = irq_desc + irq;
+	spin_lock_irqsave(&desc->lock, flags);
+	desc->handle_irq = handle_bad_irq;
+	desc->chip = &no_irq_chip;
+	spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+
+/**
  *	set_irq_chip - set the irq chip for an irq
  *	@irq:	irq number
  *	@chip:	pointer to irq chip description structure
-- 
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
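The create_irq/destroy_irq contract described in this patch (hand out an unused irq, never reuse it until it is destroyed) can be sketched in userspace as a tiny free pool. This is only an illustration of the contract, not any architecture's implementation: the pool size, the reserved range below SK_FIRST_DYNAMIC_IRQ and the sk_ names are all assumptions, and the places where a real implementation would call dynamic_irq_init()/dynamic_irq_cleanup() are marked with comments.

```c
#include <assert.h>
#include <errno.h>

#define SK_NR_IRQS		8
#define SK_FIRST_DYNAMIC_IRQ	4	/* hypothetical: 0-3 are statically assigned */

/* One flag per irq: nonzero while the irq is handed out. */
static unsigned char sk_irq_used[SK_NR_IRQS];

/* Return an unused irq number, or a negative errno if the pool is empty. */
static int sk_create_irq(void)
{
	int irq;

	for (irq = SK_FIRST_DYNAMIC_IRQ; irq < SK_NR_IRQS; irq++) {
		if (!sk_irq_used[irq]) {
			sk_irq_used[irq] = 1;
			/* a real implementation would call dynamic_irq_init(irq) */
			return irq;
		}
	}
	return -ENOSPC;
}

/* Return an irq obtained from sk_create_irq() to the free pool. */
static void sk_destroy_irq(unsigned int irq)
{
	assert(irq >= SK_FIRST_DYNAMIC_IRQ && irq < SK_NR_IRQS);
	assert(sk_irq_used[irq]);
	/* a real implementation would call dynamic_irq_cleanup(irq) first */
	sk_irq_used[irq] = 0;
}
```

Note that nothing in this interface mentions a vector; how an irq gets routed is left entirely to the architecture behind create_irq.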
* [PATCH 10/25] ia64 irq: Dynamic irq support
2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 11/25] i386 " Eric W. Biederman
2006-06-20 23:56 ` [PATCH 9/25] irq: Add a dynamic irq creation API Benjamin Herrenschmidt
1 sibling, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/ia64/kernel/irq_ia64.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index f503530..b0189c7 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -31,6 +31,7 @@ #include <linux/smp.h>
 #include <linux/smp_lock.h>
 #include <linux/threads.h>
 #include <linux/bitops.h>
+#include <linux/irq.h>
 
 #include <asm/delay.h>
 #include <asm/intrinsics.h>
@@ -106,6 +107,28 @@ reserve_irq_vector (int vector)
 	return test_and_set_bit(pos, ia64_vector_mask);
 }
 
+/*
+ * Dynamic irq allocation and deallocation for MSI
+ */
+int create_irq(void)
+{
+	int vector;
+	unsigned long flags;
+
+	vector = assign_irq_vector(AUTO_ASSIGN);
+
+	if (vector >= 0)
+		dynamic_irq_init(vector);
+
+	return vector;
+}
+
+void destroy_irq(unsigned int irq)
+{
+	dynamic_irq_cleanup(irq);
+	free_irq_vector(irq);
+}
+
 #ifdef CONFIG_SMP
 #	define IS_RESCHEDULE(vec)	(vec == IA64_IPI_RESCHEDULE)
 #else
-- 
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 11/25] i386 irq: Dynamic irq support
2006-06-20 22:28 ` [PATCH 10/25] ia64 irq: Dynamic irq support Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 12/25] x86_64 " Eric W. Biederman
2006-06-21 1:50 ` [PATCH 11/25] i386 irq: Dynamic irq support Rajesh Shah
0 siblings, 2 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

The current implementation of create_irq() is a hack, but it is the
current hack that msi.c uses, and unfortunately the ``generic'' apic
msi ops depend on this hack. Thus we are stuck with this hack of
assuming irq == vector until the dependencies in the generic msi
code are removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/i386/kernel/io_apic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 16966f4..04f78ff 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2497,6 +2497,54 @@ static int __init ioapic_init_sysfs(void
 
 device_initcall(ioapic_init_sysfs);
 
+#ifdef CONFIG_PCI_MSI
+/*
+ * Dynamic irq allocation and deallocation for MSI
+ */
+int create_irq(void)
+{
+	/* Hack of the day: irq == vector.
+	 *
+	 * Ultimately this will be more general,
+	 * and not depend on the irq to vector identity mapping.
+	 * But this version is needed until msi.c can cope with
+	 * the more general form.
+	 */
+	int irq, vector;
+	unsigned long flags;
+	vector = assign_irq_vector(AUTO_ASSIGN);
+	irq = vector;
+
+	if (vector >= 0) {
+		struct irq_desc *desc;
+
+		spin_lock_irqsave(&vector_lock, flags);
+		vector_irq[vector] = irq;
+		irq_vector[irq] = vector;
+		spin_unlock_irqrestore(&vector_lock, flags);
+
+		set_intr_gate(vector, interrupt[irq]);
+
+		dynamic_irq_init(irq);
+	}
+	return irq;
+}
+
+void destroy_irq(unsigned int irq)
+{
+	unsigned long flags;
+	unsigned int vector;
+
+	dynamic_irq_cleanup(irq);
+
+	spin_lock_irqsave(&vector_lock, flags);
+	vector = irq_vector[irq];
+	vector_irq[vector] = -1;
+	irq_vector[irq] = 0;
+	spin_unlock_irqrestore(&vector_lock, flags);
+}
+#endif /* CONFIG_PCI_MSI */
+
 /* --------------------------------------------------------------------------
                           ACPI-based IOAPIC Configuration
    -------------------------------------------------------------------------- */
-- 
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 12/25] x86_64 irq: Dynamic irq support
2006-06-20 22:28 ` [PATCH 11/25] i386 " Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 13/25] msi: Make the msi code irq based and not vector based Eric W. Biederman
2006-06-21 1:50 ` [PATCH 11/25] i386 irq: Dynamic irq support Rajesh Shah
1 sibling, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
The current implementation of create_irq() is a hack but it is the
current hack that msi.c uses, and unfortunately the ``generic'' apic
msi ops depend on this hack. Thus we are stuck with this hack of
assuming irq == vector until the dependencies in the generic irq code
are removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/io_apic.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index f50be45..ae64a63 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -2013,6 +2013,54 @@ static int __init ioapic_init_sysfs(void
 device_initcall(ioapic_init_sysfs);
+#ifdef CONFIG_PCI_MSI
+/*
+ * Dynamic irq allocation and deallocation for MSI
+ */
+int create_irq(void)
+{
+	/* Hack of the day: irq == vector.
+	 *
+	 * Ultimately this will be more general,
+	 * and not depend on the irq to vector identity mapping.
+	 * But this version is needed until msi.c can cope with
+	 * the more general form.
+	 */
+	int irq, vector;
+	unsigned long flags;
+	vector = assign_irq_vector(AUTO_ASSIGN);
+	irq = vector;
+
+	if (vector >= 0) {
+		struct irq_desc *desc;
+
+		spin_lock_irqsave(&vector_lock, flags);
+		vector_irq[vector] = irq;
+		irq_vector[irq] = vector;
+		spin_unlock_irqrestore(&vector_lock, flags);
+
+		set_intr_gate(vector, interrupt[irq]);
+
+		dynamic_irq_init(irq);
+	}
+	return irq;
+}
+
+void destroy_irq(unsigned int irq)
+{
+	unsigned long flags;
+	unsigned int vector;
+
+	dynamic_irq_cleanup(irq);
+
+	spin_lock_irqsave(&vector_lock, flags);
+	vector = irq_vector[irq];
+	vector_irq[vector] = -1;
+	irq_vector[irq] = 0;
+	spin_unlock_irqrestore(&vector_lock, flags);
+}
+#endif
+
 /* --------------------------------------------------------------------------
                           ACPI-based IOAPIC Configuration
    -------------------------------------------------------------------------- */
-- 
1.4.0.gc07e
* [PATCH 13/25] msi: Make the msi code irq based and not vector based.
2006-06-20 22:28 ` [PATCH 12/25] x86_64 " Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 14/25] x86_64 irq: Move msi message composition into io_apic.c Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
The msi code currently allocates irqs backwards. First it allocates a
platform dependent routing value for an interrupt, the ``vector'', and
then it figures out from the vector which irq you are on. For ia64
this is fine. For x86 and x86_64 this is complete nonsense and makes
an enormous mess of the irq handling code and prevents some pretty
significant cleanups in the code for handling large numbers of irqs.
This patch refactors msi.c to work in terms of irqs and
create_irq/destroy_irq for dynamically managing irqs.
Hopefully this is finally a version of msi.c that is useful on more
than just x86 derivatives.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/pci/msi.c | 425 +++++++++++++++++++++--------------------------------
 drivers/pci/msi.h | 7 -
 2 files changed, 168 insertions(+), 264 deletions(-)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 772f5b6..a5d3685 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -6,6 +6,7 @@
  * Copyright (C) Tom Long Nguyen (tom.l.nguyen@intel.com)
  */
+#include <linux/err.h>
 #include <linux/mm.h>
 #include <linux/irq.h>
 #include <linux/interrupt.h>
@@ -28,12 +29,6 @@ static struct msi_desc* msi_desc[NR_IRQS
 static kmem_cache_t* msi_cachep;
 static int pci_msi_enable = 1;
-static int last_alloc_vector;
-static int nr_released_vectors;
-
-#ifndef CONFIG_X86_IO_APIC
-int vector_irq[NR_VECTORS] = { [0 ... NR_VECTORS - 1] = -1};
-#endif
 static struct msi_ops *msi_ops;
@@ -60,11 +55,11 @@ static int msi_cache_init(void)
 	return 0;
 }
-static void msi_set_mask_bit(unsigned int vector, int flag)
+static void msi_set_mask_bit(unsigned int irq, int flag)
 {
 	struct msi_desc *entry;
-	entry = (struct msi_desc *)msi_desc[vector];
+	entry = msi_desc[irq];
 	if (!entry || !entry->dev || !entry->mask_base)
 		return;
 	switch (entry->msi_attrib.type) {
@@ -188,23 +183,23 @@ #else
 #define set_msi_affinity NULL
 #endif /* CONFIG_SMP */
-static void mask_MSI_irq(unsigned int vector)
+static void mask_MSI_irq(unsigned int irq)
 {
-	msi_set_mask_bit(vector, 1);
+	msi_set_mask_bit(irq, 1);
 }
-static void unmask_MSI_irq(unsigned int vector)
+static void unmask_MSI_irq(unsigned int irq)
 {
-	msi_set_mask_bit(vector, 0);
+	msi_set_mask_bit(irq, 0);
 }
-static unsigned int startup_msi_irq_wo_maskbit(unsigned int vector)
+static unsigned int startup_msi_irq_wo_maskbit(unsigned int irq)
 {
 	struct msi_desc *entry;
 	unsigned long flags;
 	spin_lock_irqsave(&msi_lock, flags);
-	entry = msi_desc[vector];
+	entry = msi_desc[irq];
 	if (!entry || !entry->dev) {
 		spin_unlock_irqrestore(&msi_lock, flags);
 		return 0;
@@ -215,39 +210,39 @@ static unsigned int startup_msi_irq_wo_m
 	return 0; /* never anything pending */
 }
-static unsigned int startup_msi_irq_w_maskbit(unsigned int vector)
+static unsigned int startup_msi_irq_w_maskbit(unsigned int irq)
 {
-	startup_msi_irq_wo_maskbit(vector);
-	unmask_MSI_irq(vector);
+	startup_msi_irq_wo_maskbit(irq);
+	unmask_MSI_irq(irq);
 	return 0; /* never anything pending */
 }
-static void shutdown_msi_irq(unsigned int vector)
+static void shutdown_msi_irq(unsigned int irq)
 {
 	struct msi_desc *entry;
 	unsigned long flags;
 	spin_lock_irqsave(&msi_lock, flags);
-	entry = msi_desc[vector];
+	entry = msi_desc[irq];
 	if (entry && entry->dev)
 		entry->msi_attrib.state = 0; /* Mark it not active */
 	spin_unlock_irqrestore(&msi_lock, flags);
 }
-static void end_msi_irq_wo_maskbit(unsigned int vector)
+static void end_msi_irq_wo_maskbit(unsigned int irq)
 {
-	move_native_irq(vector);
+	move_native_irq(irq);
 	ack_APIC_irq();
 }
-static void end_msi_irq_w_maskbit(unsigned int vector)
+static void end_msi_irq_w_maskbit(unsigned int irq)
 {
-	move_native_irq(vector);
-	unmask_MSI_irq(vector);
+	move_native_irq(irq);
+	unmask_MSI_irq(irq);
 	ack_APIC_irq();
 }
-static void do_nothing(unsigned int vector)
+static void do_nothing(unsigned int irq)
 {
 }
@@ -298,86 +293,7 @@ static struct hw_interrupt_type msi_irq_
 	.set_affinity = set_msi_affinity
 };
-static int msi_free_vector(struct pci_dev* dev, int vector, int reassign);
-static int assign_msi_vector(void)
-{
-	static int new_vector_avail = 1;
-	int vector;
-	unsigned long flags;
-
-	/*
-	 * msi_lock is provided to ensure that successful allocation of MSI
-	 * vector is assigned unique among drivers.
-	 */
-	spin_lock_irqsave(&msi_lock, flags);
-
-	if (!new_vector_avail) {
-		int free_vector = 0;
-
-		/*
-		 * vector_irq[] = -1 indicates that this specific vector is:
-		 * - assigned for MSI (since MSI have no associated IRQ) or
-		 * - assigned for legacy if less than 16, or
-		 * - having no corresponding 1:1 vector-to-IOxAPIC IRQ mapping
-		 * vector_irq[] = 0 indicates that this vector, previously
-		 * assigned for MSI, is freed by hotplug removed operations.
-		 * This vector will be reused for any subsequent hotplug added
-		 * operations.
-		 * vector_irq[] > 0 indicates that this vector is assigned for
-		 * IOxAPIC IRQs. This vector and its value provides a 1-to-1
-		 * vector-to-IOxAPIC IRQ mapping.
-		 */
-		for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
-			if (vector_irq[vector] != 0)
-				continue;
-			free_vector = vector;
-			if (!msi_desc[vector])
-				break;
-			else
-				continue;
-		}
-		if (!free_vector) {
-			spin_unlock_irqrestore(&msi_lock, flags);
-			return -EBUSY;
-		}
-		vector_irq[free_vector] = -1;
-		nr_released_vectors--;
-		spin_unlock_irqrestore(&msi_lock, flags);
-		if (msi_desc[free_vector] != NULL) {
-			struct pci_dev *dev;
-			int tail;
-
-			/* free all linked vectors before re-assign */
-			do {
-				spin_lock_irqsave(&msi_lock, flags);
-				dev = msi_desc[free_vector]->dev;
-				tail = msi_desc[free_vector]->link.tail;
-				spin_unlock_irqrestore(&msi_lock, flags);
-				msi_free_vector(dev, tail, 1);
-			} while (free_vector != tail);
-		}
-
-		return free_vector;
-	}
-	vector = assign_irq_vector(AUTO_ASSIGN);
-	last_alloc_vector = vector;
-	if (vector == LAST_DEVICE_VECTOR)
-		new_vector_avail = 0;
-
-	spin_unlock_irqrestore(&msi_lock, flags);
-	return vector;
-}
-
-static int get_new_vector(void)
-{
-	int vector = assign_msi_vector();
-
-	if (vector > 0)
-		set_intr_gate(vector, interrupt[vector]);
-
-	return vector;
-}
-
+static int msi_free_irq(struct pci_dev* dev, int irq);
 static int msi_init(void)
 {
 	static int status = -ENOMEM;
@@ -401,13 +317,13 @@ static int msi_init(void)
 	}
 	if (! msi_ops) {
+		pci_msi_enable = 0;
 		printk(KERN_WARNING "PCI: MSI ops not registered. MSI disabled.\n");
 		status = -EINVAL;
 		return status;
 	}
-	last_alloc_vector = assign_irq_vector(AUTO_ASSIGN);
 	status = msi_cache_init();
 	if (status < 0) {
 		pci_msi_enable = 0;
@@ -415,23 +331,9 @@ static int msi_init(void)
 		return status;
 	}
-	if (last_alloc_vector < 0) {
-		pci_msi_enable = 0;
-		printk(KERN_WARNING "PCI: No interrupt vectors available for MSI\n");
-		status = -EBUSY;
-		return status;
-	}
-	vector_irq[last_alloc_vector] = 0;
-	nr_released_vectors++;
-
 	return status;
 }
-static int get_msi_vector(struct pci_dev *dev)
-{
-	return get_new_vector();
-}
-
 static struct msi_desc* alloc_msi_entry(void)
 {
 	struct msi_desc *entry;
@@ -447,29 +349,45 @@ static struct msi_desc* alloc_msi_entry(
 	return entry;
 }
-static void attach_msi_entry(struct msi_desc *entry, int vector)
+static void attach_msi_entry(struct msi_desc *entry, int irq)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&msi_lock, flags);
-	msi_desc[vector] = entry;
+	msi_desc[irq] = entry;
 	spin_unlock_irqrestore(&msi_lock, flags);
 }
-static void irq_handler_init(int cap_id, int pos, int mask)
+static int create_msi_irq(struct hw_interrupt_type *handler)
 {
-	unsigned long flags;
+	struct msi_desc *entry;
+	int irq;
-	spin_lock_irqsave(&irq_desc[pos].lock, flags);
-	if (cap_id == PCI_CAP_ID_MSIX)
-		irq_desc[pos].chip = &msix_irq_type;
-	else {
-		if (!mask)
-			irq_desc[pos].chip = &msi_irq_wo_maskbit_type;
-		else
-			irq_desc[pos].chip = &msi_irq_w_maskbit_type;
+	entry = alloc_msi_entry();
+	if (!entry)
+		return -ENOMEM;
+
+	irq = create_irq();
+	if (irq < 0) {
+		kmem_cache_free(msi_cachep, entry);
+		return -EBUSY;
 	}
-	spin_unlock_irqrestore(&irq_desc[pos].lock, flags);
+
+	set_irq_chip(irq, handler);
+	set_irq_data(irq, entry);
+
+	return irq;
+}
+
+static void destroy_msi_irq(unsigned int irq)
+{
+	struct msi_desc *entry;
+
+	entry = get_irq_data(irq);
+	set_irq_chip(irq, NULL);
+	set_irq_data(irq, NULL);
+	destroy_irq(irq);
+	kmem_cache_free(msi_cachep, entry);
 }
 static void enable_msi_mode(struct pci_dev *dev, int pos, int type)
@@ -514,21 +432,21 @@ void disable_msi_mode(struct pci_dev *de
 	}
 }
-static int msi_lookup_vector(struct pci_dev *dev, int type)
+static int msi_lookup_irq(struct pci_dev *dev, int type)
 {
-	int vector;
+	int irq;
 	unsigned long flags;
 	spin_lock_irqsave(&msi_lock, flags);
-	for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
-		if (!msi_desc[vector] || msi_desc[vector]->dev != dev ||
-			msi_desc[vector]->msi_attrib.type != type ||
-			msi_desc[vector]->msi_attrib.default_vector != dev->irq)
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		if (!msi_desc[irq] || msi_desc[irq]->dev != dev ||
+			msi_desc[irq]->msi_attrib.type != type ||
+			msi_desc[irq]->msi_attrib.default_irq != dev->irq)
 			continue;
 		spin_unlock_irqrestore(&msi_lock, flags);
-		/* This pre-assigned MSI vector for this device
-		   already exits. Override dev->irq with this vector */
-		dev->irq = vector;
+		/* This pre-assigned MSI irq for this device
+		   already exits. Override dev->irq with this irq */
+		dev->irq = irq;
 		return 0;
 	}
 	spin_unlock_irqrestore(&msi_lock, flags);
@@ -613,7 +531,7 @@ int pci_save_msix_state(struct pci_dev *
 {
 	int pos;
 	int temp;
-	int vector, head, tail = 0;
+	int irq, head, tail = 0;
 	u16 control;
 	struct pci_cap_saved_state *save_state;
@@ -635,20 +553,20 @@ int pci_save_msix_state(struct pci_dev *
 	/* save the table */
 	temp = dev->irq;
-	if (msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
+	if (msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) {
 		kfree(save_state);
 		return -EINVAL;
 	}
-	vector = head = dev->irq;
+	irq = head = dev->irq;
 	while (head != tail) {
 		struct msi_desc *entry;
-		entry = msi_desc[vector];
+		entry = msi_desc[irq];
 		read_msi_msg(entry, &entry->msg_save);
-		tail = msi_desc[vector]->link.tail;
-		vector = tail;
+		tail = msi_desc[irq]->link.tail;
+		irq = tail;
 	}
 	dev->irq = temp;
@@ -661,7 +579,7 @@ void pci_restore_msix_state(struct pci_d
 {
 	u16 save;
 	int pos;
-	int vector, head, tail = 0;
+	int irq, head, tail = 0;
 	struct msi_desc *entry;
 	int temp;
 	struct pci_cap_saved_state *save_state;
@@ -679,15 +597,15 @@ void pci_restore_msix_state(struct pci_d
 	/* route the table */
 	temp = dev->irq;
-	if (msi_lookup_vector(dev, PCI_CAP_ID_MSIX))
+	if (msi_lookup_irq(dev, PCI_CAP_ID_MSIX))
 		return;
-	vector = head = dev->irq;
+	irq = head = dev->irq;
 	while (head != tail) {
-		entry = msi_desc[vector];
+		entry = msi_desc[irq];
 		write_msi_msg(entry, &entry->msg_save);
-		tail = msi_desc[vector]->link.tail;
-		vector = tail;
+		tail = msi_desc[irq]->link.tail;
+		irq = tail;
 	}
 	dev->irq = temp;
@@ -734,55 +652,54 @@ static int msi_register_init(struct pci_
  * @dev: pointer to the pci_dev data structure of MSI device function
  *
  * Setup the MSI capability structure of device function with a single
- * MSI vector, regardless of device function is capable of handling
+ * MSI irq, regardless of device function is capable of handling
  * multiple messages. A return of zero indicates the successful setup
- * of an entry zero with the new MSI vector or non-zero for otherwise.
+ * of an entry zero with the new MSI irq or non-zero for otherwise.
  **/
 static int msi_capability_init(struct pci_dev *dev)
 {
 	int status;
 	struct msi_desc *entry;
-	int pos, vector;
+	int pos, irq;
 	u16 control;
+	struct hw_interrupt_type *handler;
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
 	pci_read_config_word(dev, msi_control_reg(pos), &control);
 	/* MSI Entry Initialization */
-	entry = alloc_msi_entry();
-	if (!entry)
-		return -ENOMEM;
+	handler = &msi_irq_wo_maskbit_type;
+	if (is_mask_bit_support(control))
+		handler = &msi_irq_w_maskbit_type;
-	vector = get_msi_vector(dev);
-	if (vector < 0) {
-		kmem_cache_free(msi_cachep, entry);
-		return -EBUSY;
-	}
-	entry->link.head = vector;
-	entry->link.tail = vector;
+	irq = create_msi_irq(handler);
+	if (irq < 0)
+		return irq;
+
+	entry = get_irq_data(irq);
+	entry->link.head = irq;
+	entry->link.tail = irq;
 	entry->msi_attrib.type = PCI_CAP_ID_MSI;
 	entry->msi_attrib.state = 0; /* Mark it not active */
 	entry->msi_attrib.is_64 = is_64bit_address(control);
 	entry->msi_attrib.entry_nr = 0;
 	entry->msi_attrib.maskbit = is_mask_bit_support(control);
-	entry->msi_attrib.default_vector = dev->irq; /* Save IOAPIC IRQ */
+	entry->msi_attrib.default_irq = dev->irq; /* Save IOAPIC IRQ */
 	entry->msi_attrib.pos = pos;
-	dev->irq = vector;
+	dev->irq = irq;
 	entry->dev = dev;
 	if (is_mask_bit_support(control)) {
 		entry->mask_base = (void __iomem *)(long)msi_mask_bits_reg(pos, is_64bit_address(control));
 	}
-	/* Replace with MSI handler */
-	irq_handler_init(PCI_CAP_ID_MSI, vector, entry->msi_attrib.maskbit);
 	/* Configure MSI capability structure */
 	status = msi_register_init(dev, entry);
 	if (status != 0) {
-		dev->irq = entry->msi_attrib.default_vector;
-		kmem_cache_free(msi_cachep, entry);
+		dev->irq = entry->msi_attrib.default_irq;
+		destroy_msi_irq(irq);
 		return status;
 	}
-	attach_msi_entry(entry, vector);
+	attach_msi_entry(entry, irq);
 	/* Set MSI enabled bits */
 	enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
@@ -796,8 +713,8 @@ static int msi_capability_init(struct pc
  * @nvec: number of @entries
  *
  * Setup the MSI-X capability structure of device function with a
- * single MSI-X vector. A return of zero indicates the successful setup of
- * requested MSI-X entries with allocated vectors or non-zero for otherwise.
+ * single MSI-X irq. A return of zero indicates the successful setup of
+ * requested MSI-X entries with allocated irqs or non-zero for otherwise.
  **/
 static int msix_capability_init(struct pci_dev *dev,
 				struct msix_entry *entries, int nvec)
@@ -805,7 +722,7 @@ static int msix_capability_init(struct p
 	struct msi_desc *head = NULL, *tail = NULL, *entry = NULL;
 	struct msi_msg msg;
 	int status;
-	int vector, pos, i, j, nr_entries, temp = 0;
+	int irq, pos, i, j, nr_entries, temp = 0;
 	unsigned long phys_addr;
 	u32 table_offset;
 	u16 control;
@@ -827,54 +744,50 @@ static int msix_capability_init(struct p
 	/* MSI-X Table Initialization */
 	for (i = 0; i < nvec; i++) {
-		entry = alloc_msi_entry();
-		if (!entry)
-			break;
-		vector = get_msi_vector(dev);
-		if (vector < 0) {
-			kmem_cache_free(msi_cachep, entry);
+		irq = create_msi_irq(&msix_irq_type);
+		if (irq < 0)
 			break;
-		}
+		entry = get_irq_data(irq);
 		j = entries[i].entry;
-		entries[i].vector = vector;
+		entries[i].vector = irq;
 		entry->msi_attrib.type = PCI_CAP_ID_MSIX;
 		entry->msi_attrib.state = 0; /* Mark it not active */
 		entry->msi_attrib.is_64 = 1;
 		entry->msi_attrib.entry_nr = j;
 		entry->msi_attrib.maskbit = 1;
-		entry->msi_attrib.default_vector = dev->irq;
+		entry->msi_attrib.default_irq = dev->irq;
 		entry->msi_attrib.pos = pos;
 		entry->dev = dev;
 		entry->mask_base = base;
 		if (!head) {
-			entry->link.head = vector;
-			entry->link.tail = vector;
+			entry->link.head = irq;
+			entry->link.tail = irq;
 			head = entry;
 		} else {
 			entry->link.head = temp;
 			entry->link.tail = tail->link.tail;
-			tail->link.tail = vector;
-			head->link.head = vector;
+			tail->link.tail = irq;
+			head->link.head = irq;
 		}
-		temp = vector;
+		temp = irq;
 		tail = entry;
-		/* Replace with MSI-X handler */
-		irq_handler_init(PCI_CAP_ID_MSIX, vector, 1);
 		/* Configure MSI-X capability structure */
-		status = msi_ops->setup(dev, vector, &msg);
-		if (status < 0)
+		status = msi_ops->setup(dev, irq, &msg);
+		if (status < 0) {
+			destroy_msi_irq(irq);
 			break;
+		}
 		write_msi_msg(entry, &msg);
-		attach_msi_entry(entry, vector);
+		attach_msi_entry(entry, irq);
 	}
 	if (i != nvec) {
 		int avail = i - 1;
 		i--;
 		for (; i >= 0; i--) {
-			vector = (entries + i)->vector;
-			msi_free_vector(dev, vector, 0);
+			irq = (entries + i)->vector;
+			msi_free_irq(dev, irq);
 			(entries + i)->vector = 0;
 		}
 		/* If we had some success report the number of irqs
@@ -895,10 +808,10 @@ static int msix_capability_init(struct p
  * @dev: pointer to the pci_dev data structure of MSI device function
  *
  * Setup the MSI capability structure of device function with
- * a single MSI vector upon its software driver call to request for
+ * a single MSI irq upon its software driver call to request for
  * MSI mode enabled on its hardware device function. A return of zero
  * indicates the successful setup of an entry zero with the new MSI
- * vector or non-zero for otherwise.
+ * irq or non-zero for otherwise.
  **/
 int pci_enable_msi(struct pci_dev* dev)
 {
@@ -930,13 +843,13 @@ int pci_enable_msi(struct pci_dev* dev)
 	if (!is_64bit_address(control) && msi_ops->needs_64bit_address)
 		return -EINVAL;
-	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSI));
+	BUG_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSI));
-	/* Check whether driver already requested for MSI-X vectors */
+	/* Check whether driver already requested for MSI-X irqs */
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
-	if (pos > 0 && !msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
+	if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) {
 		printk(KERN_INFO "PCI: %s: Can't enable MSI. "
-		       "Device already has MSI-X vectors assigned\n",
+		       "Device already has MSI-X irq assigned\n",
 		       pci_name(dev));
 		dev->irq = temp;
 		return -EINVAL;
@@ -948,7 +861,7 @@ int pci_enable_msi(struct pci_dev* dev)
 void pci_disable_msi(struct pci_dev* dev)
 {
 	struct msi_desc *entry;
-	int pos, default_vector;
+	int pos, default_irq;
 	u16 control;
 	unsigned long flags;
@@ -976,30 +889,30 @@ void pci_disable_msi(struct pci_dev* dev
 	if (entry->msi_attrib.state) {
 		spin_unlock_irqrestore(&msi_lock, flags);
 		printk(KERN_WARNING "PCI: %s: pci_disable_msi() called without "
-		       "free_irq() on MSI vector %d\n",
+		       "free_irq() on MSI irq %d\n",
 		       pci_name(dev), dev->irq);
 		BUG_ON(entry->msi_attrib.state > 0);
 	} else {
-		default_vector = entry->msi_attrib.default_vector;
+		default_irq = entry->msi_attrib.default_irq;
 		spin_unlock_irqrestore(&msi_lock, flags);
-		msi_free_vector(dev, dev->irq, 0);
+		msi_free_irq(dev, dev->irq);
-		/* Restore dev->irq to its default pin-assertion vector */
-		dev->irq = default_vector;
+		/* Restore dev->irq to its default pin-assertion irq */
+		dev->irq = default_irq;
 	}
 }
-static int msi_free_vector(struct pci_dev* dev, int vector, int reassign)
+static int msi_free_irq(struct pci_dev* dev, int irq)
 {
 	struct msi_desc *entry;
 	int head, entry_nr, type;
 	void __iomem *base;
 	unsigned long flags;
-	msi_ops->teardown(vector);
+	msi_ops->teardown(irq);
 	spin_lock_irqsave(&msi_lock, flags);
-	entry = msi_desc[vector];
+	entry = msi_desc[irq];
 	if (!entry || entry->dev != dev) {
 		spin_unlock_irqrestore(&msi_lock, flags);
 		return -EINVAL;
@@ -1011,22 +924,16 @@ static int msi_free_vector(struct pci_de
 	msi_desc[entry->link.head]->link.tail = entry->link.tail;
 	msi_desc[entry->link.tail]->link.head = entry->link.head;
 	entry->dev = NULL;
-	if (!reassign) {
-		vector_irq[vector] = 0;
-		nr_released_vectors++;
-	}
-	msi_desc[vector] = NULL;
+	msi_desc[irq] = NULL;
 	spin_unlock_irqrestore(&msi_lock, flags);
-	kmem_cache_free(msi_cachep, entry);
+	destroy_msi_irq(irq);
 	if (type == PCI_CAP_ID_MSIX) {
-		if (!reassign)
-			writel(1, base +
-				entry_nr * PCI_MSIX_ENTRY_SIZE +
-				PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		writel(1, base + entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
-		if (head == vector)
+		if (head == irq)
 			iounmap(base);
 	}
@@ -1037,15 +944,15 @@ static int msi_free_vector(struct pci_de
  * pci_enable_msix - configure device's MSI-X capability structure
  * @dev: pointer to the pci_dev data structure of MSI-X device function
  * @entries: pointer to an array of MSI-X entries
- * @nvec: number of MSI-X vectors requested for allocation by device driver
+ * @nvec: number of MSI-X irqs requested for allocation by device driver
  *
  * Setup the MSI-X capability structure of device function with the number
- * of requested vectors upon its software driver call to request for
+ * of requested irqs upon its software driver call to request for
  * MSI-X mode enabled on its hardware device function. A return of zero
  * indicates the successful configuration of MSI-X capability structure
- * with new allocated MSI-X vectors. A return of < 0 indicates a failure.
+ * with new allocated MSI-X irqs. A return of < 0 indicates a failure.
  * Or a return of > 0 indicates that driver request is exceeding the number
- * of vectors available. Driver should use the returned value to re-send
+ * of irqs available. Driver should use the returned value to re-send
  * its request.
  **/
 int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
@@ -1088,13 +995,13 @@ int pci_enable_msix(struct pci_dev* dev,
 		}
 	}
 	temp = dev->irq;
-	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSIX));
+	BUG_ON(!msi_lookup_irq(dev, PCI_CAP_ID_MSIX));
-	/* Check whether driver already requested for MSI vector */
+	/* Check whether driver already requested for MSI irq */
 	if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0 &&
-		!msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
+		!msi_lookup_irq(dev, PCI_CAP_ID_MSI)) {
 		printk(KERN_INFO "PCI: %s: Can't enable MSI-X. "
-		       "Device already has an MSI vector assigned\n",
+		       "Device already has an MSI irq assigned\n",
 		       pci_name(dev));
 		dev->irq = temp;
 		return -EINVAL;
@@ -1124,27 +1031,27 @@ void pci_disable_msix(struct pci_dev* de
 	disable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
 	temp = dev->irq;
-	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
-		int state, vector, head, tail = 0, warning = 0;
+	if (!msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) {
+		int state, irq, head, tail = 0, warning = 0;
 		unsigned long flags;
-		vector = head = dev->irq;
+		irq = head = dev->irq;
 		dev->irq = temp; /* Restore pin IRQ */
 		while (head != tail) {
 			spin_lock_irqsave(&msi_lock, flags);
-			state = msi_desc[vector]->msi_attrib.state;
-			tail = msi_desc[vector]->link.tail;
+			state = msi_desc[irq]->msi_attrib.state;
+			tail = msi_desc[irq]->link.tail;
 			spin_unlock_irqrestore(&msi_lock, flags);
 			if (state)
 				warning = 1;
-			else if (vector != head) /* Release MSI-X vector */
-				msi_free_vector(dev, vector, 0);
-			vector = tail;
+			else if (irq != head) /* Release MSI-X irq */
+				msi_free_irq(dev, irq);
+			irq = tail;
 		}
-		msi_free_vector(dev, vector, 0);
+		msi_free_irq(dev, irq);
 		if (warning) {
 			printk(KERN_WARNING "PCI: %s: pci_disable_msix() called without "
-			       "free_irq() on all MSI-X vectors\n",
+			       "free_irq() on all MSI-X irqs\n",
 			       pci_name(dev));
 			BUG_ON(warning > 0);
 		}
@@ -1152,11 +1059,11 @@ void pci_disable_msix(struct pci_dev* de
 }
 /**
- * msi_remove_pci_irq_vectors - reclaim MSI(X) vectors to unused state
+ * msi_remove_pci_irq_vectors - reclaim MSI(X) irqs to unused state
  * @dev: pointer to the pci_dev data structure of MSI(X) device function
  *
  * Being called during hotplug remove, from which the device function
- * is hot-removed. All previous assigned MSI/MSI-X vectors, if
+ * is hot-removed. All previous assigned MSI/MSI-X irqs, if
  * allocated for this device function, are reclaimed to unused state,
  * which may be used later on.
  **/
@@ -1170,42 +1077,42 @@ void msi_remove_pci_irq_vectors(struct p
 	temp = dev->irq; /* Save IOAPIC IRQ */
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
-	if (pos > 0 && !msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
+	if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSI)) {
 		spin_lock_irqsave(&msi_lock, flags);
 		state = msi_desc[dev->irq]->msi_attrib.state;
 		spin_unlock_irqrestore(&msi_lock, flags);
 		if (state) {
 			printk(KERN_WARNING "PCI: %s: msi_remove_pci_irq_vectors() "
-			       "called without free_irq() on MSI vector %d\n",
+			       "called without free_irq() on MSI irq %d\n",
 			       pci_name(dev), dev->irq);
 			BUG_ON(state > 0);
-		} else /* Release MSI vector assigned to this device */
-			msi_free_vector(dev, dev->irq, 0);
+		} else /* Release MSI irq assigned to this device */
+			msi_free_irq(dev, dev->irq);
 		dev->irq = temp; /* Restore IOAPIC IRQ */
 	}
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
-	if (pos > 0 && !msi_lookup_vector(dev, PCI_CAP_ID_MSIX)) {
-		int vector, head, tail = 0, warning = 0;
+	if (pos > 0 && !msi_lookup_irq(dev, PCI_CAP_ID_MSIX)) {
+		int irq, head, tail = 0, warning = 0;
 		void __iomem *base = NULL;
-		vector = head = dev->irq;
+		irq = head = dev->irq;
 		while (head != tail) {
 			spin_lock_irqsave(&msi_lock, flags);
-			state = msi_desc[vector]->msi_attrib.state;
-			tail = msi_desc[vector]->link.tail;
-			base = msi_desc[vector]->mask_base;
+			state = msi_desc[irq]->msi_attrib.state;
+			tail = msi_desc[irq]->link.tail;
+			base = msi_desc[irq]->mask_base;
 			spin_unlock_irqrestore(&msi_lock, flags);
 			if (state)
 				warning = 1;
-			else if (vector != head) /* Release MSI-X vector */
-				msi_free_vector(dev, vector, 0);
-			vector = tail;
+			else if (irq != head) /* Release MSI-X irq */
+				msi_free_irq(dev, irq);
+			irq = tail;
 		}
-		msi_free_vector(dev, vector, 0);
+		msi_free_irq(dev, irq);
 		if (warning) {
 			iounmap(base);
 			printk(KERN_WARNING "PCI: %s: msi_remove_pci_irq_vectors() "
-			       "called without free_irq() on all MSI-X vectors\n",
+			       "called without free_irq() on all MSI-X irqs\n",
 			       pci_name(dev));
 			BUG_ON(warning > 0);
 		}
diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h
index 6793241..435d05a 100644
--- a/drivers/pci/msi.h
+++ b/drivers/pci/msi.h
@@ -8,9 +8,6 @@ #define MSI_H
 #include <asm/msi.h>
-extern int vector_irq[NR_VECTORS];
-extern void (*interrupt[NR_IRQS])(void);
-
 /*
  * MSI-X Address Register
  */
@@ -58,9 +55,9 @@ struct msi_desc {
 		__u8 maskbit : 1; /* mask-pending bit supported ? */
 		__u8 state : 1; /* {0: free, 1: busy} */
 		__u8 is_64 : 1; /* Address size: 0=32bit 1=64bit */
-		__u8 entry_nr; /* specific enabled entry */
-		__u8 default_vector; /* default pre-assigned vector */
 		__u8 pos; /* Location of the msi capability */
+		__u16 entry_nr; /* specific enabled entry */
+		unsigned default_irq; /* default pre-assigned irq */
 	}msi_attrib;
 	struct {
-- 
1.4.0.gc07e
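The pairing that create_msi_irq() and destroy_msi_irq() introduce in this patch (allocate a descriptor, obtain an irq, undo the allocation if no irq is available, then release both together) can be sketched in user space. Everything named sketch_* or stub_*, the MODEL_NR_IRQS limit, and the use of calloc()/free() in place of the msi_cachep slab cache are assumptions made for illustration; set_irq_chip(), set_irq_data() and locking are reduced to a plain array here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_NR_IRQS 4	/* hypothetical, deliberately tiny */

struct sketch_msi_desc {
	int head;
	int tail;
};

static int irq_in_use[MODEL_NR_IRQS];
static void *irq_data[MODEL_NR_IRQS];	/* stands in for set_irq_data()/get_irq_data() */

/* Hypothetical stand-in for the architecture's create_irq(). */
static int stub_create_irq(void)
{
	int irq;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
		if (!irq_in_use[irq]) {
			irq_in_use[irq] = 1;
			return irq;
		}
	}
	return -1;
}

/* Hypothetical stand-in for the architecture's destroy_irq(). */
static void stub_destroy_irq(unsigned int irq)
{
	irq_in_use[irq] = 0;
}

int sketch_create_msi_irq(void)
{
	struct sketch_msi_desc *entry;
	int irq;

	entry = calloc(1, sizeof(*entry));
	if (!entry)
		return -ENOMEM;

	irq = stub_create_irq();
	if (irq < 0) {
		free(entry);	/* undo the descriptor allocation */
		return -EBUSY;
	}

	irq_data[irq] = entry;	/* the irq now owns the descriptor */
	return irq;
}

void sketch_destroy_msi_irq(unsigned int irq)
{
	struct sketch_msi_desc *entry;

	assert(irq < MODEL_NR_IRQS);
	entry = irq_data[irq];
	irq_data[irq] = NULL;
	stub_destroy_irq(irq);
	free(entry);
}
```

Because the descriptor hangs off the irq, a caller that fails later only has to call the destroy function once, which is the shape the error paths in msi_capability_init() and msix_capability_init() take after this patch.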
* [PATCH 14/25] x86_64 irq: Move msi message composition into io_apic.c
2006-06-20 22:28 ` [PATCH 13/25] msi: Make the msi code irq based and not vector based Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 15/25] i386 " Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to set up
logical mode, or lowest priority delivery mode as we do for other apic
interrupts, and with the same selection criteria.
Basically this moves the problem of what is in the msi message into
the architecture irq management code where it belongs. Not in a
generic layer that doesn't have enough information to compose msi
messages properly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/io_apic.c | 73 ++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86_64/msi.h | 7 +---
 include/asm-x86_64/msidef.h | 47 +++++++++++++++++++++++++++
 3 files changed, 122 insertions(+), 5 deletions(-)
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index ae64a63..7ad0980 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -42,6 +42,7 @@ #include <asm/mach_apic.h>
 #include <asm/acpi.h>
 #include <asm/dma.h>
 #include <asm/nmi.h>
+#include <asm/msidef.h>
 #define __apicdebuginit __init
@@ -2061,6 +2062,78 @@ void destroy_irq(unsigned int irq)
 }
 #endif
+/*
+ * MSI message composition
+ */
+#ifdef CONFIG_PCI_MSI
+static int msi_msg_setup(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
+{
+	/* For now this code always uses physical delivery
+	 * mode.
+	 */
+	int vector;
+	unsigned dest;
+
+	vector = assign_irq_vector(irq);
+	if (vector >= 0) {
+		cpumask_t tmp;
+
+		cpus_clear(tmp);
+		cpu_set(first_cpu(cpu_online_map), tmp);
+		dest = cpu_mask_to_apicid(tmp);
+
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo =
+			MSI_ADDR_BASE_LO |
+			((INT_DEST_MODE == 0) ?
+				MSI_ADDR_DEST_MODE_PHYSICAL:
+				MSI_ADDR_DEST_MODE_LOGICAL) |
+			((INT_DELIVERY_MODE != dest_LowestPrio) ?
+				MSI_ADDR_REDIRECTION_CPU:
+				MSI_ADDR_REDIRECTION_LOWPRI) |
+			MSI_ADDR_DEST_ID(dest);
+
+		msg->data =
+			MSI_DATA_TRIGGER_EDGE |
+			MSI_DATA_LEVEL_ASSERT |
+			((INT_DELIVERY_MODE != dest_LowestPrio) ?
+ MSI_DATA_DELIVERY_FIXED: + MSI_DATA_DELIVERY_LOWPRI) | + MSI_DATA_VECTOR(vector); + } + return vector; +} + +static void msi_msg_teardown(unsigned int irq) +{ + return; +} + +static void msi_msg_set_affinity(unsigned int irq, cpumask_t mask, struct msi_msg *msg) +{ + int vector; + unsigned dest; + + vector = assign_irq_vector(irq); + if (vector > 0) { + dest = cpu_mask_to_apicid(mask); + + msg->data &= ~MSI_DATA_VECTOR_MASK; + msg->data |= MSI_DATA_VECTOR(vector); + msg->address_lo &= ~MSI_ADDR_DEST_ID_MASK; + msg->address_lo |= MSI_ADDR_DEST_ID(dest); + } +} + +struct msi_ops arch_msi_ops = { + .needs_64bit_address = 0, + .setup = msi_msg_setup, + .teardown = msi_msg_teardown, + .target = msi_msg_set_affinity, +}; + +#endif + /* -------------------------------------------------------------------------- ACPI-based IOAPIC Configuration -------------------------------------------------------------------------- */ diff --git a/include/asm-x86_64/msi.h b/include/asm-x86_64/msi.h index 3ad2346..1876fda 100644 --- a/include/asm-x86_64/msi.h +++ b/include/asm-x86_64/msi.h @@ -10,14 +10,11 @@ #include <asm/desc.h> #include <asm/mach_apic.h> #include <asm/smp.h> -#define LAST_DEVICE_VECTOR (FIRST_SYSTEM_VECTOR - 1) -#define MSI_TARGET_CPU_SHIFT 12 - -extern struct msi_ops msi_apic_ops; +extern struct msi_ops arch_msi_ops; static inline int msi_arch_init(void) { - msi_register(&msi_apic_ops); + msi_register(&arch_msi_ops); return 0; } diff --git a/include/asm-x86_64/msidef.h b/include/asm-x86_64/msidef.h new file mode 100644 index 0000000..4667f1a --- /dev/null +++ b/include/asm-x86_64/msidef.h @@ -0,0 +1,47 @@ +#ifndef ASM_MSIDEF_H +#define ASM_MSIDEF_H + +/* + * Constants for Intel APIC based MSI messages. 
+ */ + +/* + * Shifts for MSI data + */ + +#define MSI_DATA_VECTOR_SHIFT 0 +#define MSI_DATA_VECTOR_MASK 0x000000ff +#define MSI_DATA_VECTOR(v) (((v) << MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK) + +#define MSI_DATA_DELIVERY_MODE_SHIFT 8 +#define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) +#define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) + +#define MSI_DATA_LEVEL_SHIFT 14 +#define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT) +#define MSI_DATA_LEVEL_ASSERT (1 << MSI_DATA_LEVEL_SHIFT) + +#define MSI_DATA_TRIGGER_SHIFT 15 +#define MSI_DATA_TRIGGER_EDGE (0 << MSI_DATA_TRIGGER_SHIFT) +#define MSI_DATA_TRIGGER_LEVEL (1 << MSI_DATA_TRIGGER_SHIFT) + +/* + * Shift/mask fields for msi address + */ + +#define MSI_ADDR_BASE_HI 0 +#define MSI_ADDR_BASE_LO 0xfee00000 + +#define MSI_ADDR_DEST_MODE_SHIFT 2 +#define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) +#define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) + +#define MSI_ADDR_REDIRECTION_SHIFT 3 +#define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) /* dedicated cpu */ +#define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT) /* lowest priority */ + +#define MSI_ADDR_DEST_ID_SHIFT 12 +#define MSI_ADDR_DEST_ID_MASK 0x00ffff0 +#define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) + +#endif /* ASM_MSIDEF_H */ -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
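As a rough illustration of the composition step in msi_msg_setup above, here is a minimal userspace sketch. The field encodings are copied from the msidef.h in this patch; struct demo_msi_msg and demo_msi_compose() are hypothetical stand-ins, and the destination and delivery modes are passed as parameters here, whereas the patch takes them from the compile-time INT_DEST_MODE and INT_DELIVERY_MODE settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field encodings copied from the msidef.h added by the patch. */
#define MSI_DATA_VECTOR_SHIFT		0
#define MSI_DATA_VECTOR_MASK		0x000000ff
#define MSI_DATA_VECTOR(v)		(((v) << MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK)
#define MSI_DATA_DELIVERY_FIXED		(0 << 8)
#define MSI_DATA_DELIVERY_LOWPRI	(1 << 8)
#define MSI_DATA_LEVEL_ASSERT		(1 << 14)
#define MSI_DATA_TRIGGER_EDGE		(0 << 15)

#define MSI_ADDR_BASE_HI		0
#define MSI_ADDR_BASE_LO		0xfee00000
#define MSI_ADDR_DEST_MODE_PHYSICAL	(0 << 2)
#define MSI_ADDR_DEST_MODE_LOGICAL	(1 << 2)
#define MSI_ADDR_REDIRECTION_CPU	(0 << 3)
#define MSI_ADDR_REDIRECTION_LOWPRI	(1 << 3)
#define MSI_ADDR_DEST_ID_SHIFT		12
#define MSI_ADDR_DEST_ID_MASK		0x00ffff0
#define MSI_ADDR_DEST_ID(dest)		(((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)

/* Hypothetical stand-in for the kernel's struct msi_msg. */
struct demo_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Build the address/data pair for one interrupt.
 * logical: non-zero selects logical destination mode.
 * lowprio: non-zero selects lowest priority delivery.
 */
static void demo_msi_compose(struct demo_msi_msg *msg, unsigned vector,
			     unsigned dest, int logical, int lowprio)
{
	assert(msg != NULL);

	msg->address_hi = MSI_ADDR_BASE_HI;
	msg->address_lo =
		MSI_ADDR_BASE_LO |
		(logical ? MSI_ADDR_DEST_MODE_LOGICAL :
			   MSI_ADDR_DEST_MODE_PHYSICAL) |
		(lowprio ? MSI_ADDR_REDIRECTION_LOWPRI :
			   MSI_ADDR_REDIRECTION_CPU) |
		MSI_ADDR_DEST_ID(dest);

	msg->data =
		MSI_DATA_TRIGGER_EDGE |
		MSI_DATA_LEVEL_ASSERT |
		(lowprio ? MSI_DATA_DELIVERY_LOWPRI :
			   MSI_DATA_DELIVERY_FIXED) |
		MSI_DATA_VECTOR(vector);
}
```

The point of the sketch is that the vector is just one field of the message, supplied by the caller, so nothing in the composition needs the irq number to equal it.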
* [PATCH 15/25] i386 irq: Move msi message composition into io_apic.c
2006-06-20 22:28 ` [PATCH 14/25] x86_64 irq: Move msi message composition into io_apic.c Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 16/25] msi: Only build msi-apic.c on ia64 Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to set up
logical mode or lowest priority delivery mode as we do for other apic
interrupts, using the same selection criteria.
Basically this moves the problem of what is in the msi message into
the architecture irq management code where it belongs, not in a
generic layer that doesn't have enough information to compose msi
messages properly.
Signed-off-by: Eric W.
Biederman <ebiederm@xmission.com> --- arch/i386/kernel/io_apic.c | 70 ++++++++++++++++++++++++++++++++++++++++++++ include/asm-i386/msi.h | 7 +--- include/asm-i386/msidef.h | 47 ++++++++++++++++++++++++++++++ 3 files changed, 119 insertions(+), 5 deletions(-) diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c index 04f78ff..68af125 100644 --- a/arch/i386/kernel/io_apic.c +++ b/arch/i386/kernel/io_apic.c @@ -32,6 +32,7 @@ #include <linux/compiler.h> #include <linux/acpi.h> #include <linux/module.h> #include <linux/sysdev.h> +#include <linux/pci.h> #include <asm/io.h> #include <asm/smp.h> @@ -39,6 +40,7 @@ #include <asm/desc.h> #include <asm/timer.h> #include <asm/i8259.h> #include <asm/nmi.h> +#include <asm/msidef.h> #include <mach_apic.h> #include <mach_apicdef.h> @@ -2545,6 +2547,74 @@ void destroy_irq(unsigned int irq) } #endif /* CONFIG_PCI_MSI */ +/* + * MSI mesage composition + */ +#ifdef CONFIG_PCI_MSI +static int msi_msg_setup(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg) +{ + /* For now always this code always uses physical delivery + * mode. + */ + int vector; + unsigned dest; + + vector = assign_irq_vector(irq); + if (vector >= 0) { + dest = cpu_mask_to_apicid(TARGET_CPUS); + + msg->address_hi = MSI_ADDR_BASE_HI; + msg->address_lo = + MSI_ADDR_BASE_LO | + ((INT_DEST_MODE == 0) ? + MSI_ADDR_DEST_MODE_PHYSICAL: + MSI_ADDR_DEST_MODE_LOGICAL) | + ((INT_DELIVERY_MODE != dest_LowestPrio) ? + MSI_ADDR_REDIRECTION_CPU: + MSI_ADDR_REDIRECTION_LOWPRI) | + MSI_ADDR_DEST_ID(dest); + + msg->data = + MSI_DATA_TRIGGER_EDGE | + MSI_DATA_LEVEL_ASSERT | + ((INT_DELIVERY_MODE != dest_LowestPrio) ? 
+ MSI_DATA_DELIVERY_FIXED: + MSI_DATA_DELIVERY_LOWPRI) | + MSI_DATA_VECTOR(vector); + } + return vector; +} + +static void msi_msg_teardown(unsigned int irq) +{ + return; +} + +static void msi_msg_set_affinity(unsigned int irq, cpumask_t mask, struct msi_msg *msg) +{ + int vector; + unsigned dest; + + vector = assign_irq_vector(irq); + if (vector > 0) { + dest = cpu_mask_to_apicid(mask); + + msg->data &= ~MSI_DATA_VECTOR_MASK; + msg->data |= MSI_DATA_VECTOR(vector); + msg->address_lo &= ~MSI_ADDR_DEST_ID_MASK; + msg->address_lo |= MSI_ADDR_DEST_ID(dest); + } +} + +struct msi_ops arch_msi_ops = { + .needs_64bit_address = 0, + .setup = msi_msg_setup, + .teardown = msi_msg_teardown, + .target = msi_msg_set_affinity, +}; + +#endif /* CONFIG_PCI_MSI */ + /* -------------------------------------------------------------------------- ACPI-based IOAPIC Configuration -------------------------------------------------------------------------- */ diff --git a/include/asm-i386/msi.h b/include/asm-i386/msi.h index b11c4b7..7368a89 100644 --- a/include/asm-i386/msi.h +++ b/include/asm-i386/msi.h @@ -9,14 +9,11 @@ #define ASM_MSI_H #include <asm/desc.h> #include <mach_apic.h> -#define LAST_DEVICE_VECTOR (FIRST_SYSTEM_VECTOR - 1) -#define MSI_TARGET_CPU_SHIFT 12 - -extern struct msi_ops msi_apic_ops; +extern struct msi_ops arch_msi_ops; static inline int msi_arch_init(void) { - msi_register(&msi_apic_ops); + msi_register(&arch_msi_ops); return 0; } diff --git a/include/asm-i386/msidef.h b/include/asm-i386/msidef.h new file mode 100644 index 0000000..4667f1a --- /dev/null +++ b/include/asm-i386/msidef.h @@ -0,0 +1,47 @@ +#ifndef ASM_MSIDEF_H +#define ASM_MSIDEF_H + +/* + * Constants for Intel APIC based MSI messages. 
+ */ + +/* + * Shifts for MSI data + */ + +#define MSI_DATA_VECTOR_SHIFT 0 +#define MSI_DATA_VECTOR_MASK 0x000000ff +#define MSI_DATA_VECTOR(v) (((v) << MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK) + +#define MSI_DATA_DELIVERY_MODE_SHIFT 8 +#define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) +#define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) + +#define MSI_DATA_LEVEL_SHIFT 14 +#define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT) +#define MSI_DATA_LEVEL_ASSERT (1 << MSI_DATA_LEVEL_SHIFT) + +#define MSI_DATA_TRIGGER_SHIFT 15 +#define MSI_DATA_TRIGGER_EDGE (0 << MSI_DATA_TRIGGER_SHIFT) +#define MSI_DATA_TRIGGER_LEVEL (1 << MSI_DATA_TRIGGER_SHIFT) + +/* + * Shift/mask fields for msi address + */ + +#define MSI_ADDR_BASE_HI 0 +#define MSI_ADDR_BASE_LO 0xfee00000 + +#define MSI_ADDR_DEST_MODE_SHIFT 2 +#define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) +#define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) + +#define MSI_ADDR_REDIRECTION_SHIFT 3 +#define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) /* dedicated cpu */ +#define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT) /* lowest priority */ + +#define MSI_ADDR_DEST_ID_SHIFT 12 +#define MSI_ADDR_DEST_ID_MASK 0x00ffff0 +#define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) + +#endif /* ASM_MSIDEF_H */ -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
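The affinity path in msi_msg_set_affinity above only rewrites two fields of an already composed message. A minimal userspace sketch of that read-modify-write follows, assuming the mask values from the msidef.h in this patch; struct demo_msi_msg and demo_msi_retarget() are hypothetical names, and the vector lookup and cpumask to APIC id conversion that the kernel code performs are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks and shifts copied from the msidef.h added by the patch. */
#define MSI_DATA_VECTOR_SHIFT	0
#define MSI_DATA_VECTOR_MASK	0x000000ff
#define MSI_DATA_VECTOR(v)	(((v) << MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK)
#define MSI_ADDR_DEST_ID_SHIFT	12
#define MSI_ADDR_DEST_ID_MASK	0x00ffff0
#define MSI_ADDR_DEST_ID(dest)	(((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)

/* Hypothetical stand-in for the kernel's struct msi_msg. */
struct demo_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Point an existing message at a new (vector, destination) pair.
 * Only the vector and destination id fields change; trigger, level,
 * delivery mode and destination mode bits are preserved.
 */
static void demo_msi_retarget(struct demo_msi_msg *msg, unsigned vector,
			      unsigned dest)
{
	assert(msg != NULL);

	msg->data &= ~(uint32_t)MSI_DATA_VECTOR_MASK;
	msg->data |= MSI_DATA_VECTOR(vector);
	msg->address_lo &= ~(uint32_t)MSI_ADDR_DEST_ID_MASK;
	msg->address_lo |= MSI_ADDR_DEST_ID(dest);
}
```

Because the mode bits sit outside both masks, a message composed for lowest priority logical delivery keeps that setting when it is retargeted.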
* [PATCH 16/25] msi: Only build msi-apic.c on ia64
2006-06-20 22:28 ` [PATCH 15/25] i386 " Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 17/25] x86_64 irq: Remove the msi assumption that irq == vector Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
After the previous changes ia64 is the only architecture using
msi-apic.c.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/pci/Makefile | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index f2d152b..f4c7a4b 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -27,7 +27,8 @@ obj-$(CONFIG_PPC64) += setup-bus.o
 obj-$(CONFIG_MIPS) += setup-bus.o setup-irq.o
 obj-$(CONFIG_X86_VISWS) += setup-irq.o
 
-msiobj-y := msi.o msi-apic.o
+msiobj-y := msi.o
+msiobj-$(CONFIG_IA64) := msi-apic.o
 msiobj-$(CONFIG_IA64_GENERIC) += msi-altix.o
 msiobj-$(CONFIG_IA64_SGI_SN2) += msi-altix.o
 obj-$(CONFIG_PCI_MSI) += $(msiobj-y)
--
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 17/25] x86_64 irq: Remove the msi assumption that irq == vector
2006-06-20 22:28 ` [PATCH 16/25] msi: Only build msi-apic.c on ia64 Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 18/25] i386 " Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
This patch removes the change in behavior of the irq allocation code
when CONFIG_PCI_MSI is defined, removing all instances of the
assumption that irq == vector.
create_irq is rewritten to first allocate a free irq and then to
assign that irq a vector.
assign_irq_vector is made static and the AUTO_ASSIGN case, which
allocates a vector not bound to an irq, is removed.
The ioapic vector methods are removed, and everything now works with
irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI.
Signed-off-by: Eric W.
Biederman <ebiederm@xmission.com> --- arch/x86_64/kernel/io_apic.c | 147 +++++++++++++----------------------------- include/asm-x86_64/hw_irq.h | 1 include/asm-x86_64/io_apic.h | 40 ----------- include/asm-x86_64/irq.h | 5 - 4 files changed, 45 insertions(+), 148 deletions(-) diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index 7ad0980..1a63a0e 100644 --- a/arch/x86_64/kernel/io_apic.c +++ b/arch/x86_64/kernel/io_apic.c @@ -44,6 +44,8 @@ #include <asm/dma.h> #include <asm/nmi.h> #include <asm/msidef.h> +static int assign_irq_vector(int irq); + #define __apicdebuginit __init int sis_apic_bug; /* not actually supported, dummy for compile */ @@ -83,14 +85,6 @@ static struct irq_pin_list { short apic, pin, next; } irq_2_pin[PIN_MAP_SIZE]; -int vector_irq[NR_VECTORS] __read_mostly = { [0 ... NR_VECTORS - 1] = -1}; -#ifdef CONFIG_PCI_MSI -#define vector_to_irq(vector) \ - (platform_legacy_irq(vector) ? vector : vector_irq[vector]) -#else -#define vector_to_irq(vector) (vector) -#endif - #define __DO_ACTION(R, ACTION, FINAL) \ \ { \ @@ -135,7 +129,7 @@ static void set_ioapic_affinity_irq(unsi spin_lock_irqsave(&ioapic_lock, flags); __DO_ACTION(1, = dest, ) - set_irq_info(irq, mask); + set_native_irq_info(irq, mask); spin_unlock_irqrestore(&ioapic_lock, flags); } #endif @@ -834,18 +828,14 @@ static inline int IO_APIC_irq_trigger(in /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. 
*/ u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }; -int assign_irq_vector(int irq) +static int __assign_irq_vector(int irq) { static int current_vector = FIRST_DEVICE_VECTOR, offset = 0; - unsigned long flags; int vector; - BUG_ON(irq != AUTO_ASSIGN && (unsigned)irq >= NR_IRQ_VECTORS); - - spin_lock_irqsave(&vector_lock, flags); + BUG_ON((unsigned)irq >= NR_IRQ_VECTORS); - if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0) { - spin_unlock_irqrestore(&vector_lock, flags); + if (IO_APIC_VECTOR(irq) > 0) { return IO_APIC_VECTOR(irq); } next: @@ -860,10 +850,18 @@ next: } vector = current_vector; - vector_irq[vector] = irq; - if (irq != AUTO_ASSIGN) - IO_APIC_VECTOR(irq) = vector; + IO_APIC_VECTOR(irq) = vector; + return vector; +} + +static int assign_irq_vector(int irq) +{ + int vector; + unsigned long flags; + + spin_lock_irqsave(&vector_lock, flags); + vector = __assign_irq_vector(irq); spin_unlock_irqrestore(&vector_lock, flags); return vector; @@ -879,18 +877,14 @@ #define IOAPIC_LEVEL 1 static void ioapic_register_intr(int irq, int vector, unsigned long trigger) { - unsigned idx; - - idx = use_pci_vector() && !platform_legacy_irq(irq) ? 
vector : irq; - if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || trigger == IOAPIC_LEVEL) - set_irq_chip_and_handler(idx, &ioapic_chip, + set_irq_chip_and_handler(irq, &ioapic_chip, handle_fasteoi_irq); else - set_irq_chip_and_handler(idx, &ioapic_chip, + set_irq_chip_and_handler(irq, &ioapic_chip, handle_edge_irq); - set_intr_gate(vector, interrupt[idx]); + set_intr_gate(vector, interrupt[irq]); } static void __init setup_IO_APIC_irqs(void) @@ -1110,17 +1104,12 @@ void __apicdebuginit print_IO_APIC(void) ); } } - if (use_pci_vector()) - printk(KERN_INFO "Using vector-based indexing\n"); printk(KERN_DEBUG "IRQ to pin mappings:\n"); for (i = 0; i < NR_IRQS; i++) { struct irq_pin_list *entry = irq_2_pin + i; if (entry->pin < 0) continue; - if (use_pci_vector() && !platform_legacy_irq(i)) - printk(KERN_DEBUG "IRQ%d ", IO_APIC_VECTOR(i)); - else - printk(KERN_DEBUG "IRQ%d ", i); + printk(KERN_DEBUG "IRQ%d ", i); for (;;) { printk("-> %d:%d", entry->apic, entry->pin); if (!entry->next) @@ -1523,42 +1512,8 @@ static unsigned int startup_ioapic_irq(u return was_pending; } -static unsigned int startup_ioapic_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - - return startup_ioapic_irq(irq); -} - -static void mask_ioapic_vector (unsigned int vector) +static int ioapic_retrigger_irq(unsigned int irq) { - int irq = vector_to_irq(vector); - - mask_IO_APIC_irq(irq); -} - -static void unmask_ioapic_vector (unsigned int vector) -{ - int irq = vector_to_irq(vector); - - unmask_IO_APIC_irq(irq); -} - -#ifdef CONFIG_SMP -static void set_ioapic_affinity_vector (unsigned int vector, - cpumask_t cpu_mask) -{ - int irq = vector_to_irq(vector); - - set_native_irq_info(vector, cpu_mask); - set_ioapic_affinity_irq(irq, cpu_mask); -} -#endif // CONFIG_SMP - -static int ioapic_retrigger_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - send_IPI_self(IO_APIC_VECTOR(irq)); return 1; @@ -1605,15 +1560,15 @@ static void ack_apic_level(unsigned int 
static struct irq_chip ioapic_chip __read_mostly = { .name = "IO-APIC", - .startup = startup_ioapic_vector, - .mask = mask_ioapic_vector, - .unmask = unmask_ioapic_vector, + .startup = startup_ioapic_irq, + .mask = mask_IO_APIC_irq, + .unmask = unmask_IO_APIC_irq, .ack = ack_apic_edge, .eoi = ack_apic_level, #ifdef CONFIG_SMP - .set_affinity = set_ioapic_affinity_vector, + .set_affinity = set_ioapic_affinity_irq, #endif - .retrigger = ioapic_retrigger_vector, + .retrigger = ioapic_retrigger_irq, }; static inline void init_IO_APIC_traps(void) @@ -1633,11 +1588,6 @@ static inline void init_IO_APIC_traps(vo */ for (irq = 0; irq < NR_IRQS ; irq++) { int tmp = irq; - if (use_pci_vector()) { - if (!platform_legacy_irq(tmp)) - if ((tmp = vector_to_irq(tmp)) == -1) - continue; - } if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) { /* * Hmm.. We don't have an entry for this, @@ -2014,34 +1964,31 @@ static int __init ioapic_init_sysfs(void device_initcall(ioapic_init_sysfs); -#ifdef CONFIG_PCI_MSI /* - * Dynamic irq allocate and deallocation for MSI + * Dynamic irq allocate and deallocation */ int create_irq(void) { - /* Hack of the day: irq == vector. - * - * Ultimately this will be be more general, - * and not depend on the irq to vector identity mapping. - * But this version is needed until msi.c can cope with - * the more general form. 
- */ - int irq, vector; + /* Allocate an unused irq */ + int irq, new, vector; unsigned long flags; - vector = assign_irq_vector(AUTO_ASSIGN); - irq = vector; - if (vector >= 0) { - struct irq_desc *desc; + irq = -ENOSPC; + spin_lock_irqsave(&vector_lock, flags); + for (new = (NR_IRQS - 1); new >= 0; new--) { + if (platform_legacy_irq(new)) + continue; + if (irq_vector[new] != 0) + continue; + vector = __assign_irq_vector(new); + if (likely(vector > 0)) + irq = new; + break; + } + spin_unlock_irqrestore(&vector_lock, flags); - spin_lock_irqsave(&vector_lock, flags); - vector_irq[vector] = irq; - irq_vector[irq] = vector; - spin_unlock_irqrestore(&vector_lock, flags); - + if (irq >= 0) { set_intr_gate(vector, interrupt[irq]); - dynamic_irq_init(irq); } return irq; @@ -2050,17 +1997,13 @@ int create_irq(void) void destroy_irq(unsigned int irq) { unsigned long flags; - unsigned int vector; dynamic_irq_cleanup(irq); spin_lock_irqsave(&vector_lock, flags); - vector = irq_vector[irq]; - vector_irq[vector] = -1; irq_vector[irq] = 0; spin_unlock_irqrestore(&vector_lock, flags); } -#endif /* * MSI mesage composition @@ -2216,7 +2159,7 @@ int io_apic_set_pci_routing (int ioapic, spin_lock_irqsave(&ioapic_lock, flags); io_apic_write(ioapic, 0x11+2*pin, *(((int *)&entry)+1)); io_apic_write(ioapic, 0x10+2*pin, *(((int *)&entry)+0)); - set_native_irq_info(use_pci_vector() ? 
entry.vector : irq, TARGET_CPUS); + set_native_irq_info(irq, TARGET_CPUS); spin_unlock_irqrestore(&ioapic_lock, flags); return 0; diff --git a/include/asm-x86_64/hw_irq.h b/include/asm-x86_64/hw_irq.h index f5da94a..1a8dc18 100644 --- a/include/asm-x86_64/hw_irq.h +++ b/include/asm-x86_64/hw_irq.h @@ -75,7 +75,6 @@ #define FIRST_SYSTEM_VECTOR 0xef /* du #ifndef __ASSEMBLY__ extern u8 irq_vector[NR_IRQ_VECTORS]; #define IO_APIC_VECTOR(irq) (irq_vector[irq]) -#define AUTO_ASSIGN -1 /* * Various low-level irq details needed by irq.c, process.c, diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h index fb7a090..2885bea 100644 --- a/include/asm-x86_64/io_apic.h +++ b/include/asm-x86_64/io_apic.h @@ -12,45 +12,7 @@ #include <asm/mpspec.h> #ifdef CONFIG_X86_IO_APIC -#ifdef CONFIG_PCI_MSI -static inline int use_pci_vector(void) {return 1;} -static inline void disable_edge_ioapic_vector(unsigned int vector) { } -static inline void mask_and_ack_level_ioapic_vector(unsigned int vector) { } -static inline void end_edge_ioapic_vector (unsigned int vector) { } -#define startup_level_ioapic startup_level_ioapic_vector -#define shutdown_level_ioapic mask_IO_APIC_vector -#define enable_level_ioapic unmask_IO_APIC_vector -#define disable_level_ioapic mask_IO_APIC_vector -#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_vector -#define end_level_ioapic end_level_ioapic_vector -#define set_ioapic_affinity set_ioapic_affinity_vector - -#define startup_edge_ioapic startup_edge_ioapic_vector -#define shutdown_edge_ioapic disable_edge_ioapic_vector -#define enable_edge_ioapic unmask_IO_APIC_vector -#define disable_edge_ioapic disable_edge_ioapic_vector -#define ack_edge_ioapic ack_edge_ioapic_vector -#define end_edge_ioapic end_edge_ioapic_vector -#else static inline int use_pci_vector(void) {return 0;} -static inline void disable_edge_ioapic_irq(unsigned int irq) { } -static inline void mask_and_ack_level_ioapic_irq(unsigned int irq) { } -static inline 
void end_edge_ioapic_irq (unsigned int irq) { } -#define startup_level_ioapic startup_level_ioapic_irq -#define shutdown_level_ioapic mask_IO_APIC_irq -#define enable_level_ioapic unmask_IO_APIC_irq -#define disable_level_ioapic mask_IO_APIC_irq -#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_irq -#define end_level_ioapic end_level_ioapic_irq -#define set_ioapic_affinity set_ioapic_affinity_irq - -#define startup_edge_ioapic startup_edge_ioapic_irq -#define shutdown_edge_ioapic disable_edge_ioapic_irq -#define enable_edge_ioapic unmask_IO_APIC_irq -#define disable_edge_ioapic disable_edge_ioapic_irq -#define ack_edge_ioapic ack_edge_ioapic_irq -#define end_edge_ioapic end_edge_ioapic_irq -#endif #define APIC_MISMATCH_DEBUG @@ -213,8 +175,6 @@ #else /* !CONFIG_X86_IO_APIC */ #define io_apic_assign_pci_irqs 0 #endif -extern int assign_irq_vector(int irq); - void enable_NMI_through_LVT0 (void * dummy); extern spinlock_t i8259A_lock; diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h index 9db5a1b..0c8d570 100644 --- a/include/asm-x86_64/irq.h +++ b/include/asm-x86_64/irq.h @@ -31,13 +31,8 @@ #define NR_VECTORS 256 #define FIRST_SYSTEM_VECTOR 0xef /* duplicated in hw_irq.h */ -#ifdef CONFIG_PCI_MSI -#define NR_IRQS FIRST_SYSTEM_VECTOR -#define NR_IRQ_VECTORS NR_IRQS -#else #define NR_IRQS 224 #define NR_IRQ_VECTORS (32 * NR_CPUS) -#endif static __inline__ int irq_canonicalize(int irq) { -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
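The rewritten create_irq above picks a free irq first and only then asks for a vector. A simplified userspace model of that order of operations is sketched below. All demo_ names and constants are hypothetical, locking is omitted, and the vector allocator simply hands out consecutive numbers, which is an assumption made for brevity and not how __assign_irq_vector spreads vectors.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_NR_IRQS			224
#define DEMO_NR_LEGACY_IRQS		16	/* assumed ISA range */
#define DEMO_FIRST_DEVICE_VECTOR	0x31	/* assumed value */
#define DEMO_FIRST_SYSTEM_VECTOR	0xef

/* irq -> vector table; 0 means no vector has been assigned. */
static unsigned char demo_irq_vector[DEMO_NR_IRQS];
static int demo_next_vector = DEMO_FIRST_DEVICE_VECTOR;

static int demo_legacy_irq(int irq)
{
	return irq < DEMO_NR_LEGACY_IRQS;
}

/* Return the vector already bound to irq, or bind the next free one. */
static int demo_assign_irq_vector(int irq)
{
	assert(irq >= 0 && irq < DEMO_NR_IRQS);

	if (demo_irq_vector[irq] > 0)
		return demo_irq_vector[irq];
	if (demo_next_vector >= DEMO_FIRST_SYSTEM_VECTOR)
		return -ENOSPC;
	demo_irq_vector[irq] = (unsigned char)demo_next_vector;
	return demo_next_vector++;
}

/* Scan from the top of the irq space for an unused, non-legacy irq. */
static int demo_create_irq(void)
{
	int irq = -ENOSPC;
	int candidate, vector;

	for (candidate = DEMO_NR_IRQS - 1; candidate >= 0; candidate--) {
		if (demo_legacy_irq(candidate))
			continue;
		if (demo_irq_vector[candidate] != 0)
			continue;
		vector = demo_assign_irq_vector(candidate);
		if (vector > 0)
			irq = candidate;
		break;
	}
	return irq;
}

/* Releasing an irq only clears its table entry. */
static void demo_destroy_irq(int irq)
{
	demo_irq_vector[irq] = 0;
}
```

In this model the first dynamically created irq is 223 while its vector is 0x31, which makes the separation of the two number spaces visible.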
* [PATCH 18/25] i386 irq: Remove the msi assumption that irq == vector
2006-06-20 22:28 ` [PATCH 17/25] x86_64 irq: Remove the msi assumption that irq == vector Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 19/25] irq: Remove msi hacks Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap
This patch removes the change in behavior of the irq allocation code
when CONFIG_PCI_MSI is defined, removing all instances of the
assumption that irq == vector.
create_irq is rewritten to first allocate a free irq and then to
assign that irq a vector.
assign_irq_vector is made static and the AUTO_ASSIGN case, which
allocates a vector not bound to an irq, is removed.
The ioapic vector methods are removed, and everything now works with
irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI.
Signed-off-by: Eric W.
Biederman <ebiederm@xmission.com> --- arch/i386/kernel/acpi/boot.c | 7 - arch/i386/kernel/io_apic.c | 171 +++++--------------- arch/i386/pci/irq.c | 4 include/asm-i386/hw_irq.h | 1 include/asm-i386/io_apic.h | 42 ----- include/asm-i386/mach-default/irq_vectors_limits.h | 5 - include/asm-x86_64/io_apic.h | 2 7 files changed, 47 insertions(+), 185 deletions(-) diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c index 107018b..6e7edc0 100644 --- a/arch/i386/kernel/acpi/boot.c +++ b/arch/i386/kernel/acpi/boot.c @@ -459,12 +459,7 @@ void __init acpi_pic_sci_set_trigger(uns int acpi_gsi_to_irq(u32 gsi, unsigned int *irq) { -#ifdef CONFIG_X86_IO_APIC - if (use_pci_vector() && !platform_legacy_irq(gsi)) - *irq = IO_APIC_VECTOR(gsi); - else -#endif - *irq = gsi_irq_sharing(gsi); + *irq = gsi_irq_sharing(gsi); return 0; } diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c index 68af125..18a5c2a 100644 --- a/arch/i386/kernel/io_apic.c +++ b/arch/i386/kernel/io_apic.c @@ -89,14 +89,6 @@ static struct irq_pin_list { int apic, pin, next; } irq_2_pin[PIN_MAP_SIZE]; -int vector_irq[NR_VECTORS] __read_mostly = { [0 ... NR_VECTORS - 1] = -1}; -#ifdef CONFIG_PCI_MSI -#define vector_to_irq(vector) \ - (platform_legacy_irq(vector) ? vector : vector_irq[vector]) -#else -#define vector_to_irq(vector) (vector) -#endif - /* * The common case is 1:1 IRQ<->pin mappings. Sometimes there are * shared ISA-space IRQs, so we have to support them. We are super @@ -262,7 +254,7 @@ static void set_ioapic_affinity_irq(unsi break; entry = irq_2_pin + entry->next; } - set_irq_info(irq, cpumask); + set_native_irq_info(irq, cpumask); spin_unlock_irqrestore(&ioapic_lock, flags); } @@ -1163,18 +1155,14 @@ static inline int IO_APIC_irq_trigger(in /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. 
*/ u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }; -int assign_irq_vector(int irq) +static int __assign_irq_vector(int irq) { static int current_vector = FIRST_DEVICE_VECTOR, offset = 0; - unsigned long flags; int vector; - BUG_ON(irq != AUTO_ASSIGN && (unsigned)irq >= NR_IRQ_VECTORS); - - spin_lock_irqsave(&vector_lock, flags); + BUG_ON((unsigned)irq >= NR_IRQ_VECTORS); - if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0) { - spin_unlock_irqrestore(&vector_lock, flags); + if (IO_APIC_VECTOR(irq) > 0) { return IO_APIC_VECTOR(irq); } next: @@ -1192,15 +1180,22 @@ next: } vector = current_vector; - vector_irq[vector] = irq; - if (irq != AUTO_ASSIGN) - IO_APIC_VECTOR(irq) = vector; + IO_APIC_VECTOR(irq) = vector; + return vector; +} + +static int assign_irq_vector(int irq) +{ + unsigned long flags; + int vector; + + spin_lock_irqsave(&vector_lock, flags); + vector = __assign_irq_vector(irq); spin_unlock_irqrestore(&vector_lock, flags); return vector; } - static struct irq_chip ioapic_chip; #define IOAPIC_AUTO -1 @@ -1209,18 +1204,14 @@ #define IOAPIC_LEVEL 1 static void ioapic_register_intr(int irq, int vector, unsigned long trigger) { - unsigned idx; - - idx = use_pci_vector() && !platform_legacy_irq(irq) ? 
vector : irq; - if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || trigger == IOAPIC_LEVEL) - set_irq_chip_and_handler(idx, &ioapic_chip, + set_irq_chip_and_handler(irq, &ioapic_chip, handle_fasteoi_irq); else - set_irq_chip_and_handler(idx, &ioapic_chip, + set_irq_chip_and_handler(irq, &ioapic_chip, handle_edge_irq); - set_intr_gate(vector, interrupt[idx]); + set_intr_gate(vector, interrupt[irq]); } static void __init setup_IO_APIC_irqs(void) @@ -1473,17 +1464,12 @@ void __init print_IO_APIC(void) ); } } - if (use_pci_vector()) - printk(KERN_INFO "Using vector-based indexing\n"); printk(KERN_DEBUG "IRQ to pin mappings:\n"); for (i = 0; i < NR_IRQS; i++) { struct irq_pin_list *entry = irq_2_pin + i; if (entry->pin < 0) continue; - if (use_pci_vector() && !platform_legacy_irq(i)) - printk(KERN_DEBUG "IRQ%d ", IO_APIC_VECTOR(i)); - else - printk(KERN_DEBUG "IRQ%d ", i); + printk(KERN_DEBUG "IRQ%d ", i); for (;;) { printk("-> %d:%d", entry->apic, entry->pin); if (!entry->next) @@ -1950,7 +1936,7 @@ static unsigned int startup_ioapic_irq(u static void ack_ioapic_irq(unsigned int irq) { - move_irq(irq); + move_native_irq(irq); ack_APIC_irq(); } @@ -1959,7 +1945,7 @@ static void ack_ioapic_quirk_irq(unsigne unsigned long v; int i; - move_irq(irq); + move_native_irq(irq); /* * It appears there is an erratum which affects at least version 0x11 * of I/O APIC (that's the 82093AA and cores integrated into various @@ -1994,63 +1980,8 @@ static void ack_ioapic_quirk_irq(unsigne } } -static unsigned int startup_ioapic_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - - return startup_ioapic_irq(irq); -} - -static void ack_ioapic_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - - move_native_irq(vector); - ack_ioapic_irq(irq); -} - -static void ack_ioapic_quirk_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - - move_native_irq(vector); - ack_ioapic_quirk_irq(irq); -} - -static void mask_IO_APIC_vector (unsigned int 
vector) -{ - int irq = vector_to_irq(vector); - - mask_IO_APIC_irq(irq); -} - -static void unmask_IO_APIC_vector (unsigned int vector) +static int ioapic_retrigger_irq(unsigned int irq) { - int irq = vector_to_irq(vector); - - unmask_IO_APIC_irq(irq); -} - -/* - * Oh just glorious. If CONFIG_PCI_MSI we've done - * #define set_ioapic_affinity set_ioapic_affinity_vector - */ -#if defined (CONFIG_SMP) && defined(CONFIG_X86_IO_APIC) && \ - defined(CONFIG_PCI_MSI) -static void set_ioapic_affinity_vector (unsigned int vector, - cpumask_t cpu_mask) -{ - int irq = vector_to_irq(vector); - - set_native_irq_info(vector, cpu_mask); - set_ioapic_affinity_irq(irq, cpu_mask); -} -#endif - -static int ioapic_retrigger_vector(unsigned int vector) -{ - int irq = vector_to_irq(vector); - send_IPI_self(IO_APIC_VECTOR(irq)); return 1; @@ -2058,15 +1989,15 @@ static int ioapic_retrigger_vector(unsig static struct irq_chip ioapic_chip __read_mostly = { .name = "IO-APIC", - .startup = startup_ioapic_vector, - .mask = mask_IO_APIC_vector, - .unmask = unmask_IO_APIC_vector, - .ack = ack_ioapic_vector, - .eoi = ack_ioapic_quirk_vector, + .startup = startup_ioapic_irq, + .mask = mask_IO_APIC_irq, + .unmask = unmask_IO_APIC_irq, + .ack = ack_ioapic_irq, + .eoi = ack_ioapic_quirk_irq, #ifdef CONFIG_SMP - .set_affinity = set_ioapic_affinity, + .set_affinity = set_ioapic_affinity_irq, #endif - .retrigger = ioapic_retrigger_vector, + .retrigger = ioapic_retrigger_irq, }; @@ -2087,11 +2018,6 @@ static inline void init_IO_APIC_traps(vo */ for (irq = 0; irq < NR_IRQS ; irq++) { int tmp = irq; - if (use_pci_vector()) { - if (!platform_legacy_irq(tmp)) - if ((tmp = vector_to_irq(tmp)) == -1) - continue; - } if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) { /* * Hmm.. We don't have an entry for this, @@ -2505,28 +2431,26 @@ #ifdef CONFIG_PCI_MSI */ int create_irq(void) { - /* Hack of the day: irq == vector. 
- * - * Ultimately this will be be more general, - * and not depend on the irq to vector identity mapping. - * But this version is needed until msi.c can cope with - * the more general form. - */ - int irq, vector; + /* Allocate an unused irq */ + int irq, new, vector; unsigned long flags; - vector = assign_irq_vector(AUTO_ASSIGN); - irq = vector; - if (vector >= 0) { - struct irq_desc *desc; + irq = -ENOSPC; + spin_lock_irqsave(&vector_lock, flags); + for (new = (NR_IRQS - 1); new >= 0; new--) { + if (platform_legacy_irq(new)) + continue; + if (irq_vector[new] != 0) + continue; + vector = __assign_irq_vector(new); + if (likely(vector > 0)) + irq = new; + break; + } + spin_unlock_irqrestore(&vector_lock, flags); - spin_lock_irqsave(&vector_lock, flags); - vector_irq[vector] = irq; - irq_vector[irq] = vector; - spin_unlock_irqrestore(&vector_lock, flags); - + if (irq >= 0) { set_intr_gate(vector, interrupt[irq]); - dynamic_irq_init(irq); } return irq; @@ -2535,13 +2459,10 @@ int create_irq(void) void destroy_irq(unsigned int irq) { unsigned long flags; - unsigned int vector; dynmic_irq_cleanup(irq); spin_lock_irqsave(&vector_lock, flags); - vector = irq_vector[irq]; - vector_irq[vector] = -1; irq_vector[irq] = 0; spin_unlock_irqrestore(&vector_lock, flags); } @@ -2769,7 +2690,7 @@ int io_apic_set_pci_routing (int ioapic, spin_lock_irqsave(&ioapic_lock, flags); io_apic_write(ioapic, 0x11+2*pin, *(((int *)&entry)+1)); io_apic_write(ioapic, 0x10+2*pin, *(((int *)&entry)+0)); - set_native_irq_info(use_pci_vector() ? 
entry.vector : irq, TARGET_CPUS); + set_native_irq_info(irq, TARGET_CPUS); spin_unlock_irqrestore(&ioapic_lock, flags); return 0; diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c index 768584d..54a72ca 100644 --- a/arch/i386/pci/irq.c +++ b/arch/i386/pci/irq.c @@ -982,10 +982,6 @@ #ifdef CONFIG_X86_IO_APIC pci_name(bridge), 'A' + pin, irq); } if (irq >= 0) { - if (use_pci_vector() && - !platform_legacy_irq(irq)) - irq = IO_APIC_VECTOR(irq); - printk(KERN_INFO "PCI->APIC IRQ transform: %s[%c] -> IRQ %d\n", pci_name(dev), 'A' + pin, irq); dev->irq = irq; diff --git a/include/asm-i386/hw_irq.h b/include/asm-i386/hw_irq.h index 00988e8..5a72436 100644 --- a/include/asm-i386/hw_irq.h +++ b/include/asm-i386/hw_irq.h @@ -26,7 +26,6 @@ #include <asm/sections.h> extern u8 irq_vector[NR_IRQ_VECTORS]; #define IO_APIC_VECTOR(irq) (irq_vector[irq]) -#define AUTO_ASSIGN -1 extern void (*interrupt[NR_IRQS])(void); diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h index 5092e81..3909524 100644 --- a/include/asm-i386/io_apic.h +++ b/include/asm-i386/io_apic.h @@ -12,46 +12,6 @@ #include <asm/mpspec.h> #ifdef CONFIG_X86_IO_APIC -#ifdef CONFIG_PCI_MSI -static inline int use_pci_vector(void) {return 1;} -static inline void disable_edge_ioapic_vector(unsigned int vector) { } -static inline void mask_and_ack_level_ioapic_vector(unsigned int vector) { } -static inline void end_edge_ioapic_vector (unsigned int vector) { } -#define startup_level_ioapic startup_level_ioapic_vector -#define shutdown_level_ioapic mask_IO_APIC_vector -#define enable_level_ioapic unmask_IO_APIC_vector -#define disable_level_ioapic mask_IO_APIC_vector -#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_vector -#define end_level_ioapic end_level_ioapic_vector -#define set_ioapic_affinity set_ioapic_affinity_vector - -#define startup_edge_ioapic startup_edge_ioapic_vector -#define shutdown_edge_ioapic disable_edge_ioapic_vector -#define enable_edge_ioapic unmask_IO_APIC_vector 
-#define disable_edge_ioapic disable_edge_ioapic_vector -#define ack_edge_ioapic ack_edge_ioapic_vector -#define end_edge_ioapic end_edge_ioapic_vector -#else -static inline int use_pci_vector(void) {return 0;} -static inline void disable_edge_ioapic_irq(unsigned int irq) { } -static inline void mask_and_ack_level_ioapic_irq(unsigned int irq) { } -static inline void end_edge_ioapic_irq (unsigned int irq) { } -#define startup_level_ioapic startup_level_ioapic_irq -#define shutdown_level_ioapic mask_IO_APIC_irq -#define enable_level_ioapic unmask_IO_APIC_irq -#define disable_level_ioapic mask_IO_APIC_irq -#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_irq -#define end_level_ioapic end_level_ioapic_irq -#define set_ioapic_affinity set_ioapic_affinity_irq - -#define startup_edge_ioapic startup_edge_ioapic_irq -#define shutdown_edge_ioapic disable_edge_ioapic_irq -#define enable_edge_ioapic unmask_IO_APIC_irq -#define disable_edge_ioapic disable_edge_ioapic_irq -#define ack_edge_ioapic ack_edge_ioapic_irq -#define end_edge_ioapic end_edge_ioapic_irq -#endif - #define IO_APIC_BASE(idx) \ ((volatile int *)(__fix_to_virt(FIX_IO_APIC_BASE_0 + idx) \ + (mp_ioapics[idx].mpc_apicaddr & ~PAGE_MASK))) @@ -208,6 +168,4 @@ #else /* !CONFIG_X86_IO_APIC */ #define io_apic_assign_pci_irqs 0 #endif -extern int assign_irq_vector(int irq); - #endif diff --git a/include/asm-i386/mach-default/irq_vectors_limits.h b/include/asm-i386/mach-default/irq_vectors_limits.h index b330026..7f161e7 100644 --- a/include/asm-i386/mach-default/irq_vectors_limits.h +++ b/include/asm-i386/mach-default/irq_vectors_limits.h @@ -1,10 +1,6 @@ #ifndef _ASM_IRQ_VECTORS_LIMITS_H #define _ASM_IRQ_VECTORS_LIMITS_H -#ifdef CONFIG_PCI_MSI -#define NR_IRQS FIRST_SYSTEM_VECTOR -#define NR_IRQ_VECTORS NR_IRQS -#else #ifdef CONFIG_X86_IO_APIC #define NR_IRQS 224 # if (224 >= 32 * NR_CPUS) @@ -16,6 +12,5 @@ #else #define NR_IRQS 16 #define NR_IRQ_VECTORS NR_IRQS #endif -#endif #endif /* 
_ASM_IRQ_VECTORS_LIMITS_H */ diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h index 2885bea..06806b1 100644 --- a/include/asm-x86_64/io_apic.h +++ b/include/asm-x86_64/io_apic.h @@ -12,8 +12,6 @@ #include <asm/mpspec.h> #ifdef CONFIG_X86_IO_APIC -static inline int use_pci_vector(void) {return 0;} - #define APIC_MISMATCH_DEBUG #define IO_APIC_BASE(idx) \ -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
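For readers following the create_irq()/destroy_irq() change above, here is a minimal standalone sketch of the allocation loop: scan the irq table from the top, skip legacy irqs and irqs that already have a vector, and take the first free slot. Everything prefixed sketch_ or SKETCH_ is hypothetical and exists only for illustration, the vector allocator is a trivial stand-in for __assign_irq_vector(), and the vector_lock locking from the patch is left out.

```c
#include <errno.h>

#define SKETCH_NR_IRQS     32	/* hypothetical table size */
#define SKETCH_LEGACY_IRQS 16	/* irqs 0-15 are treated as legacy ISA irqs */

/* 0 means "no vector assigned", the same convention the patch relies on. */
static unsigned char sketch_irq_vector[SKETCH_NR_IRQS];

/* Stand-in for __assign_irq_vector(): hands out vectors sequentially. */
static int sketch_assign_vector(int irq)
{
	static int next_vector = 0x31;	/* assumed first device vector */

	if (next_vector > 0xee)
		return -ENOSPC;
	sketch_irq_vector[irq] = (unsigned char)next_vector;
	return next_vector++;
}

/* Returns an unused, non-legacy irq number, or -ENOSPC if none is free. */
static int sketch_create_irq(void)
{
	int irq = -ENOSPC;
	int new;

	for (new = SKETCH_NR_IRQS - 1; new >= 0; new--) {
		if (new < SKETCH_LEGACY_IRQS)
			continue;	/* never hand out a legacy irq */
		if (sketch_irq_vector[new] != 0)
			continue;	/* already in use */
		if (sketch_assign_vector(new) > 0)
			irq = new;
		break;		/* first free candidate decides the result */
	}
	return irq;
}

/* Releasing an irq only has to clear its vector entry. */
static void sketch_destroy_irq(int irq)
{
	sketch_irq_vector[irq] = 0;
}
```

Scanning downward from the top of the table keeps dynamically created irqs away from the low numbers, which on these systems are the ones tied to legacy devices and I/O APIC pins.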
* [PATCH 19/25] irq: Remove msi hacks
2006-06-20 22:28 ` [PATCH 18/25] i386 " Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 20/25] irq: Generalize the check for HARDIRQ_BITS Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

Because of the nasty way that CONFIG_PCI_MSI was implemented we wound up
with set_irq_info and set_native_irq_info, and with move_irq and
move_native_irq. The functions in each pair did the same thing but were
built and called under different circumstances. Now that the msi hacks
are gone we can kill move_irq and set_irq_info.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/irq.h | 36 ------------------------------------
 1 files changed, 0 insertions(+), 36 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 6d1ad88..cfd2f31 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -202,36 +202,6 @@ void set_pending_irq(unsigned int irq, c
 void move_native_irq(int irq);
 void move_masked_irq(int irq);
 
-#ifdef CONFIG_PCI_MSI
-/*
- * Wonder why these are dummies?
- * For e.g the set_ioapic_affinity_vector() calls the set_ioapic_affinity_irq()
- * counter part after translating the vector to irq info. We need to perform
- * this operation on the real irq, when we dont use vector, i.e when
- * pci_use_vector() is false.
- */
-static inline void move_irq(int irq)
-{
-}
-
-static inline void set_irq_info(int irq, cpumask_t mask)
-{
-}
-
-#else /* CONFIG_PCI_MSI */
-
-static inline void move_irq(int irq)
-{
-	move_native_irq(irq);
-}
-
-static inline void set_irq_info(int irq, cpumask_t mask)
-{
-	set_native_irq_info(irq, mask);
-}
-
-#endif /* CONFIG_PCI_MSI */
-
 #else /* CONFIG_GENERIC_PENDING_IRQ || CONFIG_IRQBALANCE */
 
 static inline void move_irq(int irq)
@@ -250,16 +220,10 @@ static inline void set_pending_irq(unsig
 {
 }
 
-static inline void set_irq_info(int irq, cpumask_t mask)
-{
-	set_native_irq_info(irq, mask);
-}
-
 #endif /* CONFIG_GENERIC_PENDING_IRQ */
 
 #else /* CONFIG_SMP */
 
-#define move_irq(x)
 #define move_native_irq(x)
 #define move_masked_irq(x)
 
-- 
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 20/25] irq: Generalize the check for HARDIRQ_BITS.
2006-06-20 22:28 ` [PATCH 19/25] irq: Remove msi hacks Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 21/25] x86_64 irq: Make the external irq handlers report their vector, not the irq number Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

This patch adds support, in the check for whether we have enough
HARDIRQ_BITS, for systems that cannot receive every interrupt on a
single cpu simultaneously. MAX_HARDIRQS_PER_CPU becomes the maximum
number of hardware generated interrupts per cpu. On architectures that
support per cpu interrupt delivery this can be a significant space
savings and scalability bonus.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/asm-x86_64/hardirq.h | 3 +++
 include/linux/hardirq.h      | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/include/asm-x86_64/hardirq.h b/include/asm-x86_64/hardirq.h
index 64a65ce..95d5e09 100644
--- a/include/asm-x86_64/hardirq.h
+++ b/include/asm-x86_64/hardirq.h
@@ -6,6 +6,9 @@ #include <linux/irq.h>
 #include <asm/pda.h>
 #include <asm/apic.h>
 
+/* We can have at most NR_VECTORS irqs routed to a cpu at a time */
+#define MAX_HARDIRQS_PER_CPU NR_VECTORS
+
 #define __ARCH_IRQ_STAT 1
 
 #define local_softirq_pending() read_pda(__softirq_pending)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index ccabda7..f60b8c3 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -27,11 +27,16 @@ #define SOFTIRQ_BITS 8
 
 #ifndef HARDIRQ_BITS
 #define HARDIRQ_BITS 12
+
+#ifndef MAX_HARDIRQS_PER_CPU
+#define MAX_HARDIRQS_PER_CPU NR_IRQS
+#endif
+
 /*
  * The hardirq mask has to be large enough to have space for potentially
  * all IRQ sources in the system nesting on a single CPU.
  */
-#if (1 << HARDIRQ_BITS) < NR_IRQS
+#if (1 << HARDIRQ_BITS) < MAX_HARDIRQS_PER_CPU
 # error HARDIRQ_BITS is too low!
 #endif
 #endif
-- 
1.4.0.gc07e
^ permalink raw reply related [flat|nested] 50+ messages in thread
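As a rough illustration of why the check in this patch has to change, the sketch below evaluates both the old and the new condition for one hypothetical configuration. The NR_CPUS value is an assumption picked for the example, and NR_IRQS uses the formula that patch 22 of this series introduces for x86_64. The two helper functions are not kernel code, they only make the preprocessor conditions observable at run time.

```c
/* Hypothetical configuration, chosen only to illustrate the check. */
#define NR_CPUS      128
#define NR_VECTORS   256
#define NR_IRQS      (NR_VECTORS + (32 * NR_CPUS))	/* 4352 irqs system wide */
#define HARDIRQ_BITS 12					/* room for 4096 nested hardirqs */

/* An architecture with per cpu delivery bounds the nesting by its vectors. */
#define MAX_HARDIRQS_PER_CPU NR_VECTORS

/* Generic fallback: assume every irq in the system can nest on one cpu. */
#ifndef MAX_HARDIRQS_PER_CPU
#define MAX_HARDIRQS_PER_CPU NR_IRQS
#endif

/* The new compile time check, as in the patch. */
#if (1 << HARDIRQ_BITS) < MAX_HARDIRQS_PER_CPU
# error HARDIRQ_BITS is too low!
#endif

/* Old condition: compares against the system wide irq count. */
static int old_check_passes(void)
{
	return (1 << HARDIRQ_BITS) >= NR_IRQS;
}

/* New condition: compares against what a single cpu can receive. */
static int new_check_passes(void)
{
	return (1 << HARDIRQ_BITS) >= MAX_HARDIRQS_PER_CPU;
}
```

With these numbers the old check would reject the build (4096 < 4352) although no single cpu can ever have more than 256 interrupts in flight, which is the case the new macro is meant to cover.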
* [PATCH 21/25] x86_64 irq: Make the external irq handlers report their vector, not the irq number.
2006-06-20 22:28 ` [PATCH 20/25] irq: Generalize the check for HARDIRQ_BITS Eric W. Biederman
@ 2006-06-20 22:28 ` Eric W. Biederman
2006-06-20 22:28 ` [PATCH 22/25] x86_64 irq: make vector_irq per cpu Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

This is a small pessimization but it paves the way for making this
information per cpu, which allows the maximum number of IRQs to
become NR_CPUS*224.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/i8259.c   | 62 +++++++++++++++++-------------------------
 arch/x86_64/kernel/io_apic.c | 4 +--
 arch/x86_64/kernel/irq.c     | 7 +++--
 include/asm-x86_64/hw_irq.h  | 1 +
 4 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 4749954..c3e737a 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -44,19 +44,11 @@ #define BUILD_16_IRQS(x) \
 	BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
 	BI(x,c) BI(x,d) BI(x,e) BI(x,f)
 
-#define BUILD_15_IRQS(x) \
-	BI(x,0) BI(x,1) BI(x,2) BI(x,3) \
-	BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
-	BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
-	BI(x,c) BI(x,d) BI(x,e)
-
 /*
  * ISA PIC or low IO-APIC triggered (INTA-cycle or APIC) interrupts:
  * (these are usually mapped to vectors 0x20-0x2f)
  */
-BUILD_16_IRQS(0x0)
 
-#ifdef CONFIG_X86_LOCAL_APIC
 /*
  * The IO-APIC gives us many more interrupt sources. Most of these
  * are unused but an SMP system is supposed to have enough memory ...
@@ -67,19 +59,13 @@ #ifdef CONFIG_X86_LOCAL_APIC * * (these are usually mapped into the 0x30-0xff vector range) */ - BUILD_16_IRQS(0x1) BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3) + BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3) BUILD_16_IRQS(0x4) BUILD_16_IRQS(0x5) BUILD_16_IRQS(0x6) BUILD_16_IRQS(0x7) BUILD_16_IRQS(0x8) BUILD_16_IRQS(0x9) BUILD_16_IRQS(0xa) BUILD_16_IRQS(0xb) -BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd) +BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd) BUILD_16_IRQS(0xe) BUILD_16_IRQS(0xf) -#ifdef CONFIG_PCI_MSI - BUILD_15_IRQS(0xe) -#endif - -#endif #undef BUILD_16_IRQS -#undef BUILD_15_IRQS #undef BI @@ -92,31 +78,15 @@ #define IRQLIST_16(x) \ IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \ IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f) -#define IRQLIST_15(x) \ - IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), \ - IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \ - IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \ - IRQ(x,c), IRQ(x,d), IRQ(x,e) - void (*interrupt[NR_IRQS])(void) = { - IRQLIST_16(0x0), - -#ifdef CONFIG_X86_IO_APIC - IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3), + IRQLIST_16(0x2), IRQLIST_16(0x3), IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7), IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb), - IRQLIST_16(0xc), IRQLIST_16(0xd) - -#ifdef CONFIG_PCI_MSI - , IRQLIST_15(0xe) -#endif - -#endif + IRQLIST_16(0xc), IRQLIST_16(0xd), IRQLIST_16(0xe), IRQLIST_16(0xf) }; #undef IRQ #undef IRQLIST_16 -#undef IRQLIST_14 /* * This is the 'legacy' 8259A Programmable Interrupt Controller, @@ -424,6 +394,26 @@ device_initcall(i8259A_init_sysfs); */ static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL}; +int vector_irq[NR_VECTORS] __read_mostly = { + [0 ... 
FIRST_EXTERNAL_VECTOR - 1] = -1, + [FIRST_EXTERNAL_VECTOR + 0] = 0, + [FIRST_EXTERNAL_VECTOR + 1] = 1, + [FIRST_EXTERNAL_VECTOR + 2] = 2, + [FIRST_EXTERNAL_VECTOR + 3] = 3, + [FIRST_EXTERNAL_VECTOR + 4] = 4, + [FIRST_EXTERNAL_VECTOR + 5] = 5, + [FIRST_EXTERNAL_VECTOR + 6] = 6, + [FIRST_EXTERNAL_VECTOR + 7] = 7, + [FIRST_EXTERNAL_VECTOR + 8] = 8, + [FIRST_EXTERNAL_VECTOR + 9] = 9, + [FIRST_EXTERNAL_VECTOR + 10] = 10, + [FIRST_EXTERNAL_VECTOR + 11] = 11, + [FIRST_EXTERNAL_VECTOR + 12] = 12, + [FIRST_EXTERNAL_VECTOR + 13] = 13, + [FIRST_EXTERNAL_VECTOR + 14] = 14, + [FIRST_EXTERNAL_VECTOR + 15] = 15, + [FIRST_EXTERNAL_VECTOR + 16 ... NR_VECTORS - 1] = -1 +}; void __init init_ISA_irqs (void) { @@ -523,8 +513,6 @@ void __init init_IRQ(void) */ for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) { int vector = FIRST_EXTERNAL_VECTOR + i; - if (i >= NR_IRQS) - break; if (vector != IA32_SYSCALL_VECTOR) set_intr_gate(vector, interrupt[i]); } @@ -534,7 +522,7 @@ #ifdef CONFIG_SMP * IRQ0 must be given a fixed assignment and initialized, * because it's used before the IO-APIC is set up. 
*/ - set_intr_gate(FIRST_DEVICE_VECTOR, interrupt[0]); + vector_irq[FIRST_DEVICE_VECTOR] = 0; /* * The reschedule interrupt is a CPU-to-CPU reschedule-helper diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index 1a63a0e..3ff5606 100644 --- a/arch/x86_64/kernel/io_apic.c +++ b/arch/x86_64/kernel/io_apic.c @@ -850,6 +850,7 @@ next: } vector = current_vector; + vector_irq[vector] = irq; IO_APIC_VECTOR(irq) = vector; return vector; @@ -884,7 +885,6 @@ static void ioapic_register_intr(int irq else set_irq_chip_and_handler(irq, &ioapic_chip, handle_edge_irq); - set_intr_gate(vector, interrupt[irq]); } static void __init setup_IO_APIC_irqs(void) @@ -1727,7 +1727,6 @@ static inline void check_timer(void) */ disable_8259A_irq(0); vector = assign_irq_vector(0); - set_intr_gate(vector, interrupt[0]); /* * Subtle, code in do_timer_interrupt() expects an AEOI @@ -1988,7 +1987,6 @@ int create_irq(void) spin_unlock_irqrestore(&vector_lock, flags); if (irq >= 0) { - set_intr_gate(vector, interrupt[irq]); dynamic_irq_init(irq); } return irq; diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c index c2d27d3..dd8d79a 100644 --- a/arch/x86_64/kernel/irq.c +++ b/arch/x86_64/kernel/irq.c @@ -93,12 +93,15 @@ #endif asmlinkage unsigned int do_IRQ(struct pt_regs *regs) { /* high bit used in ret_from_ code */ - unsigned irq = ~regs->orig_rax; + unsigned vector = ~regs->orig_rax; + unsigned irq; exit_idle(); irq_enter(); - generic_handle_irq(irq, regs); + irq = vector_irq[vector]; + if (likely(irq < NR_IRQS)) + generic_handle_irq(irq, regs); irq_exit(); diff --git a/include/asm-x86_64/hw_irq.h b/include/asm-x86_64/hw_irq.h index 1a8dc18..9f6a0bf 100644 --- a/include/asm-x86_64/hw_irq.h +++ b/include/asm-x86_64/hw_irq.h @@ -74,6 +74,7 @@ #define FIRST_SYSTEM_VECTOR 0xef /* du #ifndef __ASSEMBLY__ extern u8 irq_vector[NR_IRQ_VECTORS]; +extern int vector_irq[NR_VECTORS]; #define IO_APIC_VECTOR(irq) (irq_vector[irq]) /* -- 1.4.0.gc07e ^ permalink raw reply 
related [flat|nested] 50+ messages in thread
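The core of patch 21 is that the low level entry stubs now report a vector and do_IRQ() translates it to an irq through vector_irq[]. A minimal standalone sketch of that lookup follows. The sketch_ names are hypothetical, FIRST_EXTERNAL_VECTOR is assumed to be 0x20 as the comment in i8259.c suggests, and the table is filled at run time here rather than with the designated initializer the patch uses.

```c
#define NR_VECTORS            256
#define NR_IRQS               224
#define FIRST_EXTERNAL_VECTOR 0x20	/* assumed, matches "vectors 0x20-0x2f" */

/* vector -> irq table; -1 marks a vector with no irq behind it. */
static int sketch_vector_irq[NR_VECTORS];

/* Mirror the patch's static initializer: only the 16 legacy irqs are mapped. */
static void sketch_init_vector_irq(void)
{
	int v;

	for (v = 0; v < NR_VECTORS; v++)
		sketch_vector_irq[v] = -1;
	for (v = 0; v < 16; v++)
		sketch_vector_irq[FIRST_EXTERNAL_VECTOR + v] = v;
}

/*
 * Translate a vector the way the patched do_IRQ() does. Reading the entry
 * into an unsigned makes -1 a huge value, so the single "irq < NR_IRQS"
 * test rejects unmapped vectors and out of range irqs at the same time.
 * Returns the irq to dispatch, or -1 if the interrupt should be ignored.
 */
static int sketch_lookup_irq(unsigned vector)
{
	unsigned irq = (unsigned)sketch_vector_irq[vector & 0xff];

	if (irq < NR_IRQS)
		return (int)irq;
	return -1;
}
```

This is also why the set_intr_gate() calls disappear from the allocation paths in the diff: the gates are installed once for every external vector in init_IRQ(), and assigning an irq becomes a table update.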
* [PATCH 22/25] x86_64 irq: make vector_irq per cpu. 2006-06-20 22:28 ` [PATCH 21/25] x86_64 irq: Make the external irq handlers report their vector, not the irq number Eric W. Biederman @ 2006-06-20 22:28 ` Eric W. Biederman 2006-06-20 22:28 ` [PATCH 23/25] x86_64 irq: Kill gsi_irq_sharing Eric W. Biederman 0 siblings, 1 reply; 50+ messages in thread From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap This refactors the irq handling code to make the vectors a per cpu resource so the same vector number can be simultaneously used on multiple cpus for different irqs. This should make systems that were hitting limits on the total number of irqs much more livable. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- arch/x86_64/kernel/i8259.c | 4 - arch/x86_64/kernel/io_apic.c | 185 +++++++++++++++++++++++++++++++----------- arch/x86_64/kernel/irq.c | 2 include/asm-x86_64/hw_irq.h | 5 + include/asm-x86_64/irq.h | 4 - 5 files changed, 145 insertions(+), 55 deletions(-) diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c index c3e737a..11c10f3 100644 --- a/arch/x86_64/kernel/i8259.c +++ b/arch/x86_64/kernel/i8259.c @@ -394,7 +394,7 @@ device_initcall(i8259A_init_sysfs); */ static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL}; -int vector_irq[NR_VECTORS] __read_mostly = { +DEFINE_PER_CPU(vector_irq_t, vector_irq) = { [0 ... 
FIRST_EXTERNAL_VECTOR - 1] = -1, [FIRST_EXTERNAL_VECTOR + 0] = 0, [FIRST_EXTERNAL_VECTOR + 1] = 1, @@ -522,7 +522,7 @@ #ifdef CONFIG_SMP * IRQ0 must be given a fixed assignment and initialized, * because it's used before the IO-APIC is set up. */ - vector_irq[FIRST_DEVICE_VECTOR] = 0; + __get_cpu_var(vector_irq)[FIRST_DEVICE_VECTOR] = 0; /* * The reschedule interrupt is a CPU-to-CPU reschedule-helper diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index 3ff5606..f818d64 100644 --- a/arch/x86_64/kernel/io_apic.c +++ b/arch/x86_64/kernel/io_apic.c @@ -44,7 +44,7 @@ #include <asm/dma.h> #include <asm/nmi.h> #include <asm/msidef.h> -static int assign_irq_vector(int irq); +static int assign_irq_vector(int irq, cpumask_t mask); #define __apicdebuginit __init @@ -107,12 +107,36 @@ #define __DO_ACTION(R, ACTION, FINAL) FINAL; \ } +static void __target_IO_APIC_irq(unsigned int irq, unsigned int dest, u8 vector) +{ + int apic, pin; + struct irq_pin_list *entry = irq_2_pin + irq; + + BUG_ON(irq >= NR_IRQS); + for (;;) { + unsigned int reg; + apic = entry->apic; + pin = entry->pin; + if (pin == -1) + break; + io_apic_write(apic, 0x11 + pin*2, dest); + reg = io_apic_read(apic, 0x10 + pin*2); + reg &= ~0x000000ff; + reg |= vector; + io_apic_modify(apic, reg); + if (!entry->next) + break; + entry = irq_2_pin + entry->next; + } +} + #ifdef CONFIG_SMP static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t mask) { unsigned long flags; unsigned int dest; cpumask_t tmp; + int vector; cpus_and(tmp, mask, cpu_online_map); if (cpus_empty(tmp)) @@ -120,7 +144,13 @@ static void set_ioapic_affinity_irq(unsi cpus_and(mask, tmp, CPU_MASK_ALL); - dest = cpu_mask_to_apicid(mask); + vector = assign_irq_vector(irq, mask); + if (vector < 0) + return; + + cpus_clear(tmp); + cpu_set(vector >> 8, tmp); + dest = cpu_mask_to_apicid(tmp); /* * Only the high 8 bits are valid. 
@@ -128,7 +158,7 @@ static void set_ioapic_affinity_irq(unsi dest = SET_APIC_LOGICAL_ID(dest); spin_lock_irqsave(&ioapic_lock, flags); - __DO_ACTION(1, = dest, ) + __target_IO_APIC_irq(irq, dest, vector & 0xff); set_native_irq_info(irq, mask); spin_unlock_irqrestore(&ioapic_lock, flags); } @@ -715,7 +745,7 @@ int gsi_irq_sharing(int gsi) tries = NR_IRQS; try_again: - vector = assign_irq_vector(gsi); + vector = assign_irq_vector(gsi, TARGET_CPUS); /* * Sharing vectors means sharing IRQs, so scan irq_vectors for previous @@ -826,45 +856,77 @@ static inline int IO_APIC_irq_trigger(in } /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */ -u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }; +unsigned int irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_EXTERNAL_VECTOR, 0 }; -static int __assign_irq_vector(int irq) +static int __assign_irq_vector(int irq, cpumask_t mask) { - static int current_vector = FIRST_DEVICE_VECTOR, offset = 0; - int vector; + /* + * NOTE! The local APIC isn't very good at handling + * multiple interrupts at the same interrupt level. + * As the interrupt level is determined by taking the + * vector number and shifting that right by 4, we + * want to spread these out a bit so that they don't + * all fall in the same interrupt level. + * + * Also, we've got to be careful not to trash gate + * 0x80, because int 0x80 is hm, kind of importantish. ;) + */ + static struct { + int vector; + int offset; + } pos[NR_CPUS] = { [ 0 ... 
NR_CPUS - 1] = {FIRST_DEVICE_VECTOR, 0} }; + int old_vector = -1; + int cpu; BUG_ON((unsigned)irq >= NR_IRQ_VECTORS); - if (IO_APIC_VECTOR(irq) > 0) { - return IO_APIC_VECTOR(irq); + if (IO_APIC_VECTOR(irq) > 0) + old_vector = IO_APIC_VECTOR(irq); + if ((old_vector > 0) && cpu_isset(old_vector >> 8, mask)) { + return old_vector; } + + for_each_cpu_mask(cpu, mask) { + int vector, offset; + vector = pos[cpu].vector; + offset = pos[cpu].offset; next: - current_vector += 8; - if (current_vector == IA32_SYSCALL_VECTOR) - goto next; - - if (current_vector >= FIRST_SYSTEM_VECTOR) { - /* If we run out of vectors on large boxen, must share them. */ - offset = (offset + 1) % 8; - current_vector = FIRST_DEVICE_VECTOR + offset; + vector += 8; + if (vector >= FIRST_SYSTEM_VECTOR) { + /* If we run out of vectors on large boxen, must share them. */ + offset = (offset + 1) % 8; + vector = FIRST_DEVICE_VECTOR + offset; + } + if (unlikely(pos[cpu].vector == vector)) + continue; + if (vector == IA32_SYSCALL_VECTOR) + goto next; + if (per_cpu(vector_irq, cpu)[vector] != -1) + goto next; + /* Found one! 
*/ + pos[cpu].vector = vector; + pos[cpu].offset = offset; + if (old_vector >= 0) { + int old_cpu = old_vector >> 8; + old_vector &= 0xff; + per_cpu(vector_irq, old_cpu)[old_vector] = -1; + } + per_cpu(vector_irq, cpu)[vector] = irq; + vector |= cpu << 8; + IO_APIC_VECTOR(irq) = vector; + return vector; } - - vector = current_vector; - vector_irq[vector] = irq; - IO_APIC_VECTOR(irq) = vector; - - return vector; + return -ENOSPC; } -static int assign_irq_vector(int irq) +static int assign_irq_vector(int irq, cpumask_t mask) { int vector; unsigned long flags; spin_lock_irqsave(&vector_lock, flags); - vector = __assign_irq_vector(irq); + vector = __assign_irq_vector(irq, mask); spin_unlock_irqrestore(&vector_lock, flags); - return vector; } @@ -934,8 +996,15 @@ static void __init setup_IO_APIC_irqs(vo continue; if (IO_APIC_IRQ(irq)) { - vector = assign_irq_vector(irq); - entry.vector = vector; + cpumask_t mask; + vector = assign_irq_vector(irq, TARGET_CPUS); + if (vector < 0) + continue; + + cpus_clear(mask); + cpu_set(vector >> 8, mask); + entry.dest.logical.logical_dest = cpu_mask_to_apicid(mask); + entry.vector = vector & 0xff; ioapic_register_intr(irq, vector, IOAPIC_AUTO); if (!apic && (irq < 16)) @@ -1514,7 +1583,14 @@ static unsigned int startup_ioapic_irq(u static int ioapic_retrigger_irq(unsigned int irq) { - send_IPI_self(IO_APIC_VECTOR(irq)); + cpumask_t mask; + unsigned vector; + + vector = irq_vector[irq]; + cpus_clear(mask); + cpu_set(vector >> 8, mask); + + send_IPI_mask(mask, vector & 0xff); return 1; } @@ -1726,7 +1802,7 @@ static inline void check_timer(void) * get/set the timer IRQ vector: */ disable_8259A_irq(0); - vector = assign_irq_vector(0); + vector = assign_irq_vector(0, TARGET_CPUS); /* * Subtle, code in do_timer_interrupt() expects an AEOI @@ -1979,7 +2055,7 @@ int create_irq(void) continue; if (irq_vector[new] != 0) continue; - vector = __assign_irq_vector(new); + vector = __assign_irq_vector(new, TARGET_CPUS); if (likely(vector > 0)) irq 
= new; break; @@ -2015,14 +2091,14 @@ static int msi_msg_setup(struct pci_dev int vector; unsigned dest; - vector = assign_irq_vector(irq); + vector = assign_irq_vector(irq, TARGET_CPUS); if (vector >= 0) { cpumask_t tmp; - + cpus_clear(tmp); - cpu_set(first_cpu(cpu_online_map), tmp); + cpu_set(vector >> 8, tmp); dest = cpu_mask_to_apicid(tmp); - + msg->address_hi = MSI_ADDR_BASE_HI; msg->address_lo = MSI_ADDR_BASE_LO | @@ -2055,9 +2131,13 @@ static void msi_msg_set_affinity(unsigne int vector; unsigned dest; - vector = assign_irq_vector(irq); + vector = assign_irq_vector(irq, mask); if (vector > 0) { - dest = cpu_mask_to_apicid(mask); + cpumask_t tmp; + + cpus_clear(tmp); + cpu_set(vector >> 8, tmp); + dest = cpu_mask_to_apicid(tmp); msg->data &= ~MSI_DATA_VECTOR_MASK; msg->data |= MSI_DATA_VECTOR(vector); @@ -2113,6 +2193,8 @@ int io_apic_set_pci_routing (int ioapic, { struct IO_APIC_route_entry entry; unsigned long flags; + int vector; + cpumask_t mask; if (!IO_APIC_IRQ(irq)) { apic_printk(APIC_QUIET,KERN_ERR "IOAPIC[%d]: Invalid reference to IRQ 0\n", @@ -2120,6 +2202,21 @@ int io_apic_set_pci_routing (int ioapic, return -EINVAL; } + irq = gsi_irq_sharing(irq); + /* + * IRQs < 16 are already in the irq_2_pin[] map + */ + if (irq >= 16) + add_pin_to_irq(irq, ioapic, pin); + + + vector = assign_irq_vector(irq, TARGET_CPUS); + if (vector < 0) + return vector; + + cpus_clear(mask); + cpu_set(vector >> 8, mask); + /* * Generate a PCI IRQ routing entry and program the IOAPIC accordingly. 
* Note that we mask (disable) IRQs now -- these get enabled when the @@ -2130,19 +2227,11 @@ int io_apic_set_pci_routing (int ioapic, entry.delivery_mode = INT_DELIVERY_MODE; entry.dest_mode = INT_DEST_MODE; - entry.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS); + entry.dest.logical.logical_dest = cpu_mask_to_apicid(mask); entry.trigger = triggering; entry.polarity = polarity; entry.mask = 1; /* Disabled (masked) */ - - irq = gsi_irq_sharing(irq); - /* - * IRQs < 16 are already in the irq_2_pin[] map - */ - if (irq >= 16) - add_pin_to_irq(irq, ioapic, pin); - - entry.vector = assign_irq_vector(irq); + entry.vector = vector & 0xff; apic_printk(APIC_VERBOSE,KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry (%d-%d -> 0x%x -> " "IRQ %d Mode:%i Active:%i)\n", ioapic, diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c index dd8d79a..2de92cc 100644 --- a/arch/x86_64/kernel/irq.c +++ b/arch/x86_64/kernel/irq.c @@ -99,7 +99,7 @@ asmlinkage unsigned int do_IRQ(struct pt exit_idle(); irq_enter(); - irq = vector_irq[vector]; + irq = __get_cpu_var(vector_irq)[vector]; if (likely(irq < NR_IRQS)) generic_handle_irq(irq, regs); diff --git a/include/asm-x86_64/hw_irq.h b/include/asm-x86_64/hw_irq.h index 9f6a0bf..2c79d03 100644 --- a/include/asm-x86_64/hw_irq.h +++ b/include/asm-x86_64/hw_irq.h @@ -73,8 +73,9 @@ #define FIRST_SYSTEM_VECTOR 0xef /* du #ifndef __ASSEMBLY__ -extern u8 irq_vector[NR_IRQ_VECTORS]; -extern int vector_irq[NR_VECTORS]; +extern unsigned int irq_vector[NR_IRQ_VECTORS]; +typedef int vector_irq_t[NR_VECTORS]; +DECLARE_PER_CPU(vector_irq_t, vector_irq); #define IO_APIC_VECTOR(irq) (irq_vector[irq]) /* diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h index 0c8d570..d3b5790 100644 --- a/include/asm-x86_64/irq.h +++ b/include/asm-x86_64/irq.h @@ -31,8 +31,8 @@ #define NR_VECTORS 256 #define FIRST_SYSTEM_VECTOR 0xef /* duplicated in hw_irq.h */ -#define NR_IRQS 224 -#define NR_IRQ_VECTORS (32 * NR_CPUS) +#define NR_IRQS 
(NR_VECTORS + (32 *NR_CPUS)) +#define NR_IRQ_VECTORS NR_IRQS static __inline__ int irq_canonicalize(int irq) { -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
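Patch 22 packs the target cpu into the value stored in irq_vector[]: the low 8 bits hold the vector and the bits above hold the cpu, which is what the "vector >> 8" and "vector & 0xff" expressions throughout the diff decode. The sketch below shows that encoding together with a per cpu vector table. It is a simplification under stated assumptions: the sketch_ names and the 4 cpu limit are hypothetical, FIRST_DEVICE_VECTOR is assumed to be 0x31, the cpu mask is a plain bitmask instead of cpumask_t, and the search is a plain linear scan, not the stride of 8 walk that the real __assign_irq_vector() uses to spread irqs across APIC priority levels. Releasing the old vector when an irq migrates is also left out.

```c
#include <errno.h>

#define SKETCH_NR_CPUS      4		/* hypothetical */
#define NR_VECTORS          256
#define FIRST_DEVICE_VECTOR 0x31	/* assumed value */
#define FIRST_SYSTEM_VECTOR 0xef
#define IA32_SYSCALL_VECTOR 0x80

/* One vector -> irq table per cpu; -1 marks a free vector. */
static int sketch_vector_irq[SKETCH_NR_CPUS][NR_VECTORS];

static void sketch_init(void)
{
	int cpu, v;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		for (v = 0; v < NR_VECTORS; v++)
			sketch_vector_irq[cpu][v] = -1;
}

/* Encoding used for irq_vector[] entries after the patch. */
static int sketch_pack(int cpu, int vector) { return (cpu << 8) | vector; }
static int sketch_cpu(int packed)           { return packed >> 8; }
static int sketch_vector(int packed)        { return packed & 0xff; }

/*
 * Find a free vector on any cpu allowed by cpu_mask (bit n = cpu n).
 * Returns the packed (cpu, vector) value, or -ENOSPC.
 */
static int sketch_assign_irq_vector(int irq, unsigned cpu_mask)
{
	int cpu, vector;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!(cpu_mask & (1u << cpu)))
			continue;
		for (vector = FIRST_DEVICE_VECTOR;
		     vector < FIRST_SYSTEM_VECTOR; vector++) {
			if (vector == IA32_SYSCALL_VECTOR)
				continue;	/* never reuse the int 0x80 gate */
			if (sketch_vector_irq[cpu][vector] != -1)
				continue;	/* taken on this cpu */
			sketch_vector_irq[cpu][vector] = irq;
			return sketch_pack(cpu, vector);
		}
	}
	return -ENOSPC;
}
```

Two irqs restricted to different cpus can end up with the same vector number, which is exactly the property that lifts the old system wide limit on the number of irqs.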
* [PATCH 23/25] x86_64 irq: Kill gsi_irq_sharing. 2006-06-20 22:28 ` [PATCH 22/25] x86_64 irq: make vector_irq per cpu Eric W. Biederman @ 2006-06-20 22:28 ` Eric W. Biederman 2006-06-20 22:28 ` [PATCH 24/25] x86_64 irq: Kill irq compression Eric W. Biederman 0 siblings, 1 reply; 50+ messages in thread From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap After raising the number of irqs the system supports this function is no longer necessary. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- arch/i386/kernel/acpi/boot.c | 5 +-- arch/x86_64/kernel/io_apic.c | 62 ------------------------------------------ 2 files changed, 1 insertions(+), 66 deletions(-) diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c index 6e7edc0..74fd53c 100644 --- a/arch/i386/kernel/acpi/boot.c +++ b/arch/i386/kernel/acpi/boot.c @@ -42,7 +42,6 @@ #ifdef CONFIG_X86_64 extern void __init clustered_apic_check(void); -extern int gsi_irq_sharing(int gsi); #include <asm/proto.h> static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id) { return 0; } @@ -55,8 +54,6 @@ #include <mach_apic.h> #include <mach_mpparse.h> #endif /* CONFIG_X86_LOCAL_APIC */ -static inline int gsi_irq_sharing(int gsi) { return gsi; } - #endif /* X86 */ #define BAD_MADT_ENTRY(entry, end) ( \ @@ -459,7 +456,7 @@ void __init acpi_pic_sci_set_trigger(uns int acpi_gsi_to_irq(u32 gsi, unsigned int *irq) { - *irq = gsi_irq_sharing(gsi); + *irq = gsi; return 0; } diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index f818d64..a904ba7 100644 --- a/arch/x86_64/kernel/io_apic.c +++ 
b/arch/x86_64/kernel/io_apic.c @@ -164,8 +164,6 @@ static void set_ioapic_affinity_irq(unsi } #endif -static u8 gsi_2_irq[NR_IRQ_VECTORS] = { [0 ... NR_IRQ_VECTORS-1] = 0xFF }; - /* * The common case is 1:1 IRQ<->pin mappings. Sometimes there are * shared ISA-space IRQs, so we have to support them. We are super @@ -721,64 +719,6 @@ static inline int irq_trigger(int idx) return MPBIOS_trigger(idx); } -static int next_irq = 16; - -/* - * gsi_irq_sharing -- Name overload! "irq" can be either a legacy IRQ - * in the range 0-15, a linux IRQ in the range 0-223, or a GSI number - * from ACPI, which can reach 800 in large boxen. - * - * Compact the sparse GSI space into a sequential IRQ series and reuse - * vectors if possible. - */ -int gsi_irq_sharing(int gsi) -{ - int i, tries, vector; - - BUG_ON(gsi >= NR_IRQ_VECTORS); - - if (platform_legacy_irq(gsi)) - return gsi; - - if (gsi_2_irq[gsi] != 0xFF) - return (int)gsi_2_irq[gsi]; - - tries = NR_IRQS; - try_again: - vector = assign_irq_vector(gsi, TARGET_CPUS); - - /* - * Sharing vectors means sharing IRQs, so scan irq_vectors for previous - * use of vector and if found, return that IRQ. However, we never want - * to share legacy IRQs, which usually have a different trigger mode - * than PCI. 
- */ - for (i = 0; i < NR_IRQS; i++) - if (IO_APIC_VECTOR(i) == vector) - break; - if (platform_legacy_irq(i)) { - if (--tries >= 0) { - IO_APIC_VECTOR(i) = 0; - goto try_again; - } - panic("gsi_irq_sharing: didn't find an IRQ using vector 0x%02X for GSI %d", vector, gsi); - } - if (i < NR_IRQS) { - gsi_2_irq[gsi] = i; - printk(KERN_INFO "GSI %d sharing vector 0x%02X and IRQ %d\n", - gsi, vector, i); - return i; - } - - i = next_irq++; - BUG_ON(i >= NR_IRQS); - gsi_2_irq[gsi] = i; - IO_APIC_VECTOR(i) = vector; - printk(KERN_INFO "GSI %d assigned vector 0x%02X and IRQ %d\n", - gsi, vector, i); - return i; -} - static int pin_2_irq(int idx, int apic, int pin) { int irq, i; @@ -808,7 +748,6 @@ static int pin_2_irq(int idx, int apic, while (i < apic) irq += nr_ioapic_registers[i++]; irq += pin; - irq = gsi_irq_sharing(irq); break; } default: @@ -2202,7 +2141,6 @@ int io_apic_set_pci_routing (int ioapic, return -EINVAL; } - irq = gsi_irq_sharing(irq); /* * IRQs < 16 are already in the irq_2_pin[] map */ -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 24/25] x86_64 irq: Kill irq compression. 2006-06-20 22:28 ` [PATCH 23/25] x86_64 irq: Kill gsi_irq_sharing Eric W. Biederman @ 2006-06-20 22:28 ` Eric W. Biederman 2006-06-20 22:28 ` [PATCH 25/25] irq: Document what an IRQ is Eric W. Biederman 0 siblings, 1 reply; 50+ messages in thread From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap With more irqs in the system we don't need this. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- arch/x86_64/kernel/io_apic.c | 5 ----- arch/x86_64/kernel/mpparse.c | 42 +----------------------------------------- include/asm-x86_64/io_apic.h | 1 - 3 files changed, 1 insertions(+), 47 deletions(-) diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c index a904ba7..c4be89e 100644 --- a/arch/x86_64/kernel/io_apic.c +++ b/arch/x86_64/kernel/io_apic.c @@ -1722,8 +1722,6 @@ static inline void unlock_ExtINT_logic(v spin_unlock_irqrestore(&ioapic_lock, flags); } -int timer_uses_ioapic_pin_0; - /* * This code may look a bit paranoid, but it's supposed to cooperate with * a wide range of boards and BIOS bugs. 
Fortunately only the timer IRQ @@ -1760,9 +1758,6 @@ static inline void check_timer(void) pin2 = ioapic_i8259.pin; apic2 = ioapic_i8259.apic; - if (pin1 == 0) - timer_uses_ioapic_pin_0 = 1; - apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n", vector, apic1, pin1, apic2, pin2); diff --git a/arch/x86_64/kernel/mpparse.c b/arch/x86_64/kernel/mpparse.c index 083da7e..304cef6 100644 --- a/arch/x86_64/kernel/mpparse.c +++ b/arch/x86_64/kernel/mpparse.c @@ -910,20 +910,11 @@ void __init mp_config_acpi_legacy_irqs ( return; } -#define MAX_GSI_NUM 4096 - int mp_register_gsi(u32 gsi, int triggering, int polarity) { int ioapic = -1; int ioapic_pin = 0; int idx, bit = 0; - static int pci_irq = 16; - /* - * Mapping between Global System Interrupts, which - * represent all possible interrupts, to the IRQs - * assigned to actual devices. - */ - static int gsi_to_irq[MAX_GSI_NUM]; if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) return gsi; @@ -956,42 +947,11 @@ int mp_register_gsi(u32 gsi, int trigger if ((1<<bit) & mp_ioapic_routing[ioapic].pin_programmed[idx]) { Dprintk(KERN_DEBUG "Pin %d-%d already programmed\n", mp_ioapic_routing[ioapic].apic_id, ioapic_pin); - return gsi_to_irq[gsi]; + return gsi; } mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<<bit); - if (triggering == ACPI_LEVEL_SENSITIVE) { - /* - * For PCI devices assign IRQs in order, avoiding gaps - * due to unused I/O APIC pins. - */ - int irq = gsi; - if (gsi < MAX_GSI_NUM) { - /* - * Retain the VIA chipset work-around (gsi > 15), but - * avoid a problem where the 8254 timer (IRQ0) is setup - * via an override (so it's not on pin 0 of the ioapic), - * and at the same time, the pin 0 interrupt is a PCI - * type. The gsi > 15 test could cause these two pins - * to be shared as IRQ0, and they are not shareable. - * So test for this condition, and if necessary, avoid - * the pin collision. 
- */ - if (gsi > 15 || (gsi == 0 && !timer_uses_ioapic_pin_0)) - gsi = pci_irq++; - /* - * Don't assign IRQ used by ACPI SCI - */ - if (gsi == acpi_fadt.sci_int) - gsi = pci_irq++; - gsi_to_irq[irq] = gsi; - } else { - printk(KERN_ERR "GSI %u is too high\n", gsi); - return gsi; - } - } - io_apic_set_pci_routing(ioapic, ioapic_pin, gsi, triggering == ACPI_EDGE_SENSITIVE ? 0 : 1, polarity == ACPI_ACTIVE_HIGH ? 0 : 1); diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h index 06806b1..c5235d6 100644 --- a/include/asm-x86_64/io_apic.h +++ b/include/asm-x86_64/io_apic.h @@ -164,7 +164,6 @@ #ifdef CONFIG_ACPI extern int io_apic_get_version (int ioapic); extern int io_apic_get_redir_entries (int ioapic); extern int io_apic_set_pci_routing (int ioapic, int pin, int irq, int, int); -extern int timer_uses_ioapic_pin_0; #endif extern int sis_apic_bug; /* dummy */ -- 1.4.0.gc07e ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 25/25] irq: Document what an IRQ is. 2006-06-20 22:28 ` [PATCH 24/25] x86_64 irq: Kill irq compression Eric W. Biederman @ 2006-06-20 22:28 ` Eric W. Biederman 0 siblings, 0 replies; 50+ messages in thread From: Eric W. Biederman @ 2006-06-20 22:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap From: Eric W. Biederman <ebiederman@lnxi.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- Documentation/IRQ.txt | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt new file mode 100644 index 0000000..237235d --- /dev/null +++ b/Documentation/IRQ.txt @@ -0,0 +1,22 @@ +What is an IRQ? + +An IRQ is an interrupt request from a device. +Currently they can come in over a pin, or over a packet. +Several devices may be connected to the same pin thus +sharing an IRQ. + +An IRQ number is a kernel identifier used to talk about a hardware +interrupt source. Typically this is an index into the global irq_desc +array, but except for what linux/interrupt.h implements the details +are architecture specific. + +An IRQ number is an enumeration of the possible interrupt sources on a +machine. Typically what is enumerated is the number of input pins on +all of the interrupt controller in the system. In the case of ISA +what is enumerated are the 16 input pins on the two i8259 interrupt +controllers. + +Architectures can assign additional meaning to the IRQ numbers, and +are encouraged to in the case where there is any manual configuration +of the hardware involved. The ISA IRQs are a classic example of +assigning this kind of additional meaning. 
-- 1.4.0.gc07e
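As a small illustration of the ISA case that IRQ.txt describes (16 input pins spread over two i8259 controllers), here is a hedged userspace sketch. The names isa_pin, isa_irq_to_pin() and isa_pin_to_irq() are made up for this example and are not kernel interfaces; the only thing assumed is the conventional layout where IRQs 0-7 sit on the first controller and 8-15 on the second.

```c
#include <assert.h>

#define ISA_NR_IRQS     16	/* 2 controllers x 8 input pins */
#define PINS_PER_I8259   8

/* Hypothetical type for this sketch: one input pin on one i8259. */
struct isa_pin {
	int controller;		/* 0 = first i8259, 1 = second i8259 */
	int pin;		/* input pin 0..7 on that controller */
};

/* The IRQ number simply enumerates the pins, controller by controller. */
static struct isa_pin isa_irq_to_pin(int irq)
{
	struct isa_pin p;

	assert(irq >= 0 && irq < ISA_NR_IRQS);
	p.controller = irq / PINS_PER_I8259;
	p.pin = irq % PINS_PER_I8259;
	return p;
}

/* Inverse mapping: from a (controller, pin) pair back to the IRQ number. */
static int isa_pin_to_irq(struct isa_pin p)
{
	assert(p.controller >= 0 && p.controller < 2);
	assert(p.pin >= 0 && p.pin < PINS_PER_I8259);
	return p.controller * PINS_PER_I8259 + p.pin;
}
```

The point of the sketch is only that the number is an enumeration of sources; nothing in it depends on how the interrupt is later routed to a cpu.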
* Re: [PATCH 11/25] i386 irq: Dynamic irq support 2006-06-20 22:28 ` [PATCH 11/25] i386 " Eric W. Biederman 2006-06-20 22:28 ` [PATCH 12/25] x86_64 " Eric W. Biederman @ 2006-06-21 1:50 ` Rajesh Shah 2006-06-21 2:21 ` Eric W. Biederman 1 sibling, 1 reply; 50+ messages in thread From: Rajesh Shah @ 2006-06-21 1:50 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap On Tue, Jun 20, 2006 at 04:28:24PM -0600, Eric W. Biederman wrote: > The current implementation of create_irq() is a hack but it is the > current hack that msi.c uses, and unfortunately the ``generic'' apic > msi ops depend on this hack. Thus we are stuck this hack of assuming > irq == vector until the depencencies in the generic msi code are removed. > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> > --- > arch/i386/kernel/io_apic.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 48 insertions(+), 0 deletions(-) > > diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c > index 16966f4..04f78ff 100644 > --- a/arch/i386/kernel/io_apic.c > +++ b/arch/i386/kernel/io_apic.c > @@ -2497,6 +2497,54 @@ static int __init ioapic_init_sysfs(void > > device_initcall(ioapic_init_sysfs); > > +#ifdef CONFIG_PCI_MSI > +/* It would be really good to decouple MSI implementation from IO APICs, since there's really no real hardware dependence here. This code can actually go to arch/xxx/pci/msi-apic.c Rajesh ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 11/25] i386 irq: Dynamic irq support 2006-06-21 1:50 ` [PATCH 11/25] i386 irq: Dynamic irq support Rajesh Shah @ 2006-06-21 2:21 ` Eric W. Biederman 2006-06-21 2:27 ` Rajesh Shah 0 siblings, 1 reply; 50+ messages in thread From: Eric W. Biederman @ 2006-06-21 2:21 UTC (permalink / raw) To: Rajesh Shah Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap, Roland Dreier, Tony Luck Rajesh Shah <rajesh.shah@intel.com> writes: > On Tue, Jun 20, 2006 at 04:28:24PM -0600, Eric W. Biederman wrote: >> The current implementation of create_irq() is a hack but it is the >> current hack that msi.c uses, and unfortunately the ``generic'' apic >> msi ops depend on this hack. Thus we are stuck this hack of assuming >> irq == vector until the depencencies in the generic msi code are removed. >> >> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >> --- >> arch/i386/kernel/io_apic.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ >> 1 files changed, 48 insertions(+), 0 deletions(-) >> >> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c >> index 16966f4..04f78ff 100644 >> --- a/arch/i386/kernel/io_apic.c >> +++ b/arch/i386/kernel/io_apic.c >> @@ -2497,6 +2497,54 @@ static int __init ioapic_init_sysfs(void >> >> device_initcall(ioapic_init_sysfs); >> >> +#ifdef CONFIG_PCI_MSI >> +/* > > It would be really good to decouple MSI implementation from IO > APICs, since there's really no real hardware dependence here. > This code can actually go to arch/xxx/pci/msi-apic.c I agree in theory. In practice, however, msi interrupts look like io_apics with a different register set, and they use all of the same support facilities.
So until that part of the architecture is refactored it doesn't make much sense.

There is a slightly better case for moving the code into a separate file: I think I know of a second common implementation for x86_64, at which point the files will probably be named msi-intel.c and msi-amd.c, or something like that. The name msi-apic.c is at least as bad as putting the code in io_apic.c.

Eric
* Re: [PATCH 11/25] i386 irq: Dynamic irq support 2006-06-21 2:21 ` Eric W. Biederman @ 2006-06-21 2:27 ` Rajesh Shah 2006-06-21 14:07 ` Eric W. Biederman 0 siblings, 1 reply; 50+ messages in thread From: Rajesh Shah @ 2006-06-21 2:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Rajesh Shah, Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap, Roland Dreier On Tue, Jun 20, 2006 at 08:21:00PM -0600, Eric W. Biederman wrote: > Rajesh Shah <rajesh.shah@intel.com> writes: > > > It would be really good to decouple MSI implementation from IO > > APICs, since there's really no real hardware dependence here. > > This code can actually go to arch/xxx/pci/msi-apic.c > > I agree in theory. In practice however msi interrupts look like io_apics. > with a different register set and the use all of the same support facilities. > So until that part of the architecture is refactored it doesn't make much > sense. There is a slightly better case for moving the code into a separate > file. Namely I think I know of a second common implementation for x86_64. > At which point the files will probably be named msi-intel.c and msi-amd.c > Or something like that. > Actually, I meant just the vector tracking code could be in a separate file and the ioapic and msi code could both assign vectors from a common routine. I had the patch below in my patchkit, plus another patch for x86_64 to do the same thing in io_apic.c and share the same intrvec.c file between the two archs. 
Once you have this, the MSI callbacks in arch code can be moved out of io_apic.c arch/i386/kernel/Makefile | 2 arch/i386/kernel/intrvec.c | 94 ++++++++++++++++++++++++++++ arch/i386/kernel/io_apic.c | 26 ++----- include/asm-i386/mach-default/irq_vectors.h | 1 4 files changed, 105 insertions(+), 18 deletions(-) Index: linux-2.6.17-rc6-mm2/arch/i386/kernel/intrvec.c =================================================================== --- /dev/null +++ linux-2.6.17-rc6-mm2/arch/i386/kernel/intrvec.c @@ -0,0 +1,94 @@ + +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + * Copyright (C) 2006 Intel Corporation (rajesh.shah@intel.com) + * + */ + +#include <linux/irq.h> + +/* + * Code to manage interrupt vectors to program for IO-APIC and PCI + * Message Signalled Interrupts (MSI/MSI-X) + */ +#define VECTOR_STRIDE 8 +#define NUM_VECTORS (FIRST_SYSTEM_VECTOR-FIRST_DEVICE_VECTOR) + +static DEFINE_SPINLOCK(intr_vector_lock); +static DECLARE_BITMAP(vectors_used, NUM_VECTORS); +int current_vector = FIRST_DEVICE_VECTOR; +int offset = 0; + +static inline void mark_vector_used(int vector) +{ + if ((vector >= FIRST_DEVICE_VECTOR) && (vector < FIRST_SYSTEM_VECTOR)) + set_bit((vector - FIRST_DEVICE_VECTOR), vectors_used); +} + +static inline void mark_vector_free(int vector) +{ + if ((vector >= FIRST_DEVICE_VECTOR) && (vector < FIRST_SYSTEM_VECTOR)) + clear_bit((vector - FIRST_DEVICE_VECTOR), vectors_used); +} + +static inline int is_used(int vector) +{ + if ((vector < FIRST_DEVICE_VECTOR) || (vector >= FIRST_SYSTEM_VECTOR)) + return 1; + return (test_bit((vector - FIRST_DEVICE_VECTOR), vectors_used)); +} + +int assign_vector(void) +{ + unsigned long flags; + int vector; + + spin_lock_irqsave(&intr_vector_lock, flags); + if (bitmap_full(vectors_used, NUM_VECTORS)) { + spin_unlock_irqrestore(&intr_vector_lock, flags); + return -1; + } + vector = current_vector; + while (is_used(vector)) { + vector += VECTOR_STRIDE; + if (vector >= FIRST_SYSTEM_VECTOR) + vector = FIRST_DEVICE_VECTOR + + (++offset % VECTOR_STRIDE); + } + mark_vector_used(vector); + current_vector = vector; + spin_unlock_irqrestore(&intr_vector_lock, flags); + return vector; +} + +void free_vector(int vector) +{ + unsigned long flags; + + spin_lock_irqsave(&intr_vector_lock, flags); + mark_vector_free(vector); + current_vector = vector; /* use this vector at next request */ + spin_unlock_irqrestore(&intr_vector_lock, flags); +} + +static int __init init_vector_array(void) +{ + mark_vector_used(IA32_SYSCALL_VECTOR); + return 0; +} + +core_initcall(init_vector_array); + Index: 
linux-2.6.17-rc6-mm2/arch/i386/kernel/Makefile =================================================================== --- linux-2.6.17-rc6-mm2.orig/arch/i386/kernel/Makefile +++ linux-2.6.17-rc6-mm2/arch/i386/kernel/Makefile @@ -21,7 +21,7 @@ obj-$(CONFIG_X86_SMP) += smp.o smpboot. obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o obj-$(CONFIG_X86_MPPARSE) += mpparse.o obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o -obj-$(CONFIG_X86_IO_APIC) += io_apic.o +obj-$(CONFIG_X86_IO_APIC) += io_apic.o intrvec.o obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups.o obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o Index: linux-2.6.17-rc6-mm2/arch/i386/kernel/io_apic.c =================================================================== --- linux-2.6.17-rc6-mm2.orig/arch/i386/kernel/io_apic.c +++ linux-2.6.17-rc6-mm2/arch/i386/kernel/io_apic.c @@ -95,6 +95,8 @@ int vector_irq[NR_VECTORS] __read_mostly #define vector_to_irq(vector) (vector) #endif +extern int assign_vector(void); + /* * The common case is 1:1 IRQ<->pin mappings. Sometimes there are * shared ISA-space IRQs, so we have to support them. 
We are super @@ -1163,38 +1165,28 @@ u8 irq_vector[NR_IRQ_VECTORS] __read_mos int assign_irq_vector(int irq) { - static int current_vector = FIRST_DEVICE_VECTOR, offset = 0; - unsigned long flags; int vector; BUG_ON(irq != AUTO_ASSIGN && (unsigned)irq >= NR_IRQ_VECTORS); - spin_lock_irqsave(&vector_lock, flags); + spin_lock(&vector_lock); if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0) { - spin_unlock_irqrestore(&vector_lock, flags); + spin_unlock(&vector_lock); return IO_APIC_VECTOR(irq); } -next: - current_vector += 8; - if (current_vector == SYSCALL_VECTOR) - goto next; - if (current_vector >= FIRST_SYSTEM_VECTOR) { - offset++; - if (!(offset%8)) { - spin_unlock_irqrestore(&vector_lock, flags); - return -ENOSPC; - } - current_vector = FIRST_DEVICE_VECTOR + offset; + vector = assign_vector(); + if (vector < 0) { + spin_unlock(&vector_lock); + return -1; } - vector = current_vector; vector_irq[vector] = irq; if (irq != AUTO_ASSIGN) IO_APIC_VECTOR(irq) = vector; - spin_unlock_irqrestore(&vector_lock, flags); + spin_unlock(&vector_lock); return vector; } Index: linux-2.6.17-rc6-mm2/include/asm-i386/mach-default/irq_vectors.h =================================================================== --- linux-2.6.17-rc6-mm2.orig/include/asm-i386/mach-default/irq_vectors.h +++ linux-2.6.17-rc6-mm2/include/asm-i386/mach-default/irq_vectors.h @@ -29,6 +29,7 @@ #define FIRST_EXTERNAL_VECTOR 0x20 #define SYSCALL_VECTOR 0x80 +#define IA32_SYSCALL_VECTOR SYSCALL_VECTOR /* * Vectors 0x20-0x2f are used for ISA interrupts. ^ permalink raw reply [flat|nested] 50+ messages in thread
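To make the allocation order of the intrvec.c proposal above easier to follow, here is a hedged userspace sketch of the same stride-based search. It is a model only: there is no locking, the bitmap is a plain bool array, and the values of FIRST_DEVICE_VECTOR and FIRST_SYSTEM_VECTOR are assumed for illustration rather than taken from the posted headers. The in_device_range() helper and the nr_used counter are additions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this sketch; the real ones come from irq_vectors.h. */
#define FIRST_DEVICE_VECTOR 0x31
#define FIRST_SYSTEM_VECTOR 0xef
#define SYSCALL_VECTOR      0x80
#define VECTOR_STRIDE       8
#define NUM_VECTORS         (FIRST_SYSTEM_VECTOR - FIRST_DEVICE_VECTOR)

static bool vectors_used[NUM_VECTORS];
static int nr_used;				/* sketch-only bookkeeping */
static int current_vector = FIRST_DEVICE_VECTOR;
static int offset;

static bool in_device_range(int vector)
{
	return vector >= FIRST_DEVICE_VECTOR && vector < FIRST_SYSTEM_VECTOR;
}

/* Vectors outside the device range always count as used. */
static bool is_used(int vector)
{
	if (!in_device_range(vector))
		return true;
	return vectors_used[vector - FIRST_DEVICE_VECTOR];
}

static void mark_vector_used(int vector)
{
	if (in_device_range(vector) && !is_used(vector)) {
		vectors_used[vector - FIRST_DEVICE_VECTOR] = true;
		nr_used++;
	}
}

/*
 * Walk the range in steps of VECTOR_STRIDE so consecutive allocations
 * are spread out; on wrap-around, restart at the next offset within
 * the stride.  Returns -1 when every device vector is taken.
 */
static int assign_vector(void)
{
	int vector = current_vector;

	if (nr_used == NUM_VECTORS)
		return -1;
	while (is_used(vector)) {
		vector += VECTOR_STRIDE;
		if (vector >= FIRST_SYSTEM_VECTOR)
			vector = FIRST_DEVICE_VECTOR + (++offset % VECTOR_STRIDE);
	}
	mark_vector_used(vector);
	current_vector = vector;
	return vector;
}

/* Release a vector and make it the first candidate for the next request. */
static void free_vector(int vector)
{
	if (in_device_range(vector) && is_used(vector)) {
		vectors_used[vector - FIRST_DEVICE_VECTOR] = false;
		nr_used--;
		current_vector = vector;
	}
}

/* Keep the system call vector out of the pool, as the initcall does. */
static void init_vector_array(void)
{
	mark_vector_used(SYSCALL_VECTOR);
}
```

With these assumed constants the first two requests return 0x31 and 0x39, and a freed vector is handed out again by the next request, which mirrors the "use this vector at next request" comment in the proposal.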
* Re: [PATCH 11/25] i386 irq: Dynamic irq support 2006-06-21 2:27 ` Rajesh Shah @ 2006-06-21 14:07 ` Eric W. Biederman 0 siblings, 0 replies; 50+ messages in thread From: Eric W. Biederman @ 2006-06-21 14:07 UTC (permalink / raw) To: Rajesh Shah Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap, Roland Dreier, Tony Luck Rajesh Shah <rajesh.shah@intel.com> writes: > On Tue, Jun 20, 2006 at 08:21:00PM -0600, Eric W. Biederman wrote: >> Rajesh Shah <rajesh.shah@intel.com> writes: >> >> > It would be really good to decouple MSI implementation from IO >> > APICs, since there's really no real hardware dependence here. >> > This code can actually go to arch/xxx/pci/msi-apic.c >> >> I agree in theory. In practice however msi interrupts look like io_apics. >> with a different register set and the use all of the same support facilities. >> So until that part of the architecture is refactored it doesn't make much >> sense. There is a slightly better case for moving the code into a separate >> file. Namely I think I know of a second common implementation for x86_64. >> At which point the files will probably be named msi-intel.c and msi-amd.c >> Or something like that. >> > Actually, I meant just the vector tracking code could be in a > separate file and the ioapic and msi code could both assign > vectors from a common routine. I had the patch below in my > patchkit, plus another patch for x86_64 to do the same thing > in io_apic.c and share the same intrvec.c file between the > two archs. Once you have this, the MSI callbacks in arch > code can be moved out of io_apic.c Well irq.c is probably the obvious place to put it. But that goes way beyond small obviously correct steps. 
So there is no way I'm going to include a change like that in the middle of my patchset, because it is unnecessary. Doing this kind of thing later is certainly sane.

I guess this is a difference in focus. You have been focused on code cleanup. I have been focused on breaking the unnatural coupling between parts of the code.

As for this specific patch, it makes no sense to move only half of assign_irq_vector to a different file.

Eric
* Re: [PATCH 9/25] irq: Add a dynamic irq creation API 2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman 2006-06-20 22:28 ` [PATCH 10/25] ia64 irq: Dynamic irq support Eric W. Biederman @ 2006-06-20 23:56 ` Benjamin Herrenschmidt 2006-06-21 1:01 ` Eric W. Biederman 1 sibling, 1 reply; 50+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-20 23:56 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap On Tue, 2006-06-20 at 16:28 -0600, Eric W. Biederman wrote: > With the msi support comes a new concept in irq handling, > irqs that are created dynamically at run time. > > Currently the msi code allocates irqs backwards. First it > allocates a platform dependent routing value for an > interrupt the ``vector'' and then it figures out from the > vector which irq you are on. You may want to look at the work I'm currently doing for powerpc where we need a fully dynamic linux irq number allocation, completely separate spaces for hw numbers (vectors) and linux irq numbers for arbitrary PICs (and more than one in a given system) etc... I'll post a patch that shows the stuff I'm adding later today so you can have a look. There is some overlap with your dynamic irq stuff. I haven't completely ported all of powerpc to my new core yet which is why I haven't posted patches yet, but I'll have something out today. Ben. > This msi backwards allocator suffers from two basic > problems. The allocator suffers because it is trying > to do something that is architecture specific in a generic > way making it brittle, inflexible, and tied to tightly > to the architecture implementation. 
The alloctor also > suffers from it's very backwards nature as it has tied > things together that should have no dependencies. > > To solve the basic dynamic irq allocation problem two > new architecture specific functions are added: > create_irq and destroy_irq. > > create_irq takes no input and returns an unused irq number, > that won't be reused until it is returned to the free > poll with destroy_irq. The irq then can be used for > any purpose although the only initial consumer is > the msi code. > > destroy_irq takes an irq number allocated with create_irq > and returns it to the free pool. > > Making this functionality per architecture increases > the simplicity of the irq allocation code and increases > it's flexibility. > > dynamic_irq_init() and dynamic_irq_cleanup() are added > to automate the irq_desc initializtion that should happen > for dynamic irqs. > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> > --- > include/linux/irq.h | 9 +++++++- > kernel/irq/chip.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 64 insertions(+), 1 deletions(-) > > diff --git a/include/linux/irq.h b/include/linux/irq.h > index b79d178..6d1ad88 100644 > --- a/include/linux/irq.h > +++ b/include/linux/irq.h > @@ -392,8 +392,15 @@ set_irq_chained_handler(unsigned int irq > __set_irq_handler(irq, handle, 1); > } > > -/* Set/get chip/data for an IRQ: */ > +/* Handle dynamic irq creation and destruction */ > +extern int create_irq(void); > +extern void destroy_irq(unsigned int irq); > + > +/* Dynamic irq helper functions */ > +extern void dynamic_irq_init(unsigned int irq); > +extern void dynamic_irq_cleanup(unsigned int irq); > > +/* Set/get chip/data for an IRQ: */ > extern int set_irq_chip(unsigned int irq, struct irq_chip *chip); > extern int set_irq_data(unsigned int irq, void *data); > extern int set_irq_chip_data(unsigned int irq, void *data); > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c > index 431e9d5..9c01e48 100644 > --- 
a/kernel/irq/chip.c > +++ b/kernel/irq/chip.c > @@ -18,6 +18,62 @@ #include <linux/kernel_stat.h> > #include "internals.h" > > /** > + * dynamic_irq_init - initialize a dynamically allocated irq > + * @irq: irq number to initialize > + */ > +void dynamic_irq_init(unsigned int irq) > +{ > + struct irq_desc *desc; > + unsigned long flags; > + > + if (irq >= NR_IRQS) { > + printk(KERN_ERR "Trying to initialize invalid IRQ%d\n", irq); > + WARN_ON(1); > + return; > + } > + > + /* Ensure we don't have left over values from a previous use of this irq */ > + desc = irq_desc + irq; > + spin_lock_irqsave(&desc->lock, flags); > + desc->status = IRQ_DISABLED; > + desc->chip = &no_irq_chip; > + desc->handle_irq = handle_bad_irq; > + desc->depth = 1; > + desc->handler_data = NULL; > + desc->chip_data = NULL; > + desc->action = NULL; > + desc->irq_count = 0; > + desc->irqs_unhandled = 0; > +#ifdef CONFIG_SMP > + desc->affinity = CPU_MASK_ALL; > +#endif > + spin_unlock_irqrestore(&desc->lock, flags); > +} > + > +/** > + * dynamic_irq_cleanup - cleanup a dynamically allocated irq > + * @irq: irq number to initialize > + */ > +void dynamic_irq_cleanup(unsigned int irq) > +{ > + struct irq_desc *desc; > + unsigned long flags; > + > + if (irq >= NR_IRQS) { > + printk(KERN_ERR "Trying to cleanup invalid IRQ%d\n", irq); > + WARN_ON(1); > + return; > + } > + > + desc = irq_desc + irq; > + spin_lock_irqsave(&desc->lock, flags); > + desc->handle_irq = handle_bad_irq; > + desc->chip = &no_irq_chip; > + spin_unlock_irqrestore(&desc->lock, flags); > +} > + > + > +/** > * set_irq_chip - set the irq chip for an irq > * @irq: irq number > * @chip: pointer to irq chip description structure > -- > 1.4.0.gc07e > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 50+ 
messages in thread
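The create_irq()/destroy_irq() contract from the quoted patch description (hand out an unused irq number, and do not reuse it until it is given back) can be modelled in a few lines. This is a hedged userspace sketch, not the per-architecture implementation from the series: the names model_create_irq() and model_destroy_irq(), the small MODEL_NR_IRQS value, and the choice to keep irqs 0-15 out of the dynamic pool are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS      64	/* assumed small table for the sketch */
#define FIRST_DYNAMIC_IRQ  16	/* assumption: 0..15 stay with legacy irqs */

static bool irq_in_use[MODEL_NR_IRQS];

/* Return an unused irq number, or -1 if the pool is exhausted. */
static int model_create_irq(void)
{
	int irq;

	for (irq = FIRST_DYNAMIC_IRQ; irq < MODEL_NR_IRQS; irq++) {
		if (!irq_in_use[irq]) {
			irq_in_use[irq] = true;
			return irq;
		}
	}
	return -1;
}

/* Return an irq obtained from model_create_irq() to the free pool. */
static void model_destroy_irq(unsigned int irq)
{
	if (irq < FIRST_DYNAMIC_IRQ || irq >= MODEL_NR_IRQS)
		return;		/* not a dynamic irq, nothing to release */
	irq_in_use[irq] = false;
}
```

Note that no vector appears anywhere in this model; whatever routing value the hardware needs would be chosen separately by the architecture code once the irq number exists.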
* Re: [PATCH 9/25] irq: Add a dynamic irq creation API 2006-06-20 23:56 ` [PATCH 9/25] irq: Add a dynamic irq creation API Benjamin Herrenschmidt @ 2006-06-21 1:01 ` Eric W. Biederman 2006-06-21 1:33 ` Benjamin Herrenschmidt 2006-06-21 1:36 ` Matthew Wilcox 0 siblings, 2 replies; 50+ messages in thread From: Eric W. Biederman @ 2006-06-21 1:01 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > On Tue, 2006-06-20 at 16:28 -0600, Eric W. Biederman wrote: >> With the msi support comes a new concept in irq handling, >> irqs that are created dynamically at run time. >> >> Currently the msi code allocates irqs backwards. First it >> allocates a platform dependent routing value for an >> interrupt the ``vector'' and then it figures out from the >> vector which irq you are on. > > You may want to look at the work I'm currently doing for powerpc where > we need a fully dynamic linux irq number allocation, completely separate > spaces for hw numbers (vectors) and linux irq numbers for arbitrary PICs > (and more than one in a given system) etc... > > I'll post a patch that shows the stuff I'm adding later today so you can > have a look. There is some overlap with your dynamic irq stuff. > > I haven't completely ported all of powerpc to my new core yet which is > why I haven't posted patches yet, but I'll have something out today. Sure. I know by the end of my patchset I have separated out hw numbers from the linux irq numbers, so this should work for powerpc. I would love to hear feedback on it though. 
So to be very clear about what we mean, because I have gotten bitten in the past, I understand the linux irq number to be:
a) An index in the irq_desc array.
b) An enumeration of the hardware interrupt sources.
c) Human visible, so ideally it is neither arbitrary nor very dynamic if the hardware is not.

Then there is the destination cookie (vector on x86) that is available to the cpu when the interrupt is delivered.

I think we are on a similar track, but I'm not at all certain I like the idea of a fully dynamic linux irq number except in cases like MSI where your sources are dynamic. But I may be making the wrong assumptions about what you are doing.

I think implementations where we expose the hardware cookie instead of an enumeration of irq sources, like ia64 does, impede debugging because the irq number will be different between boots, or loads of the kernel module. At the same time, as long as we don't assume the irq number is the hardware cookie, I don't see any maintenance problems with an implementation like that.

What I do know is that on x86_64 the multiple levels of translation from the firmware's notion of the irq number to linux's different notion of the irq number made the code overly complex and fragile. Which is one of the things I address in this patchset.

But I would be happy to have a look, and I very much hope we can get our implementations working together.

Eric
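The distinction drawn above between the linux irq number and the destination cookie can be sketched with two tables: one indexed by irq, one indexed by the (cpu, vector) pair the cpu actually sees. This is a hedged userspace model, not code from the patchset; the table sizes and the names irq_route, routes_init(), bind_irq() and irq_from_cookie() are assumptions for the example.

```c
#include <assert.h>

#define NR_CPUS     4		/* assumed sizes for the sketch */
#define NR_VECTORS  256
#define NR_IRQS     1024	/* may be larger than NR_VECTORS */
#define NO_IRQ      (-1)

/* Where a given irq is currently delivered: the destination cookie. */
struct irq_route {
	int cpu;
	int vector;
};

static struct irq_route irq_routes[NR_IRQS];
static int vector_irq[NR_CPUS][NR_VECTORS];

static void routes_init(void)
{
	int cpu, vector, irq;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		for (vector = 0; vector < NR_VECTORS; vector++)
			vector_irq[cpu][vector] = NO_IRQ;
	for (irq = 0; irq < NR_IRQS; irq++) {
		irq_routes[irq].cpu = -1;
		irq_routes[irq].vector = -1;
	}
}

/* Route irq to (cpu, vector); fails with -1 if that slot is taken. */
static int bind_irq(int irq, int cpu, int vector)
{
	assert(irq >= 0 && irq < NR_IRQS);
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(vector >= 0 && vector < NR_VECTORS);

	if (vector_irq[cpu][vector] != NO_IRQ)
		return -1;
	/* Drop the previous route of this irq, if it had one. */
	if (irq_routes[irq].cpu >= 0)
		vector_irq[irq_routes[irq].cpu][irq_routes[irq].vector] = NO_IRQ;
	irq_routes[irq].cpu = cpu;
	irq_routes[irq].vector = vector;
	vector_irq[cpu][vector] = irq;
	return 0;
}

/* What the low-level entry code would do: cookie in, irq number out. */
static int irq_from_cookie(int cpu, int vector)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(vector >= 0 && vector < NR_VECTORS);
	return vector_irq[cpu][vector];
}
```

In this model the same vector can carry different irqs on different cpus, so the number of irqs is bounded by cpus times vectors rather than by the vector count alone, and the irq number itself never has to equal the vector.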
* Re: [PATCH 9/25] irq: Add a dynamic irq creation API 2006-06-21 1:01 ` Eric W. Biederman @ 2006-06-21 1:33 ` Benjamin Herrenschmidt 2006-06-21 1:41 ` Jeff Garzik 2006-06-21 1:36 ` Matthew Wilcox 1 sibling, 1 reply; 50+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 1:33 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin, Ashok Raj, Randy Dunlap

> Sure. I know by the end of my patchset I have separated out hw numbers
> from the linux irq numbers, so this should work for powerpc.

Almost :) I have to look in more detail but we have more nasty issues... Some of the things I need for powerpc are:

- On Hypervisor machines, the firmware does it all... that is, it allocates hardware vectors, enables the stuff on the device, etc etc... all of it. The only step we might still add on top of it is mapping those hw vectors to linux irq numbers. (Note that my new core stuff for powerpc that I'll post has explicit mechanisms for doing just that ... mapping hw interrupts/controller pairs to linux irq numbers)

- We need control on what is in the irq_desc->chip and irq_desc->chip->ops of MSIs. That should be filled by something in msi_ops and not always set to the functions that muck with the config space etc... (same goes for affinity, no need for an msi_ops callback for that, just let us set what is in the irq_desc->chip and possibly provide "helpers" that do the config space way that intel can use there). For all of these, on both the above (hypervisor), but also MPIC based platforms like the G5 etc..., we need to use the normal IRQ ops of the PIC for enable/disable/affinity, etc...

- msi_ops shall be per PCI bus.
We can have completely different MSI handling hardware on separate
busses of a given machine. For example, an Apple quad G5 has a register
that triggers any source on the main MPIC on the primary PCIe segment
(the one used by the gfx card), and has a completely separate HT2000
bridge on hypertransport that generates HT interrupts.

Those are some of the basic requirements. Plus of course it has to fit
in what I'm currently coding :)

> I would love to hear feedback on it though.
>
> So to be very clear what we mean, because I have gotten bitten in the
> past.  I understand the linux irq number to be:
> a) An index in the irq_desc array.

Yes.

> b) An enumeration of the hardware interrupt sources.

Hrm...

> c) Human visible so ideally it is neither arbitrary, nor
>    very dynamic if the hardware is not.

Hrm...

I have neither b) nor c) nowadays on powerpc.... "linux" irq numbers are
purely a virtual thing, that is an index in irq_desc array and something
we give to drivers to do request_irq() from. They can map onto hw
interrupts, MSI-like messages, environment interrupts, could be
hypervisor messages, in fact, it could be anything that remotely looks
like an interrupt and the concept of "hw vector" is very blurry here...
every interrupt controller defines its own hardware vector space. On
pSeries, hardware vectors are fairly big numbers that can encode the
geographical location of the slot where the device is connected to, on
some other hypervisor, they are 64-bit "tokens" representing a
hypervisor object that can send events, etc etc....

My remapper _tries_ to assign a linux irq number that is equal to the
hardware number whenever possible (because it's indeed nicer that way)
but there is no hard requirement nor anything like that here. I might
even add a hook to /proc/interrupts to be able to display the HW info
for each interrupt.

> Then there is the destination cookie (vector on x86) that is
> available to the cpu when the interrupt is delivered.
Not sure what you mean by "destination cookie". I suppose you are
talking about the hw vector. That is, whatever hardware (or hypervisor)
number/object represents a given interrupt source. I call that the
hardware number.

> I think we are on a similar track but I'm not at all certain I like
> the idea of a fully dynamic linux irq number except in cases like MSI
> where your sources are dynamic.  But I may be making the wrong
> assumptions about what you are doing.

I'm doing a fully dynamic numbering. However, on a machine with a static
hardware setup (like a powermac with no MSIs), you'll always end up with
the same virtual numbers after boot and they'll happen to match the
hardware source numbers because I'm nice and my remapper "tries" to
allocate the same number if possible.

> I think implementations where we expose the hardware cookie instead
> of an enumeration of irq sources, like ia64 does, impede debugging
> because the irq number will be different between boots, or loads
> of the kernel module.  At the same time, as long as we don't assume
> the irq number is the hardware cookie I don't see any maintenance
> problems with an implementation like that.

I'm not sure I completely understood your sentence above, but if you
mean that doing full virtual makes debugging harder... well.. it might
not help _some_ classes of problem, but mostly it can be solved by
having a hook to display the HW info on the same line in
/proc/interrupts. My remapper also reserves linux numbers 0...15 to map
them 1:1 to legacy interrupts when an 8259 is present in the machine and
keeps them reserved (un-requestable) if not. That avoids tons of
problems with legacy drivers loading on machines like powermac with no
legacy stuff.

> What I do know is that on x86_64 the multiple levels of translation
> from the firmware's notion of the irq number to linux's different
> notion of the irq number made the code overly complex and fragile.
> Which is one of the things I address in this patchset.
It was a bit dodgy on powerpc too, which is why I'm doing a nice layer
to handle it in a way that should be solid enough. A hardware number can
be anything in the context of a given controller (I call it "host", bad
name maybe, could be "domain" instead, it's a bit different than the
"chip" exposed by genirq because a given hw numbering domain may span
several chips on some archs and a given hw controller may use several
chip structures for different classes of interrupts).

Essentially, when you "discover" an interrupt and need to map it to a
linux interrupt (for example, when you discover a PCI device and need to
fill pci_dev->irq, or whatever else), you call irq_create_mapping(host,
hw_number, trigger). I also provide additional helpers that
automatically give you those 3 pieces of information from the firmware
for various sorts of devices, that sort of thing.

> But I would be happy to have a look, and I very much
> hope we can get our implementations working together.

I'm trying to split the giant patch so I can post only the bit that adds
the new stuff :) Due to the way I work, I always first remove the old
stuff so that things don't build unless I've fixed them all but that
sucks for posting patches so I need to do a bit of cleanup :)

Ben.

^ permalink raw reply [flat|nested] 50+ messages in thread
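The remapping behaviour described in the message above can be sketched roughly as follows. This is only a toy illustration and not the powerpc code: `toy_irq_create_mapping`, `struct toy_irq_map`, `TOY_NR_IRQS` and the table layout are names invented here, and the `trigger` argument is left out for brevity. Only the behaviour comes from the mail: keep linux irqs 0..15 reserved for legacy interrupts, prefer a linux irq equal to the hardware number, otherwise take any free slot.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_IRQS      64   /* hypothetical size of the virtual irq space */
#define TOY_NR_LEGACY    16   /* linux irqs 0..15 stay reserved for an 8259 */
#define TOY_NO_IRQ       (-1)

/* One slot per linux (virtual) irq number; all names are illustrative. */
struct toy_irq_map {
	int in_use;
	const void *host;        /* opaque handle for the hw numbering domain */
	unsigned long hwirq;     /* hardware number, meaningful only per host */
};

static struct toy_irq_map toy_map[TOY_NR_IRQS];

/* Return the linux irq for (host, hwirq), creating a mapping if needed. */
static int toy_irq_create_mapping(const void *host, unsigned long hwirq)
{
	int virq;

	/* An existing mapping is simply returned again. */
	for (virq = TOY_NR_LEGACY; virq < TOY_NR_IRQS; virq++)
		if (toy_map[virq].in_use && toy_map[virq].host == host &&
		    toy_map[virq].hwirq == hwirq)
			return virq;

	/* Be nice: try virq == hwirq first when it is outside the legacy range. */
	if (hwirq >= TOY_NR_LEGACY && hwirq < TOY_NR_IRQS &&
	    !toy_map[hwirq].in_use) {
		virq = (int)hwirq;
	} else {
		/* Otherwise fall back to the first free non-legacy slot. */
		for (virq = TOY_NR_LEGACY; virq < TOY_NR_IRQS; virq++)
			if (!toy_map[virq].in_use)
				break;
		if (virq == TOY_NR_IRQS)
			return TOY_NO_IRQ;
	}

	toy_map[virq].in_use = 1;
	toy_map[virq].host = host;
	toy_map[virq].hwirq = hwirq;
	return virq;
}
```

With a static set of sources discovered in the same order, a scheme like this hands out the same numbers on every boot, which is the property both sides of the discussion care about.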
* Re: [PATCH 9/25] irq: Add a dynamic irq creation API
2006-06-21 1:33 ` Benjamin Herrenschmidt
@ 2006-06-21 1:41 ` Jeff Garzik
0 siblings, 0 replies; 50+ messages in thread
From: Jeff Garzik @ 2006-06-21 1:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Eric W. Biederman, Andrew Morton, linux-kernel, linux-acpi,
linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin,
Greg Lindahl, Dave Olson, Greg KH, Grant Grundler, bibo,mao,
Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

Benjamin Herrenschmidt wrote:
> I have neither b) nor c) nowadays on powerpc.... "linux" irq numbers are
> purely a virtual thing, that is an index in irq_desc array and something
> we give to drivers to do request_irq() from. They can map onto hw
> interrupts, MSI-like messages, environment interrupts, could be
> hypervisor messages, in fact, it could be anything that remotely looks
> like an interrupt and the concept of "hw vector" is very blurry here...
> every interrupt controller defines its own hardware vector space. On
> pSeries, hardware vectors are fairly big numbers that can encode the
> geographical location of the slot where the device is connected to, on
> some other hypervisor, they are 64-bit "tokens" representing a
> hypervisor object that can send events, etc etc....

Indeed...  The return value from return_irq() is purely a cookie, and
has been for quite some time.

Jeff

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 9/25] irq: Add a dynamic irq creation API
2006-06-21 1:01 ` Eric W. Biederman
2006-06-21 1:33 ` Benjamin Herrenschmidt
@ 2006-06-21 1:36 ` Matthew Wilcox
1 sibling, 0 replies; 50+ messages in thread
From: Matthew Wilcox @ 2006-06-21 1:36 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel, linux-acpi,
linux-pci, discuss, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin,
Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler,
bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap <rdunla>

On Tue, Jun 20, 2006 at 07:01:52PM -0600, Eric W. Biederman wrote:
> So to be very clear what we mean, because I have gotten bitten in the
> past.  I understand the linux irq number to be:
> a) An index in the irq_desc array.
> b) An enumeration of the hardware interrupt sources.
> c) Human visible so ideally it is neither arbitrary, nor
>    very dynamic if the hardware is not.
>
> Then there is the destination cookie (vector on x86) that is
> available to the cpu when the interrupt is delivered.
>
> I think we are on a similar track but I'm not at all certain I like
> the idea of a fully dynamic linux irq number except in cases like MSI
> where your sources are dynamic.  But I may be making the wrong
> assumptions about what you are doing.

Hi Eric.  Unfortunately, I've only received [0/25] so far, despite both
being on the cc list and on linux-pci.  I'm getting all the replies
though, so I'm hopeful I'll receive the original posts soon.

Did you look at the parisc scheme?  We have a fixed area for CPU
interrupts and then an area for dynamic interrupt assignment.  Since
devices are typically discovered in the same order between boots, the
interrupt number doesn't end up varying between boots.

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 8/25] msi: Simplify the msi irq limit policy.
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman
@ 2006-06-21 1:28 ` Rajesh Shah
2006-06-21 2:46 ` Roland Dreier
2 siblings, 0 replies; 50+ messages in thread
From: Rajesh Shah @ 2006-06-21 1:28 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

On Tue, Jun 20, 2006 at 04:28:21PM -0600, Eric W. Biederman wrote:
> Currently we attempt to predict how many irqs we will be
> able to allocate with msi using pci_vector_resources and some
> complicated accounting, and then we only allow each device
> as many irqs as we think are available on average.
>
> Only the s2io driver even takes advantage of this feature
> all other drivers have a fixed number of irqs they need and
> bail if they can't get them.
>
> pci_vector_resources is inaccurate if anyone ever frees an irq.
> The whole implementation is racy.  The current irq limit policy
> does not appear to make sense with current drivers.  So I have
> simplified things.  We can revisit this when we need a more sophisticated
> policy.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/i386/pci/irq.c | 30 -----------------------------
>  arch/ia64/pci/pci.c | 9 ---------
>  drivers/pci/msi.c | 53 ++++++++------------------------------------------
>  drivers/pci/msi.h | 11 -----------
>  4 files changed, 8 insertions(+), 95 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 40499c0..772f5b6 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
>
> @@ -542,11 +540,6 @@ void pci_scan_msi_device(struct pci_dev
>  {
>  	if (!dev)
>  		return;
> -
> -	if (pci_find_capability(dev, PCI_CAP_ID_MSIX) > 0)
> -		nr_msix_devices++;
> -	else if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0)
> -		nr_reserved_vectors++;
>  }

Actually, why not just eliminate this function and the corresponding
call from probe.c? It does nothing useful with all the vector
tracking gone from generic MSI code.

Rajesh

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 8/25] msi: Simplify the msi irq limit policy.
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman
2006-06-21 1:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Rajesh Shah
@ 2006-06-21 2:46 ` Roland Dreier
2006-06-21 3:48 ` Eric W. Biederman
2 siblings, 1 reply; 50+ messages in thread
From: Roland Dreier @ 2006-06-21 2:46 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

> Only the s2io driver even takes advantage of this feature
> all other drivers have a fixed number of irqs they need and
> bail if they can't get them.

My todo list for the mthca (InfiniBand HCA) driver includes adding
support for more event queues.  When I do that, I'll likely want to
try to get something on the order of number_of_cpus plus two or three
MSI-X vectors, and fall back to a lower number of vectors if that
allocation fails.

 - R.

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 8/25] msi: Simplify the msi irq limit policy.
2006-06-21 2:46 ` Roland Dreier
@ 2006-06-21 3:48 ` Eric W. Biederman
0 siblings, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-21 3:48 UTC (permalink / raw)
To: Roland Dreier
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Tony Luck <tony>

Roland Dreier <rdreier@cisco.com> writes:

> > Only the s2io driver even takes advantage of this feature
> > all other drivers have a fixed number of irqs they need and
> > bail if they can't get them.
>
> My todo list for the mthca (InfiniBand HCA) driver includes adding
> support for more event queues.  When I do that, I'll likely want to
> try to get something on the order of number_of_cpus plus two or three
> MSI-X vectors, and fall back to a lower number of vectors if that
> allocation fails.

Allowing drivers that can take advantage of large numbers of irqs
is part of this patch.

To be clear the method still supports allocating a bunch of irqs
and then falling back.

We have two kinds of drivers.  Those that allocate a lot of irqs
and allocate fewer if they can't get them, and those that just
allocate a couple and fail if they can't get them.

The policy I deleted tries to be fair, and give all drivers the same
number of irqs.  What I left us with is a simple first come first
serve policy.  To do better you need an accurate count and you need to
separate out the various kinds of drivers, and you need a shortage of
irqs.

Currently x86_64 hardware supports up to 244*NR_CPUS irqs.  I don't
expect we will exceed that limit any time soon even with a first come
first serve policy.
The worst case I can think of with your proposed irq allocation
policy in the mthca driver is a 128 cpu machine with 128 IB cards in
it.  That would just barely fail with every driver allocating 3 IRQs
per cpu, and it could be trivially fixed by putting in dual core cpus :)

So when we exceed the limit on the number of IRQs we actually have then
it probably makes sense to see if a policy more aggressive than first
come first serve makes sense.  But until then it is a waste of time and
we should be concentrating our efforts on making more IRQs usable.

Eric

^ permalink raw reply [flat|nested] 50+ messages in thread
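The first come first serve policy and the two kinds of drivers described in this message can be illustrated with a small sketch. Nothing here is the kernel API: `toy_free_irqs`, `toy_alloc_irqs` and `toy_alloc_with_fallback` are hypothetical names, and halving the request on failure is just one possible fallback strategy chosen for the example.

```c
#include <assert.h>

/* Hypothetical global pool; the mail puts the x86_64 limit at 244*NR_CPUS. */
static unsigned int toy_free_irqs;

/* All-or-nothing allocation: returns 0 on success, -1 if the pool is too small. */
static int toy_alloc_irqs(unsigned int want)
{
	if (want > toy_free_irqs)
		return -1;
	toy_free_irqs -= want;
	return 0;
}

/*
 * A driver of the "ask for many, settle for fewer" kind: start at `want`
 * and halve the request until it succeeds or drops below `min`.
 * Returns the number of irqs obtained, or 0 if even `min` was unavailable.
 */
static unsigned int toy_alloc_with_fallback(unsigned int want, unsigned int min)
{
	while (want >= min && want > 0) {
		if (toy_alloc_irqs(want) == 0)
			return want;
		want /= 2;
	}
	return 0;
}
```

A fixed-count driver is the special case `want == min`: it either gets everything it asked for or fails outright. Whoever asks first is served first, and no attempt is made to share the pool fairly.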
* Re: [PATCH 7/25] msi: Refactor the msi_ops.
2006-06-20 22:28 ` [PATCH 7/25] msi: Refactor the msi_ops Eric W. Biederman
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
@ 2006-06-21 1:18 ` Rajesh Shah
1 sibling, 0 replies; 50+ messages in thread
From: Rajesh Shah @ 2006-06-21 1:18 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

On Tue, Jun 20, 2006 at 04:28:20PM -0600, Eric W. Biederman wrote:
>  drivers/pci/msi-altix.c | 49 +++++++++++++++++++------------------
>  drivers/pci/msi-apic.c | 36 ++++++++++++++-------------
>  drivers/pci/msi.c | 22 ++++++++---------
>  drivers/pci/msi.h | 62 -----------------------------------------------
>  include/linux/pci.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++

I think the platform/arch specific code here should move to
arch/. For example, msi-altix can go to arch/ia64/sn/pci, msi-apic.c
can go to arch/i386/pci (and can be shared by x86_64)...

Rajesh

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg.
2006-06-20 22:28 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Eric W. Biederman
2006-06-20 22:28 ` [PATCH 7/25] msi: Refactor the msi_ops Eric W. Biederman
@ 2006-06-21 1:04 ` Rajesh Shah
2006-06-21 1:43 ` Eric W. Biederman
1 sibling, 1 reply; 50+ messages in thread
From: Rajesh Shah @ 2006-06-21 1:04 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

On Tue, Jun 20, 2006 at 04:28:19PM -0600, Eric W. Biederman wrote:
> In support of this I also add a struct msi_msg that captures
> the two address and one data field in a typical msi message,
> and I remember the pos and if the address is 64bit in
> struct msi_desc.
>
One thing I found very useful was to kmalloc msi_msg at MSI/MSI-X
enable time, and stick a pointer to it in the msi_desc structure,
not just for the CONFIG_PM case. For MSI, there's a single pointer
to track. This simplified a lot of code and allowed me to avoid
pci config reads to read the hardware at various places.

> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  drivers/pci/msi.c | 195 +++++++++++++++++++++++++--------------------------
>  drivers/pci/msi.h | 9 +-
>  include/linux/pci.h | 6 ++
>  3 files changed, 104 insertions(+), 106 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c1c93f0..e9db6c5 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -94,63 +94,100 @@ static void msi_set_mask_bit(unsigned in
>  	}
>  }
>
> -#ifdef CONFIG_SMP
> -static void set_msi_affinity(unsigned int vector, cpumask_t cpu_mask)
> +static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  {

You wouldn't need this if you saved away the msi_msg values
returned from ->setup().

Rajesh

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg.
2006-06-21 1:04 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Rajesh Shah
@ 2006-06-21 1:43 ` Eric W. Biederman
0 siblings, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-21 1:43 UTC (permalink / raw)
To: Rajesh Shah
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Mark Maule,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Roland Dreier, Ton

Rajesh Shah <rajesh.shah@intel.com> writes:

> On Tue, Jun 20, 2006 at 04:28:19PM -0600, Eric W. Biederman wrote:
>> In support of this I also add a struct msi_msg that captures
>> the two address and one data field in a typical msi message,
>> and I remember the pos and if the address is 64bit in
>> struct msi_desc.
>>
> One thing I found very useful was to kmalloc msi_msg at MSI/MSI-X
> enable time, and stick a pointer to it in the msi_desc structure,
> not just for the CONFIG_PM case. For MSI, there's a single pointer
> to track. This simplified a lot of code and allowed me to avoid
> pci config reads to read the hardware at various places.

Well I think kmalloc is overkill.  If we are going to keep it we can
keep it in msi_desc.  The structure is only 4 bytes longer than a
pointer as it is.

>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  drivers/pci/msi.c | 195 +++++++++++++++++++++++++--------------------------
>>  drivers/pci/msi.h | 9 +-
>>  include/linux/pci.h | 6 ++
>>  3 files changed, 104 insertions(+), 106 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index c1c93f0..e9db6c5 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -94,63 +94,100 @@ static void msi_set_mask_bit(unsigned in
>>  	}
>>  }
>>
>> -#ifdef CONFIG_SMP
>> -static void set_msi_affinity(unsigned int vector, cpumask_t cpu_mask)
>> +static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>>  {
>
> You wouldn't need this if you saved away the msi_msg values
> returned from ->setup().

Sure.  The point of my work was not so much to clean up the code,
but to clean up the semantics of how the code worked with everyone
else.  The read_msi_msg and write_msi_msg bits were just so I could
read the code enough to clearly tell what was going on.

To a large extent I suspect I have made possible a lot of additional
cleanups all over the place.

Eric

^ permalink raw reply [flat|nested] 50+ messages in thread
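The trade-off discussed in this exchange (a kmalloc'ed message reached through a pointer versus a copy embedded in the descriptor) can be sketched as below. The layout is an assumption made for illustration: the `toy_` structures and helpers are not the patch's definitions, they only follow the mail's description of "two address and one data field" plus the remembered `pos` and 64bit flag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only: two address words and one data word. */
struct toy_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical descriptor that caches the last message written. */
struct toy_msi_desc {
	int pos;                 /* config space offset of the capability */
	int is_64;               /* nonzero if the device takes a 64bit address */
	struct toy_msi_msg msg;  /* embedded copy, no kmalloc needed */
};

static void toy_write_msi_msg(struct toy_msi_desc *entry,
			      const struct toy_msi_msg *msg)
{
	/* A real implementation would also write the device's config space. */
	entry->msg = *msg;
}

static void toy_read_msi_msg(const struct toy_msi_desc *entry,
			     struct toy_msi_msg *msg)
{
	/* Served from the cached copy, so no config space read is required. */
	*msg = entry->msg;
}
```

With three 32bit words the message is 12 bytes, which on a 64bit machine matches the remark that it is only 4 bytes longer than a pointer.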
* Re: [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1.
2006-06-20 22:28 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Eric W. Biederman
2006-06-20 22:28 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Eric W. Biederman
@ 2006-06-20 22:45 ` Jeff Garzik
1 sibling, 0 replies; 50+ messages in thread
From: Jeff Garzik @ 2006-06-20 22:45 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Greg KH, Grant Grundler, bibo,mao, Rajesh Shah, Mark Maule,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Roland Dreier

Eric W. Biederman wrote:
> This allows the output of the msi tests to be stored directly
> in a bit field.  If you don't do this a value greater than
> one will be truncated and become 0.  Changing true to false
> with bizarre consequences.

Another example of why bit fields are a pain in the butt...

	Jeff

^ permalink raw reply [flat|nested] 50+ messages in thread
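The truncation problem described in the quoted patch text is easy to reproduce in plain C. The sketch below uses made-up names (`struct toy_flags`, `TOY_CTRL_ENABLE`, `toy_is_enabled`), not the MSI macros from the patch; it only shows why a test that returns the masked bit itself is unsafe to store in a one-bit field, and how `!!` normalizes it to 0 or 1.

```c
#include <assert.h>

struct toy_flags {
	unsigned int enabled : 1;   /* can only hold 0 or 1 */
};

#define TOY_CTRL_ENABLE 0x8000u     /* hypothetical "enable" bit in a control word */

/* Returns the masked bit itself: 0 or 0x8000. */
static unsigned int toy_is_enabled_raw(unsigned int control)
{
	return control & TOY_CTRL_ENABLE;
}

/* Returns strictly 0 or 1, safe to store in a one-bit field. */
static unsigned int toy_is_enabled(unsigned int control)
{
	return !!(control & TOY_CTRL_ENABLE);
}
```

Storing the raw result keeps only the lowest bit of 0x8000, so a "true" answer silently becomes 0; storing the normalized result keeps it at 1.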
* Re: [PATCH 4/25] msi: Simplify msi enable and disable.
2006-06-20 22:28 ` [PATCH 4/25] msi: Simplify msi enable and disable Eric W. Biederman
2006-06-20 22:28 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Eric W. Biederman
@ 2006-06-21 0:44 ` Rajesh Shah
2006-06-21 1:19 ` Eric W. Biederman
1 sibling, 1 reply; 50+ messages in thread
From: Rajesh Shah @ 2006-06-21 0:44 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

On Tue, Jun 20, 2006 at 04:28:17PM -0600, Eric W. Biederman wrote:
>
> @@ -937,27 +936,8 @@ int pci_enable_msi(struct pci_dev* dev)
>  	if (!pos)
>  		return -EINVAL;
>
> -	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
> -		/* Lookup Sucess */
> -		unsigned long flags;
> +	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSI));
>
A driver that calls pci_enable_msi() while MSI is already enabled
will hit this BUG_ON. This is different from the behavior of
some other pci functions like pci_enable_device(), which
silently return success if the requested operation is a nop.
It's pretty easy to do the same here too (ditto for MSI-X).

Rajesh

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/25] msi: Simplify msi enable and disable.
2006-06-21 0:44 ` [PATCH 4/25] msi: Simplify msi enable and disable Rajesh Shah
@ 2006-06-21 1:19 ` Eric W. Biederman
0 siblings, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-21 1:19 UTC (permalink / raw)
To: Rajesh Shah
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Mark Maule,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Roland Dreier, Ton

Rajesh Shah <rajesh.shah@intel.com> writes:

> On Tue, Jun 20, 2006 at 04:28:17PM -0600, Eric W. Biederman wrote:
>>
>> @@ -937,27 +936,8 @@ int pci_enable_msi(struct pci_dev* dev)
>>  	if (!pos)
>>  		return -EINVAL;
>>
>> -	if (!msi_lookup_vector(dev, PCI_CAP_ID_MSI)) {
>> -		/* Lookup Sucess */
>> -		unsigned long flags;
>> +	BUG_ON(!msi_lookup_vector(dev, PCI_CAP_ID_MSI));
>>
> A driver that calls pci_enable_msi() while MSI is already enabled
> will hit this BUG_ON. This is different from the behavior of
> some other pci functions like pci_enable_device(), which
> silently return success if the requested operation is a nop.
> It's pretty easy to do the same here too (ditto for MSI-X).

With MSI-X we can't be a NOOP so it is clearly a bug.

With MSI it might happen, and I don't have a problem if it becomes a
noop.  At the same time I'm not convinced it isn't a bug.

All I really care about is that we don't try and do the impossible
and enable the msi from a half initialized software state like
we were doing before.  That was really ugly and impossible to get right.

Eric

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
2006-06-20 22:24 [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Eric W. Biederman
2006-06-20 22:28 ` [PATCH 1/25] irq: Convert the move_irq flag from a 32bit word to a single bit Eric W. Biederman
@ 2006-06-21 0:30 ` Rajesh Shah
2006-06-21 1:07 ` Eric W. Biederman
2006-06-21 14:10 ` [PATCH] Decouple IRQ issues (fix i386 compile issues) Eric W. Biederman
2006-06-21 10:24 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Ingo Molnar
2 siblings, 2 replies; 50+ messages in thread
From: Rajesh Shah @ 2006-06-21 0:30 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

On Tue, Jun 20, 2006 at 04:24:35PM -0600, Eric W. Biederman wrote:
>
> The primary aim of this patch is to remove maintenance problems caused
> by the irq infrastructure.  The two big issues I address are an
> artificially small cap on the number of irqs, and that MSI assumes
> vector == irq.  My primary focus is on x86_64 but I have touched
> other architectures where necessary to keep them from breaking.
>
The MSI portions of this patchset are similar to the MSI cleanup
I was working on. I'll drop my patchkit and instead comment on
the relevant patches in this kit.

I got a couple of minor compile errors on i386 (kernel/io_apic.c).
I fixed them up by hand and the resulting kernel booted and
worked with MSI in the limited testing I've done so far.

thanks,
Rajesh

^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
2006-06-21 0:30 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Rajesh Shah
@ 2006-06-21 1:07 ` Eric W. Biederman
2006-06-21 14:10 ` [PATCH] Decouple IRQ issues (fix i386 compile issues) Eric W. Biederman
1 sibling, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-21 1:07 UTC (permalink / raw)
To: Rajesh Shah
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Mark Maule,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Roland Dreier, Ton

Rajesh Shah <rajesh.shah@intel.com> writes:

> On Tue, Jun 20, 2006 at 04:24:35PM -0600, Eric W. Biederman wrote:
>>
>> The primary aim of this patch is to remove maintenance problems caused
>> by the irq infrastructure.  The two big issues I address are an
>> artificially small cap on the number of irqs, and that MSI assumes
>> vector == irq.  My primary focus is on x86_64 but I have touched
>> other architectures where necessary to keep them from breaking.
>>
> The MSI portions of this patchset are similar to the MSI cleanup
> I was working on. I'll drop my patchkit and instead comment on
> the relevant patches in this kit.
>
> I got a couple of minor compile errors on i386 (kernel/io_apic.c).
> I fixed them up by hand and the resulting kernel booted and
> worked with MSI in the limited testing I've done so far.

Sounds good.

Hmm.  I thought I had compile tested on i386.  Something must
have bit rotted since then. :(

Eric

^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH] Decouple IRQ issues (fix i386 compile issues)
2006-06-21 0:30 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Rajesh Shah
2006-06-21 1:07 ` Eric W. Biederman
@ 2006-06-21 14:10 ` Eric W. Biederman
1 sibling, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-21 14:10 UTC (permalink / raw)
To: Rajesh Shah
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Ingo Molnar, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Mark Maule,
Jesper Juhl, Shaohua Li, Matthew Wilcox, Michael S. Tsirkin,
Ashok Raj, Randy Dunlap, Roland Dreier, Ton

Rajesh Shah <rajesh.shah@intel.com> writes:

> On Tue, Jun 20, 2006 at 04:24:35PM -0600, Eric W. Biederman wrote:
>>
>> The primary aim of this patch is to remove maintenance problems caused
>> by the irq infrastructure.  The two big issues I address are an
>> artificially small cap on the number of irqs, and that MSI assumes
>> vector == irq.  My primary focus is on x86_64 but I have touched
>> other architectures where necessary to keep them from breaking.
>>
> The MSI portions of this patchset are similar to the MSI cleanup
> I was working on. I'll drop my patchkit and instead comment on
> the relevant patches in this kit.
>
> I got a couple of minor compile errors on i386 (kernel/io_apic.c).
> I fixed them up by hand and the resulting kernel booted and
> worked with MSI in the limited testing I've done so far.

Somewhere in the final round of cleanups I missed these two one liners.
This is what it takes to fix the i386 build.
Eric

diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 18a5c2a..3068cde 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1173,7 +1173,6 @@ next:
 	if (current_vector >= FIRST_SYSTEM_VECTOR) {
 		offset++;
 		if (!(offset%8)) {
-			spin_unlock_irqrestore(&vector_lock, flags);
 			return -ENOSPC;
 		}
 		current_vector = FIRST_DEVICE_VECTOR + offset;
@@ -2460,7 +2459,7 @@ void destroy_irq(unsigned int irq)
 {
 	unsigned long flags;
 
-	dynmic_irq_cleanup(irq);
+	dynamic_irq_cleanup(irq);
 
 	spin_lock_irqsave(&vector_lock, flags);
 	irq_vector[irq] = 0;

^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
2006-06-20 22:24 [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Eric W. Biederman
2006-06-20 22:28 ` [PATCH 1/25] irq: Convert the move_irq flag from a 32bit word to a single bit Eric W. Biederman
2006-06-21 0:30 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Rajesh Shah
@ 2006-06-21 10:24 ` Ingo Molnar
2006-06-21 16:25 ` Greg KH
2 siblings, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2006-06-21 10:24 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, linux-kernel, linux-acpi, linux-pci, discuss,
Thomas Gleixner, Andi Kleen, Natalie Protasevich, Len Brown,
Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap, Roland Dreier

* Eric W. Biederman <ebiederm@xmission.com> wrote:

> The following patchset is against 2.6.17-rc6-mm2. It was the only easy
> place I could get everyone's work who has been touching relevant code.
>
> The primary aim of this patch is to remove maintenance problems
> caused by the irq infrastructure.  The two big issues I address are an
> artificially small cap on the number of irqs, and that MSI assumes
> vector == irq.  My primary focus is on x86_64 but I have touched other
> architectures where necessary to keep them from breaking.

Very nice! Your queue addresses all of the remaining grievances i had
about the x86_64/i386 IRQ code (MSI/balancing) and does this on top of
genirq, which is very good. This is much more than i hoped for when you
told us about your project! :)

The only open bigger issue i guess (besides all the smaller code details
that i'm sure we'll sort out) is timing. Your queue, as tempting as it
is, is probably not 2.6.18 material. _I_ would certainly dare this for
2.6.18, but Andrew/Linus would kill me i guess.
So the question is - are we brave/confident enough to try to stabilize this in the next couple of days and drop it into 2.6.18 together with the other bits of genirq? I strongly suspect that the bugs this patchset will introduce is roughly equal to the bugs we already have due to the existing MSI and irq-balancing unrobustnesses, so we might as well go for that, instead of prolonging the pain by doing a two-stage (or 3-stage) process. (which would be to introduce genirq stage #1 now, then introduce genirq stage #2 in 2.6.19) Delaying genirq to 2.6.19 altogether would be messy i think and would interfere with ben's (and others') platform plans. Hm? One big point of confidence would be if ia64 built and booted fine with these changes. Somehow ia64 seems to be the most sensitive to MSI (and genirq) changes. Ingo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
2006-06-21 10:24 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Ingo Molnar
@ 2006-06-21 16:25 ` Greg KH
2006-06-22 3:55 ` Eric W. Biederman
0 siblings, 1 reply; 50+ messages in thread
From: Greg KH @ 2006-06-21 16:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric W. Biederman, Andrew Morton, linux-kernel, linux-acpi,
linux-pci, discuss, Thomas Gleixner, Andi Kleen,
Natalie Protasevich, Len Brown, Kimball Murray, Brice Goglin,
Greg Lindahl, Dave Olson, Jeff Garzik, Greg KH, Grant Grundler,
bibo,mao, Rajesh Shah, Mark Maule, Jesper Juhl, Shaohua Li,
Matthew Wilcox, Michael S. Tsirkin, Ashok Raj,
Randy Dunlap <rdunlap>

On Wed, Jun 21, 2006 at 12:24:07PM +0200, Ingo Molnar wrote:
>
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> > The following patchset is against 2.6.17-rc6-mm2. It was the only easy
> > place I could get everyones work who has been touching relevant code.
> >
> > The primary aim of this patch is to remove maintenances problems
> > caused by the irq infrastructure. The two big issues I address are an
> > artificially small cap on the number of irqs, and that MSI assumes
> > vector == irq. My primary focus is on x86_64 but I have touched other
> > architectures where necessary to keep them from breaking.
>
> Very nice! Your queue addresses all of the remaining grievances i had
> about the x86_64/i386 IRQ code (MSI/balancing) and does this ontop of
> genirq, which is very good. This is much more than i hoped for when you
> told us about your project! :)
>
> The only open bigger issue i guess (besides all the smaller code details
> that i'm sure we'll sort out) is timing. Your queue, as tempting as it
> is, is probably not 2.6.18 material. _I_ would certainly dare this for
> 2.6.18, but Andrew/Linus would kill me i guess.

No, it needs to sit in -mm for a while. All of the new 2.6.18 stuff
already has been in there, this is a bit too late for it. I have no
problem with taking this and letting it get beat on for a few months and
then go into 2.6.19.

> So the question is - are we brave/confident enough to try to stabilize
> this in the next couple of days and drop it into 2.6.18 together with
> the other bits of genirq?

No, see above please.

> I strongly suspect that the bugs this patchset will introduce is
> roughly equal to the bugs we already have due to the existing MSI and
> irq-balancing unrobustnesses, so we might as well go for that, instead
> of prolonging the pain by doing a two-stage (or 3-stage) process.
> (which would be to introduce genirq stage #1 now, then introduce
> genirq stage #2 in 2.6.19) Delaying genirq to 2.6.19 altogether would
> be messy i think and would interfere with ben's (and others') platform
> plans. Hm?

I don't object to genirq to go in now for 2.6.18, but this series is too
new. Incremental changes please :)

thanks,

greg k-h
* Re: [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64)
2006-06-21 16:25 ` Greg KH
@ 2006-06-22 3:55 ` Eric W. Biederman
0 siblings, 0 replies; 50+ messages in thread
From: Eric W. Biederman @ 2006-06-22 3:55 UTC (permalink / raw)
To: Greg KH
Cc: Ingo Molnar, Andrew Morton, linux-kernel, linux-acpi, linux-pci,
discuss, Thomas Gleixner, Andi Kleen, Natalie Protasevich,
Len Brown, Kimball Murray, Brice Goglin, Greg Lindahl, Dave Olson,
Jeff Garzik, Greg KH, Grant Grundler, bibo,mao, Rajesh Shah,
Mark Maule, Jesper Juhl, Shaohua Li, Matthew Wilcox,
Michael S. Tsirkin, Ashok Raj, Randy Dunlap

Greg KH <greg@kroah.com> writes:

>> Very nice! Your queue addresses all of the remaining grievances i had
>> about the x86_64/i386 IRQ code (MSI/balancing) and does this ontop of
>> genirq, which is very good. This is much more than i hoped for when you
>> told us about your project! :)
>>
>> The only open bigger issue i guess (besides all the smaller code details
>> that i'm sure we'll sort out) is timing. Your queue, as tempting as it
>> is, is probably not 2.6.18 material. _I_ would certainly dare this for
>> 2.6.18, but Andrew/Linus would kill me i guess.
>
> No, it needs to sit in -mm for a while. All of the new 2.6.18 stuff
> already has been in there, this is a bit too late for it. I have no
> problem with taking this and letting it get beat on for a few months and
> then go into 2.6.19.

As long as these patches get put somewhere for a beating I don't care.
They are so cross architecture I'm not really certain where they should
sit.

>> So the question is - are we brave/confident enough to try to stabilize
>> this in the next couple of days and drop it into 2.6.18 together with
>> the other bits of genirq?
>
> No, see above please.

If we did it I would support them. The important thing is that these
patches get put somewhere so other people can build on them. If you
don't count msi the changes are pretty trivial. Especially with the
last couple of patches removed from the patchset.

>> I strongly suspect that the bugs this patchset will introduce is
>> roughly equal to the bugs we already have due to the existing MSI and
>> irq-balancing unrobustnesses, so we might as well go for that, instead
>> of prolonging the pain by doing a two-stage (or 3-stage) process.
>> (which would be to introduce genirq stage #1 now, then introduce
>> genirq stage #2 in 2.6.19) Delaying genirq to 2.6.19 altogether would
>> be messy i think and would interfere with ben's (and others') platform
>> plans. Hm?
>
> I don't object to genirq to go in now for 2.6.18, but this series is too
> new. Incremental changes please :)

I object to genirq going into 2.6.18 without fixing the x86_64 and the
x86 regressions I spotted with the number of calls to move_native_irq.
x86 calls it twice and x86_64 not at all. But that is a trivial fix.
My patchset kills the root cause (CONFIG_PCI_MSI non msi irq behavioral
changes).

Eric
end of thread, other threads:[~2006-06-22 3:56 UTC]

Thread overview: 50+ messages
2006-06-20 22:24 [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Eric W. Biederman
2006-06-20 22:28 ` [PATCH 1/25] irq: Convert the move_irq flag from a 32bit word to a single bit Eric W. Biederman
2006-06-20 22:28 ` [PATCH 2/25] irq: Add moved_masked_irq Eric W. Biederman
2006-06-20 22:28 ` [PATCH 3/25] x86_64 irq: Reenable migrating irqs to other cpus Eric W. Biederman
2006-06-20 22:28 ` [PATCH 4/25] msi: Simplify msi enable and disable Eric W. Biederman
2006-06-20 22:28 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Eric W. Biederman
2006-06-20 22:28 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Eric W. Biederman
2006-06-20 22:28 ` [PATCH 7/25] msi: Refactor the msi_ops Eric W. Biederman
2006-06-20 22:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Eric W. Biederman
2006-06-20 22:28 ` [PATCH 9/25] irq: Add a dynamic irq creation API Eric W. Biederman
2006-06-20 22:28 ` [PATCH 10/25] ia64 irq: Dynamic irq support Eric W. Biederman
2006-06-20 22:28 ` [PATCH 11/25] i386 " Eric W. Biederman
2006-06-20 22:28 ` [PATCH 12/25] x86_64 " Eric W. Biederman
2006-06-20 22:28 ` [PATCH 13/25] msi: Make the msi code irq based and not vector based Eric W. Biederman
2006-06-20 22:28 ` [PATCH 14/25] x86_64 irq: Move msi message composition into io_apic.c Eric W. Biederman
2006-06-20 22:28 ` [PATCH 15/25] i386 " Eric W. Biederman
2006-06-20 22:28 ` [PATCH 16/25] msi: Only build msi-apic.c on ia64 Eric W. Biederman
2006-06-20 22:28 ` [PATCH 17/25] x86_64 irq: Remove the msi assumption that irq == vector Eric W. Biederman
2006-06-20 22:28 ` [PATCH 18/25] i386 " Eric W. Biederman
2006-06-20 22:28 ` [PATCH 19/25] irq: Remove msi hacks Eric W. Biederman
2006-06-20 22:28 ` [PATCH 20/25] irq: Generalize the check for HARDIRQ_BITS Eric W. Biederman
2006-06-20 22:28 ` [PATCH 21/25] x86_64 irq: Make the external irq handlers report their vector, not the irq number Eric W. Biederman
2006-06-20 22:28 ` [PATCH 22/25] x86_64 irq: make vector_irq per cpu Eric W. Biederman
2006-06-20 22:28 ` [PATCH 23/25] x86_64 irq: Kill gsi_irq_sharing Eric W. Biederman
2006-06-20 22:28 ` [PATCH 24/25] x86_64 irq: Kill irq compression Eric W. Biederman
2006-06-20 22:28 ` [PATCH 25/25] irq: Document what an IRQ is Eric W. Biederman
2006-06-21 1:50 ` [PATCH 11/25] i386 irq: Dynamic irq support Rajesh Shah
2006-06-21 2:21 ` Eric W. Biederman
2006-06-21 2:27 ` Rajesh Shah
2006-06-21 14:07 ` Eric W. Biederman
2006-06-20 23:56 ` [PATCH 9/25] irq: Add a dynamic irq creation API Benjamin Herrenschmidt
2006-06-21 1:01 ` Eric W. Biederman
2006-06-21 1:33 ` Benjamin Herrenschmidt
2006-06-21 1:41 ` Jeff Garzik
2006-06-21 1:36 ` Matthew Wilcox
2006-06-21 1:28 ` [PATCH 8/25] msi: Simplify the msi irq limit policy Rajesh Shah
2006-06-21 2:46 ` Roland Dreier
2006-06-21 3:48 ` Eric W. Biederman
2006-06-21 1:18 ` [PATCH 7/25] msi: Refactor the msi_ops Rajesh Shah
2006-06-21 1:04 ` [PATCH 6/25] msi: Implement helper functions read_msi_msg and write_msi_msg Rajesh Shah
2006-06-21 1:43 ` Eric W. Biederman
2006-06-20 22:45 ` [PATCH 5/25] msi: Make the msi boolean tests return either 0 or 1 Jeff Garzik
2006-06-21 0:44 ` [PATCH 4/25] msi: Simplify msi enable and disable Rajesh Shah
2006-06-21 1:19 ` Eric W. Biederman
2006-06-21 0:30 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Rajesh Shah
2006-06-21 1:07 ` Eric W. Biederman
2006-06-21 14:10 ` [PATCH] Decouple IRQ issues (fix i386 compile issues) Eric W. Biederman
2006-06-21 10:24 ` [PATCH 0/25] Decouple IRQ issues (MSI, i386, x86_64, ia64) Ingo Molnar
2006-06-21 16:25 ` Greg KH
2006-06-22 3:55 ` Eric W. Biederman