* Re: [Alsa-devel] snd-aoa: Add sound-layout-36 alias
From: Johannes Berg @ 2006-07-07 17:22 UTC (permalink / raw)
To: Andreas Schwab; +Cc: linuxppc-dev list, alsa-devel
In-Reply-To: <jed5chjpuj.fsf@sykes.suse.de>
On Fri, 2006-07-07 at 19:18 +0200, Andreas Schwab wrote:
>
> +MODULE_ALIAS("sound-layout-36");
Heh. I just sent a patch (in my series) that adds all the ones that were
still missing :)
johannes
^ permalink raw reply
* Re: G5 troubles booting powerpc-git (July 6)
From: Andrew Morton @ 2006-07-07 18:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1152272710.9862.74.camel@localhost.localdomain>
On Fri, 07 Jul 2006 21:45:10 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Fri, 2006-07-07 at 02:11 -0700, Andrew Morton wrote:
> > On Fri, 07 Jul 2006 18:57:47 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >
> > >
> > > In the meantime, try on your quad using a g5_defconfig
> >
> > I tried it with your recently-sent patch applied. It still dies in
> > smu_sensors_init().
>
> Do you have lockdep ?
Nope.
* Re: Linux v2.6.18-rc1
From: Steve Fox @ 2006-07-07 15:41 UTC (permalink / raw)
To: linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <Pine.LNX.4.64.0607052115210.12404@g5.osdl.org>
We've got a ppc64 machine that won't boot with this due to an IDE error.
[snip]
Freeing unused kernel memory: 256k freed
running (1:2) /init autobench_args: ABAT:1152213829
creating device nodes .hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
--
Steve Fox
IBM Linux Technology Center
* MPC5200 boot giving "request_module: runaway loop modprobe binfmt-4c46" errors
From: gturnock @ 2006-07-07 22:15 UTC (permalink / raw)
To: linuxppc-embedded
[-- Attachment #1: Type: text/html, Size: 16545 bytes --]
* [PATCH maple] Fix new interrupt code (MPIC endianness)
From: Segher Boessenkool @ 2006-07-08 0:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
All U3/U4 based systems are big-endian, but not all of them express it in
their device trees.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Index: linux-powerpc/arch/powerpc/platforms/maple/setup.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/platforms/maple/setup.c 2006-07-03 19:56:28.469294976 +0200
+++ linux-powerpc/arch/powerpc/platforms/maple/setup.c 2006-07-03 20:05:01.121380920 +0200
@@ -259,6 +259,8 @@
/* XXX Maple specific bits */
flags |= MPIC_BROKEN_U3 | MPIC_WANTS_RESET;
+ /* All U3/U4 are big-endian, older SLOF firmware doesn't encode this */
+ flags |= MPIC_BIG_ENDIAN;
/* Setup the openpic driver. More device-tree junks, we hard code no
* ISUs for now. I'll have to revisit some stuffs with the folks doing
* [PATCH maple] Fix new interrupt code (MPIC detection)
From: Segher Boessenkool @ 2006-07-08 0:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
As the code comment already says, the Maple device-tree is incorrect here;
make the Linux code detect the correct thing, too.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Index: linux-powerpc/arch/powerpc/platforms/maple/setup.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/platforms/maple/setup.c 2006-07-03 19:38:29.597405696 +0200
+++ linux-powerpc/arch/powerpc/platforms/maple/setup.c 2006-07-03 19:56:28.469294976 +0200
@@ -221,10 +221,17 @@
* in Maple device-tree where the type of the controller is
* open-pic and not interrupt-controller
*/
- for_each_node_by_type(np, "open-pic") {
- mpic_node = np;
- break;
- }
+
+ for_each_node_by_type(np, "interrupt-controller")
+ if (device_is_compatible(np, "open-pic")) {
+ mpic_node = np;
+ break;
+ }
+ if (mpic_node == NULL)
+ for_each_node_by_type(np, "open-pic") {
+ mpic_node = np;
+ break;
+ }
if (mpic_node == NULL) {
printk(KERN_ERR
"Failed to locate the MPIC interrupt controller\n");
* help? baffled trying to figure out where time is being spent
From: Chris Friesen @ 2006-07-08 2:05 UTC (permalink / raw)
To: linuxppc-dev
Hi guys,
I'm looking at the following instrumented code. This is part of an
instrumented version of scheduler_tick(), running on what is essentially
a "maple" board. Dual 970fx, 4GB of memory. Modified 2.6.10.
Shortly after initial boot, on cpu1, it seems like somehow there is a
huge time gap between where "c" gets assigned and where "d" gets
assigned in the code below.
The time gap between "c" and "d" is about 4.2 seconds.
The time gap between "aj" and "bj" is about 6.17 seconds.
The next time through the loop, the printk() triggers with a complaint
that it took 6381 ticks between scheduler_tick() calls.
I'm baffled as to where the gap is coming from. Any ideas?
Interrupts are for certain disabled while this code runs. The printk
statement is not being reached, so the "sched_delta" check is failing.
The gethrtime() function simply reads the tbr on ppc64, so it maps to a
single instruction.
	b=gethrtime();
	sched_delta = schedtime - __get_cpu_var(last_sched_tick);
	c=gethrtime();
	aj=jiffies;
	if (sched_delta > SCHED_INTERVAL_THRESH) {
		/* disable exception history if specified */
		if (disable_hist_on_event)
			disable_history_buffer = 1;
		printk(KERN_WARNING "cpu%d: jiffies: %lu, hrtime: %llu, %lu ticks "
		       "between scheduler_tick() calls\n",
		       smp_processor_id(), schedtime,
		       (unsigned long long) gethrtime(), sched_delta);
	}
	d=gethrtime();
	bj=jiffies;
Chris
* [PATCH] powerpc: Slight rework of new irq handling (rfc)
From: Benjamin Herrenschmidt @ 2006-07-08 8:08 UTC (permalink / raw)
To: linuxppc-dev list
This patch slightly reworks the new irq code to fix a small design
error. I removed the passing of the trigger to the map() calls entirely.
Mapping a Linux virtual irq to a physical irq now does just that.
Setting the trigger is a different action which has a different call.
The changes are:
- I no longer call host->ops->map() for an already mapped irq, I just
return the virtual number that was already mapped. It was called before
to give an opportunity to change the trigger, but that was causing
issues as that could happen while the interrupt was in use by a device,
and because of the trigger change, map would potentially muck around
with things in a racy way. That put a heavy burden on each controller's
implementation of map() to get it right. This is much simpler
now. map() is only called on the initial mapping of an irq, meaning that
you know that this irq is _not_ being used. You can initialize the
hardware if you want (though you don't have to).
- Controllers that can handle different types of triggers
(level/edge/etc...) now implement the standard irq_chip->set_type() call
as defined by the generic code. That means that you can use the standard
set_irq_type() to configure an irq line manually if you wish or (though
I don't like that interface) pass explicit trigger flags to
request_irq() as defined by the generic kernel interfaces. Also, using
those interfaces guarantees that your controller's set_type callback is
called with the descriptor lock held, thus automatically providing
locking against activity on the same interrupt (including
mask/unmask/etc...). As a result, for example, MPIC's own map()
implementation calls set_irq_type(virq, IRQ_TYPE_NONE) to configure the
hardware to the default triggers.
- To allow the above, the actual irq_map is now filled before the map()
callback is called.
- The irq_create_of_mapping() (also used by irq_of_parse_and_map())
function for mapping interrupts from the device-tree now also calls the
separate set_irq_type(), and only does so if there is a change in the
trigger type.
- While I was at it, I changed pci_read_irq_line() (which is the helper
I would expect most archs to use in their pcibios_fixup() to get the PCI
interrupt routing from the device tree) to also handle a fallback when
the DT mapping fails: it reads PCI_INTERRUPT_LINE from the device, maps
that through the default controller, and sets the trigger to level low.
That default behaviour works for several platforms, such as Pegasos,
that don't have a proper interrupt tree.
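
For illustration only (this is not part of the patch): a minimal user-space
sketch of the map()/set_type() split described in the list above. Every
name in it (model_create_mapping(), model_set_type(), the TYPE_* constants,
the slots[] table) is hypothetical and made up for the example; none of
them are kernel APIs. It assumes, as the MPIC callback in the patch does
when no sense information is available, that "no type" falls back to level
low.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define NO_VIRQ			0u
#define MAX_VIRQ		16u
#define TYPE_NONE		0u
#define TYPE_EDGE_RISING	1u
#define TYPE_LEVEL_LOW		8u

struct virq_slot {
	int used;
	unsigned long hwirq;
	unsigned int type;	/* trigger type currently programmed */
};

static struct virq_slot slots[MAX_VIRQ];
static unsigned int map_calls;		/* how often the map() model ran */
static unsigned int set_type_calls;	/* how often the set_type() model ran */

/* Model of a controller set_type(): TYPE_NONE picks the default trigger. */
static int model_set_type(unsigned int virq, unsigned int type)
{
	if (virq == NO_VIRQ || virq >= MAX_VIRQ || !slots[virq].used)
		return -1;
	if (type == TYPE_NONE)
		type = TYPE_LEVEL_LOW;	/* assumed default for this model */
	slots[virq].type = type;
	set_type_calls++;
	return 0;
}

/* Model of a controller map(): runs once per mapping, programs defaults. */
static int model_map(unsigned int virq, unsigned long hwirq)
{
	(void)hwirq;
	map_calls++;
	return model_set_type(virq, TYPE_NONE);
}

/* Mapping only: no trigger argument, and no map() call on an existing entry. */
static unsigned int model_create_mapping(unsigned long hwirq)
{
	unsigned int v;

	for (v = 1; v < MAX_VIRQ; v++)
		if (slots[v].used && slots[v].hwirq == hwirq)
			return v;	/* already mapped: return it untouched */

	for (v = 1; v < MAX_VIRQ; v++) {
		if (slots[v].used)
			continue;
		/* Fill the table entry first so map() may call set_type(). */
		slots[v].used = 1;
		slots[v].hwirq = hwirq;
		slots[v].type = TYPE_NONE;
		if (model_map(v, hwirq)) {
			slots[v].used = 0;
			return NO_VIRQ;
		}
		return v;
	}
	return NO_VIRQ;
}

/* Device-tree style path: map, then set the type only if it differs. */
static unsigned int model_create_of_mapping(unsigned long hwirq,
					    unsigned int type)
{
	unsigned int v = model_create_mapping(hwirq);

	if (v == NO_VIRQ)
		return v;
	if (type != TYPE_NONE && type != slots[v].type)
		model_set_type(v, type);
	return v;
}
```

In this model, mapping the same hardware number twice returns the same
virtual number without running the map() model again, and only an explicit
and different type leads to a second model_set_type() call.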
The patch hasn't been tested much yet, which is why I'm not yet
submitting it officially for upstream. I will do so tomorrow or Monday.
In the meantime, comments welcome.
Segher: Your patches #1 and #2 are still needed.
Index: linux-irq-work/arch/powerpc/kernel/irq.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/kernel/irq.c 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/arch/powerpc/kernel/irq.c 2006-07-08 09:36:20.000000000 +1000
@@ -391,15 +391,14 @@
irq_map[i].host = host;
smp_wmb();
- /* Clear some flags */
- get_irq_desc(i)->status
- &= ~(IRQ_NOREQUEST | IRQ_LEVEL);
+ /* Clear norequest flags */
+ get_irq_desc(i)->status &= ~IRQ_NOREQUEST;
/* Legacy flags are left to default at this point,
* one can then use irq_create_mapping() to
* explicitely change them
*/
- ops->map(host, i, i, 0);
+ ops->map(host, i, i);
}
break;
case IRQ_HOST_MAP_LINEAR:
@@ -457,13 +456,11 @@
}
unsigned int irq_create_mapping(struct irq_host *host,
- irq_hw_number_t hwirq,
- unsigned int flags)
+ irq_hw_number_t hwirq)
{
unsigned int virq, hint;
- pr_debug("irq: irq_create_mapping(0x%p, 0x%lx, 0x%x)\n",
- host, hwirq, flags);
+ pr_debug("irq: irq_create_mapping(0x%p, 0x%lx)\n", host, hwirq);
/* Look for default host if nececssary */
if (host == NULL)
@@ -482,7 +479,6 @@
virq = irq_find_mapping(host, hwirq);
if (virq != IRQ_NONE) {
pr_debug("irq: -> existing mapping on virq %d\n", virq);
- host->ops->map(host, virq, hwirq, flags);
return virq;
}
@@ -504,18 +500,18 @@
}
pr_debug("irq: -> obtained virq %d\n", virq);
- /* Clear some flags */
- get_irq_desc(virq)->status &= ~(IRQ_NOREQUEST | IRQ_LEVEL);
+ /* Clear IRQ_NOREQUEST flag */
+ get_irq_desc(virq)->status &= ~IRQ_NOREQUEST;
/* map it */
- if (host->ops->map(host, virq, hwirq, flags)) {
+ smp_wmb();
+ irq_map[virq].hwirq = hwirq;
+ smp_mb();
+ if (host->ops->map(host, virq, hwirq)) {
pr_debug("irq: -> mapping failed, freeing\n");
irq_free_virt(virq, 1);
return NO_IRQ;
}
- smp_wmb();
- irq_map[virq].hwirq = hwirq;
- smp_mb();
return virq;
}
EXPORT_SYMBOL_GPL(irq_create_mapping);
@@ -525,7 +521,8 @@
{
struct irq_host *host;
irq_hw_number_t hwirq;
- unsigned int flags = IRQ_TYPE_NONE;
+ unsigned int type = IRQ_TYPE_NONE;
+ unsigned int virq;
if (controller == NULL)
host = irq_default_host;
@@ -539,11 +536,20 @@
hwirq = intspec[0];
else {
if (host->ops->xlate(host, controller, intspec, intsize,
- &hwirq, &flags))
+ &hwirq, &type))
return NO_IRQ;
}
- return irq_create_mapping(host, hwirq, flags);
+ /* Create mapping */
+ virq = irq_create_mapping(host, hwirq);
+ if (virq == NO_IRQ)
+ return virq;
+
+ /* Set type if specified and different than the current one */
+ if (type != IRQ_TYPE_NONE &&
+ type != (get_irq_desc(virq)->status & IRQF_TRIGGER_MASK))
+ set_irq_type(virq, type);
+ return virq;
}
EXPORT_SYMBOL_GPL(irq_create_of_mapping);
Index: linux-irq-work/arch/powerpc/platforms/powermac/pci.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/powermac/pci.c 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/powermac/pci.c 2006-07-08 09:36:20.000000000 +1000
@@ -46,7 +46,6 @@
static struct pci_controller *u3_agp;
static struct pci_controller *u4_pcie;
static struct pci_controller *u3_ht;
-#define has_second_ohare 0
#else
static int has_second_ohare;
#endif /* CONFIG_PPC64 */
@@ -993,6 +992,7 @@
/* Read interrupt from the device-tree */
pci_read_irq_line(dev);
+#ifdef CONFIG_PPC32
/* Fixup interrupt for the modem/ethernet combo controller.
* on machines with a second ohare chip.
* The number in the device tree (27) is bogus (correct for
@@ -1002,8 +1002,11 @@
*/
if (has_second_ohare &&
dev->vendor == PCI_VENDOR_ID_DEC &&
- dev->device == PCI_DEVICE_ID_DEC_TULIP_PLUS)
- dev->irq = irq_create_mapping(NULL, 60, 0);
+ dev->device == PCI_DEVICE_ID_DEC_TULIP_PLUS) {
+ dev->irq = irq_create_mapping(NULL, 60);
+ set_irq_type(dev->irq, IRQ_TYPE_LEVEL_LOW);
+ }
+#endif /* CONFIG_PPC32 */
}
}
Index: linux-irq-work/arch/powerpc/platforms/powermac/pic.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/powermac/pic.c 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/powermac/pic.c 2006-07-08 09:37:18.000000000 +1000
@@ -291,7 +291,7 @@
}
static int pmac_pic_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
struct irq_desc *desc = get_irq_desc(virq);
int level;
@@ -318,6 +318,7 @@
unsigned int *out_flags)
{
+ *out_flags = IRQ_TYPE_NONE;
*out_hwirq = *intspec;
return 0;
}
@@ -434,7 +435,7 @@
printk(KERN_INFO "irq: System has %d possible interrupts\n", max_irqs);
#ifdef CONFIG_XMON
- setup_irq(irq_create_mapping(NULL, 20, 0), &xmon_action);
+ setup_irq(irq_create_mapping(NULL, 20), &xmon_action);
#endif
}
#endif /* CONFIG_PPC32 */
@@ -579,9 +580,10 @@
flags |= OF_IMAP_OLDWORLD_MAC;
if (get_property(of_chosen, "linux,bootx", NULL) != NULL)
flags |= OF_IMAP_NO_PHANDLE;
- of_irq_map_init(flags);
#endif /* CONFIG_PPC_32 */
+ of_irq_map_init(flags);
+
/* We first try to detect Apple's new Core99 chipset, since mac-io
* is quite different on those machines and contains an IBM MPIC2.
*/
Index: linux-irq-work/arch/powerpc/sysdev/mpic.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/sysdev/mpic.c 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/arch/powerpc/sysdev/mpic.c 2006-07-08 09:36:20.000000000 +1000
@@ -337,6 +337,17 @@
}
}
+#else /* CONFIG_MPIC_BROKEN_U3 */
+
+static inline int mpic_is_ht_interrupt(struct mpic *mpic, unsigned int source)
+{
+ return 0;
+}
+
+static void __init mpic_scan_ht_pics(struct mpic *mpic)
+{
+}
+
#endif /* CONFIG_MPIC_BROKEN_U3 */
@@ -405,11 +416,9 @@
unsigned int loops = 100000;
struct mpic *mpic = mpic_from_irq(irq);
unsigned int src = mpic_irq_to_hw(irq);
- unsigned long flags;
DBG("%p: %s: enable_irq: %d (src %d)\n", mpic, mpic->name, irq, src);
- spin_lock_irqsave(&mpic_lock, flags);
mpic_irq_write(src, MPIC_IRQ_VECTOR_PRI,
mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) &
~MPIC_VECPRI_MASK);
@@ -420,7 +429,6 @@
break;
}
} while(mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) & MPIC_VECPRI_MASK);
- spin_unlock_irqrestore(&mpic_lock, flags);
}
static void mpic_mask_irq(unsigned int irq)
@@ -428,11 +436,9 @@
unsigned int loops = 100000;
struct mpic *mpic = mpic_from_irq(irq);
unsigned int src = mpic_irq_to_hw(irq);
- unsigned long flags;
DBG("%s: disable_irq: %d (src %d)\n", mpic->name, irq, src);
- spin_lock_irqsave(&mpic_lock, flags);
mpic_irq_write(src, MPIC_IRQ_VECTOR_PRI,
mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) |
MPIC_VECPRI_MASK);
@@ -444,7 +450,6 @@
break;
}
} while(!(mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) & MPIC_VECPRI_MASK));
- spin_unlock_irqrestore(&mpic_lock, flags);
}
static void mpic_end_irq(unsigned int irq)
@@ -512,8 +517,7 @@
mpic_ht_end_irq(mpic, src);
mpic_eoi(mpic);
}
-
-#endif /* CONFIG_MPIC_BROKEN_U3 */
+#endif /* !CONFIG_MPIC_BROKEN_U3 */
#ifdef CONFIG_SMP
@@ -560,47 +564,74 @@
mpic_physmask(cpus_addr(tmp)[0]));
}
-static unsigned int mpic_flags_to_vecpri(unsigned int flags, int *level)
+static unsigned int mpic_type_to_vecpri(unsigned int type)
{
- unsigned int vecpri;
-
/* Now convert sense value */
- switch(flags & IRQ_TYPE_SENSE_MASK) {
+ switch(type & IRQ_TYPE_SENSE_MASK) {
case IRQ_TYPE_EDGE_RISING:
- vecpri = MPIC_VECPRI_SENSE_EDGE |
- MPIC_VECPRI_POLARITY_POSITIVE;
- *level = 0;
- break;
+ return MPIC_VECPRI_SENSE_EDGE | MPIC_VECPRI_POLARITY_POSITIVE;
case IRQ_TYPE_EDGE_FALLING:
- vecpri = MPIC_VECPRI_SENSE_EDGE |
- MPIC_VECPRI_POLARITY_NEGATIVE;
- *level = 0;
- break;
+ case IRQ_TYPE_EDGE_BOTH:
+ return MPIC_VECPRI_SENSE_EDGE | MPIC_VECPRI_POLARITY_NEGATIVE;
case IRQ_TYPE_LEVEL_HIGH:
- vecpri = MPIC_VECPRI_SENSE_LEVEL |
- MPIC_VECPRI_POLARITY_POSITIVE;
- *level = 1;
- break;
+ return MPIC_VECPRI_SENSE_LEVEL | MPIC_VECPRI_POLARITY_POSITIVE;
case IRQ_TYPE_LEVEL_LOW:
default:
- vecpri = MPIC_VECPRI_SENSE_LEVEL |
- MPIC_VECPRI_POLARITY_NEGATIVE;
- *level = 1;
+ return MPIC_VECPRI_SENSE_LEVEL | MPIC_VECPRI_POLARITY_NEGATIVE;
}
- return vecpri;
+}
+
+static int mpic_set_irq_type(unsigned int virq, unsigned int flow_type)
+{
+ struct mpic *mpic = mpic_from_irq(virq);
+ unsigned int src = mpic_irq_to_hw(virq);
+ struct irq_desc *desc = get_irq_desc(virq);
+ unsigned int vecpri, vold, vnew;
+
+ pr_debug("mpic: set_irq_type(mpic:@%p,virq:%d,src:%d,type:0x%x)\n",
+ mpic, virq, src, flow_type);
+
+ if (src >= mpic->irq_count)
+ return -EINVAL;
+
+ if (flow_type == IRQ_TYPE_NONE)
+ if (mpic->senses && src < mpic->senses_count)
+ flow_type = mpic->senses[src];
+ if (flow_type == IRQ_TYPE_NONE)
+ flow_type = IRQ_TYPE_LEVEL_LOW;
+
+ desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
+ desc->status |= flow_type & IRQ_TYPE_SENSE_MASK;
+ if (flow_type & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW))
+ desc->status |= IRQ_LEVEL;
+
+ if (mpic_is_ht_interrupt(mpic, src))
+ vecpri = MPIC_VECPRI_POLARITY_POSITIVE |
+ MPIC_VECPRI_SENSE_EDGE;
+ else
+ vecpri = mpic_type_to_vecpri(flow_type);
+
+ vold = mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI);
+ vnew = vold & ~(MPIC_VECPRI_POLARITY_MASK | MPIC_VECPRI_SENSE_MASK);
+ vnew |= vecpri;
+ if (vold != vnew)
+ mpic_irq_write(src, MPIC_IRQ_VECTOR_PRI, vnew);
+
+ return 0;
}
static struct irq_chip mpic_irq_chip = {
- .mask = mpic_mask_irq,
- .unmask = mpic_unmask_irq,
- .eoi = mpic_end_irq,
+ .mask = mpic_mask_irq,
+ .unmask = mpic_unmask_irq,
+ .eoi = mpic_end_irq,
+ .set_type = mpic_set_irq_type,
};
#ifdef CONFIG_SMP
static struct irq_chip mpic_ipi_chip = {
- .mask = mpic_mask_ipi,
- .unmask = mpic_unmask_ipi,
- .eoi = mpic_end_ipi,
+ .mask = mpic_mask_ipi,
+ .unmask = mpic_unmask_ipi,
+ .eoi = mpic_end_ipi,
};
#endif /* CONFIG_SMP */
@@ -611,6 +642,7 @@
.mask = mpic_mask_irq,
.unmask = mpic_unmask_ht_irq,
.eoi = mpic_end_ht_irq,
+ .set_type = mpic_set_irq_type,
};
#endif /* CONFIG_MPIC_BROKEN_U3 */
@@ -624,18 +656,12 @@
}
static int mpic_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
- struct irq_desc *desc = get_irq_desc(virq);
- struct irq_chip *chip;
struct mpic *mpic = h->host_data;
- u32 v, vecpri = MPIC_VECPRI_SENSE_LEVEL |
- MPIC_VECPRI_POLARITY_NEGATIVE;
- int level;
- unsigned long iflags;
+ struct irq_chip *chip;
- pr_debug("mpic: map virq %d, hwirq 0x%lx, flags: 0x%x\n",
- virq, hw, flags);
+ pr_debug("mpic: map virq %d, hwirq 0x%lx\n", virq, hw);
if (hw == MPIC_VEC_SPURRIOUS)
return -EINVAL;
@@ -654,44 +680,23 @@
if (hw >= mpic->irq_count)
return -EINVAL;
- /* If no sense provided, check default sense array */
- if (((flags & IRQ_TYPE_SENSE_MASK) == IRQ_TYPE_NONE) &&
- mpic->senses && hw < mpic->senses_count)
- flags |= mpic->senses[hw];
-
- vecpri = mpic_flags_to_vecpri(flags, &level);
- if (level)
- desc->status |= IRQ_LEVEL;
+ /* Default chip */
chip = &mpic->hc_irq;
#ifdef CONFIG_MPIC_BROKEN_U3
/* Check for HT interrupts, override vecpri */
- if (mpic_is_ht_interrupt(mpic, hw)) {
- vecpri &= ~(MPIC_VECPRI_SENSE_MASK |
- MPIC_VECPRI_POLARITY_MASK);
- vecpri |= MPIC_VECPRI_POLARITY_POSITIVE;
+ if (mpic_is_ht_interrupt(mpic, hw))
chip = &mpic->hc_ht_irq;
- }
-#endif
-
- /* Reconfigure irq. We must preserve the mask bit as we can be called
- * while the interrupt is still active (This may change in the future
- * but for now, it is the case).
- */
- spin_lock_irqsave(&mpic_lock, iflags);
- v = mpic_irq_read(hw, MPIC_IRQ_VECTOR_PRI);
- vecpri = (v &
- ~(MPIC_VECPRI_POLARITY_MASK | MPIC_VECPRI_SENSE_MASK)) |
- vecpri;
- if (vecpri != v)
- mpic_irq_write(hw, MPIC_IRQ_VECTOR_PRI, vecpri);
- spin_unlock_irqrestore(&mpic_lock, iflags);
+#endif /* CONFIG_MPIC_BROKEN_U3 */
- pr_debug("mpic: mapping as IRQ, vecpri = 0x%08x (was 0x%08x)\n",
- vecpri, v);
+ pr_debug("mpic: mapping to irq chip @%p\n", chip);
set_irq_chip_data(virq, mpic);
set_irq_chip_and_handler(virq, chip, handle_fasteoi_irq);
+
+ /* Set default irq type */
+ set_irq_type(virq, IRQ_TYPE_NONE);
+
return 0;
}
@@ -906,41 +911,16 @@
if (mpic->irq_count == 0)
mpic->irq_count = mpic->num_sources;
-#ifdef CONFIG_MPIC_BROKEN_U3
/* Do the HT PIC fixups on U3 broken mpic */
DBG("MPIC flags: %x\n", mpic->flags);
if ((mpic->flags & MPIC_BROKEN_U3) && (mpic->flags & MPIC_PRIMARY))
mpic_scan_ht_pics(mpic);
-#endif /* CONFIG_MPIC_BROKEN_U3 */
for (i = 0; i < mpic->num_sources; i++) {
/* start with vector = source number, and masked */
- u32 vecpri = MPIC_VECPRI_MASK | i | (8 << MPIC_VECPRI_PRIORITY_SHIFT);
- int level = 1;
+ u32 vecpri = MPIC_VECPRI_MASK | i |
+ (8 << MPIC_VECPRI_PRIORITY_SHIFT);
- /* do senses munging */
- if (mpic->senses && i < mpic->senses_count)
- vecpri |= mpic_flags_to_vecpri(mpic->senses[i],
- &level);
- else
- vecpri |= MPIC_VECPRI_SENSE_LEVEL;
-
- /* deal with broken U3 */
- if (mpic->flags & MPIC_BROKEN_U3) {
-#ifdef CONFIG_MPIC_BROKEN_U3
- if (mpic_is_ht_interrupt(mpic, i)) {
- vecpri &= ~(MPIC_VECPRI_SENSE_MASK |
- MPIC_VECPRI_POLARITY_MASK);
- vecpri |= MPIC_VECPRI_POLARITY_POSITIVE;
- }
-#else
- printk(KERN_ERR "mpic: BROKEN_U3 set, but CONFIG doesn't match\n");
-#endif
- }
-
- DBG("setup source %d, vecpri: %08x, level: %d\n", i, vecpri,
- (level != 0));
-
/* init hw */
mpic_irq_write(i, MPIC_IRQ_VECTOR_PRI, vecpri);
mpic_irq_write(i, MPIC_IRQ_DESTINATION,
@@ -1154,7 +1134,7 @@
for (i = 0; i < 4; i++) {
unsigned int vipi = irq_create_mapping(mpic->irqhost,
- MPIC_VEC_IPI_0 + i, 0);
+ MPIC_VEC_IPI_0 + i);
if (vipi == NO_IRQ) {
printk(KERN_ERR "Failed to map IPI %d\n", i);
break;
Index: linux-irq-work/drivers/macintosh/macio_asic.c
===================================================================
--- linux-irq-work.orig/drivers/macintosh/macio_asic.c 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/drivers/macintosh/macio_asic.c 2006-07-08 09:36:20.000000000 +1000
@@ -330,7 +330,7 @@
{
unsigned int irq;
- irq = irq_create_mapping(NULL, line, 0);
+ irq = irq_create_mapping(NULL, line);
if (irq != NO_IRQ) {
dev->interrupt[index].start = irq;
dev->interrupt[index].flags = IORESOURCE_IRQ;
Index: linux-irq-work/include/asm-powerpc/irq.h
===================================================================
--- linux-irq-work.orig/include/asm-powerpc/irq.h 2006-07-07 17:49:20.000000000 +1000
+++ linux-irq-work/include/asm-powerpc/irq.h 2006-07-08 09:36:20.000000000 +1000
@@ -83,25 +83,24 @@
int (*match)(struct irq_host *h, struct device_node *node);
/* Create or update a mapping between a virtual irq number and a hw
- * irq number. This can be called several times for the same mapping
- * but with different flags, though unmap shall always be called
- * before the virq->hw mapping is changed.
+ * irq number. This is called only once for a given mapping.
*/
- int (*map)(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags);
+ int (*map)(struct irq_host *h, unsigned int virq, irq_hw_number_t hw);
/* Dispose of such a mapping */
void (*unmap)(struct irq_host *h, unsigned int virq);
/* Translate device-tree interrupt specifier from raw format coming
* from the firmware to a irq_hw_number_t (interrupt line number) and
- * trigger flags that can be passed to irq_create_mapping().
- * If no translation is provided, raw format is assumed to be one cell
- * for interrupt line and default sense.
+ * type (sense) that can be passed to set_irq_type(). In the absence
+ * of this callback, irq_create_of_mapping() and irq_of_parse_and_map()
+ * will return the hw number in the first cell and IRQ_TYPE_NONE for
+ * the type (which amount to keeping whatever default value the
+ * interrupt controller has for that line)
*/
int (*xlate)(struct irq_host *h, struct device_node *ctrler,
u32 *intspec, unsigned int intsize,
- irq_hw_number_t *out_hwirq, unsigned int *out_flags);
+ irq_hw_number_t *out_hwirq, unsigned int *out_type);
};
struct irq_host {
@@ -193,25 +192,14 @@
* irq_create_mapping - Map a hardware interrupt into linux virq space
* @host: host owning this hardware interrupt or NULL for default host
* @hwirq: hardware irq number in that host space
- * @flags: flags passed to the controller. contains the trigger type among
- * others. Use IRQ_TYPE_* defined in include/linux/irq.h
*
* Only one mapping per hardware interrupt is permitted. Returns a linux
- * virq number. The flags can be used to provide sense information to the
- * controller (typically extracted from the device-tree). If no information
- * is passed, the controller defaults will apply (for example, xics can only
- * do edge so flags are irrelevant for some pseries specific irqs).
- *
- * The device-tree generally contains the trigger info in an encoding that is
- * specific to a given type of controller. In that case, you can directly use
- * host->ops->trigger_xlate() to translate that.
- *
- * It is recommended that new PICs that don't have existing OF bindings chose
- * to use a representation of triggers identical to linux.
+ * virq number.
+ * If the sense/trigger is to be specified, set_irq_type() should be called
+ * on the number returned from that call.
*/
extern unsigned int irq_create_mapping(struct irq_host *host,
- irq_hw_number_t hwirq,
- unsigned int flags);
+ irq_hw_number_t hwirq);
/***
@@ -295,7 +283,7 @@
*
* This function is identical to irq_create_mapping except that it takes
* as input informations straight from the device-tree (typically the results
- * of the of_irq_map_*() functions
+ * of the of_irq_map_*() functions.
*/
extern unsigned int irq_create_of_mapping(struct device_node *controller,
u32 *intspec, unsigned int intsize);
Index: linux-irq-work/arch/powerpc/kernel/ibmebus.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/kernel/ibmebus.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/kernel/ibmebus.c 2006-07-08 11:05:35.000000000 +1000
@@ -323,7 +323,7 @@
unsigned long irq_flags, const char * devname,
void *dev_id)
{
- unsigned int irq = irq_create_mapping(NULL, ist, 0);
+ unsigned int irq = irq_create_mapping(NULL, ist);
if (irq == NO_IRQ)
return -EINVAL;
Index: linux-irq-work/arch/powerpc/platforms/cell/interrupt.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/cell/interrupt.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/cell/interrupt.c 2006-07-08 09:41:01.000000000 +1000
@@ -159,7 +159,7 @@
if (iic_hosts[node] == NULL)
continue;
virq = irq_create_mapping(iic_hosts[node],
- iic_ipi_to_irq(ipi), 0);
+ iic_ipi_to_irq(ipi));
if (virq == NO_IRQ) {
printk(KERN_ERR
"iic: failed to map IPI %s on node %d\n",
@@ -197,7 +197,7 @@
}
static int iic_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
if (hw < IIC_IRQ_IPI0)
set_irq_chip_and_handler(virq, &iic_chip, handle_fasteoi_irq);
Index: linux-irq-work/arch/powerpc/platforms/cell/spider-pic.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/cell/spider-pic.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/cell/spider-pic.c 2006-07-08 11:09:28.000000000 +1000
@@ -85,9 +85,6 @@
struct spider_pic *pic = spider_virq_to_pic(virq);
void __iomem *cfg = spider_get_irq_config(pic, irq_map[virq].hwirq);
- /* We use no locking as we should be covered by the descriptor lock
- * for access to invidual source configuration registers
- */
out_be32(cfg, in_be32(cfg) | 0x30000000u);
}
@@ -96,9 +93,6 @@
struct spider_pic *pic = spider_virq_to_pic(virq);
void __iomem *cfg = spider_get_irq_config(pic, irq_map[virq].hwirq);
- /* We use no locking as we should be covered by the descriptor lock
- * for access to invidual source configuration registers
- */
out_be32(cfg, in_be32(cfg) & ~0x30000000u);
}
@@ -120,26 +114,14 @@
out_be32(pic->regs + TIR_EDC, 0x100 | (src & 0xf));
}
-static struct irq_chip spider_pic = {
- .typename = " SPIDER ",
- .unmask = spider_unmask_irq,
- .mask = spider_mask_irq,
- .ack = spider_ack_irq,
-};
-
-static int spider_host_match(struct irq_host *h, struct device_node *node)
-{
- struct spider_pic *pic = h->host_data;
- return node == pic->of_node;
-}
-
-static int spider_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+static int spider_set_irq_type(unsigned int virq, unsigned int type)
{
- unsigned int sense = flags & IRQ_TYPE_SENSE_MASK;
- struct spider_pic *pic = h->host_data;
+ unsigned int sense = type & IRQ_TYPE_SENSE_MASK;
+ struct spider_pic *pic = spider_virq_to_pic(virq);
+ unsigned int hw = irq_map[virq].hwirq;
void __iomem *cfg = spider_get_irq_config(pic, hw);
- int level = 0;
+ struct irq_desc *desc = get_irq_desc(virq);
+ u32 old_mask;
u32 ic;
/* Note that only level high is supported for most interrupts */
@@ -157,29 +139,57 @@
break;
case IRQ_TYPE_LEVEL_LOW:
ic = 0x0;
- level = 1;
break;
case IRQ_TYPE_LEVEL_HIGH:
case IRQ_TYPE_NONE:
ic = 0x1;
- level = 1;
break;
default:
return -EINVAL;
}
+ /* Update irq_desc */
+ desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
+ desc->status |= type & IRQ_TYPE_SENSE_MASK;
+ if (type & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW))
+ desc->status |= IRQ_LEVEL;
+
/* Configure the source. One gross hack that was there before and
* that I've kept around is the priority to the BE which I set to
* be the same as the interrupt source number. I don't know wether
* that's supposed to make any kind of sense however, we'll have to
* decide that, but for now, I'm not changing the behaviour.
*/
- out_be32(cfg, (ic << 24) | (0x7 << 16) | (pic->node_id << 4) | 0xe);
+ old_mask = in_be32(cfg) & 0x30000000u;
+ out_be32(cfg, old_mask | (ic << 24) | (0x7 << 16) |
+ (pic->node_id << 4) | 0xe);
out_be32(cfg + 4, (0x2 << 16) | (hw & 0xff));
- if (level)
- get_irq_desc(virq)->status |= IRQ_LEVEL;
+ return 0;
+}
+
+static struct irq_chip spider_pic = {
+ .typename = " SPIDER ",
+ .unmask = spider_unmask_irq,
+ .mask = spider_mask_irq,
+ .ack = spider_ack_irq,
+ .set_type = spider_set_irq_type,
+};
+
+static int spider_host_match(struct irq_host *h, struct device_node *node)
+{
+ struct spider_pic *pic = h->host_data;
+ return node == pic->of_node;
+}
+
+static int spider_host_map(struct irq_host *h, unsigned int virq,
+ irq_hw_number_t hw)
+{
set_irq_chip_and_handler(virq, &spider_pic, handle_level_irq);
+
+ /* Set default irq type */
+ set_irq_type(virq, IRQ_TYPE_NONE);
+
return 0;
}
@@ -283,7 +293,7 @@
if (iic_host == NULL)
return NO_IRQ;
/* Manufacture an IIC interrupt number of class 2 */
- virq = irq_create_mapping(iic_host, 0x20 | unit, 0);
+ virq = irq_create_mapping(iic_host, 0x20 | unit);
if (virq == NO_IRQ)
printk(KERN_ERR "spider_pic: failed to map cascade !");
return virq;
Index: linux-irq-work/arch/powerpc/platforms/cell/spu_base.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/cell/spu_base.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/cell/spu_base.c 2006-07-08 11:05:01.000000000 +1000
@@ -583,9 +583,9 @@
spu->isrc = isrc = tmp[0];
/* Now map interrupts of all 3 classes */
- spu->irqs[0] = irq_create_mapping(host, 0x00 | isrc, 0);
- spu->irqs[1] = irq_create_mapping(host, 0x10 | isrc, 0);
- spu->irqs[2] = irq_create_mapping(host, 0x20 | isrc, 0);
+ spu->irqs[0] = irq_create_mapping(host, 0x00 | isrc);
+ spu->irqs[1] = irq_create_mapping(host, 0x10 | isrc);
+ spu->irqs[2] = irq_create_mapping(host, 0x20 | isrc);
/* Right now, we only fail if class 2 failed */
return spu->irqs[2] == NO_IRQ ? -EINVAL : 0;
Index: linux-irq-work/arch/powerpc/platforms/iseries/irq.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/iseries/irq.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/iseries/irq.c 2006-07-08 11:06:10.000000000 +1000
@@ -300,7 +300,7 @@
realirq = (((((sub_bus << 8) + (bus - 1)) << 3) + (idsel - 1)) << 3)
+ function;
- return irq_create_mapping(NULL, realirq, IRQ_TYPE_NONE);
+ return irq_create_mapping(NULL, realirq);
}
#endif /* CONFIG_PCI */
@@ -341,7 +341,7 @@
}
static int iseries_irq_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
set_irq_chip_and_handler(virq, &iseries_pic, handle_fasteoi_irq);
Index: linux-irq-work/arch/powerpc/platforms/pseries/ras.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/pseries/ras.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/pseries/ras.c 2006-07-08 11:06:34.000000000 +1000
@@ -93,8 +93,7 @@
for (i = 0; i < opicplen; i++) {
if (count > 15)
break;
- virqs[count] = irq_create_mapping(NULL, *(opicprop++),
- IRQ_TYPE_NONE);
+ virqs[count] = irq_create_mapping(NULL, *(opicprop++));
if (virqs[count] == NO_IRQ)
printk(KERN_ERR "Unable to allocate interrupt "
"number for %s\n", np->full_name);
Index: linux-irq-work/arch/powerpc/platforms/pseries/xics.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/platforms/pseries/xics.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/platforms/pseries/xics.c 2006-07-08 14:28:49.000000000 +1000
@@ -502,16 +502,9 @@
}
static int xics_host_map_direct(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
- unsigned int sense = flags & IRQ_TYPE_SENSE_MASK;
-
- pr_debug("xics: map_direct virq %d, hwirq 0x%lx, flags: 0x%x\n",
- virq, hw, flags);
-
- if (sense && sense != IRQ_TYPE_LEVEL_LOW)
- printk(KERN_WARNING "xics: using unsupported sense 0x%x"
- " for irq %d (h: 0x%lx)\n", flags, virq, hw);
+ pr_debug("xics: map_direct virq %d, hwirq 0x%lx\n", virq, hw);
get_irq_desc(virq)->status |= IRQ_LEVEL;
set_irq_chip_and_handler(virq, &xics_pic_direct, handle_fasteoi_irq);
@@ -519,16 +512,9 @@
}
static int xics_host_map_lpar(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
- unsigned int sense = flags & IRQ_TYPE_SENSE_MASK;
-
- pr_debug("xics: map_lpar virq %d, hwirq 0x%lx, flags: 0x%x\n",
- virq, hw, flags);
-
- if (sense && sense != IRQ_TYPE_LEVEL_LOW)
- printk(KERN_WARNING "xics: using unsupported sense 0x%x"
- " for irq %d (h: 0x%lx)\n", flags, virq, hw);
+ pr_debug("xics: map_lpar virq %d, hwirq 0x%lx\n", virq, hw);
get_irq_desc(virq)->status |= IRQ_LEVEL;
set_irq_chip_and_handler(virq, &xics_pic_lpar, handle_fasteoi_irq);
@@ -757,7 +743,7 @@
{
unsigned int ipi;
- ipi = irq_create_mapping(xics_host, XICS_IPI, 0);
+ ipi = irq_create_mapping(xics_host, XICS_IPI);
BUG_ON(ipi == NO_IRQ);
/*
@@ -795,7 +781,7 @@
return;
desc = get_irq_desc(ipi);
if (desc->chip && desc->chip->eoi)
- desc->chip->eoi(XICS_IPI);
+ desc->chip->eoi(ipi);
/*
* Some machines need to have at least one cpu in the GIQ,
Index: linux-irq-work/arch/powerpc/sysdev/i8259.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/sysdev/i8259.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/sysdev/i8259.c 2006-07-08 09:39:38.000000000 +1000
@@ -169,7 +169,7 @@
}
static int i8259_host_map(struct irq_host *h, unsigned int virq,
- irq_hw_number_t hw, unsigned int flags)
+ irq_hw_number_t hw)
{
pr_debug("i8259_host_map(%d, 0x%lx)\n", virq, hw);
@@ -177,7 +177,7 @@
if (hw == 2)
get_irq_desc(virq)->status |= IRQ_NOREQUEST;
- /* We use the level stuff only for now, we might want to
+ /* We use the level handler only for now, we might want to
* be more cautious here but that works for now
*/
get_irq_desc(virq)->status |= IRQ_LEVEL;
Index: linux-irq-work/drivers/char/hvsi.c
===================================================================
--- linux-irq-work.orig/drivers/char/hvsi.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/drivers/char/hvsi.c 2006-07-08 11:07:11.000000000 +1000
@@ -1299,7 +1299,7 @@
hp->inbuf_end = hp->inbuf;
hp->state = HVSI_CLOSED;
hp->vtermno = *vtermno;
- hp->virq = irq_create_mapping(NULL, irq[0], 0);
+ hp->virq = irq_create_mapping(NULL, irq[0]);
if (hp->virq == NO_IRQ) {
printk(KERN_ERR "%s: couldn't create irq mapping for 0x%x\n",
__FUNCTION__, irq[0]);
Index: linux-irq-work/arch/powerpc/kernel/pci_64.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/kernel/pci_64.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/kernel/pci_64.c 2006-07-08 15:31:31.000000000 +1000
@@ -21,13 +21,13 @@
#include <linux/mm.h>
#include <linux/list.h>
#include <linux/syscalls.h>
+#include <linux/irq.h>
#include <asm/processor.h>
#include <asm/io.h>
#include <asm/prom.h>
#include <asm/pci-bridge.h>
#include <asm/byteorder.h>
-#include <asm/irq.h>
#include <asm/machdep.h>
#include <asm/ppc-pci.h>
@@ -1289,15 +1289,31 @@
DBG("Try to map irq for %s...\n", pci_name(pci_dev));
+ /* Try to get a mapping from the device-tree */
if (of_irq_map_pci(pci_dev, &oirq)) {
- DBG(" -> failed !\n");
- return -1;
- }
+ u8 line;
- DBG(" -> got one, spec %d cells (0x%08x...) on %s\n",
- oirq.size, oirq.specifier[0], oirq.controller->full_name);
+ /* If that fails, let's fall back to what is in the config
+ * space and map that through the default controller. We
+ * also set the type to level low since that's what PCI
+ * interrupts are. If your platform does differently, then
+ * either provide a proper interrupt tree or don't use this
+ * function.
+ */
+ if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line))
+ return -1;
+ DBG(" -> failed ! Using irq line %d from PCI config\n", line);
+
+ virq = irq_create_mapping(NULL, line);
+ if (virq != NO_IRQ)
+ set_irq_type(virq, IRQ_TYPE_LEVEL_LOW);
+ } else {
+ DBG(" -> got one, spec %d cells (0x%08x...) on %s\n",
+ oirq.size, oirq.specifier[0], oirq.controller->full_name);
- virq = irq_create_of_mapping(oirq.controller, oirq.specifier, oirq.size);
+ virq = irq_create_of_mapping(oirq.controller, oirq.specifier,
+ oirq.size);
+ }
if(virq == NO_IRQ) {
DBG(" -> failed to map !\n");
return -1;
Index: linux-irq-work/arch/powerpc/kernel/pci_32.c
===================================================================
--- linux-irq-work.orig/arch/powerpc/kernel/pci_32.c 2006-07-04 09:37:03.000000000 +1000
+++ linux-irq-work/arch/powerpc/kernel/pci_32.c 2006-07-08 16:05:49.000000000 +1000
@@ -11,6 +11,7 @@
#include <linux/sched.h>
#include <linux/errno.h>
#include <linux/bootmem.h>
+#include <linux/irq.h>
#include <asm/processor.h>
#include <asm/io.h>
@@ -18,7 +19,6 @@
#include <asm/sections.h>
#include <asm/pci-bridge.h>
#include <asm/byteorder.h>
-#include <asm/irq.h>
#include <asm/uaccess.h>
#include <asm/machdep.h>
@@ -1420,15 +1420,31 @@
DBG("Try to map irq for %s...\n", pci_name(pci_dev));
+ /* Try to get a mapping from the device-tree */
if (of_irq_map_pci(pci_dev, &oirq)) {
- DBG(" -> failed !\n");
- return -1;
- }
+ u8 line;
- DBG(" -> got one, spec %d cells (0x%08x...) on %s\n",
- oirq.size, oirq.specifier[0], oirq.controller->full_name);
+ /* If that fails, let's fall back to what is in the config
+ * space and map that through the default controller. We
+ * also set the type to level low since that's what PCI
+ * interrupts are. If your platform does differently, then
+ * either provide a proper interrupt tree or don't use this
+ * function.
+ */
+ if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line))
+ return -1;
+ DBG(" -> failed ! Using irq line %d from PCI config\n", line);
- virq = irq_create_of_mapping(oirq.controller, oirq.specifier, oirq.size);
+ virq = irq_create_mapping(NULL, line);
+ if (virq != NO_IRQ)
+ set_irq_type(virq, IRQ_TYPE_LEVEL_LOW);
+ } else {
+ DBG(" -> got one, spec %d cells (0x%08x...) on %s\n",
+ oirq.size, oirq.specifier[0], oirq.controller->full_name);
+
+ virq = irq_create_of_mapping(oirq.controller, oirq.specifier,
+ oirq.size);
+ }
if(virq == NO_IRQ) {
DBG(" -> failed to map !\n");
return -1;
^ permalink raw reply
* [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8
From: Mel Gorman @ 2006-07-08 11:10 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
linux-kernel, linuxppc-dev
This is V8 of the patchset to size zones and memory holes in an
architecture-independent manner. The notable additions in this release are
accounting for mem_map as a memory hole, as it is not reclaimable, and the
optional accounting of the kernel image as a memory hole. This is to match
the existing behavior of x86_64.
Changelog since V7
o Rebase to 2.6.17-mm6
o Account for mem_map as a memory hole
o Adjust mem_map when arch independent zone-sizing is used and PFN 0 is in
a memory hole not accounted for by ARCH_PFN_OFFSET
Changelog since V6
o MAX_ACTIVE_REGIONS is really maximum active regions, not MAX_ACTIVE_REGIONS-1
o MAX_ACTIVE_REGIONS is 256 unless the architecture specifically asks for
a different number or MAX_NUMNODES is >= 32
o nr_nodemap_entries tracks the number of entries rather than terminating with
end_pfn == 0
o Add a number of documentation-related comments. Functions exposed by headers
may potentially be picked up by kerneldoc
o Changed misleading zone_present_pages_in_node() name to
zone_spanned_pages_in_node()
o Be a bit more verbose to help debugging when things go wrong.
o On x86_64, end_pfn_map now gets updated properly or ACPI tables get "lost"
o Signoffs added to patches 1 and 5 by Bob Picco related to contributions,
fixes and reviews
Changelog since V5
o Add a missing #include to mm/mem_init.c
o Drop the verbose debugging part of the set
o Report active range registration when loglevel is set for KERN_DEBUG
Changelog since V4
o Rebase to 2.6.17-rc3-mm1
o Calculate holes on x86 with SRAT correctly
Changelog since V3
o Rebase to 2.6.17-rc2
o Allow the active regions to be cleared. Needed by x86_64 when it decides
the SRAT table is bad half way through the registering of active regions
o Fix for flatmem x86_64 machines booting
Changelog since V2
o Fix a bug where holes in lower zones get double counted
o Catch the case where a new range is registered that is within an existing range
o Catch the case where a zone boundary is within a hole
o Use the EFI map for registering ranges on x86_64+numa
o On IA64+NUMA, add the active ranges before rounding for granules
o On x86_64, remove e820_hole_size and e820_bootmem_free and use
arch-independent equivalents
o On x86_64, remove the map walk in e820_end_of_ram()
o Rename memory_present_with_active_regions, name ambiguous
o Add absent_pages_in_range() for arches to call
Changelog since V1
o Correctly convert virtual and physical addresses to PFNs on ia64
o Correctly convert physical addresses to PFN on older ppc
o When add_active_range() is called with overlapping pfn ranges, merge them
o When a zone boundary occurs within a memory hole, account correctly
o Minor whitespace damage cleanup
o Debugging patch temporarily included
At a basic level, architectures define structures to record where active
ranges of page frames are located. Once located, the code to calculate
zone sizes and holes in each architecture is very similar. Some of this
zone and hole sizing code is difficult to read for no good reason. This
set of patches eliminates the similar-looking architecture-specific code.
The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range(). When all areas
have been discovered, free_area_init_nodes() is called to initialise
the pgdat and zones. The zone sizes and holes are then calculated in an
architecture independent manner.
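
To make the registration step concrete, here is a minimal userspace sketch
of the idea. It is not the code from patch 1: the model_* names and the
fixed table size are hypothetical, and the merge rule is simplified (a new
range is folded into the first same-node entry it touches or overlaps, and
entries are never merged with each other afterwards).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the early node map; not kernel code. */
#define MODEL_MAX_REGIONS 16

struct model_region {
	unsigned long start_pfn;	/* first PFN in the range */
	unsigned long end_pfn;		/* one past the last PFN */
	int nid;			/* node the range lives on */
};

static struct model_region model_map[MODEL_MAX_REGIONS];
static int model_nr_entries;

/*
 * Register [start_pfn, end_pfn) on node nid. If the new range touches or
 * overlaps an existing range on the same node, grow that entry instead of
 * using a new slot. Simplification: two existing entries bridged by a new
 * range are not merged with each other.
 */
static void model_add_active_range(int nid, unsigned long start_pfn,
				   unsigned long end_pfn)
{
	int i;

	for (i = 0; i < model_nr_entries; i++) {
		struct model_region *r = &model_map[i];

		if (r->nid != nid)
			continue;

		if (start_pfn <= r->end_pfn && end_pfn >= r->start_pfn) {
			if (start_pfn < r->start_pfn)
				r->start_pfn = start_pfn;
			if (end_pfn > r->end_pfn)
				r->end_pfn = end_pfn;
			return;
		}
	}

	/* No suitable neighbour: take a fresh slot */
	assert(model_nr_entries < MODEL_MAX_REGIONS);
	model_map[model_nr_entries].nid = nid;
	model_map[model_nr_entries].start_pfn = start_pfn;
	model_map[model_nr_entries].end_pfn = end_pfn;
	model_nr_entries++;
}

/* Sum of pages covered by all registered ranges, regardless of node */
static unsigned long model_present_pages(void)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < model_nr_entries; i++)
		total += model_map[i].end_pfn - model_map[i].start_pfn;

	return total;
}
```

Registering (0, 0, 100) followed by (0, 100, 200) leaves a single entry
covering PFNs 0-200 in this model, while a later (0, 300, 400) takes a
second slot because PFNs 200-300 are a hole.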
Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 134 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 142 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 78 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 57 arch-specific LOC removed
Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
It adjusts the watermarks slightly
I have successfully boot tested the patches and verified that the zones are
the correct size on
o x86, flatmem with 1.5GiB of RAM
o x86, NUMAQ
o x86 with SRAT CONFIG_NUMA=n
o PPC64, NUMA
o PPC64, CONFIG_NUMA=n
o PPC64, CONFIG_64BIT=N
o x86_64, NUMA with SRAT
o x86_64, NUMA with broken SRAT that falls back to k8topology discovery
o x86_64, CONFIG_NUMA=n
o x86_64, CONFIG_64=n
o x86_64, CONFIG_64=n, CONFIG_NUMA=n
o x86_64, ACPI_NUMA, ACPI_MEMORY_HOTPLUG && !SPARSEMEM to trigger the
hotadd path without sparsemem fun in srat.c (SRAT broken on test machine and
I'm pretty sure the machine does not support physical memory hotadd anyway
so test may not have been effective other than being a compile test.)
o ia64 (Itanium 2)
o ia64 (Itanium 2), CONFIG_64=N
Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig. Bob Picco has also tested and debugged
on IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine. These tests were run with release 3 of these patches against
2.6.17-rc1, but there have been no ia64 changes since release 3.
There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory
holes but the architecture-independent code accounts the memory as present.
The big benefit of this set of patches is the reduction of 411 lines of
architecture-specific code, some of which is very hairy. There should be
a greater net reduction when other architectures use the same mechanisms
for zone and hole sizing but I lack the hardware to test on.
Additional credit:
Dave Hansen for the initial suggestion and comments on early patches
Andy Whitcroft for reviewing early versions and catching numerous
errors
Tony Luck for testing and debugging on IA64
Bob Picco for fixing bugs related to pfn registration, reviewing a
number of patch revisions, providing a number of suggestions
on future direction and testing heavily
Jack Steiner and Robin Holt for testing on IA64 and clarifying
issues related to memory holes
Yasunori for testing on IA64
Andi Kleen for reviewing and feeding back about x86_64
Christian Kujau for providing valuable information related to ACPI
problems on x86_64 and testing potential fixes
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply
* [PATCH 1/6] Introduce mechanism for registering active regions of memory
From: Mel Gorman @ 2006-07-08 11:11 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
bob.picco, ak, linux-mm
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
This patch defines the structure to represent an active range of page
frames within a node in an architecture independent manner. Architectures
are expected to register active ranges of PFNs using add_active_range(nid,
start_pfn, end_pfn) and call free_area_init_nodes() passing the PFNs of
the end of each zone.
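
As an illustration of what the generic code can derive once ranges and zone
end PFNs are known, the following userspace sketch computes the pages a
zone spans on a node and the pages absent (in holes) within it. It is only
a model under stated assumptions: the struct and function names are
hypothetical, the ranges are assumed sorted and non-overlapping, and holes
are computed as spanned minus present rather than by walking the gaps as
the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one registered range: [start_pfn, end_pfn) */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Pages of zone [zone_start, zone_end) that fall inside the node's span */
static unsigned long spanned_pages(unsigned long node_start,
				   unsigned long node_end,
				   unsigned long zone_start,
				   unsigned long zone_end)
{
	if (zone_end <= node_start || zone_start >= node_end)
		return 0;

	return min_ul(zone_end, node_end) - max_ul(zone_start, node_start);
}

/* Pages in [lo, hi) that are backed by one of the n ranges in r[] */
static unsigned long present_pages(const struct pfn_range *r, size_t n,
				   unsigned long lo, unsigned long hi)
{
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = max_ul(r[i].start_pfn, lo);
		unsigned long e = min_ul(r[i].end_pfn, hi);

		if (s < e)
			present += e - s;
	}

	return present;
}

/* Hole pages in a zone on a node: absent = spanned - present */
static unsigned long absent_pages(const struct pfn_range *r, size_t n,
				  unsigned long zone_start,
				  unsigned long zone_end)
{
	unsigned long node_start, node_end, span;

	if (n == 0)
		return 0;

	/* r[] is sorted, so the node spans first start to last end */
	node_start = r[0].start_pfn;
	node_end = r[n - 1].end_pfn;

	span = spanned_pages(node_start, node_end, zone_start, zone_end);
	if (span == 0)
		return 0;

	return span - present_pages(r, n, max_ul(zone_start, node_start),
				    min_ul(zone_end, node_end));
}
```

With ranges 0-100 and 150-300 on a node, a zone covering PFNs 0-200 spans
200 pages, of which 50 (PFNs 100-150) are absent.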
include/linux/mm.h | 45 +++
include/linux/mmzone.h | 10
mm/page_alloc.c | 557 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 586 insertions(+), 26 deletions(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-clean/include/linux/mm.h linux-2.6.17-mm6-101-add_free_area_init_nodes/include/linux/mm.h
--- linux-2.6.17-mm6-clean/include/linux/mm.h 2006-07-05 14:31:17.000000000 +0100
+++ linux-2.6.17-mm6-101-add_free_area_init_nodes/include/linux/mm.h 2006-07-06 11:04:22.000000000 +0100
@@ -960,6 +960,51 @@ extern void free_area_init(unsigned long
extern void free_area_init_node(int nid, pg_data_t *pgdat,
unsigned long * zones_size, unsigned long zone_start_pfn,
unsigned long *zholes_size);
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/*
+ * With CONFIG_ARCH_POPULATES_NODE_MAP set, an architecture may initialise its
+ * zones, allocate the backing mem_map and account for memory holes in a more
+ * architecture independent manner. This is a substitute for creating the
+ * zone_sizes[] and zholes_size[] arrays and passing them to
+ * free_area_init_node()
+ *
+ * An architecture is expected to register ranges of page frames backed by
+ * physical memory with add_active_range() before calling
+ * free_area_init_nodes() passing in the PFN each zone ends at. In basic
+ * usage, an architecture is expected to do something like
+ *
+ * for_each_valid_physical_page_range()
+ * add_active_range(node_id, start_pfn, end_pfn)
+ * free_area_init_nodes(max_dma, max_dma32, max_normal_pfn, max_highmem_pfn);
+ *
+ * If the architecture guarantees that there are no holes in the ranges
+ * registered with add_active_range(), free_bootmem_with_active_regions()
+ * will call free_bootmem_node() for each registered physical page range.
+ * Similarly sparse_memory_present_with_active_regions() calls
+ * memory_present() for each range when SPARSEMEM is enabled.
+ *
+ * See mm/page_alloc.c for more information on each function exposed by
+ * CONFIG_ARCH_POPULATES_NODE_MAP
+ */
+extern void free_area_init_nodes(unsigned long max_dma_pfn,
+ unsigned long max_dma32_pfn,
+ unsigned long max_low_pfn,
+ unsigned long max_high_pfn);
+extern void add_active_range(unsigned int nid, unsigned long start_pfn,
+ unsigned long end_pfn);
+extern void shrink_active_range(unsigned int nid, unsigned long old_end_pfn,
+ unsigned long new_end_pfn);
+extern void remove_all_active_ranges(void);
+extern unsigned long absent_pages_in_range(unsigned long start_pfn,
+ unsigned long end_pfn);
+extern void get_pfn_range_for_nid(unsigned int nid,
+ unsigned long *start_pfn, unsigned long *end_pfn);
+extern unsigned long find_min_pfn_with_active_regions(void);
+extern unsigned long find_max_pfn_with_active_regions(void);
+extern void free_bootmem_with_active_regions(int nid,
+ unsigned long max_low_pfn);
+extern void sparse_memory_present_with_active_regions(int nid);
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
extern void setup_per_zone_pages_min(void);
extern void mem_init(void);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-clean/include/linux/mmzone.h linux-2.6.17-mm6-101-add_free_area_init_nodes/include/linux/mmzone.h
--- linux-2.6.17-mm6-clean/include/linux/mmzone.h 2006-07-05 14:31:17.000000000 +0100
+++ linux-2.6.17-mm6-101-add_free_area_init_nodes/include/linux/mmzone.h 2006-07-06 11:04:22.000000000 +0100
@@ -293,6 +293,13 @@ struct zonelist {
struct zone *zones[MAX_NUMNODES * MAX_NR_ZONES + 1]; // NULL delimited
};
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+struct node_active_region {
+ unsigned long start_pfn;
+ unsigned long end_pfn;
+ int nid;
+};
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
/*
* The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
@@ -493,7 +500,8 @@ extern struct zone *next_zone(struct zon
#endif
-#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+#if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \
+ !defined(CONFIG_ARCH_POPULATES_NODE_MAP)
#define early_pfn_to_nid(nid) (0UL)
#endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-clean/mm/page_alloc.c linux-2.6.17-mm6-101-add_free_area_init_nodes/mm/page_alloc.c
--- linux-2.6.17-mm6-clean/mm/page_alloc.c 2006-07-05 14:31:18.000000000 +0100
+++ linux-2.6.17-mm6-101-add_free_area_init_nodes/mm/page_alloc.c 2006-07-06 21:44:44.000000000 +0100
@@ -37,6 +37,8 @@
#include <linux/vmalloc.h>
#include <linux/mempolicy.h>
#include <linux/stop_machine.h>
+#include <linux/sort.h>
+#include <linux/pfn.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -86,6 +88,33 @@ int min_free_kbytes = 1024;
unsigned long __meminitdata nr_kernel_pages;
unsigned long __meminitdata nr_all_pages;
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+ /*
+ * MAX_ACTIVE_REGIONS determines the maximum number of distinct
+ * ranges of memory (RAM) that may be registered with add_active_range().
+ * Ranges passed to add_active_range() will be merged if possible
+ * so the number of times add_active_range() can be called is
+ * related to the number of nodes and the number of holes
+ */
+ #ifdef CONFIG_MAX_ACTIVE_REGIONS
+ /* Allow an architecture to set MAX_ACTIVE_REGIONS to save memory */
+ #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
+ #else
+ #if MAX_NUMNODES >= 32
+ /* If there can be many nodes, allow up to 50 holes per node */
+ #define MAX_ACTIVE_REGIONS (MAX_NUMNODES*50)
+ #else
+ /* By default, allow up to 256 distinct regions */
+ #define MAX_ACTIVE_REGIONS 256
+ #endif
+ #endif
+
+ struct node_active_region __initdata early_node_map[MAX_ACTIVE_REGIONS];
+ int __initdata nr_nodemap_entries;
+ unsigned long __initdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
+ unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
#ifdef CONFIG_DEBUG_VM
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
@@ -1728,25 +1757,6 @@ static inline unsigned long wait_table_b
#define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
-static void __init calculate_zone_totalpages(struct pglist_data *pgdat,
- unsigned long *zones_size, unsigned long *zholes_size)
-{
- unsigned long realtotalpages, totalpages = 0;
- int i;
-
- for (i = 0; i < MAX_NR_ZONES; i++)
- totalpages += zones_size[i];
- pgdat->node_spanned_pages = totalpages;
-
- realtotalpages = totalpages;
- if (zholes_size)
- for (i = 0; i < MAX_NR_ZONES; i++)
- realtotalpages -= zholes_size[i];
- pgdat->node_present_pages = realtotalpages;
- printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
-}
-
-
/*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
@@ -2064,6 +2074,272 @@ __meminit int init_currently_empty_zone(
return 0;
}
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/*
+ * Basic iterator support. Return the first range of PFNs for a node
+ * Note: nid == MAX_NUMNODES returns first region regardless of node
+ */
+static int __init first_active_region_index_in_nid(int nid)
+{
+ int i;
+
+ for (i = 0; i < nr_nodemap_entries; i++)
+ if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
+ return i;
+
+ return -1;
+}
+
+/*
+ * Basic iterator support. Return the next active range of PFNs for a node
+ * Note: nid == MAX_NUMNODES returns next region regardless of node
+ */
+static int __init next_active_region_index_in_nid(int index, int nid)
+{
+ for (index = index + 1; index < nr_nodemap_entries; index++)
+ if (nid == MAX_NUMNODES || early_node_map[index].nid == nid)
+ return index;
+
+ return -1;
+}
+
+#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+/*
+ * Required by SPARSEMEM. Given a PFN, return what node the PFN is on.
+ * Architectures may implement their own version but if add_active_range()
+ * was used and there are no special requirements, this is a convenient
+ * alternative
+ */
+int __init early_pfn_to_nid(unsigned long pfn)
+{
+ int i;
+
+ for (i = 0; i < nr_nodemap_entries; i++) {
+ unsigned long start_pfn = early_node_map[i].start_pfn;
+ unsigned long end_pfn = early_node_map[i].end_pfn;
+
+ if (start_pfn <= pfn && pfn < end_pfn)
+ return early_node_map[i].nid;
+ }
+
+ return 0;
+}
+#endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
+
+/* Basic iterator support to walk early_node_map[] */
+#define for_each_active_range_index_in_nid(i, nid) \
+ for (i = first_active_region_index_in_nid(nid); i != -1; \
+ i = next_active_region_index_in_nid(i, nid))
+
+/**
+ * free_bootmem_with_active_regions - Call free_bootmem_node for each active range
+ * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed
+ * @max_low_pfn: The highest PFN that will be passed to free_bootmem_node
+ *
+ * If an architecture guarantees that all ranges registered with
+ * add_active_range() contain no holes and may be freed, this
+ * function may be used instead of calling free_bootmem() manually.
+ */
+void __init free_bootmem_with_active_regions(int nid,
+ unsigned long max_low_pfn)
+{
+ int i;
+
+ for_each_active_range_index_in_nid(i, nid) {
+ unsigned long size_pages = 0;
+ unsigned long end_pfn = early_node_map[i].end_pfn;
+
+ if (early_node_map[i].start_pfn >= max_low_pfn)
+ continue;
+
+ if (end_pfn > max_low_pfn)
+ end_pfn = max_low_pfn;
+
+ size_pages = end_pfn - early_node_map[i].start_pfn;
+ free_bootmem_node(NODE_DATA(early_node_map[i].nid),
+ PFN_PHYS(early_node_map[i].start_pfn),
+ size_pages << PAGE_SHIFT);
+ }
+}
+
+/**
+ * sparse_memory_present_with_active_regions - Call memory_present for each active range
+ * @nid: The node to call memory_present for. If MAX_NUMNODES, all nodes will be used
+ *
+ * If an architecture guarantees that all ranges registered with
+ * add_active_range() contain no holes and may be freed, this
+ * function may be used instead of calling memory_present() manually.
+ */
+void __init sparse_memory_present_with_active_regions(int nid)
+{
+ int i;
+
+ for_each_active_range_index_in_nid(i, nid)
+ memory_present(early_node_map[i].nid,
+ early_node_map[i].start_pfn,
+ early_node_map[i].end_pfn);
+}
+
+/**
+ * get_pfn_range_for_nid - Return the start and end page frames for a node
+ * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned
+ * @start_pfn: Passed by reference. On return, it will have the node start_pfn
+ * @end_pfn: Passed by reference. On return, it will have the node end_pfn
+ *
+ * It returns the start and end page frame of a node based on information
+ * provided by an arch calling add_active_range(). If called for a node
+ * with no available memory, a warning is printed and the start and end
+ * PFNs will be 0
+ */
+void __init get_pfn_range_for_nid(unsigned int nid,
+ unsigned long *start_pfn, unsigned long *end_pfn)
+{
+ int i;
+ *start_pfn = -1UL;
+ *end_pfn = 0;
+
+ for_each_active_range_index_in_nid(i, nid) {
+ *start_pfn = min(*start_pfn, early_node_map[i].start_pfn);
+ *end_pfn = max(*end_pfn, early_node_map[i].end_pfn);
+ }
+
+ if (*start_pfn == -1UL) {
+ printk(KERN_WARNING "Node %u active with no memory\n", nid);
+ *start_pfn = 0;
+ }
+}
+
+/*
+ * Return the number of pages a zone spans in a node, including holes
+ * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node()
+ */
+unsigned long __init zone_spanned_pages_in_node(int nid,
+ unsigned long zone_type,
+ unsigned long *ignored)
+{
+ unsigned long node_start_pfn, node_end_pfn;
+ unsigned long zone_start_pfn, zone_end_pfn;
+
+ /* Get the start and end of the node and zone */
+ get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
+ zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
+ zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+
+ /* Check that this node has pages within the zone's required range */
+ if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
+ return 0;
+
+ /* Move the zone boundaries inside the node if necessary */
+ zone_end_pfn = min(zone_end_pfn, node_end_pfn);
+ zone_start_pfn = max(zone_start_pfn, node_start_pfn);
+
+ /* Return the spanned pages */
+ return zone_end_pfn - zone_start_pfn;
+}
+
+/*
+ * Return the number of pages in holes in a range on a node. If nid is
+ * MAX_NUMNODES, then all holes in the requested range will be accounted for
+ */
+unsigned long __init __absent_pages_in_range(int nid,
+ unsigned long range_start_pfn,
+ unsigned long range_end_pfn)
+{
+ int i = 0;
+ unsigned long prev_end_pfn = 0, hole_pages = 0;
+ unsigned long start_pfn;
+
+ /* Find the end_pfn of the first active range of pfns in the node */
+ i = first_active_region_index_in_nid(nid);
+ if (i == -1)
+ return 0;
+
+ prev_end_pfn = early_node_map[i].start_pfn;
+
+ /* Find all holes for the zone within the node */
+ for (; i != -1; i = next_active_region_index_in_nid(i, nid)) {
+
+ /* No need to continue if prev_end_pfn is outside the zone */
+ if (prev_end_pfn >= range_end_pfn)
+ break;
+
+ /* Make sure the end of the zone is not within the hole */
+ start_pfn = min(early_node_map[i].start_pfn, range_end_pfn);
+ prev_end_pfn = max(prev_end_pfn, range_start_pfn);
+
+ /* Update the hole size count and move on */
+ if (start_pfn > range_start_pfn) {
+ BUG_ON(prev_end_pfn > start_pfn);
+ hole_pages += start_pfn - prev_end_pfn;
+ }
+ prev_end_pfn = early_node_map[i].end_pfn;
+ }
+
+ return hole_pages;
+}
+
+/**
+ * absent_pages_in_range - Return number of page frames in holes within a range
+ * @start_pfn: The start PFN to start searching for holes
+ * @end_pfn: The end PFN to stop searching for holes
+ *
+ * It returns the number of page frames in memory holes within a range
+ */
+unsigned long __init absent_pages_in_range(unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
+}
+
+/* Return the number of page frames in holes in a zone on a node */
+unsigned long __init zone_absent_pages_in_node(int nid,
+ unsigned long zone_type,
+ unsigned long *ignored)
+{
+ return __absent_pages_in_range(nid,
+ arch_zone_lowest_possible_pfn[zone_type],
+ arch_zone_highest_possible_pfn[zone_type]);
+}
+#else
+static inline unsigned long zone_spanned_pages_in_node(int nid,
+ unsigned long zone_type,
+ unsigned long *zones_size)
+{
+ return zones_size[zone_type];
+}
+
+static inline unsigned long zone_absent_pages_in_node(int nid,
+ unsigned long zone_type,
+ unsigned long *zholes_size)
+{
+ if (!zholes_size)
+ return 0;
+
+ return zholes_size[zone_type];
+}
+#endif
+
+static void __init calculate_node_totalpages(struct pglist_data *pgdat,
+ unsigned long *zones_size, unsigned long *zholes_size)
+{
+ unsigned long realtotalpages, totalpages = 0;
+ int i;
+
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ totalpages += zone_spanned_pages_in_node(pgdat->node_id, i,
+ zones_size);
+ pgdat->node_spanned_pages = totalpages;
+
+ realtotalpages = totalpages;
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ realtotalpages -=
+ zone_absent_pages_in_node(pgdat->node_id, i,
+ zholes_size);
+ pgdat->node_present_pages = realtotalpages;
+ printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id,
+ realtotalpages);
+}
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -2087,10 +2363,9 @@ static void __meminit free_area_init_cor
struct zone *zone = pgdat->node_zones + j;
unsigned long size, realsize;
- realsize = size = zones_size[j];
- if (zholes_size)
- realsize -= zholes_size[j];
-
+ size = zone_spanned_pages_in_node(nid, j, zones_size);
+ realsize = size - zone_absent_pages_in_node(nid, j,
+ zholes_size);
if (j < ZONE_HIGHMEM)
nr_kernel_pages += realsize;
nr_all_pages += realsize;
@@ -2159,8 +2434,13 @@ static void __init alloc_node_mem_map(st
/*
* With no DISCONTIG, the global mem_map is just set as node 0's
*/
- if (pgdat == NODE_DATA(0))
+ if (pgdat == NODE_DATA(0)) {
mem_map = NODE_DATA(0)->node_mem_map;
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+ if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
+ mem_map -= pgdat->node_start_pfn;
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+ }
#endif
#endif /* CONFIG_FLAT_NODE_MEM_MAP */
}
@@ -2171,13 +2451,240 @@ void __meminit free_area_init_node(int n
{
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
- calculate_zone_totalpages(pgdat, zones_size, zholes_size);
+ calculate_node_totalpages(pgdat, zones_size, zholes_size);
alloc_node_mem_map(pgdat);
free_area_init_core(pgdat, zones_size, zholes_size);
}
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/**
+ * add_active_range - Register a range of PFNs backed by physical memory
+ * @nid: The node ID the range resides on
+ * @start_pfn: The start PFN of the available physical memory
+ * @end_pfn: The end PFN of the available physical memory
+ *
+ * These ranges are stored in an early_node_map[] and later used by
+ * free_area_init_nodes() to calculate zone sizes and holes. If the
+ * range spans a memory hole, it is up to the architecture to ensure
+ * the memory is not freed by the bootmem allocator. If possible
+ * the range being registered will be merged with existing ranges.
+ */
+void __init add_active_range(unsigned int nid, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ int i;
+
+ printk(KERN_DEBUG "Entering add_active_range(%d, %lu, %lu) "
+ "%d entries of %d used\n",
+ nid, start_pfn, end_pfn,
+ nr_nodemap_entries, MAX_ACTIVE_REGIONS);
+
+ /* Merge with existing active regions if possible */
+ for (i = 0; i < nr_nodemap_entries; i++) {
+ if (early_node_map[i].nid != nid)
+ continue;
+
+ /* Skip if an existing region covers this new one */
+ if (start_pfn >= early_node_map[i].start_pfn &&
+ end_pfn <= early_node_map[i].end_pfn)
+ return;
+
+ /* Merge forward if suitable */
+ if (start_pfn <= early_node_map[i].end_pfn &&
+ end_pfn > early_node_map[i].end_pfn) {
+ early_node_map[i].end_pfn = end_pfn;
+ return;
+ }
+
+ /* Merge backward if suitable */
+ if (start_pfn < early_node_map[i].end_pfn &&
+ end_pfn >= early_node_map[i].start_pfn) {
+ early_node_map[i].start_pfn = start_pfn;
+ return;
+ }
+ }
+
+ /* Check that early_node_map is large enough */
+ if (i >= MAX_ACTIVE_REGIONS) {
+ printk(KERN_CRIT "More than %d memory regions, truncating\n",
+ MAX_ACTIVE_REGIONS);
+ return;
+ }
+
+ early_node_map[i].nid = nid;
+ early_node_map[i].start_pfn = start_pfn;
+ early_node_map[i].end_pfn = end_pfn;
+ nr_nodemap_entries = i + 1;
+}
+
+/**
+ * shrink_active_range - Shrink an existing registered range of PFNs
+ * @nid: The node id the range is on that should be shrunk
+ * @old_end_pfn: The old end PFN of the range
+ * @new_end_pfn: The new end PFN of the range
+ *
+ * i386 with NUMA uses alloc_remap() to store a node_mem_map on a local node.
+ * The map is kept at the end of the physical page range that has already been
+ * registered with add_active_range(). This function allows an arch to shrink
+ * an existing registered range.
+ */
+void __init shrink_active_range(unsigned int nid, unsigned long old_end_pfn,
+ unsigned long new_end_pfn)
+{
+ int i;
+
+ /* Find the old active region end and shrink */
+ for_each_active_range_index_in_nid(i, nid)
+ if (early_node_map[i].end_pfn == old_end_pfn) {
+ early_node_map[i].end_pfn = new_end_pfn;
+ break;
+ }
+}
+
+/**
+ * remove_all_active_ranges - Remove all currently registered regions
+ * During discovery, it may be found that a table like SRAT is invalid
+ * and an alternative discovery method must be used. This function removes
+ * all currently registered regions.
+ */
+void __init remove_all_active_ranges(void)
+{
+ memset(early_node_map, 0, sizeof(early_node_map));
+ nr_nodemap_entries = 0;
+}
+
+/* Compare two active node_active_regions */
+static int __init cmp_node_active_region(const void *a, const void *b)
+{
+ struct node_active_region *arange = (struct node_active_region *)a;
+ struct node_active_region *brange = (struct node_active_region *)b;
+
+ /* Done this way to avoid overflows */
+ if (arange->start_pfn > brange->start_pfn)
+ return 1;
+ if (arange->start_pfn < brange->start_pfn)
+ return -1;
+
+ return 0;
+}
+
+/* sort the node_map by start_pfn */
+static void __init sort_node_map(void)
+{
+ sort(early_node_map, (size_t)nr_nodemap_entries,
+ sizeof(struct node_active_region),
+ cmp_node_active_region, NULL);
+}
+
+/* Find the lowest pfn for a node. This depends on a sorted early_node_map */
+unsigned long __init find_min_pfn_for_node(unsigned long nid)
+{
+ int i;
+
+ /* Assuming a sorted map, the first range found has the starting pfn */
+ for_each_active_range_index_in_nid(i, nid)
+ return early_node_map[i].start_pfn;
+
+ printk(KERN_WARNING "Could not find start_pfn for node %lu\n", nid);
+ return 0;
+}
+
+/**
+ * find_min_pfn_with_active_regions - Find the minimum PFN registered
+ *
+ * It returns the minimum PFN based on information provided via
+ * add_active_range()
+ */
+unsigned long __init find_min_pfn_with_active_regions(void)
+{
+ return find_min_pfn_for_node(MAX_NUMNODES);
+}
+
+/**
+ * find_max_pfn_with_active_regions - Find the maximum PFN registered
+ *
+ * It returns the maximum PFN based on information provided via
+ * add_active_range()
+ */
+unsigned long __init find_max_pfn_with_active_regions(void)
+{
+ int i;
+ unsigned long max_pfn = 0;
+
+ for (i = 0; i < nr_nodemap_entries; i++)
+ max_pfn = max(max_pfn, early_node_map[i].end_pfn);
+
+ return max_pfn;
+}
+
+/**
+ * free_area_init_nodes - Initialise all pg_data_t and zone data
+ * @arch_max_dma_pfn: The maximum PFN usable for ZONE_DMA
+ * @arch_max_dma32_pfn: The maximum PFN usable for ZONE_DMA32
+ * @arch_max_low_pfn: The maximum PFN usable for ZONE_NORMAL
+ * @arch_max_high_pfn: The maximum PFN usable for ZONE_HIGHMEM
+ *
+ * This will call free_area_init_node() for each active node in the system.
+ * Using the page ranges provided by add_active_range(), the size of each
+ * zone in each node and their holes is calculated. If the maximum PFNs
+ * of two adjacent zones match, it is assumed that the higher zone is empty.
+ * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed
+ * that ZONE_DMA32 has no pages. It is also assumed that a zone
+ * starts where the previous one ended. For example, ZONE_DMA32 starts
+ * at arch_max_dma_pfn.
+ */
+void __init free_area_init_nodes(unsigned long arch_max_dma_pfn,
+ unsigned long arch_max_dma32_pfn,
+ unsigned long arch_max_low_pfn,
+ unsigned long arch_max_high_pfn)
+{
+ unsigned long nid;
+ int i;
+
+ /* Record where the zone boundaries are */
+ memset(arch_zone_lowest_possible_pfn, 0,
+ sizeof(arch_zone_lowest_possible_pfn));
+ memset(arch_zone_highest_possible_pfn, 0,
+ sizeof(arch_zone_highest_possible_pfn));
+ arch_zone_lowest_possible_pfn[ZONE_DMA] =
+ find_min_pfn_with_active_regions();
+ arch_zone_highest_possible_pfn[ZONE_DMA] = arch_max_dma_pfn;
+ arch_zone_highest_possible_pfn[ZONE_DMA32] = arch_max_dma32_pfn;
+ arch_zone_highest_possible_pfn[ZONE_NORMAL] = arch_max_low_pfn;
+ arch_zone_highest_possible_pfn[ZONE_HIGHMEM] = arch_max_high_pfn;
+ for (i = 1; i < MAX_NR_ZONES; i++)
+ arch_zone_lowest_possible_pfn[i] =
+ arch_zone_highest_possible_pfn[i-1];
+
+ /* Regions in the early_node_map can be in any order */
+ sort_node_map();
+
+ /* Print out the zone ranges */
+ printk("Zone PFN ranges:\n");
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ printk(" %-8s %8lu -> %8lu\n",
+ zone_names[i],
+ arch_zone_lowest_possible_pfn[i],
+ arch_zone_highest_possible_pfn[i]);
+
+ /* Print out the early_node_map[] */
+ printk("early_node_map[%d] active PFN ranges\n", nr_nodemap_entries);
+ for (i = 0; i < nr_nodemap_entries; i++)
+ printk(" %3d: %8lu -> %8lu\n", early_node_map[i].nid,
+ early_node_map[i].start_pfn,
+ early_node_map[i].end_pfn);
+
+ /* Initialise every node */
+ for_each_online_node(nid) {
+ pg_data_t *pgdat = NODE_DATA(nid);
+ free_area_init_node(nid, pgdat, NULL,
+ find_min_pfn_for_node(nid), NULL);
+ }
+}
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
#ifndef CONFIG_NEED_MULTIPLE_NODES
static bootmem_data_t contig_bootmem_data;
struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };
* [PATCH 2/6] Have Power use add_active_range() and free_area_init_nodes()
From: Mel Gorman @ 2006-07-08 11:11 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
linux-kernel, linuxppc-dev
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
Size zones and holes in an architecture-independent manner for Power.
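As a rough user-space illustration of the bookkeeping this conversion relies on (not code from this patch): each byte-addressed memory region becomes a half-open PFN range, and the hole size falls out as spanned pages minus present pages. All sketch_* names and the 4 KiB page size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for one registered range: [start_pfn, end_pfn) */
struct sketch_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Convert a byte-addressed region into a half-open PFN range. */
static struct sketch_range sketch_region_to_range(unsigned long base,
						  unsigned long size)
{
	struct sketch_range r;

	r.start_pfn = base >> SKETCH_PAGE_SHIFT;
	r.end_pfn = r.start_pfn + (size >> SKETCH_PAGE_SHIFT);
	return r;
}

/* Pages between the lowest start and the highest end, holes included. */
static unsigned long sketch_spanned_pages(const struct sketch_range *r,
					  size_t n)
{
	unsigned long lo, hi;
	size_t i;

	assert(r != NULL && n > 0);
	lo = r[0].start_pfn;
	hi = r[0].end_pfn;
	for (i = 1; i < n; i++) {
		if (r[i].start_pfn < lo)
			lo = r[i].start_pfn;
		if (r[i].end_pfn > hi)
			hi = r[i].end_pfn;
	}
	return hi - lo;
}

/* Pages actually backed by memory; assumes the ranges do not overlap. */
static unsigned long sketch_present_pages(const struct sketch_range *r,
					  size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += r[i].end_pfn - r[i].start_pfn;
	return total;
}

/* Hole size is whatever the span covers that no range backs. */
static unsigned long sketch_absent_pages(const struct sketch_range *r,
					 size_t n)
{
	return sketch_spanned_pages(r, n) - sketch_present_pages(r, n);
}
```

This mirrors the old "top_of_ram - total_ram" hole calculation, but derives it from the registered ranges rather than from per-arch arithmetic.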
powerpc/Kconfig | 7 --
powerpc/mm/mem.c | 53 ++++++----------
powerpc/mm/numa.c | 157 ++++---------------------------------------------
ppc/Kconfig | 3
ppc/mm/init.c | 26 ++++----
5 files changed, 56 insertions(+), 190 deletions(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/Kconfig linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/Kconfig
--- linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/Kconfig 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/Kconfig 2006-07-06 11:06:11.000000000 +0100
@@ -715,11 +715,10 @@ config ARCH_SPARSEMEM_DEFAULT
def_bool y
depends on SMP && PPC_PSERIES
-source "mm/Kconfig"
-
-config HAVE_ARCH_EARLY_PFN_TO_NID
+config ARCH_POPULATES_NODE_MAP
def_bool y
- depends on NEED_MULTIPLE_NODES
+
+source "mm/Kconfig"
config ARCH_MEMORY_PROBE
def_bool y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c
--- linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c 2006-07-06 11:06:11.000000000 +0100
@@ -256,20 +256,22 @@ void __init do_init_bootmem(void)
boot_mapsize = init_bootmem(start >> PAGE_SHIFT, total_pages);
+ /* Add active regions with valid PFNs */
+ for (i = 0; i < lmb.memory.cnt; i++) {
+ unsigned long start_pfn, end_pfn;
+ start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+ end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+ add_active_range(0, start_pfn, end_pfn);
+ }
+
/* Add all physical memory to the bootmem map, mark each area
* present.
*/
- for (i = 0; i < lmb.memory.cnt; i++) {
- unsigned long base = lmb.memory.region[i].base;
- unsigned long size = lmb_size_bytes(&lmb.memory, i);
#ifdef CONFIG_HIGHMEM
- if (base >= total_lowmem)
- continue;
- if (base + size > total_lowmem)
- size = total_lowmem - base;
+ free_bootmem_with_active_regions(0, total_lowmem >> PAGE_SHIFT);
+#else
+ free_bootmem_with_active_regions(0, max_pfn);
#endif
- free_bootmem(base, size);
- }
/* reserve the sections we're already using */
for (i = 0; i < lmb.reserved.cnt; i++)
@@ -277,9 +279,8 @@ void __init do_init_bootmem(void)
lmb_size_bytes(&lmb.reserved, i));
/* XXX need to clip this if using highmem? */
- for (i = 0; i < lmb.memory.cnt; i++)
- memory_present(0, lmb_start_pfn(&lmb.memory, i),
- lmb_end_pfn(&lmb.memory, i));
+ sparse_memory_present_with_active_regions(0);
+
init_bootmem_done = 1;
}
@@ -288,8 +289,6 @@ void __init do_init_bootmem(void)
*/
void __init paging_init(void)
{
- unsigned long zones_size[MAX_NR_ZONES];
- unsigned long zholes_size[MAX_NR_ZONES];
unsigned long total_ram = lmb_phys_mem_size();
unsigned long top_of_ram = lmb_end_of_DRAM();
@@ -307,26 +306,18 @@ void __init paging_init(void)
top_of_ram, total_ram);
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
(top_of_ram - total_ram) >> 20);
- /*
- * All pages are DMA-able so we put them all in the DMA zone.
- */
- memset(zones_size, 0, sizeof(zones_size));
- memset(zholes_size, 0, sizeof(zholes_size));
-
- zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
- zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-
#ifdef CONFIG_HIGHMEM
- zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
- zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
- zholes_size[ZONE_HIGHMEM] = (top_of_ram - total_ram) >> PAGE_SHIFT;
+ free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+ total_lowmem >> PAGE_SHIFT,
+ total_lowmem >> PAGE_SHIFT,
+ top_of_ram >> PAGE_SHIFT);
#else
- zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
- zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-#endif /* CONFIG_HIGHMEM */
+ free_area_init_nodes(top_of_ram >> PAGE_SHIFT,
+ top_of_ram >> PAGE_SHIFT,
+ top_of_ram >> PAGE_SHIFT,
+ top_of_ram >> PAGE_SHIFT);
+#endif
- free_area_init_node(0, NODE_DATA(0), zones_size,
- __pa(PAGE_OFFSET) >> PAGE_SHIFT, zholes_size);
}
#endif /* ! CONFIG_NEED_MULTIPLE_NODES */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c
--- linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c 2006-07-06 11:06:11.000000000 +0100
@@ -39,96 +39,6 @@ static bootmem_data_t __initdata plat_no
static int min_common_depth;
static int n_mem_addr_cells, n_mem_size_cells;
-/*
- * We need somewhere to store start/end/node for each region until we have
- * allocated the real node_data structures.
- */
-#define MAX_REGIONS (MAX_LMB_REGIONS*2)
-static struct {
- unsigned long start_pfn;
- unsigned long end_pfn;
- int nid;
-} init_node_data[MAX_REGIONS] __initdata;
-
-int __init early_pfn_to_nid(unsigned long pfn)
-{
- unsigned int i;
-
- for (i = 0; init_node_data[i].end_pfn; i++) {
- unsigned long start_pfn = init_node_data[i].start_pfn;
- unsigned long end_pfn = init_node_data[i].end_pfn;
-
- if ((start_pfn <= pfn) && (pfn < end_pfn))
- return init_node_data[i].nid;
- }
-
- return -1;
-}
-
-void __init add_region(unsigned int nid, unsigned long start_pfn,
- unsigned long pages)
-{
- unsigned int i;
-
- dbg("add_region nid %d start_pfn 0x%lx pages 0x%lx\n",
- nid, start_pfn, pages);
-
- for (i = 0; init_node_data[i].end_pfn; i++) {
- if (init_node_data[i].nid != nid)
- continue;
- if (init_node_data[i].end_pfn == start_pfn) {
- init_node_data[i].end_pfn += pages;
- return;
- }
- if (init_node_data[i].start_pfn == (start_pfn + pages)) {
- init_node_data[i].start_pfn -= pages;
- return;
- }
- }
-
- /*
- * Leave last entry NULL so we dont iterate off the end (we use
- * entry.end_pfn to terminate the walk).
- */
- if (i >= (MAX_REGIONS - 1)) {
- printk(KERN_ERR "WARNING: too many memory regions in "
- "numa code, truncating\n");
- return;
- }
-
- init_node_data[i].start_pfn = start_pfn;
- init_node_data[i].end_pfn = start_pfn + pages;
- init_node_data[i].nid = nid;
-}
-
-/* We assume init_node_data has no overlapping regions */
-void __init get_region(unsigned int nid, unsigned long *start_pfn,
- unsigned long *end_pfn, unsigned long *pages_present)
-{
- unsigned int i;
-
- *start_pfn = -1UL;
- *end_pfn = *pages_present = 0;
-
- for (i = 0; init_node_data[i].end_pfn; i++) {
- if (init_node_data[i].nid != nid)
- continue;
-
- *pages_present += init_node_data[i].end_pfn -
- init_node_data[i].start_pfn;
-
- if (init_node_data[i].start_pfn < *start_pfn)
- *start_pfn = init_node_data[i].start_pfn;
-
- if (init_node_data[i].end_pfn > *end_pfn)
- *end_pfn = init_node_data[i].end_pfn;
- }
-
- /* We didnt find a matching region, return start/end as 0 */
- if (*start_pfn == -1UL)
- *start_pfn = 0;
-}
-
static void __cpuinit map_cpu_to_node(int cpu, int node)
{
numa_cpu_lookup_table[cpu] = node;
@@ -471,8 +381,8 @@ new_range:
continue;
}
- add_region(nid, start >> PAGE_SHIFT,
- size >> PAGE_SHIFT);
+ add_active_range(nid, start >> PAGE_SHIFT,
+ (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
if (--ranges)
goto new_range;
@@ -485,6 +395,7 @@ static void __init setup_nonnuma(void)
{
unsigned long top_of_ram = lmb_end_of_DRAM();
unsigned long total_ram = lmb_phys_mem_size();
+ unsigned long start_pfn, end_pfn;
unsigned int i;
printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
@@ -492,9 +403,11 @@ static void __init setup_nonnuma(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
(top_of_ram - total_ram) >> 20);
- for (i = 0; i < lmb.memory.cnt; ++i)
- add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT,
- lmb_size_pages(&lmb.memory, i));
+ for (i = 0; i < lmb.memory.cnt; ++i) {
+ start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+ end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+ add_active_range(0, start_pfn, end_pfn);
+ }
node_set_online(0);
}
@@ -633,11 +546,11 @@ void __init do_init_bootmem(void)
(void *)(unsigned long)boot_cpuid);
for_each_online_node(nid) {
- unsigned long start_pfn, end_pfn, pages_present;
+ unsigned long start_pfn, end_pfn;
unsigned long bootmem_paddr;
unsigned long bootmap_pages;
- get_region(nid, &start_pfn, &end_pfn, &pages_present);
+ get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
/* Allocate the node structure node local if possible */
NODE_DATA(nid) = careful_allocation(nid,
@@ -670,19 +583,7 @@ void __init do_init_bootmem(void)
init_bootmem_node(NODE_DATA(nid), bootmem_paddr >> PAGE_SHIFT,
start_pfn, end_pfn);
- /* Add free regions on this node */
- for (i = 0; init_node_data[i].end_pfn; i++) {
- unsigned long start, end;
-
- if (init_node_data[i].nid != nid)
- continue;
-
- start = init_node_data[i].start_pfn << PAGE_SHIFT;
- end = init_node_data[i].end_pfn << PAGE_SHIFT;
-
- dbg("free_bootmem %lx %lx\n", start, end - start);
- free_bootmem_node(NODE_DATA(nid), start, end - start);
- }
+ free_bootmem_with_active_regions(nid, end_pfn);
/* Mark reserved regions on this node */
for (i = 0; i < lmb.reserved.cnt; i++) {
@@ -713,44 +614,14 @@ void __init do_init_bootmem(void)
}
}
- /* Add regions into sparsemem */
- for (i = 0; init_node_data[i].end_pfn; i++) {
- unsigned long start, end;
-
- if (init_node_data[i].nid != nid)
- continue;
-
- start = init_node_data[i].start_pfn;
- end = init_node_data[i].end_pfn;
-
- memory_present(nid, start, end);
- }
+ sparse_memory_present_with_active_regions(nid);
}
}
void __init paging_init(void)
{
- unsigned long zones_size[MAX_NR_ZONES];
- unsigned long zholes_size[MAX_NR_ZONES];
- int nid;
-
- memset(zones_size, 0, sizeof(zones_size));
- memset(zholes_size, 0, sizeof(zholes_size));
-
- for_each_online_node(nid) {
- unsigned long start_pfn, end_pfn, pages_present;
-
- get_region(nid, &start_pfn, &end_pfn, &pages_present);
-
- zones_size[ZONE_DMA] = end_pfn - start_pfn;
- zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] - pages_present;
-
- dbg("free_area_init node %d %lx %lx (hole: %lx)\n", nid,
- zones_size[ZONE_DMA], start_pfn, zholes_size[ZONE_DMA]);
-
- free_area_init_node(nid, NODE_DATA(nid), zones_size, start_pfn,
- zholes_size);
- }
+ unsigned long end_pfn = lmb_end_of_DRAM() >> PAGE_SHIFT;
+ free_area_init_nodes(end_pfn, end_pfn, end_pfn, end_pfn);
}
static int __init early_numa(char *p)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/ppc/Kconfig linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/ppc/Kconfig
--- linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/ppc/Kconfig 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/ppc/Kconfig 2006-07-06 11:06:11.000000000 +0100
@@ -953,6 +953,9 @@ config NR_CPUS
config HIGHMEM
bool "High memory support"
+config ARCH_POPULATES_NODE_MAP
+ def_bool y
+
source kernel/Kconfig.hz
source kernel/Kconfig.preempt
source "mm/Kconfig"
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/ppc/mm/init.c linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/ppc/mm/init.c
--- linux-2.6.17-mm6-101-add_free_area_init_nodes/arch/ppc/mm/init.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/ppc/mm/init.c 2006-07-06 11:06:11.000000000 +0100
@@ -358,8 +358,7 @@ void __init do_init_bootmem(void)
*/
void __init paging_init(void)
{
- unsigned long zones_size[MAX_NR_ZONES], i;
-
+ unsigned long start_pfn, end_pfn;
#ifdef CONFIG_HIGHMEM
map_page(PKMAP_BASE, 0, 0); /* XXX gross */
pkmap_page_table = pte_offset_kernel(pmd_offset(pgd_offset_k
@@ -369,19 +368,22 @@ void __init paging_init(void)
(KMAP_FIX_BEGIN), KMAP_FIX_BEGIN), KMAP_FIX_BEGIN);
kmap_prot = PAGE_KERNEL;
#endif /* CONFIG_HIGHMEM */
-
- /*
- * All pages are DMA-able so we put them all in the DMA zone.
- */
- zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
- for (i = 1; i < MAX_NR_ZONES; i++)
- zones_size[i] = 0;
+ /* All pages are DMA-able so we put them all in the DMA zone. */
+ start_pfn = __pa(PAGE_OFFSET) >> PAGE_SHIFT;
+ end_pfn = start_pfn + (total_memory >> PAGE_SHIFT);
+ add_active_range(0, start_pfn, end_pfn);
#ifdef CONFIG_HIGHMEM
- zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
+ free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+ total_lowmem >> PAGE_SHIFT,
+ total_lowmem >> PAGE_SHIFT,
+ total_memory >> PAGE_SHIFT);
+#else
+ free_area_init_nodes(total_memory >> PAGE_SHIFT,
+ total_memory >> PAGE_SHIFT,
+ total_memory >> PAGE_SHIFT,
+ total_memory >> PAGE_SHIFT);
#endif /* CONFIG_HIGHMEM */
-
- free_area_init(zones_size);
}
void __init mem_init(void)
* [PATCH 3/6] Have x86 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-07-08 11:11 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
bob.picco, ak, linux-mm
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
Size zones and holes in an architecture-independent manner for x86.
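To make the zone sizing concrete, here is a small user-space sketch (an illustration under assumptions, not the helper from patch 1/6): each zone starts where the previous one ended, so four maximum PFNs are enough to fix every boundary, and a range's contribution to a zone is its overlap with that zone. The sk_* and SK_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical zone indices, ordered from lowest to highest addresses. */
enum sk_zone { SK_DMA, SK_DMA32, SK_NORMAL, SK_HIGHMEM, SK_NR_ZONES };

struct sk_zone_bounds {
	unsigned long lowest[SK_NR_ZONES];	/* first PFN of the zone */
	unsigned long highest[SK_NR_ZONES];	/* one past the last PFN */
};

/*
 * Record zone boundaries from per-zone maximum PFNs. Each zone begins at
 * the previous zone's maximum; the first zone begins at min_pfn.
 */
static void sk_set_zone_bounds(struct sk_zone_bounds *zb,
			       unsigned long min_pfn,
			       const unsigned long max_pfn[SK_NR_ZONES])
{
	int i;

	for (i = 0; i < SK_NR_ZONES; i++) {
		/* Maxima must not decrease from one zone to the next. */
		assert(i == 0 || max_pfn[i] >= max_pfn[i - 1]);
		zb->lowest[i] = (i == 0) ? min_pfn : max_pfn[i - 1];
		zb->highest[i] = max_pfn[i];
	}
}

/* Pages of [start_pfn, end_pfn) that fall inside the given zone. */
static unsigned long sk_pages_in_zone(const struct sk_zone_bounds *zb,
				      int zone,
				      unsigned long start_pfn,
				      unsigned long end_pfn)
{
	unsigned long lo = start_pfn > zb->lowest[zone] ?
				start_pfn : zb->lowest[zone];
	unsigned long hi = end_pfn < zb->highest[zone] ?
				end_pfn : zb->highest[zone];

	return hi > lo ? hi - lo : 0;
}
```

Passing the same maximum for two adjacent zones, as zone_sizes_init() does for the DMA and DMA32 arguments, leaves the higher of the two zones empty.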
Kconfig | 8 +---
kernel/setup.c | 19 +++------
kernel/srat.c | 100 +---------------------------------------------------
mm/discontig.c | 65 +++++++--------------------------
4 files changed, 25 insertions(+), 167 deletions(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/Kconfig linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/Kconfig
--- linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/Kconfig 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/Kconfig 2006-07-06 11:08:03.000000000 +0100
@@ -603,12 +603,10 @@ config ARCH_SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SPARSEMEM_ENABLE
-source "mm/Kconfig"
+config ARCH_POPULATES_NODE_MAP
+ def_bool y
-config HAVE_ARCH_EARLY_PFN_TO_NID
- bool
- default y
- depends on NUMA
+source "mm/Kconfig"
config HIGHPTE
bool "Allocate 3rd-level pagetables from highmem"
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/kernel/setup.c
--- linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/kernel/setup.c 2006-07-06 11:08:03.000000000 +0100
@@ -1201,22 +1201,15 @@ static unsigned long __init setup_memory
void __init zone_sizes_init(void)
{
- unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
- unsigned int max_dma, low;
+ unsigned int max_dma;
+#ifndef CONFIG_HIGHMEM
+ unsigned long highend_pfn = max_low_pfn;
+#endif
max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
- low = max_low_pfn;
- if (low < max_dma)
- zones_size[ZONE_DMA] = low;
- else {
- zones_size[ZONE_DMA] = max_dma;
- zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
- zones_size[ZONE_HIGHMEM] = highend_pfn - low;
-#endif
- }
- free_area_init(zones_size);
+ add_active_range(0, 0, highend_pfn);
+ free_area_init_nodes(max_dma, max_dma, max_low_pfn, highend_pfn);
}
#else
extern unsigned long __init setup_memory(void);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/kernel/srat.c
--- linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/kernel/srat.c 2006-07-06 11:08:03.000000000 +0100
@@ -54,8 +54,6 @@ struct node_memory_chunk_s {
static struct node_memory_chunk_s node_memory_chunk[MAXCHUNKS];
static int num_memory_chunks; /* total number of memory chunks */
-static int zholes_size_init;
-static unsigned long zholes_size[MAX_NUMNODES * MAX_NR_ZONES];
extern void * boot_ioremap(unsigned long, unsigned long);
@@ -135,50 +133,6 @@ static void __init parse_memory_affinity
"enabled and removable" : "enabled" ) );
}
-#if MAX_NR_ZONES != 4
-#error "MAX_NR_ZONES != 4, chunk_to_zone requires review"
-#endif
-/* Take a chunk of pages from page frame cstart to cend and count the number
- * of pages in each zone, returned via zones[].
- */
-static __init void chunk_to_zones(unsigned long cstart, unsigned long cend,
- unsigned long *zones)
-{
- unsigned long max_dma;
- extern unsigned long max_low_pfn;
-
- int z;
- unsigned long rend;
-
- /* FIXME: MAX_DMA_ADDRESS and max_low_pfn are trying to provide
- * similarly scoped information and should be handled in a consistant
- * manner.
- */
- max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
- /* Split the hole into the zones in which it falls. Repeatedly
- * take the segment in which the remaining hole starts, round it
- * to the end of that zone.
- */
- memset(zones, 0, MAX_NR_ZONES * sizeof(long));
- while (cstart < cend) {
- if (cstart < max_dma) {
- z = ZONE_DMA;
- rend = (cend < max_dma)? cend : max_dma;
-
- } else if (cstart < max_low_pfn) {
- z = ZONE_NORMAL;
- rend = (cend < max_low_pfn)? cend : max_low_pfn;
-
- } else {
- z = ZONE_HIGHMEM;
- rend = cend;
- }
- zones[z] += rend - cstart;
- cstart = rend;
- }
-}
-
/*
* The SRAT table always lists ascending addresses, so can always
* assume that the first "start" address that you see is the real
@@ -223,7 +177,6 @@ static int __init acpi20_parse_srat(stru
memset(pxm_bitmap, 0, sizeof(pxm_bitmap)); /* init proximity domain bitmap */
memset(node_memory_chunk, 0, sizeof(node_memory_chunk));
- memset(zholes_size, 0, sizeof(zholes_size));
num_memory_chunks = 0;
while (p < end) {
@@ -287,6 +240,7 @@ static int __init acpi20_parse_srat(stru
printk("chunk %d nid %d start_pfn %08lx end_pfn %08lx\n",
j, chunk->nid, chunk->start_pfn, chunk->end_pfn);
node_read_chunk(chunk->nid, chunk);
+ add_active_range(chunk->nid, chunk->start_pfn, chunk->end_pfn);
}
for_each_online_node(nid) {
@@ -403,57 +357,7 @@ int __init get_memcfg_from_srat(void)
return acpi20_parse_srat((struct acpi_table_srat *)header);
}
out_err:
+ remove_all_active_ranges();
printk("failed to get NUMA memory information from SRAT table\n");
return 0;
}
-
-/* For each node run the memory list to determine whether there are
- * any memory holes. For each hole determine which ZONE they fall
- * into.
- *
- * NOTE#1: this requires knowledge of the zone boundries and so
- * _cannot_ be performed before those are calculated in setup_memory.
- *
- * NOTE#2: we rely on the fact that the memory chunks are ordered by
- * start pfn number during setup.
- */
-static void __init get_zholes_init(void)
-{
- int nid;
- int c;
- int first;
- unsigned long end = 0;
-
- for_each_online_node(nid) {
- first = 1;
- for (c = 0; c < num_memory_chunks; c++){
- if (node_memory_chunk[c].nid == nid) {
- if (first) {
- end = node_memory_chunk[c].end_pfn;
- first = 0;
-
- } else {
- /* Record any gap between this chunk
- * and the previous chunk on this node
- * against the zones it spans.
- */
- chunk_to_zones(end,
- node_memory_chunk[c].start_pfn,
- &zholes_size[nid * MAX_NR_ZONES]);
- }
- }
- }
- }
-}
-
-unsigned long * __init get_zholes_size(int nid)
-{
- if (!zholes_size_init) {
- zholes_size_init++;
- get_zholes_init();
- }
- if (nid >= MAX_NUMNODES || !node_online(nid))
- printk("%s: nid = %d is invalid/offline. num_online_nodes = %d",
- __FUNCTION__, nid, num_online_nodes());
- return &zholes_size[nid * MAX_NR_ZONES];
-}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/mm/discontig.c
--- linux-2.6.17-mm6-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-103-x86_use_init_nodes/arch/i386/mm/discontig.c 2006-07-06 11:08:03.000000000 +0100
@@ -156,21 +156,6 @@ static void __init find_max_pfn_node(int
BUG();
}
-/* Find the owning node for a pfn. */
-int early_pfn_to_nid(unsigned long pfn)
-{
- int nid;
-
- for_each_node(nid) {
- if (node_end_pfn[nid] == 0)
- break;
- if (node_start_pfn[nid] <= pfn && node_end_pfn[nid] >= pfn)
- return nid;
- }
-
- return 0;
-}
-
/*
* Allocate memory for the pg_data_t for this node via a crude pre-bootmem
* method. For node zero take this from the bottom of memory, for
@@ -226,6 +211,8 @@ static unsigned long calculate_numa_rema
unsigned long pfn;
for_each_online_node(nid) {
+ unsigned long old_end_pfn = node_end_pfn[nid];
+
/*
* The acpi/srat node info can show hot-add memroy zones
* where memory could be added but not currently present.
@@ -275,6 +262,7 @@ static unsigned long calculate_numa_rema
node_end_pfn[nid] -= size;
node_remap_start_pfn[nid] = node_end_pfn[nid];
+ shrink_active_range(nid, old_end_pfn, node_end_pfn[nid]);
}
printk("Reserving total of %ld pages for numa KVA remap\n",
reserve_pages);
@@ -351,45 +339,20 @@ unsigned long __init setup_memory(void)
void __init zone_sizes_init(void)
{
int nid;
+ unsigned long max_dma_pfn;
-
- for_each_online_node(nid) {
- unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
- unsigned long *zholes_size;
- unsigned int max_dma;
-
- unsigned long low = max_low_pfn;
- unsigned long start = node_start_pfn[nid];
- unsigned long high = node_end_pfn[nid];
-
- max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
- if (node_has_online_mem(nid)){
- if (start > low) {
-#ifdef CONFIG_HIGHMEM
- BUG_ON(start > high);
- zones_size[ZONE_HIGHMEM] = high - start;
-#endif
- } else {
- if (low < max_dma)
- zones_size[ZONE_DMA] = low;
- else {
- BUG_ON(max_dma > low);
- BUG_ON(low > high);
- zones_size[ZONE_DMA] = max_dma;
- zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
- zones_size[ZONE_HIGHMEM] = high - low;
-#endif
- }
- }
+ /* If SRAT has not registered memory, register it now */
+ if (find_max_pfn_with_active_regions() == 0) {
+ for_each_online_node(nid) {
+ if (node_has_online_mem(nid))
+ add_active_range(nid, node_start_pfn[nid],
+ node_end_pfn[nid]);
}
-
- zholes_size = get_zholes_size(nid);
-
- free_area_init_node(nid, NODE_DATA(nid), zones_size, start,
- zholes_size);
}
+
+ max_dma_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
+ free_area_init_nodes(max_dma_pfn, max_dma_pfn,
+ max_low_pfn, highend_pfn);
return;
}
* [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-07-08 11:12 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
linux-kernel, linuxppc-dev
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
Size zones and holes in an architecture-independent manner for x86_64.
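For readers following the e820 changes, the sketch below shows in plain user-space C the kind of walk that a "find the maximum PFN" query has to perform over a firmware memory map: RAM entries are trimmed inward to page boundaries and the highest resulting end PFN wins. It is an illustration only; the sk_* names, the simplified entry layout and the 4 KiB page size are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)

/* Hypothetical, simplified firmware memory map entry. */
struct sk_entry {
	unsigned long addr;	/* start address in bytes */
	unsigned long size;	/* length in bytes */
	int is_ram;		/* non-zero if usable RAM */
};

/* Highest end PFN over all RAM entries, or 0 if there is no usable RAM. */
static unsigned long sk_max_ram_pfn(const struct sk_entry *map, size_t n)
{
	unsigned long max_pfn = 0;
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* Trim partial pages: round the start up, the end down. */
		unsigned long start = (map[i].addr + SK_PAGE_SIZE - 1) &
					~(SK_PAGE_SIZE - 1);
		unsigned long end = (map[i].addr + map[i].size) &
					~(SK_PAGE_SIZE - 1);

		if (!map[i].is_ram || start >= end)
			continue;
		if ((end >> SK_PAGE_SHIFT) > max_pfn)
			max_pfn = end >> SK_PAGE_SHIFT;
	}
	return max_pfn;
}
```

Once the ranges are registered centrally, a query of this shape only has to be written once instead of per architecture.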
arch/x86_64/Kconfig | 3
arch/x86_64/kernel/e820.c | 125 ++++++++++++++-------------------------
arch/x86_64/kernel/setup.c | 7 +-
arch/x86_64/mm/init.c | 62 -------------------
arch/x86_64/mm/k8topology.c | 3
arch/x86_64/mm/numa.c | 18 ++---
arch/x86_64/mm/srat.c | 11 ++-
include/asm-x86_64/e820.h | 5 -
include/asm-x86_64/proto.h | 2
9 files changed, 79 insertions(+), 157 deletions(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/Kconfig linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/Kconfig
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/Kconfig 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/Kconfig 2006-07-06 11:09:46.000000000 +0100
@@ -81,6 +81,9 @@ config ARCH_MAY_HAVE_PC_FDC
bool
default y
+config ARCH_POPULATES_NODE_MAP
+ def_bool y
+
config DMI
bool
default y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c 2006-07-06 11:09:46.000000000 +0100
@@ -16,6 +16,7 @@
#include <linux/string.h>
#include <linux/kexec.h>
#include <linux/module.h>
+#include <linux/mm.h>
#include <asm/page.h>
#include <asm/e820.h>
@@ -159,58 +160,14 @@ unsigned long __init find_e820_area(unsi
return -1UL;
}
-/*
- * Free bootmem based on the e820 table for a node.
- */
-void __init e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end)
-{
- int i;
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- unsigned long last, addr;
-
- if (ei->type != E820_RAM ||
- ei->addr+ei->size <= start ||
- ei->addr >= end)
- continue;
-
- addr = round_up(ei->addr, PAGE_SIZE);
- if (addr < start)
- addr = start;
-
- last = round_down(ei->addr + ei->size, PAGE_SIZE);
- if (last >= end)
- last = end;
-
- if (last > addr && last-addr >= PAGE_SIZE)
- free_bootmem_node(pgdat, addr, last-addr);
- }
-}
-
/*
* Find the highest page frame number we have available
*/
unsigned long __init e820_end_of_ram(void)
{
- int i;
unsigned long end_pfn = 0;
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- unsigned long start, end;
-
- start = round_up(ei->addr, PAGE_SIZE);
- end = round_down(ei->addr + ei->size, PAGE_SIZE);
- if (start >= end)
- continue;
- if (ei->type == E820_RAM) {
- if (end > end_pfn<<PAGE_SHIFT)
- end_pfn = end>>PAGE_SHIFT;
- } else {
- if (end > end_pfn_map<<PAGE_SHIFT)
- end_pfn_map = end>>PAGE_SHIFT;
- }
- }
+ end_pfn = find_max_pfn_with_active_regions();
if (end_pfn > end_pfn_map)
end_pfn_map = end_pfn;
@@ -221,43 +178,10 @@ unsigned long __init e820_end_of_ram(voi
if (end_pfn > end_pfn_map)
end_pfn = end_pfn_map;
+ printk("end_pfn_map = %lu\n", end_pfn_map);
return end_pfn;
}
-/*
- * Compute how much memory is missing in a range.
- * Unlike the other functions in this file the arguments are in page numbers.
- */
-unsigned long __init
-e820_hole_size(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long ram = 0;
- unsigned long start = start_pfn << PAGE_SHIFT;
- unsigned long end = end_pfn << PAGE_SHIFT;
- int i;
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- unsigned long last, addr;
-
- if (ei->type != E820_RAM ||
- ei->addr+ei->size <= start ||
- ei->addr >= end)
- continue;
-
- addr = round_up(ei->addr, PAGE_SIZE);
- if (addr < start)
- addr = start;
-
- last = round_down(ei->addr + ei->size, PAGE_SIZE);
- if (last >= end)
- last = end;
-
- if (last > addr)
- ram += last - addr;
- }
- return ((end - start) - ram) >> PAGE_SHIFT;
-}
-
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
@@ -292,6 +216,49 @@ void __init e820_reserve_resources(void)
}
}
+/* Walk the e820 map and register active regions within a node */
+void __init
+e820_register_active_regions(int nid, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ int i;
+ unsigned long ei_startpfn, ei_endpfn;
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ ei_startpfn = round_up(ei->addr, PAGE_SIZE) >> PAGE_SHIFT;
+ ei_endpfn = round_down(ei->addr + ei->size, PAGE_SIZE)
+ >> PAGE_SHIFT;
+
+ /* Skip map entries smaller than a page */
+ if (ei_startpfn > ei_endpfn)
+ continue;
+
+ /* Check if end_pfn_map should be updated */
+ if (ei->type != E820_RAM && ei_endpfn > end_pfn_map)
+ end_pfn_map = ei_endpfn;
+
+ /* Skip if map is outside the node */
+ if (ei->type != E820_RAM ||
+ ei_endpfn <= start_pfn ||
+ ei_startpfn >= end_pfn)
+ continue;
+
+ /* Check for overlaps */
+ if (ei_startpfn < start_pfn)
+ ei_startpfn = start_pfn;
+ if (ei_endpfn > end_pfn)
+ ei_endpfn = end_pfn;
+
+ /* Obey end_user_pfn to save on memmap */
+ if (ei_startpfn >= end_user_pfn)
+ continue;
+ if (ei_endpfn > end_user_pfn)
+ ei_endpfn = end_user_pfn;
+
+ add_active_range(nid, ei_startpfn, ei_endpfn);
+ }
+}
+
/*
* Add a memory region to the kernel e820 map.
*/
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c 2006-07-06 11:09:46.000000000 +0100
@@ -466,7 +466,8 @@ contig_initmem_init(unsigned long start_
if (bootmap == -1L)
panic("Cannot find bootmem map of size %ld\n",bootmap_size);
bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);
- e820_bootmem_free(NODE_DATA(0), 0, end_pfn << PAGE_SHIFT);
+ e820_register_active_regions(0, start_pfn, end_pfn);
+ free_bootmem_with_active_regions(0, end_pfn);
reserve_bootmem(bootmap, bootmap_size);
}
#endif
@@ -546,6 +547,7 @@ void __init setup_arch(char **cmdline_p)
early_identify_cpu(&boot_cpu_data);
+ e820_register_active_regions(0, 0, -1UL);
/*
* partially used pages are not usable - thus
* we are rounding upwards:
@@ -571,6 +573,9 @@ void __init setup_arch(char **cmdline_p)
acpi_boot_table_init();
#endif
+ /* Remove active ranges so rediscovery with NUMA-awareness happens */
+ remove_all_active_ranges();
+
#ifdef CONFIG_ACPI_NUMA
/*
* Parse SRAT to discover nodes.
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/init.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/init.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c 2006-07-06 11:09:46.000000000 +0100
@@ -403,69 +403,12 @@ void __cpuinit zap_low_mappings(int cpu)
__flush_tlb_all();
}
-/* Compute zone sizes for the DMA and DMA32 zones in a node. */
-__init void
-size_zones(unsigned long *z, unsigned long *h,
- unsigned long start_pfn, unsigned long end_pfn)
-{
- int i;
- unsigned long w;
-
- for (i = 0; i < MAX_NR_ZONES; i++)
- z[i] = 0;
-
- if (start_pfn < MAX_DMA_PFN)
- z[ZONE_DMA] = MAX_DMA_PFN - start_pfn;
- if (start_pfn < MAX_DMA32_PFN) {
- unsigned long dma32_pfn = MAX_DMA32_PFN;
- if (dma32_pfn > end_pfn)
- dma32_pfn = end_pfn;
- z[ZONE_DMA32] = dma32_pfn - start_pfn;
- }
- z[ZONE_NORMAL] = end_pfn - start_pfn;
-
- /* Remove lower zones from higher ones. */
- w = 0;
- for (i = 0; i < MAX_NR_ZONES; i++) {
- if (z[i])
- z[i] -= w;
- w += z[i];
- }
-
- /* Compute holes */
- w = start_pfn;
- for (i = 0; i < MAX_NR_ZONES; i++) {
- unsigned long s = w;
- w += z[i];
- h[i] = e820_hole_size(s, w);
- }
-
- /* Add the space pace needed for mem_map to the holes too. */
- for (i = 0; i < MAX_NR_ZONES; i++)
- h[i] += (z[i] * sizeof(struct page)) / PAGE_SIZE;
-
- /* The 16MB DMA zone has the kernel and other misc mappings.
- Account them too */
- if (h[ZONE_DMA]) {
- h[ZONE_DMA] += dma_reserve;
- if (h[ZONE_DMA] >= z[ZONE_DMA]) {
- printk(KERN_WARNING
- "Kernel too large and filling up ZONE_DMA?\n");
- h[ZONE_DMA] = z[ZONE_DMA];
- }
- }
-}
-
#ifndef CONFIG_NUMA
void __init paging_init(void)
{
- unsigned long zones[MAX_NR_ZONES], holes[MAX_NR_ZONES];
-
memory_present(0, 0, end_pfn);
sparse_init();
- size_zones(zones, holes, 0, end_pfn);
- free_area_init_node(0, NODE_DATA(0), zones,
- __pa(PAGE_OFFSET) >> PAGE_SHIFT, holes);
+ free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
}
#endif
@@ -614,7 +557,8 @@ void __init mem_init(void)
#else
totalram_pages = free_all_bootmem();
#endif
- reservedpages = end_pfn - totalram_pages - e820_hole_size(0, end_pfn);
+ reservedpages = end_pfn - totalram_pages -
+ absent_pages_in_range(0, end_pfn);
after_bootmem = 1;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c 2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c 2006-07-06 11:09:46.000000000 +0100
@@ -146,6 +146,9 @@ int __init k8_scan_nodes(unsigned long s
nodes[nodeid].start = base;
nodes[nodeid].end = limit;
+ e820_register_active_regions(nodeid,
+ nodes[nodeid].start >> PAGE_SHIFT,
+ nodes[nodeid].end >> PAGE_SHIFT);
prevbase = base;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/numa.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/numa.c 2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c 2006-07-06 11:09:46.000000000 +0100
@@ -161,7 +161,7 @@ void __init setup_node_bootmem(int nodei
bootmap_start >> PAGE_SHIFT,
start_pfn, end_pfn);
- e820_bootmem_free(NODE_DATA(nodeid), start, end);
+ free_bootmem_with_active_regions(nodeid, end);
reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size);
reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT);
@@ -175,13 +175,11 @@ void __init setup_node_bootmem(int nodei
void __init setup_node_zones(int nodeid)
{
unsigned long start_pfn, end_pfn, memmapsize, limit;
- unsigned long zones[MAX_NR_ZONES];
- unsigned long holes[MAX_NR_ZONES];
start_pfn = node_start_pfn(nodeid);
end_pfn = node_end_pfn(nodeid);
- Dprintk(KERN_INFO "Setting up node %d %lx-%lx\n",
+ Dprintk(KERN_INFO "Setting up memmap for node %d %lx-%lx\n",
nodeid, start_pfn, end_pfn);
/* Try to allocate mem_map at end to not fill up precious <4GB
@@ -195,10 +193,6 @@ void __init setup_node_zones(int nodeid)
round_down(limit - memmapsize, PAGE_SIZE),
limit);
#endif
-
- size_zones(zones, holes, start_pfn, end_pfn);
- free_area_init_node(nodeid, NODE_DATA(nodeid), zones,
- start_pfn, holes);
}
void __init numa_init_array(void)
@@ -259,8 +253,11 @@ static int numa_emulation(unsigned long
printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
return -1;
}
- for_each_online_node(i)
+ for_each_online_node(i) {
+ e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+ nodes[i].end >> PAGE_SHIFT);
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+ }
numa_init_array();
return 0;
}
@@ -299,6 +296,7 @@ void __init numa_initmem_init(unsigned l
for (i = 0; i < NR_CPUS; i++)
numa_set_node(i, 0);
node_to_cpumask[0] = cpumask_of_cpu(0);
+ e820_register_active_regions(0, start_pfn, end_pfn);
setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
}
@@ -346,6 +344,8 @@ void __init paging_init(void)
for_each_online_node(i) {
setup_node_zones(i);
}
+
+ free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
}
/* [numa=off] */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/srat.c linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
--- linux-2.6.17-mm6-103-x86_use_init_nodes/arch/x86_64/mm/srat.c 2006-07-05 14:31:12.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c 2006-07-06 11:09:46.000000000 +0100
@@ -91,6 +91,7 @@ static __init void bad_srat(void)
apicid_to_node[i] = NUMA_NO_NODE;
for (i = 0; i < MAX_NUMNODES; i++)
nodes_add[i].start = nodes[i].end = 0;
+ remove_all_active_ranges();
}
static __init inline int srat_disabled(void)
@@ -173,7 +174,7 @@ static int hotadd_enough_memory(struct b
if (mem < 0)
return 0;
- allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
+ allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * PAGE_SIZE;
allowed = (allowed / 100) * hotadd_percent;
if (allocated + mem > allowed) {
unsigned long range;
@@ -223,7 +224,7 @@ static int reserve_hotadd(int node, unsi
}
/* This check might be a bit too strict, but I'm keeping it for now. */
- if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
+ if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
return -1;
}
@@ -317,6 +318,8 @@ acpi_numa_memory_affinity_init(struct ac
printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
nd->start, nd->end);
+ e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
+ nd->end >> PAGE_SHIFT);
#ifdef RESERVE_HOTADD
if (ma->flags.hot_pluggable && reserve_hotadd(node, start, end) < 0) {
@@ -341,13 +344,13 @@ static int nodes_cover_memory(void)
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
- pxmram -= e820_hole_size(s, e);
+ pxmram -= absent_pages_in_range(s, e);
pxmram -= nodes_add[i].end - nodes_add[i].start;
if ((long)pxmram < 0)
pxmram = 0;
}
- e820ram = end_pfn - e820_hole_size(0, end_pfn);
+ e820ram = end_pfn - absent_pages_in_range(0, end_pfn);
/* We seem to lose 3 pages somewhere. Allow a bit of slack. */
if ((long)(e820ram - pxmram) >= 1*1024*1024) {
printk(KERN_ERR
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/include/asm-x86_64/e820.h linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h
--- linux-2.6.17-mm6-103-x86_use_init_nodes/include/asm-x86_64/e820.h 2006-06-18 02:49:35.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h 2006-07-06 11:09:46.000000000 +0100
@@ -50,10 +50,9 @@ extern void e820_print_map(char *who);
extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);
-extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end);
extern void e820_setup_gap(void);
-extern unsigned long e820_hole_size(unsigned long start_pfn,
- unsigned long end_pfn);
+extern void e820_register_active_regions(int nid,
+ unsigned long start_pfn, unsigned long end_pfn);
extern void __init parse_memopt(char *p, char **end);
extern void __init parse_memmapopt(char *p, char **end);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-103-x86_use_init_nodes/include/asm-x86_64/proto.h linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h
--- linux-2.6.17-mm6-103-x86_use_init_nodes/include/asm-x86_64/proto.h 2006-07-05 14:31:17.000000000 +0100
+++ linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h 2006-07-06 11:09:46.000000000 +0100
@@ -24,8 +24,6 @@ extern void mtrr_bp_init(void);
#define mtrr_bp_init() do {} while (0)
#endif
extern void init_memory_mapping(unsigned long start, unsigned long end);
-extern void size_zones(unsigned long *z, unsigned long *h,
- unsigned long start_pfn, unsigned long end_pfn);
extern void system_call(void);
extern int kernel_syscall(void);
^ permalink raw reply
* [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-07-08 11:12 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
bob.picco, ak, linux-mm
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
Size zones and holes in an architecture independent manner for ia64.
arch/ia64/Kconfig | 3 ++
arch/ia64/mm/contig.c | 60 +++++-----------------------------------
arch/ia64/mm/discontig.c | 41 ++++-----------------------
arch/ia64/mm/init.c | 12 ++++++++
include/asm-ia64/meminit.h | 1
5 files changed, 30 insertions(+), 87 deletions(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/Kconfig linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/Kconfig
--- linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/Kconfig 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/Kconfig 2006-07-06 11:11:30.000000000 +0100
@@ -361,6 +361,9 @@ config NODES_SHIFT
MAX_NUMNODES will be 2^(This value).
If in doubt, use the default.
+config ARCH_POPULATES_NODE_MAP
+ def_bool y
+
# VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
# VIRTUAL_MEM_MAP has been retained for historical reasons.
config VIRTUAL_MEM_MAP
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/contig.c
--- linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/contig.c 2006-07-06 11:11:30.000000000 +0100
@@ -25,10 +25,6 @@
#include <asm/sections.h>
#include <asm/mca.h>
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static unsigned long num_dma_physpages;
-#endif
-
/**
* show_mem - display a memory statistics summary
*
@@ -209,18 +205,6 @@ count_pages (u64 start, u64 end, void *a
return 0;
}
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static int
-count_dma_pages (u64 start, u64 end, void *arg)
-{
- unsigned long *count = arg;
-
- if (start < MAX_DMA_ADDRESS)
- *count += (min(end, MAX_DMA_ADDRESS) - start) >> PAGE_SHIFT;
- return 0;
-}
-#endif
-
/*
* Set up the page tables.
*/
@@ -229,47 +213,24 @@ void __init
paging_init (void)
{
unsigned long max_dma;
- unsigned long zones_size[MAX_NR_ZONES];
#ifdef CONFIG_VIRTUAL_MEM_MAP
- unsigned long zholes_size[MAX_NR_ZONES];
+ unsigned long nid = 0;
unsigned long max_gap;
#endif
- /* initialize mem_map[] */
-
- memset(zones_size, 0, sizeof(zones_size));
-
num_physpages = 0;
efi_memmap_walk(count_pages, &num_physpages);
max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
#ifdef CONFIG_VIRTUAL_MEM_MAP
- memset(zholes_size, 0, sizeof(zholes_size));
-
- num_dma_physpages = 0;
- efi_memmap_walk(count_dma_pages, &num_dma_physpages);
-
- if (max_low_pfn < max_dma) {
- zones_size[ZONE_DMA] = max_low_pfn;
- zholes_size[ZONE_DMA] = max_low_pfn - num_dma_physpages;
- } else {
- zones_size[ZONE_DMA] = max_dma;
- zholes_size[ZONE_DMA] = max_dma - num_dma_physpages;
- if (num_physpages > num_dma_physpages) {
- zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
- zholes_size[ZONE_NORMAL] =
- ((max_low_pfn - max_dma) -
- (num_physpages - num_dma_physpages));
- }
- }
-
max_gap = 0;
+ efi_memmap_walk(register_active_ranges, &nid);
efi_memmap_walk(find_largest_hole, (u64 *)&max_gap);
if (max_gap < LARGE_GAP) {
vmem_map = (struct page *) 0;
- free_area_init_node(0, NODE_DATA(0), zones_size, 0,
- zholes_size);
+ free_area_init_nodes(max_dma, max_dma,
+ max_low_pfn, max_low_pfn);
} else {
unsigned long map_size;
@@ -281,19 +242,14 @@ paging_init (void)
efi_memmap_walk(create_mem_map_page_table, NULL);
NODE_DATA(0)->node_mem_map = vmem_map;
- free_area_init_node(0, NODE_DATA(0), zones_size,
- 0, zholes_size);
+ free_area_init_nodes(max_dma, max_dma,
+ max_low_pfn, max_low_pfn);
printk("Virtual mem_map starts at 0x%p\n", mem_map);
}
#else /* !CONFIG_VIRTUAL_MEM_MAP */
- if (max_low_pfn < max_dma)
- zones_size[ZONE_DMA] = max_low_pfn;
- else {
- zones_size[ZONE_DMA] = max_dma;
- zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
- }
- free_area_init(zones_size);
+ add_active_range(0, 0, max_low_pfn);
+ free_area_init_nodes(max_dma, max_dma, max_low_pfn, max_low_pfn);
#endif /* !CONFIG_VIRTUAL_MEM_MAP */
zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c
--- linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c 2006-07-06 11:11:30.000000000 +0100
@@ -703,6 +703,7 @@ static __init int count_node_pages(unsig
{
unsigned long end = start + len;
+ add_active_range(node, start >> PAGE_SHIFT, end >> PAGE_SHIFT);
mem_data[node].num_physpages += len >> PAGE_SHIFT;
if (start <= __pa(MAX_DMA_ADDRESS))
mem_data[node].num_dma_physpages +=
@@ -727,9 +728,8 @@ static __init int count_node_pages(unsig
void __init paging_init(void)
{
unsigned long max_dma;
- unsigned long zones_size[MAX_NR_ZONES];
- unsigned long zholes_size[MAX_NR_ZONES];
unsigned long pfn_offset = 0;
+ unsigned long max_pfn = 0;
int node;
max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
@@ -746,47 +746,18 @@ void __init paging_init(void)
#endif
for_each_online_node(node) {
- memset(zones_size, 0, sizeof(zones_size));
- memset(zholes_size, 0, sizeof(zholes_size));
-
num_physpages += mem_data[node].num_physpages;
-
- if (mem_data[node].min_pfn >= max_dma) {
- /* All of this node's memory is above ZONE_DMA */
- zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
- mem_data[node].min_pfn;
- zholes_size[ZONE_NORMAL] = mem_data[node].max_pfn -
- mem_data[node].min_pfn -
- mem_data[node].num_physpages;
- } else if (mem_data[node].max_pfn < max_dma) {
- /* All of this node's memory is in ZONE_DMA */
- zones_size[ZONE_DMA] = mem_data[node].max_pfn -
- mem_data[node].min_pfn;
- zholes_size[ZONE_DMA] = mem_data[node].max_pfn -
- mem_data[node].min_pfn -
- mem_data[node].num_dma_physpages;
- } else {
- /* This node has memory in both zones */
- zones_size[ZONE_DMA] = max_dma -
- mem_data[node].min_pfn;
- zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] -
- mem_data[node].num_dma_physpages;
- zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
- max_dma;
- zholes_size[ZONE_NORMAL] = zones_size[ZONE_NORMAL] -
- (mem_data[node].num_physpages -
- mem_data[node].num_dma_physpages);
- }
-
pfn_offset = mem_data[node].min_pfn;
#ifdef CONFIG_VIRTUAL_MEM_MAP
NODE_DATA(node)->node_mem_map = vmem_map + pfn_offset;
#endif
- free_area_init_node(node, NODE_DATA(node), zones_size,
- pfn_offset, zholes_size);
+ if (mem_data[node].max_pfn > max_pfn)
+ max_pfn = mem_data[node].max_pfn;
}
+ free_area_init_nodes(max_dma, max_dma, max_pfn, max_pfn);
+
zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/init.c linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/init.c
--- linux-2.6.17-mm6-104-x86_64_use_init_nodes/arch/ia64/mm/init.c 2006-07-05 14:31:11.000000000 +0100
+++ linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/ia64/mm/init.c 2006-07-06 11:11:30.000000000 +0100
@@ -538,6 +538,18 @@ find_largest_hole (u64 start, u64 end, v
last_end = end;
return 0;
}
+
+int __init
+register_active_ranges(u64 start, u64 end, void *nid)
+{
+ BUG_ON(nid == NULL);
+ BUG_ON(*(unsigned long *)nid >= MAX_NUMNODES);
+
+ add_active_range(*(unsigned long *)nid,
+ __pa(start) >> PAGE_SHIFT,
+ __pa(end) >> PAGE_SHIFT);
+ return 0;
+}
#endif /* CONFIG_VIRTUAL_MEM_MAP */
static int __init
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h linux-2.6.17-mm6-105-ia64_use_init_nodes/include/asm-ia64/meminit.h
--- linux-2.6.17-mm6-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h 2006-07-05 14:31:17.000000000 +0100
+++ linux-2.6.17-mm6-105-ia64_use_init_nodes/include/asm-ia64/meminit.h 2006-07-06 11:11:30.000000000 +0100
@@ -56,6 +56,7 @@ extern void efi_memmap_init(unsigned lon
extern unsigned long vmalloc_end;
extern struct page *vmem_map;
extern int find_largest_hole (u64 start, u64 end, void *arg);
+ extern int register_active_ranges (u64 start, u64 end, void *arg);
extern int create_mem_map_page_table (u64 start, u64 end, void *arg);
#endif
^ permalink raw reply
* [PATCH 6/6] Account for memmap and optionally the kernel image as holes
From: Mel Gorman @ 2006-07-08 11:12 UTC (permalink / raw)
To: akpm
Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
linux-kernel, linuxppc-dev
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
The x86_64 code accounted for memmap and some portions of the DMA zone
as holes. This was because those areas would never be reclaimed and accounting
for them as memory affects min watermarks. This patch will account for the
memmap as a memory hole. Architectures may optionally use set_dma_reserve() if
they wish to account for a portion of memory in ZONE_DMA as a hole.
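
To make the arithmetic concrete, here is a small userspace sketch of the accounting. It is only an approximation under stated assumptions: 4 KiB pages and a 56-byte struct page are made-up values for the example, and the sketch_* names are hypothetical, not kernel symbols.

```c
/* Assumptions for illustration only; real values are per-arch/config. */
#define SKETCH_PAGE_SHIFT       12UL	/* 4 KiB pages */
#define SKETCH_STRUCT_PAGE_SIZE 56UL	/* assumed sizeof(struct page) */

/* Pages consumed by a flat memmap describing `spanned_pages` frames. */
static unsigned long sketch_memmap_pages(unsigned long spanned_pages)
{
	return (spanned_pages * SKETCH_STRUCT_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
}

/*
 * Usable size of a zone once holes, the memmap living in that zone and
 * (for the DMA zone only) the reserved pages are subtracted. The reserve
 * is skipped when it would swallow the whole zone.
 */
static unsigned long sketch_zone_realsize(unsigned long spanned,
					  unsigned long absent,
					  unsigned long memmap_pages,
					  unsigned long dma_reserve,
					  int is_dma_zone)
{
	unsigned long realsize = spanned - absent;

	realsize -= memmap_pages;
	if (is_dma_zone && realsize > dma_reserve)
		realsize -= dma_reserve;

	return realsize;
}
```

With these assumed numbers, a node spanning 262144 pages (1 GiB) spends 3584 pages on its memmap, so the zone holding the memmap reports that many fewer pages when watermarks are calculated.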
arch/x86_64/mm/init.c | 4 +
include/linux/mm.h | 1
mm/page_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+), 1 deletion(-)
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/x86_64/mm/init.c linux-2.6.17-mm6-106-account_kernel_mmap/arch/x86_64/mm/init.c
--- linux-2.6.17-mm6-105-ia64_use_init_nodes/arch/x86_64/mm/init.c 2006-07-06 11:09:46.000000000 +0100
+++ linux-2.6.17-mm6-106-account_kernel_mmap/arch/x86_64/mm/init.c 2006-07-06 11:13:22.000000000 +0100
@@ -658,8 +658,10 @@ void __init reserve_bootmem_generic(unsi
#else
reserve_bootmem(phys, len);
#endif
- if (phys+len <= MAX_DMA_PFN*PAGE_SIZE)
+ if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
dma_reserve += len / PAGE_SIZE;
+ set_dma_reserve(dma_reserve);
+ }
}
int kern_addr_valid(unsigned long addr)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-105-ia64_use_init_nodes/include/linux/mm.h linux-2.6.17-mm6-106-account_kernel_mmap/include/linux/mm.h
--- linux-2.6.17-mm6-105-ia64_use_init_nodes/include/linux/mm.h 2006-07-06 11:04:22.000000000 +0100
+++ linux-2.6.17-mm6-106-account_kernel_mmap/include/linux/mm.h 2006-07-06 11:13:22.000000000 +0100
@@ -1005,6 +1005,7 @@ extern void free_bootmem_with_active_reg
unsigned long max_low_pfn);
extern void sparse_memory_present_with_active_regions(int nid);
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+extern void set_dma_reserve(unsigned long new_dma_reserve);
extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
extern void setup_per_zone_pages_min(void);
extern void mem_init(void);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-mm6-105-ia64_use_init_nodes/mm/page_alloc.c linux-2.6.17-mm6-106-account_kernel_mmap/mm/page_alloc.c
--- linux-2.6.17-mm6-105-ia64_use_init_nodes/mm/page_alloc.c 2006-07-06 21:44:44.000000000 +0100
+++ linux-2.6.17-mm6-106-account_kernel_mmap/mm/page_alloc.c 2006-07-06 11:13:22.000000000 +0100
@@ -87,6 +87,7 @@ int min_free_kbytes = 1024;
unsigned long __meminitdata nr_kernel_pages;
unsigned long __meminitdata nr_all_pages;
+unsigned long __initdata dma_reserve;
#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
/*
@@ -2300,6 +2301,20 @@ unsigned long __init zone_absent_pages_i
arch_zone_lowest_possible_pfn[zone_type],
arch_zone_highest_possible_pfn[zone_type]);
}
+
+/* Return the zone index a PFN is in */
+int memmap_zone_idx(struct page *lmem_map)
+{
+ int i;
+ unsigned long phys_addr = virt_to_phys(lmem_map);
+ unsigned long pfn = phys_addr >> PAGE_SHIFT;
+
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ if (pfn < arch_zone_highest_possible_pfn[i])
+ break;
+
+ return i;
+}
#else
static inline unsigned long zone_spanned_pages_in_node(int nid,
unsigned long zone_type,
@@ -2317,6 +2332,11 @@ static inline unsigned long zone_absent_
return zholes_size[zone_type];
}
+
+static inline int memmap_zone_idx(struct page *lmem_map)
+{
+ return MAX_NR_ZONES;
+}
#endif
static void __init calculate_node_totalpages(struct pglist_data *pgdat,
@@ -2340,6 +2360,58 @@ static void __init calculate_node_totalp
realtotalpages);
}
+#ifdef CONFIG_FLAT_NODE_MEM_MAP
+/* Account for mem_map for CONFIG_FLAT_NODE_MEM_MAP */
+unsigned long __meminit account_memmap(struct pglist_data *pgdat,
+ int zone_index)
+{
+ unsigned long pages = 0;
+ if (zone_index == memmap_zone_idx(pgdat->node_mem_map)) {
+ pages = pgdat->node_spanned_pages;
+ pages = (pages * sizeof(struct page)) >> PAGE_SHIFT;
+ printk(KERN_DEBUG "%lu pages used for memmap\n", pages);
+ }
+ return pages;
+}
+#else
+/* Account for mem_map for CONFIG_SPARSEMEM */
+unsigned long account_memmap(struct pglist_data *pgdat, int zone_index)
+{
+ unsigned long pages = 0;
+ unsigned long memmap_pfn;
+ struct page *memmap_addr;
+ int pnum;
+ unsigned long pgdat_startpfn, pgdat_endpfn;
+ struct mem_section *section;
+
+ pgdat_startpfn = pgdat->node_start_pfn;
+ pgdat_endpfn = pgdat_startpfn + pgdat->node_spanned_pages;
+
+ /* Go through valid sections looking for memmap */
+ for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+ if (!valid_section_nr(pnum))
+ continue;
+
+ section = __nr_to_section(pnum);
+ if (!section_has_mem_map(section))
+ continue;
+
+ memmap_addr = __section_mem_map_addr(section);
+ memmap_pfn = (unsigned long)memmap_addr >> PAGE_SHIFT;
+
+ if (memmap_pfn < pgdat_startpfn || memmap_pfn >= pgdat_endpfn)
+ continue;
+
+ if (zone_index == memmap_zone_idx(memmap_addr))
+ pages += (PAGES_PER_SECTION * sizeof(struct page));
+ }
+
+ pages >>= PAGE_SHIFT;
+ printk(KERN_DEBUG "%lu pages used for SPARSE memmap\n", pages);
+ return pages;
+}
+#endif
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -2366,6 +2438,15 @@ static void __meminit free_area_init_cor
size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
+
+ realsize -= account_memmap(pgdat, j);
+ /* Account for reserved DMA pages */
+ if (j == ZONE_DMA && realsize > dma_reserve) {
+ realsize -= dma_reserve;
+ printk(KERN_DEBUG "%lu pages DMA reserved\n",
+ dma_reserve);
+ }
+
if (j < ZONE_HIGHMEM)
nr_kernel_pages += realsize;
nr_all_pages += realsize;
@@ -2685,6 +2761,21 @@ void __init free_area_init_nodes(unsigne
}
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+/**
+ * set_dma_reserve - Account the specified number of pages reserved in ZONE_DMA
+ * @new_dma_reserve - The number of pages to mark reserved
+ *
+ * The per-cpu batchsize and zone watermarks are determined by present_pages.
+ * In the DMA zone, a significant percentage may be consumed by kernel image
+ * and other unfreeable allocations which can skew the watermarks badly. This
+ * function may optionally be used to account for unfreeable pages in
+ * ZONE_DMA. The effect will be lower watermarks and smaller per-cpu batchsize
+ */
+void __init set_dma_reserve(unsigned long new_dma_reserve)
+{
+ dma_reserve = new_dma_reserve;
+}
+
#ifndef CONFIG_NEED_MULTIPLE_NODES
static bootmem_data_t contig_bootmem_data;
struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };
^ permalink raw reply
* Re: [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8
From: Heiko Carstens @ 2006-07-08 11:42 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
linuxppc-dev
In-Reply-To: <20060708111042.28664.14732.sendpatchset@skynet.skynet.ie>
On Sat, Jul 08, 2006 at 12:10:42PM +0100, Mel Gorman wrote:
> There are differences in the zone sizes for x86_64 as the arch-specific code
> for x86_64 accounts the kernel image and the starting mem_maps as memory
> holes but the architecture-independent code accounts the memory as present.
Shouldn't this be the same for all architectures? Or to put it in other words:
why does only x86_64 account the kernel image as memory hole?
^ permalink raw reply
* Re: [PATCH] powermac: defer work in backlight key press
From: Michael Hanselmann @ 2006-07-08 14:35 UTC (permalink / raw)
To: Aristeu Sergio Rozanski Filho; +Cc: linuxppc-dev
In-Reply-To: <20060704013923.GG27596@cathedrallabs.org>
On Mon, Jul 03, 2006 at 10:39:23PM -0300, Aristeu Sergio Rozanski Filho wrote:
> powermac: defer work in backlight key press
> pmac_backlight_key() is called under interrupt context, can't use mutexes or
> semaphores, so defer the backlight level for later, as it's not critical
Nack, needs spinlock around this part:
> + pmac_backlight_key_queued = direction;
> + schedule_work(&pmac_backlight_key_work);
I'll submit another patch with more fixes soon.
Greets,
Michael
^ permalink raw reply
* Kernel hangs after "Now booting the kernel".
From: Ming Liu @ 2006-07-08 15:17 UTC (permalink / raw)
To: ammubhai; +Cc: linuxppc-embedded
In-Reply-To: <44AE83F6.2090702@gmail.com>
Dear Ameet,
I am sorry to take up so much of your time, but my system keeps showing
strange problems.
I tried enabling the Temac option and recompiled the kernel. Now, when I
boot the kernel from the CF card, it hangs right after the message "Now
booting the kernel". I then disabled the Temac option again and recompiled
the kernel to restore the previous configuration, but the problem is still
there, which is very strange.
Any suggestions on how to solve this? Of course, suggestions from other
list members are also appreciated.
Thanks again for your great help!
Regards
Ming
^ permalink raw reply
* Re: [PATCH] powermac: defer work in backlight key press (and export fixes)
From: Michael Hanselmann @ 2006-07-08 15:58 UTC (permalink / raw)
To: akpm; +Cc: linuxppc-dev, johannes, aris, linux-kernel
In-Reply-To: <20060704013923.GG27596@cathedrallabs.org>
pmac_backlight_key() is called in interrupt context and therefore
can't use mutexes or semaphores, so defer the backlight level change to
scheduled work, as it's not time-critical (code by Aristeu S. Rozanski F.
<aris@valeta.org>).
This patch also fixes exports and Kconfig dependencies.
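
The deferral described above can be modelled in plain userland C. This is
only a rough sketch, not the driver code: every toy_* name is hypothetical,
C11 atomics stand in for the kernel's work item state, and the model says
nothing about how schedule_work() is implemented. It shows the idea that
the "interrupt" side only records the latest direction, and that several
events before the worker runs collapse into a single brightness step using
the last value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userland stand-ins; none of these are kernel APIs. */
enum { TOY_MAX_BRIGHTNESS = 15 };

static atomic_int toy_queued_direction; /* 0 = up, 1 = down (last request) */
static atomic_bool toy_work_pending;    /* models "work is already queued" */
static int toy_brightness = 8;          /* only touched by the worker */

/* "Interrupt" side: record the request and return without blocking.
 * Returns true if this call queued the work, false if it was pending. */
static bool toy_key_event(int direction)
{
    assert(direction == 0 || direction == 1);
    atomic_store(&toy_queued_direction, direction);
    return !atomic_exchange(&toy_work_pending, true);
}

/* "Process context" side: in a real driver this is where sleeping locks
 * such as a mutex could be taken. */
static void toy_key_worker(void)
{
    int next;

    if (!atomic_exchange(&toy_work_pending, false))
        return; /* nothing queued since the last run */

    next = toy_brightness + (atomic_load(&toy_queued_direction) ? -1 : 1);
    if (next < 0)
        next = 0;
    if (next > TOY_MAX_BRIGHTNESS)
        next = TOY_MAX_BRIGHTNESS;
    toy_brightness = next;
}
```

Calling toy_key_event(0) and then toy_key_event(1) before the worker runs
results in one step down, which is the "last value wins" behaviour the
comment in the patch describes.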
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Aristeu S. Rozanski F. <aris@valeta.org>
---
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/arch/powerpc/platforms/powermac/backlight.c linux-2.6.18-rc1/arch/powerpc/platforms/powermac/backlight.c
--- linux-2.6.18-rc1.orig/arch/powerpc/platforms/powermac/backlight.c 2006-07-08 12:27:01.000000000 +0200
+++ linux-2.6.18-rc1/arch/powerpc/platforms/powermac/backlight.c 2006-07-08 17:30:23.000000000 +0200
@@ -15,6 +15,15 @@
#define OLD_BACKLIGHT_MAX 15
+static void pmac_backlight_key_worker(void *data);
+static DECLARE_WORK(pmac_backlight_key_work, pmac_backlight_key_worker, NULL);
+
+/* Although this variable is used in interrupt context, it makes no sense to
+ * protect it. No user is able to produce enough key events per second and
+ * notice the errors that might happen.
+ */
+static int pmac_backlight_key_queued;
+
/* Protect the pmac_backlight variable */
DEFINE_MUTEX(pmac_backlight_mutex);
@@ -71,7 +80,7 @@ int pmac_backlight_curve_lookup(struct f
return level;
}
-static void pmac_backlight_key(int direction)
+static void pmac_backlight_key_worker(void *data)
{
mutex_lock(&pmac_backlight_mutex);
if (pmac_backlight) {
@@ -82,7 +91,8 @@ static void pmac_backlight_key(int direc
props = pmac_backlight->props;
brightness = props->brightness +
- ((direction?-1:1) * (props->max_brightness / 15));
+ ((pmac_backlight_key_queued?-1:1) *
+ (props->max_brightness / 15));
if (brightness < 0)
brightness = 0;
@@ -97,14 +107,13 @@ static void pmac_backlight_key(int direc
mutex_unlock(&pmac_backlight_mutex);
}
-void pmac_backlight_key_up()
+void pmac_backlight_key(int direction)
{
- pmac_backlight_key(0);
-}
-
-void pmac_backlight_key_down()
-{
- pmac_backlight_key(1);
+ /* we can receive multiple interrupts here, but the scheduled work
+ * will run only once, with the last value
+ */
+ pmac_backlight_key_queued = direction;
+ schedule_work(&pmac_backlight_key_work);
}
int pmac_backlight_set_legacy_brightness(int brightness)
@@ -157,3 +166,7 @@ int pmac_backlight_get_legacy_brightness
return result;
}
+
+EXPORT_SYMBOL_GPL(pmac_backlight);
+EXPORT_SYMBOL_GPL(pmac_backlight_mutex);
+EXPORT_SYMBOL_GPL(pmac_has_backlight_type);
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/macintosh/Kconfig linux-2.6.18-rc1/drivers/macintosh/Kconfig
--- linux-2.6.18-rc1.orig/drivers/macintosh/Kconfig 2006-07-08 12:27:10.000000000 +0200
+++ linux-2.6.18-rc1/drivers/macintosh/Kconfig 2006-07-08 15:28:36.000000000 +0200
@@ -113,7 +113,10 @@ config PMAC_MEDIABAY
config PMAC_BACKLIGHT
bool "Backlight control for LCD screens"
- depends on ADB_PMU && (BROKEN || !PPC64)
+ depends on ADB_PMU && FB = y && (BROKEN || !PPC64)
+ select FB_BACKLIGHT
+ select BACKLIGHT_CLASS_DEVICE
+ select BACKLIGHT_LCD_SUPPORT
help
Say Y here to enable Macintosh specific extensions of the generic
backlight code. With this enabled, the brightness keys on older
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/include/asm-powerpc/backlight.h linux-2.6.18-rc1/include/asm-powerpc/backlight.h
--- linux-2.6.18-rc1.orig/include/asm-powerpc/backlight.h 2006-07-08 12:27:21.000000000 +0200
+++ linux-2.6.18-rc1/include/asm-powerpc/backlight.h 2006-07-08 17:14:43.000000000 +0200
@@ -16,13 +16,16 @@
extern struct backlight_device *pmac_backlight;
extern struct mutex pmac_backlight_mutex;
-extern void pmac_backlight_calc_curve(struct fb_info*);
extern int pmac_backlight_curve_lookup(struct fb_info *info, int value);
extern int pmac_has_backlight_type(const char *type);
-extern void pmac_backlight_key_up(void);
-extern void pmac_backlight_key_down(void);
+extern void pmac_backlight_key(int direction);
+
+#define pmac_backlight_key_up() \
+ do { pmac_backlight_key(0); } while(0);
+#define pmac_backlight_key_down() \
+ do { pmac_backlight_key(1); } while(0);
extern int pmac_backlight_set_legacy_brightness(int brightness);
extern int pmac_backlight_get_legacy_brightness(void);
^ permalink raw reply
* powermac: Combined fixes for backlight code
From: Michael Hanselmann @ 2006-07-08 17:35 UTC (permalink / raw)
To: akpm; +Cc: linuxppc-dev, johannes, aris, linux-kernel
Hello Andrew
Sorry, I already sent two other patches earlier today. Michael Buesch has
since pointed out that I should have used inline functions instead of
#define macros. I therefore decided to combine all the patches and ask you
to ignore the earlier ones. Their subjects were "Locking fixes for powermac
backlight infrastructure" and "powermac: defer work in backlight key press
(and export fixes)".
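
As a general illustration of why inline functions are often preferred over
function-like macros here, consider the sketch below. The toy_* names are
hypothetical and exist only for this example. A macro body that carries its
own trailing semicolon expands to two statements when the caller adds the
usual ";", which breaks an unbraced if/else; an inline function has no such
pitfall and is type-checked as well.

```c
#include <assert.h>

static int toy_calls_up, toy_calls_down;

static void toy_backlight_key(int direction)
{
    assert(direction == 0 || direction == 1);
    if (direction)
        toy_calls_down++;
    else
        toy_calls_up++;
}

/* Pitfall: a macro written as
 *     #define TOY_KEY_UP() do { toy_backlight_key(0); } while (0);
 * makes "if (x) TOY_KEY_UP(); else ..." fail to compile, because the
 * extra ';' ends the if statement before the else is reached.
 * A well-formed wrapper leaves the semicolon to the caller: */
#define TOY_KEY_UP() do { toy_backlight_key(0); } while (0)

/* Inline functions avoid the question entirely. */
static inline void toy_key_up(void)   { toy_backlight_key(0); }
static inline void toy_key_down(void) { toy_backlight_key(1); }

/* Unbraced if/else works naturally with the inline versions. */
static void toy_handle_key(int is_up)
{
    if (is_up)
        toy_key_up();
    else
        toy_key_down();
}
```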
Here's the description for the combined patch:
This patch fixes several problems:
- pmac_backlight_key() is called in interrupt context and therefore
can't use mutexes or semaphores, so defer the backlight level change to
scheduled work, as it's not time-critical (original code by Aristeu S.
Rozanski F. <aris@valeta.org>).
- Add exports for functions that might be called from modules
- Fix Kconfig dependencies of PMAC_BACKLIGHT.
- Fix locking issues on calls from inside the driver (reported by
Aristeu S. Rozanski F., too)
- Fix wrong calculation of backlight values in some of the drivers
- Replace pmac_backlight_key_up/down by inline functions
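
The backlight value fix in the list above can be checked with a little
integer arithmetic. The sketch below copies MIN_LEVEL, MAX_LEVEL and
LEVEL_STEP from the nvidia and riva hunks of this patch; the value 0xFF for
FB_BACKLIGHT_MAX is an assumption about the framebuffer header of that
time, and toy_old_level()/toy_new_level() are hypothetical helper names.
Under that assumption the old formula never gets anywhere near MIN_LEVEL,
while the new one starts at MIN_LEVEL and stays below MAX_LEVEL.

```c
#include <assert.h>

#define MIN_LEVEL 0x158
#define MAX_LEVEL 0x534
#define FB_BACKLIGHT_MAX 0xFF /* assumed value, not shown in this patch */
#define LEVEL_STEP ((MAX_LEVEL - MIN_LEVEL) / FB_BACKLIGHT_MAX)

/* Formula removed by the patch: scales the curve value down only. */
static int toy_old_level(int curve)
{
    assert(curve >= 0 && curve <= FB_BACKLIGHT_MAX);
    return curve * FB_BACKLIGHT_MAX / MAX_LEVEL;
}

/* Formula added by the patch: offset by MIN_LEVEL, one step per unit. */
static int toy_new_level(int curve)
{
    assert(curve >= 0 && curve <= FB_BACKLIGHT_MAX);
    return MIN_LEVEL + curve * LEVEL_STEP;
}
```

With these numbers LEVEL_STEP is 3 after integer division, so the highest
curve value maps to 0x158 + 255 * 3, whereas the old formula tops out at a
value far below MIN_LEVEL.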
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Aristeu S. Rozanski F. <aris@valeta.org>
Acked-by: René Nussbaumer <linux-kernel@killerfox.forkbomb.ch>
---
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/arch/powerpc/platforms/powermac/backlight.c linux-2.6.18-rc1/arch/powerpc/platforms/powermac/backlight.c
--- linux-2.6.18-rc1.orig/arch/powerpc/platforms/powermac/backlight.c 2006-07-08 18:04:49.000000000 +0200
+++ linux-2.6.18-rc1/arch/powerpc/platforms/powermac/backlight.c 2006-07-08 19:11:52.000000000 +0200
@@ -15,6 +15,15 @@
#define OLD_BACKLIGHT_MAX 15
+static void pmac_backlight_key_worker(void *data);
+static DECLARE_WORK(pmac_backlight_key_work, pmac_backlight_key_worker, NULL);
+
+/* Although this variable is used in interrupt context, it makes no sense to
+ * protect it. No user is able to produce enough key events per second and
+ * notice the errors that might happen.
+ */
+static int pmac_backlight_key_queued;
+
/* Protect the pmac_backlight variable */
DEFINE_MUTEX(pmac_backlight_mutex);
@@ -71,7 +80,7 @@ int pmac_backlight_curve_lookup(struct f
return level;
}
-static void pmac_backlight_key(int direction)
+static void pmac_backlight_key_worker(void *data)
{
mutex_lock(&pmac_backlight_mutex);
if (pmac_backlight) {
@@ -82,7 +91,8 @@ static void pmac_backlight_key(int direc
props = pmac_backlight->props;
brightness = props->brightness +
- ((direction?-1:1) * (props->max_brightness / 15));
+ ((pmac_backlight_key_queued?-1:1) *
+ (props->max_brightness / 15));
if (brightness < 0)
brightness = 0;
@@ -97,14 +107,13 @@ static void pmac_backlight_key(int direc
mutex_unlock(&pmac_backlight_mutex);
}
-void pmac_backlight_key_up()
+void pmac_backlight_key(int direction)
{
- pmac_backlight_key(0);
-}
-
-void pmac_backlight_key_down()
-{
- pmac_backlight_key(1);
+ /* we can receive multiple interrupts here, but the scheduled work
+ * will run only once, with the last value
+ */
+ pmac_backlight_key_queued = direction;
+ schedule_work(&pmac_backlight_key_work);
}
int pmac_backlight_set_legacy_brightness(int brightness)
@@ -157,3 +166,7 @@ int pmac_backlight_get_legacy_brightness
return result;
}
+
+EXPORT_SYMBOL_GPL(pmac_backlight);
+EXPORT_SYMBOL_GPL(pmac_backlight_mutex);
+EXPORT_SYMBOL_GPL(pmac_has_backlight_type);
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/macintosh/Kconfig linux-2.6.18-rc1/drivers/macintosh/Kconfig
--- linux-2.6.18-rc1.orig/drivers/macintosh/Kconfig 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/macintosh/Kconfig 2006-07-08 19:11:52.000000000 +0200
@@ -113,7 +113,10 @@ config PMAC_MEDIABAY
config PMAC_BACKLIGHT
bool "Backlight control for LCD screens"
- depends on ADB_PMU && (BROKEN || !PPC64)
+ depends on ADB_PMU && FB = y && (BROKEN || !PPC64)
+ select FB_BACKLIGHT
+ select BACKLIGHT_CLASS_DEVICE
+ select BACKLIGHT_LCD_SUPPORT
help
Say Y here to enable Macintosh specific extensions of the generic
backlight code. With this enabled, the brightness keys on older
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/aty/aty128fb.c linux-2.6.18-rc1/drivers/video/aty/aty128fb.c
--- linux-2.6.18-rc1.orig/drivers/video/aty/aty128fb.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/aty/aty128fb.c 2006-07-08 19:11:50.000000000 +0200
@@ -456,6 +456,7 @@ static void do_wait_for_fifo(u16 entries
static void wait_for_fifo(u16 entries, struct aty128fb_par *par);
static void wait_for_idle(struct aty128fb_par *par);
static u32 depth_to_dst(u32 depth);
+static void aty128_bl_set_power(struct fb_info *info, int power);
#define BIOS_IN8(v) (readb(bios + (v)))
#define BIOS_IN16(v) (readb(bios + (v)) | \
@@ -1258,25 +1259,11 @@ static void aty128_set_lcd_enable(struct
reg &= ~LVDS_DISPLAY_DIS;
aty_st_le32(LVDS_GEN_CNTL, reg);
#ifdef CONFIG_FB_ATY128_BACKLIGHT
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
+ aty128_bl_set_power(info, FB_BLANK_UNBLANK);
#endif
} else {
#ifdef CONFIG_FB_ATY128_BACKLIGHT
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->brightness = 0;
- info->bl_dev->props->power = FB_BLANK_POWERDOWN;
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
+ aty128_bl_set_power(info, FB_BLANK_POWERDOWN);
#endif
reg = aty_ld_le32(LVDS_GEN_CNTL);
reg |= LVDS_DISPLAY_DIS;
@@ -1703,6 +1690,7 @@ static int __devinit aty128fb_setup(char
static struct backlight_properties aty128_bl_data;
+/* Call with fb_info->bl_mutex held */
static int aty128_bl_get_level_brightness(struct aty128fb_par *par,
int level)
{
@@ -1710,10 +1698,8 @@ static int aty128_bl_get_level_brightnes
int atylevel;
/* Get and convert the value */
- mutex_lock(&info->bl_mutex);
atylevel = MAX_LEVEL -
(info->bl_curve[level] * FB_BACKLIGHT_MAX / MAX_LEVEL);
- mutex_unlock(&info->bl_mutex);
if (atylevel < 0)
atylevel = 0;
@@ -1731,7 +1717,8 @@ static int aty128_bl_get_level_brightnes
/* That one prevents proper CRT output with LCD off */
#undef BACKLIGHT_DAC_OFF
-static int aty128_bl_update_status(struct backlight_device *bd)
+/* Call with fb_info->bl_mutex held */
+static int __aty128_bl_update_status(struct backlight_device *bd)
{
struct aty128fb_par *par = class_get_devdata(&bd->class_dev);
unsigned int reg = aty_ld_le32(LVDS_GEN_CNTL);
@@ -1784,6 +1771,19 @@ static int aty128_bl_update_status(struc
return 0;
}
+static int aty128_bl_update_status(struct backlight_device *bd)
+{
+ struct aty128fb_par *par = class_get_devdata(&bd->class_dev);
+ struct fb_info *info = pci_get_drvdata(par->pdev);
+ int ret;
+
+ mutex_lock(&info->bl_mutex);
+ ret = __aty128_bl_update_status(bd);
+ mutex_unlock(&info->bl_mutex);
+
+ return ret;
+}
+
static int aty128_bl_get_brightness(struct backlight_device *bd)
{
return bd->props->brightness;
@@ -1796,6 +1796,16 @@ static struct backlight_properties aty12
.max_brightness = (FB_BACKLIGHT_LEVELS - 1),
};
+static void aty128_bl_set_power(struct fb_info *info, int power)
+{
+ mutex_lock(&info->bl_mutex);
+ down(&info->bl_dev->sem);
+ info->bl_dev->props->power = power;
+ __aty128_bl_update_status(info->bl_dev);
+ up(&info->bl_dev->sem);
+ mutex_unlock(&info->bl_mutex);
+}
+
static void aty128_bl_init(struct aty128fb_par *par)
{
struct fb_info *info = pci_get_drvdata(par->pdev);
@@ -2198,12 +2208,8 @@ static int aty128fb_blank(int blank, str
return 0;
#ifdef CONFIG_FB_ATY128_BACKLIGHT
- if (machine_is(powermac) && blank) {
- down(&fb->bl_dev->sem);
- fb->bl_dev->props->power = FB_BLANK_POWERDOWN;
- fb->bl_dev->props->update_status(fb->bl_dev);
- up(&fb->bl_dev->sem);
- }
+ if (machine_is(powermac) && blank)
+ aty128_bl_set_power(fb, FB_BLANK_POWERDOWN);
#endif
if (blank & FB_BLANK_VSYNC_SUSPEND)
@@ -2219,14 +2225,12 @@ static int aty128fb_blank(int blank, str
aty128_set_crt_enable(par, par->crt_on && !blank);
aty128_set_lcd_enable(par, par->lcd_on && !blank);
}
+
#ifdef CONFIG_FB_ATY128_BACKLIGHT
- if (machine_is(powermac) && !blank) {
- down(&fb->bl_dev->sem);
- fb->bl_dev->props->power = FB_BLANK_UNBLANK;
- fb->bl_dev->props->update_status(fb->bl_dev);
- up(&fb->bl_dev->sem);
- }
+ if (machine_is(powermac) && !blank)
+ aty128_bl_set_power(fb, FB_BLANK_UNBLANK);
#endif
+
return 0;
}
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/aty/atyfb_base.c linux-2.6.18-rc1/drivers/video/aty/atyfb_base.c
--- linux-2.6.18-rc1.orig/drivers/video/aty/atyfb_base.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/aty/atyfb_base.c 2006-07-08 19:11:50.000000000 +0200
@@ -2129,15 +2129,14 @@ static int atyfb_pci_resume(struct pci_d
static struct backlight_properties aty_bl_data;
+/* Call with fb_info->bl_mutex held */
static int aty_bl_get_level_brightness(struct atyfb_par *par, int level)
{
struct fb_info *info = pci_get_drvdata(par->pdev);
int atylevel;
/* Get and convert the value */
- mutex_lock(&info->bl_mutex);
atylevel = info->bl_curve[level] * FB_BACKLIGHT_MAX / MAX_LEVEL;
- mutex_unlock(&info->bl_mutex);
if (atylevel < 0)
atylevel = 0;
@@ -2147,7 +2146,8 @@ static int aty_bl_get_level_brightness(s
return atylevel;
}
-static int aty_bl_update_status(struct backlight_device *bd)
+/* Call with fb_info->bl_mutex held */
+static int __aty_bl_update_status(struct backlight_device *bd)
{
struct atyfb_par *par = class_get_devdata(&bd->class_dev);
unsigned int reg = aty_ld_lcd(LCD_MISC_CNTL, par);
@@ -2172,6 +2172,19 @@ static int aty_bl_update_status(struct b
return 0;
}
+static int aty_bl_update_status(struct backlight_device *bd)
+{
+ struct atyfb_par *par = class_get_devdata(&bd->class_dev);
+ struct fb_info *info = pci_get_drvdata(par->pdev);
+ int ret;
+
+ mutex_lock(&info->bl_mutex);
+ ret = __aty_bl_update_status(bd);
+ mutex_unlock(&info->bl_mutex);
+
+ return ret;
+}
+
static int aty_bl_get_brightness(struct backlight_device *bd)
{
return bd->props->brightness;
@@ -2184,6 +2197,16 @@ static struct backlight_properties aty_b
.max_brightness = (FB_BACKLIGHT_LEVELS - 1),
};
+static void aty_bl_set_power(struct fb_info *info, int power)
+{
+ mutex_lock(&info->bl_mutex);
+ down(&info->bl_dev->sem);
+ info->bl_dev->props->power = power;
+ __aty_bl_update_status(info->bl_dev);
+ up(&info->bl_dev->sem);
+ mutex_unlock(&info->bl_mutex);
+}
+
static void aty_bl_init(struct atyfb_par *par)
{
struct fb_info *info = pci_get_drvdata(par->pdev);
@@ -2790,16 +2813,8 @@ static int atyfb_blank(int blank, struct
return 0;
#ifdef CONFIG_PMAC_BACKLIGHT
- if (machine_is(powermac) && blank > FB_BLANK_NORMAL) {
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->power = FB_BLANK_POWERDOWN;
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
- }
+ if (machine_is(powermac) && blank > FB_BLANK_NORMAL)
+ aty_bl_set_power(info, FB_BLANK_POWERDOWN);
#elif defined(CONFIG_FB_ATY_GENERIC_LCD)
if (par->lcd_table && blank > FB_BLANK_NORMAL &&
(aty_ld_lcd(LCD_GEN_CNTL, par) & LCD_ON)) {
@@ -2830,16 +2845,8 @@ static int atyfb_blank(int blank, struct
aty_st_le32(CRTC_GEN_CNTL, gen_cntl, par);
#ifdef CONFIG_PMAC_BACKLIGHT
- if (machine_is(powermac) && blank <= FB_BLANK_NORMAL) {
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->power = FB_BLANK_UNBLANK;
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
- }
+ if (machine_is(powermac) && blank <= FB_BLANK_NORMAL)
+ aty_bl_set_power(info, FB_BLANK_UNBLANK);
#elif defined(CONFIG_FB_ATY_GENERIC_LCD)
if (par->lcd_table && blank <= FB_BLANK_NORMAL &&
(aty_ld_lcd(LCD_GEN_CNTL, par) & LCD_ON)) {
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/chipsfb.c linux-2.6.18-rc1/drivers/video/chipsfb.c
--- linux-2.6.18-rc1.orig/drivers/video/chipsfb.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/chipsfb.c 2006-07-08 19:11:50.000000000 +0200
@@ -150,12 +150,11 @@ static int chipsfb_blank(int blank, stru
mutex_lock(&pmac_backlight_mutex);
if (pmac_backlight) {
- down(&pmac_backlight->sem);
-
/* used to disable backlight only for blank > 1, but it seems
* useful at blank = 1 too (saves battery, extends backlight
* life)
*/
+ down(&pmac_backlight->sem);
if (blank)
pmac_backlight->props->power = FB_BLANK_POWERDOWN;
else
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/nvidia/nv_backlight.c linux-2.6.18-rc1/drivers/video/nvidia/nv_backlight.c
--- linux-2.6.18-rc1.orig/drivers/video/nvidia/nv_backlight.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/nvidia/nv_backlight.c 2006-07-08 19:11:50.000000000 +0200
@@ -26,9 +26,11 @@
*/
#define MIN_LEVEL 0x158
#define MAX_LEVEL 0x534
+#define LEVEL_STEP ((MAX_LEVEL - MIN_LEVEL) / FB_BACKLIGHT_MAX)
static struct backlight_properties nvidia_bl_data;
+/* Call with fb_info->bl_mutex held */
static int nvidia_bl_get_level_brightness(struct nvidia_par *par,
int level)
{
@@ -36,9 +38,7 @@ static int nvidia_bl_get_level_brightnes
int nlevel;
/* Get and convert the value */
- mutex_lock(&info->bl_mutex);
- nlevel = info->bl_curve[level] * FB_BACKLIGHT_MAX / MAX_LEVEL;
- mutex_unlock(&info->bl_mutex);
+ nlevel = MIN_LEVEL + info->bl_curve[level] * LEVEL_STEP;
if (nlevel < 0)
nlevel = 0;
@@ -50,7 +50,8 @@ static int nvidia_bl_get_level_brightnes
return nlevel;
}
-static int nvidia_bl_update_status(struct backlight_device *bd)
+/* Call with fb_info->bl_mutex held */
+static int __nvidia_bl_update_status(struct backlight_device *bd)
{
struct nvidia_par *par = class_get_devdata(&bd->class_dev);
u32 tmp_pcrt, tmp_pmc, fpcontrol;
@@ -84,6 +85,19 @@ static int nvidia_bl_update_status(struc
return 0;
}
+static int nvidia_bl_update_status(struct backlight_device *bd)
+{
+ struct nvidia_par *par = class_get_devdata(&bd->class_dev);
+ struct fb_info *info = pci_get_drvdata(par->pci_dev);
+ int ret;
+
+ mutex_lock(&info->bl_mutex);
+ ret = __nvidia_bl_update_status(bd);
+ mutex_unlock(&info->bl_mutex);
+
+ return ret;
+}
+
static int nvidia_bl_get_brightness(struct backlight_device *bd)
{
return bd->props->brightness;
@@ -96,6 +110,16 @@ static struct backlight_properties nvidi
.max_brightness = (FB_BACKLIGHT_LEVELS - 1),
};
+void nvidia_bl_set_power(struct fb_info *info, int power)
+{
+ mutex_lock(&info->bl_mutex);
+ down(&info->bl_dev->sem);
+ info->bl_dev->props->power = power;
+ __nvidia_bl_update_status(info->bl_dev);
+ up(&info->bl_dev->sem);
+ mutex_unlock(&info->bl_mutex);
+}
+
void nvidia_bl_init(struct nvidia_par *par)
{
struct fb_info *info = pci_get_drvdata(par->pci_dev);
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/nvidia/nvidia.c linux-2.6.18-rc1/drivers/video/nvidia/nvidia.c
--- linux-2.6.18-rc1.orig/drivers/video/nvidia/nvidia.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/nvidia/nvidia.c 2006-07-08 19:11:50.000000000 +0200
@@ -933,16 +933,7 @@ static int nvidiafb_blank(int blank, str
NVWriteSeq(par, 0x01, tmp);
NVWriteCrtc(par, 0x1a, vesa);
-#ifdef CONFIG_FB_NVIDIA_BACKLIGHT
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->power = blank;
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
-#endif
+ nvidia_bl_set_power(info, blank);
NVTRACE_LEAVE();
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/nvidia/nv_proto.h linux-2.6.18-rc1/drivers/video/nvidia/nv_proto.h
--- linux-2.6.18-rc1.orig/drivers/video/nvidia/nv_proto.h 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/nvidia/nv_proto.h 2006-07-08 19:11:50.000000000 +0200
@@ -68,9 +68,11 @@ extern u8 byte_rev[256];
#ifdef CONFIG_FB_NVIDIA_BACKLIGHT
extern void nvidia_bl_init(struct nvidia_par *par);
extern void nvidia_bl_exit(struct nvidia_par *par);
+extern void nvidia_bl_set_power(struct fb_info *info, int power);
#else
static inline void nvidia_bl_init(struct nvidia_par *par) {}
static inline void nvidia_bl_exit(struct nvidia_par *par) {}
+static inline void nvidia_bl_set_power(struct fb_info *info, int power) {}
#endif
#endif /* __NV_PROTO_H__ */
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/drivers/video/riva/fbdev.c linux-2.6.18-rc1/drivers/video/riva/fbdev.c
--- linux-2.6.18-rc1.orig/drivers/video/riva/fbdev.c 2006-07-08 18:04:50.000000000 +0200
+++ linux-2.6.18-rc1/drivers/video/riva/fbdev.c 2006-07-08 19:11:50.000000000 +0200
@@ -278,9 +278,11 @@ static const struct riva_regs reg_templa
*/
#define MIN_LEVEL 0x158
#define MAX_LEVEL 0x534
+#define LEVEL_STEP ((MAX_LEVEL - MIN_LEVEL) / FB_BACKLIGHT_MAX)
static struct backlight_properties riva_bl_data;
+/* Call with fb_info->bl_mutex held */
static int riva_bl_get_level_brightness(struct riva_par *par,
int level)
{
@@ -288,9 +290,7 @@ static int riva_bl_get_level_brightness(
int nlevel;
/* Get and convert the value */
- mutex_lock(&info->bl_mutex);
- nlevel = info->bl_curve[level] * FB_BACKLIGHT_MAX / MAX_LEVEL;
- mutex_unlock(&info->bl_mutex);
+ nlevel = MIN_LEVEL + info->bl_curve[level] * LEVEL_STEP;
if (nlevel < 0)
nlevel = 0;
@@ -302,7 +302,8 @@ static int riva_bl_get_level_brightness(
return nlevel;
}
-static int riva_bl_update_status(struct backlight_device *bd)
+/* Call with fb_info->bl_mutex held */
+static int __riva_bl_update_status(struct backlight_device *bd)
{
struct riva_par *par = class_get_devdata(&bd->class_dev);
U032 tmp_pcrt, tmp_pmc;
@@ -327,6 +328,19 @@ static int riva_bl_update_status(struct
return 0;
}
+static int riva_bl_update_status(struct backlight_device *bd)
+{
+ struct riva_par *par = class_get_devdata(&bd->class_dev);
+ struct fb_info *info = pci_get_drvdata(par->pdev);
+ int ret;
+
+ mutex_lock(&info->bl_mutex);
+ ret = __riva_bl_update_status(bd);
+ mutex_unlock(&info->bl_mutex);
+
+ return ret;
+}
+
static int riva_bl_get_brightness(struct backlight_device *bd)
{
return bd->props->brightness;
@@ -339,6 +353,16 @@ static struct backlight_properties riva_
.max_brightness = (FB_BACKLIGHT_LEVELS - 1),
};
+static void riva_bl_set_power(struct fb_info *info, int power)
+{
+ mutex_lock(&info->bl_mutex);
+ down(&info->bl_dev->sem);
+ info->bl_dev->props->power = power;
+ __riva_bl_update_status(info->bl_dev);
+ up(&info->bl_dev->sem);
+ mutex_unlock(&info->bl_mutex);
+}
+
static void riva_bl_init(struct riva_par *par)
{
struct fb_info *info = pci_get_drvdata(par->pdev);
@@ -419,6 +443,7 @@ static void riva_bl_exit(struct riva_par
#else
static inline void riva_bl_init(struct riva_par *par) {}
static inline void riva_bl_exit(struct riva_par *par) {}
+static inline void riva_bl_set_power(struct fb_info *info, int power) {}
#endif /* CONFIG_FB_RIVA_BACKLIGHT */
/* ------------------------------------------------------------------------- *
@@ -1337,16 +1362,7 @@ static int rivafb_blank(int blank, struc
SEQout(par, 0x01, tmp);
CRTCout(par, 0x1a, vesa);
-#ifdef CONFIG_FB_RIVA_BACKLIGHT
- mutex_lock(&info->bl_mutex);
- if (info->bl_dev) {
- down(&info->bl_dev->sem);
- info->bl_dev->props->power = blank;
- info->bl_dev->props->update_status(info->bl_dev);
- up(&info->bl_dev->sem);
- }
- mutex_unlock(&info->bl_mutex);
-#endif
+ riva_bl_set_power(info, blank);
NVTRACE_LEAVE();
diff -Nrup --exclude-from linux-exclude-from linux-2.6.18-rc1.orig/include/asm-powerpc/backlight.h linux-2.6.18-rc1/include/asm-powerpc/backlight.h
--- linux-2.6.18-rc1.orig/include/asm-powerpc/backlight.h 2006-07-08 18:04:51.000000000 +0200
+++ linux-2.6.18-rc1/include/asm-powerpc/backlight.h 2006-07-08 19:13:53.000000000 +0200
@@ -16,13 +16,13 @@
extern struct backlight_device *pmac_backlight;
extern struct mutex pmac_backlight_mutex;
-extern void pmac_backlight_calc_curve(struct fb_info*);
extern int pmac_backlight_curve_lookup(struct fb_info *info, int value);
extern int pmac_has_backlight_type(const char *type);
-extern void pmac_backlight_key_up(void);
-extern void pmac_backlight_key_down(void);
+extern void pmac_backlight_key(int direction);
+static inline void pmac_backlight_key_up(void) { pmac_backlight_key(0); }
+static inline void pmac_backlight_key_down(void) { pmac_backlight_key(1); }
extern int pmac_backlight_set_legacy_brightness(int brightness);
extern int pmac_backlight_get_legacy_brightness(void);
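
The locking split used throughout the driver hunks above (a helper that
must be called with the lock held, plus a thin wrapper that takes the lock)
can be sketched in userland C. This is an illustration only: all toy_*
names are hypothetical, the "lock" is a flag guarded by assertions rather
than a real mutex, and the kernel code marks the lock-held variant with a
"__" prefix where this sketch uses a "_locked" suffix.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_backlight {
    bool locked;  /* models a non-recursive mutex */
    int power;    /* requested power state */
    int hw_power; /* value last "written to the hardware" */
};

static void toy_lock(struct toy_backlight *bl)
{
    assert(!bl->locked); /* taking it twice would deadlock a real mutex */
    bl->locked = true;
}

static void toy_unlock(struct toy_backlight *bl)
{
    assert(bl->locked);
    bl->locked = false;
}

/* Caller must hold the lock. */
static int toy_update_status_locked(struct toy_backlight *bl)
{
    assert(bl->locked);
    bl->hw_power = bl->power;
    return 0;
}

/* Entry point for outside callers: takes the lock itself. */
static int toy_update_status(struct toy_backlight *bl)
{
    int ret;

    toy_lock(bl);
    ret = toy_update_status_locked(bl);
    toy_unlock(bl);
    return ret;
}

/* Internal path: already needs the lock to change the state, so it
 * calls the lock-held helper instead of the locking wrapper. */
static void toy_set_power(struct toy_backlight *bl, int power)
{
    toy_lock(bl);
    bl->power = power;
    toy_update_status_locked(bl);
    toy_unlock(bl);
}
```

If toy_set_power() called toy_update_status() instead, the assertion in
toy_lock() would fire, which mirrors the self-deadlock that the split is
meant to avoid.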
^ permalink raw reply
* [PATCH] powerpc: EOI and clear IPI fix in xics_teardown_cpu()
From: Haren Myneni @ 2006-07-09 1:28 UTC (permalink / raw)
To: Paul Mackerras, benh; +Cc: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 697 bytes --]
If OK, please send this patch upstream.
Thanks
Haren
When a kdump boot is invoked, the plpar_eoi() call fails and panic() is
called:
Kernel panic - not syncing: bad return code EOI - rc = -4, value=ff000000
The issue is the desc->chip->eoi(XICS_IPI) call in xics_teardown_cpu():
XICS_IPI is passed to desc->chip->eoi() instead of the virq.
Also, the clearing of the IPI in xics_teardown_cpu() was removed recently
(in 2.6.17-git25). In some crash dump cases (e.g. initiating a kdump boot
via soft-reset with xmon enabled), the IPI is not cleared for some CPU(s)
before the kdump boot starts, which causes the kdump boot to fail.
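
The virq mix-up described above can be illustrated with a toy mapping
table. This is a sketch under stated assumptions, not the XICS code: all
toy_* names and the numbers 2 and 5 are made up for the example. The point
is only that a callback expecting a virtual irq number looks up the wrong
slot, or no slot at all, when it is handed a hardware number instead.

```c
#include <assert.h>

/* Hypothetical numbers, chosen only for this illustration. */
enum { TOY_NR_VIRQS = 16, TOY_HW_IPI = 2, TOY_NO_IRQ = -1 };

static int toy_virq_to_hw[TOY_NR_VIRQS];
static int toy_last_eoi_hw = TOY_NO_IRQ;

static void toy_irq_init(void)
{
    int i;

    for (i = 0; i < TOY_NR_VIRQS; i++)
        toy_virq_to_hw[i] = TOY_NO_IRQ;
}

/* Record that virtual irq 'virq' is backed by hardware source 'hw'. */
static void toy_irq_map(int virq, int hw)
{
    assert(virq >= 0 && virq < TOY_NR_VIRQS);
    toy_virq_to_hw[virq] = hw;
}

/* EOI takes a virtual irq number and translates it before "telling the
 * hardware"; an unmapped number is reported as an error. */
static int toy_eoi(int virq)
{
    if (virq < 0 || virq >= TOY_NR_VIRQS ||
        toy_virq_to_hw[virq] == TOY_NO_IRQ)
        return -1;
    toy_last_eoi_hw = toy_virq_to_hw[virq];
    return 0;
}
```

With the IPI mapped at virq 5, toy_eoi(TOY_HW_IPI) fails while toy_eoi(5)
reaches the intended hardware source.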
Signed-off-by: Haren Myneni <haren@us.ibm.com>
[-- Attachment #2: ppc64-kdump-fix-eoi-IPI.patch --]
[-- Type: text/x-patch, Size: 802 bytes --]
--- 2618-rc1/arch/powerpc/platforms/pseries/xics.c.orig 2006-07-08 11:47:58.000000000 -0700
+++ 2618-rc1/arch/powerpc/platforms/pseries/xics.c 2006-07-08 11:47:07.000000000 -0700
@@ -783,6 +783,14 @@ void xics_teardown_cpu(int secondary)
xics_set_cpu_priority(cpu, 0);
/*
+ * Clear IPI
+ */
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ lpar_qirr_info(cpu, 0xff);
+ else
+ direct_qirr_info(cpu, 0xff);
+
+ /*
* we need to EOI the IPI if we got here from kexec down IPI
*
* probably need to check all the other interrupts too
@@ -795,7 +803,7 @@ void xics_teardown_cpu(int secondary)
return;
desc = get_irq_desc(ipi);
if (desc->chip && desc->chip->eoi)
- desc->chip->eoi(XICS_IPI);
+ desc->chip->eoi(ipi);
/*
* Some machines need to have at least one cpu in the GIQ,
^ permalink raw reply
* Re: [PATCH] powerpc: EOI and clear IPI fix in xics_teardown_cpu()
From: Benjamin Herrenschmidt @ 2006-07-09 1:37 UTC (permalink / raw)
To: Haren Myneni; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <44B05BD0.304@us.ibm.com>
On Sat, 2006-07-08 at 18:28 -0700, Haren Myneni wrote:
> If OK, please sent this patch to upstream.
>
> Thanks
> Haren
>
> When invoked kdump boot, plpar_eoi() call is getting failed and calling
> panic().
> Kernel panic - not syncing: bad return code EOI - rc = -4, value=ff000000
>
> The issue is with the desc->chip->eoi(XICS_IPI) in xics_teardown_cpu().
> Instead of passing the virq to desc->chip->eoi(), XICS_IPI is used.
> Also, clear IPI in xics_teardown_cpu() got removed recently (in
> 2.6.17-git25). Noticed in some crash dump cases (Ex: initiate kdump
> boot using soft-reset and xmon is enabled), IPI is not cleared for some
> CPU(s) before starting the kdump boot. Hence, causing the kdump boot
> failure.
It's already fixed in my latest patch, which fixes some issues with the
new irq rework. Hopefully, Paul will send the patch upstream tomorrow,
after we have had a chance to test it a bit more.
Cheers,
Ben
^ permalink raw reply
* Re: [PATCH] powerpc: EOI and clear IPI fix in xics_teardown_cpu()
From: Haren Myneni @ 2006-07-09 2:17 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <1152409027.4128.14.camel@localhost.localdomain>
[-- Attachment #1: Type: text/plain, Size: 1291 bytes --]
Benjamin Herrenschmidt wrote:
>On Sat, 2006-07-08 at 18:28 -0700, Haren Myneni wrote:
>
>
>>If OK, please sent this patch to upstream.
>>
>>Thanks
>>Haren
>>
>>When invoked kdump boot, plpar_eoi() call is getting failed and calling
>>panic().
>>Kernel panic - not syncing: bad return code EOI - rc = -4, value=ff000000
>>
>>The issue is with the desc->chip->eoi(XICS_IPI) in xics_teardown_cpu().
>>Instead of passing the virq to desc->chip->eoi(), XICS_IPI is used.
>>Also, clear IPI in xics_teardown_cpu() got removed recently (in
>>2.6.17-git25). Noticed in some crash dump cases (Ex: initiate kdump
>>boot using soft-reset and xmon is enabled), IPI is not cleared for some
>>CPU(s) before starting the kdump boot. Hence, causing the kdump boot
>>failure.
>>
>>
>
>It's already fixed in my latest patch that fixes some issues with the
>new irq rework. Hopefully, paul will send the patch upstream tomorrow
>after we had a chance to test it a bit more.
>
>Cheers,
>Ben
>
>
>
OK, are you talking about the patch posted at
http://ozlabs.org/pipermail/linuxppc-dev/2006-July/024350.html? Sorry, I
did not notice it before I posted. Yes, that patch fixes the EOI by passing
the proper ipi value to desc->chip->eoi(). But we also need to clear the
IPI in xics_teardown_cpu().
Thanks
Haren
[-- Attachment #2: ppc64-kdump-clear-IPI-fix.patch --]
[-- Type: text/x-patch, Size: 545 bytes --]
--- 2618-rc1/arch/powerpc/platforms/pseries/xics.c.orig 2006-07-08 13:14:29.000000000 -0700
+++ 2618-rc1/arch/powerpc/platforms/pseries/xics.c 2006-07-08 11:47:07.000000000 -0700
@@ -783,6 +783,14 @@ void xics_teardown_cpu(int secondary)
xics_set_cpu_priority(cpu, 0);
/*
+ * Clear IPI
+ */
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ lpar_qirr_info(cpu, 0xff);
+ else
+ direct_qirr_info(cpu, 0xff);
+
+ /*
* we need to EOI the IPI if we got here from kexec down IPI
*
* probably need to check all the other interrupts too
^ permalink raw reply
* Re: [PATCH] powerpc: EOI and clear IPI fix in xics_teardown_cpu()
From: Benjamin Herrenschmidt @ 2006-07-09 4:08 UTC (permalink / raw)
To: Haren Myneni; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <44B06756.8090301@us.ibm.com>
> Ok, Are you talking about the patch posted
> http://ozlabs.org/pipermail/linuxppc-dev/2006-July/024350.html? Sorry, I
> did not notice it before I posted. Yes, it is fixed passing proper ipi
> value to desc->chip->eoi(). But, we also to need to clear IPI in
> xics_teardown_cpu().
Ok, I'll fix that too.
Ben.
^ permalink raw reply
* Re: Linux v2.6.18-rc1
From: Benjamin Herrenschmidt @ 2006-07-09 10:34 UTC (permalink / raw)
To: Steve Fox; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <pan.2006.07.07.15.41.35.528827@us.ibm.com>
On Fri, 2006-07-07 at 10:41 -0500, Steve Fox wrote:
> We've got a ppc64 machine that won't boot with this due to an IDE error.
Which machine, precisely?
> [snip]
> Freeing unused kernel memory: 256k freed
> running (1:2) /init autobench_args: ABAT:1152213829
>
> creating device nodes .hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
> hda: lost interrupt
>
^ permalink raw reply