* Re: Create permanent mapping from PCI bus to region of physical memory
From: David Hawkins @ 2006-04-11 3:13 UTC
To: Howard, Marc; +Cc: linuxppc-embedded
In-Reply-To: <91B22F93A880FA48879475E134D6F0BE386FBC@CA1EXCLV02.adcorp.kla-tencor.com>
Hi Marc,
> The peripheral is an FPGA. No, there is no internal processor;
> everything is coded in Verilog.
>
> Scatter/gather isn't a viable option because of this.
Er, why not? It's an FPGA; everything is possible :)
So you have a PCI core, and since you are planning to write to
the host memory space, it's a bus-master PCI core.
Whose FPGA: Altera, Xilinx, or someone else?
> Additionally non-contiguous memory would reduce bandwidth
> and increase FPGA design complexity.
Not necessarily. If the target is using bus master DMA to
write to the host memory, then you can hit pretty close
to the bandwidth of the PCI bus. If you are DMAing in
big blocks, the overhead of a block change isn't too much.
I did tests with the 440EP using a DMA controller on an
adapter board and found that the PCI bridge in the 440EP
was the limiting factor, i.e., for a 33MHz 32-bit bus
with a potential for 132MB/s, the *best* you can do is
about 40MB/s since the bridge only accepts data in cache
line sizes before sending a retry to the target. I can
send you those results.
> The data must be contiguous because
> of these reasons and the need for the data to be randomly
> accessible from the outside using simple address arithmetic.
Randomly accessible from where: the host, or an I/O interface
at the FPGA? The pages can be made to appear contiguous to
a host processor user-space process using the nopage callback
of the VMA.
> I realize this isn't a standard linux request but having
> fixed, linear memory is quite common in embedded apps. There
> should be a way to create this mapping in the 440GX's hardware
> and I'm just looking for a system call (if there is one) to
> implement it.
Alas, this is one of the concessions you must make if you
want to use a processor with the MMU enabled. However,
I don't see any fundamental limitation in the design
that would preclude a little extra work on the FPGA.
It does require additional Verilog to support the
flexibility, but the long-term advantage is that you
don't have to resort to a hack (e.g., reserving a block of
high memory under Linux).
So how about this concession: if Linux lets you alloc_pages()
in chunks of at most 2MB, create 8 address decode regions on your
FPGA, and give the host access to the 8 address registers.
When the Linux driver is installed, it calls alloc_pages()
8 times and loads the PCI addresses of those regions into
the 8 registers.
On the FPGA side of things, create a flat 16MB region; as
an address passes through each 2MB block, it changes the PCI
address it decodes to. So your FPGA believes it has
a 16MB contiguous block, and Linux can supply the memory
as 8 non-contiguous chunks. Back in user space on Linux,
the VMA nopage callback is used to map a page at a time,
and the 2MB regions appear as a 16MB contiguous region to
user space. (There are probably other ways to make user
space see it as one block too.)
Cheers
Dave
* RE: Create permanent mapping from PCI bus to region of physical memory
From: Howard, Marc @ 2006-04-11 2:56 UTC
To: David Hawkins; +Cc: linuxppc-embedded
In-Reply-To: <443B0718.3050001@ovro.caltech.edu>
Howard, Marc wrote:
>> Hi,
>>
>> I'm working on a PPC440GX based board in which a PCI based peripheral
>> communicates with the host via a shared region of processor memory. The
>> peripheral will read and write this region autonomously. Because of
>> this I need a fixed mapping FROM the PCI bus TO system memory.
>>
>> It's easy enough to rope off the top 16 MB or so of system physical
>> memory by lying to linux in the boot arguments. What I can't seem to find
>> is a system call to use to create a fixed, permanent mapping of PPC
>> memory as seen from the PCI bus.
>>
>> Has anyone out there done this and could share a code snippet with me?
>>
>Hi Marc,
>So the PPC440GX is the host, but what is the peripheral, and
>what restricts it to require a fixed address? Can it handle
>a set of fixed addresses, e.g., a scatter-gather buffer set up
>by the host? Can the host and peripheral communicate in any
>other way, e.g., when the peripheral changes something in the
>shared memory, surely it has to interrupt the host to let it
>know?
>Does the peripheral contain a CPU? If that's the case, then
>the host and peripheral could also maintain a smaller region
>that contains addresses of other pages, i.e., no need to
>restrict the design to a single block of memory, you just
>need a page of pointers to other shared blocks of pages.
Dave,
The peripheral is an FPGA. No, there is no internal processor;
everything is coded in Verilog.
Scatter/gather isn't a viable option because of this. Additionally,
non-contiguous memory would reduce bandwidth and increase
FPGA design complexity. The data must be contiguous for
these reasons and because of the need for the data to be randomly
accessible from the outside using simple address arithmetic.
I realize this isn't a standard linux request but having
fixed, linear memory is quite common in embedded apps. There
should be a way to create this mapping in the 440GX's hardware
and I'm just looking for a system call (if there is one) to
implement it.
Thanks,
Marc
* Re: Oops: machine check, sig: 7 [#1] - 16-bit Pccard - SOLVED!!!
From: Benjamin Herrenschmidt @ 2006-04-11 1:30 UTC
To: Daniel Ritz; +Cc: linuxppc-dev, Edward Felberbaum, linux-pcmcia, paulus
In-Reply-To: <200604092257.52206.daniel.ritz-ml@swissonline.ch>
> > I see from the dmesg output from my original post that memory ranges
> > 0xfdd7f000 and 0xfddff000 are used by the Gatwick and Heathrow mac io
> > controllers. That explains the conflict with PCMCIA over 0xfd000000.
>
> interesting...the memory ranges are used by other devices yet the
> request_resource() call in PCMCIA succeeds... and PCI resources shouldn't
> be there in the first place then...
>
> ok, it's in file arch/powerpc/platforms/powermac/feature.c...
> i can't see any request_resource() calls in there...so CC'ing the PPC guys..
> they can sure comment...
Most of the macio stuff is in drivers/macintosh/macio_asic.c ... the
whole macio itself is a PCI device and thus should already "occupy" its
address range...
> >
> > Question: can I minimize the range of memory that is reserved (0xffffff), or
> > is it a waste of time?
> >
>
> yeah, you probably could, but it sounds like a waste of time...
>
> > Eddie
> >
>
> rgds
> -daniel
* Re: Create permanent mapping from PCI bus to region of physical memory
From: David Hawkins @ 2006-04-11 1:32 UTC
To: Howard, Marc; +Cc: linuxppc-embedded
In-Reply-To: <91B22F93A880FA48879475E134D6F0BE026CD163@CA1EXCLV02.adcorp.kla-tencor.com>
Howard, Marc wrote:
> Hi,
>
> I'm working on a PPC440GX based board in which a PCI based peripheral
> communicates with the host via a shared region of processor memory. The
> peripheral will read and write this region autonomously. Because of
> this I need a fixed mapping FROM the PCI bus TO system memory.
>
> It's easy enough to rope off the top 16 MB or so of system physical
> memory by lying to linux in the boot arguments. What I can't seem to find
> is a system call to use to create a fixed, permanent mapping of PPC
> memory as seen from the PCI bus.
>
> Has anyone out there done this and could share a code snippet with me?
>
Hi Marc,
So the PPC440GX is the host, but what is the peripheral, and
what restricts it to require a fixed address? Can it handle
a set of fixed addresses, e.g., a scatter-gather buffer set up
by the host? Can the host and peripheral communicate in any
other way, e.g., when the peripheral changes something in the
shared memory, surely it has to interrupt the host to let it
know?
Does the peripheral contain a CPU? If that's the case, then
the host and peripheral could also maintain a smaller region
that contains addresses of other pages, i.e., no need to
restrict the design to a single block of memory, you just
need a page of pointers to other shared blocks of pages.
For example, alloc_pages() can be used to get an order-11 block,
i.e., 2^11 x 4096-byte pages = 8MB (hmm, that sounds a bit
big; I seem to recall 2MB was the maximum). With 2MB blocks,
your 16MB could be 8 separate blocks.
Anyway, the point is that alloc_pages() can give you a contiguous
chunk of memory at an address determined at allocation time,
but not a 16MB chunk.
Of course if the peripheral is fairly dumb, then you can't do
this. But if it indicates a buffer-used condition, perhaps you
can swap over to a new 2MB block.
If you can provide a few more details on the restriction of
the peripheral, then I or others can suggest options.
Dave
* Re: [Cbe-oss-dev] [PATCH 4/5] powerpc: export symbols for page size selection
From: Benjamin Herrenschmidt @ 2006-04-11 1:21 UTC
To: Christoph Hellwig; +Cc: linuxppc-dev, cbe-oss-dev, Arnd Bergmann
In-Reply-To: <20060407152813.GA382@lst.de>
On Fri, 2006-04-07 at 17:28 +0200, Christoph Hellwig wrote:
> On Fri, Apr 07, 2006 at 12:00:04AM +0200, arnd@arndb.de wrote:
> > We need access to some symbols in powerpc memory management
> > from spufs in order to create proper SLB entries.
>
> One more reason to disallow modular spufs..
Why? It's not a problem as long as they are exported GPL-only...
Ben.
* Create permanent mapping from PCI bus to region of physical memory
From: Howard, Marc @ 2006-04-11 0:17 UTC
To: linuxppc-embedded
Hi,
I'm working on a PPC440GX based board in which a PCI based peripheral
communicates with the host via a shared region of processor memory. The
peripheral will read and write this region autonomously. Because of
this I need a fixed mapping FROM the PCI bus TO system memory.
It's easy enough to rope off the top 16 MB or so of system physical
memory by lying to linux in the boot arguments. What I can't seem to find
is a system call to use to create a fixed, permanent mapping of PPC
memory as seen from the PCI bus.
Has anyone out there done this and could share a code snippet with me?
Thanks,
Marc W. Howard
* Re: [PATCH] 2 of 3 kdump-ppc64-soft-reset-fixes
From: Andrew Morton @ 2006-04-10 22:58 UTC
To: David Wilder; +Cc: mchintage, linuxppc-dev, fastboot, paulus
In-Reply-To: <443ADCCE.1070504@us.ibm.com>
David Wilder <dwilder@us.ibm.com> wrote:
>
>
Please don't use a filename as a patch title. See section 2 of
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt.
> --- 2617-rc1.orig/arch/powerpc/kernel/crash.c 2006-04-05 13:20:38.000000000 -0700
> +++ 2617-rc1/arch/powerpc/kernel/crash.c 2006-04-05 13:24:19.000000000 -0700
> @@ -23,6 +23,7 @@
> #include <linux/elfcore.h>
> #include <linux/init.h>
> #include <linux/types.h>
> +#include <linux/irq.h>
>
> #include <asm/processor.h>
> #include <asm/machdep.h>
> @@ -40,6 +41,9 @@
>
> /* This keeps a track of which one is crashing cpu. */
> int crashing_cpu = -1;
> +static cpumask_t cpus_in_crash = CPU_MASK_NONE;
> +extern struct kimage *kexec_crash_image;
> +extern cpumask_t cpus_in_sr;
extern declarations should be placed in .h, not in .c.
> + while (!cpu_isset(crashing_cpu, cpus_in_crash))
> + barrier();
The patch contains lots of busy-loops which all do barrier().
barrier() is purely a compiler thing. I suspect you meant cpu_relax().
Either way, there should be a cpu_relax() in these loops.
> +
> +/*
> + * This function will be called by secondary cpus or by kexec cpu
> + * if soft-reset is activated to stop some CPUs.
> + */
> +void crash_kexec_secondary(struct pt_regs *regs)
> +{
> + int cpu = smp_processor_id();
> + unsigned long flags;
> + int msecs = 5;
> +
> + local_irq_save(flags);
> + /* Wait 5ms if the kexec CPU is not entered yet. */
> + while (crashing_cpu < 0) {
> + if (--msecs < 0) {
> + /*
> + * Either kdump image is not loaded or
> + * kdump process is not started - Probably xmon
> + * exited using 'x'(exit and recover) or
> + * kexec_should_crash() failed for all running tasks.
> + */
> + cpu_clear(cpu, cpus_in_sr);
> + local_irq_restore(flags);
> + return;
> + }
> + mdelay(1);
> + barrier();
> + }
> + if (cpu == crashing_cpu) {
Whitespace broke here.
> + /*
> + * Panic CPU will enter this func only via soft-reset.
> + * Wait until all secondary CPUs entered and
> + * then start kexec boot.
> + */
> + crash_soft_reset_check(cpu);
> + cpu_set(crashing_cpu, cpus_in_crash);
> + if (ppc_md.kexec_cpu_down)
> + ppc_md.kexec_cpu_down(1, 0);
> + machine_kexec(kexec_crash_image);
> + /* NOTREACHED */
> + }
> + crash_ipi_callback(regs);
> +}
* [PATCH] 2 of 3 kdump-ppc64-soft-reset-fixes
From: David Wilder @ 2006-04-10 22:31 UTC
To: fastboot, linuxppc-dev, akpm, mchintage, paulus, hbabu
- When a system hangs, the user activates soft-reset to initiate the
kdump boot. However, soft-reset delivery of the FWNMI is indeterminate:
not all CPUs get the FW NMI at the same time. The first CPU to enter
kdump via crash_kexec() (called the primary CPU from here on) sends an
IPI to the other CPUs. Some CPUs respond to this IPI and execute
crash_ipi_callback() before they receive the NMI. When they then
receive the FW NMI, they execute die() and wait forever, since no
further kdump IPI will come from the primary CPU. This patch fixes the
issue by invoking crash_kexec_secondary() directly from die().
Since the secondary CPUs can enter the IPI callback twice, CPU state
must be saved only once, and the primary CPU must start the kdump boot
only after all CPUs are stopped. Hence, the cpus_in_crash bitmap is
used to determine whether pt_regs has already been saved; if the bit is
not set, the registers are saved. The patch also introduces the
cpus_in_sr bitmap and the enter_on_soft_reset counter, which let the
primary CPU know that all secondary CPUs have entered via soft-reset
and are ready to go down.
- In another crash scenario, one CPU hangs with interrupts disabled
while the other CPUs panic, or the user invokes a kdump boot using
sysrq-c. In this case, the hung CPU cannot be stopped and the kdump
boot does not succeed. This case is treated as a complete system hang,
and the user is asked to activate soft-reset if not all secondary CPUs
have stopped.
Please pick up this patch. (Note: this patch depends on
kdump-image-rm-static.patch; see my earlier posting.)
--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA
dwilder@us.ibm.com
(503)578-3789
[-- Attachment #2: kdump-ppc64-soft-reset-fixes.patch --]
[-- Type: text/x-patch, Size: 10588 bytes --]
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
diff -Naurp 2617-rc1.orig/arch/powerpc/kernel/crash.c 2617-rc1/arch/powerpc/kernel/crash.c
--- 2617-rc1.orig/arch/powerpc/kernel/crash.c 2006-04-05 13:20:38.000000000 -0700
+++ 2617-rc1/arch/powerpc/kernel/crash.c 2006-04-05 13:24:19.000000000 -0700
@@ -23,6 +23,7 @@
#include <linux/elfcore.h>
#include <linux/init.h>
#include <linux/types.h>
+#include <linux/irq.h>
#include <asm/processor.h>
#include <asm/machdep.h>
@@ -40,6 +41,9 @@
/* This keeps a track of which one is crashing cpu. */
int crashing_cpu = -1;
+static cpumask_t cpus_in_crash = CPU_MASK_NONE;
+extern struct kimage *kexec_crash_image;
+extern cpumask_t cpus_in_sr;
static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
size_t data_len)
@@ -97,34 +101,66 @@ static void crash_save_this_cpu(struct p
}
#ifdef CONFIG_SMP
-static atomic_t waiting_for_crash_ipi;
+static atomic_t enter_on_soft_reset = ATOMIC_INIT(0);
void crash_ipi_callback(struct pt_regs *regs)
{
int cpu = smp_processor_id();
- if (cpu == crashing_cpu)
- return;
-
if (!cpu_online(cpu))
return;
- if (ppc_md.kexec_cpu_down)
- ppc_md.kexec_cpu_down(1, 1);
-
local_irq_disable();
+ if (!cpu_isset(cpu, cpus_in_crash))
+ crash_save_this_cpu(regs, cpu);
+ cpu_set(cpu, cpus_in_crash);
- crash_save_this_cpu(regs, cpu);
- atomic_dec(&waiting_for_crash_ipi);
+ /*
+ * Entered via soft-reset - could be the kdump
+ * process is invoked using soft-reset or user activated
+ * it if some CPU did not respond to an IPI.
+ * For soft-reset, the secondary CPU can enter this func
+ * twice. 1 - using IPI, and 2. soft-reset.
+ * Tell the kexec CPU that entered via soft-reset and ready
+ * to go down.
+ */
+ if (cpu_isset(cpu, cpus_in_sr)) {
+ cpu_clear(cpu, cpus_in_sr);
+ atomic_inc(&enter_on_soft_reset);
+ }
+
+ /*
+ * Starting the kdump boot.
+ * This barrier is needed to make sure that all CPUs are stopped.
+ * If not, soft-reset will be invoked to bring other CPUs.
+ */
+ while (!cpu_isset(crashing_cpu, cpus_in_crash))
+ barrier();
+
+ if (ppc_md.kexec_cpu_down)
+ ppc_md.kexec_cpu_down(1, 1);
kexec_smp_wait();
/* NOTREACHED */
}
-static void crash_kexec_prepare_cpus(void)
+/*
+ * Wait until all CPUs are entered via soft-reset.
+ */
+static void crash_soft_reset_check(int cpu)
+{
+ unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
+
+ cpu_clear(cpu, cpus_in_sr);
+ while (atomic_read(&enter_on_soft_reset) != ncpus)
+ barrier();
+}
+
+
+static void crash_kexec_prepare_cpus(int cpu)
{
unsigned int msecs;
- atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+ unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
crash_send_ipi(crash_ipi_callback);
smp_wmb();
@@ -132,13 +168,12 @@ static void crash_kexec_prepare_cpus(voi
/*
* FIXME: Until we will have the way to stop other CPUSs reliabally,
* the crash CPU will send an IPI and wait for other CPUs to
- * respond. If not, proceed the kexec boot even though we failed to
- * capture other CPU states.
+ * respond.
* Delay of at least 10 seconds.
*/
- printk(KERN_ALERT "Sending IPI to other cpus...\n");
+ printk(KERN_EMERG "Sending IPI to other cpus...\n");
msecs = 10000;
- while ((atomic_read(&waiting_for_crash_ipi) > 0) && (--msecs > 0)) {
+ while ((cpus_weight(cpus_in_crash) < ncpus) && (--msecs > 0)) {
barrier();
mdelay(1);
}
@@ -148,18 +183,71 @@ static void crash_kexec_prepare_cpus(voi
/*
* FIXME: In case if we do not get all CPUs, one possibility: ask the
* user to do soft reset such that we get all.
- * IPI handler is already set by the panic cpu initially. Therefore,
- * all cpus could invoke this handler from die() and the panic CPU
- * will call machine_kexec() directly from this handler to do
- * kexec boot.
- */
- if (atomic_read(&waiting_for_crash_ipi))
- printk(KERN_ALERT "done waiting: %d cpus not responding\n",
- atomic_read(&waiting_for_crash_ipi));
+ * Soft-reset will be used until better mechanism is implemented.
+ */
+ if (cpus_weight(cpus_in_crash) < ncpus) {
+ printk(KERN_EMERG "done waiting: %d cpu(s) not responding\n",
+ ncpus - cpus_weight(cpus_in_crash));
+ printk(KERN_EMERG "Activate soft-reset to stop other cpu(s)\n");
+ cpus_in_sr = CPU_MASK_NONE;
+ atomic_set(&enter_on_soft_reset, 0);
+ while (cpus_weight(cpus_in_crash) < ncpus)
+ barrier();
+ }
+ /*
+ * Make sure all CPUs are entered via soft-reset if the kdump is
+ * invoked using soft-reset.
+ */
+ if (cpu_isset(cpu, cpus_in_sr))
+ crash_soft_reset_check(cpu);
/* Leave the IPI callback set */
}
+
+/*
+ * This function will be called by secondary cpus or by kexec cpu
+ * if soft-reset is activated to stop some CPUs.
+ */
+void crash_kexec_secondary(struct pt_regs *regs)
+{
+ int cpu = smp_processor_id();
+ unsigned long flags;
+ int msecs = 5;
+
+ local_irq_save(flags);
+ /* Wait 5ms if the kexec CPU is not entered yet. */
+ while (crashing_cpu < 0) {
+ if (--msecs < 0) {
+ /*
+ * Either kdump image is not loaded or
+ * kdump process is not started - Probably xmon
+ * exited using 'x'(exit and recover) or
+ * kexec_should_crash() failed for all running tasks.
+ */
+ cpu_clear(cpu, cpus_in_sr);
+ local_irq_restore(flags);
+ return;
+ }
+ mdelay(1);
+ barrier();
+ }
+ if (cpu == crashing_cpu) {
+ /*
+ * Panic CPU will enter this func only via soft-reset.
+ * Wait until all secondary CPUs entered and
+ * then start kexec boot.
+ */
+ crash_soft_reset_check(cpu);
+ cpu_set(crashing_cpu, cpus_in_crash);
+ if (ppc_md.kexec_cpu_down)
+ ppc_md.kexec_cpu_down(1, 0);
+ machine_kexec(kexec_crash_image);
+ /* NOTREACHED */
+ }
+ crash_ipi_callback(regs);
+}
+
#else
-static void crash_kexec_prepare_cpus(void)
+static void crash_kexec_prepare_cpus(int cpu)
{
/*
* move the secondarys to us so that we can copy
@@ -170,6 +258,10 @@ static void crash_kexec_prepare_cpus(voi
smp_release_cpus();
}
+void crash_kexec_secondary(struct pt_regs *regs)
+{
+ cpus_in_sr = CPU_MASK_NONE;
+}
#endif
void default_machine_crash_shutdown(struct pt_regs *regs)
@@ -185,15 +277,14 @@ void default_machine_crash_shutdown(stru
* The kernel is broken so disable interrupts.
*/
local_irq_disable();
-
- if (ppc_md.kexec_cpu_down)
- ppc_md.kexec_cpu_down(1, 0);
-
/*
* Make a note of crashing cpu. Will be used in machine_kexec
* such that another IPI will not be sent.
*/
crashing_cpu = smp_processor_id();
- crash_kexec_prepare_cpus();
crash_save_this_cpu(regs, crashing_cpu);
+ crash_kexec_prepare_cpus(crashing_cpu);
+ cpu_set(crashing_cpu, cpus_in_crash);
+ if (ppc_md.kexec_cpu_down)
+ ppc_md.kexec_cpu_down(1, 0);
}
diff -Naurp 2617-rc1.orig/arch/powerpc/kernel/traps.c 2617-rc1/arch/powerpc/kernel/traps.c
--- 2617-rc1.orig/arch/powerpc/kernel/traps.c 2006-04-05 13:20:33.000000000 -0700
+++ 2617-rc1/arch/powerpc/kernel/traps.c 2006-04-05 13:22:00.000000000 -0700
@@ -51,9 +51,13 @@
#include <asm/firmware.h>
#include <asm/processor.h>
#endif
+#include <asm/kexec.h>
#ifdef CONFIG_PPC64 /* XXX */
#define _IO_BASE pci_io_base
+#ifdef CONFIG_KEXEC
+cpumask_t cpus_in_sr = CPU_MASK_NONE;
+#endif
#endif
#ifdef CONFIG_DEBUGGER
@@ -96,7 +100,7 @@ static DEFINE_SPINLOCK(die_lock);
int die(const char *str, struct pt_regs *regs, long err)
{
- static int die_counter, crash_dump_start = 0;
+ static int die_counter;
if (debugger(regs))
return 1;
@@ -128,22 +132,12 @@ int die(const char *str, struct pt_regs
print_modules();
show_regs(regs);
bust_spinlocks(0);
-
- if (!crash_dump_start && kexec_should_crash(current)) {
- crash_dump_start = 1;
- spin_unlock_irq(&die_lock);
- crash_kexec(regs);
- /* NOTREACHED */
- }
spin_unlock_irq(&die_lock);
- if (crash_dump_start)
- /*
- * Only for soft-reset: Other CPUs will be responded to an IPI
- * sent by first kexec CPU.
- */
- for(;;)
- ;
+ if (kexec_should_crash(current))
+ crash_kexec(regs);
+ crash_kexec_secondary(regs);
+
if (in_interrupt())
panic("Fatal exception in interrupt");
@@ -205,6 +199,10 @@ void system_reset_exception(struct pt_re
if (ppc_md.system_reset_exception(regs))
return;
}
+
+#ifdef CONFIG_KEXEC
+ cpu_set(smp_processor_id(), cpus_in_sr);
+#endif
die("System Reset", regs, SIGABRT);
diff -Naurp 2617-rc1.orig/include/asm-powerpc/kexec.h 2617-rc1/include/asm-powerpc/kexec.h
--- 2617-rc1.orig/include/asm-powerpc/kexec.h 2006-04-05 13:20:53.000000000 -0700
+++ 2617-rc1/include/asm-powerpc/kexec.h 2006-04-05 13:18:27.000000000 -0700
@@ -123,8 +123,10 @@ extern int default_machine_kexec_prepare
extern void default_machine_crash_shutdown(struct pt_regs *regs);
extern void machine_kexec_simple(struct kimage *image);
-
+extern void crash_kexec_secondary(struct pt_regs *regs);
#endif /* ! __ASSEMBLY__ */
+#else
+static inline void crash_kexec_secondary(struct pt_regs *regs) { }
#endif /* CONFIG_KEXEC */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_KEXEC_H */
* MPC83xx: CF access in True IDE mode
From: SIP COP 009 @ 2006-04-10 23:00 UTC
To: linuxppc-embedded
Hi!
I am looking for some pointers to get CF working in True IDE mode on
Linux 2.6.
Any pointers and help would be greatly appreciated.
Thanks!
ashutosh
* [PATCH] kdump-ppc64-xmon-stop-cpu
From: David Wilder @ 2006-04-10 22:32 UTC
To: fastboot, linuxppc-dev, akpm, mchintage, paulus, hbabu
Patch 3 of 3
- During CPU hang scenarios, kdump cannot stop the hung CPUs. However,
the user can invoke soft-reset to shoot down CPUs reliably. But when
the debugger is enabled, these CPUs return to the hang state after
they exit the debugger. This patch fixes the issue by calling
crash_kexec_secondary() before returning to the previous state.
Please pick up this patch.
--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA
dwilder@us.ibm.com
(503)578-3789
[-- Attachment #2: kdump-ppc64-xmon-stop-cpu.patch --]
[-- Type: text/x-patch, Size: 1273 bytes --]
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
--- 2617-rc1/arch/powerpc/kernel/traps.c.orig 2006-04-05 13:25:22.000000000 -0700
+++ 2617-rc1/arch/powerpc/kernel/traps.c 2006-04-05 13:25:33.000000000 -0700
@@ -206,6 +206,16 @@ void system_reset_exception(struct pt_re
die("System Reset", regs, SIGABRT);
+ /*
+ * Some CPUs which got released from debugger will execute this path.
+ * These CPUs entered debugger first time via soft-reset - Means,
+ * could be possible that these CPUs may not respond to an IPI later.
+ * Therefore, has to call kdump func directly.
+ * Not a problem if we exited from debugger to recover. In this case
+ * there will not be any primary kexec CPU. Hence, will be returned.
+ */
+ crash_kexec_secondary(regs);
+
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
panic("Unrecoverable System Reset");
* Re: freescale lite 5200 board and kernel 2.6
From: Domenico Andreoli @ 2006-04-10 21:54 UTC
To: linuxppc-embedded
In-Reply-To: <20060408082100.GA54030@server.idefix.loc>
On Sat, Apr 08, 2006 at 10:21:00AM +0200, Matthias Fechner wrote:
> Hello Domenico,
Hello Matthias,
> * Domenico Andreoli <cavokz@gmail.com> [07-04-06 00:10]:
> > kernel is built following the instructions on your wiki, i attached
> > the config file. please have a look, let me know if any check/test may
> > be advised.
>
> sry, but I have no time to try your kernel config, but I attached
> mine which is working fine for me.
> Maybe this helps you.
unfortunately it does not :/
sometimes the kernel prints "eth0: config: auto-negotiation on, 100HDX,
10HDX.", eth0 is up and everything works.
most of the time "eth0: config: auto-negotiation off, No speed/duplex
selected?." is printed instead and eth0 seems not to exist.
i tend to rule out hw problems: the older 2.4.x kernels always succeed in
using eth0.
is there any way to force eth0 auto-negotiation?
thank you
domenico
-----[ Domenico Andreoli, aka cavok
--[ http://people.debian.org/~cavok/gpgkey.asc
---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50
* [PATCH] kdump-image-rm-static
From: David Wilder @ 2006-04-10 22:30 UTC
To: fastboot, linuxppc-dev, akpm, mchintage, paulus, hbabu
I am posting three patches (in separate emails) to both these lists.
The 2nd and 3rd patches depend on the first patch, which I have
attached to this email (changing a static to non-static). This patch
applies to generic code, whereas the other two are powerpc-specific.
Please pick up these patches.
On powerpc, the panic CPU sends an IPI to shoot down the other CPUs.
Since the IPI is not an NMI, it may not be able to stop all CPUs before
the kdump boot. One solution, if some CPUs are not stopped, is to ask
the user to activate soft-reset (either from the management console or
by pressing the soft-reset button), which sends a FW NMI to all CPUs.
These CPUs will execute an arch-specific kdump function, which has to
invoke machine_kexec() directly. At present, kexec_crash_image is not
passed to machine_crash_shutdown(), and it is defined as static in
kexec.c.
--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA
dwilder@us.ibm.com
(503)578-3789
[-- Attachment #2: kdump-image-rm-static.patch --]
[-- Type: text/x-patch, Size: 1658 bytes --]
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
--- 2617-rc1/kernel/kexec.c.orig 2006-04-05 13:27:53.000000000 -0700
+++ 2617-rc1/kernel/kexec.c 2006-04-05 13:27:43.000000000 -0700
@@ -903,7 +903,7 @@ static int kimage_load_segment(struct ki
* that to happen you need to do that yourself.
*/
struct kimage *kexec_image = NULL;
-static struct kimage *kexec_crash_image = NULL;
+struct kimage *kexec_crash_image = NULL;
/*
* A home grown binary mutex.
* Nothing can wait so this mutex is safe to use
@@ -1042,7 +1042,6 @@ asmlinkage long compat_sys_kexec_load(un
void crash_kexec(struct pt_regs *regs)
{
- struct kimage *image;
int locked;
@@ -1056,12 +1055,11 @@ void crash_kexec(struct pt_regs *regs)
*/
locked = xchg(&kexec_lock, 1);
if (!locked) {
- image = xchg(&kexec_crash_image, NULL);
- if (image) {
+ if (kexec_crash_image) {
struct pt_regs fixed_regs;
crash_setup_regs(&fixed_regs, regs);
machine_crash_shutdown(&fixed_regs);
- machine_kexec(image);
+ machine_kexec(kexec_crash_image);
}
xchg(&kexec_lock, 0);
}
* hdlc_enet
From: Antonio Di Bacco @ 2006-04-10 20:51 UTC
To: linuxppc-embedded
Hi,
I want to use the hdlc_enet driver with both SCC2 and SCC4 connected
together on an MBX card.
To get my hdlc_enet_init called, should I patch Space.c to add a list of
HDLC devices?
Could anyone help?
Bye,
Antonio.
* Re: GPIO endianness on MPC8349
From: Ben Warren @ 2006-04-10 20:20 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-embedded
In-Reply-To: <6518EE00-812C-4839-AF00-AA976C35E799@kernel.crashing.org>
[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]
Sorry for wasting bandwidth (again). Turns out my schematic is for an
earlier spin of the board.
regards,
Ben
On Mon, 2006-04-10 at 15:06 -0500, Kumar Gala wrote:
> On Apr 10, 2006, at 2:48 PM, Ben Warren wrote:
>
> > Hello,
> >
> > I'm a noobie to this CPU, and am utterly confused with how the bits are
> > ordered on the GPIO ports. I imagine it's the same as all Freescale
> > PPCs, but who knows. Anyway...
> >
> > Using an MPC8349MDS eval board, I have one LED to play with. From the
> > schematic, it's connected to GPIO1[1]. From other processors that I've
> > worked with, I would have expected to toggle it with either 0x40000000
> > (IBM 405) or 0x00000002 (68360). Nope. To make this bit move, I mess
> > with bit 0x00000040 in the appropriate DAT register. This leads me to
> > believe that either the bit ordering is something
> > like ...89abcdef01234567 (sorry for the confusing notation, but
> > hopefully it makes sense) or the schematic has a typo. Since I'm trying
> > to write a generic GPIO handler, I'd like to have a little confidence in
> > my extrapolation from a single point.
> >
> > Can anybody shed some light on this?
>
> This is because the Freescale docs are misleading. If you look at
> the schematic you will see the LED is wired to GPIO1[5] which makes
> sense for the 0x40 value you have to use.
>
> - kumar
[-- Attachment #2: Type: text/html, Size: 2480 bytes --]
* Re: GPIO endianness on MPC8349
From: Kumar Gala @ 2006-04-10 20:06 UTC (permalink / raw)
To: bwarren; +Cc: linuxppc-embedded
In-Reply-To: <1144698501.972.103.camel@saruman.qstreams.net>
On Apr 10, 2006, at 2:48 PM, Ben Warren wrote:
> Hello,
>
> I'm a noobie to this CPU, and am utterly confused with how the bits are
> ordered on the GPIO ports. I imagine it's the same as all Freescale
> PPCs, but who knows. Anyway...
>
> Using an MPC8349MDS eval board, I have one LED to play with. From the
> schematic, it's connected to GPIO1[1]. From other processors that I've
> worked with, I would have expected to toggle it with either 0x40000000
> (IBM 405) or 0x00000002 (68360). Nope. To make this bit move, I mess
> with bit 0x00000040 in the appropriate DAT register. This leads me to
> believe that either the bit ordering is something
> like ...89abcdef01234567 (sorry for the confusing notation, but
> hopefully it makes sense) or the schematic has a typo. Since I'm trying
> to write a generic GPIO handler, I'd like to have a little confidence in
> my extrapolation from a single point.
>
> Can anybody shed some light on this?
This is because the Freescale docs are misleading. If you look at
the schematic you will see the LED is wired to GPIO1[5] which makes
sense for the 0x40 value you have to use.
- kumar
* GPIO endianness on MPC8349
From: Ben Warren @ 2006-04-10 19:48 UTC (permalink / raw)
To: linuxppc-embedded
Hello,
I'm a noobie to this CPU, and am utterly confused with how the bits are
ordered on the GPIO ports. I imagine it's the same as all Freescale
PPCs, but who knows. Anyway...
Using an MPC8349MDS eval board, I have one LED to play with. From the
schematic, it's connected to GPIO1[1]. From other processors that I've
worked with, I would have expected to toggle it with either 0x40000000
(IBM 405) or 0x00000002 (68360). Nope. To make this bit move, I mess
with bit 0x00000040 in the appropriate DAT register. This leads me to
believe that either the bit ordering is something
like ...89abcdef01234567 (sorry for the confusing notation, but
hopefully it makes sense) or the schematic has a typo. Since I'm trying
to write a generic GPIO handler, I'd like to have a little confidence in
my extrapolation from a single point.
Can anybody shed some light on this?
thanks,
Ben
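Ben's two reference points correspond to two bit-numbering conventions: IBM/Power Architecture documentation numbers bit 0 as the most significant bit of a 32-bit register (so bit 1 is 0x40000000), while the 68360 numbering he cites counts from the least significant bit (bit 1 is 0x00000002). The helpers below are a hypothetical sketch for converting between a documented bit number and a register mask under each convention; they do not establish which convention a particular MPC8349 register or schematic follows.

```c
#include <stdint.h>

/* Mask for bit n (0..31) when bit 0 is the MSB (IBM/Power style). */
uint32_t mask_msb0(unsigned int n)
{
    return UINT32_C(0x80000000) >> n;
}

/* Mask for bit n (0..31) when bit 0 is the LSB (68360 style, per the post). */
uint32_t mask_lsb0(unsigned int n)
{
    return UINT32_C(1) << n;
}

/* Index of the single set bit in mask under MSB-0 numbering,
 * or -1 if mask does not have exactly one bit set. */
int index_msb0(uint32_t mask)
{
    for (unsigned int n = 0; n < 32; n++)
        if (mask == mask_msb0(n))
            return (int)n;
    return -1;
}

/* Same, under LSB-0 numbering. */
int index_lsb0(uint32_t mask)
{
    for (unsigned int n = 0; n < 32; n++)
        if (mask == mask_lsb0(n))
            return (int)n;
    return -1;
}
```

For the 0x00000040 mask Ben observed, index_msb0() gives 25 and index_lsb0() gives 6. Neither is 1, so under either straightforward convention that mask does not correspond to GPIO1[1].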
* Re: Slab errors on 4xx (STB04)
From: Andre Draszik @ 2006-04-10 19:48 UTC (permalink / raw)
To: Andre Draszik, linuxppc-embedded
In-Reply-To: <20060410071457.GA16898@gate.ebshome.net>
Eugene Surovegin wrote:
> On Mon, Apr 10, 2006 at 12:33:56AM +0200, Andre Draszik wrote:
>> Since it _seems_ to work nevertheless - is CONFIG_DEBUG_SLAB known to be
>> broken on this platform?
>
> Yes, it's very likely that CONFIG_DEBUG_SLAB is the culprit here. This
> [...]
Thanks Roland and Eugene for the advice.
> You can try changing __dma_sync() to do flush_dcache_range() even for
> DMA_FROM_DEVICE case. However, do this only to check this theory, not
> as a permanent solution :).
OK, I will play with the cache later today... I wanted DEBUG_SLAB turned
on for some other unrelated problem, so just for debugging, this hack
would be fine if it worked :)
Greetings,
Andre'
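One plausible mechanism behind this hack, offered as an assumption since Eugene's explanation is elided above: DMA_FROM_DEVICE only invalidates the data cache over the buffer, and if the buffer does not start and end on cache-line boundaries, invalidation also discards dirty data sharing the first or last line (for example slab red-zone words written with CONFIG_DEBUG_SLAB). Flushing writes those lines back first. The hypothetical helper below reports whether a buffer shares its head or tail cache line with neighbouring data; the 32-byte line size in the usage note is an assumed value for a 405-class core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct line_sharing {
    bool head_shared; /* first cache line also holds bytes before the buffer */
    bool tail_shared; /* last cache line also holds bytes after the buffer */
};

/* Report whether [start, start + size) only partially covers its first
 * or last cache line. line must be a power of two; size must be > 0. */
struct line_sharing check_line_sharing(uintptr_t start, size_t size,
                                       uintptr_t line)
{
    struct line_sharing s;
    uintptr_t end = start + size;   /* one past the last byte */

    assert(line != 0 && (line & (line - 1)) == 0);
    assert(size > 0);
    s.head_shared = (start & (line - 1)) != 0;
    s.tail_shared = (end & (line - 1)) != 0;
    return s;
}
```

For example, a 64-byte buffer at 0x1004 with 32-byte lines shares both its head and tail lines, while one at 0x1000 shares neither.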
* [PATCH] powerpc: fixup pci resource DBG code to handle size change
From: Kumar Gala @ 2006-04-10 18:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: linuxppc-dev, linux-kernel
A number of DBG() calls needed to be fixed up to properly handle the size
change in struct resource (start and end are now 64-bit).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
---
commit 8e55334e29b2996d921641911a3802f9d004566d
tree 723945265ad953fd6ce4f59c66e560165e78da41
parent 6d05f46f58f45d4aa74cf328f9f7935a71d4fe87
author Kumar Gala <galak@kernel.crashing.org> Mon, 10 Apr 2006 13:10:45 -0500
committer Kumar Gala <galak@kernel.crashing.org> Mon, 10 Apr 2006 13:10:45 -0500
arch/powerpc/kernel/pci_32.c | 28 ++++++++++++++--------------
arch/ppc/kernel/pci.c | 2 +-
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 96a5ee9..9607a09 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -99,7 +99,7 @@ pcibios_fixup_resources(struct pci_dev *
if (!res->flags)
continue;
if (res->end == 0xffffffff) {
- DBG("PCI:%s Resource %d [%08lx-%08lx] is unassigned\n",
+ DBG("PCI:%s Resource %d [%016llx-%016llx] is unassigned\n",
pci_name(dev), i, res->start, res->end);
res->end -= res->start;
res->start = 0;
@@ -117,7 +117,7 @@ pcibios_fixup_resources(struct pci_dev *
res->start += offset;
res->end += offset;
#ifdef DEBUG
- printk("Fixup res %d (%lx) of dev %s: %lx -> %lx\n",
+ printk("Fixup res %d (%lx) of dev %s: %llx -> %llx\n",
i, res->flags, pci_name(dev),
res->start - offset, res->start);
#endif
@@ -179,7 +179,7 @@ void pcibios_align_resource(void *data,
struct pci_dev *dev = data;
if (res->flags & IORESOURCE_IO) {
- unsigned long start = res->start;
+ u64 start = res->start;
if (size > 0x100) {
printk(KERN_ERR "PCI: I/O Region %s/%d too large"
@@ -255,8 +255,8 @@ pcibios_allocate_bus_resources(struct li
}
}
- DBG("PCI: bridge rsrc %lx..%lx (%lx), parent %p\n",
- res->start, res->end, res->flags, pr);
+ DBG("PCI: bridge rsrc %llx..%llx (%lx), parent %p\n",
+ res->start, res->end, res->flags, pr);
if (pr) {
if (request_resource(pr, res) == 0)
continue;
@@ -306,7 +306,7 @@ reparent_resources(struct resource *pare
*pp = NULL;
for (p = res->child; p != NULL; p = p->sibling) {
p->parent = res;
- DBG(KERN_INFO "PCI: reparented %s [%lx..%lx] under %s\n",
+ DBG(KERN_INFO "PCI: reparented %s [%llx..%llx] under %s\n",
p->name, p->start, p->end, res->name);
}
return 0;
@@ -362,7 +362,7 @@ pci_relocate_bridge_resource(struct pci_
try = conflict->start - 1;
}
if (request_resource(pr, res)) {
- DBG(KERN_ERR "PCI: huh? couldn't move to %lx..%lx\n",
+ DBG(KERN_ERR "PCI: huh? couldn't move to %llx..%llx\n",
res->start, res->end);
return -1; /* "can't happen" */
}
@@ -480,14 +480,14 @@ static inline void alloc_resource(struct
{
struct resource *pr, *r = &dev->resource[idx];
- DBG("PCI:%s: Resource %d: %08lx-%08lx (f=%lx)\n",
+ DBG("PCI:%s: Resource %d: %016llx-%016llx (f=%lx)\n",
pci_name(dev), idx, r->start, r->end, r->flags);
pr = pci_find_parent_resource(dev, r);
if (!pr || request_resource(pr, r) < 0) {
printk(KERN_ERR "PCI: Cannot allocate resource region %d"
" of device %s\n", idx, pci_name(dev));
if (pr)
- DBG("PCI: parent is %p: %08lx-%08lx (f=%lx)\n",
+ DBG("PCI: parent is %p: %016llx-%016llx (f=%lx)\n",
pr, pr->start, pr->end, pr->flags);
/* We'll assign a new address later */
r->flags |= IORESOURCE_UNSET;
@@ -957,7 +957,7 @@ pci_process_bridge_OF_ranges(struct pci_
res = &hose->io_resource;
res->flags = IORESOURCE_IO;
res->start = ranges[2];
- DBG("PCI: IO 0x%lx -> 0x%lx\n",
+ DBG("PCI: IO 0x%llx -> 0x%llx\n",
res->start, res->start + size - 1);
break;
case 2: /* memory space */
@@ -979,7 +979,7 @@ pci_process_bridge_OF_ranges(struct pci_
if(ranges[0] & 0x40000000)
res->flags |= IORESOURCE_PREFETCH;
res->start = ranges[na+2];
- DBG("PCI: MEM[%d] 0x%lx -> 0x%lx\n", memno,
+ DBG("PCI: MEM[%d] 0x%llx -> 0x%llx\n", memno,
res->start, res->start + size - 1);
}
break;
@@ -1075,7 +1075,7 @@ do_update_p2p_io_resource(struct pci_bus
DBG("Remapping Bus %d, bridge: %s\n", bus->number, pci_name(bridge));
res.start -= ((unsigned long) hose->io_base_virt - isa_io_base);
res.end -= ((unsigned long) hose->io_base_virt - isa_io_base);
- DBG(" IO window: %08lx-%08lx\n", res.start, res.end);
+ DBG(" IO window: %016llx-%016llx\n", res.start, res.end);
/* Set up the top and bottom of the PCI I/O segment for this bus. */
pci_read_config_dword(bridge, PCI_IO_BASE, &l);
@@ -1223,8 +1223,8 @@ do_fixup_p2p_level(struct pci_bus *bus)
continue;
if ((r->flags & IORESOURCE_IO) == 0)
continue;
- DBG("Trying to allocate from %08lx, size %08lx from parent"
- " res %d: %08lx -> %08lx\n",
+ DBG("Trying to allocate from %016llx, size %016llx from parent"
+ " res %d: %016llx -> %016llx\n",
res->start, res->end, i, r->start, r->end);
if (allocate_resource(r, res, res->end + 1, res->start, max,
diff --git a/arch/ppc/kernel/pci.c b/arch/ppc/kernel/pci.c
index ffbae40..fb25b30 100644
--- a/arch/ppc/kernel/pci.c
+++ b/arch/ppc/kernel/pci.c
@@ -960,7 +960,7 @@ static pgprot_t __pci_mmap_set_pgprot(st
else
prot |= _PAGE_GUARDED;
- printk("PCI map for %s:%lx, prot: %lx\n", pci_name(dev), rp->start,
+ printk("PCI map for %s:%llx, prot: %llx\n", pci_name(dev), rp->start,
prot);
return __pgprot(prot);
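The underlying issue is that %lx expects an unsigned long, which is 32 bits on ppc32, so passing a 64-bit resource address through it misreads the variadic arguments. A portable userspace sketch of the pattern follows; `struct res64` is a hypothetical stand-in for struct resource, and the explicit cast to unsigned long long keeps %llx correct whether uint64_t is unsigned long or unsigned long long on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct resource after the size change. */
struct res64 {
    uint64_t start, end;
    unsigned long flags;
};

/* Format a resource the way the patched DBG() calls do. Returns the
 * snprintf() result (characters that would have been written). */
int format_res(char *buf, size_t len, const struct res64 *r)
{
    /* 64-bit fields: cast and use %llx; flags is still unsigned long. */
    return snprintf(buf, len, "%016llx-%016llx (f=%lx)",
                    (unsigned long long)r->start,
                    (unsigned long long)r->end, r->flags);
}
```

Inside the kernel the same idea applies: match each conversion specifier to the actual argument type, casting where the type's width varies between 32- and 64-bit builds.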
* Re: [PATCH 2/2] Base pSeries PCIe support
From: Jake Moilanen @ 2006-04-10 15:39 UTC (permalink / raw)
To: paulus; +Cc: linuxppc-dev
In-Reply-To: <20060401165723.ab77c81f.moilanen@austin.ibm.com>
On Sat, 1 Apr 2006 16:57:23 -0600
Jake Moilanen <moilanen@austin.ibm.com> wrote:
> On Sat, 1 Apr 2006 19:36:07 +1100
> Paul Mackerras <paulus@samba.org> wrote:
>
> > Jake Moilanen writes:
> >
> > > The NR_IRQS got bumped up to 1024, as vectors can go much higher.
> > > Unfortunately, this number was arbitrarily picked as there is no claim
> > > at what the max number really is by either the firmware team, or the
> > > PAPR+.
> >
> > What matters is the number of different vectors, not the actual value
> > of the vectors, because we remap interrupt numbers that the firmware
> > gives us to logical Linux irq numbers between 0 and NR_IRQS-1. We had
> > to do that when the POWER5 systems came out, because the interrupt
> > numbers there occupy 24 bits.
>
> Ah. That sounds right. I haven't had a chance to test this version of
> the patch. Firmware is currently broken on my machine.
I was able to validate that these patches do work, and we are receiving
MSI interrupts correctly.
Is it too late to get into 2.6.17?
Jake
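Paul's point can be illustrated with a small remapping table: the table size (NR_IRQS) limits how many distinct firmware vectors can be in use at once, not how large their values may be. The sketch below is a hypothetical linear-search model, not the powerpc implementation; SKETCH_NR_IRQS and map_hw_irq() are illustrative names.

```c
#include <stdbool.h>

#define SKETCH_NR_IRQS 4   /* hypothetical, deliberately tiny */

static unsigned int virq_to_hw[SKETCH_NR_IRQS];
static bool virq_used[SKETCH_NR_IRQS];

/* Return the logical irq (0..SKETCH_NR_IRQS-1) for firmware vector hw,
 * allocating the first free slot on first use; -1 if the table is full. */
int map_hw_irq(unsigned int hw)
{
    int free_slot = -1;

    for (int virq = 0; virq < SKETCH_NR_IRQS; virq++) {
        if (virq_used[virq] && virq_to_hw[virq] == hw)
            return virq;               /* already mapped */
        if (!virq_used[virq] && free_slot < 0)
            free_slot = virq;          /* remember first free slot */
    }
    if (free_slot >= 0) {
        virq_used[free_slot] = true;
        virq_to_hw[free_slot] = hw;
    }
    return free_slot;
}
```

A 24-bit vector such as 0xFFFFFF fits in slot 0 of a four-entry table; only a fifth distinct vector fails to map.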
* Re: [PATCH 2/4] tickless idle cpu: Skip ticks when CPU is idle
From: Srivatsa Vaddagiri @ 2006-04-10 12:23 UTC (permalink / raw)
To: Kumar Gala; +Cc: sri_vatsa_v, paulus, linuxppc-dev
In-Reply-To: <981C3B4E-7336-403D-AF58-3B36AA071866@kernel.crashing.org>
On Fri, Apr 07, 2006 at 09:16:58AM -0500, Kumar Gala wrote:
> >+config NO_IDLE_HZ
> >+ depends on EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC || PPC_MAPLE)
> >+ bool "Switch off timer ticks on idle CPUs"
> >+ help
> >+ Switches the HZ timer interrupts off when a CPU is idle.
> >+
>
> any reason not to provide this for all 6xx class processors?
I think the same patch would mostly work for 6xx CPUs as well. However, I
don't think I have any hardware to test it on. If I am not mistaken, to
support 6xx CPUs, only ppc6xx_idle needs to be modified to call stop_hz_timer
before going into power-save mode?
--
Regards,
vatsa
* RE: Accessing physical memory
From: Fillod Stephane @ 2006-04-10 12:43 UTC (permalink / raw)
To: Antonio Di Bacco, linuxppc-embedded
Antonio Di Bacco wrote:
>How can I access the physical memory? Can I MMAP for example /dev/mem? Is
>there a simpler way?
Your question is a linuxppc-embedded FAQ.
It is documented in the DENX FAQ [1], which is also reachable through a
shorter URL [2].
For more information, please follow this thread [3] (which is actually not
PPC-specific).
[1] http://www.denx.de/twiki/bin/view/PPCEmbedded/DeviceDrivers#Section_AccessingPeripheralsFromUserSpace
[2] http://tinyurl.com/6c7th
[3] http://lists.linuxppc.org/linuxppc-embedded/200403/msg00059.html
CIAO,
--
Stephane
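For orientation, one common /dev/mem approach looks roughly like the sketch below (userspace C, typically requires root; the function names are hypothetical). mmap() needs a page-aligned offset, so the physical address is split into a page base and an in-page offset. On 32-bit builds off_t is 32 bits unless _FILE_OFFSET_BITS=64 is defined, which matters for physical addresses above 4 GB.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Split phys into a page-aligned base (for the mmap offset) and the
 * offset within that page. page_size must be a power of two. */
void phys_page_split(uint64_t phys, uint64_t page_size,
                     uint64_t *base, uint64_t *off)
{
    *base = phys & ~(page_size - 1);
    *off = phys - *base;
}

/* Map len bytes starting at physical address phys via /dev/mem.
 * Returns a pointer to phys itself, or NULL on failure; on success
 * *map_base and *map_len are what must later be passed to munmap(). */
volatile void *map_physical(uint64_t phys, size_t len,
                            void **map_base, size_t *map_len)
{
    uint64_t base, off;
    long page = sysconf(_SC_PAGESIZE);
    void *p;
    int fd;

    if (page <= 0)
        return NULL;
    phys_page_split(phys, (uint64_t)page, &base, &off);

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    *map_len = len + (size_t)off;
    p = mmap(NULL, *map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
             (off_t)base);
    close(fd);                     /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *map_base = p;
    return (volatile char *)p + off;
}
```

Accesses through the returned volatile pointer are not reordered or cached away by the compiler; whether the mapping is uncached at the hardware level depends on the kernel's /dev/mem handling for that address range.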
* [PATCH 2/2] tickless idle cpus: allow boot cpu to skip ticks
From: Srivatsa Vaddagiri @ 2006-04-10 12:19 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
In-Reply-To: <17462.61423.42032.559627@cargo.ozlabs.ibm.com>
This patch (version 2) lets the boot CPU skip ticks. Tested against
2.6.17-rc1-mm1.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
---
linux-2.6.17-rc1-root/arch/powerpc/kernel/time.c | 71 ++++++++++++++++++++---
1 file changed, 63 insertions(+), 8 deletions(-)
diff -puN arch/powerpc/kernel/time.c~boot_cpu_fix arch/powerpc/kernel/time.c
--- linux-2.6.17-rc1/arch/powerpc/kernel/time.c~boot_cpu_fix 2006-04-10 17:43:11.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/kernel/time.c 2006-04-10 17:44:32.000000000 +0530
@@ -637,6 +637,39 @@ static void iSeries_tb_recal(void)
static void account_ticks(struct pt_regs *regs);
+static spinlock_t do_timer_cpulock = SPIN_LOCK_UNLOCKED;
+static int do_timer_cpu; /* Which CPU should call do_timer? */
+
+static int __devinit do_timer_cpucallback(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ int cpu = (long)hcpu;
+
+ switch (action) {
+ case CPU_DOWN_PREPARE:
+ spin_lock(&do_timer_cpulock);
+ if (do_timer_cpu == cpu) {
+ cpumask_t tmpmask;
+ int new_cpu;
+
+ cpus_complement(tmpmask, nohz_cpu_mask);
+ cpu_clear(cpu, tmpmask);
+ new_cpu = any_online_cpu(tmpmask);
+ if (new_cpu != NR_CPUS)
+ do_timer_cpu = new_cpu;
+ }
+ spin_unlock(&do_timer_cpulock);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __devinitdata do_timer_notifier =
+{
+ .notifier_call = do_timer_cpucallback
+};
+
/* Returns 1 if this CPU was set in the mask */
static inline int clear_hzless_mask(void)
{
@@ -645,8 +678,12 @@ static inline int clear_hzless_mask(void
if (unlikely(cpu_isset(cpu, nohz_cpu_mask))) {
cpu_clear(cpu, nohz_cpu_mask);
- rc = 1;
- }
+ spin_lock(&do_timer_cpulock);
+ if (do_timer_cpu == NR_CPUS)
+ do_timer_cpu = cpu;
+ spin_unlock(&do_timer_cpulock);
+ rc = 1;
+ }
return rc;
}
@@ -684,6 +721,15 @@ void stop_hz_timer(void)
return;
}
+ spin_lock(&do_timer_cpulock);
+ if (do_timer_cpu == cpu) {
+ cpumask_t tmpmask;
+
+ cpus_complement(tmpmask, nohz_cpu_mask);
+ do_timer_cpu = any_online_cpu(tmpmask);
+ }
+ spin_unlock(&do_timer_cpulock);
+
do {
seq = read_seqbegin(&xtime_lock);
@@ -716,6 +762,7 @@ void start_hz_timer(struct pt_regs *regs
#else
static inline int clear_hzless_mask(void) { return 0;}
+#define do_timer_cpu boot_cpuid
#endif
static void account_ticks(struct pt_regs *regs)
@@ -742,16 +789,15 @@ static void account_ticks(struct pt_regs
if (!cpu_is_offline(cpu))
account_process_time(regs);
- /*
- * No need to check whether cpu is offline here; boot_cpuid
- * should have been fixed up by now.
- */
- if (cpu != boot_cpuid)
+ if (cpu != do_timer_cpu)
continue;
write_seqlock(&xtime_lock);
tb_last_jiffy += tb_ticks_per_jiffy;
- tb_last_stamp = per_cpu(last_jiffy, cpu);
+ tb_last_stamp += tb_ticks_per_jiffy;
+ /* Handle RTCL overflow on 601 */
+ if (__USE_RTC() && tb_last_stamp >= 1000000000)
+ tb_last_stamp -= 1000000000;
do_timer(regs);
timer_recalc_offset(tb_last_jiffy);
timer_check_rtc();
@@ -836,6 +882,13 @@ void __init smp_space_timers(unsigned in
unsigned long offset = tb_ticks_per_jiffy / max_cpus;
unsigned long previous_tb = per_cpu(last_jiffy, boot_cpuid);
+#ifdef CONFIG_NO_IDLE_HZ
+ /* Don't space timers - we want to let any CPU call do_timer to
+ * increment xtime.
+ */
+ half = offset = 0;
+#endif
+
/* make sure tb > per_cpu(last_jiffy, cpu) for all cpus always */
previous_tb -= tb_ticks_per_jiffy;
/*
@@ -1051,6 +1104,8 @@ void __init time_init(void)
calc_cputime_factors();
#ifdef CONFIG_NO_IDLE_HZ
max_skip = __USE_RTC() ? HZ : MAX_DEC_COUNT / tb_ticks_per_jiffy;
+ do_timer_cpu = boot_cpuid;
+ register_cpu_notifier(&do_timer_notifier);
#endif
/*
_
--
Regards,
vatsa
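The hand-off rule in this patch can be modelled with plain bitmasks. When the CPU currently responsible for do_timer() goes tickless (or is about to go offline), responsibility moves to the lowest online CPU still taking ticks. If none is left, the sentinel NR_CPUS is stored and the first CPU to wake up claims it. The sketch below is a simplified, single-threaded model with hypothetical names; it omits do_timer_cpulock and uses a 32-bit mask in place of cpumask_t.

```c
#define SKETCH_NR_CPUS 8   /* hypothetical; stands in for NR_CPUS */

/* Lowest CPU that is online and not in nohz, excluding self (pass -1
 * for no exclusion); SKETCH_NR_CPUS if none. Models any_online_cpu()
 * applied to the complement of nohz_cpu_mask. */
int pick_do_timer_cpu(unsigned int online, unsigned int nohz, int self)
{
    unsigned int candidates = online & ~nohz;

    if (self >= 0)
        candidates &= ~(1u << self);
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (candidates & (1u << cpu))
            return cpu;
    return SKETCH_NR_CPUS;
}

/* Models clear_hzless_mask(): a waking CPU takes over do_timer()
 * duty only if nobody currently holds it. */
void claim_if_unassigned(int *do_timer_cpu, int cpu)
{
    if (*do_timer_cpu == SKETCH_NR_CPUS)
        *do_timer_cpu = cpu;
}
```

In stop_hz_timer() the idle CPU has already set itself in nohz_cpu_mask, which is why no explicit exclusion is needed there; the CPU_DOWN_PREPARE path clears the departing CPU explicitly, which the `self` argument models.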
* [PATCH 1/2] tickless idle cpus: core patch - v2
From: Srivatsa Vaddagiri @ 2006-04-10 12:18 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
In-Reply-To: <17462.61423.42032.559627@cargo.ozlabs.ibm.com>
This is v2 of the core patch to skip ticks when a CPU is idle.
Changes since v1:
- Fix the buggy call to stop_hz_timer in idle_power4.S (hopefully it
is correct now!).
- Don't allow the boot CPU to skip ticks (a follow-on patch will
remove this restriction).
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
---
linux-2.6.17-rc1-root/arch/powerpc/Kconfig | 6
linux-2.6.17-rc1-root/arch/powerpc/kernel/idle_power4.S | 5
linux-2.6.17-rc1-root/arch/powerpc/kernel/irq.c | 3
linux-2.6.17-rc1-root/arch/powerpc/kernel/time.c | 143 +++++++++--
linux-2.6.17-rc1-root/arch/powerpc/kernel/traps.c | 1
linux-2.6.17-rc1-root/arch/powerpc/platforms/pseries/setup.c | 6
linux-2.6.17-rc1-root/include/asm-powerpc/time.h | 8
7 files changed, 147 insertions(+), 25 deletions(-)
diff -puN arch/powerpc/kernel/time.c~no_idle_hz arch/powerpc/kernel/time.c
--- linux-2.6.17-rc1/arch/powerpc/kernel/time.c~no_idle_hz 2006-04-09 10:40:58.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/kernel/time.c 2006-04-10 14:32:04.000000000 +0530
@@ -633,40 +633,97 @@ static void iSeries_tb_recal(void)
}
#endif
-/*
- * For iSeries shared processors, we have to let the hypervisor
- * set the hardware decrementer. We set a virtual decrementer
- * in the lppaca and call the hypervisor if the virtual
- * decrementer is less than the current value in the hardware
- * decrementer. (almost always the new decrementer value will
- * be greater than the current hardware decementer so the hypervisor
- * call will not be needed)
- */
+#ifdef CONFIG_NO_IDLE_HZ
-/*
- * timer_interrupt - gets called when the decrementer overflows,
- * with interrupts disabled.
+static void account_ticks(struct pt_regs *regs);
+
+/* Returns 1 if this CPU was set in the mask */
+static inline int clear_hzless_mask(void)
+{
+ int cpu = smp_processor_id();
+ int rc = 0;
+
+ if (unlikely(cpu_isset(cpu, nohz_cpu_mask))) {
+ cpu_clear(cpu, nohz_cpu_mask);
+ rc = 1;
+ }
+
+ return rc;
+}
+
+#define MAX_DEC_COUNT UINT_MAX /* Decrementer is 32-bit */
+static int min_skip = 2; /* Minimum number of ticks to skip */
+static int max_skip; /* Maximum number of ticks to skip */
+
+
+int sysctl_hz_timer = 1;
+
+/* Defer timer interrupt for as long as possible. This is accomplished by
+ * programming the decrementer to a suitable value such that it raises the
+ * exception after desired interval. This features allows CPUs to
+ * be used more efficiently in virtualized environments and/or allows for
+ * lower power consumption.
+ *
+ * Called with interrupts disabled on an idle CPU. Caller has to ensure that
+ * idle loop is not exited w/o start_hz_timer being called via an interrupt
+ * to restore timer interrupt frequency.
*/
-void timer_interrupt(struct pt_regs * regs)
+
+void stop_hz_timer(void)
{
+ unsigned long cpu = smp_processor_id(), seq, delta;
int next_dec;
- int cpu = smp_processor_id();
- unsigned long ticks;
-#ifdef CONFIG_PPC32
- if (atomic_read(&ppc_n_lost_interrupts) != 0)
- do_IRQ(regs);
-#endif
+ if (sysctl_hz_timer != 0 || cpu == boot_cpuid)
+ return;
- irq_enter();
+ cpu_set(cpu, nohz_cpu_mask);
+ mb();
+ if (rcu_pending(cpu) || local_softirq_pending()) {
+ cpu_clear(cpu, nohz_cpu_mask);
+ return;
+ }
- profile_tick(CPU_PROFILING, regs);
- calculate_steal_time();
+ do {
+ seq = read_seqbegin(&xtime_lock);
-#ifdef CONFIG_PPC_ISERIES
- get_lppaca()->int_dword.fields.decr_int = 0;
+ delta = next_timer_interrupt() - jiffies;
+
+ if (delta < min_skip) {
+ cpu_clear(cpu, nohz_cpu_mask);
+ return;
+ }
+
+ if (delta > max_skip)
+ delta = max_skip;
+
+ next_dec = tb_last_stamp + delta * tb_ticks_per_jiffy;
+
+ } while (read_seqretry(&xtime_lock, seq));
+
+ next_dec -= get_tb();
+ set_dec(next_dec);
+
+ return;
+}
+
+/* Take into account skipped ticks and restore the HZ timer frequency */
+void start_hz_timer(struct pt_regs *regs)
+{
+ if (clear_hzless_mask())
+ account_ticks(regs);
+}
+
+#else
+static inline int clear_hzless_mask(void) { return 0;}
#endif
+static void account_ticks(struct pt_regs *regs)
+{
+ int next_dec;
+ int cpu = smp_processor_id();
+ unsigned long ticks;
+
while ((ticks = tb_ticks_since(per_cpu(last_jiffy, cpu)))
>= tb_ticks_per_jiffy) {
/* Update last_jiffy */
@@ -703,6 +760,41 @@ void timer_interrupt(struct pt_regs * re
next_dec = tb_ticks_per_jiffy - ticks;
set_dec(next_dec);
+}
+
+/*
+ * For iSeries shared processors, we have to let the hypervisor
+ * set the hardware decrementer. We set a virtual decrementer
+ * in the lppaca and call the hypervisor if the virtual
+ * decrementer is less than the current value in the hardware
+ * decrementer. (almost always the new decrementer value will
+ * be greater than the current hardware decementer so the hypervisor
+ * call will not be needed)
+ */
+
+/*
+ * timer_interrupt - gets called when the decrementer overflows,
+ * with interrupts disabled.
+ */
+void timer_interrupt(struct pt_regs * regs)
+{
+#ifdef CONFIG_PPC32
+ if (atomic_read(&ppc_n_lost_interrupts) != 0)
+ do_IRQ(regs);
+#endif
+
+ irq_enter();
+
+ clear_hzless_mask();
+
+ profile_tick(CPU_PROFILING, regs);
+ calculate_steal_time();
+
+#ifdef CONFIG_PPC_ISERIES
+ get_lppaca()->int_dword.fields.decr_int = 0;
+#endif
+
+ account_ticks(regs);
#ifdef CONFIG_PPC_ISERIES
if (hvlpevent_is_pending())
@@ -957,6 +1049,9 @@ void __init time_init(void)
tb_ticks_per_usec = ppc_tb_freq / 1000000;
tb_to_us = mulhwu_scale_factor(ppc_tb_freq, 1000000);
calc_cputime_factors();
+#ifdef CONFIG_NO_IDLE_HZ
+ max_skip = __USE_RTC() ? HZ : MAX_DEC_COUNT / tb_ticks_per_jiffy;
+#endif
/*
* Calculate the length of each tick in ns. It will not be
diff -puN arch/powerpc/kernel/irq.c~no_idle_hz arch/powerpc/kernel/irq.c
--- linux-2.6.17-rc1/arch/powerpc/kernel/irq.c~no_idle_hz 2006-04-09 10:40:58.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/kernel/irq.c 2006-04-09 10:40:59.000000000 +0530
@@ -60,6 +60,7 @@
#ifdef CONFIG_PPC_ISERIES
#include <asm/paca.h>
#endif
+#include <asm/time.h>
int __irq_offset_value;
#ifdef CONFIG_PPC32
@@ -189,6 +190,8 @@ void do_IRQ(struct pt_regs *regs)
irq_enter();
+ start_hz_timer(regs);
+
#ifdef CONFIG_DEBUG_STACKOVERFLOW
/* Debugging check for stack overflow: is there less than 2KB free? */
{
diff -puN include/asm-powerpc/time.h~no_idle_hz include/asm-powerpc/time.h
--- linux-2.6.17-rc1/include/asm-powerpc/time.h~no_idle_hz 2006-04-09 10:40:59.000000000 +0530
+++ linux-2.6.17-rc1-root/include/asm-powerpc/time.h 2006-04-09 10:40:59.000000000 +0530
@@ -198,6 +198,14 @@ static inline unsigned long tb_ticks_sin
return get_tbl() - tstamp;
}
+#ifdef CONFIG_NO_IDLE_HZ
+extern void stop_hz_timer(void);
+extern void start_hz_timer(struct pt_regs *);
+#else
+static inline void stop_hz_timer(void) { }
+static inline void start_hz_timer(struct pt_regs *regs) { }
+#endif
+
#define mulhwu(x,y) \
({unsigned z; asm ("mulhwu %0,%1,%2" : "=r" (z) : "r" (x), "r" (y)); z;})
diff -puN arch/powerpc/Kconfig~no_idle_hz arch/powerpc/Kconfig
--- linux-2.6.17-rc1/arch/powerpc/Kconfig~no_idle_hz 2006-04-09 10:40:59.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/Kconfig 2006-04-09 10:40:59.000000000 +0530
@@ -593,6 +593,12 @@ config HOTPLUG_CPU
Say N if you are unsure.
+config NO_IDLE_HZ
+ depends on EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC || PPC_MAPLE)
+ bool "Switch off timer ticks on idle CPUs"
+ help
+ Switches the HZ timer interrupts off when a CPU is idle.
+
config KEXEC
bool "kexec system call (EXPERIMENTAL)"
depends on PPC_MULTIPLATFORM && EXPERIMENTAL
diff -puN arch/powerpc/kernel/traps.c~no_idle_hz arch/powerpc/kernel/traps.c
--- linux-2.6.17-rc1/arch/powerpc/kernel/traps.c~no_idle_hz 2006-04-09 10:40:59.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/kernel/traps.c 2006-04-09 10:40:59.000000000 +0530
@@ -875,6 +875,7 @@ void altivec_unavailable_exception(struc
void performance_monitor_exception(struct pt_regs *regs)
{
+ start_hz_timer(regs);
perf_irq(regs);
}
diff -puN arch/powerpc/platforms/pseries/setup.c~no_idle_hz arch/powerpc/platforms/pseries/setup.c
--- linux-2.6.17-rc1/arch/powerpc/platforms/pseries/setup.c~no_idle_hz 2006-04-09 10:40:59.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/platforms/pseries/setup.c 2006-04-09 10:40:59.000000000 +0530
@@ -463,8 +463,10 @@ static void pseries_dedicated_idle_sleep
* very low priority. The cede enables interrupts, which
* doesn't matter here.
*/
- if (!lppaca[cpu ^ 1].idle || poll_pending() == H_PENDING)
+ if (!lppaca[cpu ^ 1].idle || poll_pending() == H_PENDING) {
+ stop_hz_timer();
cede_processor();
+ }
out:
HMT_medium();
@@ -479,6 +481,8 @@ static void pseries_shared_idle_sleep(vo
*/
get_lppaca()->idle = 1;
+ stop_hz_timer();
+
/*
* Yield the processor to the hypervisor. We return if
* an external interrupt occurs (which are driven prior
diff -puN arch/powerpc/kernel/idle_power4.S~no_idle_hz arch/powerpc/kernel/idle_power4.S
--- linux-2.6.17-rc1/arch/powerpc/kernel/idle_power4.S~no_idle_hz 2006-04-09 10:40:59.000000000 +0530
+++ linux-2.6.17-rc1-root/arch/powerpc/kernel/idle_power4.S 2006-04-10 14:50:36.000000000 +0530
@@ -30,6 +30,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_CAN_NAP)
cmpwi 0,r4,0
beqlr
+ mflr r0
+ std r0,16(r1)
+ bl .stop_hz_timer
+ ld r0,16(r1)
+ mtlr r0
/* Go to NAP now */
BEGIN_FTR_SECTION
DSSALL
_
--
Regards,
vatsa
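The decrementer arithmetic in stop_hz_timer() can be checked in isolation. The sketch below restates it as pure functions: skip nothing if fewer than min_skip ticks are free, clamp to max_skip, and program the decrementer for tb_last_stamp + delta * tb_ticks_per_jiffy minus the current timebase. The 2,048,000 ticks per jiffy used in the usage note (an assumed 512 MHz timebase with HZ=250) is illustrative only. The sketch does the arithmetic in 64 bits for clarity, whereas the patch stores the result in an int.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors max_skip = MAX_DEC_COUNT / tb_ticks_per_jiffy (non-601 case),
 * with MAX_DEC_COUNT = UINT_MAX as in the patch. */
unsigned long compute_max_skip(unsigned long ticks_per_jiffy)
{
    return UINT_MAX / ticks_per_jiffy;
}

/* Returns false when delta < min_skip (keep the normal tick); otherwise
 * stores the decrementer count to program in *next_dec. All times are
 * in timebase ticks. */
bool compute_next_dec(uint64_t tb_last_stamp, uint64_t now_tb,
                      uint64_t ticks_per_jiffy, unsigned long delta,
                      unsigned long min_skip, unsigned long max_skip,
                      int64_t *next_dec)
{
    if (delta < min_skip)
        return false;
    if (delta > max_skip)
        delta = max_skip;
    *next_dec = (int64_t)(tb_last_stamp + delta * ticks_per_jiffy)
                - (int64_t)now_tb;
    return true;
}
```

With 2,048,000 ticks per jiffy, max_skip comes out as 2097 jiffies, roughly 8.4 seconds at HZ=250.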
* Re: [PATCH 1/4] tickless idle cpu - Allow any CPU to update jiffies
From: Srivatsa Vaddagiri @ 2006-04-10 11:49 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
In-Reply-To: <17462.61423.42032.559627@cargo.ozlabs.ibm.com>
On Sat, Apr 08, 2006 at 09:04:15AM +1000, Paul Mackerras wrote:
> Srivatsa Vaddagiri writes:
>
> > Currently, only boot CPU calls do_timer to update jiffies. This prevents
> > idle boot CPU from skipping ticks. Patch below, against 2.6.17-rc1-mm1,
> > allows jiffies to be updated from any CPU.
>
> We have to be very careful here. The code that keeps xtime and
> gettimeofday in sync relies on xtime being incremented as close as
> possible in time to when the timebase passes specific values. Since
> we currently stagger the timer interrupts for the cpus throughout a
> jiffy, having cpus other than the boot cpus calling do_timer will
> break this and introduce inaccuracies. There are also implications
> for the stolen time accounting on shared-processor LPAR systems.
>
> I think we need to remove the staggering, thus having all cpus take
> their timer interrupt at the same time. That way, any of them can
> call do_timer. However we then have to be much more careful about
> possible contention, e.g. on xtime_lock. Your patch has every cpu
> taking xtime_lock for writing rather than just the boot cpu. I'd like
> to see if there is some way to avoid that (while still having just one
> cpu call do_timer, of course).
Paul,
Thanks for the feedback on the patches.
Avoiding contention on xtime_lock doesn't seem to be trivial. Any
solution to it is fraught with races. Anyway, I have attempted one
solution (in the follow-on Patch 2/2) which keeps the overhead in the timer
interrupt handler low.
Let me know if you have other suggestions to avoid xtime_lock
contention!
The following patches are sent in separate mails:
Patch 1/2 - Core patch to skip ticks - v2
Patch 2/2 - Allow boot CPU to skip ticks - v2
The sysctl control patch and decrementer statistics patch are unchanged,
so I am not resending them this time.
--
Regards,
vatsa
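Paul's concern about staggering can be made concrete with a simplified model. If each CPU's timer interrupt is offset by tb_ticks_per_jiffy / max_cpus from the previous one (the `offset` computed in smp_space_timers()), a non-boot CPU calling do_timer() does so up to almost a full jiffy after the timebase crossed the jiffy boundary. With the offset forced to zero, as patch 2/2 does under CONFIG_NO_IDLE_HZ, that lag disappears. This hypothetical sketch assumes evenly spaced slots and ignores the separate `half` adjustment in smp_space_timers(); it is not the kernel's code.

```c
/* Timebase ticks after the jiffy boundary at which CPU slot `slot`
 * (0 = boot CPU, up to max_cpus - 1) takes its timer interrupt, with
 * interrupts spread evenly across the jiffy when staggered is non-zero. */
unsigned long interrupt_lag(unsigned long ticks_per_jiffy,
                            unsigned int max_cpus, unsigned int slot,
                            int staggered)
{
    unsigned long offset = staggered ? ticks_per_jiffy / max_cpus : 0;

    return offset * slot;
}
```

With 2,048,000 ticks per jiffy (an assumed value) and four CPUs, the last slot lags by 1,536,000 ticks, three quarters of a jiffy, when staggered, and by zero when aligned.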
* [ANNOUNCE] socket-can for linux
From: Andrey Volkov @ 2006-04-10 10:39 UTC (permalink / raw)
To: linuxppc-embedded; +Cc: linux-kernel
Hi all,
FYI, as noted in the subject, the socket-can project was finally created at
berlios.de yesterday. Project page: http://developer.berlios.de/projects/socketcan/
Happy hacking.
Andrey Volkov