LinuxPPC-Dev Archive on lore.kernel.org


* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Tejun Heo @ 2009-09-25  7:39 UTC
  To: Sachin Sant; +Cc: David Miller, Linux/PPC Development
In-Reply-To: <4ABC6E25.7090904@in.ibm.com>

Hello,

Sachin Sant wrote:
> <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
> <c0000000db70fb00:c0000000db70fb00>
> <4>PERCPU: relocated <c000000001120320:c000000001120320>
> <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
> <c000000001120320:c000000001120320>
> <4>PERCPU: relocated <c000000001120300:c000000001120300>
> <4>PERCPU: chunk 1, alloc pages [0,1)
> <4>PERCPU: chunk 1, map pages [0,1)
> <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
> <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
> <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)

This supports my hypothesis.  This is the first area being allocated
from a dynamic chunk and cleared.  PFN 53544 and 53545 have been
allocated and successfully mapped to 0xd00007fffff00000 and
0xd00007fffff80000 using map_kernel_range_noflush() but when those
addresses are actually accessed, we end up with infinite faults.  The
fault handler probably thinks that the fault has been handled
correctly but, when the control is returned, the processor faults
again.  Benjamin, I'm way out of my depth here, can you please help?

Oh, one more simple experiment.  Sachin, does the following patch make
any difference?

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 69511e6..93d29eb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2102,7 +2102,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				     size_t align, gfp_t gfp_mask)
 {
 	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
-	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	//const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
 	struct vmap_area **vas, *prev, *next;
 	struct vm_struct **vms;
 	int area, area2, last_area, term_area;


-- 
tejun


* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
From: Arjan van de Ven @ 2009-09-25  7:42 UTC
  To: svaidy
  Cc: Peter Zijlstra, Gautham R Shenoy, Venkatesh Pallipadi,
	linux-kernel, linuxppc-dev, Darrick J. Wong
In-Reply-To: <20090925072549.GB9562@dirshya.in.ibm.com>

On Fri, 25 Sep 2009 12:55:49 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> > I obviously can't speak for p-series cpus, just wanted to point out
> > that there is no universal truth about "offlining saves power".
> 
> Hi Arjan,
> 
> As you have said, on some cpus the extra effort of offlining does not
> save us any extra power, and the state will be same as idle.  The
> assertion that offlining saves power is still valid, it could be same
> as idle or better depending on the architecture and implementation.
> 
> On x86 we still need the code (Venki posted) to take cpus to C6 on
> offline to save power or else offlining consumes more power than idle
> due to C1/hlt state.  This framework can help here as well if we have
> any apprehension on making lowest sleep state as default on x86 and
> want the administrator to decide.

even with Venki's patch, all our measurements indicate that taking
cores away is damage on x86. Don't let that stop what you do for
powerpc, but for x86 it's not a win. Linux is good at keeping cores in
C6 long enough that the downside of offlining is bigger...



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Tejun Heo @ 2009-09-25  7:43 UTC
  To: Sachin Sant; +Cc: David Miller, Linux/PPC Development
In-Reply-To: <4ABC73C7.20403@kernel.org>

Tejun Heo wrote:
> Hello,
> 
> Sachin Sant wrote:
>> <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
>> <c0000000db70fb00:c0000000db70fb00>
>> <4>PERCPU: relocated <c000000001120320:c000000001120320>
>> <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
>> <c000000001120320:c000000001120320>
>> <4>PERCPU: relocated <c000000001120300:c000000001120300>
>> <4>PERCPU: chunk 1, alloc pages [0,1)
>> <4>PERCPU: chunk 1, map pages [0,1)
>> <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
>> <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
>> <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
>> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
> 
> This supports my hypothesis.  This is the first area being allocated
> from a dynamic chunk and cleared.  PFN 53544 and 53545 have been
> allocated and successfully mapped to 0xd00007fffff00000 and
> 0xd00007fffff80000 using map_kernel_range_noflush() but when those
> addresses are actually accessed, we end up with infinite faults.  The
> fault handler probably thinks that the fault has been handled
> correctly but, when the control is returned, the processor faults
> again.  Benjamin, I'm way out of my depth here, can you please help?
> 
> Oh, one more simple experiment.  Sachin, does the following patch make
> any difference?

Oops, the patch should look like the following.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 69511e6..37ab9e2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2056,7 +2056,8 @@ static unsigned long pvm_determine_end(struct vmap_area **pnext,
 				       struct vmap_area **pprev,
 				       unsigned long align)
 {
-	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
+	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
 	unsigned long addr;

 	if (*pnext)
@@ -2102,7 +2103,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				     size_t align, gfp_t gfp_mask)
 {
 	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
-	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
 	struct vmap_area **vas, *prev, *next;
 	struct vm_struct **vms;
 	int area, area2, last_area, term_area;


* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Sachin Sant @ 2009-09-25  8:03 UTC
  To: Tejun Heo; +Cc: David Miller, Linux/PPC Development
In-Reply-To: <4ABC7486.8040500@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2821 bytes --]

Tejun Heo wrote:
> Tejun Heo wrote:
>   
>> Hello,
>>
>> Sachin Sant wrote:
>>     
>>> <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
>>> <c0000000db70fb00:c0000000db70fb00>
>>> <4>PERCPU: relocated <c000000001120320:c000000001120320>
>>> <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
>>> <c000000001120320:c000000001120320>
>>> <4>PERCPU: relocated <c000000001120300:c000000001120300>
>>> <4>PERCPU: chunk 1, alloc pages [0,1)
>>> <4>PERCPU: chunk 1, map pages [0,1)
>>> <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
>>> <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
>>> <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
>>> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
>>>       
>> This supports my hypothesis.  This is the first area being allocated
>> from a dynamic chunk and cleared.  PFN 53544 and 53545 have been
>> allocated and successfully mapped to 0xd00007fffff00000 and
>> 0xd00007fffff80000 using map_kernel_range_noflush() but when those
>> addresses are actually accessed, we end up with infinite faults.  The
>> fault handler probably thinks that the fault has been handled
>> correctly but, when the control is returned, the processor faults
>> again.  Benjamin, I'm way out of my depth here, can you please help?
>>
>> Oh, one more simple experiment.  Sachin, does the following patch make
>> any difference?
>>     
With this patch applied the machine boots OK :-)

I have attached the boot log. Note that this boot log is
from a different machine, but the reported problem can be
recreated on this machine as well.

Thanks
-Sachin

>
> Oops, the patch should look like the following.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 69511e6..37ab9e2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2056,7 +2056,8 @@ static unsigned long pvm_determine_end(struct vmap_area **pnext,
>  				       struct vmap_area **pprev,
>  				       unsigned long align)
>  {
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	unsigned long addr;
>
>  	if (*pnext)
> @@ -2102,7 +2103,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     size_t align, gfp_t gfp_mask)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	struct vmap_area **vas, *prev, *next;
>  	struct vm_struct **vms;
>  	int area, area2, last_area, term_area;
>
>   


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


[-- Attachment #2: boot-log-with-patch-2 --]
[-- Type: text/plain, Size: 11143 bytes --]

Phyp-dump disabled at boot time
Using pSeries machine description
Page orders: linear mapping = 24, virtual = 16, io = 12, vmemmap = 24
Using 1TB segments
Found initrd at 0xc000000003700000:0xc000000003eca37e
bootconsole [udbg0] enabled
Partition configured for 8 cpus.
CPU maps initialized for 2 threads per core
 (thread shift is 1)
Starting Linux PPC64 #3 SMP Fri Sep 25 13:19:46 IST 2009
-----------------------------------------------------
ppc64_pft_size                = 0x19
physicalMemorySize            = 0x80000000
htab_hash_mask                = 0x3ffff
-----------------------------------------------------
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.31-git15 (root@mjs22lp5) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #3 SMP Fri Sep 25 13:19:46 IST 2009
[boot]0012 Setup Arch
Node 0 Memory: 0x0-0x42000000
Node 1 Memory: 0x42000000-0x80000000
EEH: No capable adapters found
PPC64 nvram contains 15360 bytes
Using shared processor idle loop
Zone PFN ranges:
  DMA      0x00000000 -> 0x00008000
  Normal   0x00008000 -> 0x00008000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000000 -> 0x00004200
    1: 0x00004200 -> 0x00008000
On node 0 totalpages: 16896
  DMA zone: 15 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 16881 pages, LIFO batch:1
On node 1 totalpages: 15872
  DMA zone: 14 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 15858 pages, LIFO batch:1
[boot]0015 Setup Done
PERCPU: Embedded 2 pages/cpu @c000000001400000 s96744 r0 d34328 u131072
pcpu-alloc: s96744 r0 d34328 u131072 alloc=1*1048576
pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
PERCPU: initialized 17 slots [c000000001500200,c000000001500310)
PERCPU: chunk 0 relocating -1 -> 13 c000000001500380 <c000000001500380:c000000001500380>
PERCPU: relocated <c0000000015002d0:c0000000015002d0>
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 32739
Policy zone: DMA
Kernel command line: root=/dev/sda3 sysrq=8 xmon=on 
PID hash table entries: 4096 (order: -1, 32768 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 2040320k/2097152k available (12800k kernel code, 56832k reserved, 2880k data, 4268k bss, 4800k init)
Hierarchical RCU implementation.
NR_IRQS:512
[boot]0020 XICS Init
[boot]0021 XICS Done
pic: no ISA interrupt controller
time_init: decrementer frequency = 512.000000 MHz
time_init: processor frequency   = 4005.000000 MHz
clocksource: timebase mult[7d0000] shift[22] registered
clockevent: decrementer mult[83126e97] shift[32] cpu[0]
Console: colour dummy device 80x25
console [hvc0] enabled, bootconsole disabled
allocated 1310720 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Security Framework initialized
SELinux:  Disabled at boot.
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
irq: irq 2 on host null mapped to virtual irq 16
clockevent: decrementer mult[83126e97] shift[32] cpu[1]
Processor 1 found.
clockevent: decrementer mult[83126e97] shift[32] cpu[2]
Processor 2 found.
clockevent: decrementer mult[83126e97] shift[32] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 1 CPUs:
CPU0 attaching sched-domain:
 domain 0: span 0-1 level SIBLING
  groups: 0 (cpu_power = 589) 1 (cpu_power = 589)
  domain 1: span 0-3 level CPU
   groups: 0-1 (cpu_power = 1178) 2-3 (cpu_power = 1178)
CPU1 attaching sched-domain:
 domain 0: span 0-1 level SIBLING
  groups: 1 (cpu_power = 589) 0 (cpu_power = 589)
  domain 1: span 0-3 level CPU
   groups: 0-1 (cpu_power = 1178) 2-3 (cpu_power = 1178)
CPU2 attaching sched-domain:
 domain 0: span 2-3 level SIBLING
  groups: 2 (cpu_power = 589) 3 (cpu_power = 589)
  domain 1: span 0-3 level CPU
   groups: 2-3 (cpu_power = 1178) 0-1 (cpu_power = 1178)
CPU3 attaching sched-domain:
 domain 0: span 2-3 level SIBLING
  groups: 3 (cpu_power = 589) 2 (cpu_power = 589)
  domain 1: span 0-3 level CPU
   groups: 2-3 (cpu_power = 1178) 0-1 (cpu_power = 1178)
NET: Registered protocol family 16
IBM eBus Device Driver
POWER6 performance monitor hardware support registered
PCI: Probing PCI hardware
PCI: Probing PCI hardware done
bio: create slab <bio-0> at 0
vgaarb: loaded
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource timebase
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 3
NET: Registered protocol family 2
PERCPU: chunk 0 relocating 13 -> 12 c000000001500380 <c0000000015002d0:c0000000015002d0>
PERCPU: relocated <c0000000015002c0:c0000000015002c0>
IP route cache hash table entries: 16384 (order: 1, 131072 bytes)
TCP established hash table entries: 65536 (order: 4, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 4, 1048576 bytes)
TCP: Hash tables configured (established 65536 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Unpacking initramfs...
Switched to high resolution mode on CPU 0
Freeing initrd memory: 7976k freed
irq: irq 655360 on host null mapped to virtual irq 17
irq: irq 655362 on host null mapped to virtual irq 18
IOMMU table initialized, virtual merging enabled
irq: irq 655364 on host null mapped to virtual irq 19
irq: irq 655365 on host null mapped to virtual irq 20
irq: irq 589825 on host null mapped to virtual irq 21
RTAS daemon started
audit: initializing netlink socket (disabled)
type=2000 audit(1253865170.250:1): initialized
HugeTLB registered 16 MB page size, pre-allocated 0 pages
HugeTLB registered 16 GB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
msgmni has been set to 4000
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver
ide-gd driver 1.18
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
mice: PS/2 mouse device common for all mice
EDAC MC: Ver: 2.1.0 Sep 25 2009
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
TCP cubic registered
NET: Registered protocol family 15
registered taskstats version 1
Freeing unused kernel memory: 4800k freed
PERCPU: chunk 0 relocating 12 -> 11 c000000001500380 <c0000000015002c0:c0000000015002c0>
PERCPU: relocated <c0000000015002b0:c0000000015002b0>
SysRq : Changing Loglevel
Loglevel set to 8
SCSI subsystem initialized
vio_register_driver: driver ibmvscsi registering
ibmvscsi 30000002: SRP_VERSION: 16.a
scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
ibmvscsi 30000002: partner initialization complete
ibmvscsi 30000002: host srp version: 16.a, host partition 06-1C12A (1), OS 3, max io 262144
ibmvscsi 30000002: Client reserve enabled
ibmvscsi 30000002: sent SRP login
ibmvscsi 30000002: SRP_LOGIN succeeded
scsi 0:0:1:0: Direct-Access     AIX      VDASD            0001 PQ: 0 ANSI: 3
scsi 0:0:2:0: CD-ROM            AIX      VOPTA                 PQ: 0 ANSI: 4
udevd version 128 started
sd 0:0:1:0: [sda] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
sd 0:0:1:0: [sda] Cache data unavailable
sd 0:0:1:0: [sda] Assuming drive cache: write through
sd 0:0:1:0: [sda] Cache data unavailable
sd 0:0:1:0: [sda] Assuming drive cache: write through
 sda: sda1 sda2 sda3
sd 0:0:1:0: [sda] Cache data unavailable
sd 0:0:1:0: [sda] Assuming drive cache: write through
sd 0:0:1:0: [sda] Attached SCSI disk
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with writeback data mode.
udevd version 128 started
sd 0:0:1:0: Attached scsi generic sg0 type 0
scsi 0:0:2:0: Attached scsi generic sg1 type 5
drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03
vio_register_driver: driver ibmveth registering
IBM eHEA ethernet device driver (Release EHEA_0102)
irq: irq 590080 on host null mapped to virtual irq 256
ehea: eth2: Jumbo frames are enabled
ehea: eth2 -> logical port id #9
ehea: eth3: Jumbo frames are enabled
ehea: eth3 -> logical port id #10
sr0: scsi-1 drive
Uniform CD-ROM driver Revision: 3.20
sr 0:0:2:0: Attached scsi CD-ROM sr0
Adding 1044096k swap on /dev/sda2.  Priority:-1 extents:1 across:1044096k 
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
loop: module loaded
fuse init (API version 7.13)
irq: irq 777 on host null mapped to virtual irq 265
ehea: eth2: Physical port up
ehea: External switch port is backup port
irq: irq 778 on host null mapped to virtual irq 266
NET: Registered protocol family 10
PERCPU: chunk 0 relocating 11 -> 10 c000000001500380 <c0000000015002b0:c0000000015002b0>
PERCPU: relocated <c0000000015002a0:c0000000015002a0>
PERCPU: chunk 0 relocating 10 -> 9 c000000001500380 <c0000000015002a0:c0000000015002a0>
PERCPU: relocated <c000000001500290:c000000001500290>
PERCPU: chunk 1 relocating -1 -> 16 c00000003e6d7500 <c00000003e6d7500:c00000003e6d7500>
PERCPU: relocated <c000000001500300:c000000001500300>
PERCPU: chunk 1 relocating 16 -> 14 c00000003e6d7500 <c000000001500300:c000000001500300>
PERCPU: relocated <c0000000015002e0:c0000000015002e0>
PERCPU: chunk 1, alloc pages [0,1)
PERCPU: chunk 1, map pages [0,1)
PERCPU: map 0xd00000001ff00000, 1 pages 14136
PERCPU: map 0xd00000001ff20000, 1 pages 14137
PERCPU: map 0xd00000001ff40000, 1 pages 14159
PERCPU: map 0xd00000001ff60000, 1 pages 14166
PERCPU: map 0xd00000001ff80000, 1 pages 14161
PERCPU: map 0xd00000001ffa0000, 1 pages 14165
PERCPU: map 0xd00000001ffc0000, 1 pages 15732
PERCPU: map 0xd00000001ffe0000, 1 pages 16049
PERCPU: chunk 1, will clear 4096b/unit d00000001ff00000 d00000001ff20000 d00000001ff40000 d00000001ff60000 d00000001ff80000 d00000001ffa0000 d00000001ffc0000 d00000001ffe0000
PERCPU: chunk 0 relocating 9 -> 8 c000000001500380 <c000000001500290:c000000001500290>
PERCPU: relocated <c000000001500280:c000000001500280>
PERCPU: chunk 0 relocating 8 -> 7 c000000001500380 <c000000001500280:c000000001500280>
PERCPU: relocated <c000000001500270:c000000001500270>
eth2: no IPv6 routers present


* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Benjamin Herrenschmidt @ 2009-09-25  8:31 UTC
  To: Tejun Heo; +Cc: Linux/PPC Development, David Miller
In-Reply-To: <4ABC73C7.20403@kernel.org>

On Fri, 2009-09-25 at 16:39 +0900, Tejun Heo wrote:
> Hello,
> 
> Sachin Sant wrote:
> > <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
> > <c0000000db70fb00:c0000000db70fb00>
> > <4>PERCPU: relocated <c000000001120320:c000000001120320>
> > <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
> > <c000000001120320:c000000001120320>
> > <4>PERCPU: relocated <c000000001120300:c000000001120300>
> > <4>PERCPU: chunk 1, alloc pages [0,1)
> > <4>PERCPU: chunk 1, map pages [0,1)
> > <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
> > <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
> > <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
> > <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
> 
> This supports my hypothesis.  This is the first area being allocated
> from a dynamic chunk and cleared.  PFN 53544 and 53545 have been
> allocated and successfully mapped to 0xd00007fffff00000 and
> 0xd00007fffff80000 using map_kernel_range_noflush() but when those
> addresses are actually accessed, we end up with infinite faults.  The
> fault handler probably thinks that the fault has been handled
> correctly but, when the control is returned, the processor faults
> again.  Benjamin, I'm way out of my depth here, can you please help?

Definitely looks like a powerpc mm problem. I'll have a look on monday.

Cheers,
Ben.

> Oh, one more simple experiment.  Sachin, does the following patch make
> any difference?
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 69511e6..93d29eb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2102,7 +2102,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     size_t align, gfp_t gfp_mask)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	//const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	struct vmap_area **vas, *prev, *next;
>  	struct vm_struct **vms;
>  	int area, area2, last_area, term_area;
> 
> 


* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Joakim Tjernlund @ 2009-09-25  8:31 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1253847827.7103.504.camel@pasglop>

>
>
> > I think there's more finickiness to 8xx than we thought. I.e. that
> > tlbil_va might have more reasons to be there than what the comment
> > seems to advertise. Can you try to move it even higher up? I.e.
> > unconditionally at the beginning of set_pte_filter?
> >
> > Also, if that doesn't help, can you try putting one in
> > set_access_flags_filter() just below ?
>
> Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
> on 8xx, and that's damn scary. I think what misled me initially is that
> the comment around the workaround is simply not properly describing the
> extent of the problem :-)
>
> So I'm not going to make the 8xx TLB miss code sane, that's beyond what
> I'm prepared to do with it, but I suspect that this should fix it (on top
> of upstream). Let me know if that's enough or if we also need to put
> one of these in ptep_set_access_flags().
>
> Please let me know if that works for you.
>
> Cheers,
> Ben.
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 5304093..7a8e676 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -170,6 +170,16 @@ struct page * maybe_pte_to_page(pte_t pte)
>
>  static pte_t set_pte_filter(pte_t pte, unsigned long addr)
>  {
> +#ifdef CONFIG_8xx
> +   /* 8xx has a weird concept of "unpopulated" entries. When we take
> +    * a TLB miss for a non-valid PTE, we insert such an entry which
> +    * causes a page fault the next time around. This entry must now
> +    * be kicked out or we'll just fault again
> +    */
> +   /* 8xx doesn't care about PID, size or ind args */
> +   _tlbil_va(addr, 0, 0, 0);
> +#endif /* CONFIG_8xx */
> +

The main problem with 8xx is that it does not update the DAR register in
the TLB Miss/Fault handlers for cache instructions :( It is an old bug
that was found only some years ago.

I think the old comment is correct though, as I recall it was Marcelo
that found the problem and added the workaround.

   Jocke


* Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
From: Peter Zijlstra @ 2009-09-25  8:54 UTC
  To: svaidy
  Cc: Shaohua Li, Gautham R Shenoy, Venkatesh Pallipadi, linux-kernel,
	linux-acpi, Paul Mackerras, arun, Ingo Molnar, Arjan van de Ven,
	linuxppc-dev, Len Brown
In-Reply-To: <20090925070623.GH8595@dirshya.in.ibm.com>

On Fri, 2009-09-25 at 12:36 +0530, Vaidyanathan Srinivasan wrote:
> * Arjan van de Ven <arjan@infradead.org> [2009-09-24 14:22:28]:
> 
> > On Thu, 24 Sep 2009 10:42:41 +0530
> > Arun R Bharadwaj <arun@linux.vnet.ibm.com> wrote:
> > 
> > > * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 16:55:27]:
> > > 
> > > Hi Len, (or other acpi folks),
> > > 
> > > I had a question regarding ACPI-cpuidle interaction in the current
> > > implementation.
> > > 
> > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
> > > a cpuidle_device. So every cpu has to go through the process of
> > > setting up the idle states and then registering as a cpuidle device.
> > > 
> > > What exactly is the reason behind this?
> > > 
> > 
> > technically a BIOS can opt to give you C states via ACPI on some cpus,
> > but not on others.
> > 
> > in practice when this happens it tends to be a bug.. but it's
> > technically a valid configuration
> 
> So we will need to keep the per-cpu registration as of now because we
> may have such buggy BIOS in the field and we don't want the cpuidle
> framework to malfunction there.

If the BIOS doesn't mention a certain C state on a cpu, and you try to
set it anyway, does that go boom?

This whole per-cpu registration thing is horridly ugly, can't you have a
per-cpu C state exception mask and leave it at that -- if it's really
needed?
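
One way to read the exception-mask idea, as a rough sketch only: keep a
single shared table of C states and give each cpu a bitmask of the
states it must not enter. Every name below (toy_*) is hypothetical and
none of this is existing cpuidle or ACPI API; the cpu and state counts
are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS   4	/* hypothetical cpu count */
#define TOY_NR_STATES 4	/* hypothetical: state index 0 is the shallowest */

/* One bit per state; a set bit means "this cpu must not use that state". */
static uint32_t toy_cstate_exception[TOY_NR_CPUS];

/* Record that the firmware did not describe 'state' for 'cpu'. */
void toy_exclude_state(int cpu, int state)
{
	toy_cstate_exception[cpu] |= 1u << state;
}

/* True when 'cpu' may enter 'state'. */
bool toy_state_allowed(int cpu, int state)
{
	return !(toy_cstate_exception[cpu] & (1u << state));
}

/* Deepest state index 'cpu' may enter, or -1 if every state is masked. */
int toy_deepest_allowed(int cpu)
{
	int state;

	for (state = TOY_NR_STATES - 1; state >= 0; state--)
		if (toy_state_allowed(cpu, state))
			return state;
	return -1;
}
```

With this layout the state table is registered once, and a cpu whose
BIOS tables omit a state only costs one mask bit.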


* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Tejun Heo @ 2009-09-25  9:01 UTC
  To: Sachin Sant; +Cc: David Miller, Linux/PPC Development
In-Reply-To: <4ABC7955.2070404@in.ibm.com>

Sachin Sant wrote:
> With this patch applied the machine boots OK :-)

Ah... so the problem really is that the address is too high.  If you've
got some time, it might be interesting to find out how high is safe.
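
For reference, the address arithmetic behind the two boot logs can be
sketched as below. This is only an illustration: the toy_* names are
hypothetical, the VMALLOC_START and VMALLOC_END values are assumptions
inferred from the addresses printed in the logs, and the units are
assumed to be packed directly below the upper bound, as those addresses
suggest.

```c
#include <stdint.h>

/* Assumed values, inferred from the logged addresses (not taken from headers). */
#define TOY_VMALLOC_START 0xd000000000000000ULL
#define TOY_VMALLOC_END   0xd000080000000000ULL

/* Round addr up to a power-of-two alignment, like ALIGN(). */
uint64_t toy_align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Round addr down to a power-of-two alignment, like "& ~(align - 1)". */
uint64_t toy_align_down(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

/* Upper bound in the original code: the aligned end of the vmalloc area. */
uint64_t toy_end_original(uint64_t align)
{
	return toy_align_down(TOY_VMALLOC_END, align);
}

/* Upper bound in the experimental patch: 512 MiB above the aligned start. */
uint64_t toy_end_patched(uint64_t align)
{
	return toy_align_up(TOY_VMALLOC_START, align) + (512ULL << 20);
}

/* Base of the first unit when nr_units units sit directly below 'end'. */
uint64_t toy_first_unit(uint64_t end, uint64_t unit_size, unsigned int nr_units)
{
	return end - (uint64_t)nr_units * unit_size;
}
```

Under these assumptions, two 0x80000-byte units below the original
bound start at 0xd00007fffff00000 (the hanging boot), and eight
0x20000-byte units below the patched bound start at 0xd00000001ff00000
(the boot that works).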

Thanks.

-- 
tejun


* Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
From: Arjan van de Ven @ 2009-09-25  9:35 UTC
  To: Peter Zijlstra
  Cc: Shaohua Li, Gautham R Shenoy, Venkatesh Pallipadi, linux-kernel,
	linux-acpi, Paul Mackerras, arun, Ingo Molnar, linuxppc-dev,
	Len Brown
In-Reply-To: <1253868864.10287.3.camel@twins>

On Fri, 25 Sep 2009 10:54:24 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Fri, 2009-09-25 at 12:36 +0530, Vaidyanathan Srinivasan wrote:
> > * Arjan van de Ven <arjan@infradead.org> [2009-09-24 14:22:28]:
> > 
> > > On Thu, 24 Sep 2009 10:42:41 +0530
> > > Arun R Bharadwaj <arun@linux.vnet.ibm.com> wrote:
> > > 
> > > > * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22
> > > > 16:55:27]:
> > > > 
> > > > Hi Len, (or other acpi folks),
> > > > 
> > > > I had a question regarding ACPI-cpuidle interaction in the
> > > > current implementation.
> > > > 
> > > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle
> > > > as a cpuidle_device. So every cpu has to go through the process
> > > > of setting up the idle states and then registering as a cpuidle
> > > > device.
> > > > 
> > > > What exactly is the reason behind this?
> > > > 
> > > 
> > > technically a BIOS can opt to give you C states via ACPI on some
> > > cpus, but not on others.
> > > 
> > > in practice when this happens it tends to be a bug.. but it's
> > > technically a valid configuration
> > 
> > So we will need to keep the per-cpu registration as of now because
> > we may have such buggy BIOS in the field and we don't want the
> > cpuidle framework to malfunction there.
> 
> If the BIOS doesn't mention a certain C state on a cpu, and you try to
> set it anyway, does that go boom?
> 
> This whole per-cpu registration thing is horridly ugly, can't you
> have a per-cpu C state exception mask and leave it at that -- if it's
> really needed?

the real solution is to make the acpi code always know about C1, even
if the bios doesn't.... That's one for Len :)

(C1 is just "hlt", what we do in the other idle loop ;-) )


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: [patch] powerpc: build modules outside the kernel tree fails, if it was built using O=
From: Yuri Frolov @ 2009-09-25  9:39 UTC
  To: Sam Ravnborg; +Cc: linux-kbuild, linuxppc-dev, rep.dot.nop
In-Reply-To: <20090925043902.GA2484@merkur.ravnborg.org>

On 09/25/2009 08:39 AM, Sam Ravnborg wrote:
> On Fri, Sep 25, 2009 at 11:12:21AM +1000, Benjamin Herrenschmidt wrote:
>> On Thu, 2009-09-24 at 15:28 +0400, Yuri Frolov wrote:
>>> Hello,
>>>
>>> here is a corresponding bug: http://bugzilla.kernel.org/show_bug.cgi?id=11143
>>> This patch should correctly export crtsavres.o in order to make O= option working.
>>> Please, consider to apply.
>>>
>>>
>>> Fix linking modules against crtsavres.o
>> Hi !
>>
>> This is the same patch you already posted as
>> "[PATCH] Fix linking modules against crtsavres.o"?
>>
>> Or is it an update?

It's the same, sorry for not mentioning it. The previous letter contained attachment crap, so I resent it with the patch inlined.

>>
>> I've asked Sam to review it already since it affects the main kernel
>> makefiles, waiting for his answer.
> Saw the duplicates. Will get back to it tonight (morning here now).
> 
> 	Sam
Ok, thank you.


* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Benjamin Herrenschmidt @ 2009-09-25  9:47 UTC
  To: Joakim Tjernlund; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF028339A2.3A7A8D6F-ONC125763C.002DE73B-C125763C.002ECF90@transmode.se>

On Fri, 2009-09-25 at 10:31 +0200, Joakim Tjernlund wrote:
> 
> The main problem with 8xx is that it does not update the DAR register in
> the TLB Miss/Fault handlers for cache instructions :( It is an old bug
> that was found only some years ago.
> 
> I think the old comment is correct though, as I recall it was Marcelo
> that found the problem and added the workaround.

But the TLB needs flushing on more than just the cache instructions,
no ?

IE. We take a TLB miss, there's no valid PTE, we put one of those
"unpopulated" entries in and get into the page fault, at which point we
do a set_pte, we -still- need to do an invalidation to get rid of the
unpopulated entry so it gets a new TLB miss no ? Without that, it's just
going to fault over and over again...

In any case, I think flushing unconditionally the target address isn't
going to hurt since we are just changing its PTE anyways.
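
To make the loop described above concrete, here is a toy model in C. It is
only a sketch and not kernel code: struct toy_mmu, toy_access() and toy_run()
are hypothetical names, and the model assumes a single page with a single TLB
slot. It shows why an "unpopulated" entry left in the TLB after set_pte keeps
faulting unless the entry is invalidated.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one PTE and one TLB slot for a single page. */
struct toy_mmu {
    bool pte_valid;     /* the PTE has been populated (set_pte was done) */
    bool tlb_present;   /* the TLB holds some entry for the page */
    bool tlb_valid;     /* that entry actually permits the access */
};

/* One access attempt; returns true when the access completes. */
bool toy_access(struct toy_mmu *m, bool flush_on_set_pte)
{
    if (!m->tlb_present) {              /* TLB miss: reload from the PTE */
        m->tlb_present = true;
        m->tlb_valid = m->pte_valid;    /* may load an "unpopulated" entry */
    }
    if (m->tlb_valid)
        return true;

    /* TLB error path: the page fault handler populates the PTE. */
    m->pte_valid = true;
    if (flush_on_set_pte)
        m->tlb_present = false;         /* invalidate the stale entry */
    return false;
}

/* Retry the access up to max_tries times; return tries used, or -1. */
int toy_run(bool flush_on_set_pte, int max_tries)
{
    struct toy_mmu m = { false, false, false };

    for (int i = 1; i <= max_tries; i++)
        if (toy_access(&m, flush_on_set_pte))
            return i;
    return -1;
}
```

In this model, toy_run(true, n) succeeds on the second attempt, while
toy_run(false, n) never succeeds because the stale entry is never replaced.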

As for the DAR problem, I'm not sure whether we really need a workaround
since I haven't seen many people complaining about it so far :-)

Can you educate me more on the problem ? Can it be fixed without
bloating those handlers to oblivion ?

Cheers,
Ben.

^ permalink raw reply

* Re: 2.6.31-git5 kernel boot hangs on powerpc
From: Benjamin Herrenschmidt @ 2009-09-25  9:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux/PPC Development, David Miller
In-Reply-To: <4ABC86E0.9090807@kernel.org>

On Fri, 2009-09-25 at 18:01 +0900, Tejun Heo wrote:
> > With this patch applied the machine boots OK :-)
> 
> Ah... so, the problem really is too high address.  If you've got some
> time, it might be interesting to find out how far high is safe.
> 
Might give me a clue about what the problem is but I think I'll just
cook up a test case that forcibly vmap something high up and see how it
goes from there. It could be a very old bug that nobody ever noticed
because our vmalloc space on 64-bit is so huge :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] i2c-mpc: Do not generate STOP after read.
From: Wolfgang Grandegger @ 2009-09-25 10:01 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, linux-i2c, Esben Haabendal
In-Reply-To: <1253620242-18461-1-git-send-email-Joakim.Tjernlund@transmode.se>

Joakim Tjernlund wrote:
> The driver always ends a read with a STOP condition which
> breaks subsequent I2C reads/writes in the same transaction as
> these expect to do a repeated START(ReSTART).
> 
> This will also help I2C multimaster as the bus will not be released
> after the first read, but when the whole transaction ends.
> 
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Tested-by: Wolfgang Grandegger <wg@grandegger.com>

on a MPC8548 board with an up-to-date kernel. I did not notice any
problems.

Wolfgang.

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Joakim Tjernlund @ 2009-09-25 10:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1253872054.7103.519.camel@pasglop>

[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 25/09/2009 11:47:34:
>
> On Fri, 2009-09-25 at 10:31 +0200, Joakim Tjernlund wrote:
> >
> > The main problem with 8xx is that it does not update the DAR register in
> > the TLB Miss/Fault handlers for cache instructions :( It's an old bug
> > that was found only some years ago.
> >
> > I think the old comment is correct though, as I recall it was Marcelo
> > that found the problem and added the workaround.
>
> But the TLB needs flushing on more than just the cache instructions,
> no ?
>
> IE. We take a TLB miss, there's no valid PTE, we put one of those
> "unpopulated" entries in and get into the page fault, at which point we
> do a set_pte, we -still- need to do an invalidation to get rid of the
> unpopulated entry so it gets a new TLB miss no ? Without that, it's just
> going to fault over and over again...

I don't know enough about 8xx in 2.6 as we still use 2.4 for 8xx to
say for sure.

>
> In any case, I think flushing unconditionally the target address isn't
> going to hurt since we are just changing its PTE anyways.
>
> As for the DAR problem, I'm not sure whether we really need a workaround
> since I haven't seen many people complaining about it so far :-)

I did some years ago on 2.4 but no one cared enough :(
The drawback of not handling this problem is that you have to be
very careful when using cache instructions, and user space must
be specially compiled to avoid using them in optimizations.

>
> Can you educate me more on the problem ? Can it be fixed without
> bloating those handlers to oblivion ?

Yes, I fixed it for myself but the fix was never accepted. Currently
only TLB Error depends on DAR so what I did was to tag DAR with an impossible
value and test for that value in the TLB Error handler. If it matched I
branched to a subroutine that did instruction decoding in assembler to
get at registers used and calculate DAR, then return to the TLB error
handler. In hindsight it would have been better to do this work in
handle_page_fault.
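
As a rough illustration of the decoding step, here is a C sketch of the same
idea. It is not the code from the attachment: the function names and the gpr[]
register array are hypothetical, and only TAG_VAL (0x00f0) is taken from the
attached file. It assumes the faulting instruction is an X-form cache
instruction whose effective address is (RA|0) + RB.

```c
#include <stdint.h>

#define TAG_VAL 0x00f0u   /* marker value, as in the attached head_8xx.S */

/* X-form cache instructions keep RA in bits 11-15 and RB in bits 16-20
 * (IBM numbering, bit 0 is the most significant bit). */
unsigned insn_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }
unsigned insn_rb(uint32_t insn) { return (insn >> 11) & 0x1f; }

/* EA = (RA|0) + RB: an RA field of 0 means the constant 0, not r0. */
uint32_t dcbx_effective_address(uint32_t insn, const uint32_t gpr[32])
{
    unsigned ra = insn_ra(insn);
    uint32_t base = ra ? gpr[ra] : 0;

    return base + gpr[insn_rb(insn)];
}

/* Hypothetical helper: pick the fault address for the TLB Error path.
 * If DAR still holds the tag, the hardware did not update it, so the
 * address is recomputed from the faulting instruction instead. */
uint32_t fault_address(uint32_t dar, uint32_t insn, const uint32_t gpr[32])
{
    if (dar != TAG_VAL)
        return dar;
    return dcbx_effective_address(insn, gpr);
}
```

For example, 0x7C0327EC encodes "dcbz r3,r4", so with r3 = 0x10000000 and
r4 = 0x20 the recomputed address is 0x10000020.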

I am attaching my old head_8xx.S for 2.4

 Jocke
(See attached file: head_8xx.S)

[-- Attachment #2: head_8xx.S --]
[-- Type: application/octet-stream, Size: 33768 bytes --]

/*
 *  arch/ppc/kernel/except_8xx.S
 *
 *  PowerPC version 
 *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
 *  Rewritten by Cort Dougan (cort@cs.nmt.edu) for PReP
 *    Copyright (C) 1996 Cort Dougan <cort@cs.nmt.edu>
 *  Low-level exception handlers and MMU support
 *  rewritten by Paul Mackerras.
 *    Copyright (C) 1996 Paul Mackerras.
 *  MPC8xx modifications by Dan Malek
 *    Copyright (C) 1997 Dan Malek (dmalek@jlc.net).
 *
 *  This file contains low-level support and setup for PowerPC 8xx
 *  embedded processors, including trap and interrupt dispatch.
 *
 *  This program is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU General Public License
 *  as published by the Free Software Foundation; either version
 *  2 of the License, or (at your option) any later version.
 *
 */

#include <linux/config.h>
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
#include <asm/cache.h>
#include <asm/pgtable.h>
#include <asm/cputable.h>
#include <asm/ppc_asm.h>
#include "ppc_defs.h"

#ifdef CONFIG_8xx_DCBxFIXED
/* These macros are used to tag DAR with a known value so that the
 * DataTLBError can recognize a buggy dcbx instruction and workaround
 * the problem.
 */
	#define TAG_VAL 0x00f0	/*  -1 may also be used */
	#define TAG_DAR_R20 	\
		li	r20, TAG_VAL;\
		mtspr	DAR, r20;
#else
	#define TAG_DAR_R20
#endif
/* Macro to make the code more readable. */
#ifdef CONFIG_8xx_CPU6
  #define DO_8xx_CPU6(val, reg) \
	li	reg, val; \
	stw	reg, 12(r0); \
	lwz	reg, 12(r0);
#else
  #define DO_8xx_CPU6(val, reg)
#endif
	.text
	.globl	_stext
_stext:

/*
 * _start is defined this way because the XCOFF loader in the OpenFirmware
 * on the powermac expects the entry point to be a procedure descriptor.
 */
	.text
	.globl	_start
_start:

/* MPC8xx
 * This port was done on an MBX board with an 860.  Right now I only
 * support an ELF compressed (zImage) boot from EPPC-Bug because the
 * code there loads up some registers before calling us:
 *   r3: ptr to board info data
 *   r4: initrd_start or if no initrd then 0
 *   r5: initrd_end - unused if r4 is 0
 *   r6: Start of command line string
 *   r7: End of command line string
 *
 * I decided to use conditional compilation instead of checking PVR and
 * adding more processor specific branches around code I don't need.
 * Since this is an embedded processor, I also appreciate any memory
 * savings I can get.
 *
 * The MPC8xx does not have any BATs, but it supports large page sizes.
 * We first initialize the MMU to support 8M byte pages, then load one
 * entry into each of the instruction and data TLBs to map the first
 * 8M 1:1.  I also mapped an additional I/O space 1:1 so we can get to
 * the "internal" processor registers before MMU_init is called.
 *
 * The TLB code currently contains a major hack.  Since I use the condition
 * code register, I have to save and restore it.  I am out of registers, so
 * I just store it in memory location 0 (the TLB handlers are not reentrant).
 * To avoid making any decisions, I need to use the "segment" valid bit
 * in the first level table, but that would require many changes to the
 * Linux page directory/table functions that I don't want to do right now.
 *
 * I used to use SPRG2 for a temporary register in the TLB handler, but it
 * has since been put to other uses.  I now use a hack to save a register
 * and the CCR at memory location 0.....Someday I'll fix this.....
 *	-- Dan
 */

	.globl	__start
__start:
	/* To accommodate some SMP systems that overwrite the first few
	 * locations before cpu 0 starts, the bootloader starts us at 0xc.
	 */
	nop
	nop
	nop
	mr	r31,r3			/* save parameters */
	mr	r30,r4
	mr	r29,r5
	mr	r28,r6
	mr	r27,r7
	li	r24,0			/* cpu # */

	/* We have to turn on the MMU right away so we get cache modes
	 * set correctly.
	 */
	bl	initial_mmu

/* We now have the lower 8 Meg mapped into TLB entries, and the caches
 * ready to work.
 */

turn_on_mmu:
	mfmsr	r0
	ori	r0,r0,MSR_DR|MSR_IR
	mtspr	SRR1,r0
	lis	r0,start_here@h
	ori	r0,r0,start_here@l
	mtspr	SRR0,r0
	SYNC
	rfi				/* enables MMU */

/*
 * Exception entry code.  This code runs with address translation
 * turned off, i.e. using physical addresses.
 * We assume sprg3 has the physical address of the current
 * task's thread_struct.
 */
#define EXCEPTION_PROLOG	\
	mtspr	SPRG0,r20;	\
	mtspr	SPRG1,r21;	\
	mfcr	r20;		\
	mfspr	r21,SPRG2;		/* exception stack to use from */ \
	cmpwi	0,r21,0;		/* user mode or RTAS */ \
	bne	1f;		\
	tophys(r21,r1);			/* use tophys(kernel sp) otherwise */ \
	subi	r21,r21,INT_FRAME_SIZE;	/* alloc exc. frame */\
1:	stw	r20,_CCR(r21);		/* save registers */ \
	stw	r22,GPR22(r21);	\
	stw	r23,GPR23(r21);	\
	mfspr	r20,SPRG0;	\
	stw	r20,GPR20(r21);	\
	mfspr	r22,SPRG1;	\
	stw	r22,GPR21(r21);	\
	mflr	r20;		\
	stw	r20,_LINK(r21);	\
	mfctr	r22;		\
	stw	r22,_CTR(r21);	\
	mfspr	r20,XER;	\
	stw	r20,_XER(r21);	\
	mfspr	r22,SRR0;	\
	mfspr	r23,SRR1;	\
	stw	r0,GPR0(r21);	\
	stw	r1,GPR1(r21);	\
	stw	r2,GPR2(r21);	\
	stw	r1,0(r21);	\
	tovirt(r1,r21);			/* set new kernel sp */	\
	SAVE_4GPRS(3, r21);	\
	SAVE_GPR(7, r21);
/*
 * Note: code which follows this uses cr0.eq (set if from kernel),
 * r21, r22 (SRR0), and r23 (SRR1).
 */

/*
 * Exception vectors.
 */

#define FINISH_EXCEPTION(func)			\
	bl	transfer_to_handler;		\
	.long	func;				\
	.long	ret_from_except

#define STD_EXCEPTION(n, label, hdlr)		\
	. = n;					\
label:						\
	EXCEPTION_PROLOG;			\
	TAG_DAR_R20;				\
	addi	r3,r1,STACK_FRAME_OVERHEAD;	\
	li	r20,MSR_KERNEL;			\
	FINISH_EXCEPTION(hdlr)

/* System reset */
	STD_EXCEPTION(0x100, Reset, UnknownException)

/* Machine check */
	STD_EXCEPTION(0x200, MachineCheck, MachineCheckException)

/* Data access exception.
 * This is "never generated" by the MPC8xx.  We jump to it for other
 * translation errors.
 */
	. = 0x300
DataAccess:
	EXCEPTION_PROLOG
	mfspr	r20,DSISR
	stw	r20,_DSISR(r21)
	mr	r5,r20
	mfspr	r4,DAR
	stw	r4,_DAR(r21)
	TAG_DAR_R20
	addi	r3,r1,STACK_FRAME_OVERHEAD
	li	r20,MSR_KERNEL
	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
	FINISH_EXCEPTION(do_page_fault)

/* Instruction access exception.
 * This is "never generated" by the MPC8xx.  We jump to it for other
 * translation errors.
 */
	. = 0x400
InstructionAccess:
	EXCEPTION_PROLOG
	addi	r3,r1,STACK_FRAME_OVERHEAD
	mr	r4,r22
	mr	r5,r23
	li	r20,MSR_KERNEL
	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
	FINISH_EXCEPTION(do_page_fault)

/* External interrupt */
	. = 0x500;
HardwareInterrupt:
	EXCEPTION_PROLOG;
	addi	r3,r1,STACK_FRAME_OVERHEAD
	li	r20,MSR_KERNEL
	li	r4,0
	bl	transfer_to_handler
	.globl	do_IRQ_intercept
do_IRQ_intercept:
	.long	do_IRQ;
	.long	ret_from_intercept

/* Alignment exception */
	. = 0x600
Alignment:
	EXCEPTION_PROLOG
	mfspr	r4,DAR
	stw	r4,_DAR(r21)
	TAG_DAR_R20
	mfspr	r5,DSISR
	stw	r5,_DSISR(r21)
	addi	r3,r1,STACK_FRAME_OVERHEAD
	li	r20,MSR_KERNEL
	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
	FINISH_EXCEPTION(AlignmentException)

/* Program check exception */
	. = 0x700
ProgramCheck:
	EXCEPTION_PROLOG
	addi	r3,r1,STACK_FRAME_OVERHEAD
	li	r20,MSR_KERNEL
	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
	FINISH_EXCEPTION(ProgramCheckException)

/* No FPU on MPC8xx.  This exception is not supposed to happen.
*/
	STD_EXCEPTION(0x800, FPUnavailable, UnknownException)

	. = 0x900
Decrementer:
	EXCEPTION_PROLOG
	addi	r3,r1,STACK_FRAME_OVERHEAD
	li	r20,MSR_KERNEL
	bl	transfer_to_handler
	.globl	timer_interrupt_intercept
timer_interrupt_intercept:
	.long	timer_interrupt
	.long	ret_from_intercept

	STD_EXCEPTION(0xa00, Trap_0a, UnknownException)
	STD_EXCEPTION(0xb00, Trap_0b, UnknownException)

/* System call */
	. = 0xc00
SystemCall:
	EXCEPTION_PROLOG
	stw	r3,ORIG_GPR3(r21)
	li	r20,MSR_KERNEL
	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
	FINISH_EXCEPTION(DoSyscall)

/* Single step - not used on 601 */
	STD_EXCEPTION(0xd00, SingleStep, SingleStepException)

	STD_EXCEPTION(0xe00, Trap_0e, UnknownException)
	STD_EXCEPTION(0xf00, Trap_0f, UnknownException)

/* On the MPC8xx, this is a software emulation interrupt.  It occurs
 * for all unimplemented and illegal instructions.
 */
	STD_EXCEPTION(0x1000, SoftEmu, SoftwareEmulation)

	. = 0x1100
/*
 * For the MPC8xx, this is a software tablewalk to load the instruction
 * TLB.  It is modelled after the example in the Motorola manual.  The task
 * switch loads the M_TWB register with the pointer to the first level table.
 * If we discover there is no second level table (the value is zero), the
 * plan was to load that into the TLB, which causes another fault into the
 * TLB Error interrupt where we can handle such problems.  However, that did
 * not work, so if we discover there is no second level table, we restore
 * registers and branch to the error exception.  We have to use the MD_xxx
 * registers for the tablewalk because the equivalent MI_xxx registers
 * only perform the attribute functions.
 */
InstructionTLBMiss:
#ifdef CONFIG_8xx_CPU6
	stw	r3, 8(r0)
	li	r3, 0x3f80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	M_TW, r20	/* Save a couple of working registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
	mfcr	r20
	stw	r20, 0(r0)
#endif
	stw	r21, 4(r0)
	mfspr	r20, SRR0	/* Get effective address of fault */
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3780
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_EPN, r20	/* Have to use MD_EPN for walk, MI_EPN can't */
	mfspr	r20, M_TWB	/* Get level 1 table entry address */

#if !CONFIG_PIN_TLB || CONFIG_MODULES
	/* If we are faulting a kernel address, we have to use the
	 * kernel page tables.
	 */
	andi.	r21, r20, 0x0800	/* Address >= 0x80000000 */
	beq	3f
	lis	r21, swapper_pg_dir@h
	ori	r21, r21, swapper_pg_dir@l
	rlwimi	r20, r21, 0, 2, 19
3:
	lwz	r21, 0(r20)	/* Get the level 1 entry */
	rlwinm.	r20, r21,0,0,19	/* Extract page descriptor page address */
	tophys(r21,r21)
	ori	r21,r21,1		/* Set valid bit */
	beq	2f		/* If zero, don't try to find a pte */
#else
	lwz	r21, 0(r20)	/* Get the level 1 entry */
	mfcr	r20
	cmplwi	cr0,r21,0x0fff	/* Test page descriptor page address */
	tophys(r21,r21)
	ori	r21,r21,1		/* Set valid bit */
	bng-	2f		/* If zero, don't try to find a pte */
	mtcr	r20
#endif

	/* We have a pte table, so load the MI_TWC with the attributes
	 * for this "segment."
	 */
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x2b80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MI_TWC, r21	/* Set segment attributes */
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3b80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_TWC, r21	/* Load pte table base address */
	mfspr	r21, MD_TWC	/* ....and get the pte address */
	lwz	r20, 0(r21)	/* Get the pte */

	ori	r20, r20, _PAGE_ACCESSED
	stw	r20, 0(r21)

	/* The Linux PTE won't go exactly into the MMU TLB.
	 * Software indicator bits 21, 22 and 28 must be clear.
	 * Software indicator bits 24, 25, 26, and 27 must be
	 * set.  All other Linux PTE bits control the behavior
	 * of the MMU.
	 */
	li	r21, 0x00f0
	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */

#ifdef CONFIG_8xx_CPU6
	li	r3, 0x2d80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MI_RPN, r20	/* Update TLB entry */

	mfspr	r20, M_TW	/* Restore registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
	lwz	r21, 0(r0)
	mtcr	r21
#endif
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	rfi

2:	/* Restore registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
	lwz	r21, 0(r0)
	mtcr	r21
#else
	mtcr	r20
#endif
	mfspr	r20, M_TW
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	b	InstructionAccess

	. = 0x1200
DataStoreTLBMiss:
#ifdef CONFIG_8xx_CPU6
	stw	r3, 8(r0)
	li	r3, 0x3f80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	M_TW, r20	/* Save a couple of working registers */
	mfcr	r20
	stw	r20, 0(r0)
	stw	r21, 4(r0)
	mfspr	r20, M_TWB	/* Get level 1 table entry address */

	/* If we are faulting a kernel address, we have to use the
	 * kernel page tables.
	 */
	andi.	r21, r20, 0x0800
	beq+	3f
	lis	r21, swapper_pg_dir@h
	ori	r21, r21, swapper_pg_dir@l
	rlwimi r20, r21, 0, 2, 19
3:
	lwz	r21, 0(r20)	/* Get the level 1 entry */
	rlwinm.	r20, r21,0,0,19	/* Extract page descriptor page address */

//	beq	4f		/* If zero, don't try to find a pte */
	/* We have a pte table, so fetch the pte from the table.
	 */
	tophys(r21, r21)
	ori	r21, r21, 1	/* Set valid bit in physical L2 page */
//	beq-	4f		/* If zero, don't try to find a pte */
	beq-	2f		/* If zero, don't try to find a pte */

#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3b80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_TWC, r21	/* Load pte table base address */
	mfspr	r20, MD_TWC	/* ....and get the pte address */
	lwz	r20, 0(r20)	/* Get the pte */

	/* Insert the Guarded flag into the TWC from the Linux PTE.
	 * It is bit 27 of both the Linux PTE and the TWC (at least
	 * I got that right :-).  It will be better when we can put
	 * this into the Linux pgd/pmd and load it in the operation
	 * above.
	 */
	rlwimi	r21, r20, 0, 27, 27
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3b80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_TWC, r21

//	mfspr	r21, MD_TWC	/* get the pte address again */
	ori	r20, r20, _PAGE_ACCESSED
999:	mfspr	r21, MD_TWC	/* get the pte address again */
	stw	r20, 0(r21)

	/* The Linux PTE won't go exactly into the MMU TLB.
	 * Software indicator bits 21, 22 and 28 must be clear.
	 * Software indicator bits 24, 25, 26, and 27 must be
	 * set.  All other Linux PTE bits control the behavior
	 * of the MMU.
	 */
4:	li	r21, 0x00f0
	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */

#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3d80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_RPN, r20	/* Update TLB entry */

#ifdef CONFIG_8xx_DCBxFIXED
#if TAG_VAL == 0x00f0 /* Save 1 instr. by reusing the val loaded in r21 above */
	mtspr	DAR, r21
#else
	TAG_DAR_R20
#endif
#endif
	mfspr	r20, M_TW	/* Restore registers */
	lwz	r21, 0(r0)
	mtcr	r21
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	rfi

2:
#ifdef CONFIG_8xx_DCBxFIXED
	/* Copy 20 msb from MD_EPN to DAR since the dcbx instructions fail
	 * to update DAR when they cause a DTLB Miss.
	 */
	mfspr	r21, MD_EPN
	mfspr	r20, DAR
	rlwimi	r20, r21, 0, 0, 19
	mtspr	DAR, r20
#endif
	mfspr	r20, M_TW	/* Restore registers */
	lwz	r21, 0(r0)
	mtcr	r21
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	b	DataAccess

/* This is an instruction TLB error on the MPC8xx.  This could be due
 * to many reasons, such as executing guarded memory or illegal instruction
 * addresses.  There is nothing to do but handle a big time error fault.
 */
	. = 0x1300
InstructionTLBError:
	b	InstructionAccess

/* This is the data TLB error on the MPC8xx.  This could be due to
 * many reasons, including a dirty update to a pte.  We can catch that
 * one here, but anything else is an error.  First, we track down the
 * Linux pte.  If it is valid, write access is allowed, but the
 * page dirty bit is not set, we will set it and reload the TLB.  For
 * any other case, we bail out to a higher level function that can
 * handle it.
 */
	. = 0x1400
DataTLBError:
#ifdef CONFIG_8xx_CPU6
	stw	r3, 8(r0)
	li	r3, 0x3f80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	M_TW, r20	/* Save a couple of working registers */
	mfcr	r20
	stw	r20, 0(r0)
	stw	r21, 4(r0)

	mfspr	r20, DAR
#ifdef  CONFIG_8xx_DCBxFIXED
	/* If DAR contains TAG_VAL, a buggy dcbx instruction
	 * did not set DAR.
	 */
	cmpwi	cr0, r20, TAG_VAL
	beq-	100f	/* Branch if TAG_VAL to dcbx workaround procedure */
101:	/* return from dcbx instruction bug workaround, r20 holds value of DAR */	
	/* First, make sure this was a store operation.
	*/
#endif
	mfspr	r21, DSISR
	andis.	r21, r21, 0x0200	/* If set, indicates store op */
//	beq	2f

	/* The EA of a data TLB miss is automatically stored in the MD_EPN 
	 * register.  The EA of a data TLB error is automatically stored in 
	 * the DAR, but not the MD_EPN register.  We must copy the 20 most 
	 * significant bits of the EA from the DAR to MD_EPN before we 
	 * start walking the page tables.  We also need to copy the CASID 
	 * value from the M_CASID register.
	 * Addendum:  The EA of a data TLB error is _supposed_ to be stored 
	 * in DAR, but it seems that this doesn't happen in some cases, such 
	 * as when the error is due to a dcbi instruction to a page with a 
	 * TLB that doesn't have the changed bit set.  In such cases, there 
	 * does not appear to be any way  to recover the EA of the error 
	 * since it is neither in DAR nor MD_EPN.  As a workaround, the 
	 * _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs 
	 * are initialized in mapin_ram().  This will avoid the problem, 
	 * assuming we only use the dcbi instruction on kernel addresses.
	 */
	/* DAR is in r20 already */
	rlwinm	r21, r20, 0, 0, 19
	ori	r21, r21, MD_EVALID
	beq-	2f
	mfspr	r20, M_CASID
	rlwimi	r21, r20, 0, 28, 31
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3780
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_EPN, r21

	mfspr	r20, M_TWB	/* Get level 1 table entry address */

	/* If we are faulting a kernel address, we have to use the
	 * kernel page tables.
	 */
	andi.	r21, r20, 0x0800
	beq+	3f
	lis	r21, swapper_pg_dir@h
	ori	r21, r21, swapper_pg_dir@l
	rlwimi	r20, r21, 0, 2, 19
3:
	lwz	r21, 0(r20)	/* Get the level 1 entry */
	rlwinm.	r20, r21,0,0,19	/* Extract page descriptor page address */
//	beq	2f		/* If zero, bail */

	/* We have a pte table, so fetch the pte from the table.
	 */
	tophys(r21, r21)
	ori	r21, r21, 1		/* Set valid bit in physical L2 page */
	beq-	2f		/* If zero, bail */
#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3b80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_TWC, r21		/* Load pte table base address */
	mfspr	r21, MD_TWC		/* ....and get the pte address */
	lwz	r20, 0(r21)		/* Get the pte */

	andi.	r21, r20, _PAGE_RW	/* Is it writeable? */
//	beq	2f			/* Bail out if not */

	/* Update 'changed', among others.
	*/
	ori	r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
	beq-	2f			/* Bail out if not */
	b	999b
	mfspr	r21, MD_TWC		/* Get pte address again */
	stw	r20, 0(r21)		/* and update pte in table */

	/* The Linux PTE won't go exactly into the MMU TLB.
	 * Software indicator bits 21, 22 and 28 must be clear.
	 * Software indicator bits 24, 25, 26, and 27 must be
	 * set.  All other Linux PTE bits control the behavior
	 * of the MMU.
	 */
	li	r21, 0x00f0
	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */

#ifdef CONFIG_8xx_CPU6
	li	r3, 0x3d80
	stw	r3, 12(r0)
	lwz	r3, 12(r0)
#endif
	mtspr	MD_RPN, r20	/* Update TLB entry */

#ifdef CONFIG_8xx_DCBxFIXED
#if TAG_VAL == 0x00f0 /* Save 1 instr. by reusing the val loaded in r21 above */
	mtspr	DAR, r21
#else
	TAG_DAR_R20
#endif
#endif
	mfspr	r20, M_TW	/* Restore registers */
	lwz	r21, 0(r0)
	mtcr	r21
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	rfi
2:
	mfspr	r20, M_TW	/* Restore registers */
	lwz	r21, 0(r0)
	mtcr	r21
	lwz	r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)
#endif
	b	DataAccess

	STD_EXCEPTION(0x1500, Trap_15, UnknownException)
	STD_EXCEPTION(0x1600, Trap_16, UnknownException)
	STD_EXCEPTION(0x1700, Trap_17, TAUException)
	STD_EXCEPTION(0x1800, Trap_18, UnknownException)
	STD_EXCEPTION(0x1900, Trap_19, UnknownException)
	STD_EXCEPTION(0x1a00, Trap_1a, UnknownException)
	STD_EXCEPTION(0x1b00, Trap_1b, UnknownException)

/* On the MPC8xx, these next four traps are used for development
 * support of breakpoints and such.  Someday I will get around to
 * using them.
 */
	STD_EXCEPTION(0x1c00, Trap_1c, UnknownException)
	STD_EXCEPTION(0x1d00, Trap_1d, UnknownException)
	STD_EXCEPTION(0x1e00, Trap_1e, UnknownException)
	STD_EXCEPTION(0x1f00, Trap_1f, UnknownException)

	. = 0x2000

#ifdef CONFIG_8xx_DCBxFIXED
/* This is the workaround procedure to calculate the data EA for buggy dcbx,dcbi instructions
 * by decoding the registers used by the dcbx instruction and adding them.
 * DAR is set to the calculated address and r20 also holds the EA on exit.
 */
//#define INSTR_CHECK /* define to verify if it is a dcbx instr. Should not be needed. */
//#define NO_SELF_MODIFYING_CODE /* define if you don't want to use self modifying code */
//#define DEBUG_DCBX_INSTRUCTIONS /* for debugging only. Needs INSTR_CHECK defined as well. */
//#define KERNEL_SPACE_ONLY /* define if user space does NOT contain dcbx instructions. */

#ifndef KERNEL_SPACE_ONLY
	nop	/* A few nops to make the modified_instr: space below cache line aligned */
	nop
139:	/* fetch instruction from userspace memory */
	DO_8xx_CPU6(0x3780, r3)
	mtspr	MD_EPN, r20
	mfspr	r21, M_TWB	/* Get level 1 table entry address */
	lwz	r21, 0(r21)	/* Get the level 1 entry */
	tophys  (r21, r21)
	DO_8xx_CPU6(0x3b80, r3)
	mtspr	MD_TWC, r21	/* Load pte table base address */
	mfspr	r21, MD_TWC	/* ....and get the pte address */
	lwz	r21, 0(r21)	/* Get the pte */
	/* concat physical page address(r21) and page offset(r20) */
	rlwimi	r21, r20, 0, 20, 31
	b	140f
#endif
100:	/* Entry point for dcbx workaround. */
	/* fetch instruction from memory. */
	mfspr	r20,SRR0
#ifndef KERNEL_SPACE_ONLY
	andis.	r21, r20, 0x8000
	tophys  (r21, r20)
	beq-	139b		/* Branch if user space address */
#else
	tophys  (r21, r20)
#endif
140:	lwz	r21,0(r21)
#ifdef INSTR_CHECK
/* Check if it really is a dcbx instruction. This is not needed as far as I can tell */
/* dcbt and dcbtst do not generate DTLB Misses/Errors, no need to include them here */
	rlwinm	r20, r21, 0, 21, 30
	cmpwi	cr0, r20, 2028	/* Is dcbz? */
	beq+	142f
	cmpwi	cr0, r20, 940	/* Is dcbi? */
	beq+	142f
	cmpwi	cr0, r20, 108	/* Is dcbst? */
	beq+	142f
	cmpwi	cr0, r20, 172	/* Is dcbf? */
	beq+	142f
	cmpwi	cr0, r20, 1964	/* Is icbi? */
	beq+	142f
#ifdef DEBUG_DCBX_INSTRUCTIONS
141:	b 141b /* Stop here if no dcbx instruction */
#endif
	mfspr	r20, DAR	/* r20 must hold DAR at exit */
	b 101b			/* None of the above, go back to normal TLB processing */
142:	/* continue, it was a dcbx instruction. */
#endif
#ifdef CONFIG_8xx_CPU6
	lwz	r3, 8(r0)		/* restore r3 from memory */
#endif
#ifndef NO_SELF_MODIFYING_CODE
	andis.	r20,r21,0x1f	/* test if reg RA is r0 */
	li	r20,modified_instr@l
	dcbtst	r0,r20		/* touch for store */
	rlwinm	r21,r21,0,0,20	/* Zero lower 11 bits (extended opcode and Rc) */
	oris	r21,r21,640	/* Transform instr. to a "add r20,RA,RB" */
	ori	r21,r21,532
	stw	r21,0(r20)	/* store add/and instruction */
	dcbf	0,r20		/* flush new instr. to memory. */
	icbi	0,r20		/* invalidate instr. cache line */
	lwz	r21, 4(r0)	/* restore r21 from memory */
	mfspr	r20, M_TW	/* restore r20 from M_TW */
	isync			/* Wait until new instr is loaded from memory */
modified_instr:
	.space	4		/* this is where the add/and instr. is stored */
#ifdef DEBUG_DCBX_INSTRUCTIONS
	/* fill with some garbage */ 
	li	r21,0xffff
	stw	r21,0(r21)
#endif
	bne+	143f
	subf	r20,r0,r20		/* r20=r20-r0, only if reg RA is r0 */
143:	mtdar	r20			/* store faulting EA in DAR */
	b	101b			/* Go back to normal TLB handling */
#else
	mfctr	r20
	mtdar	r20			/* save ctr reg in DAR */
	rlwinm	r20, r21, 24, 24, 28	/* offset into jump table for reg RB */
	addi	r20, r20, 150f@l	/* add start of table */
	mtctr	r20			/* load ctr with jump address */
	xor	r20, r20, r20		/* sum starts at zero */
	bctr				/* jump into table */
150:
	add	r20, r20, r0
	b	151f
	add	r20, r20, r1
	b	151f
	add	r20, r20, r2
	b	151f
	add	r20, r20, r3
	b	151f
	add	r20, r20, r4
	b	151f
	add	r20, r20, r5
	b	151f
	add	r20, r20, r6
	b	151f
	add	r20, r20, r7
	b	151f
	add	r20, r20, r8
	b	151f
	add	r20, r20, r9
	b	151f
	add	r20, r20, r10
	b	151f
	add	r20, r20, r11
	b	151f
	add	r20, r20, r12
	b	151f
	add	r20, r20, r13
	b	151f
	add	r20, r20, r14
	b	151f
	add	r20, r20, r15
	b	151f
	add	r20, r20, r16
	b	151f
	add	r20, r20, r17
	b	151f
	add	r20, r20, r18
	b	151f
	add	r20, r20, r19
	b	151f
	mtctr	r21	/* reg 20 needs special handling */
	b	154f
	mtctr	r21	/* reg 21 needs special handling */
	b	153f
	add	r20, r20, r22
	b	151f
	add	r20, r20, r23
	b	151f
	add	r20, r20, r24
	b	151f
	add	r20, r20, r25
	b	151f
	add	r20, r20, r26
	b	151f
	add	r20, r20, r27
	b	151f
	add	r20, r20, r28
	b	151f
	add	r20, r20, r29
	b	151f
	add	r20, r20, r30
	b	151f
	add	r20, r20, r31
151:
	rlwinm. r21,r21,19,24,28	/* offset into jump table for reg RA */
	beq	152f			/* if reg RA is zero, don't add it */ 
	addi	r21, r21, 150b@l	/* add start of table */
	mtctr	r21			/* load ctr with jump address */
	rlwinm	r21,r21,0,16,10		/* make sure we don't execute this more than once */
	bctr				/* jump into table */
152:
	mfdar	r21
	mtctr	r21			/* restore ctr reg from DAR */
	mtdar	r20			/* save fault EA to DAR */
	b	101b			/* Go back to normal TLB handling */

	/* special handling for r20,r21 since these are modified already */
153:	lwz	r21, 4(r0)	/* load r21 from memory */
	b	155f
154:	mfspr	r21, M_TW	/* load r20 from M_TW */
155:	add	r20, r20, r21	/* add it */
	mfctr	r21		/* restore r21 */
	b	151b
#endif
#endif
/*
 * This code finishes saving the registers to the exception frame
 * and jumps to the appropriate handler for the exception, turning
 * on address translation.
 */
	.globl	transfer_to_handler
transfer_to_handler:
	stw	r22,_NIP(r21)
	lis	r22,MSR_POW@h
	andc	r23,r23,r22
	stw	r23,_MSR(r21)
	SAVE_4GPRS(8, r21)
	SAVE_8GPRS(12, r21)
	SAVE_8GPRS(24, r21)
	andi.	r23,r23,MSR_PR
	mfspr	r23,SPRG3		/* if from user, fix up THREAD.regs */
	beq	2f
	addi	r24,r1,STACK_FRAME_OVERHEAD
	stw	r24,PT_REGS(r23)
2:	addi	r2,r23,-THREAD		/* set r2 to current */
	tovirt(r2,r2)
	mflr	r23
	andi.	r24,r23,0x3f00		/* get vector offset */
	stw	r24,TRAP(r21)
	li	r22,0
	stw	r22,RESULT(r21)
	mtspr	SPRG2,r22		/* r1 is now kernel sp */
	addi	r24,r2,TASK_STRUCT_SIZE	/* check for kernel stack overflow */
	cmplw	0,r1,r2
	cmplw	1,r1,r24
	crand	1,1,4
	bgt-	stack_ovf		/* if r2 < r1 < r2+TASK_STRUCT_SIZE */
	lwz	r24,0(r23)		/* virtual address of handler */
	lwz	r23,4(r23)		/* where to go when done */
	mtspr	SRR0,r24
	mtspr	SRR1,r20
	mtlr	r23
	SYNC
	rfi				/* jump to handler, enable MMU */

/*
 * On kernel stack overflow, load up an initial stack pointer
 * and call StackOverflow(regs), which should not return.
 */
stack_ovf:
	addi	r3,r1,STACK_FRAME_OVERHEAD
	lis	r1,init_task_union@ha
	addi	r1,r1,init_task_union@l
	addi	r1,r1,TASK_UNION_SIZE-STACK_FRAME_OVERHEAD
	lis	r24,StackOverflow@ha
	addi	r24,r24,StackOverflow@l
	li	r20,MSR_KERNEL
	mtspr	SRR0,r24
	mtspr	SRR1,r20
	SYNC
	rfi

	.globl	giveup_fpu
giveup_fpu:
	blr

/* Maybe someday.......
*/
_GLOBAL(__setup_cpu_8xx)
	blr

/*
 * This is where the main kernel code starts.
 */
start_here:

	/* ptr to current */
	lis	r2,init_task_union@h
	ori	r2,r2,init_task_union@l

	/* ptr to phys current thread */
	tophys(r4,r2)
	addi	r4,r4,THREAD	/* init task's THREAD */
	mtspr	SPRG3,r4
	li	r3,0
	mtspr	SPRG2,r3	/* 0 => r1 has kernel sp */

	/* stack */
	addi	r1,r2,TASK_UNION_SIZE
	li	r0,0
	stwu	r0,-STACK_FRAME_OVERHEAD(r1)

	bl	early_init	/* We have to do this with MMU on */

/*
 * Decide what sort of machine this is and initialize the MMU.
 */
	mr	r3,r31
	mr	r4,r30
	mr	r5,r29
	mr	r6,r28
	mr	r7,r27
	bl	machine_init
	bl	MMU_init

/*
 * Go back to running unmapped so we can load up new values
 * and change to using our exception vectors.
 * On the 8xx, all we have to do is invalidate the TLB to clear
 * the old 8M byte TLB mappings and load the page table base register.
 */
	/* The right way to do this would be to track it down through
	 * init's THREAD like the context switch code does, but this is
	 * easier......until someone changes init's static structures.
	 */
	lis	r6, swapper_pg_dir@h
	ori	r6, r6, swapper_pg_dir@l
	tophys(r6,r6)
#ifdef CONFIG_8xx_CPU6
	lis	r4, cpu6_errata_word@h
	ori	r4, r4, cpu6_errata_word@l
	li	r3, 0x3980
	stw	r3, 12(r4)
	lwz	r3, 12(r4)
#endif
	mtspr	M_TWB, r6
	lis	r4,2f@h
	ori	r4,r4,2f@l
	tophys(r4,r4)
	li	r3,MSR_KERNEL & ~(MSR_IR|MSR_DR)
	mtspr	SRR0,r4
	mtspr	SRR1,r3
	rfi
/* Load up the kernel context */
2:
	SYNC			/* Force all PTE updates to finish */
	tlbia			/* Clear all TLB entries */
	sync			/* wait for tlbia/tlbie to finish */
	TLBSYNC			/* ... on all CPUs */

#ifdef CONFIG_BDI_SWITCH
	/* Add helper information for the Abatron bdiGDB debugger.
	 * We do this here because we know the mmu is disabled, and
	 * will be enabled for real in just a few instructions.
	 */
	tovirt(r6,r6)
	lis	r5, abatron_pteptrs@h
	ori	r5, r5, abatron_pteptrs@l
	stw	r5, 0xf0(r0)	/* Must match your Abatron config file */
	tophys(r5,r5)
	stw	r6, 0(r5)
#endif

/* Now turn on the MMU for real! */
	li	r4,MSR_KERNEL
	lis	r3,start_kernel@h
	ori	r3,r3,start_kernel@l
	mtspr	SRR0,r3
	mtspr	SRR1,r4
	rfi			/* enable MMU and jump to start_kernel */

/* Set up the initial MMU state so we can do the first level of
 * kernel initialization.  This maps the first 8 MBytes of memory 1:1
 * virtual to physical.  Also, set the cache mode since that is defined
 * by TLB entries and perform any additional mapping (like of the IMMR).
 * If configured to pin some TLBs, we pin the first 8 Mbytes of kernel,
 * 24 Mbytes of data, and the 8M IMMR space.  Anything not covered by
 * these mappings is mapped by page tables.
 */
initial_mmu:
	tlbia			/* Invalidate all TLB entries */
#ifdef CONFIG_PIN_TLB
	lis	r8, MI_RSV4I@h
	ori	r8, r8, 0x1c00
#else
	li	r8, 0
#endif
	mtspr	MI_CTR, r8	/* Set instruction MMU control */

#ifdef CONFIG_PIN_TLB
	lis	r10, (MD_RSV4I | MD_RESETVAL)@h
	ori	r10, r10, 0x1c00
	mr	r8, r10
#else
	lis	r10, MD_RESETVAL@h
#endif
#ifndef CONFIG_8xx_COPYBACK
	oris	r10, r10, MD_WTDEF@h
#endif
	mtspr	MD_CTR, r10	/* Set data TLB control */

	/* Now map the lower 8 Meg into the TLBs.  For this quick hack,
	 * we can load the instruction and data TLB registers with the
	 * same values.
	 */
	lis	r8, KERNELBASE@h	/* Create vaddr for TLB */
	ori	r8, r8, MI_EVALID	/* Mark it valid */
	mtspr	MI_EPN, r8
	mtspr	MD_EPN, r8
	li	r8, MI_PS8MEG		/* Set 8M byte page */
	ori	r8, r8, MI_SVALID	/* Make it valid */
	mtspr	MI_TWC, r8
	mtspr	MD_TWC, r8
	li	r8, MI_BOOTINIT		/* Create RPN for address 0 */
	mtspr	MI_RPN, r8		/* Store TLB entry */
	mtspr	MD_RPN, r8
	lis	r8, MI_Kp@h		/* Set the protection mode */
	mtspr	MI_AP, r8
	mtspr	MD_AP, r8

	/* Map another 8 MByte at the IMMR to get the processor
	 * internal registers (among other things).
	 */
#ifdef CONFIG_PIN_TLB
	addi	r10, r10, 0x0100
	mtspr	MD_CTR, r10
#endif
	mfspr	r9, 638			/* Get current IMMR */
	andis.	r9, r9, 0xff80		/* Get 8Mbyte boundary */

	mr	r8, r9			/* Create vaddr for TLB */
	ori	r8, r8, MD_EVALID	/* Mark it valid */
	mtspr	MD_EPN, r8
	li	r8, MD_PS8MEG		/* Set 8M byte page */
	ori	r8, r8, MD_SVALID	/* Make it valid */
	mtspr	MD_TWC, r8
	mr	r8, r9			/* Create paddr for TLB */
	ori	r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */
	mtspr	MD_RPN, r8

#ifdef CONFIG_PIN_TLB
	/* Map two more 8M kernel data pages.
	*/
	addi	r10, r10, 0x0100
	mtspr	MD_CTR, r10

	lis	r8, KERNELBASE@h	/* Create vaddr for TLB */
	addis	r8, r8, 0x0080		/* Add 8M */
	ori	r8, r8, MI_EVALID	/* Mark it valid */
	mtspr	MD_EPN, r8
	li	r9, MI_PS8MEG		/* Set 8M byte page */
	ori	r9, r9, MI_SVALID	/* Make it valid */
	mtspr	MD_TWC, r9
	li	r11, MI_BOOTINIT	/* Create RPN for address 0 */
	addis	r11, r11, 0x0080	/* Add 8M */
	mtspr	MD_RPN, r11		/* r11 holds the RPN; r8 holds the EPN */

	addis	r8, r8, 0x0080		/* Add 8M */
	mtspr	MD_EPN, r8
	mtspr	MD_TWC, r9
	addis	r11, r11, 0x0080	/* Add 8M */
	mtspr	MD_RPN, r11
#endif

	/* Since the cache is enabled according to the information we
	 * just loaded into the TLB, invalidate and enable the caches here.
	 * We should probably check/set other modes....later.
	 */
	lis	r8, IDC_INVALL@h
	mtspr	IC_CST, r8
	mtspr	DC_CST, r8
	lis	r8, IDC_ENABLE@h
	mtspr	IC_CST, r8
#ifdef CONFIG_8xx_COPYBACK
	mtspr	DC_CST, r8
#else
	/* For a debug option, I left this here to easily enable
	 * the write through cache mode
	 */
	lis	r8, DC_SFWT@h
	mtspr	DC_CST, r8
	lis	r8, IDC_ENABLE@h
	mtspr	DC_CST, r8
#endif
	blr


/*
 * Set up to use a given MMU context.
 * r3 is context number, r4 is PGD pointer.
 *
 * We place the physical address of the new task's page directory
 * into the MMU base register, and set the ASID compare register with
 * the new "context."
 */
_GLOBAL(set_context)

#ifdef CONFIG_BDI_SWITCH
	/* Context switch the PTE pointer for the Abatron BDI2000.
	 * The PGDIR is passed as second argument.
	 */
	lis	r5, KERNELBASE@h
	lwz	r5, 0xf0(r5)
	stw	r4, 0x4(r5)
#endif

#ifdef CONFIG_8xx_CPU6
	lis	r6, cpu6_errata_word@h
	ori	r6, r6, cpu6_errata_word@l
	tophys	(r4, r4)
	li	r7, 0x3980
	stw	r7, 12(r6)
	lwz	r7, 12(r6)
        mtspr   M_TWB, r4               /* Update MMU base address */
	li	r7, 0x3380
	stw	r7, 12(r6)
	lwz	r7, 12(r6)
        mtspr   M_CASID, r3             /* Update context */
#else
        mtspr   M_CASID,r3		/* Update context */
	tophys	(r4, r4)
	mtspr	M_TWB, r4		/* and pgd */
#endif
	SYNC
	blr

#ifdef CONFIG_8xx_CPU6
/* It's here because it is unique to the 8xx.
 * It is important we get called with interrupts disabled.  I used to
 * do that, but it appears that all code that calls this already has
 * interrupts disabled.
 */
	.globl	set_dec_cpu6
set_dec_cpu6:
	lis	r7, cpu6_errata_word@h
	ori	r7, r7, cpu6_errata_word@l
	li	r4, 0x2c00
	stw	r4, 8(r7)
	lwz	r4, 8(r7)
        mtspr   22, r3		/* Update Decrementer */
	SYNC
	blr
#endif

/*
 * We put a few things here that have to be page-aligned.
 * This stuff goes at the beginning of the data segment,
 * which is page-aligned.
 */
	.data
	.globl	sdata
sdata:
	.globl	empty_zero_page
empty_zero_page:
	.space	4096

	.globl	swapper_pg_dir
swapper_pg_dir:
	.space	4096

/*
 * This space gets a copy of optional info passed to us by the bootstrap
 * Used to pass parameters into the kernel like root=/dev/sda1, etc.
 */
	.globl	cmd_line
cmd_line:
	.space	512

#ifdef CONFIG_BDI_SWITCH
/* Room for two PTE table pointers, usually the kernel and current user
 * pointers to their respective root page table (pgdir).
 */
abatron_pteptrs:
	.space	8
#endif

#ifdef CONFIG_8xx_CPU6
	.globl	cpu6_errata_word
cpu6_errata_word:
	.space	16
#endif

^ permalink raw reply

* e300 (MPC5121) dlmzb
From: Fortini Matteo @ 2009-09-25 11:07 UTC (permalink / raw)
  To: linux-ppc list

I was trying to insert an optimized strlen() function using the 
following code taken from the ibm site on an MPC5121, but it crashes the 
kernel.
Is it because it's an unsupported op, or because I'm missing some needed 
steps?

Thank you,
Matteo

_GLOBAL(strlen)
    addi   r4,0,8    // Load byte count of 8
    mtxer  r4        // Set byte count for load string
    xor    r4,r4,r4  // r4 = 0, r4 == accumulator
1:
    lswx   r5,r3,r4  // load string into r5 & r6
    dlmzb. r12,r5,r6 // find NULL byte and record its position in r12.
    add    r4,r4,r12 // Update accumulator.
    beq    1b        // Loop if NULL not found.
    addi   r3,r4,-1  // Subtract 1 for NULL byte.
    blr              // Return length

^ permalink raw reply

* Re: lite5200b kernel not booting
From: Jon Smirl @ 2009-09-25 12:15 UTC (permalink / raw)
  To: Asier Llano Palacios; +Cc: linuxppc-dev, Aitor Arzuaga
In-Reply-To: <1253803279.4632.91.camel@allano>

On Thu, Sep 24, 2009 at 10:41 AM, Asier Llano Palacios
<asierllano@gmail.com> wrote:
> Hi Grant,
>
> We've been working with a lite5200b for a while, we have been working
> with the ppc platform in linux 2.6.x for 5 years and it worked properly
> until 2.6.25 included. We want to switch to the powerpc platform but it
> doesn't seem to work.
>
> After the bootloader (tested with the uboot 1.2.0 and 2009.08) starts
> the cuImage.lite5200 it doesn't show anything in the console.
>
> I'd like to know if the lite5200b is still supported and which version
> is known to work with it and what is the default configuration. I want
> to test it like the developers do, until I configure it myself.
>
> I've managed to do some debugging in assembler, to know that it works
> properly until DCACHE is enabled in setup_common_caches of
> arch/powerpc/kernel/cpu_setup_6xx.S. If I skip enabling the DCACHE it
> continues properly until the MMU is enabled.
>
> I'm only debugging it writing to the serial port registers in assembler,
> so I'm not very sure if it continues properly or if I am not able to
> debug it after the DCACHE is enabled or the MMU is enabled. I want to
> debug it with a JTAG debugger, but I still don't have one (do you
> recommend me anyone?).

No one has tried the Macraigor USB wiggler on the mpc5200 and reported
back if it works.

Chart says it is supported...
http://www.macraigor.com/cpus.htm

It's a $250 device so it would be good to know if it works.
http://www.macraigor.com/usbWiggler.htm

uboot 1.2 is very old. That may be the cause of your problems. For
example old u-boots don't initialize the PCI hardware correctly on
systems that don't have PCI implemented.

In general the current powerpc kernel works fine on the mpc5200b. We
are running it on four different CPU boards but I don't have a
lite5200b.

We use the Phytec dev boards. We've never had any trouble with them.
http://www.phytec.de/de/produkte/rapid-development-kits/linux-kits/produktdetails.html?tx_ttproducts_pi1[backPID]=270&tx_ttproducts_pi1[product]=62&cHash=6fbc8dcdb2

http://www.phytec.com/products/rdk/PowerPC/phyCORE-MPC5200B-tinyRDK.html

>
> Kind regards and thank you for the help,
> Asier
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* linux-next: 20090925 - hvc driver build breaks with !HVC_CONSOLE
From: Kamalesh Babulal @ 2009-09-25 13:31 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, linux-next, LKML
In-Reply-To: <20090925133830.1ba29584.sfr@canb.auug.org.au>

Hi Stephen,

	next-20090925 randconfig build breaks on hvcs driver on powerpc,
with HVC_CONSOLE=n.

ERROR: ".hvc_put_chars" [drivers/char/hvcs.ko] undefined!
ERROR: ".hvc_get_chars" [drivers/char/hvcs.ko] undefined!

Adding a dependency on HVC_CONSOLE fixes the build.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
--
 drivers/char/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index a2a0e67..2583231 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -682,7 +682,7 @@ config VIRTIO_CONSOLE
 
 config HVCS
 	tristate "IBM Hypervisor Virtual Console Server support"
-	depends on PPC_PSERIES
+	depends on PPC_PSERIES && HVC_CONSOLE
 	help
 	  Partitionable IBM Power5 ppc64 machines allow hosting of
 	  firmware virtual consoles from one Linux partition by
			
			Kamalesh

^ permalink raw reply related

* Re: [PATCH v3 0/3] cpu: pseries: Cpu offline states framework
From: Peter Zijlstra @ 2009-09-25 14:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gautham R Shenoy, linux-kernel, Venkatesh Pallipadi,
	Arun R Bharadwaj, linuxppc-dev, Darrick J. Wong
In-Reply-To: <1253753501.7103.358.camel@pasglop>

On Thu, 2009-09-24 at 10:51 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-15 at 14:11 +0200, Peter Zijlstra wrote:
> > I still think it's a layering violation... it's the hypervisor manager
> > that should be bothered in what state an off-lined cpu is in. 
> > 
> That's not how our hypervisor works.

Then fix it?

> If you ask through the management interface, to remove a CPU from a
> partition, the HV will communicate with a daemon inside the partition
> that will then unplug the CPU via the right call.
> 
> I don't really understand your objections to be honest. And I fail to
> see why it would be a layering violation to have the ability for the OS
> to indicate in what state it wishes to relinquish a CPU to the
> hypervisor, which more or less defines what is the expected latency for
> getting it back later on.

OK, so the main objection is the abuse of CPU hotplug as resource
management feature.

CPU hotplug is terribly invasive and expensive to the kernel, doing
hotplug on a minute basis is just plain crazy.

If you want a CPU in a "keep it near and don't hand it back to the HV"
state, why not use cpusets to isolate it and simply not run tasks on it?

cpusets don't use stopmachine and are much nicer to the rest of the
kernel over-all.

^ permalink raw reply

* Re: e300 (MPC5121) dlmzb
From: Kumar Gala @ 2009-09-25 16:34 UTC (permalink / raw)
  To: Fortini Matteo; +Cc: linux-ppc list
In-Reply-To: <4ABCA454.5030807@mta.it>


On Sep 25, 2009, at 4:07 AM, Fortini Matteo wrote:

> I was trying to insert an optimized strlen() function using the  
> following code taken from the ibm site on an MPC5121, but it crashes  
> the kernel.
> Is it because it's an unsupported op, or because I'm missing some  
> needed steps?
>
> Thank you,
> Matteo
>
> _GLOBAL(strlen)
>   addi   r4,0,8    // Load byte count of 8
>   mtxer  r4        // Set byte count for load string
>   xor    r4,r4,r4  // r4 = 0, r4 == accumulator
> 1:
>   lswx   r5,r3,r4  // load string into r5 & r6
>   dlmzb. r12,r5,r6 // find NULL byte and record its position in r12.
>   add    r4,r4,r12 // Update accumulator.
>   beq    1b        // Loop if NULL not found.
>   addi   r3,r4,-1  // Subtract 1 for NULL byte.
>   blr              // Return length

dlmzb. isn't supported on e300.  I believe only 4xx supports it.

- k
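
For cores without dlmzb, the same "scan 8 bytes per step" idea can be
approximated in portable C. The following is only a minimal sketch, not
code from this thread: the names has_zero_byte() and sketch_strnlen() are
hypothetical, and the caller is assumed to know an upper bound (cap) on
the readable bytes so that no load runs past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the 8 bytes in v is 0x00, else 0 (classic bit trick). */
int has_zero_byte(uint64_t v)
{
	return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/*
 * Hypothetical helper: length of the string at s, reading at most cap bytes.
 * Returns cap if no terminating NUL is found within the first cap bytes.
 */
size_t sketch_strnlen(const char *s, size_t cap)
{
	size_t i = 0;

	assert(s != NULL);

	/* Skip whole 8-byte blocks that contain no NUL byte.
	 * memcpy() avoids alignment and aliasing problems. */
	while (cap - i >= 8) {
		uint64_t v;

		memcpy(&v, s + i, sizeof(v));
		if (has_zero_byte(v))
			break;
		i += 8;
	}

	/* Locate the exact NUL (or the end of the buffer) byte by byte. */
	while (i < cap && s[i] != '\0')
		i++;

	return i;
}
```

Unlike dlmzb, the block test here only says whether a NUL is present; the
final byte loop finds its position, which keeps the sketch independent of
endianness.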

^ permalink raw reply

* Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
From: Arun R Bharadwaj @ 2009-09-25 17:08 UTC (permalink / raw)
  To: Peter Zijlstra, Joel Schopp, Benjamin Herrenschmidt,
	Paul Mackerras, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Shaohua Li,
	Venkatesh Pallipadi, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20090922112526.GA7788@linux.vnet.ibm.com>

* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 16:55:27]:
Hi,

I have done the following experiments and have posted the results
below.


Average of 5 iterations:
------------------------------------------------------------------------------
------------------------------------------------------------------------------

Kernbench make -j16 results on a
16 core x86 machine _with_deep_sleep_ support (C1,C2,C3)


Without the patches applied			With the patches applied


	31.8s						30.4s

------------------------------------------------------------------------------
------------------------------------------------------------------------------


Kernbench make -j8 results on a
8 core x86 machine _without_deep_sleep_ support (only mwait)


Without the patches applied			With the patches applied


	20.2s						20.4s

------------------------------------------------------------------------------
------------------------------------------------------------------------------

Kernbench make -j8 results on a 8 core _dedicated_lpar_pSeries_ machine


Without the patches applied                     With the patches applied


	4m, 37s						4m, 36s

------------------------------------------------------------------------------
------------------------------------------------------------------------------
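
For reference, the relative change behind the three comparisons above can
be worked out as follows. This is a small sketch added for illustration;
speedup_pct() is a hypothetical helper, and since only averages are
reported, differences this small may be within run-to-run noise.

```c
#include <assert.h>

/*
 * Percent reduction in elapsed time when going from before_s to after_s.
 * Positive means the patched kernel was faster, negative means slower.
 */
double speedup_pct(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s * 100.0;
}
```

With the numbers reported above, speedup_pct(31.8, 30.4) is about 4.4,
speedup_pct(20.2, 20.4) is about -1.0, and speedup_pct(277.0, 276.0)
(4m37s vs. 4m36s) is about 0.4.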

Please let me know if any other kind of testing is necessary.


Based on the feedback, I will post out the next iteration with a few
minor bug fixes.


thanks,
arun

^ permalink raw reply

* Re: [patch] powerpc: build modules outside the kernel tree fails, if it was built using O=
From: Sam Ravnborg @ 2009-09-25 19:45 UTC (permalink / raw)
  To: Yuri Frolov; +Cc: rep.dot.nop, linuxppc-dev, linux-kbuild
In-Reply-To: <4ABB57CB.3000409@ru.mvista.com>

On Thu, Sep 24, 2009 at 03:28:11PM +0400, Yuri Frolov wrote:
> Hello,
> 
> here is a corresponding bug: http://bugzilla.kernel.org/show_bug.cgi?id=11143
> This patch should correctly export crtsavres.o in order to make O= option working.
> Please, consider to apply.

Hi Yuri.

I like the way you do the extra link in Makefile.modpost.
But you need to redo some parts as per comments below.

> 
> 
> Fix linking modules against crtsavres.o

Please elaborate more on what this commit does.

> 
> Previously we got
>   CC      drivers/char/hw_random/rng-core.mod.o
>   LD [M]  drivers/char/hw_random/rng-core.ko
> /there/src/buildroot.git.ppc/build_powerpc_nofpu/staging_dir/usr/bin/powerpc-linux-uclibc-ld: arch/powerpc/lib/crtsavres.o: No such file: No such file or directory

Always good to include error messages.

> 	* Makefile (LDFLAGS_MODULE_PREREQ): New variable to hold prerequisite
>           files for modules.
> 	* arch/powerpc/Makefile: add crtsavres.o to LDFLAGS_MODULE_PREREQ.
> 	* scripts/Makefile.modpost (cmd_as_o_S): Copy from Makefile.build.
> 	  (cmd_ld_ko_o): Also link LDFLAGS_MODULE_PREREQ.
> 	  Provide rule to build objects from assembler.
But this GNUism can go - we do not use it in the kernel.
 
> 
> Signed-off-by:  Bernhard Reutner-Fischer  <rep.dot.nop@gmail.com>
> Signed-off-by:  Yuri Frolov <yfrolov@ru.mvista.com>
> 
> Makefile                 |    2 ++
> arch/powerpc/Makefile    |    2 +-
> scripts/Makefile.modpost |   12 ++++++++++--
> 3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff -urpN -X linux-2.6/Documentation/dontdiff linux-2.6/arch/powerpc/Makefile linux-2.6-powerpc-crtsavres/arch/powerpc/Makefile
> --- linux-2.6/arch/powerpc/Makefile	2009-09-17 20:04:31.000000000 +0400
> +++ linux-2.6-powerpc-crtsavres/arch/powerpc/Makefile	2009-09-23 22:08:03.000000000 +0400
> @@ -93,7 +93,7 @@ else
>  	KBUILD_CFLAGS += $(call cc-option,-mtune=power4)
>  endif
>  else
> -LDFLAGS_MODULE	+= arch/powerpc/lib/crtsavres.o
> +LDFLAGS_MODULE_PREREQ += arch/powerpc/lib/crtsavres.o
>  endif

The naming sucks.
How about:

KBUILD_MODULE_LINK_SOURCE

This would tell the reader that this is source to be linked on a module.

And this is an arch specific thing so no need to preset it in top-level
Makefile.
But it is mandatory to include a description in Documentation/kbuild/kbuild.txt


> --- linux-2.6/scripts/Makefile.modpost	2009-09-17 20:04:42.000000000 +0400
> +++ linux-2.6-powerpc-crtsavres/scripts/Makefile.modpost	2009-09-23 22:15:00.000000000 +0400
> @@ -122,14 +122,22 @@ quiet_cmd_cc_o_c = CC      $@
>        cmd_cc_o_c = $(CC) $(c_flags) $(CFLAGS_MODULE)	\
>  		   -c -o $@ $<
>  
> -$(modules:.ko=.mod.o): %.mod.o: %.mod.c FORCE
> +quiet_cmd_as_o_S = AS $(quiet_modtag)  $@
> +cmd_as_o_S       = $(CC) $(a_flags) $(AFLAGS_MODULE) -c -o $@ $<

Align this so cmd_as_o_S is under each other - as we do for cmd_cc_o_c


> +
> +$(LDFLAGS_MODULE_PREREQ): %.o: %.S FORCE
> +	$(Q)mkdir -p $(dir $@)
> +	$(call if_changed_dep,as_o_S)
Good catch with the mkdir - needed for O= builds.
I think we shall wrap this in
ifdef KBUILD_MODULE_LINK_SOURCE
...
endif

So we do not have an empty rule when it is not defined.

Please fix up these things and resubmit.

Thanks,
	Sam

^ permalink raw reply

* RE: PCI HotPlug and Adding Resources after Linux Boots
From: Morrison, Tom @ 2009-09-25 20:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <1253698927.7103.312.camel@pasglop>

Ben/all,

I have had some luck reserving some memory during the autoscan
(which reserves the resources): in this hack code I reserve
an appropriate amount of resources for the bridge (by detecting
this special type of port).

The problem comes later on - when the Hotplug event comes - and it
still can't allocate the resources...

A member of this group (who is away from the office right now) had
the following comments:

>> In your case, if the only problem that you are running into is
>> that the resources cannot be assigned to the FPGA, it may be
>> sufficient to hardcode the forwarding addresses and subordinate
>> bridge number for port 6 of your 8616.  The reason you are seeing
>> that error message is because the parent bridge device for the
>> detected FPGA (port 6 of the 8616) does not have forwarded
>> resource regions that match what the FPGA is trying to claim.

I guess the question I am asking is whether my friend's statement could be
true. And, if he is correct, I am a little confused as to exactly
where/how I determine and configure these forwarding
addresses and the subordinate bridge number (I'm really a newbie at this
level of PCI configuration).

Thank you in advance to any/all who can help me figure this out!
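
The forwarding check described in the quoted comment can be sketched in
C. This is a hedged illustration, not kernel code: it assumes the
standard PCI-to-PCI bridge layout in which the 16-bit Memory Base and
Memory Limit registers (config offsets 0x20 and 0x22) hold address bits
31:20 in their upper 12 bits, and the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit non-prefetchable memory window forwarded by a bridge. */
struct bridge_window {
	uint32_t start;	/* first forwarded address */
	uint32_t end;	/* last forwarded address (inclusive) */
};

/* Decode the window from the bridge's Memory Base/Limit registers. */
struct bridge_window decode_mem_window(uint16_t base_reg, uint16_t limit_reg)
{
	struct bridge_window w;

	/* Upper 12 bits of each register are address bits 31:20. */
	w.start = (uint32_t)(base_reg & 0xfff0) << 16;
	/* The low 20 bits of the limit are implied to be all ones. */
	w.end = ((uint32_t)(limit_reg & 0xfff0) << 16) | 0x000fffff;
	return w;
}

/* By convention a window with base above limit forwards nothing. */
int window_enabled(struct bridge_window w)
{
	return w.start <= w.end;
}

/* Returns 1 if a BAR of bar_size bytes at bar_addr lies inside the window. */
int bar_fits(struct bridge_window w, uint32_t bar_addr, uint32_t bar_size)
{
	assert(bar_size != 0);

	if (!window_enabled(w))
		return 0;
	/* BAR sizes are powers of two and BARs are naturally aligned. */
	if (bar_addr & (bar_size - 1))
		return 0;
	return bar_addr >= w.start && bar_addr <= w.end &&
	       (bar_size - 1) <= w.end - bar_addr;
}
```

Under these assumptions, a bridge whose window is disabled (base above
limit) cannot forward the 0x2000000-byte region the FPGA requests, which
would be consistent with the allocation failure in the log below.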

Tom



>> -----Original Message-----

[Morrison, Tom]
>>
>> There's a few things you can do, though I don't have time just right now
>> to give you a detailed answer. I'll try again later.
>>
>> In the meantime, some of the answers could be around not using full
>> automatic resource assignment, but instead, pre-initializing the top
>> bridge with some resources that are going to be enough for the device.
>>
>> You can also try to get the bridge to re-allocate. There's various funky
>> locking issues with doing that though as long as it's during boot time,
>> it's not too much of a problem.
>>
>> There are other more or less hackish ways to do it, but I'll have to
>> give it more thought.
>>
>> I'm quite stretched at the moment so if you don't hear back from me in
>> the upcoming few days, don't hesitate to ping me again.
>>
>> Cheers,
>> Ben.
>>
>> >
>> > I am running Linux (2.6.23x (and 2.6.27.x)) on a MPC8572 based system.
>> >
>> >
>> >
>> > I have an 8616 switch that has a Port (6) connected to a FPGA that is
>> >
>> > NOT loaded at before Linux boots (note: this port is configured for
>> > HOTPLUG
>> >
>> > events - which we do get after FPGA  is loaded). We are NOT using a
>> >
>> > static device tree map (because the devices in the system are very
>> > dynamic).
>> >
>> >
>> > We use instead the pci auto scan mechanism(s) to scan/assign
>> > resources
>> >
>> > (including into the BAR registers) at bootup to all of the devices
>> > that are
>> >
>> > attached to this MPC8572...
>> >
>> >
>> >
>> > Here is the port that is attached to the device (note: there are NO
>> >
>> > resources assigned at this point this port):
>> >
>> >
>> >
>> > -------------------------------------------------------------------------------------------------
>> >
>> > 02:06.0 PCI bridge: PLX Technology, Inc.: Unknown device 8616 (rev bb)
>> > (prog-if 00 [Normal decode])
>> >
>> >         Flags: bus master, fast devsel, latency 0
>> >
>> >         Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
>> >
>> >         Capabilities: [40] Power Management version 3
>> >
>> >         Capabilities: [48] Message Signalled Interrupts: 64bit+
>> > Queue=0/2 Enable+
>> >
>> >         Capabilities: [68] #10 [0162]
>> >
>> >         Capabilities: [a4] #0d [0000]
>> >
>> >
>> >
>> > root@slave7 ~ # lspci -t
>> >
>> > -+-[01]---00.0-[02-05]--+-01.0
>> >
>> >  |                      +-04.0-[03]--
>> >
>> >  |                      +-05.0-[04]--
>> >
>> >  |                      \-06.0-[05]-
>> >
>> >
>> >
>> > -------------------------------------------------------------------------------------------------
>> >
>> >
>> >
>> > Later, after I detect there is an FPGA to load - I load it. At
>> > completion of the
>> >
>> > loading of the FPGA - the 8616  detects the FPGA - and creates a
>> > HotPlug
>> >
>> > event that the PCI Express HotPlug Driver handles:
>> >
>> > -------------------------------------------------------------------------------------------------
>> >
>> >
>> >
>> > root@slave7 ~ # pciehp: pcie_isr: intr_loc 8
>> >
>> > pciehp: pciehp:  Presence/Notify input change.
>> >
>> > pciehp: Card present on Slot(0005_0070)
>> >
>> > pciehp: Surprise Removal
>> >
>> > pciehp: hpc_get_power_status: SLOTCTRL 80 value read 8
>> >
>> > pciehp: hpc_get_attention_status: SLOTCTRL 80, value read 8
>> >
>> > pciehp: board_added: slot device, slot offset, hp slot = 0, 0 ,0
>> >
>> > pciehp: hpc_check_lnk_status: lnk_status = 2021
>> >
>> > PCI: Found 0000:05:00.0 [1172/0004] 00ff00 00
>> >
>> > PCI: Calling quirk c0012d3c for 0000:05:00.0
>> >
>> > program_fw_provided_values: Could not get hotplug parameters
>> >
>> > entering assign resources (size: 2000000)
>> >
>> > PCI: Failed to allocate mem resource #0:2000000@0 for 0000:05:00.0
>> >
>> > bus pci: add device 0000:05:00.0
>> >
>> > entering uevent
>> >
>> > pci: Trying to Match Device 0000:05:00.0 with Driver pcieport-driver
>> >
>> > pci: Trying to Match Device 0000:05:00.0 with Driver serial
>> >
>> > pci: Trying to Match Device 0000:05:00.0 with Driver pexntb
>> >
>> > pciehp: hpc_get_power_status: SLOTCTRL 80 value read 8
>> >
>> > pciehp: hpc_get_attention_status: SLOTCTRL 80, value read 8
>> >
>> >
>> >
>> > 02:06.0 PCI bridge: PLX Technology, Inc.: Unknown device 8616 (rev bb)
>> > (prog-if 00 [Normal decode])
>> >
>> >         Flags: bus master, fast devsel, latency 0
>> >
>> >         Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
>> >
>> >         Capabilities: [40] Power Management version 3
>> >
>> >         Capabilities: [48] Message Signalled Interrupts: 64bit+
>> > Queue=0/2 Enable+
>> >
>> >         Capabilities: [68] #10 [0162]
>> >
>> >         Capabilities: [a4] #0d [0000]
>> >
>> >
>> >
>> > 05:00.0 Class ff00: Altera Corporation: Unknown device 0004 (rev 01)
>> >
>> >         Subsystem: Altera Corporation: Unknown device 0004
>> >
>> >         Flags: fast devsel
>> >
>> >         Capabilities: [50] Message Signalled Interrupts: 64bit+
>> > Queue=0/5 Enable-
>> >
>> >         Capabilities: [78] Power Management version 3
>> >
>> >         Capabilities: [80] #10 [0001]
>> >
>> >
>> >
>> > root@slave7 ~ # lspci -t
>> >
>> > -+-[01]---00.0-[02-05]--+-01.0
>> >
>> >  |                      +-04.0-[03]--
>> >
>> >  |                      +-05.0-[04]--
>> >
>> >  |                      \-06.0-[05]----00.0
>> >
>> >  \-[00]---00.0
>> >
>> >
>> >
>> > -------------------------------------------------------------------------------------------------
>> >
>> >
>> >
>> > So, as you can see - the device has been read - and it requires 32M of
>> > resources, but
>> >
>> > because its parent doesn't have any resources allocated - it seemingly
>> > can't allocate and
>> >
>> > use any additional resources.
>> >
>> >
>> >
>> > How do I 'customize' and/or add resources at this point for this
>> > device (using semi-standard mechanisms)?
>> >
>> >
>> >
>> > Thanks in advance for any/all ideas...
>> >
>> >
>> >
>> >
>> >
>> > I
>> >
>> >
>> >
>> > Tom Morrison
>> > Principal Software Engineer
>> >
>> > EMPIRIX
>> > 20 Crosby Drive - Bedford, MA  01730
>> > p: 781.266.3567 f: 781.266.3670
>> > email: tmorrison@empirix.com
>> > www.empirix.com
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Linuxppc-dev mailing list
>> > Linuxppc-dev@lists.ozlabs.org
>> > https://lists.ozlabs.org/listinfo/linuxppc-dev
>>

^ permalink raw reply

* warning: allocated section `.data_nosave'  not in segment
From: Sean MacLennan @ 2009-09-25 20:54 UTC (permalink / raw)
  To: linuxppc-dev

Anybody else getting these? They just started popping up with the
recent changes for 2.6.32. 

ppc_4xxFP-ld: .tmp_vmlinux1: warning: allocated section `.data_nosave' not in segment
  KSYM    .tmp_kallsyms1.S
  AS      .tmp_kallsyms1.o
  LD      .tmp_vmlinux2
ppc_4xxFP-ld: .tmp_vmlinux2: warning: allocated section `.data_nosave' not in segment
  KSYM    .tmp_kallsyms2.S
  AS      .tmp_kallsyms2.o
  LD      vmlinux
ppc_4xxFP-ld: vmlinux: warning: allocated section `.data_nosave' not in segment

Cheers,
   Sean

^ permalink raw reply

* Re: [PATCH v3 0/3] cpu: pseries: Cpu offline states framework
From: Benjamin Herrenschmidt @ 2009-09-25 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, linux-kernel, Venkatesh Pallipadi,
	Arun R Bharadwaj, linuxppc-dev, Darrick J. Wong
In-Reply-To: <1253890120.18939.189.camel@laptop>

On Fri, 2009-09-25 at 16:48 +0200, Peter Zijlstra wrote:
> On Thu, 2009-09-24 at 10:51 +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2009-09-15 at 14:11 +0200, Peter Zijlstra wrote:
> > > I still think it's a layering violation... it's the hypervisor manager
> > > that should be bothered in what state an off-lined cpu is in. 
> > > 
> > That's not how our hypervisor works.
> 
> Then fix it?

Are you serious ? :-)

> CPU hotplug is terribly invasive and expensive to the kernel, doing
> hotplug on a minute basis is just plain crazy.
> 
> If you want a CPU in a "keep it near and don't hand it back to the HV"
> state, why not use cpusets to isolate it and simply not run tasks on it?
> 
> cpusets don't use stopmachine and are much nicer to the rest of the
> kernel over-all.

Gautham, what is the difference in terms of power saving between having
it idle for long periods of time (which could do H_CEDE and with NO_HZ,
probably wouldn't need to wake up that often) and having it unplugged in
a H_CEDE loop ?

Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Rex Feany @ 2009-09-25 21:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <1253847827.7103.504.camel@pasglop>

Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):

> 
> > I think there's more finickiness to 8xx than we thought. IE. That
> > tlbil_va might have more reasons to be there than what the comment
> > seems to advertise. Can you try to move it even higher up ? IE.
> > Unconditionally at the beginning of set_pte_filter ?
> > 
> > Also, if that doesn't help, can you try putting one in
> > set_access_flags_filter() just below ?
> 
> Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
> on 8xx, and that's damn scary. I think what mislead me initially is that
> the comment around the workaround is simply not properly describing the
> extent of the problem :-)

Oh boy, that sounds bad. Where is a good place to read about this?

> So I'm not going to make the 8xx TLB miss code sane, that's beyond what
> I'm prepared to do with it, but I suspect that this should fix it (on top
> of upstream). Let me know if that's enough or if we also need to put
> one of these in ptep_set_access_flags().
> 
> Please let me know if that works for you.

Putting the tlbil_va() in the top of set_pte_filter() doesn't work - it
hangs on boot before it even prints any messages to the console.

However, adding tlbil_va() to ptep_set_access_flags() as you suggested
makes everything happy. I need to test it some more, but it looks good
so far. Below is what I am testing now.

thanks!
/rex.


diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 5304093..aef552a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -176,18 +176,19 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
 		struct page *pg = maybe_pte_to_page(pte);
 		if (!pg)
 			return pte;
-		if (!test_bit(PG_arch_1, &pg->flags)) {
 #ifdef CONFIG_8xx
-			/* On 8xx, cache control instructions (particularly
-			 * "dcbst" from flush_dcache_icache) fault as write
-			 * operation if there is an unpopulated TLB entry
-			 * for the address in question. To workaround that,
-			 * we invalidate the TLB here, thus avoiding dcbst
-			 * misbehaviour.
-			 */
-			/* 8xx doesn't care about PID, size or ind args */
-			_tlbil_va(addr, 0, 0, 0);
+		/* On 8xx, cache control instructions (particularly
+		 * "dcbst" from flush_dcache_icache) fault as write
+		 * operation if there is an unpopulated TLB entry
+		 * for the address in question. To workaround that,
+		 * we invalidate the TLB here, thus avoiding dcbst
+		 * misbehaviour.
+		 */
+		/* 8xx doesn't care about PID, size or ind args */
+		_tlbil_va(addr, 0, 0, 0);
 #endif /* CONFIG_8xx */
+
+		if (!test_bit(PG_arch_1, &pg->flags)) {
 			flush_dcache_icache_page(pg);
 			set_bit(PG_arch_1, &pg->flags);
 		}
@@ -308,6 +309,12 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 	int changed;
 	entry = set_access_flags_filter(entry, vma, dirty);
 	changed = !pte_same(*(ptep), entry);
+
+#ifdef CONFIG_8xx
+	/* 8xx doesn't care about PID, size or ind args */
+	_tlbil_va(address, 0, 0, 0);
+#endif /* CONFIG_8xx */
+
 	if (changed) {
 		if (!(vma->vm_flags & VM_HUGETLB))
 			assert_pte_locked(vma->vm_mm, address);

^ permalink raw reply related
