LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: pacman @ 2010-10-18 19:10 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <20101018113331.GB30667@csn.ul.ie>

Mel Gorman writes:
> 
> A bit but I still don't know why it would cause corruption. Maybe this is still
> a caching issue but the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap array and reading them is causing some
> problem.

I've been doing a lot more tests and I'm sure that 6dda9d55 is not really
responsible. It just happens to provoke the bug in my particular setup.
Whatever it is, it's very sensitive to small changes.

At the end of free_all_bootmem, the free list for order 9 has 4 entries.
Which one is at the head of the list depends on whether 6dda9d55 is applied
or not. If page number 130048 is at the head of the list, it gets used fairly
soon, and everything's fine. The alternative is that page number 64512 is at
the head of the list, so it gets used fairly soon, and corruption occurs.

> 
> Andrew, what is the right thing to do here? We could flail around looking
> for explanations as to why the bug causes a user buffer corruption but never
> get an answer or do we go with this patch, preferably before 2.6.36 releases?

I've been flailing around quite a bit. Here's my latest result:

Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
loaded into memory during boot. So now after the corruption happens, I can
  cmp -l /sbin/e2fsck good-e2fsck
for a quick look at the changed bytes. Much easier than provoking a segfault
under gdb.

Then I got really creative and wrote a cmp replacement which mmaps the files
and reports the physical addresses from /proc/self/pagemap of the pages that
don't match. And the consistent result is that physical pages 64604 and 64609
(both in the range of the order=9 64512) have wrong contents. And the
corruption is always a single word 128 bytes after the start of the page.
Physical addresses 0x0fc5c080 and 0x0fc61080 are hit every time.

The values of the corrupted words, observed in 5 consecutive boots, were:
  at 0fc5c080   at 0fc61080
  -----------   -----------
  c3540000      92510000
  565c0000      23590000
  c85b0000      97580000
  d15f0000      9e5c0000
  d95b0000      a8580000

The low 16 bits are all 0 and the upper 16 bits seem randomly distributed.
But look at the differences:

  c3540000 - 92510000 = 31030000
  565c0000 - 23590000 = 33030000
  c85b0000 - 97580000 = 31030000
  d15f0000 - 9e5c0000 = 33030000
  d95b0000 - a8580000 = 31030000

This means something... but I don't know what.

In a completely different method of investigation, I went back a few stable
kernels, got 2.6.33.7 and applied 6dda9d55 to it, thinking that if 6dda9d55
only reveals a pre-existing bug, I could bisect it using 6dda9d55 as a
bug-revealing assistant. The bug appeared when running 2.6.33.7 with 6dda9d55
applied. That was discouraging.

>This patch fixes the problem by ensuring we are not reading a possibly
>invalid location of memory. It's not clear why the read causes
>corruption but one way or the other it is a buggy read.

At least that part of the explanation is wrong. Where's the buggy read?
The action taken by the 6dda9d55 version of __free_one_page looks perfectly
legitimate to me. Page numbers:

[129024       ] [130048       ]   order=10
[129024 129536] [130048 130560]   order=9

130048 is being freed. 130560 is not free. 129024 (the higher_buddy) is
already free at order=10. So 130048 is being pushed to the tail of the free
list, on the speculation that 130560 might soon be free and then the whole
thing will form an order=11 free page, the only problem being that order=11
is too high so that later merge will never happen. It's not useful, and maybe
not conceptually valid to say that 129024 is the buddy of 130048, but it is
an existing page, and the only way it wouldn't be is if the total memory size
was not a multiple of 1<<(MAX_ORDER-1) pages

-- 
Alan Curry

^ permalink raw reply

* RE: msi_bitmap.c question
From: Tirumala Marri @ 2010-10-18 18:04 UTC (permalink / raw)
  To: michael; +Cc: linuxppc-dev
In-Reply-To: <1287102354.4194.15.camel@concordia>

> >
> > I am trying to resubmit a patch for MSI support for ppc4xx devices.
> > One of the review feedback was not to use the bit map as it is only
> > for the devices which don=92t have hard wired mapping between interrupt
> > controller interrupts and MSI number. For example intr-ctrl0
> interrupt
> > 20 goes to MSI-0, interrupt 21 goes to MSI-1 ..etc. But when I
> checked
> > freescale SoCs and cell SoCs they have interrupts hard wired to MSI
> > interrupts.
> >
> >
> >
> > Why do they have to use the bitmap and create irqhost, even though
> > they are one-to-one mapped between interrupt controller numbers and
> > MSI ?
>
> I'm not quite sure I understand your question.
>
> The MSI bitmap and the irq_host are two different things.
>
> The MSI bitmap is basically an allocator for hardware numbers that can
> be used for MSI. On some interrupt controllers that might be any
> interrupt that's not used, on others there are restrictions on which
> numbers can be used for MSI, it depends. So it's possible you don't
> need to use that code, but I don't know how your hardware works.
>
> The irq_host is the struct that controls mapping hardware irq numbers
> into linux irq numbers. The cell MSI code has no restrictions on what
> the MSI value is, so it just uses the Linux irq number directly using
> irq_create_direct_mapping().
>

Mike, thanks. Could please you clarify your statement
" The cell MSI code has no restrictions on what  the MSI value is ".
If MSIs are one to one mapped to system interrupt controller
Interrupts, why do we need to create new irq_host? Isn't passing
Interrupt controllers irq_host instance not enough ?

Also when is cascade is needed?

Regards,
Marri

^ permalink raw reply

* Re: Freescale P2020 CPU Freeze over PCIe abort signal
From: Eran Liberty @ 2010-10-18 18:00 UTC (permalink / raw)
  Cc: linuxppc-dev, linux-pci
In-Reply-To: <4CBB4D80.3030007@extricom.com>

Eran Liberty wrote:
> This should probably go to the Freescale support, as it feels like a 
> hardware issue yet the end result is a very frozen Linux kernel so I 
> post here first...
>
> I have a programmable FPGA PCIe device connected to a Freescale's 
> P2020 PCIe port. As part of the bring-up tests, we are testing two 
> faulty scenarios:
> 1. The FPGA totally ignores the PCIe transaction.
> 2. The FPGA return a transaction abort.
>
> Both are plausible PCIe behavior and their should be outcome is 
> documented in the PCIe spec. The first should be terminated by the 
> transaction requestor timeout mechanism and raise an error, the second 
> should abort the transaction and raise and error.
>
> In P2020 if I do any of those the CPU is left hung over the transaction.
>
> something like:
> in_le32(addr)
>
> is turned into:
> 7c 00 04 ac     sync   7c 00 4c 2c     lwbrx   r0,0,r9
> 0c 00 00 00     twi     0,r0,0
> 4c 00 01 2c     isync
>
> assembly code, where in r9 (in this example) hold an address which is 
> physically mapped into the PCIe resource space.
>
> The CPU will hang over the load instruction.
>
> Just for the fun of it, I have wrote my own assembly function omitting 
> everything but the load instruction; still freeze.
> Replace "lwbrx" with a simple "lwz"; still freeze.
>
> It looks like the CPU snoozes till the PCIe transaction is done with 
> no timeouts, ignoring any abort signal.
>
> I am going to:
> A. Try to reach the Freescale support.
> B. Asked the FPGA designed to give me a new behavior that will stall 
> the PCIe transaction replay for 10 sec, but after those return ok.
> C. report back here with either A or B.
>
> If you have any ideas I would love to hear them.
>
> -- Liberty
>
Some more info:

As said the the FPGA designer provided me a PCIe device that will stall 
its response to a variable amount of time. The CPU became un-frozen 
after this amount of time. More over, we have found that in that period 
till it un-froze the PCIe core did a retry to that transaction over and 
over every 40 ms. This gave me the bright idea to look for the word 
"retry" in the Freescale documentation which rewarded me with these 
registers:

------------------------------------------------------- snip 
-------------------------------------------------------
16.3.2.3        PCI Express Outbound Completion Timeout Register
                (PEX_OTB_CPL_TOR)
The PCI Express outbound completion timeout register, shown in Figure 
16-4, contains the maximum wait
time for a response to come back as a result of an outbound non-posted 
request before a timeout condition
occurs.
 Offset 
0x00C                                                                                                
Access: Read/Write
         0   1              5     7   
8                                                                                      
31
     R
        TD            
—                                                            TC
     W
 Reset 0     0  0  0   0   0   0  0   0   0   0   1    0  0   0  0    
1   1  1    1   1  1   1   1   1  1   1   1  1  1   1  1
            Figure 16-4. PCI Express Outbound Completion Timeout 
Register (PEX_OTB_CPL_TOR)
Table 16-6 describes the PCI Express outbound completion timeout 
register fields.
                                 Table 16-6. PEX_OTB_CPL_TOR Field 
Descriptions
  Bits     Name                                                     
Description
   0        TD     Timeout disable. This bit controls the 
enabling/disabling of the timeout function.
                   0 Enable completion timeout
                   1 Disable completion timeout
  1–7        —     Reserved
 8–31       TC     Timeout counter. This is the value that is used to 
load the response counter of the completion timeout.
                   One TC unit is 8× the PCI Express controller clock 
period; that is, one TC unit is 20 ns at 400 MHz, and 30
                   ns at 266.66 MHz.
                   The following are examples of timeout periods based 
on different TC settings:
                   0x00_0000 Reserved
                   0x10_FFFF 22.28 ms at 400 MHz controller clock; 33.34 
ms at 266.66 MHz controller clock
                   0xFF_FFFF 335.54 ms at 400 MHz controller clock; 
503.31 ms at 266.66 MHz controller clock


16.3.2.4       PCI Express Configuration Retry Timeout Register
               (PEX_CONF_RTY_TOR)
The PCI Express configuration retry timeout register, shown in Figure 
16-5, contains the maximum time
period during which retries of configuration transactions which resulted 
in a CRS response occur.
 Offset 
0x010                                                                               
Access: Read/Write
         0  1     3   
4                                                                                     
31
     R
        RD     —                                                 TC
     W
 Reset 0    0  0  0  0   1  0  0  0  0   0  0   0  0  0  0   1  1  1  1  
1  1   1   1 1 1   1  1   1 1   1  1
           Figure 16-5. PCI Express Configuration Retry Timeout Register 
(PEX_CONF_RTY_TOR)
                            QorIQ P2020 Integrated Processor Reference 
Manual, Rev. 0
16-12                                                                                   
Freescale Semiconductor
                                                                                                 
PCI Express Interface Controller
Table 16-7 describes the PCI Express configuration retry timeout 
register fields.
                            Table 16-7. PEX_CONF_RTY_TOR Field Descriptions
  Bits  Name                                                     Description
   0     RD    Retry disable. This bit disables the retry of a 
configuration transaction that receives a CRS status response
               packet.
               0 Enable retry of a configuration transaction in response 
to receiving a CRS status response until the timeout
                  counter (defined by the PEX_CONF_RTY_TOR[TC] field) 
has expired.
               1 Disable retry of a configuration transaction regardless 
of receiving a CRS status response.
  1–3     —    Reserved
 4–31    TC    Timeout counter. This is the value that is used to load 
the CRS response counter.
               One TC unit is 8× the PCI Express controller clock 
period; that is, one TC unit is 20 ns at 400 MHz and 30 ns
               at 266.66 MHz.
               Timeout period based on different TC settings:
               0x000_0000        Reserved
               0x400_FFFF        1.34 s at 400 MHz controller clock, 
2.02 s at 266.66 MHz controller clock
               0xFFF_FFFF        5.37 s at 400 MHz controller clock, 
8.05 s at 266.66 MHz controller clock
------------------------------------------------------- snap 
-------------------------------------------------------

Now this is all nice on the paper, but what the P2020 seems to be doing 
in reality is
1. never expire
2. do re-tries even in the non configuration access

I am going to try to disable completion timeout and see if I get better 
behavior.

-- Liberty

^ permalink raw reply

* [PATCH 1/7] microblaze: pci-common cleanup
From: Nishanth Aravamudan @ 2010-10-18 17:26 UTC (permalink / raw)
  To: nacc
  Cc: Michal Simek, linux-kernel, miltonm, Jesse Barnes,
	microblaze-uclinux, linuxppc-dev, Bjorn Helgaas
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

Use set_dma_ops and remove now used-once oddly named temp pointer sd.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/microblaze/pci/pci-common.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 55ef532..9631e1d 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -1075,8 +1075,6 @@ void __devinit pcibios_setup_bus_devices(struct pci_bus *bus)
 		 bus->number, bus->self ? pci_name(bus->self) : "PHB");
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		struct dev_archdata *sd = &dev->dev.archdata;
-
 		/* Setup OF node pointer in archdata */
 		dev->dev.of_node = pci_device_to_OF_node(dev);
 
@@ -1086,8 +1084,8 @@ void __devinit pcibios_setup_bus_devices(struct pci_bus *bus)
 		set_dev_node(&dev->dev, pcibus_to_node(dev->bus));
 
 		/* Hook up default DMA ops */
-		sd->dma_ops = pci_dma_ops;
-		sd->dma_data = (void *)PCI_DRAM_OFFSET;
+		set_dma_ops(&dev->dev, pci_dma_ops);
+		dev->dev.archdata.dma_data = (void *)PCI_DRAM_OFFSET;
 
 		/* Read default IRQs and fixup if necessary */
 		pci_read_irq_line(dev);
-- 
1.7.1

^ permalink raw reply related

* [PATCH 7/7] ppc64 iommu: use coherent_dma_mask for alloc_coherent
From: Nishanth Aravamudan @ 2010-10-18 17:27 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Brian King, linuxppc-dev
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

The IOMMU code has been passing the dma-mask instead of the
coherent_dma_mask to the iommu allocator.  Coherent allocations should
be made using the coherent_dma_mask.

Also update the vio code to ensure the coherent_dma_mask is set. Without
this change drivers, such as ibmvscsi, fail to load with the corrected
dma_iommu_alloc_coherent().

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
We currently don't check the mask other than to warn when its being set,
so I don't think this is stable material.
---
 arch/powerpc/kernel/dma-iommu.c |    2 +-
 arch/powerpc/kernel/vio.c       |    4 ++++
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 6e54a0f..e755415 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -19,7 +19,7 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
 				      dma_addr_t *dma_handle, gfp_t flag)
 {
 	return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
-				    dma_handle, device_to_mask(dev), flag,
+				    dma_handle, dev->coherent_dma_mask, flag,
 				    dev_to_node(dev));
 }
 
diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index 3c3083f..f15b3df 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1259,6 +1259,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 	viodev->dev.parent = &vio_bus_device.dev;
 	viodev->dev.bus = &vio_bus_type;
 	viodev->dev.release = vio_dev_release;
+        /* needed to ensure proper operation of coherent allocations
+         * later, in case driver doesn't set it explicitly */
+        dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
+        dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
 
 	/* register with generic device framework */
 	if (device_register(&viodev->dev)) {
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/7] ppc/vio: use dma ops helpers
From: Nishanth Aravamudan @ 2010-10-18 17:26 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Brian King, linuxppc-dev
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

Use the set_dma_ops helper. Instead of modifying vio_dma_mapping_ops,
just create a trivial wrapper for dma_supported.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/kernel/vio.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index d692989..3c3083f 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -602,6 +602,11 @@ static void vio_dma_iommu_unmap_sg(struct device *dev,
 	vio_cmo_dealloc(viodev, alloc_size);
 }
 
+static int vio_dma_iommu_dma_supported(struct device *dev, u64 mask)
+{
+        return dma_iommu_ops.dma_supported(dev, mask);
+}
+
 struct dma_map_ops vio_dma_mapping_ops = {
 	.alloc_coherent = vio_dma_iommu_alloc_coherent,
 	.free_coherent  = vio_dma_iommu_free_coherent,
@@ -609,6 +614,7 @@ struct dma_map_ops vio_dma_mapping_ops = {
 	.unmap_sg       = vio_dma_iommu_unmap_sg,
 	.map_page       = vio_dma_iommu_map_page,
 	.unmap_page     = vio_dma_iommu_unmap_page,
+	.dma_supported  = vio_dma_iommu_dma_supported,
 
 };
 
@@ -860,8 +866,7 @@ static void vio_cmo_bus_remove(struct vio_dev *viodev)
 
 static void vio_cmo_set_dma_ops(struct vio_dev *viodev)
 {
-	vio_dma_mapping_ops.dma_supported = dma_iommu_ops.dma_supported;
-	viodev->dev.archdata.dma_ops = &vio_dma_mapping_ops;
+	set_dma_ops(&viodev->dev, &vio_dma_mapping_ops);
 }
 
 /**
@@ -1246,7 +1251,7 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 	if (firmware_has_feature(FW_FEATURE_CMO))
 		vio_cmo_set_dma_ops(viodev);
 	else
-		viodev->dev.archdata.dma_ops = &dma_iommu_ops;
+		set_dma_ops(&viodev->dev, &dma_iommu_ops);
 	set_iommu_table_base(&viodev->dev, vio_build_iommu_table(viodev));
 	set_dev_node(&viodev->dev, of_node_to_nid(of_node));
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 5/7] ppc/dart: iommu table cleanup
From: Nishanth Aravamudan @ 2010-10-18 17:27 UTC (permalink / raw)
  To: nacc
  Cc: linux-kernel, miltonm, André Goddard Rosa, Paul Mackerras,
	Tejun Heo, linuxppc-dev
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

No need to set the device tree device_node pci node iommu pointer, its
only used for dlpar remove.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/sysdev/dart_iommu.c |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 17cf15e..8e9e06a 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -312,17 +312,10 @@ static void pci_dma_dev_setup_dart(struct pci_dev *dev)
 
 static void pci_dma_bus_setup_dart(struct pci_bus *bus)
 {
-	struct device_node *dn;
-
 	if (!iommu_table_dart_inited) {
 		iommu_table_dart_inited = 1;
 		iommu_table_dart_setup();
 	}
-
-	dn = pci_bus_to_OF_node(bus);
-
-	if (dn)
-		PCI_DN(dn)->iommu_table = &iommu_table_dart;
 }
 
 static bool dart_device_on_pcie(struct device *dev)
@@ -373,7 +366,7 @@ void __init iommu_init_early_dart(void)
 	if (dn == NULL) {
 		dn = of_find_compatible_node(NULL, "dart", "u4-dart");
 		if (dn == NULL)
-			goto bail;
+			return;	/* use default direct_dma_ops */
 		dart_is_u4 = 1;
 	}
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/7] ppc/pasemi: clean up pasemi iommu table initializations
From: Nishanth Aravamudan @ 2010-10-18 17:27 UTC (permalink / raw)
  To: nacc
  Cc: linuxppc-dev, miltonm, linux-kernel, Paul Mackerras,
	Olof Johansson, Yinghai Lu
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

No need for empty helpers with iommu off, the ppc_md hooks are optional.

The direct_dma_ops are the default pci_dma_ops, so no need to set in the
them iommu off case.

No need to set the device tree device_node pci node iommu pointer, its
only used for dlpar remove.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
---
 arch/powerpc/platforms/pasemi/iommu.c |   19 +------------------
 1 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index 1f9fb2c..14943ef 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -156,20 +156,12 @@ static void iommu_table_iobmap_setup(void)
 
 static void pci_dma_bus_setup_pasemi(struct pci_bus *bus)
 {
-	struct device_node *dn;
-
 	pr_debug("pci_dma_bus_setup, bus %p, bus->self %p\n", bus, bus->self);
 
 	if (!iommu_table_iobmap_inited) {
 		iommu_table_iobmap_inited = 1;
 		iommu_table_iobmap_setup();
 	}
-
-	dn = pci_bus_to_OF_node(bus);
-
-	if (dn)
-		PCI_DN(dn)->iommu_table = &iommu_table_iobmap;
-
 }
 
 
@@ -192,9 +184,6 @@ static void pci_dma_dev_setup_pasemi(struct pci_dev *dev)
 	set_iommu_table_base(&dev->dev, &iommu_table_iobmap);
 }
 
-static void pci_dma_bus_setup_null(struct pci_bus *b) { }
-static void pci_dma_dev_setup_null(struct pci_dev *d) { }
-
 int __init iob_init(struct device_node *dn)
 {
 	unsigned long tmp;
@@ -251,14 +240,8 @@ void __init iommu_init_early_pasemi(void)
 	iommu_off = of_chosen &&
 			of_get_property(of_chosen, "linux,iommu-off", NULL);
 #endif
-	if (iommu_off) {
-		/* Direct I/O, IOMMU off */
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_null;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_null;
-		set_pci_dma_ops(&dma_direct_ops);
-
+	if (iommu_off)
 		return;
-	}
 
 	iob_init(NULL);
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 6/7] ppc/pseries: iommu cleanup
From: Nishanth Aravamudan @ 2010-10-18 17:27 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

No need to initialize per-cpu pointer to NULL, it is the default.

Direct dma ops and no setup are the defaults, no need to set for
iommu-off.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
---
 arch/powerpc/platforms/pseries/iommu.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index a77bcae..9184db3 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -140,7 +140,7 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	return ret;
 }
 
-static DEFINE_PER_CPU(u64 *, tce_page) = NULL;
+static DEFINE_PER_CPU(u64 *, tce_page);
 
 static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 				     long npages, unsigned long uaddr,
@@ -589,13 +589,8 @@ static struct notifier_block iommu_reconfig_nb = {
 /* These are called very early. */
 void iommu_init_early_pSeries(void)
 {
-	if (of_chosen && of_get_property(of_chosen, "linux,iommu-off", NULL)) {
-		/* Direct I/O, IOMMU off */
-		ppc_md.pci_dma_dev_setup = NULL;
-		ppc_md.pci_dma_bus_setup = NULL;
-		set_pci_dma_ops(&dma_direct_ops);
+	if (of_chosen && of_get_property(of_chosen, "linux,iommu-off", NULL))
 		return;
-	}
 
 	if (firmware_has_feature(FW_FEATURE_LPAR)) {
 		if (firmware_has_feature(FW_FEATURE_MULTITCE)) {
-- 
1.7.1

^ permalink raw reply related

* [PATCH 4/7] ppc/cell: beat dma ops cleanup
From: Nishanth Aravamudan @ 2010-10-18 17:27 UTC (permalink / raw)
  To: nacc
  Cc: cbe-oss-dev, Arnd Bergmann, linux-kernel, miltonm, Paul Mackerras,
	linuxppc-dev, David S. Miller
In-Reply-To: <1287422825-14999-1-git-send-email-nacc@us.ibm.com>

direct_dma_ops is the default pci dma ops.

No need to call a function to get the pci dma ops, we know they are the
dma_direct_ops.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/platforms/cell/beat_iommu.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/cell/beat_iommu.c b/arch/powerpc/platforms/cell/beat_iommu.c
index beec405..3ce6855 100644
--- a/arch/powerpc/platforms/cell/beat_iommu.c
+++ b/arch/powerpc/platforms/cell/beat_iommu.c
@@ -76,7 +76,7 @@ static void __init celleb_init_direct_mapping(void)
 
 static void celleb_dma_dev_setup(struct device *dev)
 {
-	dev->archdata.dma_ops = get_pci_dma_ops();
+	set_dma_ops(dev, &dma_direct_ops);
 	set_dma_offset(dev, celleb_dma_direct_offset);
 }
 
@@ -106,7 +106,6 @@ static struct notifier_block celleb_of_bus_notifier = {
 static int __init celleb_init_iommu(void)
 {
 	celleb_init_direct_mapping();
-	set_pci_dma_ops(&dma_direct_ops);
 	ppc_md.pci_dma_dev_setup = celleb_pci_dma_dev_setup;
 	bus_register_notifier(&platform_bus_type, &celleb_of_bus_notifier);
 
-- 
1.7.1

^ permalink raw reply related

* Re: CONFIG_FEC is not good for mpc8xx ethernet?
From: Scott Wood @ 2010-10-18 16:42 UTC (permalink / raw)
  To: tiejun.chen; +Cc: ppcdev
In-Reply-To: <4CBC080A.7030005@windriver.com>

On Mon, 18 Oct 2010 16:40:42 +0800
"tiejun.chen" <tiejun.chen@windriver.com> wrote:

> Shawn Jin wrote:
> > Hi,
> > 
> > My target is a mpc875 based board and has FEC ethernet. The phy is
> > AM79C874. I have the following configuration for the network support.
> > 
> > CONFIG_PHYLIB=y
> > CONFIG_NET_ETHERNET=y
> > CONFIG_MII=y
> > CONFIG_FS_ENET=y
> > CONFIG_FS_ENET_HAS_FEC=y
> > CONFIG_FS_ENET_MDIO_FEC=y
> > 
> > However I found that the phy support (AM79C874) is actually in
> > drivers/net/fec.c which is compiled only when CONFIG_FEC=y. However
> 
> The phy driver should not be embedded into the NIC driver in theory.

Right, those are handled by drivers/net/phy/.

> I think you should include the phy driver, mdio-bitbang.c, which should be
> support AMD79C874.

On MPC8xx you want drivers/net/fs_enet/mii-fec.c.  This is just the
MDIO driver; it doesn't handle any particular PHY.  I don't know if
there is a driver specifically for AM79C874, though the generic PHY
support may be good enough.

-Scott

^ permalink raw reply

* Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
From: Scott Wood @ 2010-10-18 16:06 UTC (permalink / raw)
  To: tiejun.chen
  Cc: B07421, dedekind1, dwmw2, B25806, linuxppc-dev, linux-mtd, akpm,
	B11780
In-Reply-To: <4CBC0B95.9010300@windriver.com>

On Mon, 18 Oct 2010 16:55:49 +0800
"tiejun.chen" <tiejun.chen@windriver.com> wrote:

> Looks you always iounmap(fsl_lbc_ctrl_dev->regs) on position 'err' but here
> of_iomap() is already failed you should skip iounmap() fsl_lbc_ctrl_dev->regs
> again. So you should improve that as the following on 'err', or layout 'err' in
> gain.
> ------
> 	if(fsl_lbc_ctrl_dev->regs)
> 		iounmap(fsl_lbc_ctrl_dev->regs);
> 

It looks like iounmap(NULL) is a no-op, just like kfree(NULL).

-Scott

^ permalink raw reply

* RE: Parsing DTS file
From: Ivo Schenk @ 2010-10-18 15:22 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <201010181413.58302.dargaud@lpsc.in2p3.fr>

>Hello all,
>I'm sure there's a standard way of doing this: I'd like to access some =
info=20
>from the DTS file from a usermode program (things like address of =
device,=20
>etc...)
>What is the best way to do this ? Write a parser ? Or query the running =
kernel=20
>somehow ?

Compile your kernel with CONFIG_PROC_DEVICETREE, then you can access all =
entries via filesystem:

[root@t:~] # ll /proc/device-tree/

-r--r--r--  1 root root  4 Oct 18 17:17 #address-cells
-r--r--r--  1 root root  4 Oct 18 17:17 #size-cells
dr-xr-xr-x  2 root root  0 Oct 18 17:17 chosen
-r--r--r--  1 root root 13 Oct 18 17:17 compatible
dr-xr-xr-x  3 root root  0 Oct 18 17:17 cpus
-r--r--r--  1 root root 13 Oct 18 17:17 dtb-revision
-r--r--r--  1 root root  4 Oct 18 17:17 interrupt-parent
dr-xr-xr-x  8 root root  0 Oct 18 17:17 localbus
dr-xr-xr-x  2 root root  0 Oct 18 17:17 memory
-r--r--r--  1 root root  5 Oct 18 17:17 model
-r--r--r--  1 root root  1 Oct 18 17:17 name
dr-xr-xr-x  2 root root  0 Oct 18 17:17 pci@f0000d00
-r--r--r--  1 root root  9 Oct 18 17:17 serialno
dr-xr-xr-x 32 root root  0 Oct 18 17:17 soc5200@f0000000

--=20
Ivo Schenk
http://www.delphin.de/

Delphin Technology AG
Sitz der Gesellschaft: Bergisch Gladbach
Registergericht: Amtsgericht Koeln, HRB 47544
Vorstand: Ursula Renner (Vorsitzende), Frank Ringsdorf
Vorsitzender des Aufsichtsrats: Peter Renner

^ permalink raw reply

* Re: BUG: dead loop in PowerPC hcall tracepoint (Was: [LTP] [PATCH v2] Add ftrace-stress-test to LTP)
From: Steven Rostedt @ 2010-10-18 14:25 UTC (permalink / raw)
  To: Li Zefan
  Cc: ltp-list, Peter Zijlstra, linuxppc-dev, LKML, Paul Mackerras,
	Anton Blanchard, Ingo Molnar, subrata
In-Reply-To: <4CBBBCB0.8070406@cn.fujitsu.com>

On Mon, 2010-10-18 at 11:19 +0800, Li Zefan wrote:

> This is a dead loop:
> 
> trace_hcall_entry() -> trace_clock_global() -> trace_hcall_entry() ..
> 
> And this is a PPC specific bug. Hope some ppc guys will fix it?
> Or we kill trace_clock_global() if no one actually uses it..

trace_clock_global() is used by many. I use it (and recommend using it)
on boxes where the TSC is horribly out of sync, and the trace needs
synchronization between CPUs.

The trace_hcall_entry and exit has wrappers already. Just add recursion
protection there.

Perhaps something like this:

(Not compiled nor ran)

+static DEFINE_PER_CPU(hcall_trace_disable);
+
 void hcall_tracepoint_regfunc(void)
 {
 	hcall_tracepoint_refcount++;
 }

 void hcall_tracepoint_unregfunc(void)
 {
 	hcall_tracepoint_refcount--;
 }

+int __trace_disable_check(void)
+{
+	if (!hcall_tracepoint_refcount)
+		return 1;
+
+	if (get_cpu_var(hcall_trace_disable)) {
+		put_cpu_var(hcall_trace_disable);
+		return 1;
+	}
+
+	__get_cpu_var(hcall_trace_disable)++;
+
+	return 0;
+}
+
+void __trace_disable_put(void)
+{
+	__get_cpu_var(hcall_trace_disable)--;
+	put_cpu_var(hcall_trace_disable);
+}
+
 void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
 {
+	int trace_disable;
+
+	if (__trace_disable_check())
+		return;
+
 	trace_hcall_entry(opcode, args);
+	__trace_disable_put();
 }

 void __trace_hcall_exit(long opcode, unsigned long retval,
 			unsigned long *retbuf)
 {
+	if (__trace_disable_check())
+		return;
+
 	trace_hcall_exit(opcode, retval, retbuf);
+	__trace_disable_put();
 }

-- Steve

^ permalink raw reply

* Re: Parsing DTS file
From: David Gibson @ 2010-10-18 14:14 UTC (permalink / raw)
  To: Guillaume Dargaud; +Cc: linuxppc-dev
In-Reply-To: <201010181413.58302.dargaud@lpsc.in2p3.fr>

On Mon, Oct 18, 2010 at 02:13:58PM +0200, Guillaume Dargaud wrote:
> Hello all,
> I'm sure there's a standard way of doing this: I'd like to access some info 
> from the DTS file from a usermode program (things like address of device, 
> etc...)
> What is the best way to do this ? Write a parser ? Or query the running kernel 
> somehow ?

Um.. this question doesn't seem to make a lot of sense.  What's the
context in which you're trying to do this.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* [PATCH -mm 2/2] RapidIO: Fix destructive port EM initialization for Tsi568
From: Alexandre Bounine @ 2010-10-18 14:03 UTC (permalink / raw)
  To: akpm, linux-kernel, linuxppc-dev; +Cc: Alexandre Bounine, Thomas Moll
In-Reply-To: <1287410600-24783-1-git-send-email-alexandre.bounine@idt.com>

Replaces possibly damaging broadcast writes into the per-port SP_MODE
registers with individual writes for each port. This will preserve
individual port configurations in case if ports are configured differently.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
---
 drivers/rapidio/switches/tsi568.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/rapidio/switches/tsi568.c b/drivers/rapidio/switches/tsi568.c
index f7fd789..b9a389b 100644
--- a/drivers/rapidio/switches/tsi568.c
+++ b/drivers/rapidio/switches/tsi568.c
@@ -29,7 +29,7 @@
 #define SPP_ROUTE_CFG_DESTID(n)	(0x11070 + 0x100*n)
 #define SPP_ROUTE_CFG_PORT(n)	(0x11074 + 0x100*n)
 
-#define TSI568_SP_MODE_BC	0x10004
+#define TSI568_SP_MODE(n)	(0x11004 + 0x100*n)
 #define  TSI568_SP_MODE_PW_DIS	0x08000000
 
 static int
@@ -117,14 +117,19 @@ tsi568_em_init(struct rio_dev *rdev)
 	u16 destid = rdev->rswitch->destid;
 	u8 hopcount = rdev->rswitch->hopcount;
 	u32 regval;
+	int portnum;
 
 	pr_debug("TSI568 %s [%d:%d]\n", __func__, destid, hopcount);
 
 	/* Make sure that Port-Writes are disabled (for all ports) */
-	rio_mport_read_config_32(mport, destid, hopcount,
-			TSI568_SP_MODE_BC, &regval);
-	rio_mport_write_config_32(mport, destid, hopcount,
-			TSI568_SP_MODE_BC, regval | TSI568_SP_MODE_PW_DIS);
+	for (portnum = 0;
+	     portnum < RIO_GET_TOTAL_PORTS(rdev->swpinfo); portnum++) {
+		rio_mport_read_config_32(mport, destid, hopcount,
+				TSI568_SP_MODE(portnum), &regval);
+		rio_mport_write_config_32(mport, destid, hopcount,
+				TSI568_SP_MODE(portnum),
+				regval | TSI568_SP_MODE_PW_DIS);
+	}
 
 	return 0;
 }
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH -mm 1/2] RapidIO: Fix maximum port number in tsi57x EM initialization
From: Alexandre Bounine @ 2010-10-18 14:03 UTC (permalink / raw)
  To: akpm, linux-kernel, linuxppc-dev; +Cc: Alexandre Bounine, Thomas Moll

Replace hardcoded maximum port number with actual reported number of
switch ports.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
---
 drivers/rapidio/switches/tsi57x.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/rapidio/switches/tsi57x.c b/drivers/rapidio/switches/tsi57x.c
index d9e9492..2003fb6 100644
--- a/drivers/rapidio/switches/tsi57x.c
+++ b/drivers/rapidio/switches/tsi57x.c
@@ -166,7 +166,8 @@ tsi57x_em_init(struct rio_dev *rdev)
 
 	pr_debug("TSI578 %s [%d:%d]\n", __func__, destid, hopcount);
 
-	for (portnum = 0; portnum < 16; portnum++) {
+	for (portnum = 0;
+	     portnum < RIO_GET_TOTAL_PORTS(rdev->swpinfo); portnum++) {
 		/* Make sure that Port-Writes are enabled (for all ports) */
 		rio_mport_read_config_32(mport, destid, hopcount,
 				TSI578_SP_MODE(portnum), &regval);
-- 
1.7.3.1

^ permalink raw reply related

* Parsing DTS file
From: Guillaume Dargaud @ 2010-10-18 12:13 UTC (permalink / raw)
  To: linuxppc-dev

Hello all,
I'm sure there's a standard way of doing this: I'd like to access some info 
from the DTS file from a usermode program (things like address of device, 
etc...)
What is the best way to do this ? Write a parser ? Or query the running kernel 
somehow ?
-- 
Guillaume Dargaud
http://www.gdargaud.net/

^ permalink raw reply

* Re: Freescale P2020 CPU Freeze over PCIe abort signal
From: Eran Liberty @ 2010-10-18 11:44 UTC (permalink / raw)
  To: tiejun.chen; +Cc: linuxppc-dev, linux-pci
In-Reply-To: <4CBC18D8.5060804@windriver.com>

tiejun.chen wrote:
> AFAIK we can set one bit on PEX_ERR_DISR to detect PCI Express completion
> time-out. If so one interrupt should be issued. But I'm not sure if this can fix
> your issue.
>
> Tiejun
>   
As I understand the problem, this will not help me as the CPU itself is 
on hold waiting for the assembly line to complete. It may give me enough 
to spot the faulty situation and crash the system... but I am aiming for 
an Enterprise grade product.  Crash/oops is not something I want to put 
into the system if I do not absolutely have to and I do have a hardware 
watch-dog that will pull me out if I do.

So just for the fun of it I am going to follow up on this PEX_ERR_DISR, 
but I do not see how it will help.

-- Liberty

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Mel Gorman @ 2010-10-18 11:33 UTC (permalink / raw)
  To: pacman; +Cc: linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <20101013175205.21187.qmail@kosh.dhis.org>

On Wed, Oct 13, 2010 at 12:52:05PM -0500, pacman@kosh.dhis.org wrote:
> Mel Gorman writes:
> > 
> > On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> > > 
> > > It's corruption of user memory, which is unusual.  I'd be wondering if
> > > there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> > > previously the corruption was hitting something harmless.  Something
> > > like a missed CPU cache writeback or invalidate operation.
> > > 
> > 
> > This seems somewhat plausible although it's hard to tell for sure. But
> > lets say we had the following situation in memory
> > 
> > [<----MAX_ORDER_NR_PAGES---->][<----MAX_ORDER_NR_PAGES---->]
> > INITRD                        memmap array
> 
> I don't use initrd, so this isn't exactly what happened here. But it could be
> close. Let me throw out some more information and see if it triggers any
> ideas.
> 

Ok.

> First, I tried a new test after seeing the corruption happen:
> # md5sum /sbin/e2fsck ; echo 1 > /proc/sys/vm/drop_caches ; md5sum /sbin/e2fsck
> And got 2 different answers. The second answer was the correct one.
> 
> Since applying the suggested patch which changed MAX_ORDER-1 to MAX_ORDER-2,
> I've been trying to isolate exactly when the corruption happens. Since I
> don't know much about kernel code, my main method is stuffing the area full
> of printk's.
> 
> First I duplicated the affected function __free_one_page, since it's inlined
> at 2 different places, so I could apply the patch to just one of them. This
> proved that the problem is happening when called from free_one_page.
> 
> The patch which fixes (or at least covers up) the bug will only matter when
> order==MAX_ORDER-2, otherwise everything is the same. So I added a lot of
> printk's to show what's happening when order==MAX_ORDER-2. I found that, very
> repeatably, 126 such instances occur during boot, and 61 of them pass the
> page_is_buddy(higher_page, higher_buddy, order + 1) test, causing them to
> call list_add_tail.
> 
> Next, since the bug appears when this code decides to call list_add_tail,
> I made my own wrapper for list_add_tail, which allowed me to force some of
> the calls to do list_add instead. Eventually I found that of the 61 calls,
> the last one makes the difference. Allowing the first 60 calls to go through
> to list_add_tail, and switching the last one to list_add, the symptom goes
> away.
> 
> dump_stack() for that last call gave me a backtrace like this:
> [c0303e80] [c0008124] show_stack+0x4c/0x144 (unreliable)
> [c0303ec0] [c0068a84] free_one_page+0x28c/0x5b0
> [c0303f20] [c0069588] __free_pages_ok+0xf8/0x120
> [c0303f40] [c02d28c8] free_all_bootmem_core+0xf0/0x1f8
> [c0303f70] [c02d29fc] free_all_bootmem+0x2c/0x6c
> [c0303f90] [c02cc7dc] mem_init+0x70/0x2ac
> [c0303fc0] [c02c66a4] start_kernel+0x150/0x27c
> [c0303ff0] [00003438] 0x3438
> 
> And this might be interesting: the PFN of the page being added in that
> critical 61st call is 130048, which exactly matches the number of available
> pages:
> 
>   free_area_init_node: node 0, pgdat c02fee6c, node_mem_map c0330000
>     DMA zone: 1024 pages used for memmap
>     DMA zone: 0 pages reserved
>     DMA zone: 130048 pages, LIFO batch:31
>   Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
> 
> Suspicious?
> 

A bit but I still don't know why it would cause corruption. Maybe this is still
a caching issue but the difference in timing between list_add and list_add_tail
is enough to hide the bug. It's also possible there are some registers
ioremapped after the memmap array and reading them is causing some
problem.

Andrew, what is the right thing to do here? We could flail around looking
for explanations as to why the bug causes a user buffer corruption but never
get an answer or do we go with this patch, preferably before 2.6.36 releases?

==== CUT HERE ====
mm, page-allocator: Do not check the state of a non-existant buddy during free

There is a bug in commit [6dda9d55: page allocator: reduce fragmentation
in buddy allocator by adding buddies that are merging to the tail of the
free lists] that means a buddy at order MAX_ORDER is checked for
merging. A page of this order never exists so at times, an effectively
random piece of memory is being checked.

Alan Curry has reported that this is causing memory corruption in userspace
data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32). It is not clear
why this is happening. It could be a cache coherency problem where pages
mapped in both user and kernel space are getting different cache lines due
to the bad read from kernel space (http://lkml.org/lkml/2010/10/13/179). It
could also be that there are some special registers being io-remapped at
the end of the memmap array and that a read has special meaning on them.
Compiler bugs have been ruled out because the assembly before and after
the patch looks relatively harmless.

This patch fixes the problem by ensuring we are not reading a possibly
invalid location of memory. It's not clear why the read causes
corruption but one way or the other it is a buggy read.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8cfa9c..93cef41 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -530,7 +530,7 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
 		struct page *higher_page, *higher_buddy;
 		combined_idx = __find_combined_index(page_idx, order);
 		higher_page = page + combined_idx - page_idx;

^ permalink raw reply related

* Re: BUG: dead loop in PowerPC hcall tracepoint (Was: [LTP] [PATCH v2] Add ftrace-stress-test to LTP)
From: Benjamin Herrenschmidt @ 2010-10-18 10:05 UTC (permalink / raw)
  To: Li Zefan
  Cc: ltp-list, Peter Zijlstra, linuxppc-dev, LKML, Steven Rostedt,
	Paul Mackerras, Anton Blanchard, Ingo Molnar, subrata
In-Reply-To: <4CBBBCB0.8070406@cn.fujitsu.com>

On Mon, 2010-10-18 at 11:19 +0800, Li Zefan wrote:
> Cc: Steven
> Cc: Ingo
> Cc: Peter
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Paul Mackerras <paulus@samba.org>
> 
> For those Cced, More information here:
> 
> http://marc.info/?l=ltp-list&m=128569007015044&w=2
> http://marc.info/?l=ltp-list&m=128696942432669&w=2

Hrm... that's nasty...

Should we have some kind of flag to avoid spin_yield() calling H_CEDE
(or whatever it calls) when tracing an hcall to prevent that ? Anton,
Paulus, any other smart idea ? A TLF_ would do...

Cheers,
Ben.

> 02:37, Subrata Modak wrote:
> > I get a bigger trace with this patch:
> > 
> > Unable to handle kernel paging request for data at address 0x00000000
> > Faulting instruction address: 0xc0000000002133f0
> > cpu 0x2: Vector: 300 (Data Access) at [c0000000d9f8b560]
> >     pc: c0000000002133f0: .trace_clock_global+0xb4/0x2a0
> >     lr: c000000000213458: .trace_clock_global+0x11c/0x2a0
> >     sp: c0000000d9f8b7e0
> >    msr: 800000000200b032
> >    dar: 0
> >  dsisr: 40000000
> >   current = 0xc0000000d9f7d100
> >   paca    = 0xc000000007fc8e00
> >     pid   = 1667, comm = ftrace_stack_tr
> > Unrecoverable FP Unavailable Exception 800 at c0000000016a9540
> > cpu 0x0: Vector: 8Unable to handle0 kernel paging r0 equest for data (at
> > address 0xbffFffffe0175b688
> > PU UnavaFaulting instruciltion address: 0xac0000000001017fcb
> > le) at [c0000000d9f8a6a0]
> > p   pc: c0000000016a9540: etnetre r?  ?f ofro rh ehlepl
> > 
> > 
> >     lr: [c000000000016a9540: key_type_dns_resolver+0x15110/0x365f8
> >     sp: c0000000018804e8
> >    msr: 8000000000001032
> >   current = 0xc0000000d838d100
> >   paca    = 0xc000000007fc8000
> >     pid   = 1668, comm = ftrace_stack_ma
> >  pid   = 1668, cc0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8b9b0] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8ba40] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8baf0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8bb80] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8bc40] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8bcd0] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8bd40] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8bdd0] c000000000213458 .trace_clock_global+0x11c/0x2a0
> 
> This is a dead loop:
> 
> trace_hcall_entry() -> trace_clock_global() -> trace_hcall_entry() ..
> 
> And this is a PPC specific bug. Hope some ppc guys will fix it?
> Or we kill trace_clock_global() if no one actually uses it..
> 
> --
> Li Zefan
> 
> > [c0000000d9f8be90] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8bfa0] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8c030] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8c0e0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8c170] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8c230] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8c2c0] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8c330] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8c3c0] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8c480] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8c590] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8c620] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8c6d0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8c760] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8c820] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8c8b0] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8c920] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8c9b0] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8ca70] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8cb80] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8cc10] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8ccc0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8cd50] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8ce10] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8cea0] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8cf10] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8cfa0] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8d060] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8d170] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8d200] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8d2b0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8d340] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8d400] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8d490] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8d500] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8d590] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8d650] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8d760] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8d7f0] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8d8a0] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8d930] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8d9f0] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8da80] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8daf0] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8db80] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8dc40] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > [c0000000d9f8dd50] c000000000223178 .ring_buffer_lock_reserve
> > +0x24c/0x2a4
> > [c0000000d9f8dde0] c00000000022d6f4 .trace_buffer_lock_reserve+0x58/0xe4
> > [c0000000d9f8de90] c00000000022ec9c .trace_current_buffer_lock_reserve
> > +0x44/0x6c
> > [c0000000d9f8df20] c000000000011c5c .ftrace_raw_event_hcall_entry
> > +0x7c/0x144
> > [c0000000d9f8dfe0] c000000000096624 .__trace_hcall_entry+0xa0/0xec
> > [c0000000d9f8e070] c00000000009786c .plpar_hcall_norets+0x50/0xd0
> > [c0000000d9f8e0e0] c0000000000749c8 .__spin_yield+0x130/0x15c
> > [c0000000d9f8e170] c000000000213458 .trace_clock_global+0x11c/0x2a0
> > [c0000000d9f8e230] c0000000002226b0 .rb_reserve_next_event+0x20c/0x804
> > 
> > Regards--
> > Subrata
> > 
> > On Wed, 2010-10-13 at 19:29 +0800, Li Zefan wrote:
> >> Li Zefan wrote:
> >>> Li Zefan wrote:
> >>>> Subrata Modak wrote:
> >>>>> Li,
> >>>>>
> >>>>> Can you kindly fix the below issues and send me a new patch ?
> >>>>>
> >>>> Sure, I will.
> >>>>
> >>> At a first glance, the kernel bug doesn't seem to come from ftrace,
> >>> at least not directly from it.
> >>>
> >>> Question:
> >>>
> >>> Can you trigger this bug in latest kernel (2.6.36-rc7)? If you
> >>> haven't tried it, could you test it? Thanks!
> >>>
> >> I got a patch from Peter, though he's not sure about the bug yet.
> >> So could you try this?
> >>
> >> Note, I don't know if this patch can be applied to 2.6.35-4..
> >>
> >> ======================
> >>
> >> (From Peter)
> >>
> >> Hrmm, not something I've seen before.. but since the migration thread
> >> and stop_machine are involved, does the below patch which rewrites the
> >> migration-thread/stop_macehine/scheduler interaction cure this?
> >>
> >> This patch is probably .37 fodder, but who knows..
> >>
> >> ---
> >> Subject: sched: Create special class for stop/migrate work
> >> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> Date: Wed Sep 22 13:53:15 CEST 2010
> >>
> >> In order to separate the stop/migrate work thread from the SCHED_FIFO
> >> implementation, create a special class for it that is of higher priority
> >> than SCHED_FIFO itself.
> >>
> >> This currently solves a problem where cpu-hotplug consumes so much cpu-time
> >> that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
> >> timer pending on the now dead cpu.
> >>
> >> It is also required for when we add the planned deadline scheduling class
> >> above SCHED_FIFO, as the stop/migrate thread still needs to transcent those
> >> tasks.
> >>
> >> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> >> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> LKML-Reference: <1285165776.2275.1022.camel@laptop>
> >> ---
> >>  kernel/sched.c          |   54 ++++++++++++++++++++----
> >>  kernel/sched_stoptask.c |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  kernel/stop_machine.c   |    8 ++-
> >>  3 files changed, 158 insertions(+), 12 deletions(-)
> >>
> >> Index: linux-2.6/kernel/sched.c
> >> ===================================================================
> >> --- linux-2.6.orig/kernel/sched.c
> >> +++ linux-2.6/kernel/sched.c
> >> @@ -486,7 +486,7 @@ struct rq {
> >>  	 */
> >>  	unsigned long nr_uninterruptible;
> >>
> >> -	struct task_struct *curr, *idle;
> >> +	struct task_struct *curr, *idle, *stop;
> >>  	unsigned long next_balance;
> >>  	struct mm_struct *prev_mm;
> >>
> >> @@ -1837,7 +1837,7 @@ static inline void __set_task_cpu(struct
> >>
> >>  static const struct sched_class rt_sched_class;
> >>
> >> -#define sched_class_highest (&rt_sched_class)
> >> +#define sched_class_highest (&stop_sched_class)
> >>  #define for_each_class(class) \
> >>     for (class = sched_class_highest; class; class = class->next)
> >>
> >> @@ -1917,10 +1917,41 @@ static void deactivate_task(struct rq *r
> >>  #include "sched_idletask.c"
> >>  #include "sched_fair.c"
> >>  #include "sched_rt.c"
> >> +#include "sched_stoptask.c"
> >>  #ifdef CONFIG_SCHED_DEBUG
> >>  # include "sched_debug.c"
> >>  #endif
> >>
> >> +void sched_set_stop_task(int cpu, struct task_struct *stop)
> >> +{
> >> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
> >> +	struct task_struct *old_stop = cpu_rq(cpu)->stop;
> >> +
> >> +	if (stop) {
> >> +		/*
> >> +		 * Make it appear like a SCHED_FIFO task, its something
> >> +		 * userspace knows about and won't get confused about.
> >> +		 *
> >> +		 * Also, it will make PI more or less work without too
> >> +		 * much confusion -- but then, stop work should not
> >> +		 * rely on PI working anyway.
> >> +		 */
> >> +		sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
> >> +
> >> +		stop->sched_class = &stop_sched_class;
> >> +	}
> >> +
> >> +	cpu_rq(cpu)->stop = stop;
> >> +
> >> +	if (old_stop) {
> >> +		/*
> >> +		 * Reset it back to a normal scheduling class so that
> >> +		 * it can die in pieces.
> >> +		 */
> >> +		old_stop->sched_class = &rt_sched_class;
> >> +	}
> >> +}
> >> +
> >>  /*
> >>   * __normal_prio - return the priority that is based on the static prio
> >>   */
> >> @@ -3720,17 +3751,13 @@ pick_next_task(struct rq *rq)
> >>  			return p;
> >>  	}
> >>
> >> -	class = sched_class_highest;
> >> -	for ( ; ; ) {
> >> +	for_each_class(class) {
> >>  		p = class->pick_next_task(rq);
> >>  		if (p)
> >>  			return p;
> >> -		/*
> >> -		 * Will never be NULL as the idle class always
> >> -		 * returns a non-NULL p:
> >> -		 */
> >> -		class = class->next;
> >>  	}
> >> +
> >> +	BUG(); /* the idle class will always have a runnable task */
> >>  }
> >>
> >>  /*
> >> @@ -4659,6 +4686,15 @@ static int __sched_setscheduler(struct t
> >>  	 */
> >>  	rq = __task_rq_lock(p);
> >>
> >> +	/*
> >> +	 * Changing the policy of the stop threads its a very bad idea
> >> +	 */
> >> +	if (p == rq->stop) {
> >> +		__task_rq_unlock(rq);
> >> +		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> >> +		return -EINVAL;
> >> +	}
> >> +
> >>  #ifdef CONFIG_RT_GROUP_SCHED
> >>  	if (user) {
> >>  		/*
> >> Index: linux-2.6/kernel/stop_machine.c
> >> ===================================================================
> >> --- linux-2.6.orig/kernel/stop_machine.c
> >> +++ linux-2.6/kernel/stop_machine.c
> >> @@ -287,11 +287,12 @@ static int cpu_stopper_thread(void *data
> >>  	goto repeat;
> >>  }
> >>
> >> +extern void sched_set_stop_task(int cpu, struct task_struct *stop);
> >> +
> >>  /* manage stopper for a cpu, mostly lifted from sched migration thread mgmt */
> >>  static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
> >>  					   unsigned long action, void *hcpu)
> >>  {
> >> -	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
> >>  	unsigned int cpu = (unsigned long)hcpu;
> >>  	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
> >>  	struct task_struct *p;
> >> @@ -304,13 +305,13 @@ static int __cpuinit cpu_stop_cpu_callba
> >>  				   cpu);
> >>  		if (IS_ERR(p))
> >>  			return NOTIFY_BAD;
> >> -		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> >>  		get_task_struct(p);
> >> +		kthread_bind(p, cpu);
> >> +		sched_set_stop_task(cpu, p);
> >>  		stopper->thread = p;
> >>  		break;
> >>
> >>  	case CPU_ONLINE:
> >> -		kthread_bind(stopper->thread, cpu);
> >>  		/* strictly unnecessary, as first user will wake it */
> >>  		wake_up_process(stopper->thread);
> >>  		/* mark enabled */
> >> @@ -325,6 +326,7 @@ static int __cpuinit cpu_stop_cpu_callba
> >>  	{
> >>  		struct cpu_stop_work *work;
> >>
> >> +		sched_set_stop_task(cpu, NULL);
> >>  		/* kill the stopper */
> >>  		kthread_stop(stopper->thread);
> >>  		/* drain remaining works */
> >> Index: linux-2.6/kernel/sched_stoptask.c
> >> ===================================================================
> >> --- /dev/null
> >> +++ linux-2.6/kernel/sched_stoptask.c
> >> @@ -0,0 +1,108 @@
> >> +/*
> >> + * stop-task scheduling class.
> >> + *
> >> + * The stop task is the highest priority task in the system, it preempts
> >> + * everything and will be preempted by nothing.
> >> + *
> >> + * See kernel/stop_machine.c
> >> + */
> >> +
> >> +#ifdef CONFIG_SMP
> >> +static int
> >> +select_task_rq_stop(struct rq *rq, struct task_struct *p,
> >> +		    int sd_flag, int flags)
> >> +{
> >> +	return task_cpu(p); /* stop tasks as never migrate */
> >> +}
> >> +#endif /* CONFIG_SMP */
> >> +
> >> +static void
> >> +check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
> >> +{
> >> +	resched_task(rq->curr); /* we preempt everything */
> >> +}
> >> +
> >> +static struct task_struct *pick_next_task_stop(struct rq *rq)
> >> +{
> >> +	struct task_struct *stop = rq->stop;
> >> +
> >> +	if (stop && stop->state == TASK_RUNNING)
> >> +		return stop;
> >> +
> >> +	return NULL;
> >> +}
> >> +
> >> +static void
> >> +enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
> >> +{
> >> +}
> >> +
> >> +static void
> >> +dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
> >> +{
> >> +}
> >> +
> >> +static void yield_task_stop(struct rq *rq)
> >> +{
> >> +	BUG(); /* the stop task should never yield, its pointless. */
> >> +}
> >> +
> >> +static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
> >> +{
> >> +}
> >> +
> >> +static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)
> >> +{
> >> +}
> >> +
> >> +static void set_curr_task_stop(struct rq *rq)
> >> +{
> >> +}
> >> +
> >> +static void switched_to_stop(struct rq *rq, struct task_struct *p,
> >> +			     int running)
> >> +{
> >> +	BUG(); /* its impossible to change to this class */
> >> +}
> >> +
> >> +static void prio_changed_stop(struct rq *rq, struct task_struct *p,
> >> +			      int oldprio, int running)
> >> +{
> >> +	BUG(); /* how!?, what priority? */
> >> +}
> >> +
> >> +static unsigned int
> >> +get_rr_interval_stop(struct rq *rq, struct task_struct *task)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +/*
> >> + * Simple, special scheduling class for the per-CPU stop tasks:
> >> + */
> >> +static const struct sched_class stop_sched_class = {
> >> +	.next			= &rt_sched_class,
> >> +
> >> +	.enqueue_task		= enqueue_task_stop,
> >> +	.dequeue_task		= dequeue_task_stop,
> >> +	.yield_task		= yield_task_stop,
> >> +
> >> +	.check_preempt_curr	= check_preempt_curr_stop,
> >> +
> >> +	.pick_next_task		= pick_next_task_stop,
> >> +	.put_prev_task		= put_prev_task_stop,
> >> +
> >> +#ifdef CONFIG_SMP
> >> +	.select_task_rq		= select_task_rq_stop,
> >> +#endif
> >> +
> >> +	.set_curr_task          = set_curr_task_stop,
> >> +	.task_tick		= task_tick_stop,
> >> +
> >> +	.get_rr_interval	= get_rr_interval_stop,
> >> +
> >> +	.prio_changed		= prio_changed_stop,
> >> +	.switched_to		= switched_to_stop,
> >> +
> >> +	/* no .task_new for stop tasks */
> >> +};
> >>
> >>
> >>
> > 
> > 
> > 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: 64K PAGE_SIZE and arch/powerpc/kernel/vdso.c
From: Benjamin Herrenschmidt @ 2010-10-18 10:03 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Brian King, Tim Abbott, linuxppc-dev, Nicholas A. Bellinger
In-Reply-To: <7015.1287359052@neuling.org>

On Mon, 2010-10-18 at 10:44 +1100, Michael Neuling wrote:
> > Greetings Linux-ppc64 folks,
> > 
> > While trying to compile v2.6.36-rc8 with PAGE_SIZE=65536 I run into the
> > following compile failure w/ strict checking on a RHEL5.4 / gcc (GCC)
> > 4.1.2 20080704 (Red Hat 4.1.2-46) system:
> > 
> > cc1: warnings being treated as errors
> > arch/powerpc/kernel/vdso.c:81: warning: alignment of ‘vdso_data_store’
> > is greater than maximum object file alignment.  Using 32768
> >   CC      arch/powerpc/sysdev/msi_bitmap.o
> > make[1]: *** [arch/powerpc/kernel/vdso.o] Error 1
> > make[1]: *** Waiting for unfinished jobs....
> > 
> > Any ideas folks..?
> 
> It seems this broke it:

Yes, this is a known (old) issue. You can apply the workaround locally
if you really want to build with that old toolchain, but the risk is
that if an object whose size is not a multiple of PAGE_SIZE is put in
the .data.page_aligned section, your vdso data page might end up not
being aligned properly. Fortunately that doesn't generally happen but
it's worth double checking your System.map...

This is the reason I decided to leave it alone in the kernel and
encourage people to update their toolchain (and binutils).

Cheers,
Ben.

> commit abe1ee3a221d53778c3e58747bbec6e518e5471b
> Author: Tim Abbott <tabbott@ksplice.com>
> Date:   Sun Sep 20 18:14:15 2009 -0400
> 
>     Use macros for .data.page_aligned section.
>     
>     This patch changes the remaining direct references to
>     .data.page_aligned in C and assembly code to use the macros in
>     include/linux/linkage.h.
> 
> Backing out just that part of the change (see below) fixes it.
> 
> FYI the error only occurs on gcc 4.1 and 4.2.  4.3 and greater is fine.  
> 
> Mikey
> 
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index 13002fe..c140fce 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -78,7 +78,7 @@ static int vdso_ready;
>  static union {
>  	struct vdso_data	data;
>  	u8			page[PAGE_SIZE];
> -} vdso_data_store __page_aligned_data;
> +} vdso_data_store __attribute__((__section__(".data.page_aligned")));
>  struct vdso_data *vdso_data = &vdso_data_store.data;
>  
>  /* Format of the patch table */
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: Freescale P2020 CPU Freeze over PCIe abort signal
From: tiejun.chen @ 2010-10-18  9:52 UTC (permalink / raw)
  To: Eran Liberty; +Cc: linuxppc-dev, linux-pci
In-Reply-To: <4CBB4D80.3030007@extricom.com>

Eran Liberty wrote:
> This should probably go to the Freescale support, as it feels like a
> hardware issue yet the end result is a very frozen Linux kernel so I
> post here first...
> 
> I have a programmable FPGA PCIe device connected to a Freescale's P2020
> PCIe port. As part of the bring-up tests, we are testing two faulty
> scenarios:
> 1. The FPGA totally ignores the PCIe transaction.
> 2. The FPGA return a transaction abort.
> 
> Both are plausible PCIe behavior and their should be outcome is
> documented in the PCIe spec. The first should be terminated by the
> transaction requestor timeout mechanism and raise an error, the second
> should abort the transaction and raise and error.
> 
> In P2020 if I do any of those the CPU is left hung over the transaction.
> 
> something like:
> in_le32(addr)
> 
> is turned into:
> 7c 00 04 ac     sync   7c 00 4c 2c     lwbrx   r0,0,r9
> 0c 00 00 00     twi     0,r0,0
> 4c 00 01 2c     isync
> 
> assembly code, where in r9 (in this example) hold an address which is
> physically mapped into the PCIe resource space.
> 
> The CPU will hang over the load instruction.
> 
> Just for the fun of it, I have wrote my own assembly function omitting
> everything but the load instruction; still freeze.
> Replace "lwbrx" with a simple "lwz"; still freeze.
> 
> It looks like the CPU snoozes till the PCIe transaction is done with no
> timeouts, ignoring any abort signal.
> 

AFAIK we can set one bit on PEX_ERR_DISR to detect PCI Express completion
time-out. If so one interrupt should be issued. But I'm not sure if this can fix
your issue.

Tiejun

> I am going to:
> A. Try to reach the Freescale support.
> B. Asked the FPGA designed to give me a new behavior that will stall the
> PCIe transaction replay for 10 sec, but after those return ok.
> C. report back here with either A or B.
> 
> If you have any ideas I would love to hear them.
> 
> -- Liberty
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
From: tiejun.chen @ 2010-10-18  9:46 UTC (permalink / raw)
  To: tiejun.chen
  Cc: Wood Scott-B07421, dedekind1, Zang Roy-R61911, Lan Chunhe-B25806,
	linuxppc-dev, linux-mtd, akpm, dwmw2, Gala Kumar-B11780
In-Reply-To: <4CBC16FD.6030507@windriver.com>

tiejun.chen wrote:
> Zang Roy-R61911 wrote:
>>> -----Original Message-----
>>> From: tiejun.chen [mailto:tiejun.chen@windriver.com]
>>> Sent: Monday, October 18, 2010 16:56 PM
>>> To: Zang Roy-R61911
>>> Cc: linux-mtd@lists.infradead.org; Wood Scott-B07421; dedekind1@gmail.com; Lan
>>> Chunhe-B25806; linuxppc-dev@ozlabs.org; akpm@linux-foundation.org;
>>> dwmw2@infradead.org; Gala Kumar-B11780
>>> Subject: Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to
>>> elbc devices
>>>
>>> Roy Zang wrote:
>>>> Move Freescale elbc interrupt from nand dirver to elbc driver.
>>>> Then all elbc devices can use the interrupt instead of ONLY nand.
>>>>
>>>> For former nand driver, it had the two functions:
>>>>
>>>> 1. detecting nand flash partitions;
>>>> 2. registering elbc interrupt.
>>>>
>>>> Now, second function is removed to fsl_lbc.c.
>>>>
>>>> Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
>>>> Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
>>>> Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
>>>> Cc: Wood Scott-B07421 <B07421@freescale.com>
>>>> ---
>>>>
>>>> These two patches are based on the following commits:
>>>> 1.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>>> September/032112.html
>>>> 2.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>>> September/032110.html
>>>> 3.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>>> September/032111.html
>>>> According to Anton's comment, I merge 1 & 2 together and start a new thread.
>>>> Comparing the provided link:
>>>> 1.	Merge 1 & 2 together.
>>>> 2.	Some code style updates
>>>> 3.	Add counter protect for elbc driver remove
>>>> 4.	Rebase to 2.6.36-rc7
>>>>
>>>> Other histories from the links:
>>>> V2: Comparing with v1, according to the feedback, add some decorations.
>>>>
>>>> V3: Comparing with v2:
>>>> 1.	according to the feedback, add some decorations.
>>>> 2.	change of_platform_driver to platform_driver
>>>> 3.	rebase to 2.6.36-rc4
>>>>
>>>> V4: Comparing with v3
>>>> 1.	minor fix from type unsigned int to u32
>>>> 2.	fix platform_driver issue.
>>>> 3.	add mutex for nand probe
>>>>
>>>>  arch/powerpc/Kconfig               |    7 +-
>>>>  arch/powerpc/include/asm/fsl_lbc.h |   33 +++-
>>>>  arch/powerpc/sysdev/fsl_lbc.c      |  229 ++++++++++++++---
>>>>  drivers/mtd/nand/Kconfig           |    1 +
>>>>  drivers/mtd/nand/fsl_elbc_nand.c   |  482 +++++++++++++++------------------
>>> ---
>>>>  5 files changed, 425 insertions(+), 327 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>> index 631e5a0..44df1ba 100644
>>>> --- a/arch/powerpc/Kconfig
>>>> +++ b/arch/powerpc/Kconfig
>>>> @@ -687,9 +687,12 @@ config 4xx_SOC
>>>>  	bool
>>>>
>>>>  config FSL_LBC
>>>> -	bool
>>>> +	bool "Freescale Local Bus support"
>>>> +	depends on FSL_SOC
>>>>  	help
>>>> -	  Freescale Localbus support
>>>> +	  Enables reporting of errors from the Freescale local bus
>>>> +	  controller.  Also contains some common code used by
>>>> +	  drivers for specific local bus peripherals.
>>>>
>>>>  config FSL_GTM
>>>>  	bool
>>>> diff --git a/arch/powerpc/include/asm/fsl_lbc.h
>>> b/arch/powerpc/include/asm/fsl_lbc.h
>>>> index 1b5a210..0c40c05 100644
>>>> --- a/arch/powerpc/include/asm/fsl_lbc.h
>>>> +++ b/arch/powerpc/include/asm/fsl_lbc.h
>>>> @@ -1,9 +1,10 @@
>>>>  /* Freescale Local Bus Controller
>>>>   *
>>>> - * Copyright (c) 2006-2007 Freescale Semiconductor
>>>> + * Copyright (c) 2006-2007, 2010 Freescale Semiconductor
>>>>   *
>>>>   * Authors: Nick Spence <nick.spence@freescale.com>,
>>>>   *          Scott Wood <scottwood@freescale.com>
>>>> + *          Jack Lan <jack.lan@freescale.com>
>>>>   *
>>>>   * This program is free software; you can redistribute it and/or modify
>>>>   * it under the terms of the GNU General Public License as published by
>>>> @@ -26,6 +27,8 @@
>>>>  #include <linux/compiler.h>
>>>>  #include <linux/types.h>
>>>>  #include <linux/io.h>
>>>> +#include <linux/device.h>
>>>> +#include <linux/spinlock.h>
>>>>
>>>>  struct fsl_lbc_bank {
>>>>  	__be32 br;             /**< Base Register  */
>>>> @@ -125,13 +128,23 @@ struct fsl_lbc_regs {
>>>>  #define LTESR_ATMW 0x00800000
>>>>  #define LTESR_ATMR 0x00400000
>>>>  #define LTESR_CS   0x00080000
>>>> +#define LTESR_UPM  0x00000002
>>>>  #define LTESR_CC   0x00000001
>>>>  #define LTESR_NAND_MASK (LTESR_FCT | LTESR_PAR | LTESR_CC)
>>>> +#define LTESR_MASK      (LTESR_BM | LTESR_FCT | LTESR_PAR | LTESR_WP \
>>>> +			 | LTESR_ATMW | LTESR_ATMR | LTESR_CS | LTESR_UPM \
>>>> +			 | LTESR_CC)
>>>> +#define LTESR_CLEAR	0xFFFFFFFF
>>>> +#define LTECCR_CLEAR	0xFFFFFFFF
>>>> +#define LTESR_STATUS	LTESR_MASK
>>>> +#define LTEIR_ENABLE	LTESR_MASK
>>>> +#define LTEDR_ENABLE	0x00000000
>>>>  	__be32 ltedr;           /**< Transfer Error Disable Register */
>>>>  	__be32 lteir;           /**< Transfer Error Interrupt Register */
>>>>  	__be32 lteatr;          /**< Transfer Error Attributes Register */
>>>>  	__be32 ltear;           /**< Transfer Error Address Register */
>>>> -	u8 res6[0xC];
>>>> +	__be32 lteccr;          /**< Transfer Error ECC Register */
>>>> +	u8 res6[0x8];
>>>>  	__be32 lbcr;            /**< Configuration Register */
>>>>  #define LBCR_LDIS  0x80000000
>>>>  #define LBCR_LDIS_SHIFT    31
>>>> @@ -265,7 +278,23 @@ static inline void fsl_upm_end_pattern(struct fsl_upm
>>> *upm)
>>>>  		cpu_relax();
>>>>  }
>>>>
>>>> +/* overview of the fsl lbc controller */
>>>> +
>>>> +struct fsl_lbc_ctrl {
>>>> +	/* device info */
>>>> +	struct device			*dev;
>>>> +	struct fsl_lbc_regs __iomem	*regs;
>>>> +	int				irq;
>>>> +	wait_queue_head_t		irq_wait;
>>>> +	spinlock_t			lock;
>>>> +	void				*nand;
>>>> +
>>>> +	/* status read from LTESR by irq handler */
>>>> +	unsigned int			irq_status;
>>>> +};
>>>> +
>>>>  extern int fsl_upm_run_pattern(struct fsl_upm *upm, void __iomem *io_base,
>>>>  			       u32 mar);
>>>> +extern struct fsl_lbc_ctrl *fsl_lbc_ctrl_dev;
>>>>
>>>>  #endif /* __ASM_FSL_LBC_H */
>>>> diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
>>>> index dceb8d1..4bb0336 100644
>>>> --- a/arch/powerpc/sysdev/fsl_lbc.c
>>>> +++ b/arch/powerpc/sysdev/fsl_lbc.c
>>>> @@ -2,8 +2,11 @@
>>>>   * Freescale LBC and UPM routines.
>>>>   *
>>>>   * Copyright (c) 2007-2008  MontaVista Software, Inc.
>>>> + * Copyright (c) 2010 Freescale Semiconductor
>>>>   *
>>>>   * Author: Anton Vorontsov <avorontsov@ru.mvista.com>
>>>> + * Author: Jack Lan <Jack.Lan@freescale.com>
>>>> + * Author: Roy Zang <tie-fei.zang@freescale.com>
>>>>   *
>>>>   * This program is free software; you can redistribute it and/or modify
>>>>   * it under the terms of the GNU General Public License as published by
>>>> @@ -19,39 +22,16 @@
>>>>  #include <linux/types.h>
>>>>  #include <linux/io.h>
>>>>  #include <linux/of.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/platform_device.h>
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/mod_devicetable.h>
>>>>  #include <asm/prom.h>
>>>>  #include <asm/fsl_lbc.h>
>>>>
>>>>  static spinlock_t fsl_lbc_lock = __SPIN_LOCK_UNLOCKED(fsl_lbc_lock);
>>>> -static struct fsl_lbc_regs __iomem *fsl_lbc_regs;
>>>> -
>>>> -static char __initdata *compat_lbc[] = {
>>>> -	"fsl,pq2-localbus",
>>>> -	"fsl,pq2pro-localbus",
>>>> -	"fsl,pq3-localbus",
>>>> -	"fsl,elbc",
>>>> -};
>>>> -
>>>> -static int __init fsl_lbc_init(void)
>>>> -{
>>>> -	struct device_node *lbus;
>>>> -	int i;
>>>> -
>>>> -	for (i = 0; i < ARRAY_SIZE(compat_lbc); i++) {
>>>> -		lbus = of_find_compatible_node(NULL, NULL, compat_lbc[i]);
>>>> -		if (lbus)
>>>> -			goto found;
>>>> -	}
>>>> -	return -ENODEV;
>>>> -
>>>> -found:
>>>> -	fsl_lbc_regs = of_iomap(lbus, 0);
>>>> -	of_node_put(lbus);
>>>> -	if (!fsl_lbc_regs)
>>>> -		return -ENOMEM;
>>>> -	return 0;
>>>> -}
>>>> -arch_initcall(fsl_lbc_init);
>>>> +struct fsl_lbc_ctrl *fsl_lbc_ctrl_dev;
>>>> +EXPORT_SYMBOL(fsl_lbc_ctrl_dev);
>>>>
>>>>  /**
>>>>   * fsl_lbc_find - find Localbus bank
>>>> @@ -65,13 +45,15 @@ arch_initcall(fsl_lbc_init);
>>>>  int fsl_lbc_find(phys_addr_t addr_base)
>>>>  {
>>>>  	int i;
>>>> +	struct fsl_lbc_regs __iomem *lbc;
>>>>
>>>> -	if (!fsl_lbc_regs)
>>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>>>  		return -ENODEV;
>>>>
>>>> -	for (i = 0; i < ARRAY_SIZE(fsl_lbc_regs->bank); i++) {
>>>> -		__be32 br = in_be32(&fsl_lbc_regs->bank[i].br);
>>>> -		__be32 or = in_be32(&fsl_lbc_regs->bank[i].or);
>>>> +	lbc = fsl_lbc_ctrl_dev->regs;
>>>> +	for (i = 0; i < ARRAY_SIZE(lbc->bank); i++) {
>>>> +		__be32 br = in_be32(&lbc->bank[i].br);
>>>> +		__be32 or = in_be32(&lbc->bank[i].or);
>>>>
>>>>  		if (br & BR_V && (br & or & BR_BA) == addr_base)
>>>>  			return i;
>>>> @@ -94,22 +76,27 @@ int fsl_upm_find(phys_addr_t addr_base, struct fsl_upm
>>> *upm)
>>>>  {
>>>>  	int bank;
>>>>  	__be32 br;
>>>> +	struct fsl_lbc_regs __iomem *lbc;
>>>>
>>>>  	bank = fsl_lbc_find(addr_base);
>>>>  	if (bank < 0)
>>>>  		return bank;
>>>>
>>>> -	br = in_be32(&fsl_lbc_regs->bank[bank].br);
>>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>>> +		return -ENODEV;
>>>> +
>>>> +	lbc = fsl_lbc_ctrl_dev->regs;
>>>> +	br = in_be32(&lbc->bank[bank].br);
>>>>
>>>>  	switch (br & BR_MSEL) {
>>>>  	case BR_MS_UPMA:
>>>> -		upm->mxmr = &fsl_lbc_regs->mamr;
>>>> +		upm->mxmr = &lbc->mamr;
>>>>  		break;
>>>>  	case BR_MS_UPMB:
>>>> -		upm->mxmr = &fsl_lbc_regs->mbmr;
>>>> +		upm->mxmr = &lbc->mbmr;
>>>>  		break;
>>>>  	case BR_MS_UPMC:
>>>> -		upm->mxmr = &fsl_lbc_regs->mcmr;
>>>> +		upm->mxmr = &lbc->mcmr;
>>>>  		break;
>>>>  	default:
>>>>  		return -EINVAL;
>>>> @@ -148,9 +135,12 @@ int fsl_upm_run_pattern(struct fsl_upm *upm, void
>>> __iomem *io_base, u32 mar)
>>>>  	int ret = 0;
>>>>  	unsigned long flags;
>>>>
>>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>>> +		return -ENODEV;
>>>> +
>>>>  	spin_lock_irqsave(&fsl_lbc_lock, flags);
>>>>
>>>> -	out_be32(&fsl_lbc_regs->mar, mar);
>>>> +	out_be32(&fsl_lbc_ctrl_dev->regs->mar, mar);
>>>>
>>>>  	switch (upm->width) {
>>>>  	case 8:
>>>> @@ -172,3 +162,166 @@ int fsl_upm_run_pattern(struct fsl_upm *upm, void
>>> __iomem *io_base, u32 mar)
>>>>  	return ret;
>>>>  }
>>>>  EXPORT_SYMBOL(fsl_upm_run_pattern);
>>>> +
>>>> +static int __devinit fsl_lbc_ctrl_init(struct fsl_lbc_ctrl *ctrl)
>>>> +{
>>>> +	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
>>>> +
>>>> +	/* clear event registers */
>>>> +	setbits32(&lbc->ltesr, LTESR_CLEAR);
>>>> +	out_be32(&lbc->lteatr, 0);
>>>> +	out_be32(&lbc->ltear, 0);
>>>> +	out_be32(&lbc->lteccr, LTECCR_CLEAR);
>>>> +	out_be32(&lbc->ltedr, LTEDR_ENABLE);
>>>> +
>>>> +	/* Enable interrupts for any detected events */
>>>> +	out_be32(&lbc->lteir, LTEIR_ENABLE);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * NOTE: This interrupt is used to report localbus events of various kinds,
>>>> + * such as transaction errors on the chipselects.
>>>> + */
>>>> +
>>>> +static irqreturn_t fsl_lbc_ctrl_irq(int irqno, void *data)
>>>> +{
>>>> +	struct fsl_lbc_ctrl *ctrl = data;
>>>> +	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
>>>> +	u32 status;
>>>> +
>>>> +	status = in_be32(&lbc->ltesr);
>>>> +	if (!status)
>>>> +		return IRQ_NONE;
>>>> +
>>>> +	out_be32(&lbc->ltesr, LTESR_CLEAR);
>>>> +	out_be32(&lbc->lteatr, 0);
>>>> +	out_be32(&lbc->ltear, 0);
>>>> +	ctrl->irq_status = status;
>>>> +
>>>> +	if (status & LTESR_BM)
>>>> +		dev_err(ctrl->dev, "Local bus monitor time-out: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	if (status & LTESR_WP)
>>>> +		dev_err(ctrl->dev, "Write protect error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	if (status & LTESR_ATMW)
>>>> +		dev_err(ctrl->dev, "Atomic write error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	if (status & LTESR_ATMR)
>>>> +		dev_err(ctrl->dev, "Atomic read error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	if (status & LTESR_CS)
>>>> +		dev_err(ctrl->dev, "Chip select error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	if (status & LTESR_UPM)
>>>> +		;
>>>> +	if (status & LTESR_FCT) {
>>>> +		dev_err(ctrl->dev, "FCM command time-out: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +		smp_wmb();
>>>> +		wake_up(&ctrl->irq_wait);
>>>> +	}
>>>> +	if (status & LTESR_PAR) {
>>>> +		dev_err(ctrl->dev, "Parity or Uncorrectable ECC error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +		smp_wmb();
>>>> +		wake_up(&ctrl->irq_wait);
>>>> +	}
>>>> +	if (status & LTESR_CC) {
>>>> +		smp_wmb();
>>>> +		wake_up(&ctrl->irq_wait);
>>>> +	}
>>>> +	if (status & ~LTESR_MASK)
>>>> +		dev_err(ctrl->dev, "Unknown error: "
>>>> +			"LTESR 0x%08X\n", status);
>>>> +	return IRQ_HANDLED;
>>>> +}
>>>> +
>>>> +/*
>>>> + * fsl_lbc_ctrl_probe
>>>> + *
>>>> + * called by device layer when it finds a device matching
>>>> + * one our driver can handled. This code allocates all of
>>>> + * the resources needed for the controller only.  The
>>>> + * resources for the NAND banks themselves are allocated
>>>> + * in the chip probe function.
>>>> +*/
>>>> +
>>>> +static int __devinit fsl_lbc_ctrl_probe(struct platform_device *dev)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	if (!dev->dev.of_node) {
>>>> +		dev_err(&dev->dev, "Device OF-Node is NULL");
>>>> +		return -EFAULT;
>>>> +	}
>>>> +
>>>> +	fsl_lbc_ctrl_dev = kzalloc(sizeof(*fsl_lbc_ctrl_dev), GFP_KERNEL);
>>>> +	if (!fsl_lbc_ctrl_dev)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	dev_set_drvdata(&dev->dev, fsl_lbc_ctrl_dev);
>>>> +
>>>> +	spin_lock_init(&fsl_lbc_ctrl_dev->lock);
>>>> +	init_waitqueue_head(&fsl_lbc_ctrl_dev->irq_wait);
>>>> +
>>>> +	fsl_lbc_ctrl_dev->regs = of_iomap(dev->dev.of_node, 0);
>>>> +	if (!fsl_lbc_ctrl_dev->regs) {
>>>> +		dev_err(&dev->dev, "failed to get memory region\n");
>>>> +		ret = -ENODEV;
>>>> +		goto err;
>>> Looks you always iounmap(fsl_lbc_ctrl_dev->regs) on position 'err' but here
>>> of_iomap() is already failed you should skip iounmap() fsl_lbc_ctrl_dev->regs
>>> again. So you should improve that as the following on 'err', or layout 'err'
>>> in
>>> gain.
>>> ------
>>> 	if(fsl_lbc_ctrl_dev->regs)
>>> 		iounmap(fsl_lbc_ctrl_dev->regs);
>>>
>>> Tiejun
>> You are right!
>> How about
>>  
>>         if (!fsl_lbc_ctrl_dev->regs) {
>>                 dev_err(&dev->dev, "failed to get memory region\n");
>>                 kfree(fsl_lbc_ctrl_dev);
>>                 return -ENOMEM;
>>         }
> 
> Although this is a big problem, I prefer to return 'ENXIO' :)
		^
		Typo: is not a ......

Tiejun

> 
> Others are fine to me.
> 
> Tiejun
> 
>> ...
>>
>> Thanks.
>> Roy
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
From: tiejun.chen @ 2010-10-18  9:44 UTC (permalink / raw)
  To: Zang Roy-R61911
  Cc: Wood Scott-B07421, dedekind1, Lan Chunhe-B25806, linuxppc-dev,
	linux-mtd, akpm, dwmw2, Gala Kumar-B11780
In-Reply-To: <3850A844E6A3854C827AC5C0BEC7B60A2B0708@zch01exm23.fsl.freescale.net>

Zang Roy-R61911 wrote:
> 
>> -----Original Message-----
>> From: tiejun.chen [mailto:tiejun.chen@windriver.com]
>> Sent: Monday, October 18, 2010 16:56 PM
>> To: Zang Roy-R61911
>> Cc: linux-mtd@lists.infradead.org; Wood Scott-B07421; dedekind1@gmail.com; Lan
>> Chunhe-B25806; linuxppc-dev@ozlabs.org; akpm@linux-foundation.org;
>> dwmw2@infradead.org; Gala Kumar-B11780
>> Subject: Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to
>> elbc devices
>>
>> Roy Zang wrote:
>>> Move Freescale elbc interrupt from nand dirver to elbc driver.
>>> Then all elbc devices can use the interrupt instead of ONLY nand.
>>>
>>> For former nand driver, it had the two functions:
>>>
>>> 1. detecting nand flash partitions;
>>> 2. registering elbc interrupt.
>>>
>>> Now, second function is removed to fsl_lbc.c.
>>>
>>> Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
>>> Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
>>> Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
>>> Cc: Wood Scott-B07421 <B07421@freescale.com>
>>> ---
>>>
>>> These two patches are based on the following commits:
>>> 1.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>> September/032112.html
>>> 2.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>> September/032110.html
>>> 3.	http://lists.infradead.org/pipermail/linux-mtd/2010-
>> September/032111.html
>>> According to Anton's comment, I merge 1 & 2 together and start a new thread.
>>> Comparing the provided link:
>>> 1.	Merge 1 & 2 together.
>>> 2.	Some code style updates
>>> 3.	Add counter protect for elbc driver remove
>>> 4.	Rebase to 2.6.36-rc7
>>>
>>> Other histories from the links:
>>> V2: Comparing with v1, according to the feedback, add some decorations.
>>>
>>> V3: Comparing with v2:
>>> 1.	according to the feedback, add some decorations.
>>> 2.	change of_platform_driver to platform_driver
>>> 3.	rebase to 2.6.36-rc4
>>>
>>> V4: Comparing with v3
>>> 1.	minor fix from type unsigned int to u32
>>> 2.	fix platform_driver issue.
>>> 3.	add mutex for nand probe
>>>
>>>  arch/powerpc/Kconfig               |    7 +-
>>>  arch/powerpc/include/asm/fsl_lbc.h |   33 +++-
>>>  arch/powerpc/sysdev/fsl_lbc.c      |  229 ++++++++++++++---
>>>  drivers/mtd/nand/Kconfig           |    1 +
>>>  drivers/mtd/nand/fsl_elbc_nand.c   |  482 +++++++++++++++------------------
>> ---
>>>  5 files changed, 425 insertions(+), 327 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 631e5a0..44df1ba 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -687,9 +687,12 @@ config 4xx_SOC
>>>  	bool
>>>
>>>  config FSL_LBC
>>> -	bool
>>> +	bool "Freescale Local Bus support"
>>> +	depends on FSL_SOC
>>>  	help
>>> -	  Freescale Localbus support
>>> +	  Enables reporting of errors from the Freescale local bus
>>> +	  controller.  Also contains some common code used by
>>> +	  drivers for specific local bus peripherals.
>>>
>>>  config FSL_GTM
>>>  	bool
>>> diff --git a/arch/powerpc/include/asm/fsl_lbc.h
>> b/arch/powerpc/include/asm/fsl_lbc.h
>>> index 1b5a210..0c40c05 100644
>>> --- a/arch/powerpc/include/asm/fsl_lbc.h
>>> +++ b/arch/powerpc/include/asm/fsl_lbc.h
>>> @@ -1,9 +1,10 @@
>>>  /* Freescale Local Bus Controller
>>>   *
>>> - * Copyright (c) 2006-2007 Freescale Semiconductor
>>> + * Copyright (c) 2006-2007, 2010 Freescale Semiconductor
>>>   *
>>>   * Authors: Nick Spence <nick.spence@freescale.com>,
>>>   *          Scott Wood <scottwood@freescale.com>
>>> + *          Jack Lan <jack.lan@freescale.com>
>>>   *
>>>   * This program is free software; you can redistribute it and/or modify
>>>   * it under the terms of the GNU General Public License as published by
>>> @@ -26,6 +27,8 @@
>>>  #include <linux/compiler.h>
>>>  #include <linux/types.h>
>>>  #include <linux/io.h>
>>> +#include <linux/device.h>
>>> +#include <linux/spinlock.h>
>>>
>>>  struct fsl_lbc_bank {
>>>  	__be32 br;             /**< Base Register  */
>>> @@ -125,13 +128,23 @@ struct fsl_lbc_regs {
>>>  #define LTESR_ATMW 0x00800000
>>>  #define LTESR_ATMR 0x00400000
>>>  #define LTESR_CS   0x00080000
>>> +#define LTESR_UPM  0x00000002
>>>  #define LTESR_CC   0x00000001
>>>  #define LTESR_NAND_MASK (LTESR_FCT | LTESR_PAR | LTESR_CC)
>>> +#define LTESR_MASK      (LTESR_BM | LTESR_FCT | LTESR_PAR | LTESR_WP \
>>> +			 | LTESR_ATMW | LTESR_ATMR | LTESR_CS | LTESR_UPM \
>>> +			 | LTESR_CC)
>>> +#define LTESR_CLEAR	0xFFFFFFFF
>>> +#define LTECCR_CLEAR	0xFFFFFFFF
>>> +#define LTESR_STATUS	LTESR_MASK
>>> +#define LTEIR_ENABLE	LTESR_MASK
>>> +#define LTEDR_ENABLE	0x00000000
>>>  	__be32 ltedr;           /**< Transfer Error Disable Register */
>>>  	__be32 lteir;           /**< Transfer Error Interrupt Register */
>>>  	__be32 lteatr;          /**< Transfer Error Attributes Register */
>>>  	__be32 ltear;           /**< Transfer Error Address Register */
>>> -	u8 res6[0xC];
>>> +	__be32 lteccr;          /**< Transfer Error ECC Register */
>>> +	u8 res6[0x8];
>>>  	__be32 lbcr;            /**< Configuration Register */
>>>  #define LBCR_LDIS  0x80000000
>>>  #define LBCR_LDIS_SHIFT    31
>>> @@ -265,7 +278,23 @@ static inline void fsl_upm_end_pattern(struct fsl_upm
>> *upm)
>>>  		cpu_relax();
>>>  }
>>>
>>> +/* overview of the fsl lbc controller */
>>> +
>>> +struct fsl_lbc_ctrl {
>>> +	/* device info */
>>> +	struct device			*dev;
>>> +	struct fsl_lbc_regs __iomem	*regs;
>>> +	int				irq;
>>> +	wait_queue_head_t		irq_wait;
>>> +	spinlock_t			lock;
>>> +	void				*nand;
>>> +
>>> +	/* status read from LTESR by irq handler */
>>> +	unsigned int			irq_status;
>>> +};
>>> +
>>>  extern int fsl_upm_run_pattern(struct fsl_upm *upm, void __iomem *io_base,
>>>  			       u32 mar);
>>> +extern struct fsl_lbc_ctrl *fsl_lbc_ctrl_dev;
>>>
>>>  #endif /* __ASM_FSL_LBC_H */
>>> diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
>>> index dceb8d1..4bb0336 100644
>>> --- a/arch/powerpc/sysdev/fsl_lbc.c
>>> +++ b/arch/powerpc/sysdev/fsl_lbc.c
>>> @@ -2,8 +2,11 @@
>>>   * Freescale LBC and UPM routines.
>>>   *
>>>   * Copyright (c) 2007-2008  MontaVista Software, Inc.
>>> + * Copyright (c) 2010 Freescale Semiconductor
>>>   *
>>>   * Author: Anton Vorontsov <avorontsov@ru.mvista.com>
>>> + * Author: Jack Lan <Jack.Lan@freescale.com>
>>> + * Author: Roy Zang <tie-fei.zang@freescale.com>
>>>   *
>>>   * This program is free software; you can redistribute it and/or modify
>>>   * it under the terms of the GNU General Public License as published by
>>> @@ -19,39 +22,16 @@
>>>  #include <linux/types.h>
>>>  #include <linux/io.h>
>>>  #include <linux/of.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/platform_device.h>
>>> +#include <linux/interrupt.h>
>>> +#include <linux/mod_devicetable.h>
>>>  #include <asm/prom.h>
>>>  #include <asm/fsl_lbc.h>
>>>
>>>  static spinlock_t fsl_lbc_lock = __SPIN_LOCK_UNLOCKED(fsl_lbc_lock);
>>> -static struct fsl_lbc_regs __iomem *fsl_lbc_regs;
>>> -
>>> -static char __initdata *compat_lbc[] = {
>>> -	"fsl,pq2-localbus",
>>> -	"fsl,pq2pro-localbus",
>>> -	"fsl,pq3-localbus",
>>> -	"fsl,elbc",
>>> -};
>>> -
>>> -static int __init fsl_lbc_init(void)
>>> -{
>>> -	struct device_node *lbus;
>>> -	int i;
>>> -
>>> -	for (i = 0; i < ARRAY_SIZE(compat_lbc); i++) {
>>> -		lbus = of_find_compatible_node(NULL, NULL, compat_lbc[i]);
>>> -		if (lbus)
>>> -			goto found;
>>> -	}
>>> -	return -ENODEV;
>>> -
>>> -found:
>>> -	fsl_lbc_regs = of_iomap(lbus, 0);
>>> -	of_node_put(lbus);
>>> -	if (!fsl_lbc_regs)
>>> -		return -ENOMEM;
>>> -	return 0;
>>> -}
>>> -arch_initcall(fsl_lbc_init);
>>> +struct fsl_lbc_ctrl *fsl_lbc_ctrl_dev;
>>> +EXPORT_SYMBOL(fsl_lbc_ctrl_dev);
>>>
>>>  /**
>>>   * fsl_lbc_find - find Localbus bank
>>> @@ -65,13 +45,15 @@ arch_initcall(fsl_lbc_init);
>>>  int fsl_lbc_find(phys_addr_t addr_base)
>>>  {
>>>  	int i;
>>> +	struct fsl_lbc_regs __iomem *lbc;
>>>
>>> -	if (!fsl_lbc_regs)
>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>>  		return -ENODEV;
>>>
>>> -	for (i = 0; i < ARRAY_SIZE(fsl_lbc_regs->bank); i++) {
>>> -		__be32 br = in_be32(&fsl_lbc_regs->bank[i].br);
>>> -		__be32 or = in_be32(&fsl_lbc_regs->bank[i].or);
>>> +	lbc = fsl_lbc_ctrl_dev->regs;
>>> +	for (i = 0; i < ARRAY_SIZE(lbc->bank); i++) {
>>> +		__be32 br = in_be32(&lbc->bank[i].br);
>>> +		__be32 or = in_be32(&lbc->bank[i].or);
>>>
>>>  		if (br & BR_V && (br & or & BR_BA) == addr_base)
>>>  			return i;
>>> @@ -94,22 +76,27 @@ int fsl_upm_find(phys_addr_t addr_base, struct fsl_upm
>> *upm)
>>>  {
>>>  	int bank;
>>>  	__be32 br;
>>> +	struct fsl_lbc_regs __iomem *lbc;
>>>
>>>  	bank = fsl_lbc_find(addr_base);
>>>  	if (bank < 0)
>>>  		return bank;
>>>
>>> -	br = in_be32(&fsl_lbc_regs->bank[bank].br);
>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>> +		return -ENODEV;
>>> +
>>> +	lbc = fsl_lbc_ctrl_dev->regs;
>>> +	br = in_be32(&lbc->bank[bank].br);
>>>
>>>  	switch (br & BR_MSEL) {
>>>  	case BR_MS_UPMA:
>>> -		upm->mxmr = &fsl_lbc_regs->mamr;
>>> +		upm->mxmr = &lbc->mamr;
>>>  		break;
>>>  	case BR_MS_UPMB:
>>> -		upm->mxmr = &fsl_lbc_regs->mbmr;
>>> +		upm->mxmr = &lbc->mbmr;
>>>  		break;
>>>  	case BR_MS_UPMC:
>>> -		upm->mxmr = &fsl_lbc_regs->mcmr;
>>> +		upm->mxmr = &lbc->mcmr;
>>>  		break;
>>>  	default:
>>>  		return -EINVAL;
>>> @@ -148,9 +135,12 @@ int fsl_upm_run_pattern(struct fsl_upm *upm, void
>> __iomem *io_base, u32 mar)
>>>  	int ret = 0;
>>>  	unsigned long flags;
>>>
>>> +	if (!fsl_lbc_ctrl_dev || !fsl_lbc_ctrl_dev->regs)
>>> +		return -ENODEV;
>>> +
>>>  	spin_lock_irqsave(&fsl_lbc_lock, flags);
>>>
>>> -	out_be32(&fsl_lbc_regs->mar, mar);
>>> +	out_be32(&fsl_lbc_ctrl_dev->regs->mar, mar);
>>>
>>>  	switch (upm->width) {
>>>  	case 8:
>>> @@ -172,3 +162,166 @@ int fsl_upm_run_pattern(struct fsl_upm *upm, void
>> __iomem *io_base, u32 mar)
>>>  	return ret;
>>>  }
>>>  EXPORT_SYMBOL(fsl_upm_run_pattern);
>>> +
>>> +static int __devinit fsl_lbc_ctrl_init(struct fsl_lbc_ctrl *ctrl)
>>> +{
>>> +	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
>>> +
>>> +	/* clear event registers */
>>> +	setbits32(&lbc->ltesr, LTESR_CLEAR);
>>> +	out_be32(&lbc->lteatr, 0);
>>> +	out_be32(&lbc->ltear, 0);
>>> +	out_be32(&lbc->lteccr, LTECCR_CLEAR);
>>> +	out_be32(&lbc->ltedr, LTEDR_ENABLE);
>>> +
>>> +	/* Enable interrupts for any detected events */
>>> +	out_be32(&lbc->lteir, LTEIR_ENABLE);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * NOTE: This interrupt is used to report localbus events of various kinds,
>>> + * such as transaction errors on the chipselects.
>>> + */
>>> +
>>> +static irqreturn_t fsl_lbc_ctrl_irq(int irqno, void *data)
>>> +{
>>> +	struct fsl_lbc_ctrl *ctrl = data;
>>> +	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
>>> +	u32 status;
>>> +
>>> +	status = in_be32(&lbc->ltesr);
>>> +	if (!status)
>>> +		return IRQ_NONE;
>>> +
>>> +	out_be32(&lbc->ltesr, LTESR_CLEAR);
>>> +	out_be32(&lbc->lteatr, 0);
>>> +	out_be32(&lbc->ltear, 0);
>>> +	ctrl->irq_status = status;
>>> +
>>> +	if (status & LTESR_BM)
>>> +		dev_err(ctrl->dev, "Local bus monitor time-out: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	if (status & LTESR_WP)
>>> +		dev_err(ctrl->dev, "Write protect error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	if (status & LTESR_ATMW)
>>> +		dev_err(ctrl->dev, "Atomic write error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	if (status & LTESR_ATMR)
>>> +		dev_err(ctrl->dev, "Atomic read error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	if (status & LTESR_CS)
>>> +		dev_err(ctrl->dev, "Chip select error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	if (status & LTESR_UPM)
>>> +		;
>>> +	if (status & LTESR_FCT) {
>>> +		dev_err(ctrl->dev, "FCM command time-out: "
>>> +			"LTESR 0x%08X\n", status);
>>> +		smp_wmb();
>>> +		wake_up(&ctrl->irq_wait);
>>> +	}
>>> +	if (status & LTESR_PAR) {
>>> +		dev_err(ctrl->dev, "Parity or Uncorrectable ECC error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +		smp_wmb();
>>> +		wake_up(&ctrl->irq_wait);
>>> +	}
>>> +	if (status & LTESR_CC) {
>>> +		smp_wmb();
>>> +		wake_up(&ctrl->irq_wait);
>>> +	}
>>> +	if (status & ~LTESR_MASK)
>>> +		dev_err(ctrl->dev, "Unknown error: "
>>> +			"LTESR 0x%08X\n", status);
>>> +	return IRQ_HANDLED;
>>> +}
>>> +
>>> +/*
>>> + * fsl_lbc_ctrl_probe
>>> + *
>>> + * called by device layer when it finds a device matching
>>> + * one our driver can handled. This code allocates all of
>>> + * the resources needed for the controller only.  The
>>> + * resources for the NAND banks themselves are allocated
>>> + * in the chip probe function.
>>> +*/
>>> +
>>> +static int __devinit fsl_lbc_ctrl_probe(struct platform_device *dev)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (!dev->dev.of_node) {
>>> +		dev_err(&dev->dev, "Device OF-Node is NULL");
>>> +		return -EFAULT;
>>> +	}
>>> +
>>> +	fsl_lbc_ctrl_dev = kzalloc(sizeof(*fsl_lbc_ctrl_dev), GFP_KERNEL);
>>> +	if (!fsl_lbc_ctrl_dev)
>>> +		return -ENOMEM;
>>> +
>>> +	dev_set_drvdata(&dev->dev, fsl_lbc_ctrl_dev);
>>> +
>>> +	spin_lock_init(&fsl_lbc_ctrl_dev->lock);
>>> +	init_waitqueue_head(&fsl_lbc_ctrl_dev->irq_wait);
>>> +
>>> +	fsl_lbc_ctrl_dev->regs = of_iomap(dev->dev.of_node, 0);
>>> +	if (!fsl_lbc_ctrl_dev->regs) {
>>> +		dev_err(&dev->dev, "failed to get memory region\n");
>>> +		ret = -ENODEV;
>>> +		goto err;
>> Looks you always iounmap(fsl_lbc_ctrl_dev->regs) on position 'err' but here
>> of_iomap() is already failed you should skip iounmap() fsl_lbc_ctrl_dev->regs
>> again. So you should improve that as the following on 'err', or layout 'err'
>> in
>> gain.
>> ------
>> 	if(fsl_lbc_ctrl_dev->regs)
>> 		iounmap(fsl_lbc_ctrl_dev->regs);
>>
>> Tiejun
> 
> You are right!
> How about
>  
>         if (!fsl_lbc_ctrl_dev->regs) {
>                 dev_err(&dev->dev, "failed to get memory region\n");
>                 kfree(fsl_lbc_ctrl_dev);
>                 return -ENOMEM;
>         }

Although this is a big problem, I prefer to return 'ENXIO' :)

Others are fine to me.

Tiejun

> 
> ...
> 
> Thanks.
> Roy

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox