Linux-ARM-Kernel Archive on lore.kernel.org
* [RFC] MMC: error handling improvements
From: Pawel Moll @ 2011-02-16 19:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20110215230311.GT4152@n2100.arm.linux.org.uk>

Hi,

> The adaptive clock
> rate algorithm can probably do with a lot more work to avoid it up-
> clocking to a rate which has proven to never work.  I'd actually go as
> far as to say that the algorithm probably has a lot to be desired - but
> it seems to work for my test scenarios.

I've just given it a try (on top of a clean 2.6.38-rc5):


/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 2.922722 seconds, 437.9KB/s
/ # cat /dev/sda > /dev/null &
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /2 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /4 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /8 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /16 clock rate
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.763456 seconds, 27.4KB/s
/ # kill %1
/ # 
[1]+  Terminated                 cat /dev/sda 1>/dev/null
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.539866 seconds, 27.5KB/s
/ # sleep 30
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.540215 seconds, 27.5KB/s


So it does the right thing by decreasing the clock rate in the face of
problems; I just can't see it clocking back up...

Cheers!

Paweł
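
The log above shows the divider doubling on each error and then staying at /16. One way to clock back up, sketched here as a small user-space model and not as the code in the RFC, is to count consecutive successful transfers and step one divider faster once a threshold is reached. The names `struct clk_backoff`, `backoff_on_error`, `backoff_on_success` and the threshold `UPCLOCK_AFTER` are hypothetical.

```c
#include <assert.h>

#define MAX_DIV        16  /* slowest divider seen in the log above (/16) */
#define UPCLOCK_AFTER  8   /* hypothetical: successes needed before speeding up */

struct clk_backoff {
	unsigned int div;       /* current clock divider: 1, 2, 4, ... */
	unsigned int good_runs; /* consecutive successful transfers */
};

static void backoff_init(struct clk_backoff *b)
{
	b->div = 1;
	b->good_runs = 0;
}

/* A transfer failed: double the divider (halve the clock), up to MAX_DIV. */
static void backoff_on_error(struct clk_backoff *b)
{
	assert(b->div >= 1);
	if (b->div < MAX_DIV)
		b->div *= 2;
	b->good_runs = 0;
}

/* A transfer succeeded: after enough successes in a row, try one step faster. */
static void backoff_on_success(struct clk_backoff *b)
{
	if (b->div == 1)
		return; /* already at full speed */
	if (++b->good_runs >= UPCLOCK_AFTER) {
		b->div /= 2;
		b->good_runs = 0;
	}
}
```

This model does not remember rates that have already failed, so it would keep probing a rate that never works; a real implementation would probably want to record that as well.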


* [PATCH] ARM: gic: use handle_fasteoi_irq for SPIs
From: Thomas Gleixner @ 2011-02-16 19:20 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <AANLkTimPDNv+2n2-QHtBq0X+Vkj21yQCPh8iKZkYjfqG@mail.gmail.com>

On Wed, 16 Feb 2011, Rabin Vincent wrote:
> On Wed, Feb 16, 2011 at 21:47, Will Deacon <will.deacon@arm.com> wrote:
> > Ah yes, thanks for the explanation. After looking at the plat-omap code
> > I finally understand what's going on and I can't help but feel that the
> > chained GPIO handlers are terminally broken! The generic irq chip high-level
> > handlers (handle_{edge,level}_irq for example) at least check to see if
> > the irq_chip functions are non-NULL before calling them.
> >
> > Ideally, the chained handler would be able to query the irq_chip to find
> > out what types of IRQ flow-control it supports and then assume that behaviour.
> 
> Thomas, suggestions on how best to handle this?  (Some of these chained
> handlers are the ones in plat-omap/gpio.c, plat-nomadik/gpio.c, and
> mach-s5pv310/irq-combiner.c.)

I'm not much of a fan of those chained handlers. They work fine, when
they are tied into the irq_chip implementation of a SoC where the
chained handler code is 1:1 related to that irq_chip.

Once you start assigning those handlers somewhere else or even using
the same handler for different underlying primary irq chips, then it's
a lost cause and wreckage like this is just lurking around the corner.

The only sane way to deal with this is to install a regular interrupt
handler with request/setup_irq() and do the demultiplexing from
there. That way the demux handler does not have to worry about the
underlying primary chip at all. It does not have to worry whether this
chip uses level, fasteoi, edge or whatever. It just works.

The runtime overhead of going through that path is minimal, and avoiding
it is really not worth the pain.

Thanks,

	tglx
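
The demultiplexing approach described above can be sketched in user space as follows. This is a model under stated assumptions, not kernel code: `struct demux`, `demux_irq` and `count_hit` are hypothetical names, and the `pending` field stands in for the multiplexer's status register. In the kernel the equivalent function would be an ordinary handler installed with request_irq() on the parent interrupt, so the flow handling of the primary chip (level, fasteoi, edge) stays in the core.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CHILD_IRQS 32

typedef void (*child_handler_t)(unsigned int child_irq, void *data);

struct demux {
	uint32_t pending;                       /* stand-in for the mux status register */
	child_handler_t handler[NR_CHILD_IRQS]; /* one handler slot per child interrupt */
	void *data[NR_CHILD_IRQS];              /* cookie passed to each handler */
};

/*
 * Regular handler for the parent interrupt. It only looks at the
 * multiplexer's own status and never touches the primary irq chip.
 * Returns the number of child interrupts dispatched (0: not ours).
 */
static int demux_irq(struct demux *d)
{
	uint32_t status = d->pending;
	unsigned int bit;
	int handled = 0;

	d->pending = 0; /* acknowledge everything we are about to service */

	for (bit = 0; bit < NR_CHILD_IRQS; bit++) {
		if (!(status & (UINT32_C(1) << bit)))
			continue;
		if (d->handler[bit]) {
			d->handler[bit](bit, d->data[bit]);
			handled++;
		}
	}
	return handled;
}

/* Example child handler: counts how often it ran. */
static void count_hit(unsigned int child_irq, void *data)
{
	(void)child_irq;
	++*(int *)data;
}
```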


* [RFC] MMC: error handling improvements
From: David Brown @ 2011-02-16 19:28 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <AANLkTikpdwfM19oirENbOhnz8Rh40ZnMXByGX_QKdu-V@mail.gmail.com>

On Wed, Feb 16 2011, Linus Walleij wrote:

> 2011/2/16 David Brown <davidb@codeaurora.org>:

>> It's also possible this is finding problems in our SDCC driver.
>
> The SDCC is obviously an MMCI derivative, VHDL hacking
> on top of ARM's source code for PL180/PL181.
>
> Why do you insist on maintaining a forked driver?

Well, it's not me insisting on it.  I'll let the maintainers of the
driver chime in.

The changes we made to the block are significant, but even beyond that
we changed how the block is even accessed.  The driver doesn't directly
access the registers of the controller, but all accesses go through a
custom DMA engine.

> Please consider switching to using mmci.c like everyone else.
> The quirks we have in place for U300, Nomadik and Ux500
> should show you the way for how to do this (yes we did the
> same thing, hacking the ARM VHDL).

I suspect the changes to mmci would be fairly drastic.

> If I remember correctly I could even see that some early
> Android sources were using Russell's mmci.c driver before this
> fork was created.

These old drivers are also not usable.  The SDCC block is shared between
the modem processor and the processor running Linux.  If the driver
doesn't go through the DMA engine, which coordinates this, the registers
will be stomped on by the other CPU whenever it decides to access its
parts of the flash device.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


* [PATCH] ARM: reenable DEBUG_SECTION_MISMATCH
From: Uwe Kleine-König @ 2011-02-16 19:33 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297876993-4146-1-git-send-email-u.kleine-koenig@pengutronix.de>

On Wed, Feb 16, 2011 at 06:23:13PM +0100, Uwe Kleine-König wrote:
> For 2.6.38-rc4-git9 only 29 out of 133 defconfigs still produce section
> mismatches.  These defconfigs produce 55 mismatches (weighted sum, so
> maybe fewer unique mismatches).
> 
> This is in my opinion enough to start scaring people about the remaining
> problems.
@Vincent: Additionally it would be great if kautobuild would recognise
these as warnings.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* [PATCH v3 0/2] OMAP: IOMMU fault callback support
From: David Cohen @ 2011-02-16 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

This patch set adapts the current (*isr)() to be used as a fault callback.
IOMMU faults can be very difficult to reproduce, and it is then hard to
figure out the source of the problem. Currently the IOMMU driver prints a
not very useful debug message and does not notify the user about the issue.
With a fault callback, the IOMMU user can print much more useful debug
information and/or react to get back to a valid state.

Br,

David
---

David Cohen (2):
  OMAP2+: IOMMU: don't print fault warning on specific layer
  OMAP: IOMMU: add support to callback during fault handling

 arch/arm/mach-omap2/iommu2.c            |   33 +++++++++-----------
 arch/arm/plat-omap/include/plat/iommu.h |   14 ++++++++-
 arch/arm/plat-omap/iommu.c              |   52 ++++++++++++++++++++++---------
 3 files changed, 65 insertions(+), 34 deletions(-)

-- 
1.7.2.3


* [PATCH v3 1/2] OMAP2+: IOMMU: don't print fault warning on specific layer
From: David Cohen @ 2011-02-16 19:35 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297884951-3019-1-git-send-email-dacohen@gmail.com>

The IOMMU upper layer and its user are responsible for handling a fault
and deciding whether it ends up as an error or not. The OMAP2+ specific
layer should not print anything in that case.

Signed-off-by: David Cohen <dacohen@gmail.com>
---
 arch/arm/mach-omap2/iommu2.c |   16 ----------------
 1 files changed, 0 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-omap2/iommu2.c b/arch/arm/mach-omap2/iommu2.c
index 14ee686..49a1e5e 100644
--- a/arch/arm/mach-omap2/iommu2.c
+++ b/arch/arm/mach-omap2/iommu2.c
@@ -145,15 +145,7 @@ static void omap2_iommu_set_twl(struct iommu *obj, bool on)
 
 static u32 omap2_iommu_fault_isr(struct iommu *obj, u32 *ra)
 {
-	int i;
 	u32 stat, da;
-	const char *err_msg[] =	{
-		"tlb miss",
-		"translation fault",
-		"emulation miss",
-		"table walk fault",
-		"multi hit fault",
-	};
 
 	stat = iommu_read_reg(obj, MMU_IRQSTATUS);
 	stat &= MMU_IRQ_MASK;
@@ -163,14 +155,6 @@ static u32 omap2_iommu_fault_isr(struct iommu *obj, u32 *ra)
 	da = iommu_read_reg(obj, MMU_FAULT_AD);
 	*ra = da;
 
-	dev_err(obj->dev, "%s:\tda:%08x ", __func__, da);
-
-	for (i = 0; i < ARRAY_SIZE(err_msg); i++) {
-		if (stat & (1 << i))
-			printk("%s ", err_msg[i]);
-	}
-	printk("\n");
-
 	iommu_write_reg(obj, stat, MMU_IRQSTATUS);
 
 	return stat;
-- 
1.7.2.3


* [PATCH v3 2/2] OMAP: IOMMU: add support to callback during fault handling
From: David Cohen @ 2011-02-16 19:35 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297884951-3019-1-git-send-email-dacohen@gmail.com>

Add support to register an isr for IOMMU fault situations and adapt it
to allow such (*isr)() to be used as fault callback. Drivers using IOMMU
module might want to be informed when errors happen in order to debug it
or react.

Signed-off-by: David Cohen <dacohen@gmail.com>
---
 arch/arm/mach-omap2/iommu2.c            |   17 +++++++++-
 arch/arm/plat-omap/include/plat/iommu.h |   14 ++++++++-
 arch/arm/plat-omap/iommu.c              |   52 ++++++++++++++++++++++---------
 3 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/arch/arm/mach-omap2/iommu2.c b/arch/arm/mach-omap2/iommu2.c
index 49a1e5e..adb083e 100644
--- a/arch/arm/mach-omap2/iommu2.c
+++ b/arch/arm/mach-omap2/iommu2.c
@@ -146,18 +146,31 @@ static void omap2_iommu_set_twl(struct iommu *obj, bool on)
 static u32 omap2_iommu_fault_isr(struct iommu *obj, u32 *ra)
 {
 	u32 stat, da;
+	u32 errs = 0;
 
 	stat = iommu_read_reg(obj, MMU_IRQSTATUS);
 	stat &= MMU_IRQ_MASK;
-	if (!stat)
+	if (!stat) {
+		*ra = 0;
 		return 0;
+	}
 
 	da = iommu_read_reg(obj, MMU_FAULT_AD);
 	*ra = da;
 
+	if (stat & MMU_IRQ_TLBMISS)
+		errs |= OMAP_IOMMU_ERR_TLB_MISS;
+	if (stat & MMU_IRQ_TRANSLATIONFAULT)
+		errs |= OMAP_IOMMU_ERR_TRANS_FAULT;
+	if (stat & MMU_IRQ_EMUMISS)
+		errs |= OMAP_IOMMU_ERR_EMU_MISS;
+	if (stat & MMU_IRQ_TABLEWALKFAULT)
+		errs |= OMAP_IOMMU_ERR_TBLWALK_FAULT;
+	if (stat & MMU_IRQ_MULTIHITFAULT)
+		errs |= OMAP_IOMMU_ERR_MULTIHIT_FAULT;
 	iommu_write_reg(obj, stat, MMU_IRQSTATUS);
 
-	return stat;
+	return errs;
 }
 
 static void omap2_tlb_read_cr(struct iommu *obj, struct cr_regs *cr)
diff --git a/arch/arm/plat-omap/include/plat/iommu.h b/arch/arm/plat-omap/include/plat/iommu.h
index 19cbb5e..174f1b9 100644
--- a/arch/arm/plat-omap/include/plat/iommu.h
+++ b/arch/arm/plat-omap/include/plat/iommu.h
@@ -31,6 +31,7 @@ struct iommu {
 	struct clk	*clk;
 	void __iomem	*regbase;
 	struct device	*dev;
+	void		*isr_priv;
 
 	unsigned int	refcount;
 	struct mutex	iommu_lock;	/* global for this whole object */
@@ -47,7 +48,7 @@ struct iommu {
 	struct list_head	mmap;
 	struct mutex		mmap_lock; /* protect mmap */
 
-	int (*isr)(struct iommu *obj);
+	int (*isr)(struct iommu *obj, u32 da, u32 iommu_errs, void *priv);
 
 	void *ctx; /* iommu context: registres saved area */
 	u32 da_start;
@@ -109,6 +110,13 @@ struct iommu_platform_data {
 	u32 da_end;
 };
 
+/* IOMMU errors */
+#define OMAP_IOMMU_ERR_TLB_MISS		(1 << 0)
+#define OMAP_IOMMU_ERR_TRANS_FAULT	(1 << 1)
+#define OMAP_IOMMU_ERR_EMU_MISS		(1 << 2)
+#define OMAP_IOMMU_ERR_TBLWALK_FAULT	(1 << 3)
+#define OMAP_IOMMU_ERR_MULTIHIT_FAULT	(1 << 4)
+
 #if defined(CONFIG_ARCH_OMAP1)
 #error "iommu for this processor not implemented yet"
 #else
@@ -161,6 +169,10 @@ extern size_t iopgtable_clear_entry(struct iommu *obj, u32 iova);
 extern int iommu_set_da_range(struct iommu *obj, u32 start, u32 end);
 extern struct iommu *iommu_get(const char *name);
 extern void iommu_put(struct iommu *obj);
+extern int iommu_set_isr(const char *name,
+			 int (*isr)(struct iommu *obj, u32 da, u32 iommu_errs,
+				    void *priv),
+			 void *isr_priv);
 
 extern void iommu_save_ctx(struct iommu *obj);
 extern void iommu_restore_ctx(struct iommu *obj);
diff --git a/arch/arm/plat-omap/iommu.c b/arch/arm/plat-omap/iommu.c
index f55f458..b0e0efc 100644
--- a/arch/arm/plat-omap/iommu.c
+++ b/arch/arm/plat-omap/iommu.c
@@ -780,25 +780,19 @@ static void iopgtable_clear_entry_all(struct iommu *obj)
  */
 static irqreturn_t iommu_fault_handler(int irq, void *data)
 {
-	u32 stat, da;
+	u32 da, errs;
 	u32 *iopgd, *iopte;
-	int err = -EIO;
 	struct iommu *obj = data;
 
 	if (!obj->refcount)
 		return IRQ_NONE;
 
-	/* Dynamic loading TLB or PTE */
-	if (obj->isr)
-		err = obj->isr(obj);
-
-	if (!err)
-		return IRQ_HANDLED;
-
 	clk_enable(obj->clk);
-	stat = iommu_report_fault(obj, &da);
+	errs = iommu_report_fault(obj, &da);
 	clk_disable(obj->clk);
-	if (!stat)
+
+	/* Fault callback or TLB/PTE Dynamic loading */
+	if (obj->isr && !obj->isr(obj, da, errs, obj->isr_priv))
 		return IRQ_HANDLED;
 
 	iommu_disable(obj);
@@ -806,15 +800,16 @@ static irqreturn_t iommu_fault_handler(int irq, void *data)
 	iopgd = iopgd_offset(obj, da);
 
 	if (!iopgd_is_table(*iopgd)) {
-		dev_err(obj->dev, "%s: da:%08x pgd:%p *pgd:%08x\n", obj->name,
-			da, iopgd, *iopgd);
+		dev_err(obj->dev, "%s: errs:0x%08x da:0x%08x pgd:0x%p "
+			"*pgd:0x%08x\n", obj->name, errs, da, iopgd, *iopgd);
 		return IRQ_NONE;
 	}
 
 	iopte = iopte_offset(iopgd, da);
 
-	dev_err(obj->dev, "%s: da:%08x pgd:%p *pgd:%08x pte:%p *pte:%08x\n",
-		obj->name, da, iopgd, *iopgd, iopte, *iopte);
+	dev_err(obj->dev, "%s: errs:0x%08x da:0x%08x pgd:0x%p *pgd:0x%08x "
+		"pte:0x%p *pte:0x%08x\n", obj->name, errs, da, iopgd, *iopgd,
+		iopte, *iopte);
 
 	return IRQ_NONE;
 }
@@ -917,6 +912,33 @@ void iommu_put(struct iommu *obj)
 }
 EXPORT_SYMBOL_GPL(iommu_put);
 
+int iommu_set_isr(const char *name,
+		  int (*isr)(struct iommu *obj, u32 da, u32 iommu_errs,
+			     void *priv),
+		  void *isr_priv)
+{
+	struct device *dev;
+	struct iommu *obj;
+
+	dev = driver_find_device(&omap_iommu_driver.driver, NULL, (void *)name,
+				 device_match_by_alias);
+	if (!dev)
+		return -ENODEV;
+
+	obj = to_iommu(dev);
+	mutex_lock(&obj->iommu_lock);
+	if (obj->refcount != 0) {
+		mutex_unlock(&obj->iommu_lock);
+		return -EBUSY;
+	}
+	obj->isr = isr;
+	obj->isr_priv = isr_priv;
+	mutex_unlock(&obj->iommu_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_set_isr);
+
 /*
  *	OMAP Device MMU(IOMMU) detection
  */
-- 
1.7.2.3
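
As an illustration of how a client might use the callback added by this patch, the following user-space sketch decodes the error bitmask into the same strings the OMAP2+ layer used to print. The `OMAP_IOMMU_ERR_*` values and the callback prototype are taken from the patch; `struct fault_log`, `decode_iommu_errs` and `my_fault_cb` are hypothetical names, and `struct iommu` is left opaque because the real definition lives in plat/iommu.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
struct iommu; /* opaque in this sketch */

/* Error bits as defined by the patch above. */
#define OMAP_IOMMU_ERR_TLB_MISS		(1 << 0)
#define OMAP_IOMMU_ERR_TRANS_FAULT	(1 << 1)
#define OMAP_IOMMU_ERR_EMU_MISS		(1 << 2)
#define OMAP_IOMMU_ERR_TBLWALK_FAULT	(1 << 3)
#define OMAP_IOMMU_ERR_MULTIHIT_FAULT	(1 << 4)

/* Hypothetical per-client state handed back through the priv pointer. */
struct fault_log {
	u32 last_da;
	char last_errs[96];
};

/* Turn the error bitmask into a readable, comma-separated list. */
static void decode_iommu_errs(u32 errs, char *buf, size_t len)
{
	static const char *const names[] = {
		"tlb miss", "translation fault", "emulation miss",
		"table walk fault", "multi hit fault",
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(errs & (1u << i)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, ", ", len - strlen(buf) - 1);
		strncat(buf, names[i], len - strlen(buf) - 1);
	}
}

/*
 * Same prototype as the (*isr)() member in the patch. Returning 0 tells
 * iommu_fault_handler() the fault was dealt with; any other value lets it
 * fall through to its generic error reporting.
 */
static int my_fault_cb(struct iommu *obj, u32 da, u32 iommu_errs, void *priv)
{
	struct fault_log *log = priv;

	(void)obj;
	log->last_da = da;
	decode_iommu_errs(iommu_errs, log->last_errs, sizeof(log->last_errs));
	return -1; /* not recovered here: let the core report the fault */
}
```

In the kernel such a callback would be registered with iommu_set_isr() from this patch. Since that function returns -EBUSY once the refcount is non-zero, registration presumably has to happen before the IOMMU is taken with iommu_get().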


* [PATCH v2] ARM: vfp: Always save VFP state in vfp_pm_suspend
From: Colin Cross @ 2011-02-16 19:36 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20110215170336.GP4152@n2100.arm.linux.org.uk>

On Tue, Feb 15, 2011 at 9:03 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Feb 14, 2011 at 02:55:47PM -0800, Colin Cross wrote:
>> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
>> index 66bf8d1..7231d18 100644
>> --- a/arch/arm/vfp/vfpmodule.c
>> +++ b/arch/arm/vfp/vfpmodule.c
>> @@ -415,13 +415,13 @@ static int vfp_pm_suspend(struct sys_device *dev, pm_message_t state)
>>       struct thread_info *ti = current_thread_info();
>>       u32 fpexc = fmrx(FPEXC);
>>
>> -     /* if vfp is on, then save state for resumption */
>> -     if (fpexc & FPEXC_EN) {
>> +     /* save state for resume */
>> +     if (last_VFP_context[ti->cpu]) {
>
> I'm not entirely happy with this.
>
> It is true that last_VFP_context[] when non-NULL indicates who owns the
> hardware VFP state, so saving it would seem logical.  However, this new
> code now saves the state with the saved fpexc indicating that it's disabled.
>
> This will cause a VFP exception to misbehave by reloading the state, and
> then disabling the VFP unit.  That will cause another VFP exception which
> will find the VFP unit disabled, and re-enable it.  All in all, this is
> rather wasteful.
>
> So...
>        /* If lazy disable, re-enable the VFP ready for it to be saved */
>        if (last_VFP_context[ti->cpu] != &ti->vfpstate) {
>                fpexc |= FPEXC_EN;
>                fmxr(FPEXC, fpexc);
>        }
>        /* If VFP is on, then save state for resumption */
>        if (fpexc & FPEXC_EN) {
>                ...

I think v2 of the patch handles this case correctly:
	/* save state for resume */
	if (last_VFP_context[ti->cpu]) {
		printk(KERN_DEBUG "%s: saving vfp state\n", __func__);
		fmxr(FPEXC, fpexc | FPEXC_EN);
		vfp_save_state(last_VFP_context[ti->cpu], fpexc);
		last_VFP_context[ti->cpu] = NULL;
		fmxr(FPEXC, fpexc & ~FPEXC_EN);
	}

This version enables the VFP if it was not enabled, but saves the
original fpexc value.


* [PATCH v2] OMAP: PM: DMA: Enable runtime pm
From: Kevin Hilman @ 2011-02-16 19:47 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20110216140535.GA15484@m-desktop>

"G, Manjunath Kondaiah" <manjugk@ti.com> writes:

> Hi Kevin,
>
> On Mon, Feb 14, 2011 at 02:06:53PM -0800, Kevin Hilman wrote:
>> "G, Manjunath Kondaiah" <manjugk@ti.com> writes:
>> 
>> > From: Manjunath G Kondaiah <manjugk@ti.com>
>> >
>> > Enable runtime pm and use pm_runtime_get_sync and pm_runtime_put_autosuspend
>> > for OMAP DMA driver.
>> >
>> > The DMA driver uses auto suspend feature of runtime pm framework through
>> > which the clock gets disabled automatically if there is no activity for
>> > more than one second.
>> >
>> > Testing:
>> > Compile: omap1_defconfig and omap2plus_defconfig
>> > Boot: OMAP1710(H3), OMAP2420(H4), OMAP3630(Zoom3), OMAP4(Blaze)
>> 
>> The normal DMA tests should also be run on these platforms.  Based on
>> the above, I can't tell any DMA tests were run.   Based on my tests,
>> this isn't working for chained xfers.
>> 
>> Using the runtime PM sysfs interface, you can check the runtime status
>> of the device:
>> 
>> # cat /sys/devices/platform/omap/omap_dma_system.0/power/runtime_status          
>> 
>> It should show 'active' during transfer, and after timeout expires it
>> will show 'suspended'.
>> 
>> Doing some tests using my dmatest module:
>> 
>>   git://gitorious.org/omap-test/dmatest.git
>> 
>> I noticed that it gets stuck in 'active' and never gets suspended when I
>> used DMA channel linking (load module using 'linking=1' as load-time option)
>> 
>> I'm not sure exactly why, but I will guess that the reason is that there
>> is an imbalance in get/put calls when using chaining, since 'get' is
>> only called once upon omap_start_dma() but 'put' is called for every
>> channel in the callback.
>
> Even I noticed this after running chaining test case and checking
> runtime status. But, I am wondering even with 'active' runtime status, 
> the core hits off and retention.

Probably because system DMA is auto-idle and clocked by the core_l3_iclk

> The complete log which has all the sequences of running chaining tests,
> enabling off mode and checking runtime status is available at:
> http://pastebin.com/YEHMEXUP
>
> Though I agree on the point that, it is mismatch with get/put calls with
> DMA chaining, I still need to analyze this in detail.

Yes.  The mismatch highlights an underlying problem.

> The other thing which is not considered here is, the get_sync is called
> inside start_dma only(request_dma will call get_sync and put after the
> getting requested channel). After request_dma and start_dma, there are
> API's called by user(dma_set_params, priority etc) which also require
> get_sync since those API's will access configuration registers. I am
> wondering if have get_sync and put in all the API's, this might result
> in over loading. 

I'm not sure what you mean by over loading.

You need to have all register accesses inside get/put calls.  As long as
they are balanced, this should not lead to problems.
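
The balancing rule can be illustrated with a toy usage counter. This is a user-space model and not the runtime PM API: `struct toy_dev` and the `toy_*` functions are hypothetical, the autosuspend delay is not modelled, and in the driver the get and put roles would be played by the pm_runtime_get_sync() and pm_runtime_put_autosuspend() calls mentioned in the patch description.

```c
#include <assert.h>

/* Toy stand-in for a runtime-PM usage counter. */
struct toy_dev {
	int usage;        /* outstanding "get" references */
	int powered;      /* 1 while the device clock is on */
	unsigned int reg; /* pretend configuration register */
};

static void toy_get(struct toy_dev *d)
{
	if (d->usage++ == 0)
		d->powered = 1;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->usage > 0); /* an extra put is the imbalance to avoid */
	if (--d->usage == 0)
		d->powered = 0;
}

/* Every register access is wrapped in its own get/put pair. */
static void toy_set_param(struct toy_dev *d, unsigned int val)
{
	toy_get(d);
	assert(d->powered);
	d->reg = val;
	toy_put(d);
}

/* Chained transfer: one get per linked channel when the chain starts ... */
static void toy_start_chain(struct toy_dev *d, int nchan)
{
	int i;

	for (i = 0; i < nchan; i++)
		toy_get(d);
}

/* ... matched by one put as each channel's completion callback fires. */
static void toy_chan_done(struct toy_dev *d)
{
	toy_put(d);
}
```

With this pairing the counter returns to zero only after the last channel in the chain completes, and nested accessors such as toy_set_param() do not disturb it.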

>> 
>> > On zoom3 core retention is tested with following steps:
>> > echo 1 > /debug/pm_debug/sleep_while_idle
>> > echo 1 > /debug/pm_debug/enable_off_mode
>> > echo 5 > /sys/devices/platform/omap/omap_uart.0/sleep_timeout
>> > echo 5 > /sys/devices/platform/omap/omap_uart.1/sleep_timeout
>> > echo 5 > /sys/devices/platform/omap/omap_uart.2/sleep_timeout
>> > echo 5 > /sys/devices/platform/omap/omap_uart.3/sleep_timeout
>> >
>> > It is observed (on the pm branch) that the core retention count keeps
>> > increasing if the board is left idle for more than 5 seconds. However, it does
>> > not enter off mode (even without the DMA runtime changes).
>> 
>> What silicon rev is on your Zoom3?  
> It's 3630 ES1.0. 
>> Mainline kernels now disable core off-mode for 3630 revs < ES2.1 due to
>> erratum i583.
>> 
>> If this happens, you should see something like this on the console:
>> 
>>            Core OFF disabled due to errata i583
>> 
> We can observe above message in mainline after enabling cpu idle in
> omap2plus_defconfig.
>
> I switched to zoom2 and was able to hit core retention and
> off mode with mainline.

OK, good. 

Thanks for clarifying.


Kevin


* [PATCH] ARM: reenable DEBUG_SECTION_MISMATCH
From: Sam Ravnborg @ 2011-02-16 20:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297876993-4146-1-git-send-email-u.kleine-koenig@pengutronix.de>

On Wed, Feb 16, 2011 at 06:23:13PM +0100, Uwe Kleine-König wrote:
> For 2.6.38-rc4-git9 only 29 out of 133 defconfigs still produce section
> mismatches.  These defconfigs produce 55 mismatches (weighted sum, so
> maybe fewer unique mismatches).
> 
> This is in my opinion enough to start scaring people about the remaining
> problems.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  lib/Kconfig.debug |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 3967c23..1130dd4 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -102,7 +102,7 @@ config HEADERS_CHECK
>  
>  config DEBUG_SECTION_MISMATCH
>  	bool "Enable full Section mismatch analysis"
> -	depends on UNDEFINED || (BLACKFIN)
> +	depends on ARM || BLACKFIN
>  	default y
>  	# This option is on purpose disabled for now.
>  	# It will be enabled when we are down to a reasonable number

If we are deciding to make this ARCH dependent then a blacklist is
much preferred.

I know that we have had warnings lingering in the kernel for a long time.
But throwing these warnings in the face of people for each build will
be enough incentive to get most fixed.

Last I took a deeper look at this I had allyesconfig + allmodconfig
almost warning free on x86. But I recall that especially the HOTPLUG_CPU
stuff was tricky as they misuse the annotation.

We could try to enable it by default - and see if people scream too much.

	Sam


* [openmcapi-dev] Re: [RFC] Inter-processor Mailboxes Drivers
From: Blanchard, Hollis @ 2011-02-16 20:22 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20110216.082228.814509512977216497.Hiroshi.DOYU@nokia.com>

On 02/15/2011 10:22 PM, Hiroshi DOYU wrote:
> From: "ext Blanchard, Hollis"<Hollis_Blanchard@mentor.com>
> Subject: Re: [RFC] Inter-processor Mailboxes Drivers
> Date: Tue, 15 Feb 2011 15:38:25 -0800
>
>> On 02/15/2011 01:58 PM, Meador Inge wrote:
>>> On 02/14/2011 04:01 AM, Jamie Iles wrote:
>>>> On Fri, Feb 11, 2011 at 03:19:51PM -0600, Meador Inge wrote:
>>>>>       1. Hardware specific bits somewhere under '.../arch/*'.  Drivers
>>>>>          for the MPIC message registers on Power and OMAP4 mailboxes,
>>>>> for
>>>>>          example.
>>>>>       2. A higher level driver under '.../drivers/mailbox/*'.  That the
>>>>>          pieces in (1) would register with.  This piece would expose the
>>>>>          main kernel API.
>>>>>       3. Userspace interfaces for accessing the mailboxes.  A
>>>>>          '/dev/mailbox1', '/dev/mailbox2', etc... mapping, for example.
>>>> How about using virtio for all of this and having the mailbox as a
>>>> notification/message passing driver for the virtio backend?  There are
>>>> already virtio console and network drivers that could be useful for the
>>>> userspace part of it.  drivers/virtio/virtio_ring.c might be a good
>>>> starting point if you thought there was some mileage in this approach.
>>> To be honest, I am not that familiar with 'virtio', but I will take a
>>> look.  Thanks for the pointer.  Maybe Hollis can speak to this idea more.
>> My opinion is that virtio is (over?) complicated.
> Considering the case of omap mailbox H/W, it is just a simple one way
> 4 slot x 32bit H/W FIFO, I also may think that this may be a bit too
> much...
I think the proposal is to implement a virtio link using the OMAP 
mailboxes as the interrupt mechanism, and shared memory to carry the 
data and descriptor rings.

Hollis Blanchard
Mentor Graphics, Embedded Systems Division


* [PATCH v2 1/7] mmc: mxs-mmc: add mmc host driver for i.MX23/28
From: Shawn Guo @ 2011-02-16 20:28 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20110215172948.GE10770@pengutronix.de>

Hi Wolfram,

Thanks for testing.

On Tue, Feb 15, 2011 at 06:29:48PM +0100, Wolfram Sang wrote:
> On Tue, Feb 15, 2011 at 05:19:17PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Feb 15, 2011 at 06:13:41PM +0100, Wolfram Sang wrote:
> > > 
> > > > Ah, yes.  I can also see the problem here after turning on
> > > > DEBUG_SPINLOCK.
> > > 
> > > Ah, okay. After turning it off, it works a lot better :)
> > 
> > That doesn't mean the problem is fixed. [...]
> 
> Yes, I know that. I should have put the 'works' above in quotes, sorry.
> 
It's caused by spinlock recursion introduced by the mxs-dma functions
mxs_dma_tx_submit and mxs_dma_tasklet.  mmc_request_done is invoked
from the dma callback tasklet, and in some cases mmc_request_done
issues retries, which call back into mxs_dma_tx_submit.

I added the lock by following other dma driver implementations, but
now I'm considering removing the lock completely, as I do not see
any global data that needs to be protected there.  Comments?
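
To make the recursion concrete, here is a toy user-space model of it, together with one alternative to removing the lock: do the bookkeeping under the lock, then drop it before calling the client callback. The lock is modelled as a plain flag that asserts on double acquisition, and `struct toy_chan`, `toy_submit`, `toy_tasklet` and `retry_once` are hypothetical stand-ins for the channel, mxs_dma_tx_submit, mxs_dma_tasklet and the MMC retry path. Whether any data in the real driver actually needs the lock is the open question above; the sketch does not answer it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_chan {
	int locked;    /* non-recursive lock, modelled as a flag */
	int submitted; /* descriptors queued so far */
	void (*callback)(struct toy_chan *c);
};

static void toy_lock(struct toy_chan *c)
{
	assert(!c->locked); /* taking it twice would deadlock a real spinlock */
	c->locked = 1;
}

static void toy_unlock(struct toy_chan *c)
{
	assert(c->locked);
	c->locked = 0;
}

/* Plays the role of the submit function: takes the channel lock. */
static void toy_submit(struct toy_chan *c)
{
	toy_lock(c);
	c->submitted++;
	toy_unlock(c);
}

/*
 * Plays the role of the completion tasklet: read the state under the
 * lock, release it, and only then call into the client.
 */
static void toy_tasklet(struct toy_chan *c)
{
	void (*cb)(struct toy_chan *);

	toy_lock(c);
	cb = c->callback;
	toy_unlock(c);

	if (cb)
		cb(c); /* may call toy_submit() again, e.g. for a retry */
}

/* Example client callback that retries once by resubmitting. */
static void retry_once(struct toy_chan *c)
{
	c->callback = NULL; /* no further retries */
	toy_submit(c);
}
```

If toy_tasklet() called the callback while still holding the lock, the assert in toy_lock() would fire on the resubmit, which mirrors the recursion reported with DEBUG_SPINLOCK.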

> > > MMC fails for me (note: the card works fine with an mx35-based board)
> > > 
> > > 	mmc0: new high speed MMC card at address 0001
> > > 	mmcblk0: mmc0:0001 AF HMP 247 MiB 
> > > 	mmcblk0: retrying using single block read
> > > 	mmcblk0: error -84 transferring data, sector 0, nr 8, card status 0x900
> > > 	end_request: I/O error, dev mmcblk0, sector 0
> > 
> > EILSEQ means CRC failure.  Probably unrelated.
> 
> Even if it works in another setup? As a result of the above, I can't read the
> partition table of that MMC. The mx35 can do so.
> 
First of all, it's not a problem that the partition table of the card
cannot be read.  For example, every card I have gives the unknown
partition table message after running the mmc host driver test on it,
but the cards work fine.

I guess you will also get the unknown partition table message if you
test this card on mx35 right now.

I just tested 7 mmc cards in total.  6 cards work fine, and 1 card
(Transcend MMC plus 1GB) has exactly the same problem as yours.  And
if I remove the 8 bit cap, this card also works fine.  So I would
agree with Russell that it's unrelated to the driver.

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 MMC    3.75 GiB
 mmcblk1: unknown partition table
mmc1: card 0001 removed

mmc1: new MMC card at address 0001
mmcblk1: mmc1:0001 SMIMMC 122 MiB
 mmcblk1: unknown partition table
mmc1: card 0001 removed

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 NCard  967 MiB
 mmcblk1: p1
mmc1: card 0001 removed

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 MMC512 483 MiB
 mmcblk1: p1 < p5 >
mmc1: card 0001 removed

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 000000 980 MiB
 mmcblk1: unknown partition table
mmc1: card 0001 removed

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 AF HMP 490 MiB
 mmcblk1: p1 < p5 >
mmc1: card 0001 removed

mmc1: new high speed MMC card at address 0001
mmcblk1: mmc1:0001 MMC    967 MiB
mmcblk1: retrying using single block read
mmcblk1: error -84 transferring data, sector 0, nr 8, card status 0x900
end_request: I/O error, dev mmcblk1, sector 0
mmcblk1: error -84 transferring data, sector 1, nr 7, card status 0x900
end_request: I/O error, dev mmcblk1, sector 1
mmcblk1: error -84 transferring data, sector 2, nr 6, card status 0x900
end_request: I/O error, dev mmcblk1, sector 2
...

> > > SDIO card locks the machine. Is it supposed to work already?
> > 
> > Guess it's the spinlock causing that problem.
> 
> Yeah, that could be. In addition, I was just generally interested if SDIO has
> been tested by Shawn.
> 
I tested SDIO, but probably in a different way from yours.  I had
two card slots on my board, rootfs on mmc0 and the SDIO card on mmc1.
It seems to work fine this way.  However, when I use nfs and test
SDIO on mmc0, my system hangs too.  I will look into it.

-- 
Regards,
Shawn


* [PATCH 12/21] ARM: tegra: clock: Add shared bus clock type
From: Stephen Boyd @ 2011-02-16 20:34 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297590033-15035-13-git-send-email-ccross@android.com>

On 02/13/2011 01:40 AM, Colin Cross wrote:
> +/* shared bus ops */
> +/*
> + * Some clocks may have multiple downstream users that need to request a
> + * higher clock rate.  Shared bus clocks provide a unique shared_bus_user
> + * clock to each user.  The frequency of the bus is set to the highest
> + * enabled shared_bus_user clock, with a minimum value set by the
> + * shared bus.
> + */
> +static void tegra_clk_shared_bus_update(struct clk *bus)
> +{
> +	struct clk *c;
> +	unsigned long rate = bus->min_rate;
> +
> +	list_for_each_entry(c, &bus->shared_bus_list, u.shared_bus_user.node)
> +		if (c->u.shared_bus_user.enabled)
> +			rate = max(c->u.shared_bus_user.rate, rate);
> +
> +	if (rate != clk_get_rate(bus))
> +		clk_set_rate(bus, rate);

What do you do if clk_set_rate() fails? Should you unwind all the state
such as the rate and if it's enabled/disabled? Or is it safe to say
clk_set_rate() can't fail unless the kernel is buggy in which case why
aren't all those return -E* in the set rate functions just BUG_ONs?

> +};
> +
> +static void tegra_clk_shared_bus_init(struct clk *c)
> +{
> +	c->max_rate = c->parent->max_rate;
> +	c->u.shared_bus_user.rate = c->parent->max_rate;
> +	c->state = OFF;
> +#ifdef CONFIG_DEBUG_FS
> +	c->set = 1;
> +#endif
> +
> +	list_add_tail(&c->u.shared_bus_user.node,
> +		&c->parent->shared_bus_list);
> +}
> +
> +static int tegra_clk_shared_bus_set_rate(struct clk *c, unsigned long rate)
> +{
> +	c->u.shared_bus_user.rate = rate;
> +	tegra_clk_shared_bus_update(c->parent);
> +	return 0;
> +}
> +
> +static long tegra_clk_shared_bus_round_rate(struct clk *c, unsigned long rate)
> +{
> +	return clk_round_rate(c->parent, rate);
> +}
> +
> +static int tegra_clk_shared_bus_enable(struct clk *c)
> +{
> +	c->u.shared_bus_user.enabled = true;
> +	tegra_clk_shared_bus_update(c->parent);
> +	return 0;
> +}

Shouldn't you call clk_enable(c->parent)? And do you need to check for
errors from clk_enable()?

> +
> +static void tegra_clk_shared_bus_disable(struct clk *c)
> +{
> +	c->u.shared_bus_user.enabled = false;
> +	tegra_clk_shared_bus_update(c->parent);
> +}

And a similar clk_disable(c->parent) here.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
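
The rate selection in the quoted patch, and the error-handling question raised about it, can be sketched in user space as follows. `shared_bus_rate` mirrors the loop in tegra_clk_shared_bus_update: the highest rate among enabled users, never below the bus minimum. `shared_bus_update` is a hypothetical variant that propagates a failure instead of ignoring it; its `set_rate` parameter stands in for clk_set_rate(), and `struct bus_user`, `set_rate_ok` and `set_rate_fail` are illustration-only names.

```c
#include <assert.h>
#include <stddef.h>

struct bus_user {
	unsigned long rate; /* rate this user asked for */
	int enabled;        /* only enabled users get a vote */
};

/* Highest rate among enabled users, but never below the bus minimum. */
static unsigned long shared_bus_rate(const struct bus_user *users, size_t n,
				     unsigned long min_rate)
{
	unsigned long rate = min_rate;
	size_t i;

	for (i = 0; i < n; i++)
		if (users[i].enabled && users[i].rate > rate)
			rate = users[i].rate;
	return rate;
}

/*
 * Hypothetical update step that reports failure to the caller.
 * set_rate returns 0 on success, non-zero on error.
 */
static int shared_bus_update(const struct bus_user *users, size_t n,
			     unsigned long min_rate, unsigned long *cur_rate,
			     int (*set_rate)(unsigned long rate))
{
	unsigned long rate = shared_bus_rate(users, n, min_rate);
	int err;

	if (rate == *cur_rate)
		return 0; /* nothing to change */
	err = set_rate(rate);
	if (err)
		return err; /* *cur_rate is left untouched on failure */
	*cur_rate = rate;
	return 0;
}

/* Example back ends for set_rate. */
static int set_rate_ok(unsigned long rate)   { (void)rate; return 0; }
static int set_rate_fail(unsigned long rate) { (void)rate; return -1; }
```

Returning the error only helps callers that can act on it; a disable path that cannot fail would still have to decide what to do when the rate change triggered on its behalf does not succeed.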


* [PATCH 12/21] ARM: tegra: clock: Add shared bus clock type
From: Colin Cross @ 2011-02-16 21:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <4D5C34E0.5000209@codeaurora.org>

On Wed, Feb 16, 2011 at 12:34 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 02/13/2011 01:40 AM, Colin Cross wrote:
>> +/* shared bus ops */
>> +/*
>> + * Some clocks may have multiple downstream users that need to request a
>> + * higher clock rate.  Shared bus clocks provide a unique shared_bus_user
>> + * clock to each user.  The frequency of the bus is set to the highest
>> + * enabled shared_bus_user clock, with a minimum value set by the
>> + * shared bus.
>> + */
>> +static void tegra_clk_shared_bus_update(struct clk *bus)
>> +{
>> +	struct clk *c;
>> +	unsigned long rate = bus->min_rate;
>> +
>> +	list_for_each_entry(c, &bus->shared_bus_list, u.shared_bus_user.node)
>> +		if (c->u.shared_bus_user.enabled)
>> +			rate = max(c->u.shared_bus_user.rate, rate);
>> +
>> +	if (rate != clk_get_rate(bus))
>> +		clk_set_rate(bus, rate);
>
> What do you do if clk_set_rate() fails? Should you unwind all the state
> such as the rate and if it's enabled/disabled? Or is it safe to say
> clk_set_rate() can't fail unless the kernel is buggy in which case why
> aren't all those return -E* in the set rate functions just BUG_ONs?

In general, clk_set_rate can fail and return an error, but in this
case the failure may not be directly related to the driver that called
into tegra_clk_shared_bus_update.  For example, if clk_disable is
called on a shared clock handle, the rate may drop to the rate
requested by another shared clock handle.  clk_disable cannot fail, so
there's nothing that could be done with the return code, and the
problem was not caused by the driver that called clk_disable, so an
error would be meaningless.

I will modify tegra_clk_shared_bus_update to BUG on a failed
clk_set_rate, and modify tegra_clk_shared_bus_set_rate to
call clk_round_rate on the parent to ensure that the requested rate is valid.

>> +};
>> +
>> +static void tegra_clk_shared_bus_init(struct clk *c)
>> +{
>> +	c->max_rate = c->parent->max_rate;
>> +	c->u.shared_bus_user.rate = c->parent->max_rate;
>> +	c->state = OFF;
>> +#ifdef CONFIG_DEBUG_FS
>> +	c->set = 1;
>> +#endif
>> +
>> +	list_add_tail(&c->u.shared_bus_user.node,
>> +		&c->parent->shared_bus_list);
>> +}
>> +
>> +static int tegra_clk_shared_bus_set_rate(struct clk *c, unsigned long rate)
>> +{
>> +	c->u.shared_bus_user.rate = rate;
>> +	tegra_clk_shared_bus_update(c->parent);
>> +	return 0;
>> +}
>> +
>> +static long tegra_clk_shared_bus_round_rate(struct clk *c, unsigned long rate)
>> +{
>> +	return clk_round_rate(c->parent, rate);
>> +}
>> +
>> +static int tegra_clk_shared_bus_enable(struct clk *c)
>> +{
>> +	c->u.shared_bus_user.enabled = true;
>> +	tegra_clk_shared_bus_update(c->parent);
>> +	return 0;
>> +}
>
> Shouldn't you call clk_enable(c->parent)? And do you need to check for
> errors from clk_enable()?

clk_enable on the parent is handled by the clock op implementation in
mach-tegra/clock.c

>> +
>> +static void tegra_clk_shared_bus_disable(struct clk *c)
>> +{
>> +	c->u.shared_bus_user.enabled = false;
>> +	tegra_clk_shared_bus_update(c->parent);
>> +}
>
> And a similar clk_disable(c->parent) here.

Same for disable

^ permalink raw reply

* [RFC] MMC: error handling improvements
From: Linus Walleij @ 2011-02-16 21:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <8ya8vxf4w4w.fsf@huya.qualcomm.com>

2011/2/16 David Brown <davidb@codeaurora.org>:
> On Wed, Feb 16 2011, Linus Walleij wrote:
>> 2011/2/16 David Brown <davidb@codeaurora.org>:
>
>>> It's also possible this is finding problems in our SDCC driver.
>>
>> The SDCC is obviously an MMCI derivative, VHDL hacking
>> on top of ARM's source code for PL180/PL181.
>>
>> Why do you insist on maintaining a forked driver?
>
> Well, it's not me insisting on it.  I'll let the maintainers of the
> driver chime in.

Yeah OK. I tried writing them last week on linux-arm-kernel
with more or less the same question.

> The driver doesn't directly
> access the registers of the controller, but all accesses go through a
> custom DMA engine.
> (...)
> The SDCC block is shared between
> the modem processor and the processor running Linux.  If the driver
> doesn't go through the DMA engine, which coordinates this, the registers
> will be stomped on by the other CPU whenever it decides to access its
> parts of the flash device.

That's significant, I agree. That the DMA engine is custom
instead of using the <linux/dmaengine.h> interface is not
making things easier, but it's another issue. If it did, I think it
could quite easily use mmci.c.

At the same time what you're saying sounds very weird:
the ios handler in mmc_sdcc does not request any DMA
channel before messing with the hardware, it simply writes
into registers very much in the style of mmci.c. Wouldn't that
disturb any simultaneous access to the MMC from another
CPU?

The DMA code path doesn't look one bit different from
what we currently do for the generic DMA engine in
mmci.c, it sets up a DMA job from the sglist in the datapath,
but maybe I'm not looking close enough?

> I suspect the changes to mmci would be fairly drastic.

I don't think so, but the changes to the DMA engine
(I guess mach-msm/dma.c) would potentially be pretty drastic,
apart from just moving the thing to drivers/dma.

Actually when I look at the code in msm_sdcc.c it looks
like some of the code we usually centralize into the
DMA engine (like the thing iterating over a sglist and
packing it into some custom struct called "box") is instead
spread out in the client drivers.

I just wanted to raise the issue because I see that the
msm_sdcc driver is trying to e.g. synchronize against
dataend signals and such stuff that we've worked with
recently in mmci.c, and I really think it would be in the
MSM platform's best interest to use this driver rather than
its own.

Yours,
Linus Walleij

^ permalink raw reply

* [PATCH 1/2] ARM: improvements to compressed/head.S
From: Nicolas Pitre @ 2011-02-16 21:39 UTC (permalink / raw)
  To: linux-arm-kernel

In the case of a conflict between the memory used by the compressed
kernel with its decompressor code and the memory used for the
decompressed kernel, we currently store the latter after the former and
relocate it afterwards.

It would be more efficient to do this the other way around, i.e.
relocate the compressed data up front instead, resulting in a smaller
copy.  That also has the advantage of making the code smaller and more
straightforward.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
---
 arch/arm/boot/compressed/head.S |  239 ++++++++++++++++++---------------------
 1 files changed, 110 insertions(+), 129 deletions(-)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 7193884..200625c 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -174,9 +174,7 @@ not_angel:
 		 */
 
 		.text
-		adr	r0, LC0
-		ldmia	r0, {r1, r2, r3, r5, r6, r11, ip}
-		ldr	sp, [r0, #28]
+
 #ifdef CONFIG_AUTO_ZRELADDR
 		@ determine final kernel image address
 		mov	r4, pc
@@ -185,35 +183,108 @@ not_angel:
 #else
 		ldr	r4, =zreladdr
 #endif
-		subs	r0, r0, r1		@ calculate the delta offset
 
-						@ if delta is zero, we are
-		beq	not_relocated		@ running at the address we
-						@ were linked at.
+		bl	cache_on
+
+restart:	adr	r0, LC0
+		ldmia	r0, {r1, r2, r3, r5, r6, r9, r11, r12}
+		ldr	sp, [r0, #32]
+
+		/*
+		 * We might be running at a different address.  We need
+		 * to fix up various pointers.
+		 */
+		sub	r0, r0, r1		@ calculate the delta offset
+		add	r5, r5, r0		@ _start
+		add	r6, r6, r0		@ _edata
 
+#ifndef CONFIG_ZBOOT_ROM
+		/* malloc space is above the relocated stack (64k max) */
+		add	sp, sp, r0
+		add	r10, sp, #0x10000
+#else
 		/*
-		 * We're running at a different address.  We need to fix
-		 * up various pointers:
-		 *   r5 - zImage base address (_start)
-		 *   r6 - size of decompressed image
-		 *   r11 - GOT start
-		 *   ip - GOT end
+		 * With ZBOOT_ROM the bss/stack is non relocatable,
+		 * but someone could still run this code from RAM,
+		 * in which case our reference is _edata.
 		 */
-		add	r5, r5, r0
+		mov	r10, r6
+#endif
+
+/*
+ * Check to see if we will overwrite ourselves.
+ *   r4  = final kernel address
+ *   r5  = start of this image
+ *   r9  = size of decompressed image
+ *   r10 = end of this image, including  bss/stack/malloc space if non XIP
+ * We basically want:
+ *   r4 >= r10 -> OK
+ *   r4 + image length <= r5 -> OK
+ */
+		cmp	r4, r10
+		bhs	wont_overwrite
+		add	r10, r4, r9
+		cmp	r10, r5
+		bls	wont_overwrite
+
+/*
+ * Relocate ourselves past the end of the decompressed kernel.
+ *   r5  = start of this image
+ *   r6  = _edata 
+ *   r10 = end of the decompressed kernel
+ * Because we always copy ahead, we need to do it from the end and go
+ * backward in case the source and destination overlap.
+ */
+		/* Round up to next 256-byte boundary. */
+		add	r10, r10, #256
+		bic	r10, r10, #255
+
+		sub	r9, r6, r5		@ size to copy
+		add	r9, r9, #31		@ rounded up to a multiple
+		bic	r9, r9, #31		@ ... of 32 bytes
+		add	r6, r9, r5
+		add	r9, r9, r10
+		
+1:		ldmdb	r6!, {r0 - r3, r10 - r12, lr}
+		cmp	r6, r5
+		stmdb	r9!, {r0 - r3, r10 - r12, lr}
+		bhi	1b
+
+		/* Preserve offset to relocated code. */
+		sub	r6, r9, r6
+
+		bl	cache_clean_flush
+
+		adr	r0, BSYM(restart)
+		add	r0, r0, r6
+		mov	pc, r0
+
+wont_overwrite:
+/*
+ * If delta is zero, we are running at the address we were linked at.
+ *   r0  = delta
+ *   r2  = BSS start
+ *   r3  = BSS end
+ *   r4  = kernel execution address
+ *   r7  = architecture ID
+ *   r8  = atags pointer
+ *   r11 = GOT start
+ *   r12 = GOT end
+ *   sp  = stack pointer
+ */
+		teq	r0, #0
+		beq	not_relocated
 		add	r11, r11, r0
-		add	ip, ip, r0
+		add	r12, r12, r0
 
 #ifndef CONFIG_ZBOOT_ROM
 		/*
 		 * If we're running fully PIC === CONFIG_ZBOOT_ROM = n,
 		 * we need to fix up pointers into the BSS region.
-		 *   r2 - BSS start
-		 *   r3 - BSS end
-		 *   sp - stack pointer
+		 * Note that the stack pointer has already been fixed up.
 		 */
 		add	r2, r2, r0
 		add	r3, r3, r0
-		add	sp, sp, r0
 
 		/*
 		 * Relocate all entries in the GOT table.
@@ -221,7 +292,7 @@ not_angel:
 1:		ldr	r1, [r11, #0]		@ relocate entries in the GOT
 		add	r1, r1, r0		@ table.  This fixes up the
 		str	r1, [r11], #4		@ C references.
-		cmp	r11, ip
+		cmp	r11, r12
 		blo	1b
 #else
 
@@ -234,7 +305,7 @@ not_angel:
 		cmphs	r3, r1			@ _end < entry
 		addlo	r1, r1, r0		@ table.  This fixes up the
 		str	r1, [r11], #4		@ C references.
-		cmp	r11, ip
+		cmp	r11, r12
 		blo	1b
 #endif
 
@@ -246,76 +317,24 @@ not_relocated:	mov	r0, #0
 		cmp	r2, r3
 		blo	1b
 
-		/*
-		 * The C runtime environment should now be setup
-		 * sufficiently.  Turn the cache on, set up some
-		 * pointers, and start decompressing.
-		 */
-		bl	cache_on
-
-		mov	r1, sp			@ malloc space above stack
-		add	r2, sp, #0x10000	@ 64k max
-
 /*
- * Check to see if we will overwrite ourselves.
- *   r4 = final kernel address
- *   r5 = start of this image
- *   r6 = size of decompressed image
- *   r2 = end of malloc space (and therefore this image)
- * We basically want:
- *   r4 >= r2 -> OK
- *   r4 + image length <= r5 -> OK
+ * The C runtime environment should now be setup sufficiently.
+ * Set up some pointers, and start decompressing.
+ *   r4  = kernel execution address
+ *   r7  = architecture ID
+ *   r8  = atags pointer
  */
-		cmp	r4, r2
-		bhs	wont_overwrite
-		add	r0, r4, r6
-		cmp	r0, r5
-		bls	wont_overwrite
-
-		mov	r5, r2			@ decompress after malloc space
-		mov	r0, r5
+		mov	r0, r4
+		mov	r1, sp			@ malloc space above stack
+		add	r2, sp, #0x10000	@ 64k max
 		mov	r3, r7
 		bl	decompress_kernel
-
-		add	r0, r0, #127 + 128	@ alignment + stack
-		bic	r0, r0, #127		@ align the kernel length
-/*
- * r0     = decompressed kernel length
- * r1-r3  = unused
- * r4     = kernel execution address
- * r5     = decompressed kernel start
- * r7     = architecture ID
- * r8     = atags pointer
- * r9-r12,r14 = corrupted
- */
-		add	r1, r5, r0		@ end of decompressed kernel
-		adr	r2, reloc_start
-		ldr	r3, LC1
-		add	r3, r2, r3
-1:		ldmia	r2!, {r9 - r12, r14}	@ copy relocation code
-		stmia	r1!, {r9 - r12, r14}
-		ldmia	r2!, {r9 - r12, r14}
-		stmia	r1!, {r9 - r12, r14}
-		cmp	r2, r3
-		blo	1b
-		mov	sp, r1
-		add	sp, sp, #128		@ relocate the stack
-
 		bl	cache_clean_flush
- ARM(		add	pc, r5, r0		) @ call relocation code
- THUMB(		add	r12, r5, r0		)
- THUMB(		mov	pc, r12			) @ call relocation code
-
-/*
- * We're not in danger of overwriting ourselves.  Do this the simple way.
- *
- * r4     = kernel execution address
- * r7     = architecture ID
- */
-wont_overwrite:	mov	r0, r4
-		mov	r3, r7
-		bl	decompress_kernel
-		b	call_kernel
+		bl	cache_off
+		mov	r0, #0			@ must be zero
+		mov	r1, r7			@ restore architecture number
+		mov	r2, r8			@ restore atags pointer
+		mov	pc, r4			@ call kernel
 
 		.align	2
 		.type	LC0, #object
@@ -323,11 +342,11 @@ LC0:		.word	LC0			@ r1
 		.word	__bss_start		@ r2
 		.word	_end			@ r3
 		.word	_start			@ r5
-		.word	_image_size		@ r6
+		.word	_edata			@ r6
+		.word	_image_size		@ r9
 		.word	_got_start		@ r11
 		.word	_got_end		@ ip
 		.word	user_stack_end		@ sp
-LC1:		.word	reloc_end - reloc_start
 		.size	LC0, . - LC0
 
 #ifdef CONFIG_ARCH_RPC
@@ -353,7 +372,7 @@ params:		ldr	r0, =0x10000100		@ params_phys for RPC
  * On exit,
  *  r0, r1, r2, r3, r9, r10, r12 corrupted
  * This routine must preserve:
- *  r4, r5, r6, r7, r8
+ *  r4, r7, r8
  */
 		.align	5
 cache_on:	mov	r3, #8			@ cache_on function
@@ -551,43 +570,6 @@ __common_mmu_cache_on:
 #endif
 
 /*
- * All code following this line is relocatable.  It is relocated by
- * the above code to the end of the decompressed kernel image and
- * executed there.  During this time, we have no stacks.
- *
- * r0     = decompressed kernel length
- * r1-r3  = unused
- * r4     = kernel execution address
- * r5     = decompressed kernel start
- * r7     = architecture ID
- * r8     = atags pointer
- * r9-r12,r14 = corrupted
- */
-		.align	5
-reloc_start:	add	r9, r5, r0
-		sub	r9, r9, #128		@ do not copy the stack
-		debug_reloc_start
-		mov	r1, r4
-1:
-		.rept	4
-		ldmia	r5!, {r0, r2, r3, r10 - r12, r14}	@ relocate kernel
-		stmia	r1!, {r0, r2, r3, r10 - r12, r14}
-		.endr
-
-		cmp	r5, r9
-		blo	1b
-		mov	sp, r1
-		add	sp, sp, #128		@ relocate the stack
-		debug_reloc_end
-
-call_kernel:	bl	cache_clean_flush
-		bl	cache_off
-		mov	r0, #0			@ must be zero
-		mov	r1, r7			@ restore architecture number
-		mov	r2, r8			@ restore atags pointer
-		mov	pc, r4			@ call kernel
-
-/*
  * Here follow the relocatable cache support functions for the
  * various processors.  This is a generic hook for locating an
  * entry and jumping to an instruction at the specified offset
@@ -791,7 +773,7 @@ proc_types:
  * On exit,
  *  r0, r1, r2, r3, r9, r12 corrupted
  * This routine must preserve:
- *  r4, r6, r7
+ *  r4, r7, r8
  */
 		.align	5
 cache_off:	mov	r3, #12			@ cache_off function
@@ -866,7 +848,7 @@ __armv3_mmu_cache_off:
  * On exit,
  *  r1, r2, r3, r9, r10, r11, r12 corrupted
  * This routine must preserve:
- *  r0, r4, r5, r6, r7
+ *  r4, r6, r7, r8
  */
 		.align	5
 cache_clean_flush:
@@ -1088,7 +1070,6 @@ memdump:	mov	r12, r0
 #endif
 
 		.ltorg
-reloc_end:
 
 		.align
 		.section ".stack", "aw", %nobits
-- 
1.7.4

^ permalink raw reply related

* [PATCH 2/2] ARM: remove the 4x expansion presumption while decompressing the kernel
From: Nicolas Pitre @ 2011-02-16 21:39 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297892343-29064-1-git-send-email-nico@fluxnic.net>

We currently presume a 4x expansion to guess the decompressed kernel size
in order to determine if the decompressed kernel is in conflict with
the location where zImage is loaded.  This guess may cause many issues
by overestimating the final kernel image size:

- This may force a needless relocation if the location of zImage was
  fine, wasting some precious microseconds of boot time.

- The relocation may be located way too far, possibly overwriting the
  initrd image in RAM.

- If the kernel image includes a large already-compressed initramfs image
  then the problem is exacerbated even further.

And if by some strange means the 4x guess is too low then we may overwrite
ourselves with the decompressed image.

So let's use the exact decompressed kernel image size instead.  For that
we need to rely on the stat command, but this is hardly a new build
dependency, as the kernel build already depends on many commands provided
by the same coreutils package that provides stat.
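
To make the first point above concrete, here is a small C sketch of the
overlap test the decompressor performs (no conflict if the kernel
destination is at or above the end of the zImage, or if the decompressed
kernel ends at or below its start).  The helper name needs_relocation()
and every address used with it are made up for illustration; the real
check lives in assembly in head.S.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 *   kernel_dest  - final kernel address (r4 in head.S)
 *   kernel_size  - size of the decompressed image, exact or guessed
 *   zimage_start - start of the loaded zImage (r5)
 *   zimage_end   - end of the zImage including bss/stack/malloc (r10)
 *
 * Returns true when decompressing in place would overwrite the zImage.
 */
static bool needs_relocation(uint32_t kernel_dest, uint32_t kernel_size,
			     uint32_t zimage_start, uint32_t zimage_end)
{
	/* Destination starts above everything we occupy: safe. */
	if (kernel_dest >= zimage_end)
		return false;

	/* Decompressed kernel ends before we start: safe.
	 * The sum is done in 64 bits to avoid wrap-around. */
	if ((uint64_t)kernel_dest + kernel_size <= zimage_start)
		return false;

	return true;
}
```

With an assumed 2MB zImage loaded at 0x00800000 and a kernel destination
of 0x00008000, a 5MB Image passes the test, whereas the 8MB guess
(4 x 2MB) reaches past the start of the zImage and forces a relocation
that the exact size would have avoided.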

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
---
 arch/arm/boot/compressed/Makefile       |    4 +++-
 arch/arm/boot/compressed/vmlinux.lds.in |    3 ---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 0a8f748..9d328be 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -83,9 +83,11 @@ endif
 EXTRA_CFLAGS  := -fpic -fno-builtin
 EXTRA_AFLAGS  := -Wa,-march=all
 
+# Provide size of uncompressed kernel to the decompressor via a linker symbol.
+LDFLAGS_vmlinux := --defsym _image_size=$(shell stat -c "%s" $(obj)/../Image)
 # Supply ZRELADDR to the decompressor via a linker symbol.
 ifneq ($(CONFIG_AUTO_ZRELADDR),y)
-LDFLAGS_vmlinux := --defsym zreladdr=$(ZRELADDR)
+LDFLAGS_vmlinux += --defsym zreladdr=$(ZRELADDR)
 endif
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux += --be8
diff --git a/arch/arm/boot/compressed/vmlinux.lds.in b/arch/arm/boot/compressed/vmlinux.lds.in
index 366a924..5309909 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.in
+++ b/arch/arm/boot/compressed/vmlinux.lds.in
@@ -43,9 +43,6 @@ SECTIONS
 
   _etext = .;
 
-  /* Assume size of decompressed image is 4x the compressed image */
-  _image_size = (_etext - _text) * 4;
-
   _got_start = .;
   .got			: { *(.got) }
   _got_end = .;
-- 
1.7.4

^ permalink raw reply related

* [PATCH 12/21] ARM: tegra: clock: Add shared bus clock type
From: Stephen Boyd @ 2011-02-16 21:51 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <AANLkTi=knKY65xaOT85jFidy6Be8K8RT0L1XtGqq_v2V@mail.gmail.com>

On 02/16/2011 01:01 PM, Colin Cross wrote:
> On Wed, Feb 16, 2011 at 12:34 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
>>
>> What do you do if clk_set_rate() fails? Should you unwind all the state
>> such as the rate and if it's enabled/disabled? Or is it safe to say
>> clk_set_rate() can't fail unless the kernel is buggy in which case why
>> aren't all those return -E* in the set rate functions just BUG_ONs?
>
> In general, clk_set_rate can fail and return an error, but in this
> case the failure may not be directly related to the driver that called
> into tegra_clk_shared_bus_update.  For example, if clk_disable is
> called on a shared clock handle, the rate may drop to the rate
> requested by another shared clock handle.  clk_disable cannot fail, so
> there's nothing that could be done with the return code, and the
> problem was not caused by the driver that called clk_disable, so an
> error would be meaningless.
>
>
> I will modify tegra_clk_shared_bus_update to BUG on a failed
> clk_set_rate,

Yes, currently if there are any errors we can't determine which clock's
vote is causing the error since the rate only takes effect when the
clock is enabled and another clock could have updated their rate while
the list is being iterated over.

The code could be written so that we only call clk_set_rate() on the
parent when the calling clock affects the aggregate rate. That would
allow us to catch errors for the clk_enable() path and the
clk_set_rate() path when the clock is already on. In the clk_disable()
path we could just ignore the errors since we can't do anything anyways
like you say. The only path that doesn't seem possible is clk_set_rate()
when the clock is off which presumably doesn't actually matter since
when you turn the clock on it will fail the same way (delayed
clk_set_rate?).

BTW, is there a race there in the rate updating code? Say clock 1 is
enabled with rate 2 on cpu1 and clock 2 is enabled at the same time with
rate 3 (currently the greatest rate) on cpu2. clock 1 is iterating over
the list and sees that clock 2 is enabled so it calculates 3 as the max.
clock 2 then returns from the enable call and then a call to disable
clock 2 comes in. clock 1 is still iterating over the list and clock 2's
call to disable runs to completion. clock 1 finally stops iterating over
the list and has an aggregated rate of 3 (since it saw that clock 2 was
on which is no longer true). It then calls set_rate() with 3 even though
the only clock that is on is clock 1 with a rate of 2.

c1 and c2 are off initially

CPU1                    CPU2
----                    -----
clk_set_rate(c1, 2)     clk_set_rate(c2, 3)
clk_enable(c1)          clk_enable(c2)
max == 3                max == 3; clk_set_rate(parent, 3)
...                     clk_disable(c2)
                        max == 2; clk_set_rate(parent, 2)
clk_set_rate(parent, 3)

I think you need some kind of lock while iterating to stop the shared
clocks from changing underneath you.
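
As a rough userspace sketch of that idea (not the Tegra code): every
change to a child's state and the recomputation of the aggregate rate
happen under one lock owned by the bus, so the maximum can never be
computed from state that is already stale.  The struct and function
names below are hypothetical, and a pthread mutex stands in for
whatever lock the kernel code would actually use.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct bus;

/* Hypothetical stand-in for one shared_bus_user clock. */
struct bus_user {
	unsigned long rate;
	bool enabled;
	struct bus_user *next;
	struct bus *bus;
};

/* Hypothetical stand-in for the shared bus clock. */
struct bus {
	pthread_mutex_t lock;	/* protects users' state and rate */
	unsigned long min_rate;
	unsigned long rate;
	struct bus_user *users;
};

/* Recompute the aggregate rate.  Caller must hold bus->lock. */
static void bus_update_locked(struct bus *bus)
{
	unsigned long rate = bus->min_rate;
	struct bus_user *u;

	for (u = bus->users; u; u = u->next)
		if (u->enabled && u->rate > rate)
			rate = u->rate;

	bus->rate = rate;
}

/* Enable or disable a user and update the bus atomically. */
static void bus_user_set_enabled(struct bus_user *u, bool enabled)
{
	pthread_mutex_lock(&u->bus->lock);
	u->enabled = enabled;
	bus_update_locked(u->bus);
	pthread_mutex_unlock(&u->bus->lock);
}

/* Change a user's requested rate and update the bus atomically. */
static void bus_user_set_rate(struct bus_user *u, unsigned long rate)
{
	pthread_mutex_lock(&u->bus->lock);
	u->rate = rate;
	bus_update_locked(u->bus);
	pthread_mutex_unlock(&u->bus->lock);
}
```

Because the list walk and the final rate assignment sit inside the same
critical section as the state change, the interleaving in the table
above cannot leave the bus at 3 once c2 has been disabled.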

> and modify tegra_clk_shared_bus_set_rate to
> call clk_round_rate on the parent to ensure that the requested rate is valid.
>

I would hope clk_round_rate() isn't necessary to get a valid rate.
clk_set_rate() shouldn't require exact/valid rates. clk_round_rate() is
there to help drivers determine if calling clk_set_rate() with a certain
rate is going to give them something they want. It's like saying "If I
call clk_set_rate() with 500Hz what would the clock's rate actually be
after the call returns?" If the set_rate implementation needs to round
internally to find a divider or something, it should be done in the
set_rate code and not in each driver.

>>
>> Shouldn't you call clk_enable(c->parent)? And do you need to check for
>> errors from clk_enable()?
>
> clk_enable on the parent is handled by the clock op implementation in
> mach-tegra/clock.c
>

Oops, thanks. Time to visit the optometrist.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply

* [RFC] Inter-processor Mailboxes Drivers
From: Linus Walleij @ 2011-02-16 21:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <DD7A9A95166BF4418C4C1EB2033B6EE2038FAADB@na3-mail.mgc.mentorg.com>

2011/2/15 Blanchard, Hollis <Hollis_Blanchard@mentor.com>:

> OpenMCAPI (http://openmcapi.org) implements the MCAPI specification,
> which is a simple application-level communication API that uses shared
> memory. The API could be layered over any protocol, but was more or less
> designed for simple shared-memory systems, e.g. fixed topology, no
> retransmission, etc.

Cool...

> Currently, we implement almost all of this as a shared library, plus a
> very small kernel driver. The only requirements on the kernel are to
> allow userspace to map the shared memory area, and provide an IPI
> mechanism (and allow the process to sleep while waiting). Applications
> sync with each other using normal atomic memory operations.

Can't this real small kernel driver take care of the mailbox
business as well?

It seems a bit backward if you have say /dev/mcapi0, /dev/mcapi1
etc (or however you expose this to userspace) and /dev/mailbox0
/dev/mailbox1 etc on top of that. One device node per communication
channel instead of this would certainly be nicer? Then you would
have some ioctl() on the /dev/mcapi0 etc node to trigger the
transport and need not worry that it's a mailbox doing the sync.

What I'm after is that whatever datapath you have should include
the control mechanism, now it's like you're opening two interfaces
into the kernel, one for mapping in data pages, one for synchronizing
the transfers, or am I getting things wrong?

I think nominally all mailbox users would be in-kernel like the
MCAPI driver, so they don't need a userspace interface, to me
it feels like say /dev/mutex0, /dev/mutex1 for some other
shared memory opening into the kernel (such as the framebuffer),
and that would look a bit funny.

> I'll add that we haven't done serious optimization yet, but the numbers
> we do have seem reasonable. What are the "efficiency" issues you're
> worried about?

For huge data flows I think you may get into trouble, needing things
like queueing, descriptor pools etc. But if you're convinced this will
work, do go ahead.

Linus Walleij

^ permalink raw reply

* [PATCH 2/2] ARM: remove the 4x expansion presumption while decompressing the kernel
From: Stephen Boyd @ 2011-02-16 21:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1297892343-29064-2-git-send-email-nico@fluxnic.net>

On 02/16/2011 01:39 PM, Nicolas Pitre wrote:
> -LDFLAGS_vmlinux := --defsym zreladdr=$(ZRELADDR)
> +LDFLAGS_vmlinux += --defsym zreladdr=$(ZRELADDR)

What is this for?

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply

* [PATCH 12/21] ARM: tegra: clock: Add shared bus clock type
From: Colin Cross @ 2011-02-16 22:03 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <4D5C46D7.1040903@codeaurora.org>

On Wed, Feb 16, 2011 at 1:51 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 02/16/2011 01:01 PM, Colin Cross wrote:
>> On Wed, Feb 16, 2011 at 12:34 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
>>>
>>> What do you do if clk_set_rate() fails? Should you unwind all the state
>>> such as the rate and if it's enabled/disabled? Or is it safe to say
>>> clk_set_rate() can't fail unless the kernel is buggy in which case why
>>> aren't all those return -E* in the set rate functions just BUG_ONs?
>>
>> In general, clk_set_rate can fail and return an error, but in this
>> case the failure may not be directly related to the driver that called
>> into tegra_clk_shared_bus_update.  For example, if clk_disable is
>> called on a shared clock handle, the rate may drop to the rate
>> requested by another shared clock handle.  clk_disable cannot fail, so
>> there's nothing that could be done with the return code, and the
>> problem was not caused by the driver that called clk_disable, so an
>> error would be meaningless.
>>
>>
>> I will modify tegra_clk_shared_bus_update to BUG on a failed
>> clk_set_rate,
>
> Yes, currently if there are any errors we can't determine which clock's
> vote is causing the error since the rate only takes effect when the
> clock is enabled and another clock could have updated their rate while
> the list is being iterated over.
>
> The code could be written so that we only call clk_set_rate() on the
> parent when the calling clock affects the aggregate rate. That would
> allow us to catch errors for the clk_enable() path and the
> clk_set_rate() path when the clock is already on. In the clk_disable()
> path we could just ignore the errors since we can't do anything anyways
> like you say. The only path that doesn't seem possible is clk_set_rate()
> when the clock is off which presumably doesn't actually matter since
> when you turn the clock on it will fail the same way (delayed
> clk_set_rate?).
>
> BTW, is there a race there in the rate updating code? Say clock 1 is
> enabled with rate 2 on cpu1 and clock 2 is enabled at the same time with
> rate 3 (currently the greatest rate) on cpu2. clock 1 is iterating over
> the list and sees that clock 2 is enabled so it calculates 3 as the max.
> clock 2 then returns from the enable call and then a call to disable
> clock 2 comes in. clock 1 is still iterating over the list and clock 2's
> call to disable runs to completion. clock 1 finally stops iterating over
> the list and has an aggregated rate of 3 (since it saw that clock 2 was
> on which is no longer true). It then calls set_rate() with 3 even though
> the only clock that is on is clock 1 with a rate of 2.
>
> c1 and c2 are off initially
>
> CPU1                    CPU2
> ----                    -----
> clk_set_rate(c1, 2)     clk_set_rate(c2, 3)
> clk_enable(c1)          clk_enable(c2)
> max == 3                max == 3; clk_set_rate(parent, 3)
> ...                     clk_disable(c2)
>                         max == 2; clk_set_rate(parent, 2)
> clk_set_rate(parent, 3)
>
> I think you need some kind of lock while iterating to stop the shared
> clocks from changing underneath you.
Yes, it needs a lock.  I missed adding it when I converted from a
global lock to a per-clock lock.  The children should lock the parent
before changing any of their state or calling
tegra_clk_shared_bus_update.

>> and modify tegra_clk_shared_bus_set_rate to
>> call clk_round_rate on the parent to ensure that the requested rate is valid.
>>
>
> I would hope clk_round_rate() isn't necessary to get a valid rate.
> clk_set_rate() shouldn't require exact/valid rates. clk_round_rate() is
> there to help drivers determine if calling clk_set_rate() with a certain
> rate is going to give them something they want. It's like saying "If I
> call clk_set_rate() with 500Hz what would the clock's rate actually be
> after the call returns?" If the set_rate implementation needs to round
> internally to find a divider or something, it should be done in the
> set_rate code and not in each driver.

The point is not to get the valid rate, it is to weed out rates that
will be rejected (not rounded) by clk_set_rate, without affecting the
clock.  If clk_set_rate will return an error, clk_round_rate should
also return an error.
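
A minimal sketch of that validate-before-commit pattern, assuming a
made-up parent clock that simply rejects rates outside a fixed range.
The types and function names are hypothetical and only illustrate the
ordering: ask the parent whether the rate is acceptable first, and
record the request only if it is.

```c
#include <errno.h>

/* Hypothetical parent clock that accepts rates in [min_rate, max_rate]. */
struct parent_clk {
	unsigned long min_rate;
	unsigned long max_rate;
	unsigned long rate;
};

/* Hypothetical shared-bus user holding its last accepted request. */
struct user_clk {
	struct parent_clk *parent;
	unsigned long requested;
};

/*
 * Report what the parent would do with this rate, without touching
 * the clock.  A negative return means set_rate would reject it.
 */
static long parent_round_rate(const struct parent_clk *p, unsigned long rate)
{
	if (rate < p->min_rate || rate > p->max_rate)
		return -EINVAL;
	return (long)rate;
}

/*
 * Record a new requested rate only if the parent would accept it.
 * On rejection the previous request is left untouched, so a later
 * aggregate update cannot fail because of this caller.
 */
static int user_set_rate(struct user_clk *c, unsigned long rate)
{
	long rounded = parent_round_rate(c->parent, rate);

	if (rounded < 0)
		return (int)rounded;

	c->requested = rate;
	return 0;
}
```

The error then reaches the driver that asked for the bad rate, instead
of surfacing later in an update triggered by some unrelated caller.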

>>>
>>> Shouldn't you call clk_enable(c->parent)? And do you need to check for
>>> errors from clk_enable()?
>>
>> clk_enable on the parent is handled by the clock op implementation in
>> mach-tegra/clock.c
>>
>
> Oops, thanks. Time to visit the optometrist.
>
> --
> Sent by an employee of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH 2/2] ARM: remove the 4x expansion presumption while decompressing the kernel
From: Nicolas Pitre @ 2011-02-16 22:11 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <4D5C47C7.6010006@codeaurora.org>

On Wed, 16 Feb 2011, Stephen Boyd wrote:

> On 02/16/2011 01:39 PM, Nicolas Pitre wrote:
> > -LDFLAGS_vmlinux := --defsym zreladdr=$(ZRELADDR)
> > +LDFLAGS_vmlinux += --defsym zreladdr=$(ZRELADDR)
> 
> What is this for?

It replaces the assignment operator with an addition operator ?


Nicolas

^ permalink raw reply

* [PATCH 00/11] OMAP2+: clock: add clockfw autoidle for iclks, OMAP2xxx
From: Paul Walmsley @ 2011-02-16 22:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <b83b1717e83fc907a1caf589cc53badf@mail.gmail.com>

Hi Rajendra

On Wed, 16 Feb 2011, Rajendra Nayak wrote:

> > -----Original Message-----
> > From: linux-arm-kernel-bounces at lists.infradead.org
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf
> > Of Paul Walmsley
> > Sent: Wednesday, February 16, 2011 12:23 PM
> >
> > This series also ensures that all clock autoidle is disabled during
> > boot and only re-enabled if CONFIG_PM is enabled.
> 
> This does not seem to be the case. Maybe something like the
> below patch is what is missing..

Thanks for the review, you are absolutely right.  Rather than the patch
you sent, and since mach-omap2/pm.c is compiled in even if !CONFIG_PM,
I'd propose a different approach.  Until we can sort out the
CONFIG_PM/pm.c issue, it would probably make more sense to make the
autoidle-enable part of CONFIG_OMAP_RESET_CLOCKS.  I will send a patch in
reply to the original thread.


- Paul

^ permalink raw reply

* [PATCH 2/2] ARM: remove the 4x expansion presumption while decompressing the kernel
From: Stephen Boyd @ 2011-02-16 22:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <alpine.LFD.2.00.1102161707180.14920@xanadu.home>

On 2/16/2011 2:11 PM, Nicolas Pitre wrote:
> On Wed, 16 Feb 2011, Stephen Boyd wrote:
>
>> On 02/16/2011 01:39 PM, Nicolas Pitre wrote:
>>> -LDFLAGS_vmlinux := --defsym zreladdr=$(ZRELADDR)
>>> +LDFLAGS_vmlinux += --defsym zreladdr=$(ZRELADDR)
>> What is this for?
> It replaces the assignment operator with an addition operator ?

Ah sorry, ignore me.

^ permalink raw reply

* [PATCHv3] [ARM] orion5x: accelerate NAND on the TS-78xx
From: Alexander Clouter @ 2011-02-16 22:26 UTC (permalink / raw)
  To: linux-arm-kernel

The NAND supports 32-bit reads and writes, so let's stop shunting 8-bit
chunks across the bus.

Doing a dumb 'dd' benchmark, this increases performance roughly like so:
 * read: 1.3MB/s to 3.4MB/s
 * write: 614kB/s to 882kB/s

Changelog:
 v3: added cast to first parameter of min() to remove gcc warnings
 v2: used approach suggested by Russell King instead
	<20110105003316.GJ24935@n2100.arm.linux.org.uk>
 v1: initial release <20110104235158.GQ12386@chipmunk>

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
---
 arch/arm/mach-orion5x/ts78xx-setup.c |   56 ++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)
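
For anyone following the buffer handling, here is a minimal userspace
sketch of how the transfer is split: byte accesses until the buffer is
4-byte aligned, then 32-bit words, then any leftover bytes. The struct
xfer_split type and the split_xfer() helper are hypothetical names used
only for illustration. They are not part of the patch, and the sketch
only computes the three sizes rather than touching any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration type, not part of the patch. */
struct xfer_split {
	size_t head;	/* leading bytes moved singly until 4-byte aligned */
	size_t words;	/* number of 32-bit words moved in the bulk phase */
	size_t tail;	/* trailing bytes left after the bulk phase */
};

/*
 * Compute how a transfer of 'len' bytes starting at buffer address
 * 'addr' would be divided, mirroring the off/sz arithmetic in the
 * read_buf/write_buf helpers of the patch.
 */
static struct xfer_split split_xfer(uintptr_t addr, size_t len)
{
	struct xfer_split s;
	size_t off = addr & 3;	/* misalignment of the buffer start */

	s.head = 0;
	if (off) {
		/* Never move more head bytes than the caller asked for. */
		s.head = 4 - off;
		if (s.head > len)
			s.head = len;
	}
	len -= s.head;

	s.words = len >> 2;
	s.tail = len - (s.words << 2);

	/* The remainder after whole words is always under one word. */
	assert(s.tail < 4);
	return s;
}
```

For a buffer at address 0x1001 with len 10, this gives 3 head bytes,
one 32-bit word and 3 tail bytes. An already aligned buffer skips the
head phase entirely.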

diff --git a/arch/arm/mach-orion5x/ts78xx-setup.c b/arch/arm/mach-orion5x/ts78xx-setup.c
index c1c1cd0..4431309 100644
--- a/arch/arm/mach-orion5x/ts78xx-setup.c
+++ b/arch/arm/mach-orion5x/ts78xx-setup.c
@@ -191,6 +191,60 @@ static int ts78xx_ts_nand_dev_ready(struct mtd_info *mtd)
 	return readb(TS_NAND_CTRL) & 0x20;
 }
 
+static void ts78xx_ts_nand_write_buf(struct mtd_info *mtd,
+			const uint8_t *buf, int len)
+{
+	struct nand_chip *chip = mtd->priv;
+	void __iomem *io_base = chip->IO_ADDR_W;
+	unsigned long off = ((unsigned long)buf & 3);
+	int sz;
+
+	if (off) {
+		sz = min((int) (4 - off), len);
+		writesb(io_base, buf, sz);
+		buf += sz;
+		len -= sz;
+	}
+
+	sz = len >> 2;
+	if (sz) {
+		u32 *buf32 = (u32 *)buf;
+		writesl(io_base, buf32, sz);
+		buf += sz << 2;
+		len -= sz << 2;
+	}
+
+	if (len)
+		writesb(io_base, buf, len);
+}
+
+static void ts78xx_ts_nand_read_buf(struct mtd_info *mtd,
+			uint8_t *buf, int len)
+{
+	struct nand_chip *chip = mtd->priv;
+	void __iomem *io_base = chip->IO_ADDR_R;
+	unsigned long off = ((unsigned long)buf & 3);
+	int sz;
+
+	if (off) {
+		sz = min((int) (4 - off), len);
+		readsb(io_base, buf, sz);
+		buf += sz;
+		len -= sz;
+	}
+
+	sz = len >> 2;
+	if (sz) {
+		u32 *buf32 = (u32 *)buf;
+		readsl(io_base, buf32, sz);
+		buf += sz << 2;
+		len -= sz << 2;
+	}
+
+	if (len)
+		readsb(io_base, buf, len);
+}
+
 const char *ts_nand_part_probes[] = { "cmdlinepart", NULL };
 
 static struct mtd_partition ts78xx_ts_nand_parts[] = {
@@ -233,6 +287,8 @@ static struct platform_nand_data ts78xx_ts_nand_data = {
 		 */
 		.cmd_ctrl		= ts78xx_ts_nand_cmd_ctrl,
 		.dev_ready		= ts78xx_ts_nand_dev_ready,
+		.write_buf		= ts78xx_ts_nand_write_buf,
+		.read_buf		= ts78xx_ts_nand_read_buf,
 	},
 };
 
-- 
1.7.2.3

^ permalink raw reply related

