LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Alan Cox @ 2010-09-24 14:57 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, john stultz,
	devicetree-discuss, linux-kernel, netdev, Thomas Gleixner,
	linux-api, Christoph Lameter, linuxppc-dev, David Miller,
	linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100924135001.GB3113@riccoc20.at.omicron.at>

> > Instead, I think having the id hanging off the class driver is much
> > better, as it allows mapping the actual hardware to the id more clearly.
> > 
> > So I'd drop the "timesource" listing. And maybe change "id" to
> > "clock_id" so its a little more clear what the id is for.
> 
> Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)

It makes sense to hang anything off the physical id

> I threw it out there mostly for the sake of discussion. I imagined
> that there could be other properties in that directory, like time
> scale (TAI, UTC, etc). But it seems like we don't really need anything
> in that direction.

They can still hang off the physical device. Thats really a detail

> > interrupts are awfully frequent, so systems concerned with power-saving
> > and deep idles probably would like something that could be done at a
> > more coarse interval.
> 
> We could always make the pulse rate programmable, for power-saving
> applications.

I would expect the kernel drivers to be responsible for
- Turning off when they can
- Picking rates that are power optimal for the requirement

The latter is a bit interesting as I don't see anything in any of the
timer APIs to express accuracy (a problem we have in kernel too).
Historically it simply hasn't mattered.

^ permalink raw reply

* Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
From: Scott Wood @ 2010-09-24 15:02 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: linuxppc-dev, Gortmaker, Paul
In-Reply-To: <52CF90264091A14888078A031D780F4306C8C175@ism-mail03.corp.ad.wrs.com>

On Fri, 24 Sep 2010 07:04:28 +0200
"Chen, Tiejun" <Tiejun.Chen@windriver.com> wrote:

> > -----Original Message-----
> > From: 
> > linuxppc-dev-bounces+tiejun.chen=windriver.com@lists.ozlabs.or
> > g 
> > [mailto:linuxppc-dev-bounces+tiejun.chen=windriver.com@lists.o
> > zlabs.org] On Behalf Of Benjamin Herrenschmidt
> > Sent: Friday, September 24, 2010 5:59 AM
> > To: Scott Wood
> > Cc: Gortmaker, Paul; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH] powerpc: Fix invalid page flags in 
> > create TLB CAM pathfor PTE_64BIT
> > 
> > On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
> > > I don't see a generic accessor that can test PTE flags for 
> > user access 
> > > -- in the absence of one, I guess we need an ifdef here.  
> > Or at least 
> > > put in a comment so anyone who adds a userspace use knows 
> > they need to 
> > > fix it.
> > 
> > We could make up one in powerpc arch at least
> > 
> > #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
> > 
> 
> Looks good. 
> 
> Ben and Scott,
> 
> But for the patched issue we're discussing we have to do #ifdef that as
> my original modification. Right? Or do you have other suggestion? Then I
> can improve that as v2.

Ben's version should work without any ifdef, since it makes sure
all bits of _PAGE_USER are set.

-Scott

^ permalink raw reply

* Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
From: Scott Wood @ 2010-09-24 15:07 UTC (permalink / raw)
  To: Grant Likely
  Cc: christophe leroy, David Brownell, linux-kernel, spi-devel-general,
	Anton Vorontsov, linuxppc-dev
In-Reply-To: <20100924071006.GA21318@angua.secretlab.ca>

On Fri, 24 Sep 2010 01:10:06 -0600
Grant Likely <grant.likely@secretlab.ca> wrote:

> On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
> > This patch applies to 2.6.34.7 and 2.6.35.4
> > It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS
> > 
> > Signed-off-by: christophe leroy <christophe.leroy@c-s.fr>
> 
> I'm sorry, I don't understand the fix from the given description.
> What is the problem, and why is cpm_muram_alloc_fixed() the wrong
> thing to call on CPM1?  Does CPM2 still need it?

I don't see how cpm_muram_alloc_fixed() can be used safely at all.  If
you need a fixed address, it shouldn't be part of the general
allocation pool, or something else might get it first.

-Scott

^ permalink raw reply

* Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
From: Scott Wood @ 2010-09-24 15:12 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: LEROY Christophe, spi-devel-general, David Brownell, linuxppc-dev,
	linux-kernel
In-Reply-To: <20100924075740.GA20599@oksana.dev.rtsoft.ru>

On Fri, 24 Sep 2010 11:57:40 +0400
Anton Vorontsov <cbouatmailru@gmail.com> wrote:

> Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560
> (also CPM2) machines though. But here's my guess (I no longer
> have these boards to test it):
> 
> On 8272 I used this node:
> 
> +                       spi@4c0 {
> +                               #address-cells = <1>;
> +                               #size-cells = <0>;
> +                               compatible = "fsl,cpm2-spi", "fsl,spi";
> +                               reg = <0x11a80 0x40 0x89fc 0x2>;
> 
> On that SOC there are two muram data regions 0x0..0x2000 and
> 0x9000..0x9100. Note that we actually don't want "data" regions,
> and the only reason why that worked is that sysdev/cpm_common.c
> maps muram(0)..muram(max).

Wouldn't it still fail the rh_alloc_fixed call?

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
From: Paul Gortmaker @ 2010-09-24 16:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev
In-Reply-To: <1285279157.5158.17.camel@pasglop>

[Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT] On 24/09/2010 (Fri 07:59) Benjamin Herrenschmidt wrote:

> On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
> > I don't see a generic accessor that can test PTE flags for user
> > access -- in the absence of one, I guess we need an ifdef here.  Or at
> > least put in a comment so anyone who adds a userspace use knows they
> > need to fix it. 
> 
> We could make up one in powerpc arch at least
> 
> #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
> 
> would do

I've put the above into pte-common.h, restored the deleted code block
which now uses pte_user() and I've updated the commit header to match.
Passes sanity boot test on an sbc8548 both with and without PTE_64BIT.

Thanks for the feedback.
Paul.


>From d48ebb58b8214f9faec775a5e06902f638f165cf Mon Sep 17 00:00:00 2001
From: Tiejun Chen <tiejun.chen@windriver.com>
Date: Tue, 21 Sep 2010 19:31:31 +0800
Subject: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

There exists a four line chunk of code, which when configured for
64 bit address space, can incorrectly set certain page flags during
the TLB creation.  It turns out that this is code which isn't used,
but might still serve a purpose.  Since it isn't obvious why it exists
or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped
into the kernel space via the following path:

MMU_init()
    |
    + adjust_total_lowmem()
            |
            + map_mem_in_cams()
                    |
                    + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

On settlbcam(), the kernel will create TLB entries according to the flag,
PAGE_KERNEL_X.

settlbcam()
{
        ...
        TLBCAM[index].MAS1 = MAS1_VALID
                        | MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
                                ^
			These entries cannot be invalidated by the
			kernel since MAS1_IPROT is set on TLB property.
        ...
        if (flags & _PAGE_USER) {
           TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
           TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
        }

For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine.
But on boards like the the Freescale P4080, we want to support 36-bit
physical address on it. So the following options may be set:

CONFIG_FSL_BOOKE=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 will introduce PTE format as Book3E.
As per the file: arch/powerpc/include/asm/pgtable-ppc32.h

  * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
  * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the
book3E version of _PAGE_KERNEL_RWX is defined with:

  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

  #define _PAGE_USER        (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_U<RWX> bits
to kernel (PAGE_KERNEL_X) address space via the following code fragment:

        if (flags & _PAGE_USER) {
           TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
           TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
        }

Here is a dump of the TLB info from Simics with the above code present:
------
L2 TLB1
                                            GT                   SSS UUU V I
 Row  Logical           Physical            SS TLPID  TID  WIMGE XWR XWR F P   V
----- ----------------- ------------------- -- ----- ----- ----- --- --- - -   -
  0   c0000000-cfffffff 000000000-00fffffff 00     0     0   M   XWR XWR 0 1   1
  1   d0000000-dfffffff 010000000-01fffffff 00     0     0   M   XWR XWR 0 1   1
  2   e0000000-efffffff 020000000-02fffffff 00     0     0   M   XWR XWR 0 1   1

Actually this conditional code was used for two legacy functions:

  1: support KGDB to set break point.
     KGDB already dropped this; now uses its core write to set break point.

  2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
     for device IO space.
     This use case is also removed from the latest PowerPC kernel.

However, there may still be a use case for it in the future, like
large user pages, so we can't remove it entirely.  As an alternative,
we match on all bits of _PAGE_USER instead of just any bits, so the
case where just _PAGE_BAP_SR is set can't sneak through.

With this done, the TLB appears without U having XWR as below:

-------
L2 TLB1
                                            GT                   SSS UUU V I
 Row  Logical           Physical            SS TLPID  TID  WIMGE XWR XWR F P   V
----- ----------------- ------------------- -- ----- ----- ----- --- --- - -   -
  0   c0000000-cfffffff 000000000-00fffffff 00     0     0   M   XWR     0 1   1
  1   d0000000-dfffffff 010000000-01fffffff 00     0     0   M   XWR     0 1   1
  2   e0000000-efffffff 020000000-02fffffff 00     0     0   M   XWR     0 1   1

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/include/asm/pte-common.h |    7 +++++++
 arch/powerpc/mm/fsl_booke_mmu.c       |    3 ++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pte-common.h b/arch/powerpc/include/asm/pte-common.h
index f2b3701..76bb195 100644
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -171,6 +171,13 @@ extern unsigned long bad_call_to_PMD_PAGE_SIZE(void);
 /* Make modules code happy. We don't set RO yet */
 #define PAGE_KERNEL_EXEC	PAGE_KERNEL_X
 
+/*
+ * Don't just check for any non zero bits in __PAGE_USER, since for book3e
+ * and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in
+ * _PAGE_USER.  Need to explictly match _PAGE_BAP_UR bit in that case too.
+ */
+#define pte_user(val)		((val & _PAGE_USER) == _PAGE_USER)
+
 /* Advertise special mapping type for AGP */
 #define PAGE_AGP		(PAGE_KERNEL_NC)
 #define HAVE_PAGE_AGP
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 4b66a1e..1b4354d 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -137,7 +137,8 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
 	if (mmu_has_feature(MMU_FTR_BIG_PHYS))
 		TLBCAM[index].MAS7 = (u64)phys >> 32;
 
-	if (flags & _PAGE_USER) {
+	/* Below is unlikely -- only for large user pages or similar */
+	if (pte_user(flags)) {
 	   TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
 	   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
 	}
-- 
1.7.2.1

^ permalink raw reply related

* Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
From: john stultz @ 2010-09-24 17:55 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, linux-api,
	devicetree-discuss, linux-kernel, Thomas Gleixner, netdev,
	Christoph Lameter, linuxppc-dev, David Miller, linux-arm-kernel,
	Krzysztof Halasa
In-Reply-To: <20100924072946.GA5043@riccoc20.at.omicron.at>

On Fri, 2010-09-24 at 09:29 +0200, Richard Cochran wrote:
> On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:
> > So I'd still split this patch up a little bit more.
> > 
> > 1) Patch that implements the ADJ_SETOFFSET  (*and its implementation*)
> > in do_adjtimex.
> > 
> > 2) Patch that adds the new syscall and clock_id multiplexing.
> > 
> > 3) Patches that wire it up to the rest of the architectures (there's
> > still a bunch missing here).
> 
> I was not sure what the policy is about adding syscalls. Is it the
> syscall author's responsibility to add it into every arch?
> 
> The last time (see a2e2725541fad7) the commit only added half of some
> archs, and ignored others. In my patch, the syscall *really* works on
> the archs that are present in the patch.
> 
> (Actually, I did not test blackfin, since I don't have one, but I
> included it since I know they have a PTP hardware clock.)

I'm not sure about policy, but I think for completeness sake you should
make sure every arch supports a new syscall. You're not expected to be
able to test every one, but getting the basic support patch sent to
maintainers should be done.

> > > +static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
> > > +{
> > > +	if (CLOCK_REALTIME == which_clock)
> > > +		return do_adjtimex(t);
> > > +	else
> > > +		return -EOPNOTSUPP;
> > > +}
> > 
> > 
> > Would it make sense to point to the do_adjtimex() in the k_clock
> > definition for CLOCK_REALTIME rather then conditionalizing it here?
> 
> But what about CLOCK_MONOTONIC_RAW, for example?

-EOPNOTSUPP

> Does it make sense to allow it to be adjusted?

No. I think only CLOCK_REALTIME would make sense of the existing clocks.

I'm just suggesting you conditionalize it from the function pointer,
rather then in the common function.

thanks
-john

^ permalink raw reply

* [PATCH 0/2] powerpc/47x TLB optimization patches
From: Dave Kleikamp @ 2010-09-24 18:01 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Dave Kleikamp

These two patches reduce the frequency that the tlb caches are flushed in
hardware.  Both the normal tlb cache and the "shadow" tlb cache, which
separates the tlbs for data and instruction access (dTLB and iTLB).

Dave Kleikamp (2):
  476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB
  ppc: lazy flush_tlb_mm for nohash architectures

 arch/powerpc/include/asm/reg_booke.h  |    4 +
 arch/powerpc/kernel/head_44x.S        |   25 ++++++
 arch/powerpc/mm/mmu_context_nohash.c  |  154 ++++++++++++++++++++++++++++++---
 arch/powerpc/mm/mmu_decl.h            |    8 ++
 arch/powerpc/mm/tlb_nohash.c          |   28 +++++-
 arch/powerpc/mm/tlb_nohash_low.S      |   14 +++-
 arch/powerpc/platforms/44x/Kconfig    |    7 ++
 arch/powerpc/platforms/44x/misc_44x.S |   26 ++++++
 8 files changed, 249 insertions(+), 17 deletions(-)

-- 
1.7.2.2

^ permalink raw reply

* [PATCH 1/2] 476: Set CCR2[DSTI] to prevent isync from flushing shadow TLB
From: Dave Kleikamp @ 2010-09-24 18:01 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Dave Kleikamp
In-Reply-To: <1285351297-9999-1-git-send-email-shaggy@linux.vnet.ibm.com>

When the DSTI (Disable Shadow TLB Invalidate) bit is set in the CCR2
register, the isync command does not flush the shadow TLB (iTLB & dTLB).

However, since the shadow TLB does not contain context information, we
want the shadow TLB flushed in situations where we are switching context.
In those situations, we explicitly clear the DSTI bit before performing
isync, and set it again afterward.  We also need to do the same when we
perform isync after explicitly flushing the TLB.

Th setting of the DSTI bit is dependent on
CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE.  When we are confident that
the feature works as expected, the option can probably be removed.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/reg_booke.h  |    4 ++++
 arch/powerpc/kernel/head_44x.S        |   25 +++++++++++++++++++++++++
 arch/powerpc/mm/tlb_nohash_low.S      |   14 +++++++++++++-
 arch/powerpc/platforms/44x/Kconfig    |    7 +++++++
 arch/powerpc/platforms/44x/misc_44x.S |   26 ++++++++++++++++++++++++++
 5 files changed, 75 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 667a498..a7ecbfe 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -120,6 +120,7 @@
 #define SPRN_TLB3CFG	0x2B3	/* TLB 3 Config Register */
 #define SPRN_EPR	0x2BE	/* External Proxy Register */
 #define SPRN_CCR1	0x378	/* Core Configuration Register 1 */
+#define SPRN_CCR2_476	0x379	/* Core Configuration Register 2 (476)*/
 #define SPRN_ZPR	0x3B0	/* Zone Protection Register (40x) */
 #define SPRN_MAS7	0x3B0	/* MMU Assist Register 7 */
 #define SPRN_MMUCR	0x3B2	/* MMU Control Register */
@@ -188,6 +189,9 @@
 #define	CCR1_DPC	0x00000100 /* Disable L1 I-Cache/D-Cache parity checking */
 #define	CCR1_TCS	0x00000080 /* Timer Clock Select */
 
+/* Bit definitions for CCR2. */
+#define CCR2_476_DSTI	0x08000000 /* Disable Shadow TLB Invalidate */
+
 /* Bit definitions for the MCSR. */
 #define MCSR_MCS	0x80000000 /* Machine Check Summary */
 #define MCSR_IB		0x40000000 /* Instruction PLB Error */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 562305b..0c1b118 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -703,8 +703,23 @@ _GLOBAL(set_context)
 	stw	r4, 0x4(r5)
 #endif
 	mtspr	SPRN_PID,r3
+BEGIN_MMU_FTR_SECTION
+	b	1f
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
 	isync			/* Force context change */
 	blr
+1:
+#ifdef CONFIG_PPC_47x
+	mfspr	r10,SPRN_CCR2_476
+	rlwinm	r11,r10,0,~CCR2_476_DSTI
+	mtspr	SPRN_CCR2_476,r11
+	isync			/* Force context change */
+	mtspr	SPRN_CCR2_476,r10
+#else /* CONFIG_PPC_47x */
+2:	trap
+	EMIT_BUG_ENTRY 2b,__FILE__,__LINE__,0;
+#endif /* CONFIG_PPC_47x */
+	blr
 
 /*
  * Init CPU state. This is called at boot time or for secondary CPUs
@@ -861,6 +876,16 @@ skpinv:	addi	r4,r4,1				/* Increment */
 	isync
 #endif /* CONFIG_PPC_EARLY_DEBUG_44x */
 
+	mfspr   r3,SPRN_CCR2_476
+#ifdef CONFIG_PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE
+	/* With CCR2(DSTI) set, isync does not invalidate the shadow TLB */
+	oris	r3,r3,CCR2_476_DSTI@h
+#else
+	rlwinm	r3,r3,0,~CCR2_476_DSTI
+#endif
+	mtspr   SPRN_CCR2_476,r3
+	isync
+
 	/* Establish the interrupt vector offsets */
 	SET_IVOR(0,  CriticalInput);
 	SET_IVOR(1,  MachineCheck);
diff --git a/arch/powerpc/mm/tlb_nohash_low.S b/arch/powerpc/mm/tlb_nohash_low.S
index b9d9fed..f28fb52 100644
--- a/arch/powerpc/mm/tlb_nohash_low.S
+++ b/arch/powerpc/mm/tlb_nohash_low.S
@@ -112,7 +112,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
 	clrrwi	r4,r3,12	/* get an EPN for the hashing with V = 0 */
 	ori	r4,r4,PPC47x_TLBE_SIZE
 	tlbwe   r4,r7,0		/* write it */
+	mfspr	r8,SPRN_CCR2_476
+	rlwinm	r9,r8,0,~CCR2_476_DSTI
+	mtspr	SPRN_CCR2_476,r9
 	isync
+	mtspr	SPRN_CCR2_476,r8
 	wrtee	r10
 	blr
 #else /* CONFIG_PPC_47x */
@@ -180,7 +184,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
 	lwz	r8,0(r10)	/* Load boltmap entry */
 	addi	r10,r10,4	/* Next word */
 	b	1b		/* Then loop */
-1:	isync			/* Sync shadows */
+1:	mfspr	r9,SPRN_CCR2_476
+	rlwinm	r10,r9,0,~CCR2_476_DSTI
+	mtspr	SPRN_CCR2_476,r10
+	isync			/* Sync shadows */
+	mtspr	SPRN_CCR2_476,r9
 	wrtee	r11
 #else /* CONFIG_PPC_47x */
 1:	trap
@@ -203,7 +211,11 @@ _GLOBAL(_tlbivax_bcast)
 	isync
 /*	tlbivax	0,r3 - use .long to avoid binutils deps */
 	.long 0x7c000624 | (r3 << 11)
+	mfspr	r8,SPRN_CCR2_476
+	rlwinm	r9,r8,0,~CCR2_476_DSTI
+	mtspr	SPRN_CCR2_476,r9
 	isync
+	mtspr	SPRN_CCR2_476,r8
 	eieio
 	tlbsync
 	sync
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index 69d668c..b5ae862 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -307,3 +307,10 @@ config XILINX_VIRTEX_5_FXT
 	bool
 	select XILINX_VIRTEX
 
+config PPC_47x_DISABLE_SHADOW_TLB_INVALIDATE
+	bool "Disable shadow TLB invalidate"
+	depends on PPC_47x
+	default y
+	help
+	  This option prevents the isync operation from flushing the shadow
+	  TLB (iTLB and dTLB) on 476 boards.
diff --git a/arch/powerpc/platforms/44x/misc_44x.S b/arch/powerpc/platforms/44x/misc_44x.S
index dc12b80..a635312 100644
--- a/arch/powerpc/platforms/44x/misc_44x.S
+++ b/arch/powerpc/platforms/44x/misc_44x.S
@@ -9,15 +9,38 @@
  *
  */
 
+#include <asm/mmu.h>
 #include <asm/reg.h>
 #include <asm/ppc_asm.h>
 
 	.text
 
+#ifdef CONFIG_PPC_47x
+
+#define LOAD_CLEAR_CCR2_DSTI(REG1, REG2)	\
+BEGIN_MMU_FTR_SECTION				\
+	mfspr REG1,SPRN_CCR2_476;		\
+	rlwinm	REG2,REG1,0,~CCR2_476_DSTI;	\
+	mtspr	SPRN_CCR2_476,REG2;		\
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
+
+#define RESTORE_CCR2_DSTI(REG)			\
+BEGIN_MMU_FTR_SECTION				\
+	mtspr	SPRN_CCR2_476,REG;		\
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_47x)
+
+#else	/* CONFIG_PPC_47x */
+
+#define LOAD_CLEAR_CCR2_DSTI(REG1, REG2)
+#define RESTORE_CCR2_DSTI(REG)
+
+#endif	/* CONFIG_PPC_47x */
+
 /*
  * Do an IO access in AS1
  */
 _GLOBAL(as1_readb)
+	LOAD_CLEAR_CCR2_DSTI(r8, r9)
 	mfmsr	r7
 	ori	r0,r7,MSR_DS
 	sync
@@ -29,9 +52,11 @@ _GLOBAL(as1_readb)
 	mtmsr	r7
 	sync
 	isync
+	RESTORE_CCR2_DSTI(r8)
 	blr
 
 _GLOBAL(as1_writeb)
+	LOAD_CLEAR_CCR2_DSTI(r8, r9)
 	mfmsr	r7
 	ori	r0,r7,MSR_DS
 	sync
@@ -43,4 +68,5 @@ _GLOBAL(as1_writeb)
 	mtmsr	r7
 	sync
 	isync
+	RESTORE_CCR2_DSTI(r8)
 	blr
-- 
1.7.2.2

^ permalink raw reply related

* [PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures
From: Dave Kleikamp @ 2010-09-24 18:01 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Dave Kleikamp
In-Reply-To: <1285351297-9999-1-git-send-email-shaggy@linux.vnet.ibm.com>

On PPC_MMU_NOHASH processors that support a large number of contexts,
implement a lazy flush_tlb_mm() that switches to a free context, marking
the old one stale.  The tlb is only flushed when no free contexts are
available.

The lazy tlb flushing is controlled by the global variable tlb_lazy_flush
which is set during init, dependent upon MMU_FTR_TYPE_47x.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
 arch/powerpc/mm/mmu_context_nohash.c |  154 +++++++++++++++++++++++++++++++---
 arch/powerpc/mm/mmu_decl.h           |    8 ++
 arch/powerpc/mm/tlb_nohash.c         |   28 +++++-
 3 files changed, 174 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index ddfd7ad..87c7dc2 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -17,10 +17,6 @@
  * TODO:
  *
  *   - The global context lock will not scale very well
- *   - The maps should be dynamically allocated to allow for processors
- *     that support more PID bits at runtime
- *   - Implement flush_tlb_mm() by making the context stale and picking
- *     a new one
  *   - More aggressively clear stale map bits and maybe find some way to
  *     also clear mm->cpu_vm_mask bits when processes are migrated
  */
@@ -52,6 +48,8 @@
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
 
+#include "mmu_decl.h"
+
 static unsigned int first_context, last_context;
 static unsigned int next_context, nr_free_contexts;
 static unsigned long *context_map;
@@ -59,9 +57,31 @@ static unsigned long *stale_map[NR_CPUS];
 static struct mm_struct **context_mm;
 static DEFINE_RAW_SPINLOCK(context_lock);
 
+int tlb_lazy_flush;
+static int tlb_needs_flush[NR_CPUS];
+static unsigned long *context_available_map;
+static unsigned int nr_stale_contexts;
+
 #define CTX_MAP_SIZE	\
 	(sizeof(unsigned long) * (last_context / BITS_PER_LONG + 1))
 
+/*
+ * if another cpu recycled the stale contexts, we need to flush
+ * the local TLB, so that we may re-use those contexts
+ */
+void flush_recycled_contexts(int cpu)
+{
+	int i;
+
+	if (tlb_needs_flush[cpu]) {
+		pr_hard("[%d] flushing tlb\n", cpu);
+		_tlbil_all();
+		for (i = cpu_first_thread_in_core(cpu);
+		     i <= cpu_last_thread_in_core(cpu); i++) {
+			tlb_needs_flush[i] = 0;
+		}
+	}
+}
 
 /* Steal a context from a task that has one at the moment.
  *
@@ -147,7 +167,7 @@ static unsigned int steal_context_up(unsigned int id)
 	pr_hardcont(" | steal %d from 0x%p", id, mm);
 
 	/* Flush the TLB for that context */
-	local_flush_tlb_mm(mm);
+	__local_flush_tlb_mm(mm);
 
 	/* Mark this mm has having no context anymore */
 	mm->context.id = MMU_NO_CONTEXT;
@@ -161,13 +181,19 @@ static unsigned int steal_context_up(unsigned int id)
 #ifdef DEBUG_MAP_CONSISTENCY
 static void context_check_map(void)
 {
-	unsigned int id, nrf, nact;
+	unsigned int id, nrf, nact, nstale;
 
-	nrf = nact = 0;
+	nrf = nact = nstale = 0;
 	for (id = first_context; id <= last_context; id++) {
 		int used = test_bit(id, context_map);
-		if (!used)
-			nrf++;
+		int allocated = tlb_lazy_flush &&
+				test_bit(id, context_available_map);
+		if (!used) {
+			if (allocated)
+				nstale++;
+			else
+				nrf++;
+		}
 		if (used != (context_mm[id] != NULL))
 			pr_err("MMU: Context %d is %s and MM is %p !\n",
 			       id, used ? "used" : "free", context_mm[id]);
@@ -179,6 +205,11 @@ static void context_check_map(void)
 		       nr_free_contexts, nrf);
 		nr_free_contexts = nrf;
 	}
+	if (nstale != nr_stale_contexts) {
+		pr_err("MMU: Stale context count out of sync ! (%d vs %d)\n",
+		       nr_stale_contexts, nstale);
+		nr_stale_contexts = nstale;
+	}
 	if (nact > num_online_cpus())
 		pr_err("MMU: More active contexts than CPUs ! (%d vs %d)\n",
 		       nact, num_online_cpus());
@@ -189,6 +220,38 @@ static void context_check_map(void)
 static void context_check_map(void) { }
 #endif
 
+/*
+ * On architectures that support a large number of contexts, the tlb
+ * can be flushed lazily by picking a new context and making the stale
+ * context unusable until a lazy tlb flush has been issued.
+ *
+ * context_available_map keeps track of both active and stale contexts,
+ * while context_map continues to track only active contexts.  When the
+ * lazy tlb flush is triggered, context_map is copied to
+ * context_available_map, making the once-stale contexts available again
+ */
+static void recycle_stale_contexts(void)
+{
+	if (nr_free_contexts == 0 && nr_stale_contexts > 0) {
+		unsigned int cpu = smp_processor_id();
+		unsigned int i;
+
+		pr_hard("[%d] recycling stale contexts\n", cpu);
+		/* Time to flush the TLB's */
+		memcpy(context_available_map, context_map, CTX_MAP_SIZE);
+		nr_free_contexts = nr_stale_contexts;
+		nr_stale_contexts = 0;
+		for_each_online_cpu(i) {
+			if ((i < cpu_first_thread_in_core(cpu)) ||
+			    (i > cpu_last_thread_in_core(cpu)))
+				tlb_needs_flush[i] = 1;
+			else
+				tlb_needs_flush[i] = 0;	/* This core */
+		}
+		_tlbil_all();
+	}
+}
+
 void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 {
 	unsigned int i, id, cpu = smp_processor_id();
@@ -197,6 +260,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	/* No lockless fast path .. yet */
 	raw_spin_lock(&context_lock);
 
+	flush_recycled_contexts(cpu);
+
 	pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
 		cpu, next, next->context.active, next->context.id);
 
@@ -227,7 +292,12 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	id = next_context;
 	if (id > last_context)
 		id = first_context;
-	map = context_map;
+
+	if (tlb_lazy_flush) {
+		recycle_stale_contexts();
+		map = context_available_map;
+	} else
+		map = context_map;
 
 	/* No more free contexts, let's try to steal one */
 	if (nr_free_contexts == 0) {
@@ -250,6 +320,13 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 		if (id > last_context)
 			id = first_context;
 	}
+	if (tlb_lazy_flush)
+		/*
+		 * In the while loop above, we set the bit in
+		 * context_available_map, it also needs to be set in
+		 * context_map
+		 */
+		__set_bit(id, context_map);
  stolen:
 	next_context = id + 1;
 	context_mm[id] = next;
@@ -267,7 +344,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 			    id, cpu_first_thread_in_core(cpu),
 			    cpu_last_thread_in_core(cpu));
 
-		local_flush_tlb_mm(next);
+		__local_flush_tlb_mm(next);
 
 		/* XXX This clear should ultimately be part of local_flush_tlb_mm */
 		for (i = cpu_first_thread_in_core(cpu);
@@ -317,11 +394,61 @@ void destroy_context(struct mm_struct *mm)
 		mm->context.active = 0;
 #endif
 		context_mm[id] = NULL;
-		nr_free_contexts++;
+
+		if (tlb_lazy_flush)
+			nr_stale_contexts++;
+		else
+			nr_free_contexts++;
 	}
 	raw_spin_unlock_irqrestore(&context_lock, flags);
 }
 
+/*
+ * This is called from flush_tlb_mm().  Mark the current context as stale
+ * and grab an available one.  The tlb will be flushed when no more
+ * contexts are available
+ */
+void lazy_flush_context(struct mm_struct *mm)
+{
+	unsigned int id;
+	unsigned long flags;
+	unsigned long *map;
+
+	raw_spin_lock_irqsave(&context_lock, flags);
+
+	id = mm->context.id;
+	if (unlikely(id == MMU_NO_CONTEXT))
+		goto no_context;
+
+	/*
+	 * Make the existing context stale.  It remains in
+	 * context_available_map as long as nr_free_contexts remains non-zero
+	 */
+	 __clear_bit(id, context_map);
+	 context_mm[id] = NULL;
+	 nr_stale_contexts++;
+
+	recycle_stale_contexts();
+	BUG_ON(nr_free_contexts == 0);
+
+	nr_free_contexts--;
+	id = last_context;
+	map = context_available_map;
+	while (__test_and_set_bit(id, map)) {
+		id = find_next_zero_bit(map, last_context+1, id);
+		if (id > last_context)
+			id = first_context;
+	}
+	set_bit(id, context_map);
+	next_context = id + 1;
+	context_mm[id] = mm;
+	mm->context.id = id;
+	if (current->active_mm == mm)
+		set_context(id, mm->pgd);
+no_context:
+	raw_spin_unlock_irqrestore(&context_lock, flags);
+}
+
 #ifdef CONFIG_SMP
 
 static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
@@ -407,6 +534,7 @@ void __init mmu_context_init(void)
 	} else if (mmu_has_feature(MMU_FTR_TYPE_47x)) {
 		first_context = 1;
 		last_context = 65535;
+		tlb_lazy_flush = 1;
 	} else {
 		first_context = 1;
 		last_context = 255;
@@ -419,6 +547,8 @@ void __init mmu_context_init(void)
 	 * Allocate the maps used by context management
 	 */
 	context_map = alloc_bootmem(CTX_MAP_SIZE);
+	if (tlb_lazy_flush)
+		context_available_map = alloc_bootmem(CTX_MAP_SIZE);
 	context_mm = alloc_bootmem(sizeof(void *) * (last_context + 1));
 	stale_map[0] = alloc_bootmem(CTX_MAP_SIZE);
 
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 63b84a0..64240f1 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -25,6 +25,14 @@
 #ifdef CONFIG_PPC_MMU_NOHASH
 
 /*
+ * Lazy tlb flush
+ */
+extern int tlb_lazy_flush;
+extern void flush_recycled_contexts(int);
+void lazy_flush_context(struct mm_struct *mm);
+void __local_flush_tlb_mm(struct mm_struct *mm);
+
+/*
  * On 40x and 8xx, we directly inline tlbia and tlbivax
  */
 #if defined(CONFIG_40x) || defined(CONFIG_8xx)
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index fe391e9..264d0ea 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -36,6 +36,7 @@
 #include <linux/spinlock.h>
 #include <linux/memblock.h>
 
+#include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/code-patching.h>
@@ -117,7 +118,7 @@ unsigned long linear_map_top;	/* Top of linear mapping */
 /*
  * These are the base non-SMP variants of page and mm flushing
  */
-void local_flush_tlb_mm(struct mm_struct *mm)
+void __local_flush_tlb_mm(struct mm_struct *mm)
 {
 	unsigned int pid;
 
@@ -127,6 +128,14 @@ void local_flush_tlb_mm(struct mm_struct *mm)
 		_tlbil_pid(pid);
 	preempt_enable();
 }
+
+void local_flush_tlb_mm(struct mm_struct *mm)
+{
+	if (tlb_lazy_flush)
+		lazy_flush_context(mm);
+	else
+		__local_flush_tlb_mm(mm);
+}
 EXPORT_SYMBOL(local_flush_tlb_mm);
 
 void __local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
@@ -166,13 +175,19 @@ struct tlb_flush_param {
 	unsigned int pid;
 	unsigned int tsize;
 	unsigned int ind;
+	struct mm_struct *mm;
 };
 
 static void do_flush_tlb_mm_ipi(void *param)
 {
 	struct tlb_flush_param *p = param;
 
-	_tlbil_pid(p ? p->pid : 0);
+	if (tlb_lazy_flush && p) {
+		flush_recycled_contexts(smp_processor_id());
+		if (current->active_mm == p->mm)
+			set_context(p->pid, p->mm->pgd);
+	} else
+		_tlbil_pid(p ? p->pid : 0);
 }
 
 static void do_flush_tlb_page_ipi(void *param)
@@ -207,13 +222,18 @@ void flush_tlb_mm(struct mm_struct *mm)
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto no_context;
+	if (tlb_lazy_flush) {
+		lazy_flush_context(mm);
+		pid = mm->context.id;
+	}
 	if (!mm_is_core_local(mm)) {
-		struct tlb_flush_param p = { .pid = pid };
+		struct tlb_flush_param p = { .pid = pid, .mm = mm };
 		/* Ignores smp_processor_id() even if set. */
 		smp_call_function_many(mm_cpumask(mm),
 				       do_flush_tlb_mm_ipi, &p, 1);
 	}
-	_tlbil_pid(pid);
+	if (!tlb_lazy_flush)
+		_tlbil_pid(pid);
  no_context:
 	preempt_enable();
 }
-- 
1.7.2.2

^ permalink raw reply related

* RE: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c
From: Tirumala Marri @ 2010-09-24 18:52 UTC (permalink / raw)
  To: Stefan Roese, linuxppc-dev
  Cc: neilb, yur, linux-raid, herbert, linux-crypto, Dan Williams
In-Reply-To: <201009240858.05018.sr@denx.de>

>
> It would be really preferable to support all those platforms in a
> single Linux
> image. If technically possible, please try to move this direction.
It is do-able for couple of SoCs. Other SoC DMA engines are quite a bit
different.
Let me first do small steps first and slowly achieve some run time
Differentiation.

Thanks,
Marri

^ permalink raw reply

* [PATCH RFCv1 0/2] dma: add support for sg-to-sg transfers
From: Ira W. Snyder @ 2010-09-24 19:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dan Williams, linuxppc-dev

This series adds support for scatterlist to scatterlist transfers to the
generic DMAEngine API. I have unconditionally enabled it when the fsldma
driver is used to make testing easier. This feature should probably be
selected by individual drivers.

This series is intended to lay the groundwork for further changes to the
series titled "CARMA Board Support". That series will be updated when I
have time and hardware to test with.

This series has not been runtime tested yet. I am posting it only to
gain comments before I spend the effort to update the driver that
depends on this.

To help reviewers, I'd like to comment on the architecture of
dma_async_memcpy_sg_to_sg(). It explicitly avoids using descriptor
chaining due to the way that feature interacts with the fsldma
controller's external start feature. To use the external start feature
properly, the in-memory descriptor chain must not be fragmented into
multiple smaller chains. This is what is achieved by submitting all
descriptors without using chaining.

Ira W. Snyder (2):
  dmaengine: add support for scatterlist to scatterlist transfers
  fsldma: use generic support for scatterlist to scatterlist transfers

 arch/powerpc/include/asm/fsldma.h |  115 ++------------------
 drivers/dma/Kconfig               |    4 +
 drivers/dma/dmaengine.c           |  119 ++++++++++++++++++++
 drivers/dma/fsldma.c              |  219 +++++++------------------------------
 include/linux/dmaengine.h         |   10 ++
 5 files changed, 181 insertions(+), 286 deletions(-)

^ permalink raw reply

* [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Ira W. Snyder @ 2010-09-24 19:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dan Williams, linuxppc-dev
In-Reply-To: <1285357571-23377-1-git-send-email-iws@ovro.caltech.edu>

This adds support for scatterlist to scatterlist DMA transfers. As
requested by Dan, this is hidden behind an ifdef so that it can be
selected by the drivers that need it.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---
 drivers/dma/Kconfig       |    4 ++
 drivers/dma/dmaengine.c   |  119 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |   10 ++++
 3 files changed, 133 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 9520cf0..f688669 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,10 +89,14 @@ config AT_HDMAC
 	  Support the Atmel AHB DMA controller.  This can be integrated in
 	  chips such as the Atmel AT91SAM9RL.
 
+config DMAENGINE_SG_TO_SG
+	bool
+
 config FSL_DMA
 	tristate "Freescale Elo and Elo Plus DMA support"
 	depends on FSL_SOC
 	select DMA_ENGINE
+	select DMAENGINE_SG_TO_SG
 	---help---
 	  Enable support for the Freescale Elo and Elo Plus DMA controllers.
 	  The Elo is the DMA controller on some 82xx and 83xx parts, and the
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9d31d5e..57ec1e5 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
 }
 EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
 
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+dma_cookie_t
+dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
+			  struct scatterlist *dst_sg, unsigned int dst_nents,
+			  struct scatterlist *src_sg, unsigned int src_nents,
+			  dma_async_tx_callback cb, void *cb_param)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_cookie_t cookie = -ENOMEM;
+	size_t dst_avail, src_avail;
+	struct list_head tx_list;
+	size_t transferred = 0;
+	dma_addr_t dst, src;
+	size_t len;
+
+	if (dst_nents == 0 || src_nents == 0)
+		return -EINVAL;
+
+	if (dst_sg == NULL || src_sg == NULL)
+		return -EINVAL;
+
+	/* get prepared for the loop */
+	dst_avail = sg_dma_len(dst_sg);
+	src_avail = sg_dma_len(src_sg);
+
+	INIT_LIST_HEAD(&tx_list);
+
+	/* run until we are out of descriptors */
+	while (true) {
+
+		/* create the largest transaction possible */
+		len = min_t(size_t, src_avail, dst_avail);
+		if (len == 0)
+			goto fetch;
+
+		dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
+		src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
+
+		/* setup the transaction */
+		tx = dev->device_prep_dma_memcpy(chan, dst, src, len, 0);
+		if (!tx) {
+			dev_err(dev->dev, "failed to alloc desc for memcpy\n");
+			return -ENOMEM;
+		}
+
+		/* keep track of the tx for later */
+		list_add_tail(&tx->entry, &tx_list);
+
+		/* update metadata */
+		transferred += len;
+		dst_avail -= len;
+		src_avail -= len;
+
+fetch:
+		/* fetch the next dst scatterlist entry */
+		if (dst_avail == 0) {
+
+			/* no more entries: we're done */
+			if (dst_nents == 0)
+				break;
+
+			/* fetch the next entry: if there are no more: done */
+			dst_sg = sg_next(dst_sg);
+			if (dst_sg == NULL)
+				break;
+
+			dst_nents--;
+			dst_avail = sg_dma_len(dst_sg);
+		}
+
+		/* fetch the next src scatterlist entry */
+		if (src_avail == 0) {
+
+			/* no more entries: we're done */
+			if (src_nents == 0)
+				break;
+
+			/* fetch the next entry: if there are no more: done */
+			src_sg = sg_next(src_sg);
+			if (src_sg == NULL)
+				break;
+
+			src_nents--;
+			src_avail = sg_dma_len(src_sg);
+		}
+	}
+
+	/* loop through the list of descriptors and submit them */
+	list_for_each_entry(tx, &tx_list, entry) {
+
+		/* this is the last descriptor: add the callback */
+		if (list_is_last(&tx->entry, &tx_list)) {
+			tx->callback = cb;
+			tx->callback_param = cb_param;
+		}
+
+		/* submit the transaction */
+		cookie = tx->tx_submit(tx);
+		if (dma_submit_error(cookie)) {
+			dev_err(dev->dev, "failed to submit desc\n");
+			return cookie;
+		}
+	}
+
+	/* update counters */
+	preempt_disable();
+	__this_cpu_add(chan->local->bytes_transferred, transferred);
+	__this_cpu_inc(chan->local->memcpy_count);
+	preempt_enable();
+
+	return cookie;
+}
+EXPORT_SYMBOL(dma_async_memcpy_sg_to_sg);
+#endif
+
 void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
 	struct dma_chan *chan)
 {
 	tx->chan = chan;
+	#ifdef CONFIG_DMAENGINE_SG_TO_SG
+	INIT_LIST_HEAD(&tx->entry);
+	#endif
 	#ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
 	spin_lock_init(&tx->lock);
 	#endif
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c61d4ca..5507f4c 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -24,6 +24,7 @@
 #include <linux/device.h>
 #include <linux/uio.h>
 #include <linux/dma-mapping.h>
+#include <linux/list.h>
 
 /**
  * typedef dma_cookie_t - an opaque DMA cookie
@@ -316,6 +317,9 @@ struct dma_async_tx_descriptor {
 	dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
 	dma_async_tx_callback callback;
 	void *callback_param;
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+	struct list_head entry;
+#endif
 #ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
 	struct dma_async_tx_descriptor *next;
 	struct dma_async_tx_descriptor *parent;
@@ -632,6 +636,12 @@ dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
 dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
 	struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
 	unsigned int src_off, size_t len);
+#ifdef CONFIG_DMAENGINE_SG_TO_SG
+dma_cookie_t dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
+	struct scatterlist *dst_sg, unsigned int dst_nents,
+	struct scatterlist *src_sg, unsigned int src_nents,
+	dma_async_tx_callback cb, void *cb_param);
+#endif
 void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
 	struct dma_chan *chan);
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH RFCv1 2/2] fsldma: use generic support for scatterlist to scatterlist transfers
From: Ira W. Snyder @ 2010-09-24 19:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dan Williams, linuxppc-dev
In-Reply-To: <1285357571-23377-1-git-send-email-iws@ovro.caltech.edu>

The fsldma driver uses the DMA_SLAVE API to handle scatterlist to
scatterlist DMA transfers. For quite a while now, it has been possible
to mimic the operation by using the device_prep_dma_memcpy() routine
intelligently.

Now that the DMAEngine API has grown generic support for scatterlist to
scatterlist transfers, this operation is no longer needed. The generic
support is used for scatterlist to scatterlist transfers.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---
 arch/powerpc/include/asm/fsldma.h |  115 ++------------------
 drivers/dma/fsldma.c              |  219 +++++++------------------------------
 2 files changed, 48 insertions(+), 286 deletions(-)

diff --git a/arch/powerpc/include/asm/fsldma.h b/arch/powerpc/include/asm/fsldma.h
index debc5ed..dc0bd27 100644
--- a/arch/powerpc/include/asm/fsldma.h
+++ b/arch/powerpc/include/asm/fsldma.h
@@ -1,7 +1,7 @@
 /*
  * Freescale MPC83XX / MPC85XX DMA Controller
  *
- * Copyright (c) 2009 Ira W. Snyder <iws@ovro.caltech.edu>
+ * Copyright (c) 2009-2010 Ira W. Snyder <iws@ovro.caltech.edu>
  *
  * This file is licensed under the terms of the GNU General Public License
  * version 2. This program is licensed "as is" without any warranty of any
@@ -11,127 +11,32 @@
 #ifndef __ARCH_POWERPC_ASM_FSLDMA_H__
 #define __ARCH_POWERPC_ASM_FSLDMA_H__
 
-#include <linux/slab.h>
 #include <linux/dmaengine.h>
 
 /*
- * Definitions for the Freescale DMA controller's DMA_SLAVE implemention
+ * The Freescale DMA controller has several features that are not accomodated
+ * in the Linux DMAEngine API. Therefore, the generic structure is expanded
+ * to allow drivers to use these features.
  *
- * The Freescale DMA_SLAVE implementation was designed to handle many-to-many
- * transfers. An example usage would be an accelerated copy between two
- * scatterlists. Another example use would be an accelerated copy from
- * multiple non-contiguous device buffers into a single scatterlist.
+ * This structure should be passed into the DMAEngine routine device_control()
+ * as in this example:
  *
- * A DMA_SLAVE transaction is defined by a struct fsl_dma_slave. This
- * structure contains a list of hardware addresses that should be copied
- * to/from the scatterlist passed into device_prep_slave_sg(). The structure
- * also has some fields to enable hardware-specific features.
+ * chan->device->device_control(chan, DMA_SLAVE_CONFIG, (unsigned long)cfg);
  */
 
 /**
- * struct fsl_dma_hw_addr
- * @entry: linked list entry
- * @address: the hardware address
- * @length: length to transfer
- *
- * Holds a single physical hardware address / length pair for use
- * with the DMAEngine DMA_SLAVE API.
- */
-struct fsl_dma_hw_addr {
-	struct list_head entry;
-
-	dma_addr_t address;
-	size_t length;
-};
-
-/**
  * struct fsl_dma_slave
- * @addresses: a linked list of struct fsl_dma_hw_addr structures
+ * @config: the standard Linux DMAEngine API DMA_SLAVE configuration
  * @request_count: value for DMA request count
- * @src_loop_size: setup and enable constant source-address DMA transfers
- * @dst_loop_size: setup and enable constant destination address DMA transfers
  * @external_start: enable externally started DMA transfers
  * @external_pause: enable externally paused DMA transfers
- *
- * Holds a list of address / length pairs for use with the DMAEngine
- * DMA_SLAVE API implementation for the Freescale DMA controller.
  */
-struct fsl_dma_slave {
+struct fsldma_slave_config {
+	struct dma_slave_config config;
 
-	/* List of hardware address/length pairs */
-	struct list_head addresses;
-
-	/* Support for extra controller features */
 	unsigned int request_count;
-	unsigned int src_loop_size;
-	unsigned int dst_loop_size;
 	bool external_start;
 	bool external_pause;
 };
 
-/**
- * fsl_dma_slave_append - add an address/length pair to a struct fsl_dma_slave
- * @slave: the &struct fsl_dma_slave to add to
- * @address: the hardware address to add
- * @length: the length of bytes to transfer from @address
- *
- * Add a hardware address/length pair to a struct fsl_dma_slave. Returns 0 on
- * success, -ERRNO otherwise.
- */
-static inline int fsl_dma_slave_append(struct fsl_dma_slave *slave,
-				       dma_addr_t address, size_t length)
-{
-	struct fsl_dma_hw_addr *addr;
-
-	addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
-	if (!addr)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&addr->entry);
-	addr->address = address;
-	addr->length = length;
-
-	list_add_tail(&addr->entry, &slave->addresses);
-	return 0;
-}
-
-/**
- * fsl_dma_slave_free - free a struct fsl_dma_slave
- * @slave: the struct fsl_dma_slave to free
- *
- * Free a struct fsl_dma_slave and all associated address/length pairs
- */
-static inline void fsl_dma_slave_free(struct fsl_dma_slave *slave)
-{
-	struct fsl_dma_hw_addr *addr, *tmp;
-
-	if (slave) {
-		list_for_each_entry_safe(addr, tmp, &slave->addresses, entry) {
-			list_del(&addr->entry);
-			kfree(addr);
-		}
-
-		kfree(slave);
-	}
-}
-
-/**
- * fsl_dma_slave_alloc - allocate a struct fsl_dma_slave
- * @gfp: the flags to pass to kmalloc when allocating this structure
- *
- * Allocate a struct fsl_dma_slave for use by the DMA_SLAVE API. Returns a new
- * struct fsl_dma_slave on success, or NULL on failure.
- */
-static inline struct fsl_dma_slave *fsl_dma_slave_alloc(gfp_t gfp)
-{
-	struct fsl_dma_slave *slave;
-
-	slave = kzalloc(sizeof(*slave), gfp);
-	if (!slave)
-		return NULL;
-
-	INIT_LIST_HEAD(&slave->addresses);
-	return slave;
-}
-
 #endif /* __ARCH_POWERPC_ASM_FSLDMA_H__ */
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index cea08be..c90b393 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -599,207 +599,64 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_slave_sg(
 	struct dma_chan *dchan, struct scatterlist *sgl, unsigned int sg_len,
 	enum dma_data_direction direction, unsigned long flags)
 {
-	struct fsldma_chan *chan;
-	struct fsl_desc_sw *first = NULL, *prev = NULL, *new = NULL;
-	struct fsl_dma_slave *slave;
-	size_t copy;
-
-	int i;
-	struct scatterlist *sg;
-	size_t sg_used;
-	size_t hw_used;
-	struct fsl_dma_hw_addr *hw;
-	dma_addr_t dma_dst, dma_src;
-
-	if (!dchan)
-		return NULL;
-
-	if (!dchan->private)
-		return NULL;
-
-	chan = to_fsl_chan(dchan);
-	slave = dchan->private;
-
-	if (list_empty(&slave->addresses))
-		return NULL;
-
-	hw = list_first_entry(&slave->addresses, struct fsl_dma_hw_addr, entry);
-	hw_used = 0;
-
 	/*
-	 * Build the hardware transaction to copy from the scatterlist to
-	 * the hardware, or from the hardware to the scatterlist
+	 * This operation is not supported on the Freescale DMA controller
 	 *
-	 * If you are copying from the hardware to the scatterlist and it
-	 * takes two hardware entries to fill an entire page, then both
-	 * hardware entries will be coalesced into the same page
-	 *
-	 * If you are copying from the scatterlist to the hardware and a
-	 * single page can fill two hardware entries, then the data will
-	 * be read out of the page into the first hardware entry, and so on
+	 * However, we need to provide the function pointer to allow the
+	 * device_control() method to work.
 	 */
-	for_each_sg(sgl, sg, sg_len, i) {
-		sg_used = 0;
-
-		/* Loop until the entire scatterlist entry is used */
-		while (sg_used < sg_dma_len(sg)) {
-
-			/*
-			 * If we've used up the current hardware address/length
-			 * pair, we need to load a new one
-			 *
-			 * This is done in a while loop so that descriptors with
-			 * length == 0 will be skipped
-			 */
-			while (hw_used >= hw->length) {
-
-				/*
-				 * If the current hardware entry is the last
-				 * entry in the list, we're finished
-				 */
-				if (list_is_last(&hw->entry, &slave->addresses))
-					goto finished;
-
-				/* Get the next hardware address/length pair */
-				hw = list_entry(hw->entry.next,
-						struct fsl_dma_hw_addr, entry);
-				hw_used = 0;
-			}
-
-			/* Allocate the link descriptor from DMA pool */
-			new = fsl_dma_alloc_descriptor(chan);
-			if (!new) {
-				dev_err(chan->dev, "No free memory for "
-						       "link descriptor\n");
-				goto fail;
-			}
-#ifdef FSL_DMA_LD_DEBUG
-			dev_dbg(chan->dev, "new link desc alloc %p\n", new);
-#endif
-
-			/*
-			 * Calculate the maximum number of bytes to transfer,
-			 * making sure it is less than the DMA controller limit
-			 */
-			copy = min_t(size_t, sg_dma_len(sg) - sg_used,
-					     hw->length - hw_used);
-			copy = min_t(size_t, copy, FSL_DMA_BCR_MAX_CNT);
-
-			/*
-			 * DMA_FROM_DEVICE
-			 * from the hardware to the scatterlist
-			 *
-			 * DMA_TO_DEVICE
-			 * from the scatterlist to the hardware
-			 */
-			if (direction == DMA_FROM_DEVICE) {
-				dma_src = hw->address + hw_used;
-				dma_dst = sg_dma_address(sg) + sg_used;
-			} else {
-				dma_src = sg_dma_address(sg) + sg_used;
-				dma_dst = hw->address + hw_used;
-			}
-
-			/* Fill in the descriptor */
-			set_desc_cnt(chan, &new->hw, copy);
-			set_desc_src(chan, &new->hw, dma_src);
-			set_desc_dst(chan, &new->hw, dma_dst);
-
-			/*
-			 * If this is not the first descriptor, chain the
-			 * current descriptor after the previous descriptor
-			 */
-			if (!first) {
-				first = new;
-			} else {
-				set_desc_next(chan, &prev->hw,
-					      new->async_tx.phys);
-			}
-
-			new->async_tx.cookie = 0;
-			async_tx_ack(&new->async_tx);
-
-			prev = new;
-			sg_used += copy;
-			hw_used += copy;
-
-			/* Insert the link descriptor into the LD ring */
-			list_add_tail(&new->node, &first->tx_list);
-		}
-	}
-
-finished:
-
-	/* All of the hardware address/length pairs had length == 0 */
-	if (!first || !new)
-		return NULL;
-
-	new->async_tx.flags = flags;
-	new->async_tx.cookie = -EBUSY;
-
-	/* Set End-of-link to the last link descriptor of new list */
-	set_ld_eol(chan, new);
-
-	/* Enable extra controller features */
-	if (chan->set_src_loop_size)
-		chan->set_src_loop_size(chan, slave->src_loop_size);
-
-	if (chan->set_dst_loop_size)
-		chan->set_dst_loop_size(chan, slave->dst_loop_size);
-
-	if (chan->toggle_ext_start)
-		chan->toggle_ext_start(chan, slave->external_start);
-
-	if (chan->toggle_ext_pause)
-		chan->toggle_ext_pause(chan, slave->external_pause);
-
-	if (chan->set_request_count)
-		chan->set_request_count(chan, slave->request_count);
-
-	return &first->async_tx;
-
-fail:
-	/* If first was not set, then we failed to allocate the very first
-	 * descriptor, and we're done */
-	if (!first)
-		return NULL;
-
-	/*
-	 * First is set, so all of the descriptors we allocated have been added
-	 * to first->tx_list, INCLUDING "first" itself. Therefore we
-	 * must traverse the list backwards freeing each descriptor in turn
-	 *
-	 * We're re-using variables for the loop, oh well
-	 */
-	fsldma_free_desc_list_reverse(chan, &first->tx_list);
 	return NULL;
 }
 
 static int fsl_dma_device_control(struct dma_chan *dchan,
 				  enum dma_ctrl_cmd cmd, unsigned long arg)
 {
+	struct fsldma_slave_config *cfg;
 	struct fsldma_chan *chan;
 	unsigned long flags;
 
-	/* Only supports DMA_TERMINATE_ALL */
-	if (cmd != DMA_TERMINATE_ALL)
-		return -ENXIO;
-
 	if (!dchan)
 		return -EINVAL;
 
 	chan = to_fsl_chan(dchan);
 
-	/* Halt the DMA engine */
-	dma_halt(chan);
+	switch (cmd) {
+	case DMA_TERMINATE_ALL:
+		/* Halt the DMA engine */
+		dma_halt(chan);
 
-	spin_lock_irqsave(&chan->desc_lock, flags);
+		spin_lock_irqsave(&chan->desc_lock, flags);
 
-	/* Remove and free all of the descriptors in the LD queue */
-	fsldma_free_desc_list(chan, &chan->ld_pending);
-	fsldma_free_desc_list(chan, &chan->ld_running);
+		/* Remove and free all of the descriptors in the LD queue */
+		fsldma_free_desc_list(chan, &chan->ld_pending);
+		fsldma_free_desc_list(chan, &chan->ld_running);
 
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+		spin_unlock_irqrestore(&chan->desc_lock, flags);
+		return 0;
+
+	case DMA_SLAVE_CONFIG:
+
+		cfg = (struct fsldma_slave_config *)arg;
+		if (chan->set_request_count)
+			chan->set_request_count(chan, cfg->request_count);
+
+		if (chan->toggle_ext_start)
+			chan->toggle_ext_start(chan, cfg->external_start);
+
+		if (chan->toggle_ext_pause)
+			chan->toggle_ext_pause(chan, cfg->external_pause);
+
+		/*
+		 * TODO: add other features
+		 *
+		 * I'm not sure how to use the members dma_slave_config to
+		 * control the src/dst address hold features.
+		 */
+		return 0;
+
+	default:
+		return -ENXIO;
+	}
 
 	return 0;
 }
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Dan Williams @ 2010-09-24 20:40 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1285357571-23377-2-git-send-email-iws@ovro.caltech.edu>

On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder <iws@ovro.caltech.edu> wrot=
e:
> This adds support for scatterlist to scatterlist DMA transfers. As
> requested by Dan, this is hidden behind an ifdef so that it can be
> selected by the drivers that need it.
>
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
> =A0drivers/dma/Kconfig =A0 =A0 =A0 | =A0 =A04 ++
> =A0drivers/dma/dmaengine.c =A0 | =A0119 +++++++++++++++++++++++++++++++++=
++++++++++++
> =A0include/linux/dmaengine.h | =A0 10 ++++
> =A03 files changed, 133 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 9520cf0..f688669 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -89,10 +89,14 @@ config AT_HDMAC
> =A0 =A0 =A0 =A0 =A0Support the Atmel AHB DMA controller. =A0This can be i=
ntegrated in
> =A0 =A0 =A0 =A0 =A0chips such as the Atmel AT91SAM9RL.
>
> +config DMAENGINE_SG_TO_SG
> + =A0 =A0 =A0 bool
> +
> =A0config FSL_DMA
> =A0 =A0 =A0 =A0tristate "Freescale Elo and Elo Plus DMA support"
> =A0 =A0 =A0 =A0depends on FSL_SOC
> =A0 =A0 =A0 =A0select DMA_ENGINE
> + =A0 =A0 =A0 select DMAENGINE_SG_TO_SG
> =A0 =A0 =A0 =A0---help---
> =A0 =A0 =A0 =A0 =A0Enable support for the Freescale Elo and Elo Plus DMA =
controllers.
> =A0 =A0 =A0 =A0 =A0The Elo is the DMA controller on some 82xx and 83xx pa=
rts, and the
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 9d31d5e..57ec1e5 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, s=
truct page *dest_pg,
> =A0}
> =A0EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
>
> +#ifdef CONFIG_DMAENGINE_SG_TO_SG
> +dma_cookie_t
> +dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct scatterlist *dst=
_sg, unsigned int dst_nents,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct scatterlist *src=
_sg, unsigned int src_nents,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 dma_async_tx_callback c=
b, void *cb_param)
> +{
> + =A0 =A0 =A0 struct dma_device *dev =3D chan->device;
> + =A0 =A0 =A0 struct dma_async_tx_descriptor *tx;
> + =A0 =A0 =A0 dma_cookie_t cookie =3D -ENOMEM;
> + =A0 =A0 =A0 size_t dst_avail, src_avail;
> + =A0 =A0 =A0 struct list_head tx_list;
> + =A0 =A0 =A0 size_t transferred =3D 0;
> + =A0 =A0 =A0 dma_addr_t dst, src;
> + =A0 =A0 =A0 size_t len;
> +
> + =A0 =A0 =A0 if (dst_nents =3D=3D 0 || src_nents =3D=3D 0)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return -EINVAL;
> +
> + =A0 =A0 =A0 if (dst_sg =3D=3D NULL || src_sg =3D=3D NULL)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return -EINVAL;
> +
> + =A0 =A0 =A0 /* get prepared for the loop */
> + =A0 =A0 =A0 dst_avail =3D sg_dma_len(dst_sg);
> + =A0 =A0 =A0 src_avail =3D sg_dma_len(src_sg);
> +
> + =A0 =A0 =A0 INIT_LIST_HEAD(&tx_list);
> +
> + =A0 =A0 =A0 /* run until we are out of descriptors */
> + =A0 =A0 =A0 while (true) {
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* create the largest transaction possible =
*/
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 len =3D min_t(size_t, src_avail, dst_avail)=
;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (len =3D=3D 0)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto fetch;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 dst =3D sg_dma_address(dst_sg) + sg_dma_len=
(dst_sg) - dst_avail;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 src =3D sg_dma_address(src_sg) + sg_dma_len=
(src_sg) - src_avail;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* setup the transaction */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 tx =3D dev->device_prep_dma_memcpy(chan, ds=
t, src, len, 0);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!tx) {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 dev_err(dev->dev, "failed t=
o alloc desc for memcpy\n");
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return -ENOMEM;

I don't think any dma channels gracefully handle descriptors that were
prepped but not submitted.  You would probably need to submit the
backlog, poll for completion, and then return the error.
Alternatively, the expectation is that descriptor allocations are
transient, i.e. once previously submitted transactions are completed
the descriptors will return to the available pool.  So you could do
what async_tx routines do and just poll for a descriptor.

> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* keep track of the tx for later */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 list_add_tail(&tx->entry, &tx_list);
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* update metadata */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 transferred +=3D len;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 dst_avail -=3D len;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 src_avail -=3D len;
> +
> +fetch:
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* fetch the next dst scatterlist entry */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (dst_avail =3D=3D 0) {
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* no more entries: we're d=
one */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (dst_nents =3D=3D 0)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* fetch the next entry: if=
 there are no more: done */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 dst_sg =3D sg_next(dst_sg);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (dst_sg =3D=3D NULL)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 dst_nents--;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 dst_avail =3D sg_dma_len(ds=
t_sg);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* fetch the next src scatterlist entry */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (src_avail =3D=3D 0) {
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* no more entries: we're d=
one */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (src_nents =3D=3D 0)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* fetch the next entry: if=
 there are no more: done */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 src_sg =3D sg_next(src_sg);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (src_sg =3D=3D NULL)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 src_nents--;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 src_avail =3D sg_dma_len(sr=
c_sg);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> + =A0 =A0 =A0 }
> +
> + =A0 =A0 =A0 /* loop through the list of descriptors and submit them */
> + =A0 =A0 =A0 list_for_each_entry(tx, &tx_list, entry) {
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* this is the last descriptor: add the cal=
lback */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (list_is_last(&tx->entry, &tx_list)) {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 tx->callback =3D cb;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 tx->callback_param =3D cb_p=
aram;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> +
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* submit the transaction */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 cookie =3D tx->tx_submit(tx);

Some dma drivers cannot tolerate prep being reordered with respect to
submit (ioatdma enforces this ordering by locking in prep and
unlocking in submit).  In general those channels can be identified by
ones that select CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH.  However this
opt-out arrangement is awkward so I'll put together a patch to make it
opt-in (CONFIG_ASYNC_TX_CHANNEL_SWITCH).

The end implication for this patch is that CONFIG_DMAENGINE_SG_TO_SG
can only be supported by those channels that are also prepared to
handle channel switching i.e. can tolerate intervening calls to prep()
before the eventual submit().

[snip]

> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index c61d4ca..5507f4c 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -24,6 +24,7 @@
> =A0#include <linux/device.h>
> =A0#include <linux/uio.h>
> =A0#include <linux/dma-mapping.h>
> +#include <linux/list.h>
>
> =A0/**
> =A0* typedef dma_cookie_t - an opaque DMA cookie
> @@ -316,6 +317,9 @@ struct dma_async_tx_descriptor {
> =A0 =A0 =A0 =A0dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *=
tx);
> =A0 =A0 =A0 =A0dma_async_tx_callback callback;
> =A0 =A0 =A0 =A0void *callback_param;
> +#ifdef CONFIG_DMAENGINE_SG_TO_SG
> + =A0 =A0 =A0 struct list_head entry;
> +#endif
> =A0#ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
> =A0 =A0 =A0 =A0struct dma_async_tx_descriptor *next;
> =A0 =A0 =A0 =A0struct dma_async_tx_descriptor *parent;

Per the above comment if we are already depending on channel switch
being enabled for sg-to-sg operation, then you can just use the 'next'
pointer to have a singly linked list of descriptors.  Rather than
increase the size of the base descriptor.

--
Dan

^ permalink raw reply

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Ira W. Snyder @ 2010-09-24 21:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <AANLkTimfZa+N51Tww_pWFqtqMmcYYkH_V0V_P1Qizt5b@mail.gmail.com>

On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> On Fri, Sep 24, 2010 at 12:46 PM, Ira W. Snyder <iws@ovro.caltech.edu> wrote:
> > This adds support for scatterlist to scatterlist DMA transfers. As
> > requested by Dan, this is hidden behind an ifdef so that it can be
> > selected by the drivers that need it.
> >
> > Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> > ---
> >  drivers/dma/Kconfig       |    4 ++
> >  drivers/dma/dmaengine.c   |  119 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/dmaengine.h |   10 ++++
> >  3 files changed, 133 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> > index 9520cf0..f688669 100644
> > --- a/drivers/dma/Kconfig
> > +++ b/drivers/dma/Kconfig
> > @@ -89,10 +89,14 @@ config AT_HDMAC
> >          Support the Atmel AHB DMA controller.  This can be integrated in
> >          chips such as the Atmel AT91SAM9RL.
> >
> > +config DMAENGINE_SG_TO_SG
> > +       bool
> > +
> >  config FSL_DMA
> >        tristate "Freescale Elo and Elo Plus DMA support"
> >        depends on FSL_SOC
> >        select DMA_ENGINE
> > +       select DMAENGINE_SG_TO_SG
> >        ---help---
> >          Enable support for the Freescale Elo and Elo Plus DMA controllers.
> >          The Elo is the DMA controller on some 82xx and 83xx parts, and the
> > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> > index 9d31d5e..57ec1e5 100644
> > --- a/drivers/dma/dmaengine.c
> > +++ b/drivers/dma/dmaengine.c
> > @@ -972,10 +972,129 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
> >  }
> >  EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
> >
> > +#ifdef CONFIG_DMAENGINE_SG_TO_SG
> > +dma_cookie_t
> > +dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> > +                         struct scatterlist *dst_sg, unsigned int dst_nents,
> > +                         struct scatterlist *src_sg, unsigned int src_nents,
> > +                         dma_async_tx_callback cb, void *cb_param)
> > +{
> > +       struct dma_device *dev = chan->device;
> > +       struct dma_async_tx_descriptor *tx;
> > +       dma_cookie_t cookie = -ENOMEM;
> > +       size_t dst_avail, src_avail;
> > +       struct list_head tx_list;
> > +       size_t transferred = 0;
> > +       dma_addr_t dst, src;
> > +       size_t len;
> > +
> > +       if (dst_nents == 0 || src_nents == 0)
> > +               return -EINVAL;
> > +
> > +       if (dst_sg == NULL || src_sg == NULL)
> > +               return -EINVAL;
> > +
> > +       /* get prepared for the loop */
> > +       dst_avail = sg_dma_len(dst_sg);
> > +       src_avail = sg_dma_len(src_sg);
> > +
> > +       INIT_LIST_HEAD(&tx_list);
> > +
> > +       /* run until we are out of descriptors */
> > +       while (true) {
> > +
> > +               /* create the largest transaction possible */
> > +               len = min_t(size_t, src_avail, dst_avail);
> > +               if (len == 0)
> > +                       goto fetch;
> > +
> > +               dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) - dst_avail;
> > +               src = sg_dma_address(src_sg) + sg_dma_len(src_sg) - src_avail;
> > +
> > +               /* setup the transaction */
> > +               tx = dev->device_prep_dma_memcpy(chan, dst, src, len, 0);
> > +               if (!tx) {
> > +                       dev_err(dev->dev, "failed to alloc desc for memcpy\n");
> > +                       return -ENOMEM;
> 
> I don't think any dma channels gracefully handle descriptors that were
> prepped but not submitted.  You would probably need to submit the
> backlog, poll for completion, and then return the error.
> Alternatively, the expectation is that descriptor allocations are
> transient, i.e. once previously submitted transactions are completed
> the descriptors will return to the available pool.  So you could do
> what async_tx routines do and just poll for a descriptor.
> 

Can you give me an example? Even some pseudocode would help.

The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
with the descriptor if submit fails. Take for example
dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
using it has no way to return the descriptor to the free pool.

Does tx_submit() implicitly return descriptors to the free pool if it
fails?

> > +               }
> > +
> > +               /* keep track of the tx for later */
> > +               list_add_tail(&tx->entry, &tx_list);
> > +
> > +               /* update metadata */
> > +               transferred += len;
> > +               dst_avail -= len;
> > +               src_avail -= len;
> > +
> > +fetch:
> > +               /* fetch the next dst scatterlist entry */
> > +               if (dst_avail == 0) {
> > +
> > +                       /* no more entries: we're done */
> > +                       if (dst_nents == 0)
> > +                               break;
> > +
> > +                       /* fetch the next entry: if there are no more: done */
> > +                       dst_sg = sg_next(dst_sg);
> > +                       if (dst_sg == NULL)
> > +                               break;
> > +
> > +                       dst_nents--;
> > +                       dst_avail = sg_dma_len(dst_sg);
> > +               }
> > +
> > +               /* fetch the next src scatterlist entry */
> > +               if (src_avail == 0) {
> > +
> > +                       /* no more entries: we're done */
> > +                       if (src_nents == 0)
> > +                               break;
> > +
> > +                       /* fetch the next entry: if there are no more: done */
> > +                       src_sg = sg_next(src_sg);
> > +                       if (src_sg == NULL)
> > +                               break;
> > +
> > +                       src_nents--;
> > +                       src_avail = sg_dma_len(src_sg);
> > +               }
> > +       }
> > +
> > +       /* loop through the list of descriptors and submit them */
> > +       list_for_each_entry(tx, &tx_list, entry) {
> > +
> > +               /* this is the last descriptor: add the callback */
> > +               if (list_is_last(&tx->entry, &tx_list)) {
> > +                       tx->callback = cb;
> > +                       tx->callback_param = cb_param;
> > +               }
> > +
> > +               /* submit the transaction */
> > +               cookie = tx->tx_submit(tx);
> 
> Some dma drivers cannot tolerate prep being reordered with respect to
> submit (ioatdma enforces this ordering by locking in prep and
> unlocking in submit).  In general those channels can be identified by
> ones that select CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH.  However this
> opt-out arrangement is awkward so I'll put together a patch to make it
> opt-in (CONFIG_ASYNC_TX_CHANNEL_SWITCH).
> 
> The end implication for this patch is that CONFIG_DMAENGINE_SG_TO_SG
> can only be supported by those channels that are also prepared to
> handle channel switching i.e. can tolerate intervening calls to prep()
> before the eventual submit().
> 
> [snip]
> 

I found it difficult to detect when we were at the last descriptor
unless I did this in two steps. This is why I have two loops: one for
prep and another for submit.

How about the change inlined at the end of the email. Following your
description, I think it should work on the ioatdma driver, since it
handles prep and submit right next to each other.

> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index c61d4ca..5507f4c 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -24,6 +24,7 @@
> >  #include <linux/device.h>
> >  #include <linux/uio.h>
> >  #include <linux/dma-mapping.h>
> > +#include <linux/list.h>
> >
> >  /**
> >  * typedef dma_cookie_t - an opaque DMA cookie
> > @@ -316,6 +317,9 @@ struct dma_async_tx_descriptor {
> >        dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
> >        dma_async_tx_callback callback;
> >        void *callback_param;
> > +#ifdef CONFIG_DMAENGINE_SG_TO_SG
> > +       struct list_head entry;
> > +#endif
> >  #ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
> >        struct dma_async_tx_descriptor *next;
> >        struct dma_async_tx_descriptor *parent;
> 
> Per the above comment if we are already depending on channel switch
> being enabled for sg-to-sg operation, then you can just use the 'next'
> pointer to have a singly linked list of descriptors.  Rather than
> increase the size of the base descriptor.
> 

Ok, I thought the list was clearer, but this is equally easy. How about
the following change that does away with the list completely. Then
things should work on ioatdma as well.

>From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
From: Ira W. Snyder <iws@ovro.caltech.edu>
Date: Fri, 24 Sep 2010 14:18:09 -0700
Subject: [PATCH] dma: improve scatterlist to scatterlist transfer

This is an improved algorithm to improve support on the Intel I/OAT
driver.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---
 drivers/dma/dmaengine.c   |   52 +++++++++++++++++++++-----------------------
 include/linux/dmaengine.h |    3 --
 2 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 57ec1e5..cde775c 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 	struct dma_async_tx_descriptor *tx;
 	dma_cookie_t cookie = -ENOMEM;
 	size_t dst_avail, src_avail;
-	struct list_head tx_list;
+	struct scatterlist *sg;
 	size_t transferred = 0;
+	size_t dst_total = 0;
+	size_t src_total = 0;
 	dma_addr_t dst, src;
 	size_t len;
+	int i;
 
 	if (dst_nents == 0 || src_nents == 0)
 		return -EINVAL;
@@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 	if (dst_sg == NULL || src_sg == NULL)
 		return -EINVAL;
 
+	/* get the total count of bytes in each scatterlist */
+	for_each_sg(dst_sg, sg, dst_nents, i)
+		dst_total += sg_dma_len(sg);
+
+	for_each_sg(src_sg, sg, src_nents, i)
+		src_total += sg_dma_len(sg);
+
 	/* get prepared for the loop */
 	dst_avail = sg_dma_len(dst_sg);
 	src_avail = sg_dma_len(src_sg);
 
-	INIT_LIST_HEAD(&tx_list);
-
 	/* run until we are out of descriptors */
 	while (true) {
 
@@ -1018,14 +1026,24 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
 			return -ENOMEM;
 		}
 
-		/* keep track of the tx for later */
-		list_add_tail(&tx->entry, &tx_list);
-
 		/* update metadata */
 		transferred += len;
 		dst_avail -= len;
 		src_avail -= len;
 
+		/* if this is the last transfer, setup the callback */
+		if (dst_total == transferred || src_total == transferred) {
+			tx->callback = cb;
+			tx->callback_param = cb_param;
+		}
+
+		/* submit the transaction */
+		cookie = tx->tx_submit(tx);
+		if (dma_submit_error(cookie)) {
+			dev_err(dev->dev, "failed to submit desc\n");
+			return cookie;
+		}
+
 fetch:
 		/* fetch the next dst scatterlist entry */
 		if (dst_avail == 0) {
@@ -1060,30 +1078,13 @@ fetch:
 		}
 	}
 
-	/* loop through the list of descriptors and submit them */
-	list_for_each_entry(tx, &tx_list, entry) {
-
-		/* this is the last descriptor: add the callback */
-		if (list_is_last(&tx->entry, &tx_list)) {
-			tx->callback = cb;
-			tx->callback_param = cb_param;
-		}
-
-		/* submit the transaction */
-		cookie = tx->tx_submit(tx);
-		if (dma_submit_error(cookie)) {
-			dev_err(dev->dev, "failed to submit desc\n");
-			return cookie;
-		}
-	}
-
 	/* update counters */
 	preempt_disable();
 	__this_cpu_add(chan->local->bytes_transferred, transferred);
 	__this_cpu_inc(chan->local->memcpy_count);
 	preempt_enable();
 
-	return cookie;
+	return 0;
 }
 EXPORT_SYMBOL(dma_async_memcpy_sg_to_sg);
 #endif
@@ -1092,9 +1093,6 @@ void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
 	struct dma_chan *chan)
 {
 	tx->chan = chan;
-	#ifdef CONFIG_DMAENGINE_SG_TO_SG
-	INIT_LIST_HEAD(&tx->entry);
-	#endif
 	#ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
 	spin_lock_init(&tx->lock);
 	#endif
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 5507f4c..26f2712 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -317,9 +317,6 @@ struct dma_async_tx_descriptor {
 	dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
 	dma_async_tx_callback callback;
 	void *callback_param;
-#ifdef CONFIG_DMAENGINE_SG_TO_SG
-	struct list_head entry;
-#endif
 #ifndef CONFIG_ASYNC_TX_DISABLE_CHANNEL_SWITCH
 	struct dma_async_tx_descriptor *next;
 	struct dma_async_tx_descriptor *parent;
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/1]  Add config option for batched hcalls
From: Will Schmidt @ 2010-09-24 21:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anton Blanchard


Add a config option for the (batched) MULTITCE and BULK_REMOVE h-calls.

By default, these options are on and are beneficial for performance and
throughput reasons.   If disabled, the code will fall back to using less
optimal TCE and REMOVE hcalls.   The ability to easily disable these
options is useful for some of the PREEMPT_RT related investigation and
work occurring on Power.


Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

---
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index f0e6f28..0b5e6a9 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -81,3 +81,23 @@ config DTL
 	  which are accessible through a debugfs file.
 
 	  Say N if you are unsure.
+
+config BULK_REMOVE
+	bool "Enable BULK_REMOVE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the BULK_REMOVE option for the hash page code.
+	  This relies on a "hcall-bulk" firmware feature, and
+	  should be enabled for performance throughput.
+
+config MULTITCE
+	bool "Enable MultiTCE"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Enable the Multi-TCE code, allowing a single hcall to
+	  update multiple TCE entries at one time.  This relies
+	  on a "hcall-multi-tce" firmware feature, and should be
+	  enabled for performance throughput.
+
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 0a4d8c..4327064 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -51,9 +51,13 @@ firmware_features_table[FIRMWARE_MAX_FEATURES] = {
 	{FW_FEATURE_VIO,		"hcall-vio"},
 	{FW_FEATURE_RDMA,		"hcall-rdma"},
 	{FW_FEATURE_LLAN,		"hcall-lLAN"},
+#if defined(CONFIG_BULK_REMOVE)
 	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
+#endif
 	{FW_FEATURE_XDABR,		"hcall-xdabr"},
+#if defined(CONFIG_MULTITCE)
 	{FW_FEATURE_MULTITCE,		"hcall-multi-tce"},
+#endif
 	{FW_FEATURE_SPLPAR,		"hcall-splpar"},
 };
 
 /* Build up the firmware features bitmask using the contents of

^ permalink raw reply related

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Dan Williams @ 2010-09-24 21:53 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <20100924212443.GA24654@ovro.caltech.edu>

On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
> On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> > I don't think any dma channels gracefully handle descriptors that were
> > prepped but not submitted.  You would probably need to submit the
> > backlog, poll for completion, and then return the error.
> > Alternatively, the expectation is that descriptor allocations are
> > transient, i.e. once previously submitted transactions are completed
> > the descriptors will return to the available pool.  So you could do
> > what async_tx routines do and just poll for a descriptor.
> >
> 
> Can you give me an example? Even some pseudocode would help.

Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:

        /* Since we have clobbered the src_list we are committed
         * to doing this asynchronously.  Drivers force forward
         * progress in case they can not provide a descriptor
         */
        for (;;) {
                tx = dma->device_prep_dma_pq(chan, dma_dest,
                                             &dma_src[src_off],
                                             pq_src_cnt,
                                             &coefs[src_off], len,
                                             dma_flags);
                if (likely(tx))
                        break;  
                async_tx_quiesce(&submit->depend_tx);
                dma_async_issue_pending(chan);
        }       

> The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
> with the descriptor if submit fails. Take for example
> dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
> using it has no way to return the descriptor to the free pool.
> 
> Does tx_submit() implicitly return descriptors to the free pool if it
> fails?

No, submit() failures are a hold over from when the ioatdma driver used
to perform additional descriptor allocation at ->submit() time.  After
prep() the expectation is that the engine is just waiting to be told
"go" and can't fail.  The only reason ->submit() retains a return code
is to support the "cookie" based method for polling for operation
completion.  A dma driver should handle all descriptor submission
failure scenarios at prep time.

> Ok, I thought the list was clearer, but this is equally easy. How about
> the following change that does away with the list completely. Then
> things should work on ioatdma as well.
> 
> From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
> From: Ira W. Snyder <iws@ovro.caltech.edu>
> Date: Fri, 24 Sep 2010 14:18:09 -0700
> Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
> 
> This is an improved algorithm to improve support on the Intel I/OAT
> driver.
> 
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
>  drivers/dma/dmaengine.c   |   52 +++++++++++++++++++++-----------------------
>  include/linux/dmaengine.h |    3 --
>  2 files changed, 25 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 57ec1e5..cde775c 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
>         struct dma_async_tx_descriptor *tx;
>         dma_cookie_t cookie = -ENOMEM;
>         size_t dst_avail, src_avail;
> -       struct list_head tx_list;
> +       struct scatterlist *sg;
>         size_t transferred = 0;
> +       size_t dst_total = 0;
> +       size_t src_total = 0;
>         dma_addr_t dst, src;
>         size_t len;
> +       int i;
> 
>         if (dst_nents == 0 || src_nents == 0)
>                 return -EINVAL;
> @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
>         if (dst_sg == NULL || src_sg == NULL)
>                 return -EINVAL;
> 
> +       /* get the total count of bytes in each scatterlist */
> +       for_each_sg(dst_sg, sg, dst_nents, i)
> +               dst_total += sg_dma_len(sg);
> +
> +       for_each_sg(src_sg, sg, src_nents, i)
> +               src_total += sg_dma_len(sg);
> +

What about overrun or underrun do we not care if src_total != dst_total?

Otherwise looks ok.

^ permalink raw reply

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Ira W. Snyder @ 2010-09-24 22:04 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <1285365194.21375.22.camel@dwillia2-linux>

On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
> On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
> > On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> > > I don't think any dma channels gracefully handle descriptors that were
> > > prepped but not submitted.  You would probably need to submit the
> > > backlog, poll for completion, and then return the error.
> > > Alternatively, the expectation is that descriptor allocations are
> > > transient, i.e. once previously submitted transactions are completed
> > > the descriptors will return to the available pool.  So you could do
> > > what async_tx routines do and just poll for a descriptor.
> > >
> > 
> > Can you give me an example? Even some pseudocode would help.
> 
> Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:
> 
>         /* Since we have clobbered the src_list we are committed
>          * to doing this asynchronously.  Drivers force forward
>          * progress in case they can not provide a descriptor
>          */
>         for (;;) {
>                 tx = dma->device_prep_dma_pq(chan, dma_dest,
>                                              &dma_src[src_off],
>                                              pq_src_cnt,
>                                              &coefs[src_off], len,
>                                              dma_flags);
>                 if (likely(tx))
>                         break;  
>                 async_tx_quiesce(&submit->depend_tx);
>                 dma_async_issue_pending(chan);
>         }       
> 
> > The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
> > with the descriptor if submit fails. Take for example
> > dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
> > using it has no way to return the descriptor to the free pool.
> > 
> > Does tx_submit() implicitly return descriptors to the free pool if it
> > fails?
> 
> No, submit() failures are a hold over from when the ioatdma driver used
> to perform additional descriptor allocation at ->submit() time.  After
> prep() the expectation is that the engine is just waiting to be told
> "go" and can't fail.  The only reason ->submit() retains a return code
> is to support the "cookie" based method for polling for operation
> completion.  A dma driver should handle all descriptor submission
> failure scenarios at prep time.
> 

Ok, that's more like what I expected. So we still need the try forever
code similar to the above. I can add that for the next version.

> > Ok, I thought the list was clearer, but this is equally easy. How about
> > the following change that does away with the list completely. Then
> > things should work on ioatdma as well.
> > 
> > From d59569ff48a89ef5411af3cf2995af7b742c5cd3 Mon Sep 17 00:00:00 2001
> > From: Ira W. Snyder <iws@ovro.caltech.edu>
> > Date: Fri, 24 Sep 2010 14:18:09 -0700
> > Subject: [PATCH] dma: improve scatterlist to scatterlist transfer
> > 
> > This is an improved algorithm to improve support on the Intel I/OAT
> > driver.
> > 
> > Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> > ---
> >  drivers/dma/dmaengine.c   |   52 +++++++++++++++++++++-----------------------
> >  include/linux/dmaengine.h |    3 --
> >  2 files changed, 25 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> > index 57ec1e5..cde775c 100644
> > --- a/drivers/dma/dmaengine.c
> > +++ b/drivers/dma/dmaengine.c
> > @@ -983,10 +983,13 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> >         struct dma_async_tx_descriptor *tx;
> >         dma_cookie_t cookie = -ENOMEM;
> >         size_t dst_avail, src_avail;
> > -       struct list_head tx_list;
> > +       struct scatterlist *sg;
> >         size_t transferred = 0;
> > +       size_t dst_total = 0;
> > +       size_t src_total = 0;
> >         dma_addr_t dst, src;
> >         size_t len;
> > +       int i;
> > 
> >         if (dst_nents == 0 || src_nents == 0)
> >                 return -EINVAL;
> > @@ -994,12 +997,17 @@ dma_async_memcpy_sg_to_sg(struct dma_chan *chan,
> >         if (dst_sg == NULL || src_sg == NULL)
> >                 return -EINVAL;
> > 
> > +       /* get the total count of bytes in each scatterlist */
> > +       for_each_sg(dst_sg, sg, dst_nents, i)
> > +               dst_total += sg_dma_len(sg);
> > +
> > +       for_each_sg(src_sg, sg, src_nents, i)
> > +               src_total += sg_dma_len(sg);
> > +
> 
> What about overrun or underrun do we not care if src_total != dst_total?
> 
> Otherwise looks ok.
> 

I don't know if we should care about that. The algorithm handles that
case just fine. It copies the maximum amount it can, which is exactly
min(src_total, dst_total). Whichever scatterlist runs out of entries
first is the shortest.

As a real world example, my driver verifies that both scatterlists have
exactly the right number of bytes available before trying to program the
hardware.

Ira

^ permalink raw reply

* Re: ppc44x - how do i optimize driver for tlb hits
From: Benjamin Herrenschmidt @ 2010-09-24 22:11 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev
In-Reply-To: <20100924130851.GA14016@crust.elkhashab.com>

On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> 
> I suppose another option is to to use the kernel profiling option I 
> always see but have never used.  Is that a viable option to figure out
> what is happening here?  

With perf and stochastic sampling ? If you sample fast enough... but
you'll mostly point to your routine I suppose... though it might tell
you statistically where in your code, which -might- help.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
From: Benjamin Herrenschmidt @ 2010-09-24 22:12 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Peter Zijlstra, John Stultz, devicetree-discuss, linuxppc-dev,
	linux-kernel, David Miller, netdev, linux-api, Thomas Gleixner,
	Rodolfo Giometti, Christoph Lameter, linux-arm-kernel,
	Krzysztof Halasa
In-Reply-To: <20100924075534.GB5043@riccoc20.at.omicron.at>


> > This list is getting way too much unrelated stuff, which I find
> > annoying, it would be nice if we were all a bit more careful here with
> > our CC lists.
> 
> Sorry, I only added device-tree because some one asked me to do so.
> 
>     http://marc.info/?l=linux-netdev&m=127273157912358
> 
> I'll leave it off next time.

That's allright. I'd rather you just post the bindings there than the
whole patch least but no big deal.

I was just fixing my email filters and notice a lot of seemingly
unrelated stuff landing there :-)

Cheers,
Ben.

^ permalink raw reply

* [PATCH (Option 1)] of/i2c: fix module load order issue caused by of_i2c.c
From: Grant Likely @ 2010-09-24 22:14 UTC (permalink / raw)
  To: khali, mikpe; +Cc: rdunlap, linuxppc-dev, linux-kernel, linux-i2c

Commit 959e85f7, "i2c: add OF-style registration and binding" caused a
module dependency loop where of_i2c.c calls functions in i2c-core, and
i2c-core calls of_i2c_register_devices() in of_i2c.  This means that
when i2c support is built as a module when CONFIG_OF is set, then
neither i2c_core nor of_i2c are able to be loaded.

This patch fixes the problem by moving the of_i2c_register_devices()
function into the body of i2c_core and renaming it to
i2c_scan_of_devices (of_i2c_register_devices is analogous to the
existing i2c_scan_static_board_info function and so should be named
similarly).  This function isn't called by any code outside of
i2c_core, and it must always be present when CONFIG_OF is selected, so
it makes sense to locate it there.  When CONFIG_OF is not selected,
of_i2c_register_devices() becomes a no-op.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
---
 drivers/i2c/i2c-core.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/of/of_i2c.c    |   57 ---------------------------------------------
 include/linux/of_i2c.h |    7 ------
 3 files changed, 59 insertions(+), 66 deletions(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 6649176..64a261b 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,8 +32,8 @@
 #include <linux/init.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
-#include <linux/of_i2c.h>
 #include <linux/of_device.h>
+#include <linux/of_irq.h>
 #include <linux/completion.h>
 #include <linux/hardirq.h>
 #include <linux/irqflags.h>
@@ -818,6 +818,63 @@ static void i2c_scan_static_board_info(struct i2c_adapter *adapter)
 	up_read(&__i2c_board_lock);
 }
 
+#ifdef CONFIG_OF
+void i2c_scan_of_devices(struct i2c_adapter *adap)
+{
+	void *result;
+	struct device_node *node;
+
+	/* Only register child devices if the adapter has a node pointer set */
+	if (!adap->dev.of_node)
+		return;
+
+	for_each_child_of_node(adap->dev.of_node, node) {
+		struct i2c_board_info info = {};
+		struct dev_archdata dev_ad = {};
+		const __be32 *addr;
+		int len;
+
+		dev_dbg(&adap->dev, "of_i2c: register %s\n", node->full_name);
+		if (of_modalias_node(node, info.type, sizeof(info.type)) < 0) {
+			dev_err(&adap->dev, "of_i2c: modalias failure on %s\n",
+				node->full_name);
+			continue;
+		}
+
+		addr = of_get_property(node, "reg", &len);
+		if (!addr || (len < sizeof(int))) {
+			dev_err(&adap->dev, "of_i2c: invalid reg on %s\n",
+				node->full_name);
+			continue;
+		}
+
+		info.addr = be32_to_cpup(addr);
+		if (info.addr > (1 << 10) - 1) {
+			dev_err(&adap->dev, "of_i2c: invalid addr=%x on %s\n",
+				info.addr, node->full_name);
+			continue;
+		}
+
+		info.irq = irq_of_parse_and_map(node, 0);
+		info.of_node = of_node_get(node);
+		info.archdata = &dev_ad;
+
+		request_module("%s", info.type);
+
+		result = i2c_new_device(adap, &info);
+		if (result == NULL) {
+			dev_err(&adap->dev, "of_i2c: Failure registering %s\n",
+			        node->full_name);
+			of_node_put(node);
+			irq_dispose_mapping(info.irq);
+			continue;
+		}
+	}
+}
+#else
+static inline void i2c_scan_of_devices(struct i2c_adapter *adap) { }
+#endif
+
 static int i2c_do_add_adapter(struct i2c_driver *driver,
 			      struct i2c_adapter *adap)
 {
@@ -877,7 +934,7 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
 		i2c_scan_static_board_info(adap);
 
 	/* Register devices from the device tree */
-	of_i2c_register_devices(adap);
+	i2c_scan_of_devices(adap);
 
 	/* Notify drivers */
 	mutex_lock(&core_lock);
diff --git a/drivers/of/of_i2c.c b/drivers/of/of_i2c.c
index 0a694de..e0c3841 100644
--- a/drivers/of/of_i2c.c
+++ b/drivers/of/of_i2c.c
@@ -17,63 +17,6 @@
 #include <linux/of_irq.h>
 #include <linux/module.h>
 
-void of_i2c_register_devices(struct i2c_adapter *adap)
-{
-	void *result;
-	struct device_node *node;
-
-	/* Only register child devices if the adapter has a node pointer set */
-	if (!adap->dev.of_node)
-		return;
-
-	dev_dbg(&adap->dev, "of_i2c: walking child nodes\n");
-
-	for_each_child_of_node(adap->dev.of_node, node) {
-		struct i2c_board_info info = {};
-		struct dev_archdata dev_ad = {};
-		const __be32 *addr;
-		int len;
-
-		dev_dbg(&adap->dev, "of_i2c: register %s\n", node->full_name);
-
-		if (of_modalias_node(node, info.type, sizeof(info.type)) < 0) {
-			dev_err(&adap->dev, "of_i2c: modalias failure on %s\n",
-				node->full_name);
-			continue;
-		}
-
-		addr = of_get_property(node, "reg", &len);
-		if (!addr || (len < sizeof(int))) {
-			dev_err(&adap->dev, "of_i2c: invalid reg on %s\n",
-				node->full_name);
-			continue;
-		}
-
-		info.addr = be32_to_cpup(addr);
-		if (info.addr > (1 << 10) - 1) {
-			dev_err(&adap->dev, "of_i2c: invalid addr=%x on %s\n",
-				info.addr, node->full_name);
-			continue;
-		}
-
-		info.irq = irq_of_parse_and_map(node, 0);
-		info.of_node = of_node_get(node);
-		info.archdata = &dev_ad;
-
-		request_module("%s", info.type);
-
-		result = i2c_new_device(adap, &info);
-		if (result == NULL) {
-			dev_err(&adap->dev, "of_i2c: Failure registering %s\n",
-			        node->full_name);
-			of_node_put(node);
-			irq_dispose_mapping(info.irq);
-			continue;
-		}
-	}
-}
-EXPORT_SYMBOL(of_i2c_register_devices);
-
 static int of_dev_node_match(struct device *dev, void *data)
 {
         return dev->of_node == data;
diff --git a/include/linux/of_i2c.h b/include/linux/of_i2c.h
index 0efe8d4..1998cf8 100644
--- a/include/linux/of_i2c.h
+++ b/include/linux/of_i2c.h
@@ -15,16 +15,9 @@
 #if defined(CONFIG_OF_I2C) || defined(CONFIG_OF_I2C_MODULE)
 #include <linux/i2c.h>
 
-extern void of_i2c_register_devices(struct i2c_adapter *adap);
-
 /* must call put_device() when done with returned i2c_client device */
 extern struct i2c_client *of_find_i2c_device_by_node(struct device_node *node);
 
-#else
-static inline void of_i2c_register_devices(struct i2c_adapter *adap)
-{
-	return;
-}
 #endif /* CONFIG_OF_I2C */
 
 #endif /* __LINUX_OF_I2C_H */

^ permalink raw reply related

* Re: [BUG 2.6.36-rc5] of_i2c.ko <-> i2c-core.ko dependency loop
From: Grant Likely @ 2010-09-24 22:14 UTC (permalink / raw)
  To: Jean Delvare, Mikael Pettersson
  Cc: Randy Dunlap, linuxppc-dev, linux-kernel, linux-i2c
In-Reply-To: <f1f37e21-f04a-4fef-868c-27c7d2ebe594@email.android.com>

On Fri, Sep 24, 2010 at 7:48 AM, Grant Likely <grant.likely@secretlab.ca> w=
rote:
>
>
> "Jean Delvare" <khali@linux-fr.org> wrote:
>
>>Hi Mikael,
>>
>>On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
>>> Jean Delvare writes:
>>> =A0> As far as I can see this is caused by this commit from Grant:
>>> =A0>
>>> =A0> http://git.kernel.org/?p=3Dlinux/kernel/git/torvalds/linux-2.6.git=
;a=3Dcommitdiff;h=3D959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
>>> =A0>
>>> =A0> Mikael, can you please try reverting this patch and see if it solv=
es
>>> =A0> your problem?
>>>
>>> Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings=
,
>>> and I was able to insmod the i2c-{core,dev,powermac}.ko modules.
>>
>>Thanks for testing and reporting. Grant, unless you come up with a fix
>>very quickly, I'll have to revert
>>959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.
>
> I'll get a fix out today.

I've got two different fixes that I'm about to send you.  You can
choose the fix that you prefer.  The first option moves the offending
function into i2c-core.c.  The function parses the device tree data
and creates i2c_device for each i2c device node that it finds.  This
is analogous to i2c_scan_static_board_info().

The second options reverts most of the 959e85f7 commit, but keeps the
line that allows of-style matching is retained so that all i2c_devices
on powerpc machines will still bind correctly.

My preferred solution is the first option because the tested code path
does not changed.  The offending function is simply moved verbatim.
The second option is a smaller patch, but I can only test one of the
affected drivers.  However, I'll let you make the decision.

Both have been build tested on PowerPC and ARM, and run tested on a
PowerPC MPC5200 board.

patches to follow in a few minutes..

g.

^ permalink raw reply

* [PATCH (Option 2)] of/i2c: fix module load order issue caused by of_i2c.c
From: Grant Likely @ 2010-09-24 22:15 UTC (permalink / raw)
  To: khali, mikpe; +Cc: rdunlap, linuxppc-dev, linux-kernel, linux-i2c
In-Reply-To: <20100924221313.28357.61419.stgit@angua>

Commit 959e85f7, "i2c: add OF-style registration and binding" caused a
module dependency loop where of_i2c.c calls functions in i2c-core, and
i2c-core calls of_i2c_register_devices() in of_i2c.  This means that
when i2c support is built as a module when CONFIG_OF is set, then
neither i2c_core nor of_i2c are able to be loaded.

This patch fixes the problem by moving the of_i2c_register_devices()
calls back into the device drivers.  Device drivers already
specifically request the core code to parse the device tree for
devices anyway by setting the of_node pointer, so it isn't a big
deal to also call the registration function.  The drivers just become
slightly more verbose.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
---
 drivers/i2c/busses/i2c-cpm.c     |    5 +++++
 drivers/i2c/busses/i2c-ibm_iic.c |    3 +++
 drivers/i2c/busses/i2c-mpc.c     |    1 +
 drivers/i2c/i2c-core.c           |    4 ----
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index f7bd261..f2de3be 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -677,6 +677,11 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev,
 	dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
 		cpm->adap.name);
 
+	/*
+	 * register OF I2C devices
+	 */
+	of_i2c_register_devices(&cpm->adap);
+
 	return 0;
 out_shut:
 	cpm_i2c_shutdown(cpm);
diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
index 43ca32f..89eedf4 100644
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -761,6 +761,9 @@ static int __devinit iic_probe(struct platform_device *ofdev,
 	dev_info(&ofdev->dev, "using %s mode\n",
 		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
 
+	/* Now register all the child nodes */
+	of_i2c_register_devices(adap);
+
 	return 0;
 
 error_cleanup:
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index a1c419a..b74e6dc 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -632,6 +632,7 @@ static int __devinit fsl_i2c_probe(struct platform_device *op,
 		dev_err(i2c->dev, "failed to add adapter\n");
 		goto fail_add;
 	}
+	of_i2c_register_devices(&i2c->adap);
 
 	return result;
 
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 6649176..a9589f5 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,7 +32,6 @@
 #include <linux/init.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
-#include <linux/of_i2c.h>
 #include <linux/of_device.h>
 #include <linux/completion.h>
 #include <linux/hardirq.h>
@@ -876,9 +875,6 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
 	if (adap->nr < __i2c_first_dynamic_bus_num)
 		i2c_scan_static_board_info(adap);
 
-	/* Register devices from the device tree */
-	of_i2c_register_devices(adap);
-
 	/* Notify drivers */
 	mutex_lock(&core_lock);
 	bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);

^ permalink raw reply related

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Dan Williams @ 2010-09-24 22:20 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <20100924220419.GC24654@ovro.caltech.edu>

On Fri, 2010-09-24 at 15:04 -0700, Ira W. Snyder wrote:
> On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
> > What about overrun or underrun do we not care if src_total != dst_total?
> > 
> > Otherwise looks ok.
> > 
> 
> I don't know if we should care about that. The algorithm handles that
> case just fine. It copies the maximum amount it can, which is exactly
> min(src_total, dst_total). Whichever scatterlist runs out of entries
> first is the shortest.
> 
> As a real world example, my driver verifies that both scatterlists have
> exactly the right number of bytes available before trying to program the
> hardware.

Ok, just handle the prep failure and I think we are good to go.

--
Dan

^ permalink raw reply

* Re: [PATCH RFCv1 1/2] dmaengine: add support for scatterlist to scatterlist transfers
From: Ira W. Snyder @ 2010-09-24 22:53 UTC (permalink / raw)
  To: Dan Williams, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20100924220419.GC24654@ovro.caltech.edu>

On Fri, Sep 24, 2010 at 03:04:19PM -0700, Ira W. Snyder wrote:
> On Fri, Sep 24, 2010 at 02:53:14PM -0700, Dan Williams wrote:
> > On Fri, 2010-09-24 at 14:24 -0700, Ira W. Snyder wrote:
> > > On Fri, Sep 24, 2010 at 01:40:56PM -0700, Dan Williams wrote:
> > > > I don't think any dma channels gracefully handle descriptors that were
> > > > prepped but not submitted.  You would probably need to submit the
> > > > backlog, poll for completion, and then return the error.
> > > > Alternatively, the expectation is that descriptor allocations are
> > > > transient, i.e. once previously submitted transactions are completed
> > > > the descriptors will return to the available pool.  So you could do
> > > > what async_tx routines do and just poll for a descriptor.
> > > >
> > > 
> > > Can you give me an example? Even some pseudocode would help.
> > 
> > Here is one from do_async_gen_syndrome() in crypto/async_tx/async_pq.c:
> > 
> >         /* Since we have clobbered the src_list we are committed
> >          * to doing this asynchronously.  Drivers force forward
> >          * progress in case they can not provide a descriptor
> >          */
> >         for (;;) {
> >                 tx = dma->device_prep_dma_pq(chan, dma_dest,
> >                                              &dma_src[src_off],
> >                                              pq_src_cnt,
> >                                              &coefs[src_off], len,
> >                                              dma_flags);
> >                 if (likely(tx))
> >                         break;  
> >                 async_tx_quiesce(&submit->depend_tx);
> >                 dma_async_issue_pending(chan);
> >         }       
> > 
> > > The other DMAEngine functions (dma_async_memcpy_*()) don't do anything
> > > with the descriptor if submit fails. Take for example
> > > dma_async_memcpy_buf_to_buf(). If tx->tx_submit(tx); fails, any code
> > > using it has no way to return the descriptor to the free pool.
> > > 
> > > Does tx_submit() implicitly return descriptors to the free pool if it
> > > fails?
> > 
> > No, submit() failures are a hold over from when the ioatdma driver used
> > to perform additional descriptor allocation at ->submit() time.  After
> > prep() the expectation is that the engine is just waiting to be told
> > "go" and can't fail.  The only reason ->submit() retains a return code
> > is to support the "cookie" based method for polling for operation
> > completion.  A dma driver should handle all descriptor submission
> > failure scenarios at prep time.
> > 
> 
> Ok, that's more like what I expected. So we still need the try forever
> code similar to the above. I can add that for the next version.
> 

When coding this change, I've noticed one problem that would break my
driver. I cannot issue dma_async_issue_pending() on the channel while
creating the descriptors, since this will start transferring the
previously submitted DMA descriptors. This breaks the external hardware
control requirement.

Imagine this scenario:
1) device is not yet setup for external control (nothing is pulsing the pins)
2) dma_async_memcpy_sg_to_sg()

- this hits an allocation failure, which calls dma_async_issue_pending()
- this causes the DMA engine to start transferring to a device which is
  not ready yet
- memory pressure stops, and allocation succeeds again
- some descriptors have been transferred, but not the ones since the
  alloc failure
- now the first half of the descriptors (pre alloc failure) have been
  transferred
- the second half of the descriptors (post alloc failure) are still
  pending
- the dma_async_memcpy_sg_to_sg() returns success: all tx_submit()
  succeeded

3) device_control() - setup external control mode
4) dma_async_issue_pending() - start the externally controlled transfer
5) tell the external agent to start controlling the DMA transaction

- now there isn't enough data left, and the external agent fails to
  program the FPGAs

I don't mind adding it to the code, since I have enough memory that I
don't ever see allocation failures. It is an embedded system, and we've
been careful not to overcommit memory. I think for all other users, it
would be the appropriate thing to do. Most people don't care if the
scatterlist is copied in two chunks with a time gap in the middle.

An alternative implementation would be to implement
device_prep_sg_to_sg() that returned a struct dma_async_tx_descriptor,
which could then be used as normal by higher layers. This would allow
the driver to allocate / cleanup all descriptors in one shot. This would
be completely robust to this error situation.

Is there one solution you'd prefer over the other? They're both similar
in the amount of code, though duplication would probably be increased in
the device_prep_sg_to_sg() case. If any other driver implements it.

Thanks,
Ira

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox