* Re: [PATCH 0/2 v3] mpc5200 ac97 gpio reset
From: Mark Brown @ 2010-08-03 5:52 UTC (permalink / raw)
To: Grant Likely; +Cc: linuxppc-dev@lists.ozlabs.org, Eric Millbrandt
In-Reply-To: <AANLkTikbHp18uc-idFtEqakoo_0Vdg_rU44xv9uHwkvG@mail.gmail.com>
On Sat, Jul 31, 2010 at 10:42:15PM -0600, Grant Likely wrote:
> On Sun, Jun 27, 2010 at 4:01 PM, Mark Brown
> > I'm a little concerned with a collision with multi codec here. It'd
> > be handy if you could keep it separate in case it needs merging
> > into both trees (or we could merge via ASoC only).
> Hmmm. Yeah, probably better to take it via your tree then. I've
> currently got patch 1 in linux-next, but I'll drop it before asking
> benh to pull. Go ahead and add my acked-by.
Looks like multi-component is slightly too late for .36 and I don't have
the original patch any more since it looked like you were going to be
carrying it... could you restore the patch to your tree please?
* Re: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
From: Michael Neuling @ 2010-08-03 6:06 UTC (permalink / raw)
To: Timur Tabi; +Cc: Alexandre Bounine, linuxppc-dev, linux-kernel, thomas.moll
In-Reply-To: <AANLkTinKbimKyLpvFD7KOvavshu_n8gRcp2BvEJj0XZQ@mail.gmail.com>
> > MCSR_MASK is not defined anywhere, so when I compile this code, I get this:
>
> Never mind. I see that it's been fixed already, and that the patch
> that removed MCSR_MASK was posted around the same time that this patch
> was posted.
I don't know what happened here but 2.6.35 is broken because of this
problem:
arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function)
arch/powerpc/sysdev/fsl_rio.c:248: error: (Each undeclared identifier is reported only once
arch/powerpc/sysdev/fsl_rio.c:248: error: for each function it appears in.)
arch/powerpc/sysdev/fsl_rio.c:250: error: 'MCSR_BUS_RBERR' undeclared (first use in this function)
Mikey
* [PATCH 1/3] powerpc: Optimise 64bit csum_partial
From: Anton Blanchard @ 2010-08-03 6:08 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
The main loop of csum_partial runs very slowly on recent POWER CPUs. After some
analysis on both POWER6 and POWER7 I came up with the routine below. First we get
the source aligned to a double word, ignoring any odd alignment to keep things
simple. Then we do 64 bytes at a time, with an entry and exit limb of a further
64 bytes. On both POWER6 and POWER7 this should be as fast as we can go since
we are limited by the latency of the adde instructions.
To test this I forced checksumming on over loopback and ran socklib (a
simple TCP benchmark). On a POWER6 575 throughput improved by 11% with
this patch.
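As an illustration of the arithmetic only (not the scheduled assembly in this patch), here is a minimal user-space C sketch. It sums 64-bit words with the carry wrapped back in, which is what the adde chain plus the final addze achieve, and then folds the 64-bit total to 32 bits like the rldicl/add/srdi sequence. The names csum_partial_sketch, add_carry64 and fold64 are made up for this note, and the sketch pads a trailing byte in memory order so that it behaves the same on either endianness, whereas the patch uses a big-endian shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add b into a, wrapping any carry out of bit 63 back into bit 0. */
static uint64_t add_carry64(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < b);	/* (s < b) is 1 exactly when a + b overflowed */
}

/* Fold a 64-bit sum to 32 bits: rotate by 32, add, keep the top half. */
static uint32_t fold64(uint64_t sum)
{
	uint64_t rot = (sum << 32) | (sum >> 32);

	return (uint32_t)((sum + rot) >> 32);
}

/* Hypothetical helper for illustration; not the kernel's csum_partial. */
uint32_t csum_partial_sketch(const unsigned char *buf, size_t len, uint32_t sum)
{
	uint64_t acc = sum;
	uint64_t d;
	uint32_t w;
	uint16_t h;

	assert(buf != NULL || len == 0);

	while (len >= 8) {		/* main loop: one doubleword at a time */
		memcpy(&d, buf, 8);
		acc = add_carry64(acc, d);
		buf += 8;
		len -= 8;
	}
	if (len >= 4) {			/* up to 7 bytes to go */
		memcpy(&w, buf, 4);
		acc = add_carry64(acc, w);
		buf += 4;
		len -= 4;
	}
	if (len >= 2) {			/* up to 3 bytes to go */
		memcpy(&h, buf, 2);
		acc = add_carry64(acc, h);
		buf += 2;
		len -= 2;
	}
	if (len) {			/* last byte, zero padded to 16 bits */
		h = 0;
		memcpy(&h, buf, 1);
		acc = add_carry64(acc, h);
	}
	return fold64(acc);
}
```

Folding the 32-bit result once more to 16 bits gives the same value as a plain halfword-at-a-time ones' complement sum, which is why the wide accumulation is safe.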
Signed-off-by: Anton Blanchard <anton@samba.org>
--
Index: powerpc.git/arch/powerpc/lib/checksum_64.S
===================================================================
--- powerpc.git.orig/arch/powerpc/lib/checksum_64.S 2010-08-03 13:32:45.291991557 +1000
+++ powerpc.git/arch/powerpc/lib/checksum_64.S 2010-08-03 15:16:59.600753385 +1000
@@ -65,55 +65,168 @@ _GLOBAL(csum_tcpudp_magic)
srwi r3,r3,16
blr
+#define STACKFRAMESIZE 256
+#define STK_REG(i) (112 + ((i)-14)*8)
+
/*
* Computes the checksum of a memory block at buff, length len,
* and adds in "sum" (32-bit).
*
- * This code assumes at least halfword alignment, though the length
- * can be any number of bytes. The sum is accumulated in r5.
- *
* csum_partial(r3=buff, r4=len, r5=sum)
*/
_GLOBAL(csum_partial)
- subi r3,r3,8 /* we'll offset by 8 for the loads */
- srdi. r6,r4,3 /* divide by 8 for doubleword count */
- addic r5,r5,0 /* clear carry */
- beq 3f /* if we're doing < 8 bytes */
- andi. r0,r3,2 /* aligned on a word boundary already? */
- beq+ 1f
- lhz r6,8(r3) /* do 2 bytes to get aligned */
- addi r3,r3,2
- subi r4,r4,2
- addc r5,r5,r6
- srdi. r6,r4,3 /* recompute number of doublewords */
- beq 3f /* any left? */
-1: mtctr r6
-2: ldu r6,8(r3) /* main sum loop */
- adde r5,r5,r6
- bdnz 2b
- andi. r4,r4,7 /* compute bytes left to sum after doublewords */
-3: cmpwi 0,r4,4 /* is at least a full word left? */
- blt 4f
- lwz r6,8(r3) /* sum this word */
+ addic r0,r5,0 /* clear carry */
+
+ srdi. r6,r4,3 /* less than 8 bytes? */
+ beq .Lcsum_tail_word
+
+ /*
+ * If only halfword aligned, align to a double word. Since odd
+ * aligned addresses should be rare and they would require more
+ * work to calculate the correct checksum, we ignore that case
+ * and take the potential slowdown of unaligned loads.
+ */
+ rldicl. r6,r3,64-1,64-2 /* r6 = (r3 & 0x3) >> 1 */
+ beq .Lcsum_aligned
+
+ li r7,4
+ sub r6,r7,r6
+ mtctr r6
+
+1:
+ lhz r6,0(r3) /* align to doubleword */
+ subi r4,r4,2
+ addi r3,r3,2
+ adde r0,r0,r6
+ bdnz 1b
+
+.Lcsum_aligned:
+ /*
+ * We unroll the loop such that each iteration is 64 bytes with an
+ * entry and exit limb of 64 bytes, meaning a minimum size of
+ * 128 bytes.
+ */
+ srdi. r6,r4,7
+ beq .Lcsum_tail_doublewords /* len < 128 */
+
+ srdi r6,r4,6
+ subi r6,r6,1
+ mtctr r6
+
+ stdu r1,-STACKFRAMESIZE(r1)
+ std r14,STK_REG(r14)(r1)
+ std r15,STK_REG(r15)(r1)
+ std r16,STK_REG(r16)(r1)
+
+ ld r6,0(r3)
+ ld r9,8(r3)
+
+ ld r10,16(r3)
+ ld r11,24(r3)
+
+ /*
+ * On POWER6 and POWER7 back to back addes take 2 cycles because of
+ * the XER dependency. This means the fastest this loop can go is
+ * 16 cycles per iteration. The scheduling of the loop below has
+ * been shown to hit this on both POWER6 and POWER7.
+ */
+ .align 5
+2:
+ adde r0,r0,r6
+ ld r12,32(r3)
+ ld r14,40(r3)
+
+ adde r0,r0,r9
+ ld r15,48(r3)
+ ld r16,56(r3)
+ addi r3,r3,64
+
+ adde r0,r0,r10
+
+ adde r0,r0,r11
+
+ adde r0,r0,r12
+
+ adde r0,r0,r14
+
+ adde r0,r0,r15
+ ld r6,0(r3)
+ ld r9,8(r3)
+
+ adde r0,r0,r16
+ ld r10,16(r3)
+ ld r11,24(r3)
+ bdnz 2b
+
+
+ adde r0,r0,r6
+ ld r12,32(r3)
+ ld r14,40(r3)
+
+ adde r0,r0,r9
+ ld r15,48(r3)
+ ld r16,56(r3)
+ addi r3,r3,64
+
+ adde r0,r0,r10
+ adde r0,r0,r11
+ adde r0,r0,r12
+ adde r0,r0,r14
+ adde r0,r0,r15
+ adde r0,r0,r16
+
+ ld r14,STK_REG(r14)(r1)
+ ld r15,STK_REG(r15)(r1)
+ ld r16,STK_REG(r16)(r1)
+ addi r1,r1,STACKFRAMESIZE
+
+ andi. r4,r4,63
+
+.Lcsum_tail_doublewords: /* Up to 127 bytes to go */
+ srdi. r6,r4,3
+ beq .Lcsum_tail_word
+
+ mtctr r6
+3:
+ ld r6,0(r3)
+ addi r3,r3,8
+ adde r0,r0,r6
+ bdnz 3b
+
+ andi. r4,r4,7
+
+.Lcsum_tail_word: /* Up to 7 bytes to go */
+ srdi. r6,r4,2
+ beq .Lcsum_tail_halfword
+
+ lwz r6,0(r3)
addi r3,r3,4
+ adde r0,r0,r6
subi r4,r4,4
- adde r5,r5,r6
-4: cmpwi 0,r4,2 /* is at least a halfword left? */
- blt+ 5f
- lhz r6,8(r3) /* sum this halfword */
- addi r3,r3,2
- subi r4,r4,2
- adde r5,r5,r6
-5: cmpwi 0,r4,1 /* is at least a byte left? */
- bne+ 6f
- lbz r6,8(r3) /* sum this byte */
- slwi r6,r6,8 /* this byte is assumed to be the upper byte of a halfword */
- adde r5,r5,r6
-6: addze r5,r5 /* add in final carry */
- rldicl r4,r5,32,0 /* fold two 32-bit halves together */
- add r3,r4,r5
- srdi r3,r3,32
- blr
+
+.Lcsum_tail_halfword: /* Up to 3 bytes to go */
+ srdi. r6,r4,1
+ beq .Lcsum_tail_byte
+
+ lhz r6,0(r3)
+ addi r3,r3,2
+ adde r0,r0,r6
+ subi r4,r4,2
+
+.Lcsum_tail_byte: /* Up to 1 byte to go */
+ andi. r6,r4,1
+ beq .Lcsum_finish
+
+ lbz r6,0(r3)
+ sldi r9,r6,8 /* Pad the byte out to 16 bits */
+ adde r0,r0,r9
+
+.Lcsum_finish:
+ addze r0,r0 /* add in final carry */
+ rldicl r4,r0,32,0 /* fold two 32 bit halves together */
+ add r3,r4,r0
+ srdi r3,r3,32
+ blr
/*
* Computes the checksum of a memory block at src, length len,
* [PATCH 2/3] powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user
From: Anton Blanchard @ 2010-08-03 6:09 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
In-Reply-To: <20100803060834.GK29316@kryten>
We use the same core loop as the new csum_partial, adding in the
stores and exception handling code. To keep things simple we do all the
exception fixup in csum_and_copy_from_user. This wrapper function is
modelled on the generic checksum code and is careful to always calculate
a complete checksum even if we only copied part of the data from userspace.
To test this I forced checksumming on over loopback and ran socklib (a
simple TCP benchmark). On a POWER6 575 throughput improved by 19% with
this patch. If I forced both the sender and receiver onto the same cpu
(with the hope of shifting the benchmark from being cache bandwidth limited
to cpu limited), adding this patch improved performance by 55%.
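To make the fixup idea concrete, here is a rough user-space C sketch of what "always calculate a complete checksum" means when a copy stops short: the uncopied tail of the destination is zeroed and the checksum is then taken over the whole destination buffer. Everything here is hypothetical and only for illustration: fake_copy_from_user stands in for a user copy that faults after `readable` bytes, simple_csum is a plain halfword sum rather than the optimised routine, and SKETCH_EFAULT is a local placeholder for the kernel's EFAULT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_EFAULT 14	/* placeholder error code for this sketch */

/*
 * Hypothetical stand-in for a faulting user copy: copies at most
 * `readable` bytes and returns the number of bytes NOT copied.
 */
size_t fake_copy_from_user(void *dst, const void *src, size_t len,
			   size_t readable)
{
	size_t n = len < readable ? len : readable;

	memcpy(dst, src, n);
	return len - n;
}

/* Plain halfword-at-a-time ones' complement sum, kept in 32 bits. */
uint32_t simple_csum(const unsigned char *p, size_t len, uint32_t sum)
{
	uint64_t acc = sum;
	uint16_t h;

	while (len >= 2) {
		memcpy(&h, p, 2);
		acc += h;
		p += 2;
		len -= 2;
	}
	if (len) {			/* odd trailing byte, zero padded */
		h = 0;
		memcpy(&h, p, 1);
		acc += h;
	}
	while (acc >> 32)		/* wrap carries back in */
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (uint32_t)acc;
}

/* Copy, zero whatever could not be copied, checksum the full buffer. */
uint32_t copy_and_csum_sketch(const void *src, void *dst, size_t len,
			      size_t readable, uint32_t sum, int *err)
{
	size_t missing;

	assert(err != NULL);
	*err = 0;

	missing = fake_copy_from_user(dst, src, len, readable);
	if (missing) {
		memset((unsigned char *)dst + len - missing, 0, missing);
		*err = -SKETCH_EFAULT;
	}
	/* The checksum always covers all len bytes of dst. */
	return simple_csum(dst, len, sum);
}
```

The caller sees the error through *err but still gets a checksum that matches what is actually in the destination buffer.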
Signed-off-by: Anton Blanchard <anton@samba.org>
--
Index: powerpc.git/arch/powerpc/lib/checksum_64.S
===================================================================
--- powerpc.git.orig/arch/powerpc/lib/checksum_64.S 2010-08-03 15:16:59.600753385 +1000
+++ powerpc.git/arch/powerpc/lib/checksum_64.S 2010-08-03 16:00:32.030741261 +1000
@@ -228,115 +228,230 @@ _GLOBAL(csum_partial)
srdi r3,r3,32
blr
+
+ .macro source
+100:
+ .section __ex_table,"a"
+ .align 3
+ .llong 100b,.Lsrc_error
+ .previous
+ .endm
+
+ .macro dest
+200:
+ .section __ex_table,"a"
+ .align 3
+ .llong 200b,.Ldest_error
+ .previous
+ .endm
+
/*
* Computes the checksum of a memory block at src, length len,
* and adds in "sum" (32-bit), while copying the block to dst.
* If an access exception occurs on src or dst, it stores -EFAULT
- * to *src_err or *dst_err respectively, and (for an error on
- * src) zeroes the rest of dst.
- *
- * This code needs to be reworked to take advantage of 64 bit sum+copy.
- * However, due to tokenring halfword alignment problems this will be very
- * tricky. For now we'll leave it until we instrument it somehow.
+ * to *src_err or *dst_err respectively. The caller must take any action
+ * required in this case (zeroing memory, recalculating partial checksum etc).
*
* csum_partial_copy_generic(r3=src, r4=dst, r5=len, r6=sum, r7=src_err, r8=dst_err)
*/
_GLOBAL(csum_partial_copy_generic)
- addic r0,r6,0
- subi r3,r3,4
- subi r4,r4,4
- srwi. r6,r5,2
- beq 3f /* if we're doing < 4 bytes */
- andi. r9,r4,2 /* Align dst to longword boundary */
- beq+ 1f
-81: lhz r6,4(r3) /* do 2 bytes to get aligned */
- addi r3,r3,2
+ addic r0,r6,0 /* clear carry */
+
+ srdi. r6,r5,3 /* less than 8 bytes? */
+ beq .Lcopy_tail_word
+
+ /*
+ * If only halfword aligned, align to a double word. Since odd
+ * aligned addresses should be rare and they would require more
+ * work to calculate the correct checksum, we ignore that case
+ * and take the potential slowdown of unaligned loads.
+ *
+ * If the source and destination are relatively unaligned we only
+ * align the source. This keeps things simple.
+ */
+ rldicl. r6,r3,64-1,64-2 /* r6 = (r3 & 0x3) >> 1 */
+ beq .Lcopy_aligned
+
+ li r7,4
+ sub r6,r7,r6
+ mtctr r6
+
+1:
+source; lhz r6,0(r3) /* align to doubleword */
subi r5,r5,2
-91: sth r6,4(r4)
- addi r4,r4,2
- addc r0,r0,r6
- srwi. r6,r5,2 /* # words to do */
- beq 3f
-1: mtctr r6
-82: lwzu r6,4(r3) /* the bdnz has zero overhead, so it should */
-92: stwu r6,4(r4) /* be unnecessary to unroll this loop */
- adde r0,r0,r6
- bdnz 82b
- andi. r5,r5,3
-3: cmpwi 0,r5,2
- blt+ 4f
-83: lhz r6,4(r3)
addi r3,r3,2
- subi r5,r5,2
-93: sth r6,4(r4)
+ adde r0,r0,r6
+dest; sth r6,0(r4)
addi r4,r4,2
+ bdnz 1b
+
+.Lcopy_aligned:
+ /*
+ * We unroll the loop such that each iteration is 64 bytes with an
+ * entry and exit limb of 64 bytes, meaning a minimum size of
+ * 128 bytes.
+ */
+ srdi. r6,r5,7
+ beq .Lcopy_tail_doublewords /* len < 128 */
+
+ srdi r6,r5,6
+ subi r6,r6,1
+ mtctr r6
+
+ stdu r1,-STACKFRAMESIZE(r1)
+ std r14,STK_REG(r14)(r1)
+ std r15,STK_REG(r15)(r1)
+ std r16,STK_REG(r16)(r1)
+
+source; ld r6,0(r3)
+source; ld r9,8(r3)
+
+source; ld r10,16(r3)
+source; ld r11,24(r3)
+
+ /*
+ * On POWER6 and POWER7 back to back addes take 2 cycles because of
+ * the XER dependency. This means the fastest this loop can go is
+ * 16 cycles per iteration. The scheduling of the loop below has
+ * been shown to hit this on both POWER6 and POWER7.
+ */
+ .align 5
+2:
adde r0,r0,r6
-4: cmpwi 0,r5,1
- bne+ 5f
-84: lbz r6,4(r3)
-94: stb r6,4(r4)
- slwi r6,r6,8 /* Upper byte of word */
- adde r0,r0,r6
-5: addze r3,r0 /* add in final carry (unlikely with 64-bit regs) */
- rldicl r4,r3,32,0 /* fold 64 bit value */
- add r3,r4,r3
- srdi r3,r3,32
- blr
+source; ld r12,32(r3)
+source; ld r14,40(r3)
-/* These shouldn't go in the fixup section, since that would
- cause the ex_table addresses to get out of order. */
+ adde r0,r0,r9
+source; ld r15,48(r3)
+source; ld r16,56(r3)
+ addi r3,r3,64
+
+ adde r0,r0,r10
+dest; std r6,0(r4)
+dest; std r9,8(r4)
+
+ adde r0,r0,r11
+dest; std r10,16(r4)
+dest; std r11,24(r4)
+
+ adde r0,r0,r12
+dest; std r12,32(r4)
+dest; std r14,40(r4)
+
+ adde r0,r0,r14
+dest; std r15,48(r4)
+dest; std r16,56(r4)
+ addi r4,r4,64
+
+ adde r0,r0,r15
+source; ld r6,0(r3)
+source; ld r9,8(r3)
+
+ adde r0,r0,r16
+source; ld r10,16(r3)
+source; ld r11,24(r3)
+ bdnz 2b
+
+
+ adde r0,r0,r6
+source; ld r12,32(r3)
+source; ld r14,40(r3)
+
+ adde r0,r0,r9
+source; ld r15,48(r3)
+source; ld r16,56(r3)
+ addi r3,r3,64
+
+ adde r0,r0,r10
+dest; std r6,0(r4)
+dest; std r9,8(r4)
+
+ adde r0,r0,r11
+dest; std r10,16(r4)
+dest; std r11,24(r4)
+
+ adde r0,r0,r12
+dest; std r12,32(r4)
+dest; std r14,40(r4)
+
+ adde r0,r0,r14
+dest; std r15,48(r4)
+dest; std r16,56(r4)
+ addi r4,r4,64
+
+ adde r0,r0,r15
+ adde r0,r0,r16
+
+ ld r14,STK_REG(r14)(r1)
+ ld r15,STK_REG(r15)(r1)
+ ld r16,STK_REG(r16)(r1)
+ addi r1,r1,STACKFRAMESIZE
+
+ andi. r5,r5,63
+
+.Lcopy_tail_doublewords: /* Up to 127 bytes to go */
+ srdi. r6,r5,3
+ beq .Lcopy_tail_word
- .globl src_error_1
-src_error_1:
- li r6,0
- subi r5,r5,2
-95: sth r6,4(r4)
- addi r4,r4,2
- srwi. r6,r5,2
- beq 3f
mtctr r6
- .globl src_error_2
-src_error_2:
- li r6,0
-96: stwu r6,4(r4)
- bdnz 96b
-3: andi. r5,r5,3
- beq src_error
- .globl src_error_3
-src_error_3:
- li r6,0
- mtctr r5
- addi r4,r4,3
-97: stbu r6,1(r4)
- bdnz 97b
- .globl src_error
-src_error:
+3:
+source; ld r6,0(r3)
+ addi r3,r3,8
+ adde r0,r0,r6
+dest; std r6,0(r4)
+ addi r4,r4,8
+ bdnz 3b
+
+ andi. r5,r5,7
+
+.Lcopy_tail_word: /* Up to 7 bytes to go */
+ srdi. r6,r5,2
+ beq .Lcopy_tail_halfword
+
+source; lwz r6,0(r3)
+ addi r3,r3,4
+ adde r0,r0,r6
+dest; stw r6,0(r4)
+ addi r4,r4,4
+ subi r5,r5,4
+
+.Lcopy_tail_halfword: /* Up to 3 bytes to go */
+ srdi. r6,r5,1
+ beq .Lcopy_tail_byte
+
+source; lhz r6,0(r3)
+ addi r3,r3,2
+ adde r0,r0,r6
+dest; sth r6,0(r4)
+ addi r4,r4,2
+ subi r5,r5,2
+
+.Lcopy_tail_byte: /* Up to 1 byte to go */
+ andi. r6,r5,1
+ beq .Lcopy_finish
+
+source; lbz r6,0(r3)
+ sldi r9,r6,8 /* Pad the byte out to 16 bits */
+ adde r0,r0,r9
+dest; stb r6,0(r4)
+
+.Lcopy_finish:
+ addze r0,r0 /* add in final carry */
+ rldicl r4,r0,32,0 /* fold two 32 bit halves together */
+ add r3,r4,r0
+ srdi r3,r3,32
+ blr
+
+.Lsrc_error:
cmpdi 0,r7,0
- beq 1f
+ beqlr
li r6,-EFAULT
stw r6,0(r7)
-1: addze r3,r0
blr
- .globl dst_error
-dst_error:
+.Ldest_error:
cmpdi 0,r8,0
- beq 1f
+ beqlr
li r6,-EFAULT
stw r6,0(r8)
-1: addze r3,r0
blr
-
-.section __ex_table,"a"
- .align 3
- .llong 81b,src_error_1
- .llong 91b,dst_error
- .llong 82b,src_error_2
- .llong 92b,dst_error
- .llong 83b,src_error_3
- .llong 93b,dst_error
- .llong 84b,src_error_3
- .llong 94b,dst_error
- .llong 95b,dst_error
- .llong 96b,dst_error
- .llong 97b,dst_error
Index: powerpc.git/arch/powerpc/lib/checksum_wrappers_64.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ powerpc.git/arch/powerpc/lib/checksum_wrappers_64.c 2010-08-03 15:58:43.214492712 +1000
@@ -0,0 +1,65 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * Author: Anton Blanchard <anton@au.ibm.com>
+ */
+#include <linux/module.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/checksum.h>
+#include <asm/uaccess.h>
+
+__wsum csum_and_copy_from_user(const void __user *src, void *dst,
+ int len, __wsum sum, int *err_ptr)
+{
+ unsigned int csum;
+
+ might_sleep();
+
+ *err_ptr = 0;
+
+ if (!len) {
+ csum = 0;
+ goto out;
+ }
+
+ if (unlikely((len < 0) || !access_ok(VERIFY_READ, src, len))) {
+ *err_ptr = -EFAULT;
+ csum = (__force unsigned int)sum;
+ goto out;
+ }
+
+ csum = csum_partial_copy_generic((void __force *)src, dst,
+ len, sum, err_ptr, NULL);
+
+ if (unlikely(*err_ptr)) {
+ int missing = __copy_from_user(dst, src, len);
+
+ if (missing) {
+ memset(dst + len - missing, 0, missing);
+ *err_ptr = -EFAULT;
+ } else {
+ *err_ptr = 0;
+ }
+
+ csum = csum_partial(dst, len, sum);
+ }
+
+out:
+ return (__force __wsum)csum;
+}
+EXPORT_SYMBOL(csum_and_copy_from_user);
Index: powerpc.git/arch/powerpc/lib/Makefile
===================================================================
--- powerpc.git.orig/arch/powerpc/lib/Makefile 2010-08-03 15:16:35.310741991 +1000
+++ powerpc.git/arch/powerpc/lib/Makefile 2010-08-03 15:17:58.240742747 +1000
@@ -17,7 +17,8 @@ obj-$(CONFIG_PPC32) += div64.o copy_32.o
obj-$(CONFIG_HAS_IOMEM) += devres.o
obj-$(CONFIG_PPC64) += copypage_64.o copyuser_64.o \
- memcpy_64.o usercopy_64.o mem_64.o string.o
+ memcpy_64.o usercopy_64.o mem_64.o string.o \
+ checksum_wrappers_64.o
obj-$(CONFIG_XMON) += sstep.o ldstfp.o
obj-$(CONFIG_KPROBES) += sstep.o ldstfp.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += sstep.o ldstfp.o
Index: powerpc.git/arch/powerpc/include/asm/checksum.h
===================================================================
--- powerpc.git.orig/arch/powerpc/include/asm/checksum.h 2010-08-03 15:16:35.320741299 +1000
+++ powerpc.git/arch/powerpc/include/asm/checksum.h 2010-08-03 15:58:43.234490654 +1000
@@ -52,12 +52,19 @@ extern __wsum csum_partial(const void *b
extern __wsum csum_partial_copy_generic(const void *src, void *dst,
int len, __wsum sum,
int *src_err, int *dst_err);
+
+#ifdef __powerpc64__
+#define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
+extern __wsum csum_and_copy_from_user(const void __user *src, void *dst,
+ int len, __wsum sum, int *err_ptr);
+#else
/*
* the same as csum_partial, but copies from src to dst while it
* checksums.
*/
#define csum_partial_copy_from_user(src, dst, len, sum, errp) \
csum_partial_copy_generic((__force const void *)(src), (dst), (len), (sum), (errp), NULL)
+#endif
#define csum_partial_copy_nocheck(src, dst, len, sum) \
csum_partial_copy_generic((src), (dst), (len), (sum), NULL, NULL)
* [PATCH 3/3] powerpc: Add 64bit csum_and_copy_to_user
From: Anton Blanchard @ 2010-08-03 6:11 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
In-Reply-To: <20100803060952.GA31448@kryten>
This adds the equivalent of csum_and_copy_from_user for the receive side so we
can copy and checksum in one pass. It is modelled on the generic checksum
routine.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: powerpc.git/arch/powerpc/include/asm/checksum.h
===================================================================
--- powerpc.git.orig/arch/powerpc/include/asm/checksum.h 2010-08-03 16:02:02.234491674 +1000
+++ powerpc.git/arch/powerpc/include/asm/checksum.h 2010-08-03 16:04:10.733241094 +1000
@@ -57,6 +57,9 @@ extern __wsum csum_partial_copy_generic(
#define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
extern __wsum csum_and_copy_from_user(const void __user *src, void *dst,
int len, __wsum sum, int *err_ptr);
+#define HAVE_CSUM_COPY_USER
+extern __wsum csum_and_copy_to_user(const void *src, void __user *dst,
+ int len, __wsum sum, int *err_ptr);
#else
/*
* the same as csum_partial, but copies from src to dst while it
Index: powerpc.git/arch/powerpc/lib/checksum_wrappers_64.c
===================================================================
--- powerpc.git.orig/arch/powerpc/lib/checksum_wrappers_64.c 2010-08-03 16:02:02.234491674 +1000
+++ powerpc.git/arch/powerpc/lib/checksum_wrappers_64.c 2010-08-03 16:02:03.063242741 +1000
@@ -63,3 +63,40 @@ out:
return (__force __wsum)csum;
}
EXPORT_SYMBOL(csum_and_copy_from_user);
+
+__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
+ __wsum sum, int *err_ptr)
+{
+ unsigned int csum;
+
+ might_sleep();
+
+ *err_ptr = 0;
+
+ if (!len) {
+ csum = 0;
+ goto out;
+ }
+
+ if (unlikely((len < 0) || !access_ok(VERIFY_WRITE, dst, len))) {
+ *err_ptr = -EFAULT;
+ csum = -1; /* invalid checksum */
+ goto out;
+ }
+
+ csum = csum_partial_copy_generic(src, (void __force *)dst,
+ len, sum, NULL, err_ptr);
+
+ if (unlikely(*err_ptr)) {
+ csum = csum_partial(src, len, sum);
+
+ if (copy_to_user(dst, src, len)) {
+ *err_ptr = -EFAULT;
+ csum = -1; /* invalid checksum */
+ }
+ }
+
+out:
+ return (__force __wsum)csum;
+}
+EXPORT_SYMBOL(csum_and_copy_to_user);
* Re: [PATCH] powerpc: Dont require a dma_ops struct to set dma mask
From: Stephen Rothwell @ 2010-08-03 6:17 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <1280769682-2839-1-git-send-email-galak@kernel.crashing.org>
Hi Kumar,
On Mon, 2 Aug 2010 12:21:22 -0500 Kumar Gala <galak@kernel.crashing.org> wrote:
>
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -131,9 +131,7 @@ static inline int dma_set_mask(struct device *dev, u64 dma_mask)
> {
> struct dma_map_ops *dma_ops = get_dma_ops(dev);
>
> - if (unlikely(dma_ops == NULL))
> - return -EIO;
> - if (dma_ops->set_dma_mask != NULL)
> + if (unlikely(dma_ops == NULL) && (dma_ops->set_dma_mask != NULL))
The first part of this condition is backward (should be != (or just
"dma_ops") (and "likely"?)).
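A small stand-alone C sketch of the corrected test, for illustration: the ops pointer is checked for being non-NULL before its hook is looked at, so a device without ops never gets dereferenced. The struct and function names below are invented for this note, and the fallback of storing the mask directly is an assumption, not what the kernel function does after the hook check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device_sketch;

/* Hypothetical, cut-down stand-in for struct dma_map_ops. */
struct dma_ops_sketch {
	int (*set_dma_mask)(struct device_sketch *dev, uint64_t mask);
};

struct device_sketch {
	const struct dma_ops_sketch *ops;	/* may be NULL */
	uint64_t dma_mask;
};

/* Example hook that refuses masks wider than 32 bits. */
int reject_wide_mask(struct device_sketch *dev, uint64_t mask)
{
	if (mask > 0xffffffffULL)
		return -1;
	dev->dma_mask = mask;
	return 0;
}

int dma_set_mask_sketch(struct device_sketch *dev, uint64_t mask)
{
	const struct dma_ops_sketch *ops;

	assert(dev != NULL);
	ops = dev->ops;

	/* Only look at the hook once we know ops is non-NULL. */
	if (ops != NULL && ops->set_dma_mask != NULL)
		return ops->set_dma_mask(dev, mask);

	/* Assumed fallback: no ops or no hook, so store the mask directly. */
	dev->dma_mask = mask;
	return 0;
}
```

With the condition written as in the quoted hunk (ops == NULL && ...), the right-hand side would dereference a NULL pointer in exactly the case the patch wants to allow.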
--
Stephen Rothwell <sfr@canb.auug.org.au>
* RE: possible bug in ppc_vm_region_alloc()
From: Yossi Etigin @ 2010-08-03 6:21 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alex Netes
In-Reply-To: <7E95F01E94AB484F83061FCFA35B39F801322D@exil.voltaire.com>
Hello,
(I am reposting this because it looks like the previous message was
filtered, as I was not subscribed to the list.)
We are looking at dma_alloc_coherent(), which uses ppc_vm_region_alloc()
on the coherent region "consistent_head".
It seems to us there is a bug in the function ppc_vm_region_alloc().
The check "if (addr > end)" should be "if (addr >= end)".
If for example it is called once when the size is the entire coherent
region, the second time it will allocate a region outside the valid
memory.
It will happen because the list will contain one element (besides the
head) which is equal to the head, and neither condition will cause a
"goto nospc". Then the list iteration will end and the new region will
be allocated right after the valid region.
list_for_each_entry(c, &head->vm_list, vm_list) {
if ((addr + size) < addr)
goto nospc;
if ((addr + size) <= c->vm_start)
goto found;
addr = c->vm_end;
if (addr > end) <=== here
goto nospc;
}
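The boundary case can be sketched in stand-alone C as follows. This is only a model of the loop quoted above, under the assumption that `addr` starts at the beginning of the coherent region and `end` is the first address past it; the snippet does not show how the real function initialises either variable, so treat both as assumptions. The names region_alloc_sketch and struct region are made up, and an array replaces the kernel list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocated sub-region. */
struct region {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Walk the already allocated regions (sorted by address) looking for
 * a gap of `size` bytes. `strict` selects the proposed ">=" test
 * instead of the current ">" test. Returns true and stores the chosen
 * address in *out when the walk does not bail out.
 */
bool region_alloc_sketch(const struct region *used, size_t n,
			 unsigned long start, unsigned long end,
			 unsigned long size, bool strict,
			 unsigned long *out)
{
	unsigned long addr = start;
	size_t i;

	assert(out != NULL);

	for (i = 0; i < n; i++) {
		if (addr + size < addr)		/* address wrap-around */
			return false;
		if (addr + size <= used[i].vm_start)
			break;			/* found a gap */
		addr = used[i].vm_end;
		if (strict ? addr >= end : addr > end)
			return false;		/* no space */
	}
	*out = addr;
	return true;
}
```

With one allocation covering the whole region, the ">" variant falls out of the loop with addr equal to end and hands back an address just past the region, while the ">=" variant reports no space.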
--Yossi
* [PATCH] powerpc/mm: Fix vsid_scrample typo
From: Anton Blanchard @ 2010-08-03 6:35 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
The code is wrapped in an #if 0, but it's wrong so we may as well fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-2.6/arch/powerpc/include/asm/mmu-hash64.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/mmu-hash64.h 2010-06-29 05:03:49.000000000 +1000
+++ linux-2.6/arch/powerpc/include/asm/mmu-hash64.h 2010-06-29 05:04:21.000000000 +1000
@@ -431,7 +431,7 @@ typedef struct {
* with. However gcc is not clever enough to compute the
* modulus (2^n-1) without a second multiply.
*/
-#define vsid_scrample(protovsid, size) \
+#define vsid_scramble(protovsid, size) \
((((protovsid) * VSID_MULTIPLIER_##size) % VSID_MODULUS_##size))
#else /* 1 */
* [PATCH] powerpc/kdump: Stop all other CPUs before running crash handlers
From: Anton Blanchard @ 2010-08-03 6:39 UTC (permalink / raw)
To: benh, mikey, matt; +Cc: linuxppc-dev
During kdump we run the crash handlers first then stop all other CPUs.
We really want to stop all CPUs as close to the fail as possible and also
have a very controlled environment for running the crash handlers, so it
makes sense to reverse the order.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: powerpc.git/arch/powerpc/kernel/crash.c
===================================================================
--- powerpc.git.orig/arch/powerpc/kernel/crash.c 2010-07-15 20:49:39.941991306 +1000
+++ powerpc.git/arch/powerpc/kernel/crash.c 2010-08-03 16:36:08.451991018 +1000
@@ -402,6 +402,18 @@ void default_machine_crash_shutdown(stru
*/
hard_irq_disable();
+ /*
+ * Make a note of crashing cpu. Will be used in machine_kexec
+ * such that another IPI will not be sent.
+ */
+ crashing_cpu = smp_processor_id();
+ crash_save_cpu(regs, crashing_cpu);
+ crash_kexec_prepare_cpus(crashing_cpu);
+ cpu_set(crashing_cpu, cpus_in_crash);
+#if defined(CONFIG_PPC_STD_MMU_64) && defined(CONFIG_SMP)
+ crash_kexec_wait_realmode(crashing_cpu);
+#endif
+
for_each_irq(i) {
struct irq_desc *desc = irq_to_desc(i);
@@ -438,18 +450,8 @@ void default_machine_crash_shutdown(stru
crash_shutdown_cpu = -1;
__debugger_fault_handler = old_handler;
- /*
- * Make a note of crashing cpu. Will be used in machine_kexec
- * such that another IPI will not be sent.
- */
- crashing_cpu = smp_processor_id();
- crash_save_cpu(regs, crashing_cpu);
- crash_kexec_prepare_cpus(crashing_cpu);
- cpu_set(crashing_cpu, cpus_in_crash);
crash_kexec_stop_spus();
-#if defined(CONFIG_PPC_STD_MMU_64) && defined(CONFIG_SMP)
- crash_kexec_wait_realmode(crashing_cpu);
-#endif
+
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(1, 0);
}
* [PATCH] powerpc: fix build with make 3.82
From: Sam Ravnborg @ 2010-08-03 6:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras
Cc: Michal Marek, linuxppc-dev, Thomas Backlund
Thomas Backlund reported that the powerpc build broke with make 3.82.
It failed with the following message:
arch/powerpc/Makefile:183: *** mixed implicit and normal rules. Stop.
The fix is to avoid mixing non-wildcard and wildcard targets.
Reported-by: Thomas Backlund <tmb@mandriva.org>
Tested-by: Thomas Backlund <tmb@mandriva.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: stable <stable@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
---
Hi Ben / Paul.
This fixes the powerpc build with the latest make version.
The patch is on top of 2.6.35.
It is a coincidence that the make release appeared right now;
the issue is also present in older kernels.
So I have added a "Cc: stable <stable@kernel.org>" because
I consider this relevant for the stable kernel releases too.
@Michal - you got a copy as information only.
I fear we may see this bug for other parts of the kernel too.
Sam
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 77cfe7a..5d2f17d 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -163,9 +163,11 @@ drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/
# Default to zImage, override when needed
all: zImage
-BOOT_TARGETS = zImage zImage.initrd uImage zImage% dtbImage% treeImage.% cuImage.% simpleImage.%
+# With make 3.82 we cannot mix normal and wildcard targets
+BOOT_TARGETS1 := zImage zImage.initrd uImage
+BOOT_TARGETS2 := zImage% dtbImage% treeImage.% cuImage.% simpleImage.%
-PHONY += $(BOOT_TARGETS)
+PHONY += $(BOOT_TARGETS1) $(BOOT_TARGETS2)
boot := arch/$(ARCH)/boot
@@ -180,10 +182,16 @@ relocs_check: arch/powerpc/relocs_check.pl vmlinux
zImage: relocs_check
endif
-$(BOOT_TARGETS): vmlinux
+$(BOOT_TARGETS1): vmlinux
+ $(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
+$(BOOT_TARGETS2): vmlinux
+ $(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
+
+
+bootwrapper_install:
$(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
-bootwrapper_install %.dtb:
+%.dtb:
$(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
define archhelp
* Re: make 3.82 fails on powerpc defconfig update [was: Linux 2.6.35]
From: Sam Ravnborg @ 2010-08-03 6:48 UTC (permalink / raw)
To: Thomas Backlund
Cc: Michal Marek, linuxppc-dev@ozlabs.org, Linux Kernel Mailing List,
bug-make@gnu.org
In-Reply-To: <4C5732A7.2050003@mandriva.org>
On Tue, Aug 03, 2010 at 12:03:35AM +0300, Thomas Backlund wrote:
> 02.08.2010 23:51, Sam Ravnborg skrev:
> >>
> >> Thanks, this seems to fix the first issue, but then I get the same error on the following line 190:
> >>
> >> 190: bootwrapper_install %.dtb:
> >> 191: $(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
> >>
> >
> > Obviously - dunno how I missed that.
> > Updated patch below.
> >
> > I will do a proper submission after you
> > confirm that powerpc build is working with make 3.82.
> >
>
> Yeah, that was an obvious fix, thanks!
>
> One small typo fix below...
> (a missing ':')
>
> Otherwise it works here, so:
>
> Tested-by: Thomas Backlund <tmb@mandriva.org>
Thanks.
I have sent a proper patch to Ben/Paul (powerpc maintainers).
Sam
* Re: ramdisk size is larger than 4MB
From: Shawn Jin @ 2010-08-03 7:16 UTC (permalink / raw)
To: Scott Wood; +Cc: ppcdev
In-Reply-To: <AANLkTi=mq_vnUkrdmTeKjtfO3DQLKRESxzEBiK3jeTpV@mail.gmail.com>
> I found the link_address in the wrapper shell script sets the _start
> address. But after changing it to 0x800000, the kernel failed to boot,
> shown below. There must be something also needs proper adjustment.
> What would that be?
I did more debugging and found something really weird. With the
link address changed to 0x800000, the kernel actually boots
successfully when I step through it. However, if I let the kernel
run freely, it just crashes. After the crash the BDI2000 shows it
stopped at __delay().
I also changed the link address to 0x4000000. During the function
of_scan_flat_dt(early_init_dt_scan_chosen, NULL) called in
early_init_devtree(), gdb shows the program (i.e. kernel) received a
signal SIGSTOP.
Why would the kernel crash during that time? of_scan_flat_dt() doesn't
seem to be the cause of SIGSTOP. What would that be then?
Thanks,
-Shawn.
* Re: Issues to access Compact Flash Card on MPC8360E
From: Atul Deshmukh @ 2010-08-03 10:11 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20100802175952.GA27967@oksana.dev.rtsoft.ru>
Thanks a lot Anton,
I was confused by the "ata-generic" entry in the pata@3,0 node.
Now things are pretty much clear.
Earlier we thought that the IORD and IOWR pins would come from GPIO, so we
thought we had to do bit-banging. But after going through the schematics, we
learned that the pins could come from LGPL0/1.
On Mon, Aug 2, 2010 at 11:29 PM, Anton Vorontsov <cbouatmailru@gmail.com> wrote:
> On Mon, Aug 02, 2010 at 07:44:22PM +0530, Atul Deshmukh wrote:
> > Thanks a lot Anton,
> > From the dts entry given below,
> >
> > localbus@e0005000 {
> > #address-cells = <2>;
> > #size-cells = <1>;
> > compatible = "fsl,mpc8349e-localbus",
> > "fsl,pq2pro-localbus";
> > reg = <0xe0005000 0xd8>;
> > ranges = <0x3 0x0 0xf0000000 0x210>;
> > pata@3,0 {
> > compatible = "fsl,mpc8349emitx-pata",
> > "ata-generic";
> > reg = <0x3 0x0 0x10 0x3 0x20c 0x4>;
> > reg-shift = <1>;
> > pio-mode = <6>;
> > interrupts = <23 0x8>;
> > interrupt-parent = <&ipic>;
> > };
> > };
> >
> >
> The driver is drivers/ata/pata_of_platform.c.
>
> > controls PCI-based IDE-controller where we can plug in our CF card...Am I
> > right???
>
> Nope, no PCI involved. CF is almost* directly connected to
> the localbus.
>
> > But in our design we don't use any controller we directly connects CF card
> > to local bus where UPM controls it..
>
> Yes, that's exactly how CF is done on MPC8349EmITX boards.
>
> > Can you please explain how the interface is implemented in MPC8349..
>
> Via localbus + UPM.
>
> * 'almost' is because there are some buffers and inverters, see
> schematics:
>
> http://www.freescale.com/files/32bit/hardware_tools/schematics/MPC8349EMITXESCH.pdf
>
> --
> Anton Vorontsov
> email: cbouatmailru@gmail.com
> irc://irc.freenode.net/bd2
>
--
Regards,
Atul
[-- Attachment #2: Type: text/html, Size: 3067 bytes --]
^ permalink raw reply
* RE: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
From: Bounine, Alexandre @ 2010-08-03 12:17 UTC (permalink / raw)
To: Michael Neuling, Timur Tabi
Cc: Alexandre Bounine, linuxppc-dev, linux-kernel, thomas.moll
In-Reply-To: <4381.1280815590@neuling.org>
This happened after change to book-e definitions.
There are patches that address this issue.
> -----Original Message-----
> From: Michael Neuling [mailto:mikey@neuling.org]
> Sent: Tuesday, August 03, 2010 2:07 AM
> To: Timur Tabi
> Cc: Alexandre Bounine; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> thomas.moll@sysgo.com
> Subject: Re: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
>
> > > MCSR_MASK is not defined anywhere, so when I compile this code, I get this:
> >
> > Never mind. I see that it's been fixed already, and that the patch
> > that removed MCSR_MASK was posted around the same time that this patch
> > was posted.
>
> I don't know what happened here but 2.6.35 is broken because of this
> problem:
>
> arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function)
> arch/powerpc/sysdev/fsl_rio.c:248: error: (Each undeclared identifier is reported only once
> arch/powerpc/sysdev/fsl_rio.c:248: error: for each function it appears in.)
> arch/powerpc/sysdev/fsl_rio.c:250: error: 'MCSR_BUS_RBERR' undeclared (first use in this function)
>
> Mikey
^ permalink raw reply
* Re: [PATCH 2/3] P4080/mtd: Only make elbc nand driver detect nand flash partitions
From: Kumar Gala @ 2010-08-03 12:58 UTC (permalink / raw)
To: Roy Zang; +Cc: Lan Chunhe-B25806, linuxppc-dev@ozlabs.org list,
Gala Kumar-B11780
In-Reply-To: <1280810714-30639-2-git-send-email-tie-fei.zang@freescale.com>
On Aug 2, 2010, at 11:45 PM, Roy Zang wrote:
> From: Lan Chunhe-B25806 <b25806@freescale.com>
>
> The former driver had the two functions:
>
> 1. detecting nand flash partitions;
> 2. registering elbc interrupt.
>
> Now, second function is removed to fsl_lbc.c.
>
> Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
> Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
> ---
> drivers/mtd/nand/Kconfig | 1 +
> drivers/mtd/nand/fsl_elbc_nand.c | 464 ++++++++++++++------------------------
> 2 files changed, 170 insertions(+), 295 deletions(-)
mtd list and maintainer should be CC'd on these.
- k
^ permalink raw reply
* Re: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
From: Timur Tabi @ 2010-08-03 13:01 UTC (permalink / raw)
To: Bounine, Alexandre
Cc: Michael Neuling, linux-kernel, Alexandre Bounine, thomas.moll,
linuxppc-dev
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E5520114309D@CORPEXCH1.na.ads.idt.com>
On Tue, Aug 3, 2010 at 7:17 AM, Bounine, Alexandre
<Alexandre.Bounine@idt.com> wrote:
> This happened after change to book-e definitions.
> There are patches that address this issue.
And those patches should have been applied before 2.6.35 was released.
Someone dropped the ball. 2.6.35 is broken for a number of PowerPC
boards:
$ make mpc85xx_defconfig
...
$ make
...
CC arch/powerpc/sysdev/fsl_rio.o
arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception':
arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function)
arch/powerpc/sysdev/fsl_rio.c:248: error: (Each undeclared identifier is reported only once
arch/powerpc/sysdev/fsl_rio.c:248: error: for each function it appears in.)
make[1]: *** [arch/powerpc/sysdev/fsl_rio.o] Error 1
--
Timur Tabi
Linux kernel developer at Freescale
^ permalink raw reply
* RE: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
From: Bounine, Alexandre @ 2010-08-03 13:24 UTC (permalink / raw)
To: Timur Tabi
Cc: Michael Neuling, linux-kernel, Alexandre Bounine, thomas.moll,
linuxppc-dev
In-Reply-To: <AANLkTinpwYnyc1oN1VbtBgUF6bk6E5q_Gq1Dj3WXV3wc@mail.gmail.com>
Yang Li pointed to these patches in his post from July 23, 2010.
It would be nice to have these patches in mainline code.
> -----Original Message-----
> From: timur.tabi@gmail.com [mailto:timur.tabi@gmail.com] On Behalf Of Timur Tabi
> Sent: Tuesday, August 03, 2010 9:02 AM
> To: Bounine, Alexandre
> Cc: Michael Neuling; Alexandre Bounine; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> thomas.moll@sysgo.com; Kumar Gala
> Subject: Re: [PATCH v2 5/7] powerpc/85xx: Add MChk handler for SRIO port
>
> On Tue, Aug 3, 2010 at 7:17 AM, Bounine, Alexandre
> <Alexandre.Bounine@idt.com> wrote:
> > This happened after change to book-e definitions.
> > There are patches that address this issue.
>
> And those patches should have been applied before 2.6.35 was released.
> Someone dropped the ball. 2.6.35 is broken for a number of PowerPC
> boards:
>
> $ make mpc85xx_defconfig
> ...
> $ make
> ...
> CC arch/powerpc/sysdev/fsl_rio.o
> arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception':
> arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function)
> arch/powerpc/sysdev/fsl_rio.c:248: error: (Each undeclared identifier is reported only once
> arch/powerpc/sysdev/fsl_rio.c:248: error: for each function it appears in.)
> make[1]: *** [arch/powerpc/sysdev/fsl_rio.o] Error 1
>
> --
> Timur Tabi
> Linux kernel developer at Freescale
^ permalink raw reply
* [PATCH 0/9] v4 De-couple sysfs memory directories from memory sections
From: Nathan Fontenot @ 2010-08-03 13:32 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
This set of patches de-couples the idea that there is a single
directory in sysfs for each memory section. The intent of the
patches is to reduce the number of sysfs directories created to
resolve a boot-time performance issue. On very large systems
boot times are getting very long (as seen on powerpc hardware)
due to the enormous number of sysfs directories being created.
On a system with 1 TB of memory we create ~63,000 directories.
For even larger systems boot times are being measured in hours.
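As a rough illustration of where a number like ~63,000 comes from, the
sketch below counts the directories needed to cover a given amount of
memory. It assumes 16 MB memory sections (SECTION_SIZE_BITS == 24), which
is an assumption made for this example and is not stated in this mail;
memory_dirs() and EXAMPLE_SECTION_SIZE_BITS are hypothetical names, not
kernel symbols.

```c
#include <stdint.h>

/* Assumption for illustration only: 16 MB memory sections. */
#define EXAMPLE_SECTION_SIZE_BITS 24

/*
 * Number of sysfs memory directories needed to cover mem_bytes when
 * each directory (memory block) spans sections_per_block sections.
 */
static uint64_t memory_dirs(uint64_t mem_bytes, unsigned int sections_per_block)
{
	uint64_t block_bytes =
		(uint64_t)sections_per_block << EXAMPLE_SECTION_SIZE_BITS;

	/* Round up so a partially filled block still gets a directory. */
	return (mem_bytes + block_bytes - 1) / block_bytes;
}
```

Under that assumption a full 1 TB needs 65,536 single-section directories,
in the same range as the figure quoted above, while blocks of 16 sections
would need 4,096.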
This set of patches allows for each directory created in sysfs
to cover more than one memory section. The default behavior for
sysfs directory creation is the same, in that each directory
represents a single memory section. A new file 'end_phys_index'
in each directory contains the physical_id of the last memory
section covered by the directory so that users can easily
determine the memory section range of a directory.
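A minimal user-space sketch of how a tool might turn the two values into a
section count is shown below. It assumes both files hold a hexadecimal
number in the "%08lx\n" form that the show routines in patch 2/9 emit; the
helper names are hypothetical, and actually reading the files from sysfs is
left out.

```c
#include <stdlib.h>

/* Parse a value in the "%08lx\n" form; strtoul() stops at the newline. */
static unsigned long parse_phys_index(const char *buf)
{
	return strtoul(buf, NULL, 16);
}

/*
 * Number of memory sections covered by a directory, given the contents
 * of its phys_index and end_phys_index files. The range is inclusive.
 */
static unsigned long sections_in_block(const char *start_buf,
				       const char *end_buf)
{
	unsigned long start = parse_phys_index(start_buf);
	unsigned long end = parse_phys_index(end_buf);

	return end - start + 1;
}
```

With the default behavior both files hold the same value, so the count is 1.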
Updates for version 4 of the patchset include an additional
patch [4/9] that introduces a new mutex to be taken for any
add or remove (not hotplug) of memory. The following updates
are also included.
Patch 2/9 Add new phys_index properties
- The start_phys_index property was reverted to the original
phys_index name.
Patch 3/9 Add section count to memory_block
- Use atomic_dec_and_test()
Patch 7/9 Update the node sysfs code
- Update the inline definition of unregister_mem_sect_under_nodes
for !CONFIG_NUMA builds.
Patch 8/9 Define memory_block_size_bytes() for ppc/pseries
- Use an unsigned long for getting property value.
Patch 9/9 Update memory-hotplug documentation
- Minor updates for reversion of phys_index property name.
Thanks,
Nathan Fontenot
^ permalink raw reply
* [PATCH 1/9] v4 Move the find_memory_block() routine up
From: Nathan Fontenot @ 2010-08-03 13:36 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Move the find_memory_block() routine up to avoid needing a forward
declaration in subsequent patches.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 62 +++++++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 31 deletions(-)
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:23:51.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 13:32:21.000000000 -0500
@@ -435,6 +435,37 @@ int __weak arch_get_memory_phys_device(u
return 0;
}
+/*
+ * For now, we have a linear search to go find the appropriate
+ * memory_block corresponding to a particular phys_index. If
+ * this gets to be a real problem, we can always use a radix
+ * tree or something here.
+ *
+ * This could be made generic for all sysdev classes.
+ */
+struct memory_block *find_memory_block(struct mem_section *section)
+{
+ struct kobject *kobj;
+ struct sys_device *sysdev;
+ struct memory_block *mem;
+ char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+
+ /*
+ * This only works because we know that section == sysdev->id
+ * slightly redundant with sysdev_register()
+ */
+ sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+
+ kobj = kset_find_obj(&memory_sysdev_class.kset, name);
+ if (!kobj)
+ return NULL;
+
+ sysdev = container_of(kobj, struct sys_device, kobj);
+ mem = container_of(sysdev, struct memory_block, sysdev);
+
+ return mem;
+}
+
static int add_memory_block(int nid, struct mem_section *section,
unsigned long state, enum mem_add_context context)
{
@@ -468,37 +499,6 @@ static int add_memory_block(int nid, str
return ret;
}
-/*
- * For now, we have a linear search to go find the appropriate
- * memory_block corresponding to a particular phys_index. If
- * this gets to be a real problem, we can always use a radix
- * tree or something here.
- *
- * This could be made generic for all sysdev classes.
- */
-struct memory_block *find_memory_block(struct mem_section *section)
-{
- struct kobject *kobj;
- struct sys_device *sysdev;
- struct memory_block *mem;
- char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
-
- /*
- * This only works because we know that section == sysdev->id
- * slightly redundant with sysdev_register()
- */
- sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
-
- kobj = kset_find_obj(&memory_sysdev_class.kset, name);
- if (!kobj)
- return NULL;
-
- sysdev = container_of(kobj, struct sys_device, kobj);
- mem = container_of(sysdev, struct memory_block, sysdev);
-
- return mem;
-}
-
int remove_memory_block(unsigned long node_id, struct mem_section *section,
int phys_device)
{
^ permalink raw reply
* [PATCH 2/9] v4 Add new phys_index properties
From: Nathan Fontenot @ 2010-08-03 13:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Update the 'phys_index' properties of a memory block to include a
'start_phys_index' which is the same as the current 'phys_index' property.
The property still appears as 'phys_index' in sysfs but the memory_block
struct name is updated to indicate the start and end values.
This also adds an 'end_phys_index' property to indicate the id of the
last section in the memory block.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 28 ++++++++++++++++++++--------
include/linux/memory.h | 3 ++-
2 files changed, 22 insertions(+), 9 deletions(-)
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:32:21.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 13:33:27.000000000 -0500
@@ -109,12 +109,20 @@ unregister_memory(struct memory_block *m
* uses.
*/
-static ssize_t show_mem_phys_index(struct sys_device *dev,
+static ssize_t show_mem_start_phys_index(struct sys_device *dev,
struct sysdev_attribute *attr, char *buf)
{
struct memory_block *mem =
container_of(dev, struct memory_block, sysdev);
- return sprintf(buf, "%08lx\n", mem->phys_index);
+ return sprintf(buf, "%08lx\n", mem->start_phys_index);
+}
+
+static ssize_t show_mem_end_phys_index(struct sys_device *dev,
+ struct sysdev_attribute *attr, char *buf)
+{
+ struct memory_block *mem =
+ container_of(dev, struct memory_block, sysdev);
+ return sprintf(buf, "%08lx\n", mem->end_phys_index);
}
/*
@@ -128,7 +136,7 @@ static ssize_t show_mem_removable(struct
struct memory_block *mem =
container_of(dev, struct memory_block, sysdev);
- start_pfn = section_nr_to_pfn(mem->phys_index);
+ start_pfn = section_nr_to_pfn(mem->start_phys_index);
ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
return sprintf(buf, "%d\n", ret);
}
@@ -191,7 +199,7 @@ memory_block_action(struct memory_block
int ret;
int old_state = mem->state;
- psection = mem->phys_index;
+ psection = mem->start_phys_index;
first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
/*
@@ -264,7 +272,7 @@ store_mem_state(struct sys_device *dev,
int ret = -EINVAL;
mem = container_of(dev, struct memory_block, sysdev);
- phys_section_nr = mem->phys_index;
+ phys_section_nr = mem->start_phys_index;
if (!present_section_nr(phys_section_nr))
goto out;
@@ -296,7 +304,8 @@ static ssize_t show_phys_device(struct s
return sprintf(buf, "%d\n", mem->phys_device);
}
-static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
+static SYSDEV_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
+static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
@@ -476,16 +485,18 @@ static int add_memory_block(int nid, str
if (!mem)
return -ENOMEM;
- mem->phys_index = __section_nr(section);
+ mem->start_phys_index = __section_nr(section);
mem->state = state;
mutex_init(&mem->state_mutex);
- start_pfn = section_nr_to_pfn(mem->phys_index);
+ start_pfn = section_nr_to_pfn(mem->start_phys_index);
mem->phys_device = arch_get_memory_phys_device(start_pfn);
ret = register_memory(mem, section);
if (!ret)
ret = mem_create_simple_file(mem, phys_index);
if (!ret)
+ ret = mem_create_simple_file(mem, end_phys_index);
+ if (!ret)
ret = mem_create_simple_file(mem, state);
if (!ret)
ret = mem_create_simple_file(mem, phys_device);
@@ -507,6 +518,7 @@ int remove_memory_block(unsigned long no
mem = find_memory_block(section);
unregister_mem_sect_under_nodes(mem);
mem_remove_simple_file(mem, phys_index);
+ mem_remove_simple_file(mem, end_phys_index);
mem_remove_simple_file(mem, state);
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h 2010-08-02 13:23:49.000000000 -0500
+++ linux-2.6/include/linux/memory.h 2010-08-02 13:33:27.000000000 -0500
@@ -21,7 +21,8 @@
#include <linux/mutex.h>
struct memory_block {
- unsigned long phys_index;
+ unsigned long start_phys_index;
+ unsigned long end_phys_index;
unsigned long state;
/*
* This serializes all state change requests. It isn't
^ permalink raw reply
* [PATCH 3/9] v4 Add section count to memory_block
From: Nathan Fontenot @ 2010-08-03 13:38 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Add a section count property to the memory_block struct to track the number
of memory sections that have been added/removed from a memory block. This
allows us to know when the last memory section of a memory block has been
removed so we can remove the memory block.
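The counting scheme can be sketched in user space with C11 atomics, as
below. This is only an analogy: the kernel code uses atomic_t with
atomic_inc() and atomic_dec_and_test(), and the example_* names here are
hypothetical. atomic_fetch_sub() returns the old value, so comparing it
with 1 gives the same "count reached zero" answer that
atomic_dec_and_test() provides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the section_count field of memory_block. */
struct example_block {
	atomic_int section_count;
};

/* Account for one more memory section in the block. */
static void example_add_section(struct example_block *blk)
{
	atomic_fetch_add(&blk->section_count, 1);
}

/*
 * Drop one section. Returns true only for the caller that removed the
 * last section, i.e. the one that should tear the block down.
 */
static bool example_remove_section(struct example_block *blk)
{
	return atomic_fetch_sub(&blk->section_count, 1) == 1;
}
```

Only the final removal reports true, which is the point at which the sysfs
files and the block itself would be removed.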
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 18 +++++++++++-------
include/linux/memory.h | 2 ++
2 files changed, 13 insertions(+), 7 deletions(-)
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:33:27.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 13:35:00.000000000 -0500
@@ -487,6 +487,7 @@ static int add_memory_block(int nid, str
mem->start_phys_index = __section_nr(section);
mem->state = state;
+ atomic_inc(&mem->section_count);
mutex_init(&mem->state_mutex);
start_pfn = section_nr_to_pfn(mem->start_phys_index);
mem->phys_device = arch_get_memory_phys_device(start_pfn);
@@ -516,13 +517,16 @@ int remove_memory_block(unsigned long no
struct memory_block *mem;
mem = find_memory_block(section);
- unregister_mem_sect_under_nodes(mem);
- mem_remove_simple_file(mem, phys_index);
- mem_remove_simple_file(mem, end_phys_index);
- mem_remove_simple_file(mem, state);
- mem_remove_simple_file(mem, phys_device);
- mem_remove_simple_file(mem, removable);
- unregister_memory(mem, section);
+
+ if (atomic_dec_and_test(&mem->section_count)) {
+ unregister_mem_sect_under_nodes(mem);
+ mem_remove_simple_file(mem, phys_index);
+ mem_remove_simple_file(mem, end_phys_index);
+ mem_remove_simple_file(mem, state);
+ mem_remove_simple_file(mem, phys_device);
+ mem_remove_simple_file(mem, removable);
+ unregister_memory(mem, section);
+ }
return 0;
}
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h 2010-08-02 13:33:27.000000000 -0500
+++ linux-2.6/include/linux/memory.h 2010-08-02 13:35:00.000000000 -0500
@@ -19,11 +19,13 @@
#include <linux/node.h>
#include <linux/compiler.h>
#include <linux/mutex.h>
+#include <asm/atomic.h>
struct memory_block {
unsigned long start_phys_index;
unsigned long end_phys_index;
unsigned long state;
+ atomic_t section_count;
/*
* This serializes all state change requests. It isn't
* held during creation because the control files are
^ permalink raw reply
* [PATCH 4/9] v4 Add mutex for add/remove of memory blocks
From: Nathan Fontenot @ 2010-08-03 13:39 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Add a new mutex for use in adding and removing of memory blocks. This
is needed to avoid any race conditions in which the same memory block could
be added and removed at the same time.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:35:00.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 13:45:34.000000000 -0500
@@ -27,6 +27,8 @@
#include <asm/atomic.h>
#include <asm/uaccess.h>
+static struct mutex mem_sysfs_mutex;
+
#define MEMORY_CLASS_NAME "memory"
static struct sysdev_class memory_sysdev_class = {
@@ -485,6 +487,8 @@ static int add_memory_block(int nid, str
if (!mem)
return -ENOMEM;
+ mutex_lock(&mem_sysfs_mutex);
+
mem->start_phys_index = __section_nr(section);
mem->state = state;
atomic_inc(&mem->section_count);
@@ -508,6 +512,7 @@ static int add_memory_block(int nid, str
ret = register_mem_sect_under_node(mem, nid);
}
+ mutex_unlock(&mem_sysfs_mutex);
return ret;
}
@@ -516,6 +521,7 @@ int remove_memory_block(unsigned long no
{
struct memory_block *mem;
+ mutex_lock(&mem_sysfs_mutex);
mem = find_memory_block(section);
if (atomic_dec_and_test(&mem->section_count)) {
@@ -528,6 +534,7 @@ int remove_memory_block(unsigned long no
unregister_memory(mem, section);
}
+ mutex_unlock(&mem_sysfs_mutex);
return 0;
}
@@ -562,6 +569,8 @@ int __init memory_dev_init(void)
if (ret)
goto out;
+ mutex_init(&mem_sysfs_mutex);
+
/*
* Create entries for memory sections that were found
* during boot and have been initialized
^ permalink raw reply
* [PATCH 5/9] v4 Allow memory_block to span multiple memory sections
From: Nathan Fontenot @ 2010-08-03 13:40 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Update the memory sysfs code so that each sysfs memory directory is now
considered a memory block that can contain multiple memory sections.
The default size of each memory block is one memory section
(1 << SECTION_SIZE_BITS bytes), which maintains the current behavior of
having a single memory section per memory block (i.e. one sysfs directory
per memory section).
For architectures that want to have memory blocks span multiple
memory sections they need only define their own memory_block_size_bytes()
routine.
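The block size check and the section-to-block mapping used by this patch
can be tried out in user space with the sketch below. It mirrors the logic
of get_memory_block_size() and base_memory_block_id() from the diff, but
the example_* names are hypothetical and the 16 MB section size
(SECTION_SIZE_BITS == 24) is an assumption made only for this example.

```c
#include <stdint.h>

/* Assumption for illustration only: 16 MB memory sections. */
#define EXAMPLE_SECTION_SIZE_BITS 24
#define EXAMPLE_MIN_BLOCK_SIZE (UINT32_C(1) << EXAMPLE_SECTION_SIZE_BITS)

/*
 * Accept block_sz only if it is a power of 2 and at least one section;
 * otherwise fall back to a single section per block.
 */
static uint32_t example_validate_block_size(uint32_t block_sz)
{
	if ((block_sz & (block_sz - 1)) || block_sz < EXAMPLE_MIN_BLOCK_SIZE)
		return EXAMPLE_MIN_BLOCK_SIZE;

	return block_sz;
}

/* Round a section number down to the first section of its block. */
static int example_base_block_id(int section_nr, int sections_per_block)
{
	return (section_nr / sections_per_block) * sections_per_block;
}
```

For a 256 MB block under this assumption there are 16 sections per block,
so section 37 belongs to the block whose directory is named after
section 32.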
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 148 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 103 insertions(+), 45 deletions(-)
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:45:34.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 14:01:04.000000000 -0500
@@ -30,6 +30,14 @@
static struct mutex mem_sysfs_mutex;
#define MEMORY_CLASS_NAME "memory"
+#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
+
+static int sections_per_block;
+
+static inline int base_memory_block_id(int section_nr)
+{
+ return (section_nr / sections_per_block) * sections_per_block;
+}
static struct sysdev_class memory_sysdev_class = {
.name = MEMORY_CLASS_NAME,
@@ -84,22 +92,21 @@ EXPORT_SYMBOL(unregister_memory_isolate_
* register_memory - Setup a sysfs device for a memory block
*/
static
-int register_memory(struct memory_block *memory, struct mem_section *section)
+int register_memory(struct memory_block *memory)
{
int error;
memory->sysdev.cls = &memory_sysdev_class;
- memory->sysdev.id = __section_nr(section);
+ memory->sysdev.id = memory->start_phys_index;
error = sysdev_register(&memory->sysdev);
return error;
}
static void
-unregister_memory(struct memory_block *memory, struct mem_section *section)
+unregister_memory(struct memory_block *memory)
{
BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
- BUG_ON(memory->sysdev.id != __section_nr(section));
/* drop the ref. we got in remove_memory_block() */
kobject_put(&memory->sysdev.kobj);
@@ -133,13 +140,16 @@ static ssize_t show_mem_end_phys_index(s
static ssize_t show_mem_removable(struct sys_device *dev,
struct sysdev_attribute *attr, char *buf)
{
- unsigned long start_pfn;
- int ret;
+ unsigned long i, pfn;
+ int ret = 1;
struct memory_block *mem =
container_of(dev, struct memory_block, sysdev);
- start_pfn = section_nr_to_pfn(mem->start_phys_index);
- ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+ for (i = mem->start_phys_index; i <= mem->end_phys_index; i++) {
+ pfn = section_nr_to_pfn(i);
+ ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+ }
+
return sprintf(buf, "%d\n", ret);
}
@@ -192,17 +202,14 @@ int memory_isolate_notify(unsigned long
* OK to have direct references to sparsemem variables in here.
*/
static int
-memory_block_action(struct memory_block *mem, unsigned long action)
+memory_section_action(unsigned long phys_index, unsigned long action)
{
int i;
- unsigned long psection;
unsigned long start_pfn, start_paddr;
struct page *first_page;
int ret;
- int old_state = mem->state;
- psection = mem->start_phys_index;
- first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
+ first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
/*
* The probe routines leave the pages reserved, just
@@ -215,8 +222,8 @@ memory_block_action(struct memory_block
continue;
printk(KERN_WARNING "section number %ld page number %d "
- "not reserved, was it already online? \n",
- psection, i);
+ "not reserved, was it already online?\n",
+ phys_index, i);
return -EBUSY;
}
}
@@ -227,18 +234,13 @@ memory_block_action(struct memory_block
ret = online_pages(start_pfn, PAGES_PER_SECTION);
break;
case MEM_OFFLINE:
- mem->state = MEM_GOING_OFFLINE;
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
ret = remove_memory(start_paddr,
PAGES_PER_SECTION << PAGE_SHIFT);
- if (ret) {
- mem->state = old_state;
- break;
- }
break;
default:
- WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
- __func__, mem, action, action);
+ WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
+ "%ld\n", __func__, phys_index, action, action);
ret = -EINVAL;
}
@@ -248,7 +250,7 @@ memory_block_action(struct memory_block
static int memory_block_change_state(struct memory_block *mem,
unsigned long to_state, unsigned long from_state_req)
{
- int ret = 0;
+ int i, ret = 0;
mutex_lock(&mem->state_mutex);
if (mem->state != from_state_req) {
@@ -256,8 +258,21 @@ static int memory_block_change_state(str
goto out;
}
- ret = memory_block_action(mem, to_state);
- if (!ret)
+ if (to_state == MEM_OFFLINE)
+ mem->state = MEM_GOING_OFFLINE;
+
+ for (i = mem->start_phys_index; i <= mem->end_phys_index; i++) {
+ ret = memory_section_action(i, to_state);
+ if (ret)
+ break;
+ }
+
+ if (ret) {
+ for (i = mem->start_phys_index; i <= mem->end_phys_index; i++)
+ memory_section_action(i, from_state_req);
+
+ mem->state = from_state_req;
+ } else
mem->state = to_state;
out:
@@ -270,20 +285,15 @@ store_mem_state(struct sys_device *dev,
struct sysdev_attribute *attr, const char *buf, size_t count)
{
struct memory_block *mem;
- unsigned int phys_section_nr;
int ret = -EINVAL;
mem = container_of(dev, struct memory_block, sysdev);
- phys_section_nr = mem->start_phys_index;
-
- if (!present_section_nr(phys_section_nr))
- goto out;
if (!strncmp(buf, "online", min((int)count, 6)))
ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
else if(!strncmp(buf, "offline", min((int)count, 7)))
ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
-out:
+
if (ret)
return ret;
return count;
@@ -460,12 +470,13 @@ struct memory_block *find_memory_block(s
struct sys_device *sysdev;
struct memory_block *mem;
char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+ int block_id = base_memory_block_id(__section_nr(section));
/*
* This only works because we know that section == sysdev->id
* slightly redundant with sysdev_register()
*/
- sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+ sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
kobj = kset_find_obj(&memory_sysdev_class.kset, name);
if (!kobj)
@@ -477,26 +488,26 @@ struct memory_block *find_memory_block(s
return mem;
}
-static int add_memory_block(int nid, struct mem_section *section,
- unsigned long state, enum mem_add_context context)
+static int init_memory_block(struct memory_block **memory,
+ struct mem_section *section, unsigned long state)
{
- struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+ struct memory_block *mem;
unsigned long start_pfn;
int ret = 0;
+ mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
- mutex_lock(&mem_sysfs_mutex);
-
- mem->start_phys_index = __section_nr(section);
+ mem->start_phys_index = base_memory_block_id(__section_nr(section));
+ mem->end_phys_index = mem->start_phys_index + sections_per_block - 1;
mem->state = state;
atomic_inc(&mem->section_count);
mutex_init(&mem->state_mutex);
start_pfn = section_nr_to_pfn(mem->start_phys_index);
mem->phys_device = arch_get_memory_phys_device(start_pfn);
- ret = register_memory(mem, section);
+ ret = register_memory(mem);
if (!ret)
ret = mem_create_simple_file(mem, phys_index);
if (!ret)
@@ -507,8 +518,29 @@ static int add_memory_block(int nid, str
ret = mem_create_simple_file(mem, phys_device);
if (!ret)
ret = mem_create_simple_file(mem, removable);
+
+ *memory = mem;
+ return ret;
+}
+
+static int add_memory_section(int nid, struct mem_section *section,
+ unsigned long state, enum mem_add_context context)
+{
+ struct memory_block *mem;
+ int ret = 0;
+
+ mutex_lock(&mem_sysfs_mutex);
+
+ mem = find_memory_block(section);
+ if (mem) {
+ atomic_inc(&mem->section_count);
+ kobject_put(&mem->sysdev.kobj);
+ } else
+ ret = init_memory_block(&mem, section, state);
+
if (!ret) {
- if (context == HOTPLUG)
+ if (context == HOTPLUG &&
+ atomic_read(&mem->section_count) == sections_per_block)
ret = register_mem_sect_under_node(mem, nid);
}
@@ -531,8 +563,10 @@ int remove_memory_block(unsigned long no
mem_remove_simple_file(mem, state);
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
- unregister_memory(mem, section);
- }
+ unregister_memory(mem);
+ kfree(mem);
+ } else
+ kobject_put(&mem->sysdev.kobj);
mutex_unlock(&mem_sysfs_mutex);
return 0;
@@ -544,7 +578,7 @@ int remove_memory_block(unsigned long no
*/
int register_new_memory(int nid, struct mem_section *section)
{
- return add_memory_block(nid, section, MEM_OFFLINE, HOTPLUG);
+ return add_memory_section(nid, section, MEM_OFFLINE, HOTPLUG);
}
int unregister_memory_section(struct mem_section *section)
@@ -555,6 +589,26 @@ int unregister_memory_section(struct mem
return remove_memory_block(0, section, 0);
}
+u32 __weak memory_block_size_bytes(void)
+{
+ return MIN_MEMORY_BLOCK_SIZE;
+}
+
+static u32 get_memory_block_size(void)
+{
+ u32 block_sz;
+
+ block_sz = memory_block_size_bytes();
+
+ /* Validate block_sz is a power of 2 and not less than section size */
+ if ((block_sz & (block_sz - 1)) || (block_sz < MIN_MEMORY_BLOCK_SIZE)) {
+ WARN_ON(1);
+ block_sz = MIN_MEMORY_BLOCK_SIZE;
+ }
+
+ return block_sz;
+}
+
/*
* Initialize the sysfs support for memory devices...
*/
@@ -563,6 +617,7 @@ int __init memory_dev_init(void)
unsigned int i;
int ret;
int err;
+ int block_sz;
memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
ret = sysdev_class_register(&memory_sysdev_class);
@@ -571,6 +626,9 @@ int __init memory_dev_init(void)
mutex_init(&mem_sysfs_mutex);
+ block_sz = get_memory_block_size();
+ sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
/*
* Create entries for memory sections that were found
* during boot and have been initialized
@@ -578,8 +636,8 @@ int __init memory_dev_init(void)
for (i = 0; i < NR_MEM_SECTIONS; i++) {
if (!present_section_nr(i))
continue;
- err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE,
- BOOT);
+ err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE,
+ BOOT);
if (!ret)
ret = err;
}
^ permalink raw reply
* [PATCH 6/9] v4 Update the find_memory_block declaration
From: Nathan Fontenot @ 2010-08-03 13:41 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Update the find_memory_block declaration to take a struct mem_section *
so that it matches the definition.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
include/linux/memory.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h 2010-08-02 13:58:41.000000000 -0500
+++ linux-2.6/include/linux/memory.h 2010-08-02 14:01:15.000000000 -0500
@@ -116,7 +116,7 @@ extern int memory_dev_init(void);
extern int remove_memory_block(unsigned long, struct mem_section *, int);
extern int memory_notify(unsigned long val, void *v);
extern int memory_isolate_notify(unsigned long val, void *v);
-extern struct memory_block *find_memory_block(unsigned long);
+extern struct memory_block *find_memory_block(struct mem_section *);
extern int memory_is_hidden(struct mem_section *);
#define CONFIG_MEM_BLOCK_SIZE (PAGES_PER_SECTION<<PAGE_SHIFT)
enum mem_add_context { BOOT, HOTPLUG };
^ permalink raw reply
* [PATCH 7/9] v4 Update the node sysfs code
From: Nathan Fontenot @ 2010-08-03 13:42 UTC (permalink / raw)
To: linux-kernel, linux-mm, linuxppc-dev
Cc: Greg KH, KAMEZAWA Hiroyuki, Dave Hansen
In-Reply-To: <4C581A6D.9030908@austin.ibm.com>
Update the node sysfs code to be aware of the new capability for a memory
block to contain multiple memory sections. This requires an additional
parameter to unregister_mem_sect_under_nodes so that we know which memory
section of the memory block to unregister.
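
As a rough illustration of the pfn arithmetic described above, here is a
minimal standalone sketch. It is not the kernel code: the sketch_ names and
the pages-per-section value are hypothetical stand-ins for struct
memory_block, section_nr_to_pfn() and PAGES_PER_SECTION. Registering walks
every section in the block, while unregistering is given a single section
number and walks only that section.

```c
/* Hypothetical value for illustration; the real PAGES_PER_SECTION is
 * architecture-specific. */
#define SKETCH_PAGES_PER_SECTION 4096UL

/* Stand-in for the two fields of struct memory_block used by the patch. */
struct sketch_memory_block {
	unsigned long start_phys_index;	/* first section number in the block */
	unsigned long end_phys_index;	/* last section number in the block */
};

/* Stand-in for section_nr_to_pfn(): first pfn of a section. */
static unsigned long sketch_section_nr_to_pfn(unsigned long section_nr)
{
	return section_nr * SKETCH_PAGES_PER_SECTION;
}

/* Register path: the pfn range spans all sections of the block. */
static void sketch_block_pfn_range(const struct sketch_memory_block *blk,
				   unsigned long *start_pfn,
				   unsigned long *end_pfn)
{
	*start_pfn = sketch_section_nr_to_pfn(blk->start_phys_index);
	*end_pfn = sketch_section_nr_to_pfn(blk->end_phys_index);
	*end_pfn += SKETCH_PAGES_PER_SECTION - 1;
}

/* Unregister path: the pfn range covers only the section passed in. */
static void sketch_section_pfn_range(unsigned long phys_index,
				     unsigned long *start_pfn,
				     unsigned long *end_pfn)
{
	*start_pfn = sketch_section_nr_to_pfn(phys_index);
	*end_pfn = *start_pfn + SKETCH_PAGES_PER_SECTION - 1;
}
```

With a block covering sections 8 through 11, the register range runs from
the first pfn of section 8 to the last pfn of section 11, whereas
unregistering section 9 touches only that section's pfns.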
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
drivers/base/memory.c | 2 +-
drivers/base/node.c | 12 ++++++++----
include/linux/node.h | 6 ++++--
3 files changed, 13 insertions(+), 7 deletions(-)
Index: linux-2.6/drivers/base/node.c
===================================================================
--- linux-2.6.orig/drivers/base/node.c 2010-08-02 13:57:20.000000000 -0500
+++ linux-2.6/drivers/base/node.c 2010-08-02 14:01:58.000000000 -0500
@@ -346,8 +346,10 @@ int register_mem_sect_under_node(struct
return -EFAULT;
if (!node_online(nid))
return 0;
- sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
- sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+
+ sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
+ sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
+ sect_end_pfn += PAGES_PER_SECTION - 1;
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
int page_nid;
@@ -371,7 +373,8 @@ int register_mem_sect_under_node(struct
}
/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+ unsigned long phys_index)
{
NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -383,7 +386,8 @@ int unregister_mem_sect_under_nodes(stru
if (!unlinked_nodes)
return -ENOMEM;
nodes_clear(*unlinked_nodes);
- sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
+
+ sect_start_pfn = section_nr_to_pfn(phys_index);
sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
int nid;
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c 2010-08-02 14:01:04.000000000 -0500
+++ linux-2.6/drivers/base/memory.c 2010-08-02 14:01:58.000000000 -0500
@@ -555,9 +555,9 @@ int remove_memory_block(unsigned long no
mutex_lock(&mem_sysfs_mutex);
mem = find_memory_block(section);
+ unregister_mem_sect_under_nodes(mem, __section_nr(section));
if (atomic_dec_and_test(&mem->section_count)) {
- unregister_mem_sect_under_nodes(mem);
mem_remove_simple_file(mem, phys_index);
mem_remove_simple_file(mem, end_phys_index);
mem_remove_simple_file(mem, state);
Index: linux-2.6/include/linux/node.h
===================================================================
--- linux-2.6.orig/include/linux/node.h 2010-08-02 13:57:20.000000000 -0500
+++ linux-2.6/include/linux/node.h 2010-08-02 14:01:58.000000000 -0500
@@ -44,7 +44,8 @@ extern int register_cpu_under_node(unsig
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int register_mem_sect_under_node(struct memory_block *mem_blk,
int nid);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
+extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+ unsigned long phys_index);
#ifdef CONFIG_HUGETLBFS
extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
@@ -72,7 +73,8 @@ static inline int register_mem_sect_unde
{
return 0;
}
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+ unsigned long phys_index)
{
return 0;
}
^ permalink raw reply