LinuxPPC-Dev Archive on lore.kernel.org


* Re: Accessing flash directly from User Space
From: Joerg Albert @ 2009-10-29 21:36 UTC (permalink / raw)
  To: Jonathan Haws; +Cc: linuxppc-dev@lists.ozlabs.org, Kenneth Johansson
In-Reply-To: <BB99A6BA28709744BF22A68E6D7EB51F0330D36A50@midas.usurf.usu.edu>

Jonathan,

On 10/28/2009 03:45 PM, Jonathan Haws wrote:
> Looking through our notes and talking with the engineer 
> who was performing the tests, it was exactly that - MTD was waiting
> for a signal that was produced differently than the hardware 
> ready signal.  By simply polling the flash until the hardware
> ready signal toggled we were able to get a much faster read and write speed.
> Granted, most of our signals are being sent through a CPLD,
> so that may be why MTD did not work as well.

Even if your problem is solved, I'd like to understand this performance issue.
I had a look into the datasheet of the S29GL Mirrorbit flash by Spansion as an example. They provide a dedicated pin RY/BY#, which signals the end of an embedded algorithm (erase or programming). While figure 11.9 shows no timing advance of RY/BY# against Dout on the data line, figure 11.12 has one of unspecified length between RY/BY# and the end of data toggling.

If you had a 10-fold slowdown with MTD, either the CPLD really slows down the read access to the flash or maybe your custom driver uses some acceleration (write buffer programming,
unlock bypass, accelerated program with 12V on the WP#/ACC pin) while MTD does not.

Which kernel version and flash device did you use in this comparison?

Regards,
Jörg.

^ permalink raw reply

* ads5121ev and kernel linux-2.6.30
From: Angelo @ 2009-10-29 21:36 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: s104259

Has anyone tested the Freescale ADS5121EV evaluation board with linux-2.6.30?
I know that Freescale released a lot of patches for linux-2.6.24.6, and those work great for me.
But with 2.6.30, USB doesn't work.

Thanks in advance,
devel81

^ permalink raw reply

* Re: [PATCH] BUILD_BUG_ON: make it handle more cases
From: Hollis Blanchard @ 2009-10-29 21:30 UTC (permalink / raw)
  To: Rusty Russell
  Cc: sfr, linux-kernel, kvm-ppc, linux-next, Jan Beulich, akpm,
	linuxppc-dev
In-Reply-To: <200910201415.34361.rusty@rustcorp.com.au>

On Tue, 2009-10-20 at 14:15 +1030, Rusty Russell wrote:
> BUILD_BUG_ON used to use the optimizer to do code elimination or fail
> at link time; it was changed to first the size of a negative array (a
> nicer compile time error), then (in
> 8c87df457cb58fe75b9b893007917cf8095660a0) to a bitfield.
> 
> bitfields: needs a literal constant at parse time, and can't be put under
> 	"if (__builtin_constant_p(x))" for example.
> negative array: can handle anything, but if the compiler can't tell it's
> 	a constant, silently has no effect.
> link time: breaks link if the compiler can't determine the value, but the
> 	linker output is not usually as informative as a compiler error.
> 
> If we use the negative-array-size method *and* the link time trick,
> we get the ability to use BUILD_BUG_ON() under __builtin_constant_p()
> branches, and maximal ability for the compiler to detect errors at
> build time.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -683,12 +683,6 @@ struct sysinfo {
>  	char _f[20-2*sizeof(long)-sizeof(int)];	/* Padding: libc5 uses this.. */
>  };
> 
> -/* Force a compilation error if condition is true */
> -#define BUILD_BUG_ON(condition) ((void)BUILD_BUG_ON_ZERO(condition))
> -
> -/* Force a compilation error if condition is constant and true */
> -#define MAYBE_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
> -
>  /* Force a compilation error if condition is true, but also produce a
>     result (of value 0 and type size_t), so the expression can be used
>     e.g. in a structure initializer (or where-ever else comma expressions
> @@ -696,6 +690,33 @@ struct sysinfo {
>  #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
>  #define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))
> 
> +/**
> + * BUILD_BUG_ON - break compile if a condition is true.
> + * @cond: the condition which the compiler should know is false.
> + *
> + * If you have some code which relies on certain constants being equal, or
> + * other compile-time-evaluated condition, you should use BUILD_BUG_ON to
> + * detect if someone changes it.
> + *
> + * The implementation uses gcc's reluctance to create a negative array, but
> + * gcc (as of 4.4) only emits that error for obvious cases (eg. not arguments
> + * to inline functions).  So as a fallback we use the optimizer; if it can't
> + * prove the condition is false, it will cause a link error on the undefined
> + * "__build_bug_on_failed".  This error message can be harder to track down
> + * though, hence the two different methods.
> + */
> +#ifndef __OPTIMIZE__
> +#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
> +#else
> +extern int __build_bug_on_failed;
> +#define BUILD_BUG_ON(condition)					\
> +	do {							\
> +		((void)sizeof(char[1 - 2*!!(condition)]));	\
> +		if (condition) __build_bug_on_failed = 1;	\
> +	} while(0)
> +#endif
> +#define MAYBE_BUILD_BUG_ON(condition) BUILD_BUG_ON(condition)
> +
>  /* Trap pasters of __FUNCTION__ at compile-time */
>  #define __FUNCTION__ (__func__)

What's the state of this patch?

-- 
Hollis Blanchard
IBM Linux Technology Center
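
For readers following along, here is a minimal user-space sketch of the negative-array-size technique the quoted patch relies on. The macro name MY_BUILD_BUG_ON, the struct, and the helper function are made up for illustration; this is not the kernel's definition, only the same sizeof trick in isolation.

```c
#include <stddef.h>

/*
 * Hypothetical name; mirrors the sizeof(char[1 - 2*!!(cond)]) trick.
 * If cond is a constant that evaluates true, the array size is -1 and
 * the compiler rejects it. If cond is false the size is 1 and no code
 * is generated.
 */
#define MY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Illustrative struct, not taken from the thread. */
struct wire_header {
	unsigned char type;
	unsigned char flags;
	unsigned short length;
};

/* Returns the header size after checking the assumed layout at compile time. */
static size_t wire_header_size(void)
{
	MY_BUILD_BUG_ON(sizeof(struct wire_header) != 4);
	return sizeof(struct wire_header);
}
```

Changing the condition to something that is true (for example `!= 8`) should make the build fail on the array declaration. As the patch description notes, this only helps when the compiler can see that the condition is constant.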

^ permalink raw reply

* RE: Accessing flash directly from User Space [SOLVED]
From: Jonathan Haws @ 2009-10-29 17:02 UTC (permalink / raw)
  To: Kenneth Johansson, Joakim Tjernlund
  Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1256814536.24490.36.camel@kenjo-laptop>

> > > Anyway, to make a long story short, I inserted an msync() after each
> > > assignment to the flash.  This resolved my problem and I can now
> > > program my flash.
> >
> > Ouch, this was news to me too. Calling msync() after every write
> > kills performance.
> > We use mmap(/dev/mem) to access HW and haven't seen any issues yet.
> > Is this perhaps a new behaviour for mmap(/dev/mem) and is there a way
> > to avoid calling msync()?
>
> The address range should be outside the dram and thus uncached. Any
> write to any address in the range mmaped should go directly to the
> NOR flash. Any other behavior is a bug. It's not mapping an actual file
> here.

That is what I was thinking.  But I have a working driver, and I can deal with the extra delay that the msync() calls add to writing the flash.  I only ever write data to flash when I need to update one of my boot images.

Thanks,

Jonathan

^ permalink raw reply

* RE: Accessing flash directly from User Space [SOLVED]
From: Jonathan Haws @ 2009-10-29 17:01 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <OF8B45B496.CDDD0639-ONC125765E.0032ACFE-C125765E.0032E44F@transmode.se>

> Does O_DIRECT help? (you may need to define _GNU_SOURCE before
> #include)


Nope, O_DIRECT did not help - in fact it caused the application to crash.  Why that is I am not sure, but it crashed.

^ permalink raw reply

* [PATCH] USB: fsl_udc_core: Fix kernel oops on module removal
From: Anton Vorontsov @ 2009-10-29 16:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linuxppc-dev, Guennadi Liakhovetski, linux-usb

fsl_udc_release() calls dma_free_coherent() with an inappropriate
device passed to it, and since the device has no dma_ops, the following
oops pops up:

  Kernel BUG at d103ce9c [verbose debug info unavailable]
  Oops: Exception in kernel mode, sig: 5 [#1]
  ...
  NIP [d103ce9c] fsl_udc_release+0x50/0x80 [fsl_usb2_udc]
  LR [d103ce74] fsl_udc_release+0x28/0x80 [fsl_usb2_udc]
  Call Trace:
  [cfbc7dc0] [d103ce74] fsl_udc_release+0x28/0x80 [fsl_usb2_udc]
  [cfbc7dd0] [c01a35c4] device_release+0x2c/0x90
  [cfbc7de0] [c016b480] kobject_cleanup+0x58/0x98
  [cfbc7e00] [c016c52c] kref_put+0x54/0x6c
  [cfbc7e10] [c016b360] kobject_put+0x34/0x64
  [cfbc7e20] [c01a1d0c] put_device+0x1c/0x2c
  [cfbc7e30] [d103dbfc] fsl_udc_remove+0xc0/0x1e4 [fsl_usb2_udc]
  ...

This patch fixes the issue by passing dev->parent, which points to
a correct device.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 drivers/usb/gadget/fsl_udc_core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
index 42a74b8..fa3d142 100644
--- a/drivers/usb/gadget/fsl_udc_core.c
+++ b/drivers/usb/gadget/fsl_udc_core.c
@@ -2139,7 +2139,7 @@ static int fsl_proc_read(char *page, char **start, off_t off, int count,
 static void fsl_udc_release(struct device *dev)
 {
 	complete(udc_controller->done);
-	dma_free_coherent(dev, udc_controller->ep_qh_size,
+	dma_free_coherent(dev->parent, udc_controller->ep_qh_size,
 			udc_controller->ep_qh, udc_controller->ep_qh_dma);
 	kfree(udc_controller);
 }
-- 
1.6.3.3

^ permalink raw reply related

* RE: Accessing flash directly from User Space
From: Jonathan Haws @ 2009-10-29 16:48 UTC (permalink / raw)
  To: Scott Wood; +Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20091029163300.GC28414@loki.buserror.net>

> On Tue, Oct 27, 2009 at 04:52:40PM -0600, Jonathan Haws wrote:
> > > Will the device respond to 0x1234 being written at offset zero?  You
> > > generally have to poke these things pretty specifically in order to
> > > get them to go into command mode.
> > >
> >
> > It should because that is the first data location in flash.
>
> I don't follow.  Even if you have an Intel command set flash (and thus
> don't need unlock writes), 0x1234 isn't a valid command that I know of.
> The flash doesn't behave as a register that you can read back; it just
> responds in a certain way based on what you write to it.
>
> > Also, just to be sure I am telling the truth, I tried writing to one of
> > the registers to setup an erase and got the same results - the value did
> > not get written.
>
> Following the exact sequence that the driver uses?  What did you write,
> what did you expect (you're generally not going to get the same thing
> back that you wrote), and what did you get?  What kind of command set,
> bus width, and interleaving do you have?

I used the erase pattern, then write pattern for my flash device.  When I tried to read back the value that should have been stored, it was what it was previously.

>
> If you manually do the same exact accesses from a firmware prompt,
> external debugger, etc. does it work?
>
> > > > The driver works perfectly in VxWorks,
>
> On this exact hardware?

Yes.

> > > Including the 0x1234 thing?
> >
> > Actually, I have not tried that - I have not had to since the
> > driver worked.
>
> What happens without the 0x1234?

Have not bothered to try it.  My guess, after finding out what the problem is, is that it would not read back 0x1234.  In the test I performed, I intended to erase the sector, prep it for write, then write out 0x1234 to the first two bytes in flash.  However, I failed to include the code to erase and prep the sector for writing in my rush to find out what the heck was going on.

As I mentioned previously, I was not allowing the correct sequence of operations to take place to erase the sector (that is where my problem began): when I set up the sector for erasure, what I assigned to flash was not committed immediately, so the sequencing did not take place correctly.

I hope that makes sense.

Thanks,

Jonathan

^ permalink raw reply

* Re: Accessing flash directly from User Space
From: Scott Wood @ 2009-10-29 16:33 UTC (permalink / raw)
  To: Jonathan Haws; +Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <BB99A6BA28709744BF22A68E6D7EB51F0330D368E3@midas.usurf.usu.edu>

On Tue, Oct 27, 2009 at 04:52:40PM -0600, Jonathan Haws wrote:
> > Will the device respond to 0x1234 being written at offset zero?  You
> > generally have to poke these things pretty specifically in order to
> > get
> > them to go into command mode.
> > 
> 
> It should because that is the first data location in flash. 

I don't follow.  Even if you have an Intel command set flash (and thus don't
need unlock writes), 0x1234 isn't a valid command that I know of.  The flash
doesn't behave as a register that you can read back; it just responds in a
certain way based on what you write to it.

> Also, just to be sure I am telling the truth, I tried writing to one of
> the registers to setup an erase and got the same results - the value did
> not get written.

Following the exact sequence that the driver uses?  What did you write, what
did you expect (you're generally not going to get the same thing back that
you wrote), and what did you get?  What kind of command set, bus width, and
interleaving do you have?

If you manually do the same exact accesses from a firmware prompt, external
debugger, etc. does it work?

> > > The driver works perfectly in VxWorks,

On this exact hardware?

> > Including the 0x1234 thing?
> 
> Actually, I have not tried that - I have not had to since the driver worked.

What happens without the 0x1234?

-Scott
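
To make the point about command sequences concrete, here is a sketch of a single word program on a 16-bit NOR part that uses the AMD/Spansion-style command set (unlock cycles followed by a program command, then DQ6 toggle polling). This is an assumption for illustration: the thread never states which command set, bus width, or interleave Jonathan's device has, and the function and macro names are made up. It also leaves out the DQ5 error check and a final data verify that a real driver would want.

```c
#include <stdint.h>

/* Word-mode unlock addresses and the toggle bit for the assumed command set. */
#define UNLOCK_ADDR1 0x555u
#define UNLOCK_ADDR2 0x2AAu
#define DQ6_TOGGLE   0x0040u

/*
 * Program one 16-bit word at word offset word_off from base.
 * Returns 0 once DQ6 stops toggling, -1 if max_polls is exhausted.
 * base must point at an uncached mapping of the device so that the
 * stores reach it one by one and in order.
 */
static int nor_program_word(volatile uint16_t *base, uint32_t word_off,
			    uint16_t data, unsigned long max_polls)
{
	base[UNLOCK_ADDR1] = 0x00AA;	/* unlock cycle 1 */
	base[UNLOCK_ADDR2] = 0x0055;	/* unlock cycle 2 */
	base[UNLOCK_ADDR1] = 0x00A0;	/* program command */
	base[word_off] = data;		/* the word to be programmed */

	while (max_polls--) {
		uint16_t a = base[word_off];
		uint16_t b = base[word_off];

		/* DQ6 toggles between reads while the operation is running. */
		if (((a ^ b) & DQ6_TOGGLE) == 0)
			return 0;
	}
	return -1;
}
```

A bare store of 0x1234 at offset zero is none of these cycles, which is why the device would not be expected to take it as data. The sector must also have been erased beforehand, since programming can only clear bits.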

^ permalink raw reply

* Re: [git pull] Please pull powerpc.git merge branch
From: Linus Torvalds @ 2009-10-29 16:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list
In-Reply-To: <1256627452.11607.90.camel@pasglop>



On Tue, 27 Oct 2009, Benjamin Herrenschmidt wrote:
> 
> Kumar Gala (7):
>       powerpc: Add a Book-3E 64-bit defconfig
>       powerpc: Fix compile errors found by new ppc64e_defconfig
>       powerpc: Limit hugetlbfs support to PPC64 Book-3S machines

This is incredibly ugly. Why should the generic fs/Kconfig know about some 
random odd architecture detail like PPC_BOOK3S_64?

I merged it, and noticed this because Super-H caused clashes by cleaning 
up. I would suggest PowerPC do the same.

This patch is not signed-off, nor do I want any credit. But if it works on 
ppc, please send me something like this back.

		Linus
---
 arch/powerpc/Kconfig |    3 +++
 fs/Kconfig           |    2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 10a0a54..877db84 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,9 @@ config PPC
 	select GENERIC_ATOMIC64 if PPC32
 	select HAVE_PERF_EVENTS
 
+config SYS_SUPPORTS_HUGETLBFS
+	def_bool PPC_BOOK3S_64
+
 config EARLY_PRINTK
 	bool
 	default y
diff --git a/fs/Kconfig b/fs/Kconfig
index 2126078..64d44ef 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -135,7 +135,7 @@ config TMPFS_POSIX_ACL
 
 config HUGETLBFS
 	bool "HugeTLB file system support"
-	depends on X86 || IA64 || PPC_BOOK3S_64 || SPARC64 || (S390 && 64BIT) || \
+	depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \
 		   SYS_SUPPORTS_HUGETLBFS || BROKEN
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on

^ permalink raw reply related

* RE: Accessing flash directly from User Space [SOLVED]
From: Kenneth Johansson @ 2009-10-29 11:08 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org, Jonathan Haws
In-Reply-To: <OFE3AA516D.46A96178-ONC125765E.0030DA7C-C125765E.00317B8C@transmode.se>

On Thu, 2009-10-29 at 10:00 +0100, Joakim Tjernlund wrote:
> >
> > > > On Tue, Oct 27, 2009 at 04:24:53PM -0600, Jonathan Haws wrote:
> > > > >> >>> How can I get that pointer?  Unfortunately I cannot simply
> > > > use
> > > > >> the
> > > > >> >>>
> > > > >> >> address of the flash.  Is there some magical function call
> > > > that
> > > > >> >> gives me access to that portion of the memory space?
> > > > >> >>
> > > > >> >> $ man 2 mmap
> > > > >> >>
> > > > >> >> You want MAP_SHARED and O_SYNC.
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> > To use that I need to have a file descriptor to a device, do
> > > I
> > > > >> not?  However, I do not have a base flash driver to give me
> > > that
> > > > >> file descriptor.  Am I missing something with that call?
> > > > >> >
> > > > >>
> > > > >> /dev/mem
> > > > >>
> > > > >Okay, I now have access to the flash memory, however when I write
> > > > to it the writes do not take.  I have tried calling msync() on the
> > > > mapping to no avail.  I have opened the fd with O_SYNC, but cannot
> > > > get things to work right.
> > > > >
> > > > >Here are the calls:
> > > > >
> > > > >   int fd = open("/dev/mem", O_SYNC | O_RDWR);
> > > > >   uint16_t * flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE,
> > > > >         (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd,
> > > > >         NOR_FLASH_BASE_ADRS);
> > > >
> > > > What board and CPU are you using?  Is your flash really at
> > > > 0xFC800000, or is
> > > > that the virtual address that VxWorks puts it at?
> > >
> > > I am using a custom board based on the AMCC Kilauea development
> > > board.  It uses a 405EX CPU.  Yes, the flash is really at
> > > 0xFC000000.
> >
> > I have found the problem.  It occurred to me in the shower (okay not really,
> > but most good ideas happen there).
> >
> > What was happening is that I was in fact able to write to the correct
> > registers.  However, I would try and write to them in a batch.  But the way
> > mmap works (at least according to the man page) with MAP_SHARED is that the
> > file may not be updated until msync() is called.  Now, I thought that O_SYNC
> > would take care of that when I open /dev/mem, but that was not the case.
> >
> > Anyway, to make a long story short, I inserted an msync() after each
> > assignment to the flash.  This resolved my problem and I can now program my flash.
> 
> Ouch, this was news to me too. Calling msync() after every write kills performance.
> We use mmap(/dev/mem) to access HW and haven't seen any issues yet. Is this
> perhaps a new behaviour for mmap(/dev/mem) and is there a way
> to avoid calling msync()?

The address range should be outside the dram and thus uncached. Any
write to any address in the range mmaped should go directly to the NOR
flash. Any other behavior is a bug. It's not mapping an actual file
here.
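
A side observation on the calls quoted above: the mmap() there uses MAP_PRIVATE, while the earlier advice in the thread was MAP_SHARED. With MAP_PRIVATE, stores go to a private copy-on-write page and never reach the object behind the mapping. The thread does not establish that this is what happened on /dev/mem here, so take the following as an illustration only. It uses an ordinary temporary file as a stand-in for /dev/mem; the function name and the /tmp path are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map a one-page temporary file with the given flags (MAP_SHARED or
 * MAP_PRIVATE), store 0x1234 through the mapping, then read the file
 * back with pread().
 * Returns 1 if the store is visible in the file, 0 if it is not,
 * and -1 on a setup error.
 */
static int store_reaches_backing(int map_flags)
{
	char path[] = "/tmp/mapdemoXXXXXX";	/* hypothetical location */
	long page = sysconf(_SC_PAGESIZE);
	uint16_t readback = 0;
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* the file disappears once fd is closed */

	if (page > 0 && ftruncate(fd, page) == 0) {
		uint16_t *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
				   map_flags, fd, 0);

		if (p != MAP_FAILED) {
			p[0] = 0x1234;
			msync(p, (size_t)page, MS_SYNC);
			munmap(p, (size_t)page);
			if (pread(fd, &readback, sizeof(readback), 0) ==
			    (ssize_t)sizeof(readback))
				result = (readback == 0x1234);
		}
	}
	close(fd);
	return result;
}
```

Calling it with MAP_SHARED should report that the store reached the file, and with MAP_PRIVATE that it did not, msync() or not.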

^ permalink raw reply

* RE: Accessing flash directly from User Space [SOLVED]
From: Joakim Tjernlund @ 2009-10-29  9:15 UTC (permalink / raw)
  Cc: Bill Gatliff, linuxppc-dev@lists.ozlabs.org, Jonathan Haws
In-Reply-To: <OFE3AA516D.46A96178-ONC125765E.0030DA7C-C125765E.00317B8C@transmode.se>

>
> >
> > > > On Tue, Oct 27, 2009 at 04:24:53PM -0600, Jonathan Haws wrote:
> > > > >> >>> How can I get that pointer?  Unfortunately I cannot simply
> > > > use
> > > > >> the
> > > > >> >>>
> > > > >> >> address of the flash.  Is there some magical function call
> > > > that
> > > > >> >> gives me access to that portion of the memory space?
> > > > >> >>
> > > > >> >> $ man 2 mmap
> > > > >> >>
> > > > >> >> You want MAP_SHARED and O_SYNC.
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> > To use that I need to have a file descriptor to a device, do
> > > I
> > > > >> not?  However, I do not have a base flash driver to give me
> > > that
> > > > >> file descriptor.  Am I missing something with that call?
> > > > >> >
> > > > >>
> > > > >> /dev/mem
> > > > >>
> > > > >Okay, I now have access to the flash memory, however when I write
> > > > to it the writes do not take.  I have tried calling msync() on the
> > > > mapping to no avail.  I have opened the fd with O_SYNC, but cannot
> > > > get things to work right.
> > > > >
> > > > >Here are the calls:
> > > > >
> > > > >   int fd = open("/dev/mem", O_SYNC | O_RDWR);
> > > > >   uint16_t * flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE,
> > > > >         (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd,
> > > > >         NOR_FLASH_BASE_ADRS);
> > > >
> > > > What board and CPU are you using?  Is your flash really at
> > > > 0xFC800000, or is
> > > > that the virtual address that VxWorks puts it at?
> > >
> > > I am using a custom board based on the AMCC Kilauea development
> > > board.  It uses a 405EX CPU.  Yes, the flash is really at
> > > 0xFC000000.
> >
> > I have found the problem.  It occurred to me in the shower (okay not really,
> > but most good ideas happen there).
> >
> > What was happening is that I was in fact able to write to the correct
> > registers.  However, I would try and write to them in a batch.  But the way
> > mmap works (at least according to the man page) with MAP_SHARED is that the
> > file may not be updated until msync() is called.  Now, I thought that O_SYNC
> > would take care of that when I open /dev/mem, but that was not the case.
> >
> > Anyway, to make a long story short, I inserted an msync() after each
> > assignment to the flash.  This resolved my problem and I can now program my flash.
>
> Ouch, this was news to me too. Calling msync() after every write kills performance.
> We use mmap(/dev/mem) to access HW and haven't seen any issues yet. Is this
> perhaps a new behaviour for mmap(/dev/mem) and is there a way
> to avoid calling msync()?

Does O_DIRECT help? (you may need to define _GNU_SOURCE before #include)

 Jocke

^ permalink raw reply

* RE: Accessing flash directly from User Space [SOLVED]
From: Joakim Tjernlund @ 2009-10-29  9:00 UTC (permalink / raw)
  To: Jonathan Haws; +Cc: linuxppc-dev@lists.ozlabs.org, Bill Gatliff, Jonathan Haws
In-Reply-To: <BB99A6BA28709744BF22A68E6D7EB51F0330E22259@midas.usurf.usu.edu>

>
> > > On Tue, Oct 27, 2009 at 04:24:53PM -0600, Jonathan Haws wrote:
> > > >> >>> How can I get that pointer?  Unfortunately I cannot simply
> > > use
> > > >> the
> > > >> >>>
> > > >> >> address of the flash.  Is there some magical function call
> > > that
> > > >> >> gives me access to that portion of the memory space?
> > > >> >>
> > > >> >> $ man 2 mmap
> > > >> >>
> > > >> >> You want MAP_SHARED and O_SYNC.
> > > >> >>
> > > >> >
> > > >> >
> > > >> > To use that I need to have a file descriptor to a device, do
> > I
> > > >> not?  However, I do not have a base flash driver to give me
> > that
> > > >> file descriptor.  Am I missing something with that call?
> > > >> >
> > > >>
> > > >> /dev/mem
> > > >>
> > > >Okay, I now have access to the flash memory, however when I write
> > > to it the writes do not take.  I have tried calling msync() on the
> > > mapping to no avail.  I have opened the fd with O_SYNC, but cannot
> > > get things to work right.
> > > >
> > > >Here are the calls:
> > > >
> > > >   int fd = open("/dev/mem", O_SYNC | O_RDWR);
> > > >   uint16_t * flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE,
> > > >         (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd,
> > > >         NOR_FLASH_BASE_ADRS);
> > >
> > > What board and CPU are you using?  Is your flash really at
> > > 0xFC800000, or is
> > > that the virtual address that VxWorks puts it at?
> >
> > I am using a custom board based on the AMCC Kilauea development
> > board.  It uses a 405EX CPU.  Yes, the flash is really at
> > 0xFC000000.
>
> I have found the problem.  It occurred to me in the shower (okay not really,
> but most good ideas happen there).
>
> What was happening is that I was in fact able to write to the correct
> registers.  However, I would try and write to them in a batch.  But the way
> mmap works (at least according to the man page) with MAP_SHARED is that the
> file may not be updated until msync() is called.  Now, I thought that O_SYNC
> would take care of that when I open /dev/mem, but that was not the case.
>
> Anyway, to make a long story short, I inserted an msync() after each
> assignment to the flash.  This resolved my problem and I can now program my flash.

Ouch, this was news to me too. Calling msync() after every write kills performance.
We use mmap(/dev/mem) to access HW and haven't seen any issues yet. Is this
perhaps a new behaviour for mmap(/dev/mem) and is there a way
to avoid calling msync()?

     Jocke

^ permalink raw reply

* Re: [PATCH 1/7] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
From: Akinobu Mita @ 2009-10-29  8:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-usb, linux-ia64, linuxppc-dev, Paul Mackerras,
	H. Peter Anvin, sparclinux, Lothar Wassmann, x86, linux-altix,
	Ingo Molnar, Fenghua Yu, Joerg Roedel, Yevgeny Petrilin,
	Thomas Gleixner, Tony Luck, netdev, Greg Kroah-Hartman,
	linux-kernel, FUJITA Tomonori, David S. Miller
In-Reply-To: <20091028131141.523854cb.akpm@linux-foundation.org>

2009/10/29 Andrew Morton <akpm@linux-foundation.org>:
>
> Why were these patches resent?  What changed?
>
> Everybody who is going to review these patches has already reviewed
> them and now they need to review them all again?

I resent the patches because the iommu-helper change was not correct,
and a follow-up patch of mine introduced a serious bug in
bitmap_find_next_zero_area() when align_mask != 0, so those patches
were dropped from the -mm tree.

Only [PATCH 1/7] and [PATCH 2/7] have changes since the first submission of
this patch set.

* [PATCH 1/7]
- Rewrite bitmap_set() and bitmap_clear()
- Let bitmap_find_next_zero_area() check the last bit of the limit
- Add kerneldoc for bitmap_find_next_zero_area()

* [PATCH 2/7]
- Convert find_next_zero_area() to use bitmap_find_next_zero_area() correctly
  iommu-helper doesn't want to search the last bit of the limit in bitmap

* [PATCH 3/7] - [PATCH 7/7]
- No changes
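
For anyone who has not read the patches, a simplified user-space sketch of what a "find next zero area" search does, including the align_mask handling and the last-bit case mentioned above. This is not the kernel implementation: the sketch_ names are made up, the search goes bit by bit for clarity, and it returns size when nothing fits, which may differ from the exact return value of the kernel function.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Returns nonzero if bit i is set in map. */
static int sketch_test_bit(const unsigned long *map, size_t i)
{
	return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Set nr bits starting at bit start. */
static void sketch_bitmap_set(unsigned long *map, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++)
		map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/*
 * Find nr consecutive zero bits at or after start, in a bitmap of size
 * bits. The first bit of the area is rounded up to a multiple of
 * (align_mask + 1), so align_mask must be a power of two minus one.
 * Returns the index of the area, or size if no area fits.
 */
static size_t sketch_find_zero_area(const unsigned long *map, size_t size,
				    size_t start, size_t nr, size_t align_mask)
{
	size_t index = start;

	for (;;) {
		size_t i;

		index = (index + align_mask) & ~align_mask;
		if (index + nr > size)
			return size;	/* no room left */

		/* Count zero bits from index until nr is reached or a set bit is hit. */
		for (i = 0; i < nr && !sketch_test_bit(map, index + i); i++)
			;
		if (i == nr)
			return index;

		index += i + 1;		/* restart just past the set bit */
	}
}
```

An area that ends exactly at bit size - 1 is accepted, so the last bit of the limit is part of the search.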

^ permalink raw reply

* Re: [GIT PULL] perf_event/tracing/powerpc patches from Anton Blanchard
From: Ingo Molnar @ 2009-10-29  6:55 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Peter Zijlstra, linux-kernel, Anton Blanchard, linuxppc-dev
In-Reply-To: <19176.59441.523075.445864@drongo.ozlabs.ibm.com>


* Paul Mackerras <paulus@samba.org> wrote:

> Here is a series of patches from Anton Blanchard that implement some 
> nice tracing and perf_event features on powerpc.  One of them is 
> generic perf_event stuff (adding software events for alignment faults 
> and instruction emulation faults).
> 
> Since this touches the perf_event and tracing subsystems as well as 
> the powerpc architecture code, I think the best way forward is for 
> both Ingo and Ben to pull it into their trees.  I have based it on the 
> most recent point in Linus' tree that Ingo had pulled into his perf 
> branches (as of yesterday or so).

The generic perf bits look good to me - can pull it if Ben OKs the 
PowerPC bits.

Thanks,

	Ingo

^ permalink raw reply

* Re: SPRN_SVR for MPC5121e?
From: Wolfgang Denk @ 2009-10-29  6:26 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linuxppc-dev
In-Reply-To: <20091028152324.GC3920@pengutronix.de>

Dear Wolfram Sang,

In message <20091028152324.GC3920@pengutronix.de> you wrote:
> 
> my MPC5121e (Rev2) has PVR/SVR: 0x8086_2010/0x8018_0020 (as mentioned in the manual)
...
> Does someone here have a Rev1 and can ultimately confirm and/or supply the
> complete PVR/SVR for Rev1? Couldn't find any hint neither in the manual nor in
> the web.

We saw

	PVR/SVR = 0x80862010 / 0x80180010 for MPC5121
and

	PVR/SVR = 0x80862010 / 0x80180030 for MPC5123

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quantum particles: The dreams that stuff is made of.
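
If it helps anyone who needs to tell the parts apart in code, the SVR values reported so far in this thread can be put in a small lookup. This is only a sketch: the function name is made up, and the table holds just the three values mentioned in these messages, not values checked against a reference manual.

```c
#include <stdint.h>

/*
 * Map an SVR value to a part name, using only the values reported in
 * this thread. Anything else is reported as "unknown".
 */
static const char *mpc512x_name_from_svr(uint32_t svr)
{
	switch (svr) {
	case 0x80180010u:
		return "MPC5121";	/* as reported by Wolfgang Denk */
	case 0x80180020u:
		return "MPC5121e Rev2";	/* as reported by Wolfram Sang */
	case 0x80180030u:
		return "MPC5123";	/* as reported by Wolfgang Denk */
	default:
		return "unknown";
	}
}
```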

^ permalink raw reply

* Re: [PATCH 1/6 v5] Kernel DLPAR Infrastructure
From: Nathan Lynch @ 2009-10-29  3:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1256785731.26770.38.camel@pasglop>

On Thu, 2009-10-29 at 14:08 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2009-10-28 at 15:53 -0500, Nathan Fontenot wrote:
> > +	struct device_node *dn;
> > +	struct device_node *first_dn = NULL;
> > +	struct device_node *last_dn = NULL;
> > +	struct property *property;
> > +	struct property *last_property = NULL;
> > +	struct cc_workarea *ccwa;
> > +	int cc_token;
> > +	int rc;
> > +
> > +	cc_token = rtas_token("ibm,configure-connector");
> > +	if (cc_token == RTAS_UNKNOWN_SERVICE)
> > +		return NULL;
> > +
> > +	spin_lock(&workarea_lock);
> > +
> > +	ccwa = (struct cc_workarea *)&workarea[0];
> > +	ccwa->drc_index = drc_index;
> > +	ccwa->zero = 0;
> 
> Popping a free page with gfp (or just kmalloc'ing 4K) would avoid the
> need for the lock too.

Not kmalloc -- the alignment of the buffer isn't guaranteed when
slub/slab debug is on, and iirc  the work area needs to be 4K-aligned.
__get_free_page should be fine, I think.
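
The alignment point can be shown with a user-space analogy. In the sketch below posix_memalign() stands in for a page allocator such as __get_free_page(), and plain malloc() would stand in for kmalloc(); the function names and the 4096-byte size are assumptions for illustration, and none of this is kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>

#define WORKAREA_SIZE 4096u	/* assumed work area size and alignment */

/*
 * Allocate a work area whose start address is a multiple of
 * WORKAREA_SIZE. Returns NULL on failure; release with free().
 */
static void *alloc_aligned_workarea(void)
{
	void *p = NULL;

	if (posix_memalign(&p, WORKAREA_SIZE, WORKAREA_SIZE) != 0)
		return NULL;
	return p;
}

/* Returns nonzero if p is aligned to WORKAREA_SIZE. */
static int is_4k_aligned(const void *p)
{
	return ((uintptr_t)p & (WORKAREA_SIZE - 1)) == 0;
}
```

A buffer from a general-purpose allocator may or may not pass is_4k_aligned(), which is the concern with kmalloc when slab debugging changes the object layout.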

^ permalink raw reply

* Re: [PATCH 6/6 v5] CPU DLPAR Handling
From: Benjamin Herrenschmidt @ 2009-10-29  3:26 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4AE8B0B1.4020102@austin.ibm.com>

On Wed, 2009-10-28 at 15:59 -0500, Nathan Fontenot wrote:

> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static ssize_t cpu_probe(const char *buf, size_t count)

dlpar_cpu_probe() pls

> +static ssize_t cpu_release(const char *buf, size_t count)
> +{

Ditto.

Or else in system.map, backtraces, etc... it's hard to figure out off
hand where it's dying :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 5/6 v5] CPU probe/release files
From: Benjamin Herrenschmidt @ 2009-10-29  3:25 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4AE8B072.1070603@austin.ibm.com>

On Wed, 2009-10-28 at 15:58 -0500, Nathan Fontenot wrote:
> Create new probe and release sysfs files to facilitate adding and removing
> cpus from the system.  This also creates the powerpc specific stubs to handle
> the arch callouts from writes to the sysfs files.
> 
> The creation and use of these files is regulated by the 
> CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
> capability will have the files created.
> 
> Signed-off-by: Nathan Fontenot <nfont at austin.ibm.com>
> ---

Same question as before here... need some external acks from others
doing cpu hotplug.

Cheers,
Ben.

> Index: powerpc/drivers/base/cpu.c
> ===================================================================
> --- powerpc.orig/drivers/base/cpu.c	2009-10-28 15:20:34.000000000 -0500
> +++ powerpc/drivers/base/cpu.c	2009-10-28 15:21:53.000000000 -0500
> @@ -54,6 +54,7 @@
>  		ret = count;
>  	return ret;
>  }
> +
>  static SYSDEV_ATTR(online, 0644, show_online, store_online);
>  
>  static void __cpuinit register_cpu_control(struct cpu *cpu)
> @@ -72,6 +73,38 @@
>  	per_cpu(cpu_sys_devices, logical_cpu) = NULL;
>  	return;
>  }
> +
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +static ssize_t cpu_probe_store(struct class *class, const char *buf,
> +			       size_t count)
> +{
> +	return arch_cpu_probe(buf, count);
> +}
> +
> +static ssize_t cpu_release_store(struct class *class, const char *buf,
> +				 size_t count)
> +{
> +	return arch_cpu_release(buf, count);
> +}
> +
> +static CLASS_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
> +static CLASS_ATTR(release, S_IWUSR, NULL, cpu_release_store);
> +
> +int __init cpu_probe_release_init(void)
> +{
> +	int rc;
> +
> +	rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> +			       &class_attr_probe.attr);
> +	if (!rc)
> +		rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
> +				       &class_attr_release.attr);
> +
> +	return rc;
> +}
> +device_initcall(cpu_probe_release_init);
> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> +
>  #else /* ... !CONFIG_HOTPLUG_CPU */
>  static inline void register_cpu_control(struct cpu *cpu)
>  {
> Index: powerpc/arch/powerpc/include/asm/machdep.h
> ===================================================================
> --- powerpc.orig/arch/powerpc/include/asm/machdep.h	2009-10-28 15:21:47.000000000 -0500
> +++ powerpc/arch/powerpc/include/asm/machdep.h	2009-10-28 15:21:53.000000000 -0500
> @@ -274,6 +274,11 @@
>  	int (*memory_probe)(u64);
>  #endif
>  
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +	ssize_t (*cpu_probe)(const char *, size_t);
> +	ssize_t (*cpu_release)(const char *, size_t);
> +#endif
> +
>  };
>  
>  extern void e500_idle(void);
> Index: powerpc/arch/powerpc/kernel/sysfs.c
> ===================================================================
> --- powerpc.orig/arch/powerpc/kernel/sysfs.c	2009-10-28 15:20:34.000000000 -0500
> +++ powerpc/arch/powerpc/kernel/sysfs.c	2009-10-28 15:21:53.000000000 -0500
> @@ -461,6 +461,25 @@
>  
>  	cacheinfo_cpu_offline(cpu);
>  }
> +
> +#ifdef CONFIG_ARCH_PROBE_RELEASE
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	if (ppc_md.cpu_probe)
> +		return ppc_md.cpu_probe(buf, count);
> +
> +	return -EINVAL;
> +}
> +
> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	if (ppc_md.cpu_release)
> +		return ppc_md.cpu_release(buf, count);
> +
> +	return -EINVAL;
> +}
> +#endif /* CONFIG_ARCH_PROBE_RELEASE */
> +
>  #endif /* CONFIG_HOTPLUG_CPU */
>  
>  static int __cpuinit sysfs_cpu_notify(struct notifier_block *self,
> Index: powerpc/arch/powerpc/Kconfig
> ===================================================================
> --- powerpc.orig/arch/powerpc/Kconfig	2009-10-28 15:21:47.000000000 -0500
> +++ powerpc/arch/powerpc/Kconfig	2009-10-28 15:21:53.000000000 -0500
> @@ -320,6 +320,10 @@
>  
>  	  Say N if you are unsure.
>  
> +config ARCH_CPU_PROBE_RELEASE
> +	def_bool y
> +	depends on HOTPLUG_CPU
> +
>  config ARCH_ENABLE_MEMORY_HOTPLUG
>  	def_bool y
>  
> Index: powerpc/include/linux/cpu.h
> ===================================================================
> --- powerpc.orig/include/linux/cpu.h	2009-10-28 15:20:34.000000000 -0500
> +++ powerpc/include/linux/cpu.h	2009-10-28 15:21:53.000000000 -0500
> @@ -43,6 +43,10 @@
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  extern void unregister_cpu(struct cpu *cpu);
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +extern ssize_t arch_cpu_probe(const char *, size_t);
> +extern ssize_t arch_cpu_release(const char *, size_t);
> +#endif
>  #endif
>  
>  struct notifier_block;
> Index: powerpc/arch/powerpc/kernel/smp.c
> ===================================================================
> --- powerpc.orig/arch/powerpc/kernel/smp.c	2009-10-28 15:20:34.000000000 -0500
> +++ powerpc/arch/powerpc/kernel/smp.c	2009-10-28 15:21:53.000000000 -0500
> @@ -364,7 +364,24 @@
>  	set_cpu_online(cpu, true);
>  	local_irq_enable();
>  }
> +
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> +ssize_t arch_cpu_probe(const char *buf, size_t count)
> +{
> +	if (ppc_md.cpu_probe)
> +		return ppc_md.cpu_probe(buf, count);
> +
> +	return -EINVAL;
> +}
> +ssize_t arch_cpu_release(const char *buf, size_t count)
> +{
> +	if (ppc_md.cpu_release)
> +		return ppc_md.cpu_release(buf, count);
> +
> +	return -EINVAL;
> +}
>  #endif
> +#endif /* CONFIG_HOTPLUG_CPU */
>  
>  static int __devinit cpu_enable(unsigned int cpu)
>  {

^ permalink raw reply

* Re: [PATCH 3/6 v5] Memory probe/release files
From: Benjamin Herrenschmidt @ 2009-10-29  3:13 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4AE8AFDD.6030504@austin.ibm.com>

On Wed, 2009-10-28 at 15:55 -0500, Nathan Fontenot wrote:
> This patch creates the release sysfs file for memory and updates the
> existing probe file so both make arch-specific callouts to handle removing
> and adding memory to the system.  This also creates the powerpc specific stubs
> for handling the arch callouts.
> 
> The creation and use of these files are governed by the existing
> CONFIG_ARCH_MEMORY_PROBE and new CONFIG_ARCH_MEMORY_RELEASE config options.
> 
> Signed-off-by: Nathan Fontenot <nfont at austin.ibm.com>
> ---

Is there anybody on linux-mm who needs to ack this patch?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/6 v5] Kernel DLPAR Infrastructure
From: Benjamin Herrenschmidt @ 2009-10-29  3:08 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4AE8AF4D.4030403@austin.ibm.com>

On Wed, 2009-10-28 at 15:53 -0500, Nathan Fontenot wrote:
> This patch provides the kernel DLPAR infrastructure in a new file named
> dlpar.c.  The functionality provided is for acquiring and releasing a resource
> from firmware and the parsing of information returned from the
> ibm,configure-connector rtas call.  Additionally this exports the pSeries
> reconfiguration notifier chain so that it can be invoked when device tree 
> updates are made.
> 
> Signed-off-by: Nathan Fontenot <nfont at austin.ibm.com> 
> ---

Hi Nathan !

Finally I get to review this stuff :-)

> +#define CFG_CONN_WORK_SIZE	4096
> +static char workarea[CFG_CONN_WORK_SIZE];
> +static DEFINE_SPINLOCK(workarea_lock);

So I'm not a huge fan of this static workarea. First, a static is in
effect a global name (as far as System.map etc. are concerned), so it
would warrant a better name. Second, do we really want that 4K of BSS
taken even on platforms that don't do DLPAR? Is there any reason not to
just grab a free page with __get_free_page() inside
configure_connector()?
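
A rough userspace sketch of the per-call allocation suggested here, under
stated assumptions: calloc()/free() stand in for the kernel's
__get_free_page()/free_page(), and fake_firmware_call() is a hypothetical
placeholder for the rtas call, not a real interface. Because each caller
owns its own buffer, nothing is shared, so no lock is needed and no 4K is
reserved permanently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CFG_CONN_WORK_SIZE 4096

/* Same header layout as struct cc_workarea in the patch. */
struct cc_workarea {
	uint32_t drc_index;
	uint32_t zero;
	uint32_t name_offset;
	uint32_t prop_length;
	uint32_t prop_offset;
};

/* Hypothetical stand-in for the firmware call: it only reads back the index. */
static int fake_firmware_call(const char *workarea)
{
	const struct cc_workarea *ccwa = (const struct cc_workarea *)workarea;

	return (int)ccwa->drc_index;
}

/*
 * Allocate the work area per call instead of keeping a static buffer.
 * Returns the stand-in call's result, or -1 if the allocation fails.
 */
static int configure_with_private_workarea(uint32_t drc_index)
{
	char *workarea;
	struct cc_workarea *ccwa;
	int rc;

	/* The header must fit in the work area. */
	assert(sizeof(struct cc_workarea) <= CFG_CONN_WORK_SIZE);

	workarea = calloc(1, CFG_CONN_WORK_SIZE);	/* kernel: __get_free_page() */
	if (!workarea)
		return -1;

	ccwa = (struct cc_workarea *)workarea;
	ccwa->drc_index = drc_index;
	ccwa->zero = 0;

	rc = fake_firmware_call(workarea);

	free(workarea);					/* kernel: free_page() */
	return rc;
}
```

The trade-off is one allocation per call, which should not matter for an
operation as rare as a DLPAR add or remove.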

> +struct cc_workarea {
> +	u32	drc_index;
> +	u32	zero;
> +	u32	name_offset;
> +	u32	prop_length;
> +	u32	prop_offset;
> +};
> +
> +static struct property *parse_cc_property(char *workarea)
> +{
> +	struct property *prop;
> +	struct cc_workarea *ccwa;
> +	char *name;
> +	char *value;
> +
> +	prop = kzalloc(sizeof(*prop), GFP_KERNEL);
> +	if (!prop)
> +		return NULL;
> +
> +	ccwa = (struct cc_workarea *)workarea;
> +	name = workarea + ccwa->name_offset;
> +	prop->name = kzalloc(strlen(name) + 1, GFP_KERNEL);
> +	if (!prop->name) {
> +		kfree(prop);
> +		return NULL;
> +	}
> +
> +	strcpy(prop->name, name);
> +
> +	prop->length = ccwa->prop_length;
> +	value = workarea + ccwa->prop_offset;
> +	prop->value = kzalloc(prop->length, GFP_KERNEL);
> +	if (!prop->value) {
> +		kfree(prop->name);
> +		kfree(prop);
> +		return NULL;
> +	}
> +
> +	memcpy(prop->value, value, prop->length);
> +	return prop;
> +}
> +
> +static void free_property(struct property *prop)
> +{
> +	kfree(prop->name);
> +	kfree(prop->value);
> +	kfree(prop);
> +}
> +
> +static struct device_node *parse_cc_node(char *work_area)
> +{

Maybe const char * here?

> +	struct device_node *dn;
> +	struct cc_workarea *ccwa;
> +	char *name;
> +
> +	dn = kzalloc(sizeof(*dn), GFP_KERNEL);
> +	if (!dn)
> +		return NULL;
> +
> +	ccwa = (struct cc_workarea *)work_area;
> +	name = work_area + ccwa->name_offset;

I'm wondering whether work_area should be a struct cc_workarea * in the
first place, with a char data[] at the end, though that would probably
mean tweaking the offsets... no big deal, up to you.
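
A small sketch of what that layout could look like, assuming (as the
quoted code does) that name_offset and prop_offset are relative to the
start of the work area. The helpers cc_at() and cc_name() are hypothetical
names used only for illustration; they show the offset adjustment that a
trailing data[] member would require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header from the patch plus a flexible array member for the payload. */
struct cc_workarea {
	uint32_t drc_index;
	uint32_t zero;
	uint32_t name_offset;	/* relative to the start of the work area */
	uint32_t prop_length;
	uint32_t prop_offset;	/* relative to the start of the work area */
	char data[];
};

/* Translate a start-relative offset into a pointer inside data[]. */
static const char *cc_at(const struct cc_workarea *wa, uint32_t offset)
{
	/* An offset pointing into the header itself would be bogus. */
	assert(offset >= offsetof(struct cc_workarea, data));
	return wa->data + (offset - offsetof(struct cc_workarea, data));
}

/* Hypothetical accessor replacing "work_area + ccwa->name_offset". */
static const char *cc_name(const struct cc_workarea *wa)
{
	return cc_at(wa, wa->name_offset);
}
```

With this layout the casts from char * disappear, at the cost of
subtracting the header size from every offset.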

> +	dn->full_name = kzalloc(strlen(name) + 1, GFP_KERNEL);
> +	if (!dn->full_name) {
> +		kfree(dn);
> +		return NULL;
> +	}
> +
> +	strcpy(dn->full_name, name);

kstrdup() here?
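
For comparison, a userspace sketch of the two forms. It assumes POSIX
strdup() as a stand-in for the kernel's kstrdup(name, GFP_KERNEL), and
calloc() as a stand-in for kzalloc(); the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The open-coded pattern from the patch: allocate, check, then copy. */
static char *copy_name_open_coded(const char *name)
{
	char *copy = calloc(1, strlen(name) + 1);	/* kernel: kzalloc() */

	if (!copy)
		return NULL;
	strcpy(copy, name);
	return copy;
}

/* The one-call form; in the kernel this would be kstrdup(name, GFP_KERNEL). */
static char *copy_name_dup(const char *name)
{
	assert(name != NULL);	/* strdup() must not be passed NULL */
	return strdup(name);
}
```

Both return a heap copy that the caller frees; the second simply removes
the separate length computation and copy.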

 .../...

> +#define NEXT_SIBLING    1
> +#define NEXT_CHILD      2
> +#define NEXT_PROPERTY   3
> +#define PREV_PARENT     4
> +#define MORE_MEMORY     5
> +#define CALL_AGAIN	-2
> +#define ERR_CFG_USE     -9003
> +
> +struct device_node *configure_connector(u32 drc_index)
> +{

It's a globally exported function, so I'd rather you call it
dlpar_configure_connector().

> +	struct device_node *dn;
> +	struct device_node *first_dn = NULL;
> +	struct device_node *last_dn = NULL;
> +	struct property *property;
> +	struct property *last_property = NULL;
> +	struct cc_workarea *ccwa;
> +	int cc_token;
> +	int rc;
> +
> +	cc_token = rtas_token("ibm,configure-connector");
> +	if (cc_token == RTAS_UNKNOWN_SERVICE)
> +		return NULL;
> +
> +	spin_lock(&workarea_lock);
> +
> +	ccwa = (struct cc_workarea *)&workarea[0];
> +	ccwa->drc_index = drc_index;
> +	ccwa->zero = 0;

Grabbing a free page with __get_free_page() (or just kmalloc'ing 4K)
would avoid the need for the lock too.

> +	rc = rtas_call(cc_token, 2, 1, NULL, workarea, NULL);
> +	while (rc) {
> +		switch (rc) {
> +		case NEXT_SIBLING:
> +			dn = parse_cc_node(workarea);
> +			if (!dn)
> +				goto cc_error;
> +
> +			dn->parent = last_dn->parent;
> +			last_dn->sibling = dn;
> +			last_dn = dn;
> +			break;
> +
> +		case NEXT_CHILD:
> +			dn = parse_cc_node(workarea);
> +			if (!dn)
> +				goto cc_error;
> +
> +			if (!first_dn)
> +				first_dn = dn;
> +			else {
> +				dn->parent = last_dn;
> +				if (last_dn)
> +					last_dn->child = dn;
> +			}
> +
> +			last_dn = dn;
> +			break;
> +
> +		case NEXT_PROPERTY:
> +			property = parse_cc_property(workarea);
> +			if (!property)
> +				goto cc_error;
> +
> +			if (!last_dn->properties)
> +				last_dn->properties = property;
> +			else
> +				last_property->next = property;
> +
> +			last_property = property;
> +			break;
> +
> +		case PREV_PARENT:
> +			last_dn = last_dn->parent;
> +			break;
> +
> +		case CALL_AGAIN:
> +			break;
> +
> +		case MORE_MEMORY:
> +		case ERR_CFG_USE:
> +		default:
> +			printk(KERN_ERR "Unexpected Error (%d) "
> +			       "returned from configure-connector\n", rc);
> +			goto cc_error;
> +		}
> +
> +		rc = rtas_call(cc_token, 2, 1, NULL, workarea, NULL);
> +	}
> +
> +	spin_unlock(&workarea_lock);
> +	return first_dn;
> +
> +cc_error:
> +	spin_unlock(&workarea_lock);
> +
> +	if (first_dn)
> +		free_cc_nodes(first_dn);
> +
> +	return NULL;
> +}
> +
> +static struct device_node *derive_parent(const char *path)
> +{
> +	struct device_node *parent;
> +	char parent_path[128];
> +	int parent_path_len;
> +
> +	parent_path_len = strrchr(path, '/') - path + 1;
> +	strlcpy(parent_path, path, parent_path_len);
> +
> +	parent = of_find_node_by_path(parent_path);
> +
> +	return parent;
> +}

This ...

> +static int add_one_node(struct device_node *dn)
> +{
> +	struct proc_dir_entry *ent;
> +	int rc;
> +
> +	of_node_set_flag(dn, OF_DYNAMIC);
> +	kref_init(&dn->kref);
> +	dn->parent = derive_parent(dn->full_name);
> +
> +	rc = blocking_notifier_call_chain(&pSeries_reconfig_chain,
> +					  PSERIES_RECONFIG_ADD, dn);
> +	if (rc == NOTIFY_BAD) {
> +		printk(KERN_ERR "Failed to add device node %s\n",
> +		       dn->full_name);
> +		return -ENOMEM; /* For now, safe to assume kmalloc failure */
> +	}
> +
> +	of_attach_node(dn);
> +
> +#ifdef CONFIG_PROC_DEVICETREE
> +	ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
> +	if (ent)
> +		proc_device_tree_add_node(dn, ent);
> +#endif
> +
> +	of_node_put(dn->parent);
> +	return 0;
> +}

 ... and this ...

> +int add_device_tree_nodes(struct device_node *dn)
> +{
> +	struct device_node *child = dn->child;
> +	struct device_node *sibling = dn->sibling;
> +	int rc;
> +
> +	dn->child = NULL;
> +	dn->sibling = NULL;
> +	dn->parent = NULL;
> +
> +	rc = add_one_node(dn);
> +	if (rc)
> +		return rc;
> +
> +	if (child) {
> +		rc = add_device_tree_nodes(child);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	if (sibling)
> +		rc = add_device_tree_nodes(sibling);
> +
> +	return rc;
> +}

 ... and this ...

> +static int remove_one_node(struct device_node *dn)
> +{
> +	struct device_node *parent = dn->parent;
> +	struct property *prop = dn->properties;
> +
> +#ifdef CONFIG_PROC_DEVICETREE
> +	while (prop) {
> +		remove_proc_entry(prop->name, dn->pde);
> +		prop = prop->next;
> +	}
> +
> +	if (dn->pde)
> +		remove_proc_entry(dn->pde->name, parent->pde);
> +#endif
> +
> +	blocking_notifier_call_chain(&pSeries_reconfig_chain,
> +			    PSERIES_RECONFIG_REMOVE, dn);
> +	of_detach_node(dn);
> +	of_node_put(dn); /* Must decrement the refcount */
> +
> +	return 0;
> +}

 ... and this ...

> +static int _remove_device_tree_nodes(struct device_node *dn)
> +{
> +	int rc;
> +
> +	if (dn->child) {
> +		rc = _remove_device_tree_nodes(dn->child);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	if (dn->sibling) {
> +		rc = _remove_device_tree_nodes(dn->sibling);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	rc = remove_one_node(dn);
> +	return rc;
> +}

 ... repeat myself ...

> +int remove_device_tree_nodes(struct device_node *dn)
> +{
> +	int rc;
> +
> +	if (dn->child) {
> +		rc = _remove_device_tree_nodes(dn->child);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	rc = remove_one_node(dn);
> +	return rc;
> +}

 ... should probably all go to something like drivers/of/dynamic.c or at
least for now arch/powerpc/kernel/of_dynamic.c, along with everything
related to dynamically adding and removing nodes. I can see that being
useful for more than just DLPAR (though DLPAR is the only user right
now), and these functions should also all be prefixed with of_*.

> +#define DR_ENTITY_SENSE		9003
> +#define DR_ENTITY_PRESENT	1
> +#define DR_ENTITY_UNUSABLE	2
> +#define ALLOCATION_STATE	9003
> +#define ALLOC_UNUSABLE		0
> +#define ALLOC_USABLE		1
> +#define ISOLATION_STATE		9001
> +#define ISOLATE			0
> +#define UNISOLATE		1
> +
> +int acquire_drc(u32 drc_index)
> +{
> +	int dr_status, rc;
> +
> +	rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status,
> +		       DR_ENTITY_SENSE, drc_index);
> +	if (rc || dr_status != DR_ENTITY_UNUSABLE)
> +		return -1;
> +
> +	rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE);
> +	if (rc)
> +		return rc;
> +
> +	rc = rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE);
> +	if (rc) {
> +		rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +int release_drc(u32 drc_index)
> +{
> +	int dr_status, rc;
> +
> +	rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status,
> +		       DR_ENTITY_SENSE, drc_index);
> +	if (rc || dr_status != DR_ENTITY_PRESENT)
> +		return -1;
> +
> +	rc = rtas_set_indicator(ISOLATION_STATE, drc_index, ISOLATE);
> +	if (rc)
> +		return rc;
> +
> +	rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE);
> +	if (rc) {
> +		rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE);
> +		return rc;
> +	}
> +
> +	return 0;
> +}

Both of the above should have a dlpar_* prefix.

> +static int pseries_dlpar_init(void)
> +{
> +	if (!machine_is(pseries))
> +		return 0;
> +
> +	return 0;
> +}
> +device_initcall(pseries_dlpar_init);

What's the point? :-)

Cheers
Ben.

^ permalink raw reply

* Re: [GIT PULL] perf_event/tracing/powerpc patches from Anton Blanchard
From: Paul Mackerras @ 2009-10-29  2:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linuxppc-dev, Ingo Molnar, linux-kernel,
	Anton Blanchard
In-Reply-To: <1256780668.26770.15.camel@pasglop>

Benjamin Herrenschmidt writes:

> On Thu, 2009-10-29 at 11:56 +1100, Paul Mackerras wrote:
> > Here is a series of patches from Anton Blanchard that implement some
> > nice tracing and perf_event features on powerpc.  One of them is
> > generic perf_event stuff (adding software events for alignment faults
> > and instruction emulation faults).
> > 
> > Since this touches the perf_event and tracing subsystems as well as the
> > powerpc architecture code, I think the best way forward is for both
> > Ingo and Ben to pull it into their trees.  I have based it on the most
> > recent point in Linus' tree that Ingo had pulled into his perf
> > branches (as of yesterday or so).
> 
> This is -next material, right?

Yes, please pull it into your next branch.

Thanks,
Paul.

^ permalink raw reply

* Re: [2/6] Cleanup management of kmem_caches for pagetables
From: David Gibson @ 2009-10-29  2:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev
In-Reply-To: <20091027052431.B0876B7BF0@ozlabs.org>

Oops, there was one big, nasty, stupid bug in this patch.  Corrected
patch below.

Cleanup management of kmem_caches for pagetables

Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels.  We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.

This patch cleans this all up by taking a different approach.  Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size.  Thus
sharing of caches between different types/levels of pagetables happens
naturally.  The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.
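
As an illustration of that encoding, here is a standalone sketch. It
mirrors the arithmetic in the patch below (table size is
sizeof(void *) << shift, and PGT_CACHE(shift) is pgtable_cache[shift - 1]),
but the helper names pgtable_bytes() and pgtable_cache_slot() are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PGTABLE_INDEX_SIZE 0xf	/* value used by the patch */

/* Bytes in a table that holds 2^shift pointers. */
static size_t pgtable_bytes(unsigned shift)
{
	assert(shift >= 1 && shift <= MAX_PGTABLE_INDEX_SIZE);
	return sizeof(void *) << shift;
}

/* Slot in the cache array, mirroring PGT_CACHE(shift). */
static unsigned pgtable_cache_slot(unsigned shift)
{
	assert(shift >= 1 && shift <= MAX_PGTABLE_INDEX_SIZE);
	return shift - 1;
}
```

Two table types with the same index size map to the same slot, which is
how the sharing between levels falls out without any special casing.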

Signed-off-by: David Gibson <dwg@au1.ibm.com>

Index: working-2.6/arch/powerpc/mm/init_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/init_64.c	2009-10-27 16:34:08.000000000 +1100
+++ working-2.6/arch/powerpc/mm/init_64.c	2009-10-29 12:48:02.000000000 +1100
@@ -119,30 +119,58 @@ static void pmd_ctor(void *addr)
 	memset(addr, 0, PMD_TABLE_SIZE);
 }
 
-static const unsigned int pgtable_cache_size[2] = {
-	PGD_TABLE_SIZE, PMD_TABLE_SIZE
-};
-static const char *pgtable_cache_name[ARRAY_SIZE(pgtable_cache_size)] = {
-#ifdef CONFIG_PPC_64K_PAGES
-	"pgd_cache", "pmd_cache",
-#else
-	"pgd_cache", "pud_pmd_cache",
-#endif /* CONFIG_PPC_64K_PAGES */
-};
-
-#ifdef CONFIG_HUGETLB_PAGE
-/* Hugepages need an extra cache per hugepagesize, initialized in
- * hugetlbpage.c.  We can't put into the tables above, because HPAGE_SHIFT
- * is not compile time constant. */
-struct kmem_cache *pgtable_cache[ARRAY_SIZE(pgtable_cache_size)+MMU_PAGE_COUNT];
-#else
-struct kmem_cache *pgtable_cache[ARRAY_SIZE(pgtable_cache_size)];
-#endif
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+
+/*
+ * Create a kmem_cache() for pagetables.  This is not used for PTE
+ * pages - they're linked to struct page, come from the normal free
+ * pages pool and have a different entry size (see real_pte_t) to
+ * everything else.  Caches created by this function are used for all
+ * the higher level pagetables, and for hugepage pagetables.
+ */
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
+{
+	char *name;
+	unsigned long table_size = sizeof(void *) << shift;
+	unsigned long align = table_size;
+
+	/* When batching pgtable pointers for RCU freeing, we store
+	 * the index size in the low bits.  Table alignment must be
+	 * big enough to fit it */
+	unsigned long minalign = MAX_PGTABLE_INDEX_SIZE + 1;
+	struct kmem_cache *new;
+
+	/* It would be nice if this was a BUILD_BUG_ON(), but at the
+	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
+	 * constant expression, so so much for that. */
+	BUG_ON(!is_power_of_2(minalign));
+	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
+
+	if (PGT_CACHE(shift))
+		return; /* Already have a cache of this size */
+
+	align = max_t(unsigned long, align, minalign);
+	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+	new = kmem_cache_create(name, table_size, align, 0, ctor);
+	PGT_CACHE(shift) = new;
+
+	pr_debug("Allocated pgtable cache for order %d\n", shift);
+}
+
 
 void pgtable_cache_init(void)
 {
-	pgtable_cache[0] = kmem_cache_create(pgtable_cache_name[0], PGD_TABLE_SIZE, PGD_TABLE_SIZE, SLAB_PANIC, pgd_ctor);
-	pgtable_cache[1] = kmem_cache_create(pgtable_cache_name[1], PMD_TABLE_SIZE, PMD_TABLE_SIZE, SLAB_PANIC, pmd_ctor);
+	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
+	pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor);
+	if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_INDEX_SIZE))
+		panic("Couldn't allocate pgtable caches");
+
+	/* In all current configs, when the PUD index exists it's the
+	 * same size as either the pgd or pmd index.  Verify that the
+	 * initialization above has also created a PUD cache.  This
+	 * will need re-examination if we add new possibilities for
+	 * the pagetable layout. */
+	BUG_ON(PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE));
 }
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
Index: working-2.6/arch/powerpc/include/asm/pgalloc-64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgalloc-64.h	2009-10-27 16:34:07.000000000 +1100
+++ working-2.6/arch/powerpc/include/asm/pgalloc-64.h	2009-10-27 16:34:27.000000000 +1100
@@ -11,27 +11,39 @@
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 
+/*
+ * Functions that deal with pagetables that could be at any level of
+ * the table need to be passed an "index_size" so they know how to
+ * handle allocation.  For PTE pages (which are linked to a struct
+ * page for now, and drawn from the main get_free_pages() pool), the
+ * allocation size will be (2^index_size * sizeof(pointer)) and
+ * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
+ *
+ * The maximum index size needs to be big enough to allow any
+ * pagetable sizes we need, but small enough to fit in the low bits of
+ * any page table pointer.  In other words all pagetables, even tiny
+ * ones, must be aligned to allow at least enough low 0 bits to
+ * contain this value.  This value is also used as a mask, so it must
+ * be one less than a power of two.
+ */
+#define MAX_PGTABLE_INDEX_SIZE	0xf
+
 #ifndef CONFIG_PPC_SUBPAGE_PROT
 static inline void subpage_prot_free(pgd_t *pgd) {}
 #endif
 
 extern struct kmem_cache *pgtable_cache[];
-
-#define PGD_CACHE_NUM		0
-#define PUD_CACHE_NUM		1
-#define PMD_CACHE_NUM		1
-#define HUGEPTE_CACHE_NUM	2
-#define PTE_NONCACHE_NUM	7  /* from GFP rather than kmem_cache */
+#define PGT_CACHE(shift) (pgtable_cache[(shift)-1])
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return kmem_cache_alloc(pgtable_cache[PGD_CACHE_NUM], GFP_KERNEL);
+	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
 }
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
 	subpage_prot_free(pgd);
-	kmem_cache_free(pgtable_cache[PGD_CACHE_NUM], pgd);
+	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
 }
 
 #ifndef CONFIG_PPC_64K_PAGES
@@ -40,13 +52,13 @@ static inline void pgd_free(struct mm_st
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache[PUD_CACHE_NUM],
+	return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
-	kmem_cache_free(pgtable_cache[PUD_CACHE_NUM], pud);
+	kmem_cache_free(PGT_CACHE(PUD_INDEX_SIZE), pud);
 }
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
@@ -78,13 +90,13 @@ static inline void pmd_populate_kernel(s
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache[PMD_CACHE_NUM],
+	return kmem_cache_alloc(PGT_CACHE(PMD_INDEX_SIZE),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
-	kmem_cache_free(pgtable_cache[PMD_CACHE_NUM], pmd);
+	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
@@ -107,24 +119,22 @@ static inline pgtable_t pte_alloc_one(st
 	return page;
 }
 
-static inline void pgtable_free(pgtable_free_t pgf)
+static inline void pgtable_free(void *table, unsigned index_size)
 {
-	void *p = (void *)(pgf.val & ~PGF_CACHENUM_MASK);
-	int cachenum = pgf.val & PGF_CACHENUM_MASK;
-
-	if (cachenum == PTE_NONCACHE_NUM)
-		free_page((unsigned long)p);
-	else
-		kmem_cache_free(pgtable_cache[cachenum], p);
+	if (!index_size)
+		free_page((unsigned long)table);
+	else {
+		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+		kmem_cache_free(PGT_CACHE(index_size), table);
+	}
 }
 
-#define __pmd_free_tlb(tlb, pmd,addr)		      \
-	pgtable_free_tlb(tlb, pgtable_free_cache(pmd, \
-		PMD_CACHE_NUM, PMD_TABLE_SIZE-1))
+#define __pmd_free_tlb(tlb, pmd, addr)		      \
+	pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
 #ifndef CONFIG_PPC_64K_PAGES
 #define __pud_free_tlb(tlb, pud, addr)		      \
-	pgtable_free_tlb(tlb, pgtable_free_cache(pud, \
-		PUD_CACHE_NUM, PUD_TABLE_SIZE-1))
+	pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #define check_pgt_cache()	do { } while (0)
Index: working-2.6/arch/powerpc/include/asm/pgalloc.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgalloc.h	2009-10-27 16:34:07.000000000 +1100
+++ working-2.6/arch/powerpc/include/asm/pgalloc.h	2009-10-27 16:34:27.000000000 +1100
@@ -24,25 +24,6 @@ static inline void pte_free(struct mm_st
 	__free_page(ptepage);
 }
 
-typedef struct pgtable_free {
-	unsigned long val;
-} pgtable_free_t;
-
-/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
- * and small enough to fit in the low bits of any naturally aligned page
- * table cache entry. Arbitrarily set to 0x1f, that should give us some
- * room to grow
- */
-#define PGF_CACHENUM_MASK	0x1f
-
-static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
-						unsigned long mask)
-{
-	BUG_ON(cachenum > PGF_CACHENUM_MASK);
-
-	return (pgtable_free_t){.val = ((unsigned long) p & ~mask) | cachenum};
-}
-
 #ifdef CONFIG_PPC64
 #include <asm/pgalloc-64.h>
 #else
@@ -50,12 +31,12 @@ static inline pgtable_free_t pgtable_fre
 #endif
 
 #ifdef CONFIG_SMP
-extern void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift);
 extern void pte_free_finish(void);
 #else /* CONFIG_SMP */
-static inline void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf)
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
 {
-	pgtable_free(pgf);
+	pgtable_free(table, shift);
 }
 static inline void pte_free_finish(void) { }
 #endif /* !CONFIG_SMP */
@@ -63,12 +44,9 @@ static inline void pte_free_finish(void)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
 				  unsigned long address)
 {
-	pgtable_free_t pgf = pgtable_free_cache(page_address(ptepage),
-						PTE_NONCACHE_NUM,
-						PTE_TABLE_SIZE-1);
 	tlb_flush_pgtable(tlb, address);
 	pgtable_page_dtor(ptepage);
-	pgtable_free_tlb(tlb, pgf);
+	pgtable_free_tlb(tlb, page_address(ptepage), 0);
 }
 
 #endif /* __KERNEL__ */
Index: working-2.6/arch/powerpc/mm/pgtable.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/pgtable.c	2009-10-27 16:34:07.000000000 +1100
+++ working-2.6/arch/powerpc/mm/pgtable.c	2009-10-29 12:48:29.000000000 +1100
@@ -49,12 +49,12 @@ struct pte_freelist_batch
 {
 	struct rcu_head	rcu;
 	unsigned int	index;
-	pgtable_free_t	tables[0];
+	unsigned long	tables[0];
 };
 
 #define PTE_FREELIST_SIZE \
 	((PAGE_SIZE - sizeof(struct pte_freelist_batch)) \
-	  / sizeof(pgtable_free_t))
+	  / sizeof(unsigned long))
 
 static void pte_free_smp_sync(void *arg)
 {
@@ -64,13 +64,13 @@ static void pte_free_smp_sync(void *arg)
 /* This is only called when we are critically out of memory
  * (and fail to get a page in pte_free_tlb).
  */
-static void pgtable_free_now(pgtable_free_t pgf)
+static void pgtable_free_now(void *table, unsigned shift)
 {
 	pte_freelist_forced_free++;
 
 	smp_call_function(pte_free_smp_sync, NULL, 1);
 
-	pgtable_free(pgf);
+	pgtable_free(table, shift);
 }
 
 static void pte_free_rcu_callback(struct rcu_head *head)
@@ -79,8 +79,12 @@ static void pte_free_rcu_callback(struct
 		container_of(head, struct pte_freelist_batch, rcu);
 	unsigned int i;
 
-	for (i = 0; i < batch->index; i++)
-		pgtable_free(batch->tables[i]);
+	for (i = 0; i < batch->index; i++) {
+		void *table = (void *)(batch->tables[i] & ~MAX_PGTABLE_INDEX_SIZE);
+		unsigned shift = batch->tables[i] & MAX_PGTABLE_INDEX_SIZE;
+
+		pgtable_free(table, shift);
+	}
 
 	free_page((unsigned long)batch);
 }
@@ -91,25 +95,28 @@ static void pte_free_submit(struct pte_f
 	call_rcu(&batch->rcu, pte_free_rcu_callback);
 }
 
-void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf)
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
 {
 	/* This is safe since tlb_gather_mmu has disabled preemption */
 	struct pte_freelist_batch **batchp = &__get_cpu_var(pte_freelist_cur);
+	unsigned long pgf;
 
 	if (atomic_read(&tlb->mm->mm_users) < 2 ||
 	    cpumask_equal(mm_cpumask(tlb->mm), cpumask_of(smp_processor_id()))){
-		pgtable_free(pgf);
+		pgtable_free(table, shift);
 		return;
 	}
 
 	if (*batchp == NULL) {
 		*batchp = (struct pte_freelist_batch *)__get_free_page(GFP_ATOMIC);
 		if (*batchp == NULL) {
-			pgtable_free_now(pgf);
+			pgtable_free_now(table, shift);
 			return;
 		}
 		(*batchp)->index = 0;
 	}
+	BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+	pgf = (unsigned long)table | shift;
 	(*batchp)->tables[(*batchp)->index++] = pgf;
 	if ((*batchp)->index == PTE_FREELIST_SIZE) {
 		pte_free_submit(*batchp);
Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-10-27 16:34:27.000000000 +1100
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-10-29 12:48:02.000000000 +1100
@@ -43,26 +43,14 @@ static unsigned nr_gpages;
 unsigned int mmu_huge_psizes[MMU_PAGE_COUNT] = { }; /* initialize all to 0 */
 
 #define hugepte_shift			mmu_huge_psizes
-#define PTRS_PER_HUGEPTE(psize)		(1 << hugepte_shift[psize])
-#define HUGEPTE_TABLE_SIZE(psize)	(sizeof(pte_t) << hugepte_shift[psize])
+#define HUGEPTE_INDEX_SIZE(psize)	(mmu_huge_psizes[(psize)])
+#define PTRS_PER_HUGEPTE(psize)		(1 << mmu_huge_psizes[psize])
 
 #define HUGEPD_SHIFT(psize)		(mmu_psize_to_shift(psize) \
-						+ hugepte_shift[psize])
+					 + HUGEPTE_INDEX_SIZE(psize))
 #define HUGEPD_SIZE(psize)		(1UL << HUGEPD_SHIFT(psize))
 #define HUGEPD_MASK(psize)		(~(HUGEPD_SIZE(psize)-1))
 
-/* Subtract one from array size because we don't need a cache for 4K since
- * is not a huge page size */
-#define HUGE_PGTABLE_INDEX(psize)	(HUGEPTE_CACHE_NUM + psize - 1)
-#define HUGEPTE_CACHE_NAME(psize)	(huge_pgtable_cache_name[psize])
-
-static const char *huge_pgtable_cache_name[MMU_PAGE_COUNT] = {
-	[MMU_PAGE_64K]	= "hugepte_cache_64K",
-	[MMU_PAGE_1M]	= "hugepte_cache_1M",
-	[MMU_PAGE_16M]	= "hugepte_cache_16M",
-	[MMU_PAGE_16G]	= "hugepte_cache_16G",
-};
-
 /* Flag to mark huge PD pointers.  This means pmd_bad() and pud_bad()
  * will choke on pointers to hugepte tables, which is handy for
  * catching screwups early. */
@@ -114,15 +102,15 @@ static inline pte_t *hugepte_offset(huge
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 			   unsigned long address, unsigned int psize)
 {
-	pte_t *new = kmem_cache_zalloc(pgtable_cache[HUGE_PGTABLE_INDEX(psize)],
-				      GFP_KERNEL|__GFP_REPEAT);
+	pte_t *new = kmem_cache_zalloc(PGT_CACHE(hugepte_shift[psize]),
+				       GFP_KERNEL|__GFP_REPEAT);
 
 	if (! new)
 		return -ENOMEM;
 
 	spin_lock(&mm->page_table_lock);
 	if (!hugepd_none(*hpdp))
-		kmem_cache_free(pgtable_cache[HUGE_PGTABLE_INDEX(psize)], new);
+		kmem_cache_free(PGT_CACHE(hugepte_shift[psize]), new);
 	else
 		hpdp->pd = (unsigned long)new | HUGEPD_OK;
 	spin_unlock(&mm->page_table_lock);
@@ -271,9 +259,7 @@ static void free_hugepte_range(struct mm
 
 	hpdp->pd = 0;
 	tlb->need_flush = 1;
-	pgtable_free_tlb(tlb, pgtable_free_cache(hugepte,
-						 HUGEPTE_CACHE_NUM+psize-1,
-						 PGF_CACHENUM_MASK));
+	pgtable_free_tlb(tlb, hugepte, hugepte_shift[psize]);
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -698,8 +684,6 @@ static void __init set_huge_psize(int ps
 		if (mmu_huge_psizes[psize] ||
 		   mmu_psize_defs[psize].shift == PAGE_SHIFT)
 			return;
-		if (WARN_ON(HUGEPTE_CACHE_NAME(psize) == NULL))
-			return;
 		hugetlb_add_hstate(mmu_psize_defs[psize].shift - PAGE_SHIFT);
 
 		switch (mmu_psize_defs[psize].shift) {
@@ -769,16 +753,11 @@ static int __init hugetlbpage_init(void)
 
 	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
 		if (mmu_huge_psizes[psize]) {
-			pgtable_cache[HUGE_PGTABLE_INDEX(psize)] =
-				kmem_cache_create(
-					HUGEPTE_CACHE_NAME(psize),
-					HUGEPTE_TABLE_SIZE(psize),
-					HUGEPTE_TABLE_SIZE(psize),
-					0,
-					NULL);
-			if (!pgtable_cache[HUGE_PGTABLE_INDEX(psize)])
-				panic("hugetlbpage_init(): could not create %s"\
-				      "\n", HUGEPTE_CACHE_NAME(psize));
+			pgtable_cache_add(hugepte_shift[psize], NULL);
+			if (!PGT_CACHE(hugepte_shift[psize]))
+				panic("hugetlbpage_init(): could not create "
+				      "pgtable cache for %d bit pagesize\n",
+				      mmu_psize_to_shift(psize));
 		}
 	}
 
Index: working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgtable-ppc64.h	2009-10-27 16:34:07.000000000 +1100
+++ working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h	2009-10-29 12:48:02.000000000 +1100
@@ -354,6 +354,7 @@ static inline void __ptep_set_access_fla
 #define pgoff_to_pte(off)	((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS	(BITS_PER_LONG - PTE_RPN_SHIFT)
 
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
 
 /*


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [GIT PULL] perf_event/tracing/powerpc patches from Anton Blanchard
From: Benjamin Herrenschmidt @ 2009-10-29  1:44 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Peter Zijlstra, linuxppc-dev, Ingo Molnar, linux-kernel,
	Anton Blanchard
In-Reply-To: <19176.59441.523075.445864@drongo.ozlabs.ibm.com>

On Thu, 2009-10-29 at 11:56 +1100, Paul Mackerras wrote:
> Here is a series of patches from Anton Blanchard that implement some
> nice tracing and perf_event features on powerpc.  One of them is
> generic perf_event stuff (adding software events for alignment faults
> and instruction emulation faults).
> 
> Since this touches the perf_event and tracing subsystems as well as the
> powerpc architecture code, I think the best way forward is for both
> Ingo and Ben to pull it into their trees.  I have based it on the most
> recent point in Linus' tree that Ingo had pulled into his perf
> branches (as of yesterday or so).

This is -next material, right?

Cheers,
Ben.

> Thanks,
> Paul.
> 
> The following changes since commit a3ccf63ee643ef243cbf8918da8b3f9238f10029:
>   Linus Torvalds (1):
>         Merge branch 'for-linus' of git://git.kernel.org/.../ieee1394/linux1394-2.6
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf.git master
> 
> Anton Blanchard (14):
>       powerpc: perf_event: Log invalid data addresses as all 1s
>       powerpc: perf_event: Enable SDAR in continous sample mode
>       perf_event: Add alignment-faults and emulation-faults software events
>       powerpc: Create PPC_WARN_ALIGNMENT to match PPC_WARN_EMULATED
>       powerpc: perf_event: Add alignment-faults and emulation-faults software events
>       powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
>       powerpc: tracing: Add powerpc tracepoints for timer entry and exit
>       powerpc: tracing: Add hypervisor call tracepoints
>       powerpc: tracing: Give hypervisor call tracepoints access to arguments
>       powerpc: Disable HCALL_STATS by default
>       powerpc: Export powerpc_debugfs_root
>       powerpc: perf_event: Cleanup copy_page output by hiding setup symbol
>       powerpc: perf_event: Hide iseries_check_pending_irqs
>       powerpc: perf_event: Cleanup output by adding symbols
> 
>  arch/powerpc/Kconfig.debug                   |    2 +-
>  arch/powerpc/configs/pseries_defconfig       |    2 +-
>  arch/powerpc/include/asm/emulated_ops.h      |   19 ++++-
>  arch/powerpc/include/asm/hvcall.h            |    2 +
>  arch/powerpc/include/asm/reg.h               |    2 +
>  arch/powerpc/include/asm/trace.h             |  133 ++++++++++++++++++++++++++
>  arch/powerpc/kernel/align.c                  |   12 +-
>  arch/powerpc/kernel/entry_64.S               |    4 +-
>  arch/powerpc/kernel/exceptions-64s.S         |    3 +
>  arch/powerpc/kernel/irq.c                    |    6 +
>  arch/powerpc/kernel/perf_event.c             |    2 +-
>  arch/powerpc/kernel/power5+-pmu.c            |    4 -
>  arch/powerpc/kernel/power5-pmu.c             |    6 +-
>  arch/powerpc/kernel/power6-pmu.c             |    2 +-
>  arch/powerpc/kernel/power7-pmu.c             |    6 +-
>  arch/powerpc/kernel/ppc970-pmu.c             |    4 -
>  arch/powerpc/kernel/setup-common.c           |    1 +
>  arch/powerpc/kernel/time.c                   |    6 +
>  arch/powerpc/kernel/traps.c                  |   18 ++--
>  arch/powerpc/lib/copypage_64.S               |    4 +-
>  arch/powerpc/platforms/pseries/hvCall.S      |  132 +++++++++++++++----------
>  arch/powerpc/platforms/pseries/hvCall_inst.c |   38 ++++++++
>  arch/powerpc/platforms/pseries/lpar.c        |   33 +++++++
>  include/linux/perf_counter.h                 |    2 +
>  include/linux/perf_event.h                   |    2 +
>  kernel/perf_event.c                          |    2 +
>  tools/perf/design.txt                        |    2 +
>  tools/perf/util/parse-events.c               |    4 +
>  28 files changed, 357 insertions(+), 96 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/trace.h


* [GIT PULL] perf_event/tracing/powerpc patches from Anton Blanchard
From: Paul Mackerras @ 2009-10-29  0:56 UTC (permalink / raw)
  To: Ingo Molnar, Benjamin Herrenschmidt
  Cc: Peter Zijlstra, linuxppc-dev, linux-kernel, Anton Blanchard

Here is a series of patches from Anton Blanchard that implement some
nice tracing and perf_event features on powerpc.  One of them is
generic perf_event stuff (adding software events for alignment faults
and instruction emulation faults).
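
From user space, the two new software events would presumably be requested like any other PERF_TYPE_SOFTWARE event. What follows is only a minimal sketch, not part of the series: it assumes a Linux system whose <linux/perf_event.h> exposes the events as PERF_COUNT_SW_ALIGNMENT_FAULTS and PERF_COUNT_SW_EMULATION_FAULTS, and the helper names sw_event_attr_init() and sw_event_open() are hypothetical.

```c
/* Sketch only: assumes a Linux system whose <linux/perf_event.h> defines
 * PERF_COUNT_SW_ALIGNMENT_FAULTS and PERF_COUNT_SW_EMULATION_FAULTS. */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in an attr describing one software event. */
static void sw_event_attr_init(struct perf_event_attr *attr,
			       unsigned long long config)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_SOFTWARE;
	attr->size = sizeof(*attr);
	attr->config = config;
	attr->disabled = 1;	/* start stopped; enable via ioctl later */
}

/* Hypothetical helper: open the event for the calling thread on any CPU.
 * Returns a file descriptor, or -1 with errno set. */
static int sw_event_open(struct perf_event_attr *attr)
{
	return (int)syscall(__NR_perf_event_open, attr, (pid_t)0, -1, -1, 0UL);
}
```

Passing PERF_COUNT_SW_EMULATION_FAULTS as the config value instead would select the emulation-fault counter; whether the open succeeds depends on the kernel and on the perf permissions of the caller.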

Since this touches the perf_event and tracing subsystems as well as the
powerpc architecture code, I think the best way forward is for both
Ingo and Ben to pull it into their trees.  I have based it on the most
recent point in Linus' tree that Ingo had pulled into his perf
branches (as of yesterday or so).

Thanks,
Paul.

The following changes since commit a3ccf63ee643ef243cbf8918da8b3f9238f10029:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../ieee1394/linux1394-2.6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf.git master

Anton Blanchard (14):
      powerpc: perf_event: Log invalid data addresses as all 1s
      powerpc: perf_event: Enable SDAR in continous sample mode
      perf_event: Add alignment-faults and emulation-faults software events
      powerpc: Create PPC_WARN_ALIGNMENT to match PPC_WARN_EMULATED
      powerpc: perf_event: Add alignment-faults and emulation-faults software events
      powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
      powerpc: tracing: Add powerpc tracepoints for timer entry and exit
      powerpc: tracing: Add hypervisor call tracepoints
      powerpc: tracing: Give hypervisor call tracepoints access to arguments
      powerpc: Disable HCALL_STATS by default
      powerpc: Export powerpc_debugfs_root
      powerpc: perf_event: Cleanup copy_page output by hiding setup symbol
      powerpc: perf_event: Hide iseries_check_pending_irqs
      powerpc: perf_event: Cleanup output by adding symbols

 arch/powerpc/Kconfig.debug                   |    2 +-
 arch/powerpc/configs/pseries_defconfig       |    2 +-
 arch/powerpc/include/asm/emulated_ops.h      |   19 ++++-
 arch/powerpc/include/asm/hvcall.h            |    2 +
 arch/powerpc/include/asm/reg.h               |    2 +
 arch/powerpc/include/asm/trace.h             |  133 ++++++++++++++++++++++++++
 arch/powerpc/kernel/align.c                  |   12 +-
 arch/powerpc/kernel/entry_64.S               |    4 +-
 arch/powerpc/kernel/exceptions-64s.S         |    3 +
 arch/powerpc/kernel/irq.c                    |    6 +
 arch/powerpc/kernel/perf_event.c             |    2 +-
 arch/powerpc/kernel/power5+-pmu.c            |    4 -
 arch/powerpc/kernel/power5-pmu.c             |    6 +-
 arch/powerpc/kernel/power6-pmu.c             |    2 +-
 arch/powerpc/kernel/power7-pmu.c             |    6 +-
 arch/powerpc/kernel/ppc970-pmu.c             |    4 -
 arch/powerpc/kernel/setup-common.c           |    1 +
 arch/powerpc/kernel/time.c                   |    6 +
 arch/powerpc/kernel/traps.c                  |   18 ++--
 arch/powerpc/lib/copypage_64.S               |    4 +-
 arch/powerpc/platforms/pseries/hvCall.S      |  132 +++++++++++++++----------
 arch/powerpc/platforms/pseries/hvCall_inst.c |   38 ++++++++
 arch/powerpc/platforms/pseries/lpar.c        |   33 +++++++
 include/linux/perf_counter.h                 |    2 +
 include/linux/perf_event.h                   |    2 +
 kernel/perf_event.c                          |    2 +
 tools/perf/design.txt                        |    2 +
 tools/perf/util/parse-events.c               |    4 +
 28 files changed, 357 insertions(+), 96 deletions(-)
 create mode 100644 arch/powerpc/include/asm/trace.h


* Re: [PATCH v3] powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
From: Benjamin Herrenschmidt @ 2009-10-29  0:49 UTC (permalink / raw)
  To: Valentine; +Cc: olof, linuxppc-dev, paulus
In-Reply-To: <4AE8CA95.7060402@ru.mvista.com>


> Yes, the MSR_EE is cleared before we jump to do_work. I'm OK with 
> clearing the hardirqenable flag. I just assumed that the hardirq flag 
> was supposed to reflect the MSR_EE state, so it looked a bit odd 
> clearing the MSR_EE at one place and then reflecting the change at another.

Yeah, well, it is supposed to reflect EE in the "general case". It's just
that in the exception entry/exit paths we take shortcuts, turning EE off
for short amounts of time without reflecting it in the PACA. This is
why, in this case, since we are going back to C code, I want to have it
"fixed up" to reflect reality.

Cheers,
Ben.
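
The PACA fix-up described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the struct and every function name are hypothetical, and model_local_irq_enable() is a rough stand-in for what local_irq_enable() ends up doing, not its actual implementation.

```c
/* User-space model only: not kernel code. All names here are hypothetical. */
#include <stdbool.h>

struct model_cpu {
	bool msr_ee;        /* stands in for the hardware MSR[EE] bit */
	bool paca_hard_en;  /* stands in for the PACA "hard enabled" byte */
	bool paca_soft_en;  /* stands in for the PACA "soft enabled" byte */
};

/* Exception-exit shortcut: EE is turned off without touching the PACA. */
static void model_shortcut_disable(struct model_cpu *c)
{
	c->msr_ee = false;
}

/* The fix-up: record that EE is really off before calling into C code. */
static void model_fixup_before_c(struct model_cpu *c)
{
	c->paca_hard_en = false;
}

/* Rough analogue of local_irq_enable(): re-enable and resync both views. */
static void model_local_irq_enable(struct model_cpu *c)
{
	c->paca_soft_en = true;
	c->paca_hard_en = true;
	c->msr_ee = true;
}

/* True when the software bookkeeping matches the modelled hardware bit. */
static bool model_consistent(const struct model_cpu *c)
{
	return c->paca_hard_en == c->msr_ee;
}
```

In this model the state is inconsistent right after the shortcut, consistent again after the fix-up, and stays consistent once the C side re-enables interrupts.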

> Anyway, the patch works fine.
> 
> Thanks,
> Val.
> 
> So either we
> > set it back, or we clear HARDIRQEN to reflect it. It will be re-enabled
> > as soon as preempt_schedule_irq() calls local_irq_enable() which is soon
> > enough anyways.
> > 
> > Also that avoids perf interrupt sneaking in since those act as NMIs in
> > that regard and -will- get in even when soft disabled.
> > 
> > Cheers,
> > Ben.
> > 
> >> Thanks,
> >> Val.
> >>> Ben.
> >>>
> >>>> Thanks,
> >>>> Val.
> >>>>
> >>>>> +	TRACE_DISABLE_INTS
> >>>>> +
> >>>>> +	/* Call the scheduler with soft IRQs off */
> >>>>> +1:	bl	.preempt_schedule_irq
> >>>>> +
> >>>>> +	/* Hard-disable interrupts again (and update PACA) */
> >>>>>  #ifdef CONFIG_PPC_BOOK3E
> >>>>> -	wrteei	1
> >>>>> -	bl	.preempt_schedule
> >>>>>  	wrteei	0
> >>>>>  #else
> >>>>> -	ori	r10,r10,MSR_EE
> >>>>> -	mtmsrd	r10,1		/* reenable interrupts */
> >>>>> -	bl	.preempt_schedule
> >>>>>  	mfmsr	r10
> >>>>> -	clrrdi	r9,r1,THREAD_SHIFT
> >>>>> -	rldicl	r10,r10,48,1	/* disable interrupts again */
> >>>>> +	rldicl	r10,r10,48,1
> >>>>>  	rotldi	r10,r10,16
> >>>>>  	mtmsrd	r10,1
> >>>>>  #endif /* CONFIG_PPC_BOOK3E */
> >>>>> +	li	r0,0
> >>>>> +	stb	r0,PACAHARDIRQEN(r13)
> >>>>> +
> >>>>> +	/* Re-test flags and eventually loop */
> >>>>> +	clrrdi	r9,r1,THREAD_SHIFT
> >>>>>  	ld	r4,TI_FLAGS(r9)
> >>>>>  	andi.	r0,r4,_TIF_NEED_RESCHED
> >>>>>  	bne	1b
> >>>>>  	b	restore
> >>>>>  
> >>>>>  user_work:
> >>>>> -#endif
> >>>>> +#endif /* CONFIG_PREEMPT */
> >>>>> +
> >>>>>  	/* Enable interrupts */
> >>>>>  #ifdef CONFIG_PPC_BOOK3E
> >>>>>  	wrteei	1
> >>>
> > 
> > 

