LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 16/40] tags: recognize compat syscalls
From: Michal Marek @ 2010-06-24 12:02 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Frederic Weisbecker, Jason Baron, linux-kernel, Steven Rostedt,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, John Kacur, WANG Cong,
	Andrew Morton, Stefani Seibold
In-Reply-To: <1277287401-28571-17-git-send-email-imunsie@au1.ibm.com>

On 23.6.2010 12:02, Ian Munsie wrote:
> From: Jason Baron <jbaron@redhat.com>
> 
> make tags.sh recognize the new syscall macros:
> 
> COMPAT_SYSCALL_DEFINE#N()
> ARCH_COMPAT_SYSCALL_DEFINE#N()
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Acked-by: Michal Marek <mmarek@suse.cz>

Michal

^ permalink raw reply

* Re: [PATCH 08/40] tracing: remove syscall bitmaps in preparation for compat support
From: Steven Rostedt @ 2010-06-23 19:45 UTC (permalink / raw)
  To: Jason Baron
  Cc: Lai Jiangshan, Frederic Weisbecker, linux-kernel, linuxppc-dev,
	Ingo Molnar, Paul Mackerras, Ian Munsie, Ingo Molnar,
	Masami Hiramatsu
In-Reply-To: <20100623193455.GE2680@redhat.com>

On Wed, 2010-06-23 at 15:34 -0400, Jason Baron wrote:

> Actually, looking at this further, what we probably want to do change
> the "int nb_args" field, which is already in syscall_metadata into a bit
> field. nb_args I think can be at most 6, or 3 bits, and we only need 4
> bits for storing the enabled/disabled data, so we could even make it a
> char. Thus, actually saving space with this patch :) (at least as far as
> the syscall_metadata field is concerned).

Yeah, I'm fine with turning that into a count/flags field.

-- Steve

^ permalink raw reply

* Re: [PATCH 08/40] tracing: remove syscall bitmaps in preparation for compat support
From: Jason Baron @ 2010-06-23 19:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Lai Jiangshan, Frederic Weisbecker, linux-kernel, linuxppc-dev,
	Ingo Molnar, Paul Mackerras, Ian Munsie, Ingo Molnar,
	Masami Hiramatsu
In-Reply-To: <20100623191454.GC2680@redhat.com>

On Wed, Jun 23, 2010 at 03:14:54PM -0400, Jason Baron wrote:
> On Wed, Jun 23, 2010 at 11:16:44AM -0400, Steven Rostedt wrote:
> > On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> > > From: Jason Baron <jbaron@redhat.com>
> > > 
> > > In preparation for compat syscall tracing support, let's store the enabled
> > > syscalls, with the struct syscall_metadata itself. That way we don't duplicate
> > > enabled information when the compat table points to an entry in the regular
> > > syscall table. Also, allows us to remove the bitmap data structures completely.
> > > 
> > > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > > ---
> > >  include/linux/syscalls.h      |    8 +++++++
> > >  include/trace/syscall.h       |    4 +++
> > >  kernel/trace/trace_syscalls.c |   42 +++++++++++++++++++---------------------
> > >  3 files changed, 32 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > > index 86f082b..755d05b 100644
> > > --- a/include/linux/syscalls.h
> > > +++ b/include/linux/syscalls.h
> > > @@ -163,6 +163,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> > >  		.nb_args 	= nb,				\
> > >  		.types		= types_##sname,		\
> > >  		.args		= args_##sname,			\
> > > +		.ftrace_enter	= 0,				\
> > > +		.ftrace_exit	= 0,				\
> > > +		.perf_enter	= 0,				\
> > > +		.perf_exit	= 0,				\
> > 
> > I really hate this change!
> > 
> > You just removed a nice compressed bitmap (1 bit per syscall) to add 4
> > bytes per syscall. On my box I have 308 syscalls being traced. That was
> > 308 bits per bitmask = 39 bytes * 2 = 78 * 2 (perf and ftrace) = 156.
> > 
> > Now we have 8 bytes per syscall (enter and exit), which is 1232 bytes.
> > 
> > Thus this change added 1076 bytes.
> > 
> > This may not seem as much, but the change is not worth 1K. Can't we just
> > add another bitmask or something for the compat case?
> > 
> > I also hate the moving of ftrace and perf internal data to an external
> > interface.
> > 
> > -- Steve
> > 
> 
> I made this change (I also wrote the original bitmap), b/c compat
> syscalls can share "regular" syscalls. That is the compat syscall table
> points to syscalls from non-compat mode. (looking at ia32 on x86 it
> looks like at least half).
> 
> Thus, if we continue along the bitmap path, we would have to introduce
> another 4 bitmaps for compat. 2 for enter and exit and 2 for perf and
> ftrace. Thus, using your math above: 39 bytes * 8 = 312 bytes. So
> approximately 1 byte per system call.
> 
> Instead, if we store this data in the syscall metadata, we actually only
> need 4 bits per syscall. Now, the above implementation uses 4 chars,
> where we really only need 1 char (or really 4 bits, which we could
> eventually store in the last last bit of the four existing pointer
> assuming they are 2 byte aligned for no increased storage space at all).
> But even assuming we use 1 byte per system call we are going to have in
> the worse case the above 312 bytes + (1 byte * # of non-shared compat
> syscalls). So, yes we might need a little more storage in this scheme.
> Another consideration too, is obviously the alignment of
> syscall_metadata, since the extra 1 byte, might be more...
> 
> However, we don't have to compute the location of the bits in the
> compat syscall map each time a tracing syscall is enable/disable. This
> would be more expensive, especially if we don't store the compat syscall
> number with each syscall meta data structure (which you have proposed
> dropping). So with compat syscalls, we are setting two bit locations
> with each enable/disable instead of 1 with this new scheme.
> 
> Also, I think the more important reason to store these bits in the
> syscall meta data structure is simplicity. Not all arches start their tables
> counting from 0 (requiring a constant shift factor), and obviously we
> waste bits for non-implemented syscalls. I don't want to have to deal
> with these arch specific implementation issues, if I don't need to.
> 
> thanks,
> 
> -Jason
> 

Actually, looking at this further, what we probably want to do change
the "int nb_args" field, which is already in syscall_metadata into a bit
field. nb_args I think can be at most 6, or 3 bits, and we only need 4
bits for storing the enabled/disabled data, so we could even make it a
char. Thus, actually saving space with this patch :) (at least as far as
the syscall_metadata field is concerned).

thanks,

-Jason

^ permalink raw reply

* Re: [PATCH 12/40] x86, compat: convert ia32 layer to use
From: H. Peter Anvin @ 2010-06-23 19:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andrew Morton, Jason Baron, x86, linux-kernel, Steven Rostedt,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Ian Munsie,
	Greg Ungerer, Russell King, Thomas Gleixner, Christoph Hellwig
In-Reply-To: <20100623114116.GE5242@nowhere>

On 06/23/2010 04:41 AM, Frederic Weisbecker wrote:
> On Wed, Jun 23, 2010 at 12:46:04PM +0200, Christoph Hellwig wrote:
>> On Wed, Jun 23, 2010 at 12:36:21PM +0200, Frederic Weisbecker wrote:
>>> I think we wanted that to keep the sys32_ prefixed based naming, to avoid
>>> collisions with generic compat handler names.
>>
>> For native syscalls we do this by adding a arch prefix inside the
>> syscall name, e.g.:
>>
>> arch/s390/kernel/sys_s390.c:SYSCALL_DEFINE(s390_fallocate)(int fd, int mode, loff_t offset,
>> arch/sparc/kernel/sys_sparc_64.c:SYSCALL_DEFINE1(sparc_pipe_real, struct pt_regs *, regs)
> 
> In fact we sort of wanted to standardize the name of arch overriden compat
> syscalls, so that userspace programs playing with syscalls tracing won't have
> to deal with arch naming differences.
> 

That seems totally wrong in so many ways.

What userspace sees is the system call name, e.g. fallocate or pipe.  It
should *not* be visible to userspace that there is an arch-specici
implementation.

There are already a huge amount of gratuitous user space ABI
differences, of course, partly because we *still* don't actually have a
systematic model for argument passing.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply

* Re: [PATCH 08/40] tracing: remove syscall bitmaps in preparation for compat support
From: Jason Baron @ 2010-06-23 19:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Lai Jiangshan, Frederic Weisbecker, linux-kernel, linuxppc-dev,
	Ingo Molnar, Paul Mackerras, Ian Munsie, Ingo Molnar,
	Masami Hiramatsu
In-Reply-To: <1277306204.9181.43.camel@gandalf.stny.rr.com>

On Wed, Jun 23, 2010 at 11:16:44AM -0400, Steven Rostedt wrote:
> On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> > From: Jason Baron <jbaron@redhat.com>
> > 
> > In preparation for compat syscall tracing support, let's store the enabled
> > syscalls, with the struct syscall_metadata itself. That way we don't duplicate
> > enabled information when the compat table points to an entry in the regular
> > syscall table. Also, allows us to remove the bitmap data structures completely.
> > 
> > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > ---
> >  include/linux/syscalls.h      |    8 +++++++
> >  include/trace/syscall.h       |    4 +++
> >  kernel/trace/trace_syscalls.c |   42 +++++++++++++++++++---------------------
> >  3 files changed, 32 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 86f082b..755d05b 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -163,6 +163,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> >  		.nb_args 	= nb,				\
> >  		.types		= types_##sname,		\
> >  		.args		= args_##sname,			\
> > +		.ftrace_enter	= 0,				\
> > +		.ftrace_exit	= 0,				\
> > +		.perf_enter	= 0,				\
> > +		.perf_exit	= 0,				\
> 
> I really hate this change!
> 
> You just removed a nice compressed bitmap (1 bit per syscall) to add 4
> bytes per syscall. On my box I have 308 syscalls being traced. That was
> 308 bits per bitmask = 39 bytes * 2 = 78 * 2 (perf and ftrace) = 156.
> 
> Now we have 8 bytes per syscall (enter and exit), which is 1232 bytes.
> 
> Thus this change added 1076 bytes.
> 
> This may not seem as much, but the change is not worth 1K. Can't we just
> add another bitmask or something for the compat case?
> 
> I also hate the moving of ftrace and perf internal data to an external
> interface.
> 
> -- Steve
> 

I made this change (I also wrote the original bitmap), b/c compat
syscalls can share "regular" syscalls. That is the compat syscall table
points to syscalls from non-compat mode. (looking at ia32 on x86 it
looks like at least half).

Thus, if we continue along the bitmap path, we would have to introduce
another 4 bitmaps for compat. 2 for enter and exit and 2 for perf and
ftrace. Thus, using your math above: 39 bytes * 8 = 312 bytes. So
approximately 1 byte per system call.

Instead, if we store this data in the syscall metadata, we actually only
need 4 bits per syscall. Now, the above implementation uses 4 chars,
where we really only need 1 char (or really 4 bits, which we could
eventually store in the last last bit of the four existing pointer
assuming they are 2 byte aligned for no increased storage space at all).
But even assuming we use 1 byte per system call we are going to have in
the worse case the above 312 bytes + (1 byte * # of non-shared compat
syscalls). So, yes we might need a little more storage in this scheme.
Another consideration too, is obviously the alignment of
syscall_metadata, since the extra 1 byte, might be more...

However, we don't have to compute the location of the bits in the
compat syscall map each time a tracing syscall is enable/disable. This
would be more expensive, especially if we don't store the compat syscall
number with each syscall meta data structure (which you have proposed
dropping). So with compat syscalls, we are setting two bit locations
with each enable/disable instead of 1 with this new scheme.

Also, I think the more important reason to store these bits in the
syscall meta data structure is simplicity. Not all arches start their tables
counting from 0 (requiring a constant shift factor), and obviously we
waste bits for non-implemented syscalls. I don't want to have to deal
with these arch specific implementation issues, if I don't need to.

thanks,

-Jason

^ permalink raw reply

* Re: [PATCH 39/40] trace syscalls: Clean confusing {s,r,}name and fix ABI breakage
From: Jason Baron @ 2010-06-23 18:03 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Jesper Nilsson, Ingo Molnar, linux-kernel, Steven Rostedt,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Frederic Weisbecker,
	Andrew Morton, Christoph Hellwig
In-Reply-To: <1277287401-28571-40-git-send-email-imunsie@au1.ibm.com>

On Wed, Jun 23, 2010 at 08:03:20PM +1000, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This patch cleans up the preprocessor macros defining system calls by
> standidising on the parameters passing around the system call name, with
> and without it's prefix.
> 
> Overall this makes the preprocessor code easier to understand and
> follow and less likely to introduce bugs due to misunderstanding what to
> place into each parameter.
> 
> The parameters henceforth should be named:
> 
> sname is the system call name WITHOUT the prefix (e.g. read)
> prefix is the prefix INCLUDING the trailing underscore (e.g. sys_)
> 
> These are mashed together to form the full syscall name when required
> like blah##prefix##sname. For just prefix##sname please use the provided
> SYSNAME(prefix,sname) macro instead as it will be safe if prefix itself
> is a macro.
> 
> This patch also fixes an ABI breakage - the ftrace events are once again
> named like 'sys_enter_read' instead of 'enter_sys_read'. The rwtop
> script shipped with perf relies on this ABI. Others may as well.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---

overall this patch is a major improvement! My question though is
about the naming of the compat syscalls in the context of events. I
believe this patch differeniates compat syscall event names as:
"sys32_enter_sname", and "compat_sys_enter_sname". I agree that we keep that
distinction for purposes of defining the actual syscall function. However,
we had discuessed previously about keeping the event name the same for
all compat syscalls. ie they are all called "compat_sys_sname" or some
such. Reason being you could just do "compat_*" to match all compat
events.

thanks,

-Jason

^ permalink raw reply

* Re: [PATCH 10/40] tracing: add tracing support for compat syscalls
From: Frederic Weisbecker @ 2010-06-23 16:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Lai Jiangshan, Arnd Bergmann, Jason Baron, Li Zefan, linux-kernel,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Ian Munsie,
	Andrew Morton, David S. Miller, Masami Hiramatsu
In-Reply-To: <1277306786.9181.51.camel@gandalf.stny.rr.com>

On Wed, Jun 23, 2010 at 11:26:26AM -0400, Steven Rostedt wrote:
> On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> > From: Jason Baron <jbaron@redhat.com>
> > 
> > Add core support to event tracing for compat syscalls. The basic idea is that we
> > check if we have a compat task via 'is_compat_task()'. If so, we lookup in the
> > new compat_syscalls_metadata table, the corresponding struct syscall_metadata, to
> > check syscall arguments and whether or not we are enabled.
> > 
> > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > ---
> >  include/linux/compat.h        |    2 +
> >  include/trace/syscall.h       |    4 ++
> >  kernel/trace/trace.h          |    2 +
> >  kernel/trace/trace_syscalls.c |   86 +++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 86 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/linux/compat.h b/include/linux/compat.h
> > index ab638e9..a94f13d 100644
> > --- a/include/linux/compat.h
> > +++ b/include/linux/compat.h
> > @@ -363,6 +363,8 @@ extern ssize_t compat_rw_copy_check_uvector(int type,
> >  
> >  #else /* CONFIG_COMPAT */
> >  
> > +#define NR_syscalls_compat 0
> > +
> >  static inline int is_compat_task(void)
> >  {
> >  	return 0;
> > diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> > index 75f3dce..67d4e64 100644
> > --- a/include/trace/syscall.h
> > +++ b/include/trace/syscall.h
> > @@ -22,6 +22,7 @@
> >  struct syscall_metadata {
> >  	const char	*name;
> >  	int		syscall_nr;
> > +	int		compat_syscall_nr;
> 
> This is also bloating the kernel. I don't like to add fields to the
> syscall_metadata lightly.
> 
> You're adding 4 more bytes to this structure to handle a few items. Find
> a better way to do this.


This is for cases when the compat and normal handlers are the same.
I haven't checked whether such case happen enough to make this a
benefit. I suspect it's not. Moreover I guess this adds more
complexity to the code.

May be this should be simply dropped.



 
> >  	int		nb_args;
> >  	const char	**types;
> >  	const char	**args;
> > @@ -38,6 +39,9 @@ struct syscall_metadata {
> >  
> >  #ifdef CONFIG_FTRACE_SYSCALLS
> >  extern unsigned long arch_syscall_addr(int nr);
> > +#ifdef CONFIG_COMPAT
> > +extern unsigned long arch_compat_syscall_addr(int nr);
> > +#endif
> >  extern int init_syscall_trace(struct ftrace_event_call *call);
> >  
> >  extern int reg_event_syscall_enter(struct ftrace_event_call *call);
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 01ce088..53ace4b 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -79,12 +79,14 @@ enum trace_type {
> >  struct syscall_trace_enter {
> >  	struct trace_entry	ent;
> >  	int			nr;
> > +	int			compat;
> 
> You're adding 4 bytes to a trace (taking up precious buffer space) for a
> single flag. If anything, set the 31st bit of nr if it is compat.



That too can be dropped I think. compat syscalls and normal syscalls are
different events, so the difference can be made in the event id or name.

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: H. Peter Anvin @ 2010-06-23 15:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, WANG Cong, Heiko Carstens, Oleg Nesterov,
	linuxppc-dev, Paul Mackerras, Sam Ravnborg, Andi Kleen,
	Mike Frysinger, Eric Dumazet, Jeff Moyer, Ingo Molnar,
	KOSAKI Motohiro, David Rientjes, Ingo Molnar, Arnd Bergmann,
	Steven Rostedt, Ian Munsie, Thomas Gleixner, Johannes Berg,
	Roland McGrath, Lee Schermerhorn, Arnaldo Carvalho de Melo,
	Neil Horman, linux-mm, netdev, Jason Baron, Greg Kroah-Hartman,
	Roel Kluin, linux-kernel, kexec, Eric Biederman, linux-fsdevel,
	Simon Kagstrom, Andrew Morton, David S. Miller, Alexander Viro
In-Reply-To: <20100623113806.GD5242@nowhere>

On 06/23/2010 04:38 AM, Frederic Weisbecker wrote:
> 
> I haven't heard any complains about existing syscalls wrappers.
> 

Then you truly haven't been listening.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply

* [PATCH v3 2/2] powerpc: add support for new hcall H_BEST_ENERGY
From: Vaidyanathan Srinivasan @ 2010-06-23  6:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Anton Blanchard; +Cc: linuxppc-dev
In-Reply-To: <20100623060122.4957.33819.stgit@drishya.in.ibm.com>

	Create sysfs interface to export data from H_BEST_ENERGY hcall
	that can be used by administrative tools on supported pseries
	platforms for energy management	optimizations.

	/sys/device/system/cpu/pseries_(de)activate_hint_list and
	/sys/device/system/cpu/cpuN/pseries_(de)activate_hint will provide
	hints for activation and deactivation of cpus respectively.

	Added new driver module
		arch/powerpc/platforms/pseries/pseries_energy.c
	under new config option CONFIG_PSERIES_ENERGY

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h               |    3 
 arch/powerpc/platforms/pseries/Kconfig          |   10 +
 arch/powerpc/platforms/pseries/Makefile         |    1 
 arch/powerpc/platforms/pseries/pseries_energy.c |  258 +++++++++++++++++++++++
 4 files changed, 271 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/pseries_energy.c

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 5119b7d..34b66e0 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -231,7 +231,8 @@
 #define H_GET_EM_PARMS		0x2B8
 #define H_SET_MPP		0x2D0
 #define H_GET_MPP		0x2D4
-#define MAX_HCALL_OPCODE	H_GET_MPP
+#define H_BEST_ENERGY		0x2F4
+#define MAX_HCALL_OPCODE	H_BEST_ENERGY

 #ifndef __ASSEMBLY__

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index c667f0f..b3dd108 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -33,6 +33,16 @@ config PSERIES_MSI
        depends on PCI_MSI && EEH
        default y

+config PSERIES_ENERGY
+	tristate "pseries energy management capabilities driver"
+	depends on PPC_PSERIES
+	default y
+	help
+	  Provides interface to platform energy management capabilities
+	  on supported PSERIES platforms.
+	  Provides: /sys/devices/system/cpu/pseries_(de)activation_hint_list
+	  and /sys/devices/system/cpu/cpuN/pseries_(de)activation_hint
+
 config SCANLOG
 	tristate "Scanlog dump interface"
 	depends on RTAS_PROC && PPC_PSERIES
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 3dbef30..32ae72e 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_EEH)	+= eeh.o eeh_cache.o eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_KEXEC)	+= kexec.o
 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
+obj-$(CONFIG_PSERIES_ENERGY)	+= pseries_energy.o

 obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
 obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
new file mode 100644
index 0000000..9a936b1
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/pseries_energy.c
@@ -0,0 +1,258 @@
+/*
+ * POWER platform energy management driver
+ * Copyright (C) 2010 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This pseries platform device driver provides access to
+ * platform energy management capabilities.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/sysdev.h>
+#include <linux/cpu.h>
+#include <linux/of.h>
+#include <asm/cputhreads.h>
+#include <asm/page.h>
+#include <asm/hvcall.h>
+
+
+#define MODULE_VERS "1.0"
+#define MODULE_NAME "pseries_energy"
+
+/* Helper Routines to convert between drc_index to cpu numbers */
+
+static u32 cpu_to_drc_index(int cpu)
+{
+	struct device_node *dn = NULL;
+	const int *indexes;
+	int i;
+	dn = of_find_node_by_path("/cpus");
+	if (dn == NULL)
+		goto err;
+	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
+	if (indexes == NULL)
+		goto err;
+	/* Convert logical cpu number to core number */
+	i = cpu_core_of_thread(cpu);
+	/*
+	 * The first element indexes[0] is the number of drc_indexes
+	 * returned in the list.  Hence i+1 will get the drc_index
+	 * corresponding to core number i.
+	 */
+	WARN_ON(i > indexes[0]);
+	return indexes[i + 1];
+err:
+	printk(KERN_WARNING "cpu_to_drc_index(%d) failed", cpu);
+	return 0;
+}
+
+static int drc_index_to_cpu(u32 drc_index)
+{
+	struct device_node *dn = NULL;
+	const int *indexes;
+	int i, cpu;
+	dn = of_find_node_by_path("/cpus");
+	if (dn == NULL)
+		goto err;
+	indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
+	if (indexes == NULL)
+		goto err;
+	/*
+	 * First element in the array is the number of drc_indexes
+	 * returned.  Search through the list to find the matching
+	 * drc_index and get the core number
+	 */
+	for (i = 0; i < indexes[0]; i++) {
+		if (indexes[i + 1] == drc_index)
+			break;
+	}
+	/* Convert core number to logical cpu number */
+	cpu = cpu_first_thread_of_core(i);
+	return cpu;
+err:
+	printk(KERN_WARNING "drc_index_to_cpu(%d) failed", drc_index);
+	return 0;
+}
+
+/*
+ * pseries hypervisor call H_BEST_ENERGY provides hints to OS on
+ * preferred logical cpus to activate or deactivate for optimized
+ * energy consumption.
+ */
+
+#define FLAGS_MODE1	0x004E200000080E01
+#define FLAGS_MODE2	0x004E200000080401
+#define FLAGS_ACTIVATE  0x100
+
+static ssize_t get_best_energy_list(char *page, int activate)
+{
+	int rc, cnt, i, cpu;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+	unsigned long flags = 0;
+	u32 *buf_page;
+	char *s = page;
+
+	buf_page = (u32 *) get_zeroed_page(GFP_KERNEL);
+	if (!buf_page)
+		return -ENOMEM;
+
+	flags = FLAGS_MODE1;
+	if (activate)
+		flags |= FLAGS_ACTIVATE;
+
+	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags, 0, __pa(buf_page),
+				0, 0, 0, 0, 0, 0);
+	if (rc != H_SUCCESS) {
+		free_page((unsigned long) buf_page);
+		return -EINVAL;
+	}
+
+	cnt = retbuf[0];
+	for (i = 0; i < cnt; i++) {
+		cpu = drc_index_to_cpu(buf_page[2*i+1]);
+		if ((cpu_online(cpu) && !activate) ||
+		    (!cpu_online(cpu) && activate))
+			s += sprintf(s, "%d,", cpu);
+	}
+	if (s > page) { /* Something to show */
+		s--; /* Suppress last comma */
+		s += sprintf(s, "\n");
+	}
+
+	free_page((unsigned long) buf_page);
+	return s-page;
+}
+
+static ssize_t get_best_energy_data(struct sys_device *dev,
+					char *page, int activate)
+{
+	int rc;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+	unsigned long flags = 0;
+
+	flags = FLAGS_MODE2;
+	if (activate)
+		flags |= FLAGS_ACTIVATE;
+
+	rc = plpar_hcall9(H_BEST_ENERGY, retbuf, flags,
+				cpu_to_drc_index(dev->id),
+				0, 0, 0, 0, 0, 0, 0);
+
+	if (rc != H_SUCCESS)
+		return -EINVAL;
+
+	return sprintf(page, "%lu\n", retbuf[1] >> 32);
+}
+
+/* Wrapper functions */
+
+static ssize_t cpu_activate_hint_list_show(struct sysdev_class *class,
+			struct sysdev_class_attribute *attr, char *page)
+{
+	return get_best_energy_list(page, 1);
+}
+
+static ssize_t cpu_deactivate_hint_list_show(struct sysdev_class *class,
+			struct sysdev_class_attribute *attr, char *page)
+{
+	return get_best_energy_list(page, 0);
+}
+
+static ssize_t percpu_activate_hint_show(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *page)
+{
+	return get_best_energy_data(dev, page, 1);
+}
+
+static ssize_t percpu_deactivate_hint_show(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *page)
+{
+	return get_best_energy_data(dev, page, 0);
+}
+
+/*
+ * Create sysfs interface:
+ * /sys/devices/system/cpu/pseries_activate_hint_list
+ * /sys/devices/system/cpu/pseries_deactivate_hint_list
+ * 	Comma separated list of cpus to activate or deactivate
+ * /sys/devices/system/cpu/cpuN/pseries_activate_hint
+ * /sys/devices/system/cpu/cpuN/pseries_deactivate_hint
+ *	Per-cpu value of the hint
+ */
+
+struct sysdev_class_attribute attr_cpu_activate_hint_list =
+		_SYSDEV_CLASS_ATTR(pseries_activate_hint_list, 0444,
+		cpu_activate_hint_list_show, NULL);
+
+struct sysdev_class_attribute attr_cpu_deactivate_hint_list =
+		_SYSDEV_CLASS_ATTR(pseries_deactivate_hint_list, 0444,
+		cpu_deactivate_hint_list_show, NULL);
+
+struct sysdev_attribute attr_percpu_activate_hint =
+		_SYSDEV_ATTR(pseries_activate_hint, 0444,
+		percpu_activate_hint_show, NULL);
+
+struct sysdev_attribute attr_percpu_deactivate_hint =
+		_SYSDEV_ATTR(pseries_deactivate_hint, 0444,
+		percpu_deactivate_hint_show, NULL);
+
+static int __init pseries_energy_init(void)
+{
+	int cpu, err;
+	struct sys_device *cpu_sys_dev;
+
+	/* Create the sysfs files */
+	err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+				&attr_cpu_activate_hint_list.attr);
+	if (!err)
+		err = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+				&attr_cpu_deactivate_hint_list.attr);
+
+	for_each_possible_cpu(cpu) {
+		cpu_sys_dev = get_cpu_sysdev(cpu);
+		err = sysfs_create_file(&cpu_sys_dev->kobj,
+				&attr_percpu_activate_hint.attr);
+		if (err)
+			break;
+		err = sysfs_create_file(&cpu_sys_dev->kobj,
+				&attr_percpu_deactivate_hint.attr);
+		if (err)
+			break;
+	}
+	return err;
+
+}
+
+static void __exit pseries_energy_cleanup(void)
+{
+	int cpu;
+	struct sys_device *cpu_sys_dev;
+
+	/* Remove the sysfs files */
+	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
+				&attr_cpu_activate_hint_list.attr);
+
+	sysfs_remove_file(&cpu_sysdev_class.kset.kobj,
+				&attr_cpu_deactivate_hint_list.attr);
+
+	for_each_possible_cpu(cpu) {
+		cpu_sys_dev = get_cpu_sysdev(cpu);
+		sysfs_remove_file(&cpu_sys_dev->kobj,
+				&attr_percpu_activate_hint.attr);
+		sysfs_remove_file(&cpu_sys_dev->kobj,
+				&attr_percpu_deactivate_hint.attr);
+	}
+}
+
+module_init(pseries_energy_init);
+module_exit(pseries_energy_cleanup);
+MODULE_DESCRIPTION("Driver for pseries platform energy management");
+MODULE_AUTHOR("Vaidyanathan Srinivasan");
+MODULE_LICENSE("GPL");

^ permalink raw reply related

* [PATCH v3 1/2] powerpc: cleanup APIs for cpu/thread/core mappings
From: Vaidyanathan Srinivasan @ 2010-06-23  6:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Anton Blanchard; +Cc: linuxppc-dev
In-Reply-To: <20100623060122.4957.33819.stgit@drishya.in.ibm.com>

These APIs take logical cpu number as input
Change cpu_first_thread_in_core() to cpu_leftmost_thread_sibling()
Change cpu_last_thread_in_core() to cpu_rightmost_thread_sibling()

These APIs convert core number (index) to logical cpu/thread numbers
Add cpu_first_thread_of_core(int core)
Changed cpu_thread_to_core() to cpu_core_of_thread(int cpu)

The goal is to make 'threads_per_core' accessible to the
pseries_energy module.  Instead of making an API to read
threads_per_core, this is a higher level wrapper function to
convert from logical cpu number to core number.

The current APIs cpu_first_thread_in_core() and
cpu_last_thread_in_core() returns logical CPU number while
cpu_thread_to_core() returns core number or index which is
not a logical CPU number.  The APIs are now clearly named to
distinguish 'core number' versus first and last 'logical cpu
number' in that core.

The new APIs cpu_{left,right}most_thread_sibling() work on
logical cpu numbers.  While cpu_first_thread_of_core() and
cpu_core_of_thread() work on core index.

Example usage:  (4 threads per core system)

cpu_leftmost_thread_sibling(5) = 4
cpu_rightmost_thread_sibling(5) = 7
cpu_core_of_thread(5) = 1
cpu_first_thread_of_core(1) = 4

cpu_core_of_thread() is used in cpu_to_drc_index() in the
module and cpu_first_thread_of_core() is used in
drc_index_to_cpu() in the module.

Made API changes to few callers.  Exported symbols for use
in modules.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cputhreads.h |   15 +++++++++------
 arch/powerpc/kernel/smp.c             |   19 ++++++++++++++++---
 arch/powerpc/mm/mmu_context_nohash.c  |   12 ++++++------
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index a8e1844..26dc6bd 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -61,22 +61,25 @@ static inline cpumask_t cpu_online_cores_map(void)
 	return cpu_thread_mask_to_cores(cpu_online_map);
 }

-static inline int cpu_thread_to_core(int cpu)
-{
-	return cpu >> threads_shift;
-}
+#ifdef CONFIG_SMP
+int cpu_core_of_thread(int cpu);
+int cpu_first_thread_of_core(int core);
+#else
+static inline int cpu_core_of_thread(int cpu) { return cpu; }
+static inline int cpu_first_thread_of_core(int core) { return core; }
+#endif

 static inline int cpu_thread_in_core(int cpu)
 {
 	return cpu & (threads_per_core - 1);
 }

-static inline int cpu_first_thread_in_core(int cpu)
+static inline int cpu_leftmost_thread_sibling(int cpu)
 {
 	return cpu & ~(threads_per_core - 1);
 }

-static inline int cpu_last_thread_in_core(int cpu)
+static inline int cpu_rightmost_thread_sibling(int cpu)
 {
 	return cpu | (threads_per_core - 1);
 }
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5c196d1..da4c2f8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -468,7 +468,20 @@ out:
 	return id;
 }

-/* Must be called when no change can occur to cpu_present_mask,
+/* Helper routines for cpu to core mapping */
+int cpu_core_of_thread(int cpu)
+{
+	return cpu >> threads_shift;
+}
+EXPORT_SYMBOL_GPL(cpu_core_of_thread);
+
+int cpu_first_thread_of_core(int core)
+{
+	return core << threads_shift;
+}
+EXPORT_SYMBOL_GPL(cpu_first_thread_of_core);
+
+/* Must be called when no change can occur to cpu_present_map,
  * i.e. during cpu online or offline.
  */
 static struct device_node *cpu_to_l2cache(int cpu)
@@ -527,7 +540,7 @@ int __devinit start_secondary(void *unused)
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
 	/* Update sibling maps */
-	base = cpu_first_thread_in_core(cpu);
+	base = cpu_leftmost_thread_sibling(cpu);
 	for (i = 0; i < threads_per_core; i++) {
 		if (cpu_is_offline(base + i))
 			continue;
@@ -606,7 +619,7 @@ int __cpu_disable(void)
 		return err;

 	/* Update sibling maps */
-	base = cpu_first_thread_in_core(cpu);
+	base = cpu_leftmost_thread_sibling(cpu);
 	for (i = 0; i < threads_per_core; i++) {
 		cpumask_clear_cpu(cpu, cpu_sibling_mask(base + i));
 		cpumask_clear_cpu(base + i, cpu_sibling_mask(cpu));
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index ddfd7ad..22f3bc5 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -111,8 +111,8 @@ static unsigned int steal_context_smp(unsigned int id)
 		 * a core map instead but this will do for now.
 		 */
 		for_each_cpu(cpu, mm_cpumask(mm)) {
-			for (i = cpu_first_thread_in_core(cpu);
-			     i <= cpu_last_thread_in_core(cpu); i++)
+			for (i = cpu_leftmost_thread_sibling(cpu);
+			     i <= cpu_rightmost_thread_sibling(cpu); i++)
 				__set_bit(id, stale_map[i]);
 			cpu = i - 1;
 		}
@@ -264,14 +264,14 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	 */
 	if (test_bit(id, stale_map[cpu])) {
 		pr_hardcont(" | stale flush %d [%d..%d]",
-			    id, cpu_first_thread_in_core(cpu),
-			    cpu_last_thread_in_core(cpu));
+			    id, cpu_leftmost_thread_sibling(cpu),
+			    cpu_rightmost_thread_sibling(cpu));

 		local_flush_tlb_mm(next);

 		/* XXX This clear should ultimately be part of local_flush_tlb_mm */
-		for (i = cpu_first_thread_in_core(cpu);
-		     i <= cpu_last_thread_in_core(cpu); i++) {
+		for (i = cpu_leftmost_thread_sibling(cpu);
+		     i <= cpu_rightmost_thread_sibling(cpu); i++) {
 			__clear_bit(id, stale_map[i]);
 		}
 	}

^ permalink raw reply related

* [PATCH v3 0/2] powerpc: add support for new hcall H_BEST_ENERGY
From: Vaidyanathan Srinivasan @ 2010-06-23  6:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Anton Blanchard; +Cc: linuxppc-dev

Hi Ben,

The following series adds a kernel module for powerpc pseries platforms in
order to export platform energy management capabilities.

The module exports data from a new hypervisor call H_BEST_ENERGY.

Comments and suggestions made on the previous iteration of the patch
has been incorporated.

Changed in v3:

* Added more documentation in the cleanup patch
* Removed RFC tag, rebased and tested on 2.6.35-rc3
* Ready for inclusion in powerpc/next tree for further testing

Changes in v2:

[1] [RFC PATCH v2 0/2] powerpc: add support for new hcall H_BEST_ENERGY
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082246.html

* Cleanup cpu/thread/core APIs
* Export APIs to module instead of threads_per_core
* Use of_find_node_by_path() instead of of_find_node_by_name()
* Error checking and whitespace cleanups

First version:
[2] [RFC] powerpc: add support for new hcall H_BEST_ENERGY
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-March/080796.html

Please review and include in powerpc/next tree for further testing.  I will
add the following enhancements incrementally:

* Detect h_call support from ibm,hypertas-functions
* Limit sysfs file creation only on hcall availability

This patch series will apply on 2.6.35-rc3 as well as powerpc/next tree.

Thanks,
Vaidy
---

Vaidyanathan Srinivasan (2):
      powerpc: cleanup APIs for cpu/thread/core mappings
      powerpc: add support for new hcall H_BEST_ENERGY


 arch/powerpc/include/asm/cputhreads.h           |   15 +
 arch/powerpc/include/asm/hvcall.h               |    3 
 arch/powerpc/kernel/smp.c                       |   19 +-
 arch/powerpc/mm/mmu_context_nohash.c            |   12 +
 arch/powerpc/platforms/pseries/Kconfig          |   10 +
 arch/powerpc/platforms/pseries/Makefile         |    1 
 arch/powerpc/platforms/pseries/pseries_energy.c |  258 +++++++++++++++++++++++
 7 files changed, 302 insertions(+), 16 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/pseries_energy.c

-- 

^ permalink raw reply

* Re: [PATCH 10/40] tracing: add tracing support for compat syscalls
From: Steven Rostedt @ 2010-06-23 15:26 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Lai Jiangshan, Arnd Bergmann, Jason Baron, Li Zefan, linux-kernel,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Frederic Weisbecker,
	Andrew Morton, David S. Miller, Masami Hiramatsu
In-Reply-To: <1277287401-28571-11-git-send-email-imunsie@au1.ibm.com>

On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> From: Jason Baron <jbaron@redhat.com>
> 
> Add core support to event tracing for compat syscalls. The basic idea is that we
> check if we have a compat task via 'is_compat_task()'. If so, we lookup in the
> new compat_syscalls_metadata table, the corresponding struct syscall_metadata, to
> check syscall arguments and whether or not we are enabled.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---
>  include/linux/compat.h        |    2 +
>  include/trace/syscall.h       |    4 ++
>  kernel/trace/trace.h          |    2 +
>  kernel/trace/trace_syscalls.c |   86 +++++++++++++++++++++++++++++++++++++----
>  4 files changed, 86 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index ab638e9..a94f13d 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -363,6 +363,8 @@ extern ssize_t compat_rw_copy_check_uvector(int type,
>  
>  #else /* CONFIG_COMPAT */
>  
> +#define NR_syscalls_compat 0
> +
>  static inline int is_compat_task(void)
>  {
>  	return 0;
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index 75f3dce..67d4e64 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -22,6 +22,7 @@
>  struct syscall_metadata {
>  	const char	*name;
>  	int		syscall_nr;
> +	int		compat_syscall_nr;

This is also bloating the kernel. I don't like to add fields to the
syscall_metadata lightly.

You're adding 4 more bytes to this structure to handle a few items. Find
a better way to do this.

>  	int		nb_args;
>  	const char	**types;
>  	const char	**args;
> @@ -38,6 +39,9 @@ struct syscall_metadata {
>  
>  #ifdef CONFIG_FTRACE_SYSCALLS
>  extern unsigned long arch_syscall_addr(int nr);
> +#ifdef CONFIG_COMPAT
> +extern unsigned long arch_compat_syscall_addr(int nr);
> +#endif
>  extern int init_syscall_trace(struct ftrace_event_call *call);
>  
>  extern int reg_event_syscall_enter(struct ftrace_event_call *call);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 01ce088..53ace4b 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -79,12 +79,14 @@ enum trace_type {
>  struct syscall_trace_enter {
>  	struct trace_entry	ent;
>  	int			nr;
> +	int			compat;

You're adding 4 bytes to a trace (taking up precious buffer space) for a
single flag. If anything, set the 31st bit of nr if it is compat.

-- Steve

>  	unsigned long		args[];
>  };
>  
>  struct syscall_trace_exit {
>  	struct trace_entry	ent;
>  	int			nr;
> +	int			compat;
>  	long			ret;
>  };
>  

^ permalink raw reply

* Re: [PATCH 08/40] tracing: remove syscall bitmaps in preparation for compat support
From: Steven Rostedt @ 2010-06-23 15:16 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Lai Jiangshan, Jason Baron, linux-kernel, linuxppc-dev,
	Ingo Molnar, Paul Mackerras, Frederic Weisbecker, Ingo Molnar,
	Masami Hiramatsu
In-Reply-To: <1277287401-28571-9-git-send-email-imunsie@au1.ibm.com>

On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> From: Jason Baron <jbaron@redhat.com>
> 
> In preparation for compat syscall tracing support, let's store the enabled
> syscalls, with the struct syscall_metadata itself. That way we don't duplicate
> enabled information when the compat table points to an entry in the regular
> syscall table. Also, allows us to remove the bitmap data structures completely.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---
>  include/linux/syscalls.h      |    8 +++++++
>  include/trace/syscall.h       |    4 +++
>  kernel/trace/trace_syscalls.c |   42 +++++++++++++++++++---------------------
>  3 files changed, 32 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 86f082b..755d05b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -163,6 +163,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  		.nb_args 	= nb,				\
>  		.types		= types_##sname,		\
>  		.args		= args_##sname,			\
> +		.ftrace_enter	= 0,				\
> +		.ftrace_exit	= 0,				\
> +		.perf_enter	= 0,				\
> +		.perf_exit	= 0,				\

I really hate this change!

You just removed a nice compressed bitmap (1 bit per syscall) to add 4
bytes per syscall. On my box I have 308 syscalls being traced. That was
308 bits per bitmask = 39 bytes * 2 = 78 * 2 (perf and ftrace) = 156.

Now we have 8 bytes per syscall (enter and exit), which is 1232 bytes.

Thus this change added 1076 bytes.

This may not seem as much, but the change is not worth 1K. Can't we just
add another bitmask or something for the compat case?

I also hate the moving of ftrace and perf internal data to an external
interface.

-- Steve

>  		.enter_event	= &event_enter_##sname,		\
>  		.exit_event	= &event_exit_##sname,		\
>  		.enter_fields	= LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
> @@ -179,6 +183,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  		.name 		= "sys_"#sname,			\
>  		.syscall_nr	= -1,	/* Filled in at boot */	\
>  		.nb_args 	= 0,				\
> +		.ftrace_enter	= 0,				\
> +		.ftrace_exit	= 0,				\
> +		.perf_enter	= 0,				\
> +		.perf_exit	= 0,				\
>  		.enter_event	= &event_enter__##sname,	\
>  		.exit_event	= &event_exit__##sname,		\
>  		.enter_fields	= LIST_HEAD_INIT(__syscall_meta__##sname.enter_fields), \

^ permalink raw reply

* Re: [PATCH 01/40] ftrace syscalls: don't add events for unmapped syscalls
From: Steven Rostedt @ 2010-06-23 15:02 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Jason Baron, linux-kernel, linuxppc-dev, Ingo Molnar,
	Paul Mackerras, Frederic Weisbecker, Ingo Molnar,
	Masami Hiramatsu
In-Reply-To: <1277287401-28571-2-git-send-email-imunsie@au1.ibm.com>

On Wed, 2010-06-23 at 20:02 +1000, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> FTRACE_SYSCALLS would create events for each and every system call, even
> if it had failed to map the system call's name with it's number. This
> resulted in a number of events being created that would not behave as
> expected.
> 
> This could happen, for example, on architectures who's symbol names are
> unusual and will not match the system call name. It could also happen
> with system calls which were mapped to sys_ni_syscall.
> 
> This patch changes the default system call number in the metadata to -1.
> If the system call name from the metadata is not successfully mapped to
> a system call number during boot, than the event initialisation routine
> will now return an error, preventing the event from being created.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---
>  include/linux/syscalls.h      |    2 ++
>  kernel/trace/trace_syscalls.c |    8 ++++++++
>  2 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 7f614ce..86f082b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -159,6 +159,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  	  __attribute__((section("__syscalls_metadata")))	\
>  	  __syscall_meta_##sname = {				\
>  		.name 		= "sys"#sname,			\
> +		.syscall_nr	= -1,	/* Filled in at boot */	\
>  		.nb_args 	= nb,				\
>  		.types		= types_##sname,		\
>  		.args		= args_##sname,			\
> @@ -176,6 +177,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  	  __attribute__((section("__syscalls_metadata")))	\
>  	  __syscall_meta__##sname = {				\
>  		.name 		= "sys_"#sname,			\
> +		.syscall_nr	= -1,	/* Filled in at boot */	\
>  		.nb_args 	= 0,				\
>  		.enter_event	= &event_enter__##sname,	\
>  		.exit_event	= &event_exit__##sname,		\
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 34e3580..82246ce 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -431,6 +431,14 @@ void unreg_event_syscall_exit(struct ftrace_event_call *call)
>  int init_syscall_trace(struct ftrace_event_call *call)
>  {
>  	int id;
> +	int num;
> +
> +	num = ((struct syscall_metadata *)call->data)->syscall_nr;
> +	if (num < 0 || num >= NR_syscalls) {
> +		pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
> +				((struct syscall_metadata *)call->data)->name);
> +		return -ENOSYS;
> +	}

Perhaps this should be:

	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls)
		return -ENOSYS;

-- Steve

>  
>  	if (set_syscall_print_fmt(call) < 0)
>  		return -ENOMEM;

^ permalink raw reply

* unsubscribe
From: Frederic LEGER @ 2010-06-23 14:33 UTC (permalink / raw)
  To: linuxppc-dev

unsubscribe
 

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Christoph Hellwig @ 2010-06-23 12:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, WANG Cong, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, linuxppc-dev, Paul Mackerras, H. Peter Anvin,
	Sam Ravnborg, Mike Frysinger, Eric Dumazet, Jeff Moyer,
	Ingo Molnar, KOSAKI Motohiro, Ingo Molnar, Arnd Bergmann,
	David Rientjes, Steven Rostedt, Ian Munsie, Thomas Gleixner,
	Johannes Berg, Roland McGrath, Lee Schermerhorn,
	Arnaldo Carvalho de Melo, Neil Horman, linux-mm, netdev,
	Jason Baron, Greg Kroah-Hartman, Roel Kluin, linux-kernel, kexec,
	Eric Biederman, linux-fsdevel, Simon Kagstrom, Andrew Morton,
	David S. Miller, Alexander Viro
In-Reply-To: <4C21FF9A.2040207@linux.intel.com>

On Wed, Jun 23, 2010 at 02:35:38PM +0200, Andi Kleen wrote:
> >I haven't heard any complains about existing syscalls wrappers.
> 
> At least for me they always interrupt my grepping.
> 
> >
> >What kind of annotations could solve that?
> 
> If you put the annotation in a separate macro and leave the original
> prototype alone. Then C parsers could still parse it.

I personally hate the way SYSCALL_DEFINE works with passion, mostly
for the grep reason, but also because it looks horribly ugly.

But there is no reason not to be consistent here.  We already use
the wrappers for all native system calls, so leaving the compat
calls out doesn't make any sense.  And I'd cheer for anyone who
comes up with a better scheme for the native and compat wrappers.

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Andi Kleen @ 2010-06-23 12:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, WANG Cong, Heiko Carstens, Oleg Nesterov,
	linuxppc-dev, Paul Mackerras, H. Peter Anvin, Sam Ravnborg,
	Mike Frysinger, Eric Dumazet, Jeff Moyer, Ingo Molnar,
	KOSAKI Motohiro, David Rientjes, Ingo Molnar, Arnd Bergmann,
	Steven Rostedt, Ian Munsie, Thomas Gleixner, Johannes Berg,
	Roland McGrath, Lee Schermerhorn, Arnaldo Carvalho de Melo,
	Neil Horman, linux-mm, netdev, Jason Baron, Greg Kroah-Hartman,
	Roel Kluin, linux-kernel, kexec, Eric Biederman, linux-fsdevel,
	Simon Kagstrom, Andrew Morton, David S. Miller, Alexander Viro
In-Reply-To: <20100623113806.GD5242@nowhere>


>
> I haven't heard any complains about existing syscalls wrappers.

At least for me they always interrupt my grepping.

>
> What kind of annotations could solve that?

If you put the annotation in a separate macro and leave the original
prototype alone. Then C parsers could still parse it.

-Andi

^ permalink raw reply

* Re: [PATCH 12/40] x86, compat: convert ia32 layer to use
From: Frederic Weisbecker @ 2010-06-23 11:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Baron, x86, Greg Ungerer, linux-kernel, Steven Rostedt,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Ian Munsie,
	H. Peter Anvin, Russell King, Thomas Gleixner, Andrew Morton
In-Reply-To: <20100623104603.GB11845@lst.de>

On Wed, Jun 23, 2010 at 12:46:04PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 23, 2010 at 12:36:21PM +0200, Frederic Weisbecker wrote:
> > I think we wanted that to keep the sys32_ prefixed based naming, to avoid
> > collisions with generic compat handler names.
> 
> For native syscalls we do this by adding a arch prefix inside the
> syscall name, e.g.:
> 
> arch/s390/kernel/sys_s390.c:SYSCALL_DEFINE(s390_fallocate)(int fd, int mode, loff_t offset,
> arch/sparc/kernel/sys_sparc_64.c:SYSCALL_DEFINE1(sparc_pipe_real, struct pt_regs *, regs)


In fact we sort of wanted to standardize the name of arch overriden compat
syscalls, so that userspace programs playing with syscalls tracing won't have
to deal with arch naming differences.

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Frederic Weisbecker @ 2010-06-23 11:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, WANG Cong, Heiko Carstens, Oleg Nesterov,
	linuxppc-dev, Paul Mackerras, H. Peter Anvin, Sam Ravnborg,
	Mike Frysinger, Eric Dumazet, Jeff Moyer, Ingo Molnar,
	KOSAKI Motohiro, David Rientjes, Ingo Molnar, Arnd Bergmann,
	Steven Rostedt, Ian Munsie, Thomas Gleixner, Johannes Berg,
	Roland McGrath, Lee Schermerhorn, Arnaldo Carvalho de Melo,
	Neil Horman, linux-mm, netdev, Jason Baron, Greg Kroah-Hartman,
	Roel Kluin, linux-kernel, kexec, Eric Biederman, linux-fsdevel,
	Simon Kagstrom, Andrew Morton, David S. Miller, Alexander Viro
In-Reply-To: <4C21E3F8.9000405@linux.intel.com>

On Wed, Jun 23, 2010 at 12:37:44PM +0200, Andi Kleen wrote:
> , Frederic Weisbecker wrote:
>> On Wed, Jun 23, 2010 at 12:19:38PM +0200, Andi Kleen wrote:
>>> , Ian Munsie wrote:
>>>> From: Ian Munsie<imunsie@au1.ibm.com>
>>>>
>>>> This patch converts numerous trivial compat syscalls through the generic
>>>> kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
>>>
>>> Why? This just makes the code look uglier and the functions harder
>>> to grep for.
>>
>>
>> Because it makes them usable with syscall tracing.
>
> Ok that information is missing in the changelog then.


Agreed, the changelog lacks the purpose of what it does.



> Also I hope the uglification<->usefullness factor is really worth it.
> The patch is certainly no slouch on the uglification side.


It's worth because the kernel's syscall tracing is not complete, we lack all
the compat part.

These wrappers let us create TRACE_EVENT() for every syscalls automatically.
If we had to create them manually, the uglification would be way much more worse.

Most syscalls use the syscall wrappers already, so the uglification
is there mostly. We just forgot to uglify a bunch of them :)


> It also has maintenance costs, e.g. I doubt ctags and cscope
> will be able to deal with these kinds of macros, so it has a
> high cost for everyone using these tools. For those
> it would be actually better if you used separate annotation
> that does not confuse standard C parsers.


I haven't heard any complains about existing syscalls wrappers.

What kind of annotations could solve that?

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: KOSAKI Motohiro @ 2010-06-23 10:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, WANG Cong, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, linuxppc-dev, Paul Mackerras, H. Peter Anvin,
	Sam Ravnborg, Mike Frysinger, Eric Dumazet, Jeff Moyer,
	Ingo Molnar, kosaki.motohiro, Ingo Molnar, Arnd Bergmann,
	David Rientjes, Steven Rostedt, Ian Munsie, Thomas Gleixner,
	Johannes Berg, Roland McGrath, Lee Schermerhorn,
	Arnaldo Carvalho de Melo, Neil Horman, linux-mm, netdev,
	Jason Baron, Greg Kroah-Hartman, Roel Kluin, linux-kernel, kexec,
	Eric Biederman, linux-fsdevel, Simon Kagstrom, Andrew Morton,
	David S. Miller, Alexander Viro
In-Reply-To: <4C21DFBA.2070202@linux.intel.com>

> , Ian Munsie wrote:
> > From: Ian Munsie<imunsie@au1.ibm.com>
> >
> > This patch converts numerous trivial compat syscalls through the generic
> > kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
> 
> Why? This just makes the code look uglier and the functions harder
> to grep for.

I guess trace-syscall feature need to override COMPAT_SYSCALL_DEFINE. but
It's only guess...

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Andi Kleen @ 2010-06-23 10:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, WANG Cong, Heiko Carstens, Oleg Nesterov,
	linuxppc-dev, Paul Mackerras, H. Peter Anvin, Sam Ravnborg,
	Mike Frysinger, Eric Dumazet, Jeff Moyer, Ingo Molnar,
	KOSAKI Motohiro, David Rientjes, Ingo Molnar, Arnd Bergmann,
	Steven Rostedt, Ian Munsie, Thomas Gleixner, Johannes Berg,
	Roland McGrath, Lee Schermerhorn, Arnaldo Carvalho de Melo,
	Neil Horman, linux-mm, netdev, Jason Baron, Greg Kroah-Hartman,
	Roel Kluin, linux-kernel, kexec, Eric Biederman, linux-fsdevel,
	Simon Kagstrom, Andrew Morton, David S. Miller, Alexander Viro
In-Reply-To: <20100623102931.GB5242@nowhere>

, Frederic Weisbecker wrote:
> On Wed, Jun 23, 2010 at 12:19:38PM +0200, Andi Kleen wrote:
>> , Ian Munsie wrote:
>>> From: Ian Munsie<imunsie@au1.ibm.com>
>>>
>>> This patch converts numerous trivial compat syscalls through the generic
>>> kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
>>
>> Why? This just makes the code look uglier and the functions harder
>> to grep for.
>
>
> Because it makes them usable with syscall tracing.

Ok that information is missing in the changelog then.

Also I hope the uglification<->usefullness factor is really worth it.
The patch is certainly no slouch on the uglification side.

It also has maintenance costs, e.g. I doubt ctags and cscope
will be able to deal with these kinds of macros, so it has a
high cost for everyone using these tools. For those
it would be actually better if you used separate annotation
that does not confuse standard C parsers.

-Andi

^ permalink raw reply

* Re: [PATCH 12/40] x86, compat: convert ia32 layer to use
From: Christoph Hellwig @ 2010-06-23 10:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andrew Morton, Jason Baron, x86, Greg Ungerer, linux-kernel,
	Steven Rostedt, linuxppc-dev, Ingo Molnar, Paul Mackerras,
	Ian Munsie, H. Peter Anvin, Russell King, Thomas Gleixner,
	Christoph Hellwig
In-Reply-To: <20100623103619.GC5242@nowhere>

On Wed, Jun 23, 2010 at 12:36:21PM +0200, Frederic Weisbecker wrote:
> I think we wanted that to keep the sys32_ prefixed based naming, to avoid
> collisions with generic compat handler names.

For native syscalls we do this by adding a arch prefix inside the
syscall name, e.g.:

arch/s390/kernel/sys_s390.c:SYSCALL_DEFINE(s390_fallocate)(int fd, int mode, loff_t offset,
arch/sparc/kernel/sys_sparc_64.c:SYSCALL_DEFINE1(sparc_pipe_real, struct pt_regs *, regs)

^ permalink raw reply

* Re: [PATCH 12/40] x86, compat: convert ia32 layer to use
From: Frederic Weisbecker @ 2010-06-23 10:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Baron, x86, Greg Ungerer, linux-kernel, Steven Rostedt,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Ian Munsie,
	H. Peter Anvin, Russell King, Thomas Gleixner, Andrew Morton
In-Reply-To: <20100623101402.GA10564@lst.de>

On Wed, Jun 23, 2010 at 12:14:02PM +0200, Christoph Hellwig wrote:
> Any reason we need to differenciate between COMPAT_SYSCALL_DEFINE and
> ARCH_COMPAT_SYSCALL_DEFINE?  We don't need this for native system calls,
> so I can't see the reason to do it for compat system calls.
> 


I think we wanted that to keep the sys32_ prefixed based naming, to avoid
collisions with generic compat handler names.

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Andi Kleen @ 2010-06-23 10:19 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Christoph Lameter, WANG Cong, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, linuxppc-dev, Paul Mackerras, H. Peter Anvin,
	Sam Ravnborg, Mike Frysinger, Eric Dumazet, Jeff Moyer,
	Ingo Molnar, KOSAKI Motohiro, Ingo Molnar, Arnd Bergmann,
	David Rientjes, Steven Rostedt, Alexander Viro, Thomas Gleixner,
	Johannes Berg, Roland McGrath, Lee Schermerhorn,
	Arnaldo Carvalho de Melo, Neil Horman, linux-mm, netdev,
	Jason Baron, Greg Kroah-Hartman, Roel Kluin, linux-kernel, kexec,
	Eric Biederman, linux-fsdevel, Simon Kagstrom, Andrew Morton,
	David S. Miller
In-Reply-To: <1277287401-28571-32-git-send-email-imunsie@au1.ibm.com>

, Ian Munsie wrote:
> From: Ian Munsie<imunsie@au1.ibm.com>
>
> This patch converts numerous trivial compat syscalls through the generic
> kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.

Why? This just makes the code look uglier and the functions harder
to grep for.

-Andi

^ permalink raw reply

* Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls
From: Frederic Weisbecker @ 2010-06-23 10:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, WANG Cong, Heiko Carstens, Oleg Nesterov,
	linuxppc-dev, Paul Mackerras, H. Peter Anvin, Sam Ravnborg,
	Mike Frysinger, Eric Dumazet, Jeff Moyer, Ingo Molnar,
	KOSAKI Motohiro, David Rientjes, Ingo Molnar, Arnd Bergmann,
	Steven Rostedt, Ian Munsie, Thomas Gleixner, Johannes Berg,
	Roland McGrath, Lee Schermerhorn, Arnaldo Carvalho de Melo,
	Neil Horman, linux-mm, netdev, Jason Baron, Greg Kroah-Hartman,
	Roel Kluin, linux-kernel, kexec, Eric Biederman, linux-fsdevel,
	Simon Kagstrom, Andrew Morton, David S. Miller, Alexander Viro
In-Reply-To: <4C21DFBA.2070202@linux.intel.com>

On Wed, Jun 23, 2010 at 12:19:38PM +0200, Andi Kleen wrote:
> , Ian Munsie wrote:
>> From: Ian Munsie<imunsie@au1.ibm.com>
>>
>> This patch converts numerous trivial compat syscalls through the generic
>> kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
>
> Why? This just makes the code look uglier and the functions harder
> to grep for.


Because it makes them usable with syscall tracing.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox