linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
@ 2008-05-01 14:43 Carl Love
  2008-05-01 16:29 ` Maynard Johnson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Carl Love @ 2008-05-01 14:43 UTC (permalink / raw)
  To: linuxppc-dev, cbe-oss-dev, linux-kernel, Arnd Bergmann,
	oprofile-list

Sorry, looks like my mailer mangled the file.

This is a reworked patch to fix the SPU data storage.  Currently, the
SPU escape sequences and program counter data are added directly
into the kernel buffer without holding the buffer_mutex lock.  This
patch changes how the data is stored.  A new function,
oprofile_add_value, is added to the oprofile driver to allow adding
generic data to the per-CPU buffers.  This enables a series of calls
to oprofile_add_value to enter the needed SPU escape sequences
and SPU program counter data into the kernel buffer via the per-CPU
buffers without any additional processing.  The oprofile_add_value
function is generic, so it could be used by other architectures as
well, provided the needed postprocessing was added to opreport.

Finally, this patch backs out the changes previously added to the 
oprofile generic code for handling the architecture specific 
ops.sync_start and ops.sync_stop that allowed the architecture
to skip the per CPU buffer creation.

Signed-off-by: Carl Love <carll@us.ibm.com>

Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
===================================================================
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/pr_util.h
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
@@ -20,11 +20,6 @@
 #include <asm/cell-regs.h>
 #include <asm/spu.h>
 
-/* Defines used for sync_start */
-#define SKIP_GENERIC_SYNC 0
-#define SYNC_START_ERROR -1
-#define DO_GENERIC_SYNC 1
-
 struct spu_overlay_info {	/* map of sections within an SPU overlay */
 	unsigned int vma;	/* SPU virtual memory address from elf */
 	unsigned int size;	/* size of section from elf */
@@ -85,7 +80,7 @@ void stop_spu_profiling(void);
 int spu_sync_start(void);
 
 /* remove the hooks */
-int spu_sync_stop(void);
+void spu_sync_stop(void);
 
 /* Record SPU program counter samples to the oprofile event buffer. */
 void spu_sync_buffer(int spu_num, unsigned int *samples,
Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===================================================================
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -31,11 +31,12 @@
 
 #define RELEASE_ALL 9999
 
-static DEFINE_SPINLOCK(buffer_lock);
+static DEFINE_SPINLOCK(add_value_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
 int spu_prof_num_nodes;
 int last_guard_val[MAX_NUMNODES * 8];
+static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
 
 /* Container for caching information about an active SPU task. */
 struct cached_info {
@@ -289,6 +290,7 @@ static int process_context_switch(struct
 	int retval;
 	unsigned int offset = 0;
 	unsigned long spu_cookie = 0, app_dcookie;
+	int cpu_buf;
 
 	retval = prepare_cached_spu_info(spu, objectId);
 	if (retval)
@@ -303,17 +305,28 @@ static int process_context_switch(struct
 		goto out;
 	}
 
-	/* Record context info in event buffer */
-	spin_lock_irqsave(&buffer_lock, flags);
-	add_event_entry(ESCAPE_CODE);
-	add_event_entry(SPU_CTX_SWITCH_CODE);
-	add_event_entry(spu->number);
-	add_event_entry(spu->pid);
-	add_event_entry(spu->tgid);
-	add_event_entry(app_dcookie);
-	add_event_entry(spu_cookie);
-	add_event_entry(offset);
-	spin_unlock_irqrestore(&buffer_lock, flags);
+	/* Record context info in event buffer.  Note, there are 4x more
+	 * SPUs than CPUs.  Map the SPU events/data for a given SPU to
+	 * the same CPU buffer.  Need to ensure the context switch data
+	 * and samples stay in order.
+	 */
+	cpu_buf = spu->number >> 2;
+	spin_lock_irqsave(&add_value_lock, flags);
+	oprofile_add_value(ESCAPE_CODE, cpu_buf);
+	oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
+	oprofile_add_value(spu->number, cpu_buf);
+	oprofile_add_value(spu->pid, cpu_buf);
+	oprofile_add_value(spu->tgid, cpu_buf);
+	oprofile_add_value(app_dcookie, cpu_buf);
+	oprofile_add_value(spu_cookie, cpu_buf);
+	oprofile_add_value(offset, cpu_buf);
+
+	/* Set flag to indicate SPU PC data can now be written out.  If
+	 * the SPU program counter data is seen before an SPU context
+	 * record is seen, the postprocessing will fail.
+	 */
+	spu_ctx_sw_seen[spu->number] = 1;
+	spin_unlock_irqrestore(&add_value_lock, flags);
 	smp_wmb();	/* insure spu event buffer updates are written */
 			/* don't want entries intermingled... */
 out:
@@ -333,7 +346,6 @@ static int spu_active_notify(struct noti
 	unsigned long flags;
 	struct spu *the_spu = data;
 
-	pr_debug("SPU event notification arrived\n");
 	if (!val) {
 		spin_lock_irqsave(&cache_lock, flags);
 		retval = release_cached_info(the_spu->number);
@@ -363,38 +375,38 @@ static int number_of_online_nodes(void)
 /* The main purpose of this function is to synchronize
  * OProfile with SPUFS by registering to be notified of
  * SPU task switches.
- *
- * NOTE: When profiling SPUs, we must ensure that only
- * spu_sync_start is invoked and not the generic sync_start
- * in drivers/oprofile/oprof.c.	 A return value of
- * SKIP_GENERIC_SYNC or SYNC_START_ERROR will
- * accomplish this.
  */
 int spu_sync_start(void)
 {
 	int k;
-	int ret = SKIP_GENERIC_SYNC;
+	int ret = 0;
 	int register_ret;
-	unsigned long flags = 0;
+	int cpu;
 
 	spu_prof_num_nodes = number_of_online_nodes();
 	num_spu_nodes = spu_prof_num_nodes * 8;
 
-	spin_lock_irqsave(&buffer_lock, flags);
-	add_event_entry(ESCAPE_CODE);
-	add_event_entry(SPU_PROFILING_CODE);
-	add_event_entry(num_spu_nodes);
-	spin_unlock_irqrestore(&buffer_lock, flags);
+	/* The SPU_PROFILING_CODE escape sequence must precede
+	 * the SPU context switch info.
+	 */
+	for_each_online_cpu(cpu) {
+		oprofile_add_value(ESCAPE_CODE, cpu);
+		oprofile_add_value(SPU_PROFILING_CODE, cpu);
+		oprofile_add_value((unsigned long int)
+					    num_spu_nodes, cpu);
+	}
 
 	/* Register for SPU events  */
 	register_ret = spu_switch_event_register(&spu_active);
 	if (register_ret) {
-		ret = SYNC_START_ERROR;
+		ret = -1;
 		goto out;
 	}
 
-	for (k = 0; k < (MAX_NUMNODES * 8); k++)
+	for (k = 0; k < (MAX_NUMNODES * 8); k++) {
 		last_guard_val[k] = 0;
+		spu_ctx_sw_seen[k] = 0;
+	}
 	pr_debug("spu_sync_start -- running.\n");
 out:
 	return ret;
@@ -432,7 +444,7 @@ void spu_sync_buffer(int spu_num, unsign
 
 	map = c_info->map;
 	the_spu = c_info->the_spu;
-	spin_lock(&buffer_lock);
+	spin_lock(&add_value_lock);
 	for (i = 0; i < num_samples; i++) {
 		unsigned int sample = *(samples+i);
 		int grd_val = 0;
@@ -452,31 +464,38 @@ void spu_sync_buffer(int spu_num, unsign
 			break;
 		}
 
-		add_event_entry(file_offset | spu_num_shifted);
+		/* We must ensure that the SPU context switch has been written
+		 * out before samples for the SPU.  Otherwise, the SPU context
+		 * information is not available and the postprocessing of the
+		 * SPU PC will fail with no available anonymous map information.
+		 */
+		if (spu_ctx_sw_seen[spu_num])
+			oprofile_add_value((file_offset | spu_num_shifted),
+						    (spu_num >> 2));
 	}
-	spin_unlock(&buffer_lock);
+	spin_unlock(&add_value_lock);
 out:
 	spin_unlock_irqrestore(&cache_lock, flags);
 }
 

-int spu_sync_stop(void)
+void spu_sync_stop(void)
 {
 	unsigned long flags = 0;
-	int ret = spu_switch_event_unregister(&spu_active);
-	if (ret) {
-		printk(KERN_ERR "SPU_PROF: "
-			"%s, line %d: spu_switch_event_unregister returned %d\n",
-			__FUNCTION__, __LINE__, ret);
-		goto out;
-	}
+
+	/* Ignoring the return value from the unregister
+	 * call.  A failed return value simply says there
+	 * was no registered event.  Hence there will not
+	 * be any calls to process a switch event that
+	 * could cause a problem.
+	 */
+	spu_switch_event_unregister(&spu_active);
 
 	spin_lock_irqsave(&cache_lock, flags);
-	ret = release_cached_info(RELEASE_ALL);
+	release_cached_info(RELEASE_ALL);
 	spin_unlock_irqrestore(&cache_lock, flags);
-out:
 	pr_debug("spu_sync_stop -- done.\n");
-	return ret;
+	return;
 }
 

Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/op_model_cell.c
===================================================================
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -1191,15 +1191,15 @@ static int cell_sync_start(void)
 	if (spu_cycle_reset)
 		return spu_sync_start();
 	else
-		return DO_GENERIC_SYNC;
+		return 0;
 }
 
-static int cell_sync_stop(void)
+static void cell_sync_stop(void)
 {
 	if (spu_cycle_reset)
-		return spu_sync_stop();
-	else
-		return 1;
+		spu_sync_stop();
+
+	return;
 }
 
 struct op_powerpc_model op_model_cell = {
Index: Cell_kernel_4_15_2008/drivers/oprofile/buffer_sync.c
===================================================================
--- Cell_kernel_4_15_2008.orig/drivers/oprofile/buffer_sync.c
+++ Cell_kernel_4_15_2008/drivers/oprofile/buffer_sync.c
@@ -521,6 +521,20 @@ void sync_buffer(int cpu)
 			} else if (s->event == CPU_TRACE_BEGIN) {
 				state = sb_bt_start;
 				add_trace_begin();
+			} else if (s->event == VALUE_HEADER_ID) {
+				/* The next event contains a value
+				 * to enter directly into the event buffer.
+				 */
+				increment_tail(cpu_buf);
+				i++;  /* one less entry in buffer to process */
+
+				s = &cpu_buf->buffer[cpu_buf->tail_pos];
+
+				if (unlikely(is_code(s->eip)))
+					add_event_entry(s->event);
+				else
+					printk(KERN_ERR "OProfile per CPU "
+					       "buffer sequence error.\n");
 			} else {
 				struct mm_struct * oldmm = mm;
 
Index: Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.c
===================================================================
--- Cell_kernel_4_15_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.c
@@ -224,6 +224,29 @@ static void oprofile_end_trace(struct op
 	cpu_buf->tracing = 0;
 }
 
+/*
+ * The first entry in the per cpu buffer consists of the escape code and
+ * the VALUE_HEADER_ID value.  The next entry consists of an escape code
+ * with the value to store.  The sync_buffer routine takes the value from
+ * the second entry and puts it into the kernel buffer.
+ */
+void oprofile_add_value(unsigned long value, int cpu) {
+	struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[cpu];
+
+	/* Enter a sequence of two events.  The first event says the
+	 * next event contains a value that is to be put directly into the
+	 * event buffer.
+	 */
+
+	if (nr_available_slots(cpu_buf) < 3) {
+		cpu_buf->sample_lost_overflow++;
+		return;
+	}
+
+	add_sample(cpu_buf, ESCAPE_CODE, VALUE_HEADER_ID);
+	add_sample(cpu_buf, ESCAPE_CODE, value);
+}
+
 void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs,
 				unsigned long event, int is_kernel)
 {
Index: Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.h
===================================================================
--- Cell_kernel_4_15_2008.orig/drivers/oprofile/cpu_buffer.h
+++ Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.h
@@ -54,5 +54,6 @@ void cpu_buffer_reset(struct oprofile_cp
 /* transient events for the CPU buffer -> event buffer */
 #define CPU_IS_KERNEL 1
 #define CPU_TRACE_BEGIN 2
+#define VALUE_HEADER_ID 3
 
 #endif /* OPROFILE_CPU_BUFFER_H */
Index: Cell_kernel_4_15_2008/drivers/oprofile/oprof.c
===================================================================
--- Cell_kernel_4_15_2008.orig/drivers/oprofile/oprof.c
+++ Cell_kernel_4_15_2008/drivers/oprofile/oprof.c
@@ -53,24 +53,13 @@ int oprofile_setup(void)
 	 * us missing task deaths and eventually oopsing
 	 * when trying to process the event buffer.
 	 */
-	if (oprofile_ops.sync_start) {
-		int sync_ret = oprofile_ops.sync_start();
-		switch (sync_ret) {
-		case 0:
-			goto post_sync;
-		case 1:
-			goto do_generic;
-		case -1:
-			goto out3;
-		default:
-			goto out3;
-		}
-	}
-do_generic:
+	if (oprofile_ops.sync_start
+	    && ((err = oprofile_ops.sync_start())))
+		goto out2;
+
 	if ((err = sync_start()))
 		goto out3;
 
-post_sync:
 	is_setup = 1;
 	mutex_unlock(&start_mutex);
 	return 0;
@@ -133,20 +122,9 @@ out:
 void oprofile_shutdown(void)
 {
 	mutex_lock(&start_mutex);
-	if (oprofile_ops.sync_stop) {
-		int sync_ret = oprofile_ops.sync_stop();
-		switch (sync_ret) {
-		case 0:
-			goto post_sync;
-		case 1:
-			goto do_generic;
-		default:
-			goto post_sync;
-		}
-	}
-do_generic:
+	if (oprofile_ops.sync_stop)
+		oprofile_ops.sync_stop();
 	sync_stop();
-post_sync:
 	if (oprofile_ops.shutdown)
 		oprofile_ops.shutdown();
 	is_setup = 0;
Index: Cell_kernel_4_15_2008/include/linux/oprofile.h
===================================================================
--- Cell_kernel_4_15_2008.orig/include/linux/oprofile.h
+++ Cell_kernel_4_15_2008/include/linux/oprofile.h
@@ -56,12 +56,10 @@ struct oprofile_operations {
 	/* Stop delivering interrupts. */
 	void (*stop)(void);
 	/* Arch-specific buffer sync functions.
-	 * Return value = 0:  Success
-	 * Return value = -1: Failure
-	 * Return value = 1:  Run generic sync function
+	 * sync_start: return 0 on success, -1 on failure.
 	 */
 	int (*sync_start)(void);
-	int (*sync_stop)(void);
+	void (*sync_stop)(void);
 
 	/* Initiate a stack backtrace. Optional. */
 	void (*backtrace)(struct pt_regs * const regs, unsigned int depth);
@@ -84,13 +82,6 @@ int oprofile_arch_init(struct oprofile_o
 void oprofile_arch_exit(void);
 
 /**
- * Add data to the event buffer.
- * The data passed is free-form, but typically consists of
- * file offsets, dcookies, context information, and ESCAPE codes.
- */
-void add_event_entry(unsigned long data);
-
-/**
  * Add a sample. This may be called from any context. Pass
  * smp_processor_id() as cpu.
  */
@@ -106,6 +97,17 @@ void oprofile_add_sample(struct pt_regs 
 void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs,
 				unsigned long event, int is_kernel);
 
+/*
+ * Add a value to the per CPU buffer.  The value is passed from the per CPU
+ * buffer to the kernel buffer with no additional processing.  The
+ * assumption is that any processing of the value will be done in the
+ * postprocessor.  This function should only be used for special
+ * architecture-specific data.  Currently only used by the Cell processor.
+ *
+ * This function does not perform a backtrace.
+ */
+void oprofile_add_value(unsigned long value, int cpu);
+
 /* Use this instead when the PC value is not from the regs. Doesn't
  * backtrace. */
 void oprofile_add_pc(unsigned long pc, int is_kernel, unsigned long event);
Index: Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
===================================================================
--- Cell_kernel_4_15_2008.orig/include/asm-powerpc/oprofile_impl.h
+++ Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
@@ -48,7 +48,7 @@ struct op_powerpc_model {
 	void (*stop) (void);
 	void (*global_stop) (void);
 	int (*sync_start)(void);
-	int (*sync_stop)(void);
+	void (*sync_stop)(void);
 	void (*handle_interrupt) (struct pt_regs *,
 				  struct op_counter_config *);
 	int num_counters;
Index: Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
===================================================================
--- Cell_kernel_4_15_2008.orig/drivers/oprofile/event_buffer.h
+++ Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
@@ -17,6 +17,14 @@ int alloc_event_buffer(void);
 
 void free_event_buffer(void);
  
+
+/**
+ * Add data to the event buffer.
+ * The data passed is free-form, but typically consists of
+ * file offsets, dcookies, context information, and ESCAPE codes.
+ */
+void add_event_entry(unsigned long data);
+
 /* wake up the process sleeping on the event file */
 void wake_up_buffer_waiter(void);
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-01 14:43 [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix Carl Love
@ 2008-05-01 16:29 ` Maynard Johnson
  2008-05-02 19:52 ` Maynard Johnson
  2008-05-15 15:39 ` Arnd Bergmann
  2 siblings, 0 replies; 7+ messages in thread
From: Maynard Johnson @ 2008-05-01 16:29 UTC (permalink / raw)
  To: Carl Love
  Cc: linuxppc-dev, oprofile-list, cbe-oss-dev, Arnd Bergmann,
	linux-kernel

Carl Love wrote:
> Sorry, looks like my mailer mangled the file.
>
> This is a reworked patch to fix the SPU data storage.  Currently, the 
> SPU escape sequences and program counter data is being added directly 
> into the kernel buffer without holding the buffer_mutex lock.  This 
> patch changes how the data is stored.  A new function,
> oprofile_add_value, is added into the oprofile driver to allow adding
> generic data to the per cpu buffers.  This enables a series of calls
> to the oprofile_add_value to enter the needed SPU escape sequences 
> and SPU program data into the kernel buffer via the per cpu buffers
> without any additional processing. The oprofile_add_value function is
> generic so it could be used by other architecures as well provided
> the needed postprocessing was added to opreport.
>
> Finally, this patch backs out the changes previously added to the 
> oprofile generic code for handling the architecture specific 
> ops.sync_start and ops.sync_stop that allowed the architecture
> to skip the per CPU buffer creation.
>
> Signed-off-by: Carl Love <carll@us.ibm.com>
>   
Looks fine to me.  All of my previous review comments have been addressed.

-Maynard
[snip]


* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-01 14:43 [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix Carl Love
  2008-05-01 16:29 ` Maynard Johnson
@ 2008-05-02 19:52 ` Maynard Johnson
  2008-05-15 15:39 ` Arnd Bergmann
  2 siblings, 0 replies; 7+ messages in thread
From: Maynard Johnson @ 2008-05-02 19:52 UTC (permalink / raw)
  To: Philippe Elie
  Cc: Arnd Bergmann, robert.richter, linux-kernel, linuxppc-dev,
	oprofile-list, cbe-oss-dev, Carl Love

Phil,
When you have a chance, could you please take a look at the
arch-independent pieces of the OProfile kernel driver that this patch
touches?  In short, this patch fixes a bug that I was responsible for in
my original OProfile-Cell SPE port (argh!   :-[  )   where I was
improperly adding events to the main event buffer without first getting
the buffer_mutex.  Under certain timing conditions, this caused some
synchronicity problems between the daemon and the driver, resulting in a
very confused daemon (only when running on Cell, by the way).  So part
of the patch backs out some of the changes I had originally made (like
adding add_event_entry to include/linux/oprofile.h), and the rest of the
patch reworks the Cell code to use a new interface Carl added to oprofile.h.

Let me know if you have any questions about the patch as I worked pretty
closely with Carl while he was developing it.

P.S.  I'm cc'ing Robert since he expressed an interest in reviewing
kernel driver patches.

Thanks.
Regards,
-Maynard

Carl Love wrote:
> Sorry, looks like my mailer mangled the file.
>
> This is a reworked patch to fix the SPU data storage.  Currently, the 
> SPU escape sequences and program counter data is being added directly 
> into the kernel buffer without holding the buffer_mutex lock.  This 
> patch changes how the data is stored.  A new function,
> oprofile_add_value, is added into the oprofile driver to allow adding
> generic data to the per cpu buffers.  This enables a series of calls
> to the oprofile_add_value to enter the needed SPU escape sequences 
> and SPU program data into the kernel buffer via the per cpu buffers
> without any additional processing. The oprofile_add_value function is
> generic so it could be used by other architecures as well provided
> the needed postprocessing was added to opreport.
>
> Finally, this patch backs out the changes previously added to the 
> oprofile generic code for handling the architecture specific 
> ops.sync_start and ops.sync_stop that allowed the architecture
> to skip the per CPU buffer creation.
>   
> Signed-off-by: Carl Love <carll@us.ibm.com>
>
> [snip]
> + * buffer to the kernel buffer with no additional processing.  The assumption
> + * is that any processing of the value will be done in the postprocessor.  This
> + * function should only be used for special architecture specific data.
> + * Currently only used by the CELL processor.
> + *
> + * This function does not perform a backtrace.
> + */
> +void oprofile_add_value(unsigned long value, int cpu);
> +
>  /* Use this instead when the PC value is not from the regs. Doesn't
>   * backtrace. */
>  void oprofile_add_pc(unsigned long pc, int is_kernel, unsigned long event);
> Index: Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/include/asm-powerpc/oprofile_impl.h
> +++ Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
> @@ -48,7 +48,7 @@ struct op_powerpc_model {
>  	void (*stop) (void);
>  	void (*global_stop) (void);
>  	int (*sync_start)(void);
> -	int (*sync_stop)(void);
> +	void (*sync_stop)(void);
>  	void (*handle_interrupt) (struct pt_regs *,
>  				  struct op_counter_config *);
>  	int num_counters;
> Index: Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/event_buffer.h
> +++ Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
> @@ -17,6 +17,14 @@ int alloc_event_buffer(void);
>
>  void free_event_buffer(void);
>   
> +
> +/**
> + * Add data to the event buffer.
> + * The data passed is free-form, but typically consists of
> + * file offsets, dcookies, context information, and ESCAPE codes.
> + */
> +void add_event_entry(unsigned long data);
> +
>  /* wake up the process sleeping on the event file */
>  void wake_up_buffer_waiter(void);
>
>
>
>
>   

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-01 14:43 [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix Carl Love
  2008-05-01 16:29 ` Maynard Johnson
  2008-05-02 19:52 ` Maynard Johnson
@ 2008-05-15 15:39 ` Arnd Bergmann
  2008-05-15 19:05   ` Carl Love
  2 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2008-05-15 15:39 UTC (permalink / raw)
  To: Carl Love; +Cc: linuxppc-dev, oprofile-list, cbe-oss-dev, linux-kernel

On Thursday 01 May 2008, Carl Love wrote:

> Finally, this patch backs out the changes previously added to the 
> oprofile generic code for handling the architecture specific 
> ops.sync_start and ops.sync_stop that allowed the architecture
> to skip the per CPU buffer creation.

Thanks for your patch, it looks a lot better than your previous version.
It would be good to have an oprofile person look into the changes
to the common code though.

Just a few more comments/questions this time:

> -static DEFINE_SPINLOCK(buffer_lock);
> +static DEFINE_SPINLOCK(add_value_lock);
>  static DEFINE_SPINLOCK(cache_lock);
>  static int num_spu_nodes;
>  int spu_prof_num_nodes;
>  int last_guard_val[MAX_NUMNODES * 8];
> +static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
>  
>  /* Container for caching information about an active SPU task. */
>  struct cached_info {

I noticed now that you are indexing arrays by SPU number. This is not
a good idea, because you make assumptions about the system that may
not be true. Better pass around 'struct spu' pointers and put those
values in there if you need them. Then again, the last_guard_val looks
like you should really store that information in the context instead
of the physical SPU, but that is something else to work on.

Let's leave this one as it is for now and fix the current bug at hand,
but please remember to fix the assumptions in the code later.

> @@ -289,6 +290,7 @@ static int process_context_switch(struct
>  	int retval;
>  	unsigned int offset = 0;
>  	unsigned long spu_cookie = 0, app_dcookie;
> +	int cpu_buf;
>  
>  	retval = prepare_cached_spu_info(spu, objectId);
>  	if (retval)
> @@ -303,17 +305,28 @@ static int process_context_switch(struct
>  		goto out;
>  	}
>  
> -	/* Record context info in event buffer */
> -	spin_lock_irqsave(&buffer_lock, flags);
> -	add_event_entry(ESCAPE_CODE);
> -	add_event_entry(SPU_CTX_SWITCH_CODE);
> -	add_event_entry(spu->number);
> -	add_event_entry(spu->pid);
> -	add_event_entry(spu->tgid);
> -	add_event_entry(app_dcookie);
> -	add_event_entry(spu_cookie);
> -	add_event_entry(offset);
> -	spin_unlock_irqrestore(&buffer_lock, flags);
> +	/* Record context info in event buffer.  Note, there are 4x more
> +	 * SPUs than CPUs.  Map the SPU events/data for a given SPU to
> +	 * the same CPU buffer.  Need to ensure the context switch data and
> +	 * samples stay in order.
> +	 */
> +	cpu_buf = spu->number >> 2;

The ratio of CPUs versus SPUs is anything but fixed, and the CPU numbers
may not start at 0. If you have a hypervisor (think PS3), or you are
running a non-SMP kernel, this is practically always wrong.
Please use get_cpu()/put_cpu() to read the current CPU number instead.
This one needs to be fixed now.

> +	spin_lock_irqsave(&add_value_lock, flags);
> +	oprofile_add_value(ESCAPE_CODE, cpu_buf);
> +	oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
> +	oprofile_add_value(spu->number, cpu_buf);
> +	oprofile_add_value(spu->pid, cpu_buf);
> +	oprofile_add_value(spu->tgid, cpu_buf);
> +	oprofile_add_value(app_dcookie, cpu_buf);
> +	oprofile_add_value(spu_cookie, cpu_buf);
> +	oprofile_add_value(offset, cpu_buf);
> +
> +	/* Set flag to indicate SPU PC data can now be written out.  If
> +	 * the SPU program counter data is seen before an SPU context
> +	 * record is seen, the postprocessing will fail.
> +	 */
> +	spu_ctx_sw_seen[spu->number] = 1;
> +	spin_unlock_irqrestore(&add_value_lock, flags);
>  	smp_wmb();	/* ensure spu event buffer updates are written */
>  			/* don't want entries intermingled... */
>  out:

How does the spinlock protect you from racing against other values added
for the same CPU?

I'm not sure if this patch can still fix the original bug that you
are trying to solve, although it will certainly be harder to trigger.

If you are using the get_cpu() number for passing down the samples,
it is probably safe because you then can't add anything else to the
same buffer from a different CPU or from an interrupt, but I don't
understand enough about oprofile to be sure about that.

> -	spin_lock_irqsave(&buffer_lock, flags);
> -	add_event_entry(ESCAPE_CODE);
> -	add_event_entry(SPU_PROFILING_CODE);
> -	add_event_entry(num_spu_nodes);
> -	spin_unlock_irqrestore(&buffer_lock, flags);
> +	/* The SPU_PROFILING_CODE escape sequence must precede
> +	 * the SPU context switch info.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		oprofile_add_value(ESCAPE_CODE, cpu);
> +		oprofile_add_value(SPU_PROFILING_CODE, cpu);
> +		oprofile_add_value((unsigned long int)
> +					    num_spu_nodes, cpu);
> +	}
>  

You are no longer holding a lock while adding the events, which
may be correct as long as no switch_events come in, but it's
still inconsistent. Do you mind just adding the spinlock here
as well?

> @@ -452,31 +464,38 @@ void spu_sync_buffer(int spu_num, unsign
>  			break;
>  		}
>  
> -		add_event_entry(file_offset | spu_num_shifted);
> +		/* We must ensure that the SPU context switch has been written
> +		 * out before samples for the SPU.  Otherwise, the SPU context
> +		 * information is not available and the postprocessing of the
> +		 * SPU PC will fail with no available anonymous map information.
> +		 */
> +		if (spu_ctx_sw_seen[spu_num])
> +			oprofile_add_value((file_offset | spu_num_shifted),
> +						    (spu_num >> 2));

Again, I think this should just be added on the local CPU, as 'spu_num >> 2'
may not be a valid CPU number.

	Arnd <><

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-15 15:39 ` Arnd Bergmann
@ 2008-05-15 19:05   ` Carl Love
  2008-05-16 14:22     ` Arnd Bergmann
  0 siblings, 1 reply; 7+ messages in thread
From: Carl Love @ 2008-05-15 19:05 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linuxppc-dev, oprofile-list, cbe-oss-dev, linux-kernel


On Thu, 2008-05-15 at 17:39 +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2008, Carl Love wrote:
> 
> > Finally, this patch backs out the changes previously added to the 
> > oprofile generic code for handling the architecture specific 
> > ops.sync_start and ops.sync_stop that allowed the architecture
> > to skip the per CPU buffer creation.
> 
> Thanks for your patch, it looks a lot better than your previous version.
> It would be good to have an oprofile person look into the changes
> to the common code though.
> 
> Just a few more comments/questions this time:
> 
> > -static DEFINE_SPINLOCK(buffer_lock);
> > +static DEFINE_SPINLOCK(add_value_lock);
> >  static DEFINE_SPINLOCK(cache_lock);
> >  static int num_spu_nodes;
> >  int spu_prof_num_nodes;
> >  int last_guard_val[MAX_NUMNODES * 8];
> > +static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
> >  
> >  /* Container for caching information about an active SPU task. */
> >  struct cached_info {
> 
> I noticed now that you are indexing arrays by SPU number. This is not
> a good idea, because you make assumptions about the system that may
> not be true. Better pass around 'struct spu' pointers and put those
> values in there if you need them. 

Hmm, well, we have arrays for last_guard_val[], spu_info[] as well that
are indexed by spu number.  So this would seem to be a more general
issue than just spu_ctx_seen[].  Not sure exactly what you are
suggesting with passing around 'struct spu' as a solution.  Are you
suggesting that a field be added to the structure for the spu context
seen flag?  Not sure that is a good idea.  We will need to clarify how
you propose to fix this not only for spu_ctx_sw_seen but the other
arrays as well that are already being indexed by spu number.

> Then again, the last_guard_val looks
> like you should really store that information in the context instead
> of the physical SPU, but that is something else to work on.
> 
> Let's leave this one as it is for now and fix the current bug at hand,
> but please remember to fix the assumptions in the code later.
> 
> > @@ -289,6 +290,7 @@ static int process_context_switch(struct
> >  	int retval;
> >  	unsigned int offset = 0;
> >  	unsigned long spu_cookie = 0, app_dcookie;
> > +	int cpu_buf;
> >  
> >  	retval = prepare_cached_spu_info(spu, objectId);
> >  	if (retval)
> > @@ -303,17 +305,28 @@ static int process_context_switch(struct
> >  		goto out;
> >  	}
> >  
> > -	/* Record context info in event buffer */
> > -	spin_lock_irqsave(&buffer_lock, flags);
> > -	add_event_entry(ESCAPE_CODE);
> > -	add_event_entry(SPU_CTX_SWITCH_CODE);
> > -	add_event_entry(spu->number);
> > -	add_event_entry(spu->pid);
> > -	add_event_entry(spu->tgid);
> > -	add_event_entry(app_dcookie);
> > -	add_event_entry(spu_cookie);
> > -	add_event_entry(offset);
> > -	spin_unlock_irqrestore(&buffer_lock, flags);
> > +	/* Record context info in event buffer.  Note, there are 4x more
> > +	 * SPUs than CPUs.  Map the SPU events/data for a given SPU to
> > +	 * the same CPU buffer.  Need to ensure the context switch data and
> > +	 * samples stay in order.
> > +	 */
> > +	cpu_buf = spu->number >> 2;
> 
> The ratio of CPUs versus SPUs is anything but fixed, and the CPU numbers
> may not start at 0. If you have a hypervisor (think PS3), or you are
> running a non-SMP kernel, this is practically always wrong.
> Please use get_cpu()/put_cpu() to read the current CPU number instead.
> This one needs to be fixed now.
> 
> > +	spin_lock_irqsave(&add_value_lock, flags);
> > +	oprofile_add_value(ESCAPE_CODE, cpu_buf);
> > +	oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
> > +	oprofile_add_value(spu->number, cpu_buf);
> > +	oprofile_add_value(spu->pid, cpu_buf);
> > +	oprofile_add_value(spu->tgid, cpu_buf);
> > +	oprofile_add_value(app_dcookie, cpu_buf);
> > +	oprofile_add_value(spu_cookie, cpu_buf);
> > +	oprofile_add_value(offset, cpu_buf);
> > +
> > +	/* Set flag to indicate SPU PC data can now be written out.  If
> > +	 * the SPU program counter data is seen before an SPU context
> > +	 * record is seen, the postprocessing will fail.
> > +	 */
> > +	spu_ctx_sw_seen[spu->number] = 1;
> > +	spin_unlock_irqrestore(&add_value_lock, flags);
> >  	smp_wmb();	/* ensure spu event buffer updates are written */
> >  			/* don't want entries intermingled... */
> >  out:
> 
> How does the spinlock protect you from racing against other values added
> for the same CPU?

You have lost me with this statement.  There are three sequences of
entries that need to be made, the first is the SPU_PROFILING_CODE, the
second is the SPU_CTX_SWITCH_CODE and the third is an SPU program
counter value.  The SPU_PROFILING_CODE sequence occurs once at the very
beginning when OProfile starts.  It occurs during oprofile setup before
we have registered for the context switch notification and the timer has
been setup to call the routine to read/store the SPU samples.  There is
no race between SPU_PROFILING_CODE and the other two sequences.  The
spin lock is held in the code above before adding the context switch
information to the CPU buffers.  It is held in the routine
spu_sync_buffer() when it is storing the SPU program counter values for
each SPU to its corresponding CPU buffer.  The lock is global in that
you must hold it before adding a context switch sequence to any CPU
buffer.  Similarly you must be holding it before you add an SPU's
program counter value to any of the CPU buffers. So, you can either add
a context switch sequence to a given CPU buffer or you can add an SPU
program counter to the CPU buffer not both at the same time.  The lock
is held until the entire SPU context switch entry is added to the CPU
queue to make sure entries are not intermingled.
> 
> I'm not sure if this patch can still fix the original bug that you
> are trying to solve, although it will certainly be harder to trigger.
> 

> If you are using the get_cpu() number for passing down the samples,
> it is probably safe because you then can't add anything else to the
> same buffer from a different CPU or from an interrupt, but I don't
> understand enough about oprofile to be sure about that.

Thinking about this more and re-examining the code we may have an issue.
In the oprofile code, the routines that add data to a cpu buffer are
such that only the CPU can add a value to its CPU buffer.  Again only
the CPU can move the contents of its buffer to the kernel buffer while
holding the mutex lock.  The mutex lock ensures that the CPU can add
everything in its buffer without conflicts with other CPUs.  Since we
are in an interrupt context, we can be assured that the CPU will execute
the entire sequence of oprofile_add_value() calls in the code above
before it could go off and try to sync the CPU buffer to the kernel
buffer. This ensures the sequences stay intact.  However, the code does
violate the principle that a CPU only manipulates its buffer.  

When the trace buffer is read, each 128 bit entry contains the 16 bit
SPU PC for each of the eight SPUs on the node.  Secondly, we really want
to keep the timing of the context switches and storing the SPU PC values
as close as possible.  Clearly there is some noise as the switch will
occur while the trace buffer is filling.  Storing all of the SPU PC
values into the current CPU buffer causes two issues.  One is that one
of the buffers will be much fuller than the rest, increasing the
probability of overflowing the CPU buffer and dropping data.  As it is,
in user land, I have to increase the size of the CPU buffers to
accommodate four SPUs worth of data in a given CPU buffer.  Secondly,
the SPU context switch info for a given SPU would be going to a
different CPU buffer than the data.  Depending on the timing of the CPU
buffer syncs to the kernel buffer you might get the trace buffer emptied
multiple times before an SPU context switch event was sync'd to the
kernel buffer. Thus causing undesired additional skew in the data.

The attempt in this patch was to leverage the code for the per cpu
buffers so we would not have to grab the buffer_mutex lock from the
module.  The approach was to follow the way the CPU values are stored
as much as possible.  It doesn't look like we can make the per cpu
approach work because we have to process all of the SPU data at the same
time.  Also, there is no guarantee which CPU will process the SPU
context switch notification.   Perhaps, the oprofile_add_value needs to
put the values directly into the kernel buffer skipping the CPU buffers
altogether.

Any thoughts about moving oprofile_add_value() to buffer_sync.c and
having it grab the buffer_mutex lock while it inserts values directly
into the kernel buffer?  At the moment it looks the easiest.  

I can see a few other solutions but they involve creating per SPU arrays
for initially storing the switch info and SPU data and then having each
CPU flush a subset of the SPU data to its CPU buffer.  A lot more
storage and overhead than we want.


> 
> > -	spin_lock_irqsave(&buffer_lock, flags);
> > -	add_event_entry(ESCAPE_CODE);
> > -	add_event_entry(SPU_PROFILING_CODE);
> > -	add_event_entry(num_spu_nodes);
> > -	spin_unlock_irqrestore(&buffer_lock, flags);
> > +	/* The SPU_PROFILING_CODE escape sequence must precede
> > +	 * the SPU context switch info.
> > +	 */
> > +	for_each_online_cpu(cpu) {
> > +		oprofile_add_value(ESCAPE_CODE, cpu);
> > +		oprofile_add_value(SPU_PROFILING_CODE, cpu);
> > +		oprofile_add_value((unsigned long int)
> > +					    num_spu_nodes, cpu);
> > +	}
> >  
> 
> You are no longer holding a lock while adding the events, which
> may be correct as long as no switch_events come in, but it's
> still inconsistent. Do you mind just adding the spinlock here
> as well?
> 

The claim is that the lock is not needed because we have not yet
registered for notification of the context switches as mentioned above.
Hence there is no race to worry about.  Registration for the switch
event notification happens right after this loop.  

> > @@ -452,31 +464,38 @@ void spu_sync_buffer(int spu_num, unsign
> >  			break;
> >  		}
> >  
> > -		add_event_entry(file_offset | spu_num_shifted);
> > +		/* We must ensure that the SPU context switch has been written
> > +		 * out before samples for the SPU.  Otherwise, the SPU context
> > +		 * information is not available and the postprocessing of the
> > +		 * SPU PC will fail with no available anonymous map information.
> > +		 */
> > +		if (spu_ctx_sw_seen[spu_num])
> > +			oprofile_add_value((file_offset | spu_num_shifted),
> > +						    (spu_num >> 2));
> 
> Again, I think this should just be added on the local CPU, as 'spu_num >> 2'
> may not be a valid CPU number.
> 
> 	Arnd <><

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-15 19:05   ` Carl Love
@ 2008-05-16 14:22     ` Arnd Bergmann
  2008-05-16 16:24       ` Carl Love
  0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2008-05-16 14:22 UTC (permalink / raw)
  To: Carl Love; +Cc: linuxppc-dev, oprofile-list, cbe-oss-dev, linux-kernel

On Thursday 15 May 2008, Carl Love wrote:
> On Thu, 2008-05-15 at 17:39 +0200, Arnd Bergmann wrote:
> > 
> > I noticed now that you are indexing arrays by SPU number. This is not
> > a good idea, because you make assumptions about the system that may
> > not be true. Better pass around 'struct spu' pointers and put those
> > values in there if you need them. 
> 
> Hmm, well, we have arrays for last_guard_val[], spu_info[] as well that
> are indexed by spu number.  So this would seem to be a more general
> issue than just spu_ctx_seen[].  Not sure exactly what you are
> suggesting with passing around 'struct spu' as a solution.  Are you
> suggesting that a field be added to the structure for the spu context
> seen flag?  Not sure that is a good idea.  We will need to clarify how
> you propose to fix this not only for spu_ctx_sw_seen but the other
> arrays as well that are already being indexed by spu number.

I'd suggest you add fields to struct spu, either directly to it, if
it's short, or a pointer to your own struct.

As I said, let's not do that now.

>                                                 So, you can either add
> a context switch sequence to a given CPU buffer or you can add an SPU
> program counter to the CPU buffer not both at the same time.  The lock
> is held until the entire SPU context switch entry is added to the CPU
> queue to make sure entries are not intermingled.

My point was that oprofile collecting samples for the CPU could possibly
add entries to the same queue, which would race with this.

> When the trace buffer is read, each 128 bit entry contains the 16 bit
> SPU PC for each of the eight SPUs on the node.  Secondly, we really want
> to keep the timing of the context switches and storing the SPU PC values
> as close as possible.  Clearly there is some noise as the switch will
> occur while the trace buffer is filling.

can't you just empty the trace buffer every time you encounter a context
switch, even in the absence of a timer interrupt? That would prevent
all of the noise.

> Storing all of the SPU PC 
> values, into the current CPU buffer causes two issues.  One is that one
> of the buffers will be much fuller than the rest, increasing the
> probability of overflowing the CPU buffer and dropping data.  As it is,
> in user land, I have to increase the size of the CPU buffers to
> accommodate four SPUs worth of data in a given CPU buffer.  Secondly,
> the SPU context switch info for a given SPU would be going to a
> different CPU buffer than the data. 

In practice, the calling CPUs should be well spread around by the way
that spufs works, so I don't think it's a big problem.
Did you observe this problem, or just expect it to happen?

> Depending on the timing of the CPU 
> buffer syncs to the kernel buffer you might get the trace buffer emptied
> multiple times before an SPU context switch event was sync'd to the
> kernel buffer. Thus causing undesired additional skew in the data.

good point. would this also get solved if you flush the trace buffer
on a context switch? I suppose you need to make sure that the samples
still end up ordered correctly throughout the CPU buffers. Is that
possible?

> Any thoughts about moving oprofile_add_value() to buffer_sync.c and
> having it grab the buffer_mutex lock while it inserts values directly
> into the kernel buffer?  At the moment it looks the easiest.  

That would mean that again you can't call it from interrupt context,
which is the problem you were trying to solve with the work queue
in the previous version of your patch.
 
> I can see a few other solutions but they involve creating per SPU arrays
> for initially storing the switch info and SPU data and then having each
> CPU flush a subset of the SPU data to its CPU buffer.  A lot more
> storage and overhead than we want.

It might work if you do a per SPU buffer and generalize the way
that per-CPU buffers are flushed to the kernel buffer so it works
with either one.

> > > +	/* The SPU_PROFILING_CODE escape sequence must precede
> > > +	 * the SPU context switch info.
> > > +	 */
> > > +	for_each_online_cpu(cpu) {
> > > +		oprofile_add_value(ESCAPE_CODE, cpu);
> > > +		oprofile_add_value(SPU_PROFILING_CODE, cpu);
> > > +		oprofile_add_value((unsigned long int)
> > > +					    num_spu_nodes, cpu);
> > > +	}
> > >  
> > 
> > You are no longer holding a lock while adding the events, which
> > may be correct as long as no switch_events come in, but it's
> > still inconsistent. Do you mind just adding the spinlock here
> > as well?
> > 
> 
> The claim is that the lock is not needed because we have not yet
> registered for notification of the context switches as mentioned above.
> Hence there is no race to worry about.  Registration for the switch
> event notification happens right after this loop.  

Right, that's what I understood as well. My point was that it's more
consistent if you always call the function with the same locks held,
in case somebody changes or tries to understand your code.
 	
	Arnd <><

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix
  2008-05-16 14:22     ` Arnd Bergmann
@ 2008-05-16 16:24       ` Carl Love
  0 siblings, 0 replies; 7+ messages in thread
From: Carl Love @ 2008-05-16 16:24 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linuxppc-dev, oprofile-list, cbe-oss-dev, linux-kernel


On Fri, 2008-05-16 at 16:22 +0200, Arnd Bergmann wrote:
> On Thursday 15 May 2008, Carl Love wrote:
> > On Thu, 2008-05-15 at 17:39 +0200, Arnd Bergmann wrote:
> > > 
> > > I noticed now that you are indexing arrays by SPU number. This is not
> > > a good idea, because you make assumptions about the system that may
> > > not be true. Better pass around 'struct spu' pointers and put those
> > > values in there if you need them. 
> > 
> > Hmm, well, we have arrays for last_guard_val[], spu_info[] as well that
> > are indexed by spu number.  So this would seem to be a more general
> > issue than just spu_ctx_seen[].  Not sure exactly what you are
> > suggesting with passing around 'struct spu' as a solution.  Are you
> > suggesting that a field be added to the structure for the spu context
> > seen flag?  Not sure that is a good idea.  We will need to clarify how
> > you propose to fix this not only for spu_ctx_sw_seen but the other
> > arrays as well that are already being indexed by spu number.
> 
> I'd suggest you add fields to struct spu, either directly to it, if
> it's short, or a pointer to your own struct.
> 
> As I said, let's not do that now.
> 
> >                                                 So, you can either add
> > a context switch sequence to a given CPU buffer or you can add an SPU
> > program counter to the CPU buffer not both at the same time.  The lock
> > is held until the entire SPU context switch entry is added to the CPU
> > queue to make sure entries are not intermingled.
> 
> My point was that oprofile collecting samples for the CPU could possibly
> add entries to the same queue, which would race with this.

Ah, I see what you are getting at.  That we would be collecting
profiling data on a CPU (i.e. a PPU thread) and SPU cycle profiling
data at the same time.  No, due to hardware constraints on SPU cycle
profiling, the hardware cannot be configured to do both PPU profiling
and SPU cycle profiling at the same time.

I am working on SPU event profiling. The hardware will only support
event profiling on one SPU per node at a time. I will have to think
about it more, but my initial thought is that supporting PPU profiling and SPU
event profiling will again not work due to hardware constraints.  

> 
> > When the trace buffer is read, each 128 bit entry contains the 16 bit
> > SPU PC for each of the eight SPUs on the node.  Secondly, we really want
> > to keep the timing of the context switches and storing the SPU PC values
> > as close as possible.  Clearly there is some noise as the switch will
> > occur while the trace buffer is filling.
> 
> can't you just empty the trace buffer every time you encounter a context
> switch, even in the absence of a timer interrupt? That would prevent
> all of the noise.

That is a good thought.  It should help to keep the timing more accurate
and prevent the issue mentioned below about processing multiple trace
buffers of data before processing the context switch info.
> 
> > Storing all of the SPU PC 
> > values, into the current CPU buffer causes two issues.  One is that one
> > of the buffers will be much fuller than the rest, increasing the
> > probability of overflowing the CPU buffer and dropping data.  As it is,
> > in user land, I have to increase the size of the CPU buffers to
> > accommodate four SPUs worth of data in a given CPU buffer.  Secondly,
> > the SPU context switch info for a given SPU would be going to a
> > different CPU buffer than the data. 
> 
> In practice, the calling CPUs should be well spread around by the way
> that spufs works, so I don't think it's a big problem.
> Did you observe this problem, or just expect it to happen?

I did see the problem initially.  The first version of
oprofile_add_value() did not take the CPU buffer argument.  Rather
internally the oprofile_add_value() stored the data to the current CPU
buffer by calling smp_processor_id().  When processing the trace buffer,
the function would extract the PC value for each of the 8 SPUs on the
node, then store them in the current CPU buffer.  It would do this for
all of the samples in the trace buffer, typically about 250 (Maximum is
1024).  So you would put 8 * 250 samples all into the same CPU buffer
and nothing in the other CPU buffers for that node.  In order to ensure
I don't drop any samples, I had to increase the size of each CPU buffer
more than when I distribute them to the two CPU buffers in each node.  I
changed the oprofile_add_value() to take a CPU buffer to better utilize
the CPU buffers and to avoid the ordering issue with the SPU context
switch data.  I didn't do a detailed study to see exactly how much more
the CPU buffer size needed to be increased as the SPU context switch
ordering was the primary issue I was trying to fix.  The default CPU
buffer size must be increased because four processors (SPUS) worth of
data is being stored in each CPU buffer and each entry takes two
locations to store (the special escape code, then the actual data).  
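The even distribution described above can be sketched as a small
user-space simulation in C (buffer sizes, names, and the escape-code
value are illustrative stand-ins, not the actual oprofile structures):

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SPUS_PER_NODE 8
#define CPUS_PER_NODE     2
#define MAX_ENTRIES       64

struct cpu_buffer {
	unsigned long data[MAX_ENTRIES];
	size_t head;			/* next free slot */
};

/* Store one value (escape code or sample) into the chosen CPU buffer. */
static void add_value(struct cpu_buffer *buf, unsigned long value)
{
	if (buf->head < MAX_ENTRIES)
		buf->data[buf->head++] = value;
	/* else: the sample is dropped, as with a full per-CPU buffer */
}

/*
 * Spread one trace-buffer sample (a PC value per SPU) across the
 * node's CPU buffers instead of pushing everything through the buffer
 * of whichever CPU smp_processor_id() happens to return.  Each entry
 * costs two slots: the escape code, then the actual data.
 */
static void distribute_sample(struct cpu_buffer bufs[CPUS_PER_NODE],
			      const unsigned long pcs[NUM_SPUS_PER_NODE])
{
	for (int spu = 0; spu < NUM_SPUS_PER_NODE; spu++) {
		struct cpu_buffer *buf = &bufs[spu % CPUS_PER_NODE];

		add_value(buf, ~0UL);	/* stand-in for the escape code */
		add_value(buf, pcs[spu]);
	}
}
```

With this placement each of the node's two CPU buffers receives four
SPUs' worth of entries per sample, instead of one buffer absorbing all
eight.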
> 
> > Depending on the timing of the CPU 
> > buffer syncs to the kernel buffer you might get the trace buffer emptied
> > multiple times before an SPU context switch event was sync'd to the
> > kernel buffer. Thus causing undesired additional skew in the data.
> 
> good point. would this also get solved if you flush the trace buffer
> on a context switch? I suppose you need to make sure that the samples
> still end up ordered correctly throughout the CPU buffers. Is that
> possible?

Yes, this would be avoided by processing the trace buffer on a context
switch.  We have the added benefit that it should help minimize the skew
between the data collection and context switch information.
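The flush-on-switch ordering can be illustrated with a minimal
simulation (event codes and names here are made up for the sketch; the
real code would drain the hardware trace buffer, not a counter):

```c
#include <assert.h>
#include <stddef.h>

#define STREAM_MAX 64

enum { EV_SAMPLE = 1, EV_CTX_SWITCH = 2 };

static int stream[STREAM_MAX];		/* stands in for the CPU buffer */
static size_t stream_len;
static size_t pending_samples;		/* unread hw trace-buffer entries */

/* Drain everything currently sitting in the trace buffer. */
static void flush_trace_buffer(void)
{
	while (pending_samples > 0 && stream_len < STREAM_MAX) {
		stream[stream_len++] = EV_SAMPLE;
		pending_samples--;
	}
}

/*
 * Flush before logging the switch record, so samples taken under the
 * old context can never appear after its switch-out event.
 */
static void on_context_switch(void)
{
	flush_trace_buffer();
	if (stream_len < STREAM_MAX)
		stream[stream_len++] = EV_CTX_SWITCH;
}
```

Because the flush happens first, every pending sample lands before the
switch marker, which is exactly the ordering property being discussed.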
> 
> > Any thoughts about moving oprofile_add_value() to buffer_sync.c and
> > having it grab the buffer_mutex lock while it inserts values directly
> > into the kernel buffer?  At the moment it looks the easiest.  
> 
> That would mean that again you can't call it from interrupt context,
> which is the problem you were trying to solve with the work queue
> in the previous version of your patch.

Yes, forgot about that little detail, again.  Argh!  
> 
> > I can see a few other solutions but they involve creating per SPU arrays
> > for initially storing the switch info and SPU data and then having each
> > CPU flush a subset of the SPU data to its CPU buffer.  A lot more
> > storage and overhead than we want.
> 
> It might work if you do a per SPU buffer and generalize the way
> that per-CPU buffers are flushed to the kernel buffer so it works
> with either one.

If we flush on SPU context switches, we are just left with how best to
manage storing the data.  I see two choices: allocate more memory so we
can store data into a per-SPU buffer and then figure out how to flush
the data to the kernel buffer, or just increase the per-CPU buffer size
so we can store all of the node's SPU data into a single CPU buffer.
Either way we use more memory.  In the first approach, I would need to
allocate SPU arrays large enough to store data for the worst case,
where the trace buffer is full.  This would be 8*1024 entries.
Typically only about 1/3 would be needed, as the hrtimer is set up to
go off at about 1/3 full.  This gives some margin in case it takes time
to get the timer function call scheduled.  In the second case, doubling
the size of the per-CPU buffers to handle the typical case of the trace
buffer being 1/3 full would correspond to allocating an additional
2*1024/3 = 682 entries.  This would be less memory than the first
approach and be simpler to implement.
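The sizing arithmetic above, written out (numbers are the ones quoted
in this thread; these are entry counts, not bytes):

```c
#include <assert.h>

#define TRACE_BUF_MAX  1024	/* max samples in the hw trace buffer */
#define SPUS_PER_NODE  8
#define CPUS_PER_NODE  2

/* Approach 1: per-SPU arrays sized for a completely full trace buffer. */
static long per_spu_worst_case(void)
{
	return SPUS_PER_NODE * TRACE_BUF_MAX;	/* entries per node */
}

/* Approach 2: grow each per-CPU buffer for the typical 1/3-full case. */
static long extra_cpu_entries(void)
{
	return 2 * TRACE_BUF_MAX / 3;		/* extra entries per buffer */
}
```

The per-node cost of approach 2 (CPUS_PER_NODE * 682 = 1364 entries) is
well under the 8192 entries of approach 1, which is the comparison the
paragraph above is making.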

Given the code, it would be easy to measure the minimum number of
buffer entries needed with the current patch, which spreads the entries
evenly across all of the buffers.  Then it is just a one-line hack to
put the data all into the current CPU buffer and re-tune to find the
minimum CPU buffer size.  I will do the experiments, as it might be
helpful to see what the memory cost would be.

> 
> > > > +	/* The SPU_PROFILING_CODE escape sequence must proceed
> > > > +	 * the SPU context switch info.
> > > > +	 */
> > > > +	for_each_online_cpu(cpu) {
> > > > +		oprofile_add_value(ESCAPE_CODE, cpu);
> > > > +		oprofile_add_value(SPU_PROFILING_CODE, cpu);
> > > > +		oprofile_add_value((unsigned long int)
> > > > +					    num_spu_nodes, cpu);
> > > > +	}
> > > >  
> > > 
> > > You are no longer holding a lock while adding the events, which
> > > may be correct as long as no switch_events come in, but it's
> > > still inconsistent. Do you mind just adding the spinlock here
> > > as well?
> > > 
> > 
> > The claim is that the lock is not needed because we have not yet
> > registered for notification of the context switches as mentioned above.
> > Hence there is no race to worry about.  Registration for the switch
> > event notification happens right after this loop.  
> 
> Right, that's what I understood as well. My point was that it's more
> consistent if you always call the function with the same locks held,
> in case somebody changes or tries to understand your code.

Yup, not a big deal.  I had thought about doing it for completeness'
sake but then opted not to, figuring I would get dinged for
unnecessarily grabbing a lock and doing extra work.  I will put it in.
That is what I get for trying to second-guess the reviewers.   :-)
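The pattern being agreed on here, sketched with a pthread mutex
standing in for the kernel spinlock (the lock name, escape values, and
CPU count are illustrative, not the actual kernel symbols):

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CPUS 4

static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long values_added;

/* Stand-in for oprofile_add_value(value, cpu) storing into a CPU buffer. */
static void add_value_sim(unsigned long value, int cpu)
{
	(void)value;
	(void)cpu;
	values_added++;
}

/*
 * Even though no switch notification can race with this loop (the
 * notifier is registered only afterwards), take the same lock the
 * switch path takes, so the add function is always called under
 * consistent locking.
 */
static void write_spu_profiling_header(unsigned long num_spu_nodes)
{
	pthread_mutex_lock(&buffer_lock);
	for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
		add_value_sim(0x1UL, cpu);	/* ESCAPE_CODE stand-in */
		add_value_sim(0x2UL, cpu);	/* SPU_PROFILING_CODE stand-in */
		add_value_sim(num_spu_nodes, cpu);
	}
	pthread_mutex_unlock(&buffer_lock);
}
```

The lock is technically redundant at this point in the startup path,
but, as noted above, holding it anyway keeps the locking rules uniform
for anyone reading or changing the code later.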

>  	
> 	Arnd <><


end of thread, other threads:[~2008-05-16 16:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-01 14:43 [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix Carl Love
2008-05-01 16:29 ` Maynard Johnson
2008-05-02 19:52 ` Maynard Johnson
2008-05-15 15:39 ` Arnd Bergmann
2008-05-15 19:05   ` Carl Love
2008-05-16 14:22     ` Arnd Bergmann
2008-05-16 16:24       ` Carl Love
