All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Carl Love <cel@us.ibm.com>
Cc: linuxppc-dev@ozlabs.org, oprofile-list@lists.sourceforge.net,
	cbe-oss-dev@ozlabs.org, Arnd Bergmann <arnd@arndb.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling	updated	patch
Date: Thu, 15 Feb 2007 13:50:48 -0800	[thread overview]
Message-ID: <20070215215047.GE1913@linux.vnet.ibm.com> (raw)
In-Reply-To: <1171570918.31179.36.camel@dyn9047021078.beaverton.ibm.com>

On Thu, Feb 15, 2007 at 12:21:58PM -0800, Carl Love wrote:
> On Thu, 2007-02-15 at 15:37 +0100, Arnd Bergmann wrote:

[ . . . ]

> > I agree with Milton that it would be far nicer even to calculate
> > the value from user space, but since you say that would
> > violate the oprofile interface conventions, let's not go there.
> > In order to make this code nicer on the user, you should probably
> > insert a 'cond_resched()' somewhere in the loop, maybe every
> > 500 iterations or so.
> > 
> > it also looks like there is whitespace damage in the code here.
> 
> I will double check on the whitespace damage.  I thought I had gotten
> all that out.  
> 
> I have done some quick measurements.  The above method limits the loop
> to at most 2^16 iterations.  Based on running the algorithm in user
> space, it takes about 3ms of computation time to do the loop 2^16 times.
> 
> At the vary least, we need to put the resched in say every 10,000
> iterations which would be about every 0.5ms.  Should we do a resched
> more often?  
> 
> Additionally we could up the size of the table to 512 which would reduce
> the maximum time to about 1.5ms.  What do people think about increasing
> the table size?

Is this 1.5ms with interrupts disabled?  This time period is problematic
from a realtime perspective if so -- need to be able to preempt.

						Thanx, Paul

> A little more general discussion about the logarithmic algorithm and
> limiting the range.  The hardware supports a 24 bit LFSR value. This
> means the user can say is capture a sample every N cycles, where N is in
> the range of 1 to 2^24.  The OProfile user tool enforces a minimum value
> of N to make sure the overhead of OProfile doesn't bring the machine to
> its knees.  The minimum values is not intended to guarantee the
> performance impact of OProfile will not be significant.  It is left as
> an exercise for the user to pick an N that will give minimal performance
> impact.  We set the lower limit for N for SPU profiling to 100,000. This
> is actually high enough that we don't seem to see much performance
> impact when running OProfile.  If the user picked N=2^24 then for a
> 3.2GHz machine you would get about 200 samples per second on each node.
> Where a sample consists of the PC value for all 8 SPUs on the node.  If
> the user wanted to do a relatively long OProfile run, I can see where
> they might use N=2^24 to avoid gathering too much data.  My gut feeling
> is that the sampling frequency for N=2^24 is not low enough that someone
> would never want to use it when doing long runs.  Hence, we should not
> arbitrarily reduce the maximum value for N.  Although I would expect
> that the typical value for N will be in the range of several hundred
> thousand to a few million.
> 
> As for using a logarithmic spacing of the precomputed values, this
> approach means that the space between the precomputed values at the high
> end would be much larger then 2^14, assuming 256 precomputed values.
> That means it could take much longer then 3ms to get the needed LFSR
> value for a large N.  By evenly spacing the precomputed values, we can
> ensure that for all N it will take less then 3ms to get the value.
> Personally, I am more comfortable with a hard limit on the compute time
> then a variable time that could get much bigger then the 1ms threshold
> that Arnd wants for resched.  Any thoughts?
> 
> > 
> > > +
> > > +/* This interface allows a profiler (e.g., OProfile) to store
> > > + * spu_context information needed for profiling, allowing it to
> > > + * be saved across context save/restore operation.
> > > + *
> > > + * Assumes the caller has already incremented the ref count to
> > > + * profile_info; then spu_context_destroy must call kref_put
> > > + * on prof_info_kref.
> > > + */
> > > +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
> > > +			     struct kref * prof_info_kref,
> > > +			     void (* prof_info_release) (struct kref * kref))
> > > +{
> > > +	ctx->profile_private = profile_info;
> > > +	ctx->prof_priv_kref = prof_info_kref;
> > > +	ctx->prof_priv_release = prof_info_release;
> > > +}
> > > +EXPORT_SYMBOL_GPL(spu_set_profile_private);
> > 
> > I think you don't need the profile_private member here, if you just use
> > container_of with ctx->prof_priv_kref in all users.
> > 
> > 	Arnd <><
> 
> _______________________________________________
> cbe-oss-dev mailing list
> cbe-oss-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/cbe-oss-dev

WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Carl Love <cel@us.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
	linuxppc-dev@ozlabs.org, cbe-oss-dev@ozlabs.org,
	oprofile-list@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling	updated	patch
Date: Thu, 15 Feb 2007 13:50:48 -0800	[thread overview]
Message-ID: <20070215215047.GE1913@linux.vnet.ibm.com> (raw)
In-Reply-To: <1171570918.31179.36.camel@dyn9047021078.beaverton.ibm.com>

On Thu, Feb 15, 2007 at 12:21:58PM -0800, Carl Love wrote:
> On Thu, 2007-02-15 at 15:37 +0100, Arnd Bergmann wrote:

[ . . . ]

> > I agree with Milton that it would be far nicer even to calculate
> > the value from user space, but since you say that would
> > violate the oprofile interface conventions, let's not go there.
> > In order to make this code nicer on the user, you should probably
> > insert a 'cond_resched()' somewhere in the loop, maybe every
> > 500 iterations or so.
> > 
> > it also looks like there is whitespace damage in the code here.
> 
> I will double check on the whitespace damage.  I thought I had gotten
> all that out.  
> 
> I have done some quick measurements.  The above method limits the loop
> to at most 2^16 iterations.  Based on running the algorithm in user
> space, it takes about 3ms of computation time to do the loop 2^16 times.
> 
> At the vary least, we need to put the resched in say every 10,000
> iterations which would be about every 0.5ms.  Should we do a resched
> more often?  
> 
> Additionally we could up the size of the table to 512 which would reduce
> the maximum time to about 1.5ms.  What do people think about increasing
> the table size?

Is this 1.5ms with interrupts disabled?  This time period is problematic
from a realtime perspective if so -- need to be able to preempt.

						Thanx, Paul

> A little more general discussion about the logarithmic algorithm and
> limiting the range.  The hardware supports a 24 bit LFSR value. This
> means the user can say is capture a sample every N cycles, where N is in
> the range of 1 to 2^24.  The OProfile user tool enforces a minimum value
> of N to make sure the overhead of OProfile doesn't bring the machine to
> its knees.  The minimum values is not intended to guarantee the
> performance impact of OProfile will not be significant.  It is left as
> an exercise for the user to pick an N that will give minimal performance
> impact.  We set the lower limit for N for SPU profiling to 100,000. This
> is actually high enough that we don't seem to see much performance
> impact when running OProfile.  If the user picked N=2^24 then for a
> 3.2GHz machine you would get about 200 samples per second on each node.
> Where a sample consists of the PC value for all 8 SPUs on the node.  If
> the user wanted to do a relatively long OProfile run, I can see where
> they might use N=2^24 to avoid gathering too much data.  My gut feeling
> is that the sampling frequency for N=2^24 is not low enough that someone
> would never want to use it when doing long runs.  Hence, we should not
> arbitrarily reduce the maximum value for N.  Although I would expect
> that the typical value for N will be in the range of several hundred
> thousand to a few million.
> 
> As for using a logarithmic spacing of the precomputed values, this
> approach means that the space between the precomputed values at the high
> end would be much larger then 2^14, assuming 256 precomputed values.
> That means it could take much longer then 3ms to get the needed LFSR
> value for a large N.  By evenly spacing the precomputed values, we can
> ensure that for all N it will take less then 3ms to get the value.
> Personally, I am more comfortable with a hard limit on the compute time
> then a variable time that could get much bigger then the 1ms threshold
> that Arnd wants for resched.  Any thoughts?
> 
> > 
> > > +
> > > +/* This interface allows a profiler (e.g., OProfile) to store
> > > + * spu_context information needed for profiling, allowing it to
> > > + * be saved across context save/restore operation.
> > > + *
> > > + * Assumes the caller has already incremented the ref count to
> > > + * profile_info; then spu_context_destroy must call kref_put
> > > + * on prof_info_kref.
> > > + */
> > > +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
> > > +			     struct kref * prof_info_kref,
> > > +			     void (* prof_info_release) (struct kref * kref))
> > > +{
> > > +	ctx->profile_private = profile_info;
> > > +	ctx->prof_priv_kref = prof_info_kref;
> > > +	ctx->prof_priv_release = prof_info_release;
> > > +}
> > > +EXPORT_SYMBOL_GPL(spu_set_profile_private);
> > 
> > I think you don't need the profile_private member here, if you just use
> > container_of with ctx->prof_priv_kref in all users.
> > 
> > 	Arnd <><
> 
> _______________________________________________
> cbe-oss-dev mailing list
> cbe-oss-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/cbe-oss-dev

  parent reply	other threads:[~2007-02-15 21:50 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-14 23:52 [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch Carl Love
2007-02-15 14:37 ` Arnd Bergmann
2007-02-15 14:37   ` Arnd Bergmann
2007-02-15 16:15   ` Maynard Johnson
2007-02-15 16:15     ` Maynard Johnson
2007-02-15 18:13     ` Arnd Bergmann
2007-02-15 18:13       ` Arnd Bergmann
2007-02-15 20:21   ` Carl Love
2007-02-15 20:21     ` Carl Love
2007-02-15 21:03     ` Arnd Bergmann
2007-02-15 21:03       ` Arnd Bergmann
2007-02-15 21:50     ` Paul E. McKenney [this message]
2007-02-15 21:50       ` Paul E. McKenney
2007-02-16  0:33       ` Arnd Bergmann
2007-02-16  0:33         ` Arnd Bergmann
2007-02-16  0:32   ` Maynard Johnson
2007-02-16  0:32     ` Maynard Johnson
2007-02-16 17:14     ` Arnd Bergmann
2007-02-16 17:14       ` Arnd Bergmann
2007-02-16 21:43       ` Maynard Johnson
2007-02-16 21:43         ` Maynard Johnson
2007-02-18 23:18         ` Maynard Johnson
2007-02-18 23:18           ` Maynard Johnson
  -- strict thread matches above, loose matches on Subject: below --
2007-02-22  0:02 Carl Love
2007-02-26 23:50 ` Arnd Bergmann
2007-02-26 23:50   ` Arnd Bergmann
2007-02-27  1:31   ` Michael Ellerman
2007-02-27  1:31     ` Michael Ellerman
2007-02-27 16:52   ` Maynard Johnson
2007-02-27 16:52     ` Maynard Johnson
2007-02-28  1:44     ` Arnd Bergmann
2007-02-28  1:44       ` Arnd Bergmann
2007-02-06  0:28 [RFC,PATCH] CELL PPU " Carl Love
2007-02-06 23:02 ` [Cbe-oss-dev] [RFC, PATCH] CELL " Carl Love
2007-02-06 23:02   ` Carl Love
2007-02-07 15:41   ` Maynard Johnson
2007-02-07 15:41     ` Maynard Johnson
2007-02-07 22:48     ` Michael Ellerman
2007-02-07 22:48       ` Michael Ellerman
2007-02-08 15:03       ` Maynard Johnson
2007-02-08 15:03         ` Maynard Johnson
2007-02-08 14:18   ` Milton Miller
2007-02-08 14:18     ` Milton Miller
2007-02-08 17:21     ` Arnd Bergmann
2007-02-08 17:21       ` Arnd Bergmann
2007-02-08 18:01       ` Adrian Reber
2007-02-08 18:01         ` Adrian Reber
2007-02-08 22:51       ` Carl Love
2007-02-08 22:51         ` Carl Love
2007-02-09  2:46         ` Milton Miller
2007-02-09  2:46           ` Milton Miller
2007-02-09 16:17           ` Carl Love
2007-02-09 16:17             ` Carl Love
2007-02-11 22:46             ` Milton Miller
2007-02-11 22:46               ` Milton Miller
2007-02-12 16:38               ` Carl Love
2007-02-12 16:38                 ` Carl Love
2007-02-09 18:47       ` Milton Miller
2007-02-09 18:47         ` Milton Miller
2007-02-09 19:10         ` Arnd Bergmann
2007-02-09 19:10           ` Arnd Bergmann
2007-02-09 19:46           ` Milton Miller
2007-02-09 19:46             ` Milton Miller
2007-02-08 23:59     ` Maynard Johnson
2007-02-08 23:59       ` Maynard Johnson
2007-02-09 18:03       ` Milton Miller
2007-02-09 18:03         ` Milton Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070215215047.GE1913@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=arnd@arndb.de \
    --cc=cbe-oss-dev@ozlabs.org \
    --cc=cel@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=oprofile-list@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.