* Re: [RFC&PATCH] Alternative RCU implementation
[not found] <m3brgwgi30.fsf@new.localdomain>
@ 2004-08-30 0:43 ` Paul E. McKenney
2004-08-30 17:13 ` Jim Houston
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2004-08-30 0:43 UTC (permalink / raw)
To: Jim Houston
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Fri, Aug 27, 2004 at 09:35:47PM -0400, Jim Houston wrote:
>
> The attached patch against linux-2.6.8.1-mm4 is an experimental
> implementation of RCU.
>
> It uses active synchronization between rcu_read_lock(),
> rcu_read_unlock(), and the code which starts a new RCU batch. An RCU
> batch can be started at an arbitrary point, and it will complete without
> waiting for a timer-driven poll. This should help avoid large batches
> and their adverse cache and latency effects.
>
> I did this work because Concurrent encourages its customers to
> isolate critical realtime processes to their own cpu and shield
> them from other processes and interrupts. This includes disabling
> the local timer interrupt. The current RCU code relies on the local
> timer to recognize quiescent states. If it is disabled on any cpu,
> RCU callbacks are never called and the system bleeds memory and hangs
> on calls to synchronize_kernel().
Are these critical realtime processes user-mode only, or do they
also execute kernel code? If they are user-mode only, a much more
straightforward approach might be to have RCU pretend that they do
not exist.
This approach would have the added benefit of keeping rcu_read_unlock()
atomic-instruction free. In some cases, the overhead of the atomic
exchange would overwhelm that of the read-side RCU critical section.
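For readers following along, the atomic exchange under discussion can be illustrated with a deliberately simplified user-space model. This is not Jim's actual patch: the names (model_rcu_read_lock() and friends), the single reader flag, and the single waiting batch are all invented for illustration, and real per-CPU bookkeeping and memory ordering are omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the "active synchronization" scheme: a flag is
 * set while a reader is inside its critical section, and the batch
 * starter sets a "waiting" flag.  The xchg in the unlock path -- the
 * instruction whose cost Paul is pointing at -- lets an exiting reader
 * atomically clear its flag and observe a concurrently started batch. */
static _Atomic int reader_active;
static _Atomic int batch_waiting;
static int quiescent_reported;

static void model_rcu_read_lock(void)
{
	atomic_store(&reader_active, 1);
}

static void model_rcu_read_unlock(void)
{
	/* Exchange, not a plain store: the reader must see any batch
	 * that started while it was inside the critical section. */
	if (atomic_exchange(&reader_active, 0) &&
	    atomic_exchange(&batch_waiting, 0))
		quiescent_reported = 1;	/* exiting reader ends the grace period */
}
```

A timer-driven scheme keeps the unlock path atomic-free, which is exactly the trade-off being debated here.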
Taking this further, if the realtime CPUs are not allowed to execute in
the kernel at all, you can avoid overhead from smp_call_function() and
the like -- and avoid confusing those parts of the kernel that expect to
be able to send IPIs and the like to the realtime CPU (or do you leave
IPIs enabled on the realtime CPU?).
Thanx, Paul
[ . . . ]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-30 0:43 ` [RFC&PATCH] Alternative RCU implementation Paul E. McKenney
@ 2004-08-30 17:13 ` Jim Houston
2004-08-30 17:38 ` Dipankar Sarma
2004-08-30 18:52 ` Paul E. McKenney
0 siblings, 2 replies; 13+ messages in thread
From: Jim Houston @ 2004-08-30 17:13 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Sun, 2004-08-29 at 20:43, Paul E. McKenney wrote:
> On Fri, Aug 27, 2004 at 09:35:47PM -0400, Jim Houston wrote:
> >
> > The attached patch against linux-2.6.8.1-mm4 is an experimental
> > implementation of RCU.
> >
> > It uses active synchronization between rcu_read_lock(),
> > rcu_read_unlock(), and the code which starts a new RCU batch. An RCU
> > batch can be started at an arbitrary point, and it will complete without
> > waiting for a timer-driven poll. This should help avoid large batches
> > and their adverse cache and latency effects.
> >
> > I did this work because Concurrent encourages its customers to
> > isolate critical realtime processes to their own cpu and shield
> > them from other processes and interrupts. This includes disabling
> > the local timer interrupt. The current RCU code relies on the local
> > timer to recognize quiescent states. If it is disabled on any cpu,
> > RCU callbacks are never called and the system bleeds memory and hangs
> > on calls to synchronize_kernel().
>
> Are these critical realtime processes user-mode only, or do they
> also execute kernel code? If they are user-mode only, a much more
> straightforward approach might be to have RCU pretend that they do
> not exist.
>
> This approach would have the added benefit of keeping rcu_read_unlock()
> atomic-instruction free. In some cases, the overhead of the atomic
> exchange would overwhelm that of the read-side RCU critical section.
>
> Taking this further, if the realtime CPUs are not allowed to execute in
> the kernel at all, you can avoid overhead from smp_call_function() and
> the like -- and avoid confusing those parts of the kernel that expect to
> be able to send IPIs and the like to the realtime CPU (or do you leave
> IPIs enabled on the realtime CPU?).
>
> Thanx, Paul
Hi Paul,
Our customers' applications vary, but in general the realtime processes
will do the usual system calls to synchronize with other processes and
do I/O.
I considered tracking the user<->kernel mode transitions by extending
the idea of the nohz_cpu_mask. I gave up on this idea mostly because it
required hooking into assembly code. Just extending this bitmap has its
own scaling issues; we may have several cpus running realtime processes.
The obvious answer is to keep the information in a per-cpu variable and
pay the price of polling it from another cpu.
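The per-cpu alternative Jim describes can be sketched as follows. This is an invented illustration, not code from any patch in this thread: the names (cpu_in_kernel, model_kernel_enter(), etc.) are hypothetical, and the entry/exit hooks stand in for the assembly hooks he mentions. It also ignores the pre-existing-reader bookkeeping and memory ordering a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4

/* Hypothetical per-cpu flag: nonzero while the cpu is executing kernel
 * code.  Kernel entry/exit would flip it; the grace-period machinery
 * polls it from another cpu instead of relying on a shared bitmap. */
static _Atomic int cpu_in_kernel[NR_CPUS];

static void model_kernel_enter(int cpu) { atomic_store(&cpu_in_kernel[cpu], 1); }
static void model_kernel_exit(int cpu)  { atomic_store(&cpu_in_kernel[cpu], 0); }

/* Poll all cpus: a cpu sitting in user mode is already quiescent. */
static int model_all_quiescent(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (atomic_load(&cpu_in_kernel[cpu]))
			return 0;
	return 1;
}
```

The price Jim mentions is visible here: the poller drags each cpu's flag cache line across the machine, which is the scaling concern Dipankar raises below.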
I know that I'm questioning one of your design goals for RCU by adding
overhead to the read-side. I have read everything I could find on RCU.
My belief is that the cost of the xchg() instruction is small
compared to the cache benefit of freeing memory more quickly.
I think it's more interesting to look at the impact of the xchg() at the
level of an entire system call. Adding 30 nanoseconds to an open/close
path that takes 3 microseconds seems reasonable. It is hard to measure
the benefit of reusing a dcache entry more quickly.
I would be interested in suggestions for testing. I would be very
interested to hear how my patch does on a large machine.
I'm also trying to figure out if I need the call_rcu_bh() changes.
Since my patch will recognize a grace period as soon as any
pending read-side critical sections complete, I suspect that I
don't need this change.
Jim Houston - Concurrent Computer
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-30 17:13 ` Jim Houston
@ 2004-08-30 17:38 ` Dipankar Sarma
2004-09-01 0:10 ` Jim Houston
2004-08-30 18:52 ` Paul E. McKenney
1 sibling, 1 reply; 13+ messages in thread
From: Dipankar Sarma @ 2004-08-30 17:38 UTC (permalink / raw)
To: Jim Houston
Cc: paulmck, linux-kernel, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Mon, Aug 30, 2004 at 01:13:41PM -0400, Jim Houston wrote:
> On Sun, 2004-08-29 at 20:43, Paul E. McKenney wrote:
> > Are these critical realtime processes user-mode only, or do they
> > also execute kernel code? If they are user-mode only, a much more
> > straightforward approach might be to have RCU pretend that they do
> > not exist.
> >
> > This approach would have the added benefit of keeping rcu_read_unlock()
> > atomic-instruction free. In some cases, the overhead of the atomic
> > exchange would overwhelm that of the read-side RCU critical section.
> >
> > Taking this further, if the realtime CPUs are not allowed to execute in
> > the kernel at all, you can avoid overhead from smp_call_function() and
> > the like -- and avoid confusing those parts of the kernel that expect to
> > be able to send IPIs and the like to the realtime CPU (or do you leave
> > IPIs enabled on the realtime CPU?).
>
> Our customers' applications vary, but in general the realtime processes
> will do the usual system calls to synchronize with other processes and
> do I/O.
>
> I considered tracking the user<->kernel mode transitions by extending
> the idea of the nohz_cpu_mask. I gave up on this idea mostly because it
> required hooking into assembly code. Just extending this bitmap has its
> own scaling issues; we may have several cpus running realtime processes.
> The obvious answer is to keep the information in a per-cpu variable and
> pay the price of polling it from another cpu.
Tracking user<->kernel transitions and putting smarts in scheduler
about RCU is the right way to go IMO.
Anything that polls other cpus and adds read-side overhead will likely
not be scalable. I think that is not the right way to go about solving
the issue of dependency on the local timer interrupt. It is a worthy
goal and we need to do this anyway, but we need to do it right,
preserving the current advantages as much as possible.
> I know that I'm questioning one of your design goals for RCU by adding
> overhead to the read-side. I have read everything I could find on RCU.
> My belief is that the cost of the xchg() instruction is small
> compared to the cache benefit of freeing memory more quickly.
> I think it's more interesting to look at the impact of the xchg() at the
> level of an entire system call. Adding 30 nanoseconds to an open/close
> path that takes 3 microseconds seems reasonable. It is hard to measure
> the benefit of reusing a dcache entry more quickly.
>
> I would be interested in suggestions for testing. I would be very
> interested to hear how my patch does on a large machine.
I will get you some numbers on a large machine. But I remain opposed
to this approach. I believe it can be done without the read-side
overheads.
> I'm also trying to figure out if I need the call_rcu_bh() changes.
> Since my patch will recognize a grace period as soon as any
> pending read-side critical sections complete, I suspect that I
> don't need this change.
Except that under a softirq flood, a reader in a different read-side
critical section may get delayed a lot holding up RCU. Let me know
if I am missing something here.
Thanks
Dipankar
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-30 17:13 ` Jim Houston
2004-08-30 17:38 ` Dipankar Sarma
@ 2004-08-30 18:52 ` Paul E. McKenney
2004-08-31 3:22 ` Jim Houston
1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2004-08-30 18:52 UTC (permalink / raw)
To: Jim Houston
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Mon, Aug 30, 2004 at 01:13:41PM -0400, Jim Houston wrote:
> I know that I'm questioning one of your design goals for RCU by adding
> overhead to the read-side. I have read everything I could find on RCU.
> My belief is that the cost of the xchg() instruction is small
> compared to the cache benefit of freeing memory more quickly.
> I think it's more interesting to look at the impact of the xchg() at the
> level of an entire system call. Adding 30 nanoseconds to an open/close
> path that takes 3 microseconds seems reasonable. It is hard to measure
> the benefit of reusing a dcache entry more quickly.
Hello, Jim,
The other thing to keep in mind is that reducing the grace-period
duration increases the per-access overhead, since each grace period
incurs a cost. So there is a balance that needs to be struck between
overflowing memory with a too-long grace period and incurring too
much overhead with a too-short grace period.
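Paul's balance can be made concrete with a toy cost model (the numbers and the helper name are made up for illustration; real grace-period cost is not a single constant):

```c
#include <assert.h>

/* Toy model: each grace period has a roughly fixed cost (IPIs,
 * counter manipulation, cache misses) that is amortized over the
 * callbacks batched into it.  Shorter grace periods mean smaller
 * batches and therefore a higher per-callback overhead. */
static double per_callback_overhead(double grace_period_cost, int batch_size)
{
	return grace_period_cost / batch_size;
}
```

So running a grace period per callback (batch size 1, as with rcu_max_count=0 below) maximizes per-callback cost, while very large batches trade that overhead for memory held in flight.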
How does the rest of the kernel work with all interrupts to
a particular CPU shut off? For example, how do you timeslice?
Thanx, Paul
PS. My concerns with some aspects of your design aside, your
getting a significant change to the RCU infrastructure to
work reasonably well is quite impressive!
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-30 18:52 ` Paul E. McKenney
@ 2004-08-31 3:22 ` Jim Houston
2004-09-01 3:53 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Jim Houston @ 2004-08-31 3:22 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Mon, 2004-08-30 at 14:52, Paul E. McKenney wrote:
> On Mon, Aug 30, 2004 at 01:13:41PM -0400, Jim Houston wrote:
> > I know that I'm questioning one of your design goals for RCU by adding
> > overhead to the read-side. I have read everything I could find on RCU.
> > My belief is that the cost of the xchg() instruction is small
> > compared to the cache benefit of freeing memory more quickly.
> > I think it's more interesting to look at the impact of the xchg() at the
> > level of an entire system call. Adding 30 nanoseconds to an open/close
> > path that takes 3 microseconds seems reasonable. It is hard to measure
> > the benefit of reusing a dcache entry more quickly.
>
> Hello, Jim,
>
> The other thing to keep in mind is that reducing the grace-period
> duration increases the per-access overhead, since each grace period
> incurs a cost. So there is a balance that needs to be struck between
> overflowing memory with a too-long grace period and incurring too
> much overhead with a too-short grace period.
>
> How does the rest of the kernel work with all interrupts to
> a particular CPU shut off? For example, how do you timeslice?
>
> Thanx, Paul
>
> PS. My concerns with some aspects of your design aside, your
> getting a significant change to the RCU infrastructure to
> work reasonably well is quite impressive!
Hi Paul,
I have two module parameters in the patch which can be used to
tune how often grace periods are started. They can be set at boot
time as follows:
rcupdate.rcu_max_count=#
The per-cpu count of queued requests at which to
start a new batch. Patch defaults to 256.
rcupdate.rcu_max_time=#
Timeout value in jiffies at which to start a batch.
Defaults to HZ/10.
I picked the defaults to start batches with similar frequency to
the existing code.
I tested a dual processor with rcupdate.rcu_max_count=0. This
will start a grace period for every call_rcu(). I ran
my rename test this way and it worked surprisingly well.
I maintain a nxtbatch value which lets me check if the grace period
for the entries in the nxt list has started or perhaps already
completed. I check this in call_rcu() and avoid mixing batches.
Any requests queued before the batch was started will be completed
at the end of the grace period. Unless a very small rcu_max_count
value is used, there is likely to be some delay between completing
a grace period and needing to start another.
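The decision the two tunables drive might look like the sketch below. This is a guess at the shape of the logic from the parameter descriptions above, not code from the patch; the helper name and the HZ value are assumptions.

```c
#include <assert.h>

#define HZ 1000			/* assumed for illustration */
static int rcu_max_count = 256;	/* rcupdate.rcu_max_count */
static int rcu_max_time = HZ / 10;	/* rcupdate.rcu_max_time, in jiffies */

/* Hypothetical per-cpu decision: start a new batch once enough
 * callbacks have queued, or once the oldest queued callback has
 * waited too long.  Note rcu_max_count=0 makes the count test
 * succeed for any nonzero queue, i.e. a batch per call_rcu(). */
static int model_should_start_batch(int queued, long jiffies_waiting)
{
	if (queued == 0)
		return 0;
	return queued >= rcu_max_count || jiffies_waiting >= rcu_max_time;
}
```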
> How does the rest of the kernel work with all interrupts to
> a particular CPU shut off? For example, how do you timeslice?
It's a balancing act. In some cases we just document the
missing functionality. If the local timer is disabled on a cpu,
all processes are SCHED_FIFO. In the case of Posix timers, we
move timers to honor the processor shielding and the process affinity.
Jim Houston
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-30 17:38 ` Dipankar Sarma
@ 2004-09-01 0:10 ` Jim Houston
2004-09-01 0:57 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Jim Houston @ 2004-09-01 0:10 UTC (permalink / raw)
To: dipankar; +Cc: paulmck, linux-kernel
On Mon, 2004-08-30 at 13:38, Dipankar Sarma wrote:
> > I'm also trying to figure out if I need the call_rcu_bh() changes.
> > Since my patch will recognize a grace period as soon as any
> > pending read-side critical sections complete, I suspect that I
> > don't need this change.
>
> Except that under a softirq flood, a reader in a different read-side
> critical section may get delayed a lot holding up RCU. Let me know
> if I am missing something here.
Hi Dipankar,
O.k. That makes sense. So the rcu_read_lock_bh(), rcu_read_unlock_bh()
and call_rcu_bh() would be the preferred interface. Are there cases
where they can't be used? How do you decide where to use the _bh
flavor?
I see that local_bh_enable() WARNS if interrupts are disabled. Is that
the issue? Are rcu_read_lock()/rcu_read_unlock() ever called from
code which disables interrupts?
Jim Houston - Concurrent Computer Corp.
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-01 0:10 ` Jim Houston
@ 2004-09-01 0:57 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2004-09-01 0:57 UTC (permalink / raw)
To: Jim Houston; +Cc: dipankar, linux-kernel
On Tue, Aug 31, 2004 at 08:10:50PM -0400, Jim Houston wrote:
> On Mon, 2004-08-30 at 13:38, Dipankar Sarma wrote:
>
> > > I'm also trying to figure out if I need the call_rcu_bh() changes.
> > > Since my patch will recognize a grace period as soon as any
> > > pending read-side critical sections complete, I suspect that I
> > > don't need this change.
> >
> > Except that under a softirq flood, a reader in a different read-side
> > critical section may get delayed a lot holding up RCU. Let me know
> > if I am missing something here.
>
> Hi Dipankar,
>
> O.k. That makes sense. So the rcu_read_lock_bh(), rcu_read_unlock_bh()
> and call_rcu_bh() would be the preferred interface. Are there cases
> where they can't be used? How do you decide where to use the _bh
> flavor?
Hello, Jim,
You would use rcu_read_lock() instead of rcu_read_lock_bh() in cases
where you did not want the read-side code to disable bottom halves.
This is very similar to choosing between read_lock() and read_lock_bh()
-- if you unnecessarily use read_lock_bh() or rcu_read_lock_bh(), you
will be unnecessarily delaying drivers' bottom-half execution.
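The distinction Paul draws can be modeled in a few lines. These are invented stand-ins (model_* names), not the kernel's implementations; the point is only that the _bh variants additionally suppress bottom-half (softirq) execution for the duration of the critical section.

```c
#include <assert.h>

/* Toy model: bottom halves may run only when the disable depth is
 * zero.  The plain read-side primitives leave it untouched; the _bh
 * variants bump it, which is why using them needlessly delays
 * drivers' bottom-half work. */
static int bh_disable_depth;

static int model_bh_can_run(void) { return bh_disable_depth == 0; }

static void model_rcu_read_lock(void)      { /* no softirq effect */ }
static void model_rcu_read_unlock(void)    { }
static void model_rcu_read_lock_bh(void)   { bh_disable_depth++; }
static void model_rcu_read_unlock_bh(void) { bh_disable_depth--; }
```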
> I see that local_bh_enable() WARNS if interrupts are disabled. Is that
> the issue? Are rcu_read_lock()/rcu_read_unlock() ever called from
> code which disables interrupts?
The RCU "_bh()" interfaces correspond to a different set of quiescent
states than do the standard interfaces. You could indeed use
rcu_read_lock() with interrupts disabled, but I don't know of any
such use.
Thanx, Paul
> Jim Houston - Concurrent Computer Corp.
>
>
* Re: [RFC&PATCH] Alternative RCU implementation
2004-08-31 3:22 ` Jim Houston
@ 2004-09-01 3:53 ` Paul E. McKenney
2004-09-01 13:02 ` Jim Houston
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2004-09-01 3:53 UTC (permalink / raw)
To: Jim Houston
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Mon, Aug 30, 2004 at 11:22:49PM -0400, Jim Houston wrote:
> On Mon, 2004-08-30 at 14:52, Paul E. McKenney wrote:
> > How does the rest of the kernel work with all interrupts to
> > a particular CPU shut off? For example, how do you timeslice?
>
> It's a balancing act. In some cases we just document the
> missing functionality. If the local timer is disabled on a cpu,
> all processes are SCHED_FIFO. In the case of Posix timers, we
> > move timers to honor the processor shielding and the process affinity.
I have to ask... When you say that you move the timers, you mean that
non-realtime CPU 1 manages timers for realtime CPU 0, so that CPU 1
is (effectively) taking CPU 0's timer interrupts?
Thanx, Paul
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-01 3:53 ` Paul E. McKenney
@ 2004-09-01 13:02 ` Jim Houston
2004-09-02 16:38 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Jim Houston @ 2004-09-01 13:02 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Tue, 2004-08-31 at 23:53, Paul E. McKenney wrote:
> On Mon, Aug 30, 2004 at 11:22:49PM -0400, Jim Houston wrote:
> > On Mon, 2004-08-30 at 14:52, Paul E. McKenney wrote:
> > > How does the rest of the kernel work with all interrupts to
> > > a particular CPU shut off? For example, how do you timeslice?
> >
> > It's a balancing act. In some cases we just document the
> > missing functionality. If the local timer is disabled on a cpu,
> > all processes are SCHED_FIFO. In the case of Posix timers, we
> > move timers to honor the processor shielding and the process affinity.
>
> I have to ask... When you say that you move the timers, you mean that
> non-realtime CPU 1 manages timers for realtime CPU 0, so that CPU 1
> is (effectively) taking CPU 0's timer interrupts?
Hi Paul,
That is part of the idea. There are lots of timers which we don't
expect to have realtime behavior.
There are also services like Posix timers and nanosleep() where we want
very predictable behavior. If a process does a nanosleep(), we queue
that timer on the local cpu. If process affinity is changed, we will
move the timer to a cpu where the process is allowed to run.
We have separate queues for high resolution timers. If the local queue
is empty, we shut down the timer.
Jim Houston - Concurrent Computer Corp.
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-01 13:02 ` Jim Houston
@ 2004-09-02 16:38 ` Paul E. McKenney
2004-09-02 18:54 ` Jim Houston
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2004-09-02 16:38 UTC (permalink / raw)
To: Jim Houston
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Wed, Sep 01, 2004 at 09:02:00AM -0400, Jim Houston wrote:
> On Tue, 2004-08-31 at 23:53, Paul E. McKenney wrote:
> > On Mon, Aug 30, 2004 at 11:22:49PM -0400, Jim Houston wrote:
> > > On Mon, 2004-08-30 at 14:52, Paul E. McKenney wrote:
> > > > How does the rest of the kernel work with all interrupts to
> > > > a particular CPU shut off? For example, how do you timeslice?
> > >
> > > It's a balancing act. In some cases we just document the
> > > missing functionality. If the local timer is disabled on a cpu,
> > > all processes are SCHED_FIFO. In the case of Posix timers, we
> > > move timers to honor the processor shielding and the process affinity.
> >
> > I have to ask... When you say that you move the timers, you mean that
> > non-realtime CPU 1 manages timers for realtime CPU 0, so that CPU 1
> > is (effectively) taking CPU 0's timer interrupts?
>
> Hi Paul,
>
> That is part of the idea. There are lots of timers which we don't
> expect to have realtime behavior.
>
> There are also services like Posix timers and nanosleep() where we want
> very predictable behavior. If a process does a nanosleep(), we queue
> that timer on the local cpu. If process affinity is changed, we will
> move the timer to a cpu where the process is allowed to run.
>
> We have separate queues for high resolution timers. If the local queue
> is empty, we shut down the timer.
Hello, Jim,
How do you mark a given CPU as being in realtime mode? Or is the
timer-shutdown decision based on the presence of a realtime process
runnable on the given CPU or some such?
Still trying to figure out a way to make this work without adding
overhead to rcu_read_unlock()...
Thanx, Paul
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-02 16:38 ` Paul E. McKenney
@ 2004-09-02 18:54 ` Jim Houston
2004-09-02 21:20 ` Manfred Spraul
0 siblings, 1 reply; 13+ messages in thread
From: Jim Houston @ 2004-09-02 18:54 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, Dipankar Sarma, Manfred Spraul, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Thu, 2004-09-02 at 12:38, Paul E. McKenney wrote:
> On Wed, Sep 01, 2004 at 09:02:00AM -0400, Jim Houston wrote:
> > On Tue, 2004-08-31 at 23:53, Paul E. McKenney wrote:
> > > On Mon, Aug 30, 2004 at 11:22:49PM -0400, Jim Houston wrote:
> > > > On Mon, 2004-08-30 at 14:52, Paul E. McKenney wrote:
> > > > > How does the rest of the kernel work with all interrupts to
> > > > > a particular CPU shut off? For example, how do you timeslice?
> > > >
> > > > It's a balancing act. In some cases we just document the
> > > > missing functionality. If the local timer is disabled on a cpu,
> > > > all processes are SCHED_FIFO. In the case of Posix timers, we
> > > > move timers to honor the processor shielding and the process affinity.
> > >
> > > I have to ask... When you say that you move the timers, you mean that
> > > non-realtime CPU 1 manages timers for realtime CPU 0, so that CPU 1
> > > is (effectively) taking CPU 0's timer interrupts?
> >
> > Hi Paul,
> >
> > That is part of the idea. There are lots of timers which we don't
> > expect to have realtime behavior.
> >
> > There are also services like Posix timers and nanosleep() where we want
> > very predictable behavior. If a process does a nanosleep(), we queue
> > that timer on the local cpu. If process affinity is changed, we will
> > move the timer to a cpu where the process is allowed to run.
> >
> > We have separate queues for high resolution timers. If the local queue
> > is empty, we shut down the timer.
>
> Hello, Jim,
>
> How do you mark a given CPU as being in realtime mode? Or is the
> timer-shutdown decision based on the presence of a realtime process
> runnable on the given CPU or some such?
>
> Still trying to figure out a way to make this work without adding
> overhead to rcu_read_unlock()...
Hi Paul
We add the following /proc files:
/proc/shield/irqs
Setting a bit limits the corresponding cpu to only handle
interrupts which are explicitly directed to that cpu.
/proc/shield/ltmrs
Setting a bit limits the use of local timers on the
corresponding cpu.
/proc/shield/procs
Setting a bit limits the cpu to only run processes which
have set affinity to that cpu.
When the user changes something we adjust irq routing and migrate
processes and timers appropriately.
Jim Houston - Concurrent Computer Corp.
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-02 18:54 ` Jim Houston
@ 2004-09-02 21:20 ` Manfred Spraul
2004-09-03 1:19 ` Jim Houston
0 siblings, 1 reply; 13+ messages in thread
From: Manfred Spraul @ 2004-09-02 21:20 UTC (permalink / raw)
To: jim.houston
Cc: paulmck, linux-kernel, Dipankar Sarma, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
Jim Houston wrote:
>We add the following /proc files:
>
>/proc/shield/irqs
> Setting a bit limits the corresponding cpu to only handle
> interrupts which are explicitly directed to that cpu.
>
>/proc/shield/ltmrs
> Setting a bit limits the use of local timers on the
> corresponding cpu.
>
>
>
How do you handle schedule_delayed_work_on()?
slab uses it to drain the per-cpu caches. It's not fatal if a cpu
doesn't drain its caches (just some wasted memory), but it should be
documented.
--
Manfred
* Re: [RFC&PATCH] Alternative RCU implementation
2004-09-02 21:20 ` Manfred Spraul
@ 2004-09-03 1:19 ` Jim Houston
0 siblings, 0 replies; 13+ messages in thread
From: Jim Houston @ 2004-09-03 1:19 UTC (permalink / raw)
To: Manfred Spraul
Cc: paulmck, linux-kernel, Dipankar Sarma, Andrew Morton,
William Lee Irwin III, Jack Steiner, Jesse Barnes, rusty
On Thu, 2004-09-02 at 17:20, Manfred Spraul wrote:
> Jim Houston wrote:
>
> >We add the following /proc files:
> >/proc/shield/ltmrs
> > Setting a bit limits the use of local timers on the
> > corresponding cpu.
> >
> How do you handle schedule_delayed_work_on()?
> slab uses it to drain the per-cpu caches. It's not fatal if a cpu
> doesn't drain it's caches (just some wasted memory), but it should be
> documented.
Hi Manfred,
The timer shielding migrates most of the timers to non-shielded cpus
but does keep track of timers queued with add_timer_on(). These
are polled periodically from a non-shielded cpu, and an inter-processor
interrupt is used to force the shielded cpu to handle their expiry.
Jim Houston - Concurrent Computer Corp.