* Re: RFC: THE OFFLINE SCHEDULER
2009-08-23 9:09 ` RFC: THE OFFLINE SCHEDULER raz ben yehuda
@ 2009-08-23 7:30 ` Mike Galbraith
2009-08-23 11:05 ` raz ben yehuda
0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2009-08-23 7:30 UTC (permalink / raw)
To: raz ben yehuda
Cc: riel, mingo, peterz, andrew motron, wiseman, lkml, linux-rt-users
On Sun, 2009-08-23 at 12:09 +0300, raz ben yehuda wrote:
> On Sun, 2009-08-23 at 07:21 +0200, Mike Galbraith wrote:
> > Seems to me this boils down to a different way to make a SW box in a HW
> > box, which already exists. What does this provide that partitioning a
> > box with csets and virtualization doesn't?
> OFFSCHED does not compete with cpu sets or virtualization. It is
> different.
>
> 1. Neither virtualization nor cpu sets provide hard real time. OFFSCHED
> does this at little cost and with no impact on the OS. OFFSCHED is not
> just accurate, it is also extremely fast; after all, it is an NMI'ed
> processor.
Why not? Why can't I run an RT kernel with an RTOS guest and let it do
its deadline management thing?
> 2. OFFSCHED has access to every piece of memory in the system, so it
> can act as a sentry for the system, or use Linux facilities. Also, the
> kernel can access OFFSCHED memory; it is the same address space.
Hm. That appears to be a self negating argument.
> 3. OFFSCHED can improve the Linux OS (NAPI, OFFSCHED firewall, RTOP),
> while a guest OS cannot.
>
> 4. cpu sets cannot replace softirqs and hardirqs. OFFSCHED can. cpu sets
> deal with kernel threads and user-space threads. In OFFSCHED we use
> offlets.
Which still looks like OS-fu to me.
> 5. cpu sets and virtualization are services provided by the kernel to
> the "system". Who serves the kernel? Who protects the kernel?
If either one can diddle the other's RAM, they are in no way isolated or
protected, so they can't even defend against their own bugs.
What protects a hard RT deadline from VM pressure, memory bandwidth
consumption etc etc? Looks to me like it's soft RT, because you can't
control the external variables.
> 6. Offlets give the programmer full control over an entire processor:
> no preemption, no interrupts, no quiescing. You know what happens, and
> when it happens.
If I can route interrupts such that only, say, network interrupts are
delivered to my cset/vm core, and the guest OS is a custom high-speed,
low-drag application, I just don't see much difference.
> I have run this hard real time system for several years on my SMP/MC/SMT
> machines. It serves me well. The core of the OFFSCHED patch was 4 lines.
> So I simply compile an ***entirely regular*** Linux bzImage and that's
> it. It does not mess with drivers, spinlocks, softirqs, and so on; OFFSCHED just
> directs cpu_down to my own hard real time piece of code. The rest
> of the kernel remains the same.
Aaaaanyway, I'm not saying it's not a useful thing to do, just saying I
don't see any reason you can't get essentially the same result with
what's in the kernel now.
-Mike
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
[not found] ` <1251004897.7043.70.camel@marge.simson.net>
@ 2009-08-23 9:09 ` raz ben yehuda
2009-08-23 7:30 ` Mike Galbraith
0 siblings, 1 reply; 79+ messages in thread
From: raz ben yehuda @ 2009-08-23 9:09 UTC (permalink / raw)
To: Mike Galbraith
Cc: riel, mingo, peterz, andrew motron, wiseman, lkml, linux-rt-users
On Sun, 2009-08-23 at 07:21 +0200, Mike Galbraith wrote:
> On Sun, 2009-08-23 at 02:27 +0300, raz ben yehuda wrote:
> > The Open University of Israel
> > Department of Mathematics and computer science
> >
> > FINAL PAPER
> > OFFLINE SCHEDULER
> >
> >
> >
> > OFFSCHED is a platform aimed at assigning a task to an offloaded processor. An offloaded processor is a processor that has been hot-unplugged from the operating system.
> >
> > Description
> >
> > In today’s computer world, we find that most processors have several embedded cores and hyper-threading. Most programmers do not really use these powerful features and let the operating system do the work.
> > At most, a programmer will bind an application to a certain processor or assign an interrupt to a different processor. In the end, we get a system busy maintaining tasks across processors, balancing interrupts, flushing TLBs and DTLBs, using atomic operations even when not needed and, worst of all, spinning on locks across processors in vain; and the more processors, the merrier. I argue that in some cases part of this behavior is due to the fact that the multi-core operating system is not service-oriented but system-oriented. There is no easy way to assign a processor to do a distinct service, undisturbed, accurate, and fast, as long as the processor is an active part of an operating system, while it still shares most of the operating system's address space.
> >
> > OFFSCHED Purpose
> >
> > The purpose of OFFSCHED is to create a platform for services. For example, assume a firewall is being attacked; the Linux operating system will generate an endless number of interrupts and/or softirqs to analyze the traffic and throw out bad packets. This is at the expense of “good” packets. Have you ever tried to “ssh” to an attacked machine? Who protects the operating system?
> > What if we could simply do the packet analysis outside the operating system, without being interrupted?
> > Why not assign a core to do only “firewalling”? Or just routing? Design a new type of real-time system? Maybe assign it as an ultra-accurate timer? Create a delaying service that does not just spin? Offload a TCP stack? Perhaps a new type of locking scheme? New types of bottom halves? Debug a running kernel through an offloaded processor? Maybe assign a GPU to do things other than just graphics?
> > Amdahl's Law teaches us that linear speed-up is not very feasible, so why not spare a processor to do certain tasks better?
> > Technologically speaking, I am referring to the Linux kernel's ability to virtually hot-unplug an (SMT) processor; but instead of letting it wander in endless “halts”, assign it a service.
>
> Seems to me this boils down to a different way to make a SW box in a HW
> box, which already exists. What does this provide that partitioning a
> box with csets and virtualization doesn't?
OFFSCHED does not compete with cpu sets or virtualization. It is
different.
1. Neither virtualization nor cpu sets provide hard real time. OFFSCHED
does this at little cost and with no impact on the OS. OFFSCHED is not
just accurate, it is also extremely fast; after all, it is an NMI'ed
processor.
2. OFFSCHED has access to every piece of memory in the system, so it
can act as a sentry for the system, or use Linux facilities. Also, the
kernel can access OFFSCHED memory; it is the same address space.
3. OFFSCHED can improve the Linux OS (NAPI, OFFSCHED firewall, RTOP),
while a guest OS cannot.
4. cpu sets cannot replace softirqs and hardirqs. OFFSCHED can. cpu sets
deal with kernel threads and user-space threads. In OFFSCHED we use
offlets.
5. cpu sets and virtualization are services provided by the kernel to
the "system". Who serves the kernel? Who protects the kernel?
6. Offlets give the programmer full control over an entire processor:
no preemption, no interrupts, no quiescing. You know what happens, and
when it happens.
I have run this hard real time system for several years on my SMP/MC/SMT
machines. It serves me well. The core of the OFFSCHED patch was 4 lines.
So I simply compile an ***entirely regular*** Linux bzImage and that's
it. It does not mess with drivers, spinlocks, softirqs, and so on; OFFSCHED just
directs cpu_down to my own hard real time piece of code. The rest
of the kernel remains the same.
> -Mike
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-23 11:05 ` raz ben yehuda
@ 2009-08-23 9:52 ` Mike Galbraith
2009-08-25 15:23 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2009-08-23 9:52 UTC (permalink / raw)
To: raz ben yehuda
Cc: riel, mingo, peterz, andrew motron, wiseman, lkml, linux-rt-users
On Sun, 2009-08-23 at 14:05 +0300, raz ben yehuda wrote:
> On Sun, 2009-08-23 at 09:30 +0200, Mike Galbraith wrote:
> > On Sun, 2009-08-23 at 12:09 +0300, raz ben yehuda wrote:
> > > On Sun, 2009-08-23 at 07:21 +0200, Mike Galbraith wrote:
> >
> > > > Seems to me this boils down to a different way to make a SW box in a HW
> > > > box, which already exists. What does this provide that partitioning a
> > > > box with csets and virtualization doesn't?
> > > OFFSCHED does not compete with cpu sets or virtualization. It is
> > > different.
> > >
> > > 1. Neither virtualization nor cpu sets provide hard real time. OFFSCHED
> > > does this at little cost and with no impact on the OS. OFFSCHED is not
> > > just accurate, it is also extremely fast; after all, it is an NMI'ed
> > > processor.
> >
> > Why not? Why can't I run an RT kernel with an RTOS guest and let it do
> > its deadline management thing?
> Have you ever measured how long a single context switch costs? Can you run
> this system with 1us accuracy? You cannot. Try ftrace'ing your system:
> the interrupt alone costs several hundred nanoseconds. By the time you
> reach your code, the deadline will be nearly gone.
I've measured context switch cost many times. The point, though, wasn't
how tight a constraint may be; you maintained that realtime was out the
window, and I didn't see any reason for that to be the case.
> > > 2. OFFSCHED has access to every piece of memory in the system, so it
> > > can act as a sentry for the system, or use Linux facilities. Also, the
> > > kernel can access OFFSCHED memory; it is the same address space.
> >
> > Hm. That appears to be a self negating argument.
> Correct. But I can receive packets over NAPI and transmit packets over
> hard_start_xmit much faster than any guest OS. I can disable interrupts
> and move to poll mode, thus helping the operating system. Can a guest OS
> help Linux?
Depends entirely on the job at hand. If the job is running a firewall
in kernel mode, no, it won't cut the mustard.
(No offense intended, but this all sounds like a great big kernel module
to me, one which doesn't even taint the kernel.)
> > > 3. OFFSCHED can improve the Linux OS (NAPI, OFFSCHED firewall, RTOP),
> > > while a guest OS cannot.
> > >
> > > 4. cpu sets cannot replace softirqs and hardirqs. OFFSCHED can. cpu sets
> > > deal with kernel threads and user-space threads. In OFFSCHED we use
> > > offlets.
> >
> > Which still looks like OS-fu to me.
> I do not understand this remark.
Whether it's offlet, tasklet, or insert-buzzword-of-the-day, it's thread-
of-execution management, which I called OS-fu, i.e. one of those things
that OSes do.
The rest I'll leave off replying to; we're kinda splitting hairs. I
don't see a big generic benefit to OFFSCHED or its ilk; others do.
-Mike
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-23 7:30 ` Mike Galbraith
@ 2009-08-23 11:05 ` raz ben yehuda
2009-08-23 9:52 ` Mike Galbraith
0 siblings, 1 reply; 79+ messages in thread
From: raz ben yehuda @ 2009-08-23 11:05 UTC (permalink / raw)
To: Mike Galbraith
Cc: riel, mingo, peterz, andrew motron, wiseman, lkml, linux-rt-users
On Sun, 2009-08-23 at 09:30 +0200, Mike Galbraith wrote:
> On Sun, 2009-08-23 at 12:09 +0300, raz ben yehuda wrote:
> > On Sun, 2009-08-23 at 07:21 +0200, Mike Galbraith wrote:
>
> > > Seems to me this boils down to a different way to make a SW box in a HW
> > > box, which already exists. What does this provide that partitioning a
> > > box with csets and virtualization doesn't?
> > OFFSCHED does not compete with cpu sets or virtualization. It is
> > different.
> >
> > 1. Neither virtualization nor cpu sets provide hard real time. OFFSCHED
> > does this at little cost and with no impact on the OS. OFFSCHED is not
> > just accurate, it is also extremely fast; after all, it is an NMI'ed
> > processor.
>
> Why not? Why can't I run an RT kernel with an RTOS guest and let it do
> its deadline management thing?
Have you ever measured how long a single context switch costs? Can you run
this system with 1us accuracy? You cannot. Try ftrace'ing your system:
the interrupt alone costs several hundred nanoseconds. By the time you
reach your code, the deadline will be nearly gone.
> > 2. OFFSCHED has access to every piece of memory in the system, so it
> > can act as a sentry for the system, or use Linux facilities. Also, the
> > kernel can access OFFSCHED memory; it is the same address space.
>
> Hm. That appears to be a self negating argument.
Correct. But I can receive packets over NAPI and transmit packets over
hard_start_xmit much faster than any guest OS. I can disable interrupts
and move to poll mode, thus helping the operating system. Can a guest OS
help Linux?
> > 3. OFFSCHED can improve the Linux OS (NAPI, OFFSCHED firewall, RTOP),
> > while a guest OS cannot.
> >
> > 4. cpu sets cannot replace softirqs and hardirqs. OFFSCHED can. cpu sets
> > deal with kernel threads and user-space threads. In OFFSCHED we use
> > offlets.
>
> Which still looks like OS-fu to me.
I do not understand this remark.
> > 5. cpu sets and virtualization are services provided by the kernel to
> > the "system". Who serves the kernel? Who protects the kernel?
>
> If either one can diddle the other's RAM, they are in no way isolated or
> protected, so they can't even defend against their own bugs.
Correct. But the same applies to a hosting OS; as you said, it is a
self-negating argument. What if your system is attacked by an RT task
that saturates all cpu time? You will not even be able to tell what is
wrong with your system. In OFFSCHED-RTOP I show that even when attacked
by a malicious task, I can still see the problem, because I can access
the task list and dump it to a remote machine. It is even possible to
"kill it" with the offlet-server (I still need to write the killing
part).
> What protects a hard RT deadline from VM pressure, memory bandwidth
> consumption etc etc? Looks to me like it's soft RT, because you can't
> control the external variables.
What protects a guest OS from the host? Also, in one of my
applications I wrote my own pre-allocation system; OFFSCHED used only
its own pools, so the VM was never a problem and it is a true hard real
time system. As for memory bandwidth pressure, OFFSCHED is not protected
from that. But if you design your application with a small footprint, you
will be able to stay in the processor cache. When you have a kernel
thread, lazy TLB is not always guaranteed. Can you say your RT task will
never be preempted?
And again, if RTAI or anything of the like has facilities for this kind
of problem, OFFSCHED can use them as well.
> > 6. Offlets give the programmer full control over an entire processor:
> > no preemption, no interrupts, no quiescing. You know what happens, and
> > when it happens.
>
> If I can route interrupts such that only, say, network interrupts are
> delivered to my cset/vm core, and the guest OS is a custom high-speed,
> low-drag application, I just don't see much difference.
There are other states a system must pass through; for example, a
processor must pass through a quiescent state, which means you cannot
have your real time thread running forever without losing the processor
from time to time. And how would you prevent RCU starvation? What about
IPIs?
> > I have run this hard real time system for several years on my SMP/MC/SMT
> > machines. It serves me well. The core of the OFFSCHED patch was 4 lines.
> > So I simply compile an ***entirely regular*** Linux bzImage and that's
> > it. It does not mess with drivers, spinlocks, softirqs, and so on; OFFSCHED just
> > directs cpu_down to my own hard real time piece of code. The rest
> > of the kernel remains the same.
>
> Aaaaanyway, I'm not saying it's not a useful thing to do, just saying I
> don't see any reason you can't get essentially the same result with
> what's in the kernel now.
I thank you for your interest.
> -Mike
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-23 9:52 ` Mike Galbraith
@ 2009-08-25 15:23 ` Christoph Lameter
2009-08-25 17:56 ` Mike Galbraith
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-25 15:23 UTC (permalink / raw)
To: Mike Galbraith
Cc: raz ben yehuda, riel, mingo, peterz, andrew motron, wiseman, lkml,
linux-rt-users
On Sun, 23 Aug 2009, Mike Galbraith wrote:
> The rest I'll leave off replying to; we're kinda splitting hairs. I
> don't see a big generic benefit to OFFSCHED or its ilk; others do.
No, we are not splitting hairs. OFFSCHED takes the OS noise (interrupts,
timers, RCU, cacheline stealing, etc.) out of certain processors. You
cannot run an undisturbed piece of software on the OS right now.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 15:23 ` Christoph Lameter
@ 2009-08-25 17:56 ` Mike Galbraith
2009-08-25 18:03 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2009-08-25 17:56 UTC (permalink / raw)
To: Christoph Lameter
Cc: raz ben yehuda, riel, mingo, peterz, andrew motron, wiseman, lkml,
linux-rt-users
On Tue, 2009-08-25 at 11:23 -0400, Christoph Lameter wrote:
> On Sun, 23 Aug 2009, Mike Galbraith wrote:
>
> > The rest I'll leave off replying to; we're kinda splitting hairs. I
> > don't see a big generic benefit to OFFSCHED or its ilk; others do.
>
> No, we are not splitting hairs. OFFSCHED takes the OS noise (interrupts,
> timers, RCU, cacheline stealing, etc.) out of certain processors. You
> cannot run an undisturbed piece of software on the OS right now.
I asked the questions I did out of pure curiosity, and that curiosity
has been satisfied. It's not that I find it useless or whatnot (or that
my opinion matters to anyone but me;). I personally find the concept of
injecting an RTOS into a general purpose OS with no isolation to be
alien. Intriguing, but very very alien.
-Mike
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 17:56 ` Mike Galbraith
@ 2009-08-25 18:03 ` Christoph Lameter
2009-08-25 18:12 ` Mike Galbraith
2009-08-25 19:08 ` Peter Zijlstra
0 siblings, 2 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-25 18:03 UTC (permalink / raw)
To: Mike Galbraith
Cc: raz ben yehuda, riel, mingo, peterz, andrew motron, wiseman, lkml,
linux-rt-users
On Tue, 25 Aug 2009, Mike Galbraith wrote:
> I asked the questions I did out of pure curiosity, and that curiosity
> has been satisfied. It's not that I find it useless or whatnot (or that
> my opinion matters to anyone but me;). I personally find the concept of
> injecting an RTOS into a general purpose OS with no isolation to be
> alien. Intriguing, but very very alien.
Well, let's work on the isolation piece then. We could run a regular process
on the RT cpu and switch back when OS services are needed?
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 18:03 ` Christoph Lameter
@ 2009-08-25 18:12 ` Mike Galbraith
[not found] ` <5d96567b0908251522m3fd4ab98n76a52a34a11e874c@mail.gmail.com>
2009-08-25 19:08 ` Peter Zijlstra
1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2009-08-25 18:12 UTC (permalink / raw)
To: Christoph Lameter
Cc: raz ben yehuda, riel, mingo, peterz, andrew motron, wiseman, lkml,
linux-rt-users
On Tue, 2009-08-25 at 14:03 -0400, Christoph Lameter wrote:
> On Tue, 25 Aug 2009, Mike Galbraith wrote:
>
> > I asked the questions I did out of pure curiosity, and that curiosity
> > has been satisfied. It's not that I find it useless or whatnot (or that
> > my opinion matters to anyone but me;). I personally find the concept of
> > injecting an RTOS into a general purpose OS with no isolation to be
> > alien. Intriguing, but very very alien.
>
> Well, let's work on the isolation piece then. We could run a regular process
> on the RT cpu and switch back when OS services are needed?
If there were isolation, that would make it much less alien to _me_.
Isolation would kinda destroy the reason it was written though. RT
application/OS is injected into the network stack, which is kinda cool,
but makes the hairs on my neck stand up.
-Mike
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 18:03 ` Christoph Lameter
2009-08-25 18:12 ` Mike Galbraith
@ 2009-08-25 19:08 ` Peter Zijlstra
2009-08-25 19:18 ` Christoph Lameter
` (2 more replies)
1 sibling, 3 replies; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-25 19:08 UTC (permalink / raw)
To: Christoph Lameter
Cc: Mike Galbraith, raz ben yehuda, riel, mingo, andrew motron,
wiseman, lkml, linux-rt-users
On Tue, 2009-08-25 at 14:03 -0400, Christoph Lameter wrote:
> On Tue, 25 Aug 2009, Mike Galbraith wrote:
>
> > I asked the questions I did out of pure curiosity, and that curiosity
> > has been satisfied. It's not that I find it useless or whatnot (or that
> > my opinion matters to anyone but me;). I personally find the concept of
> > injecting an RTOS into a general purpose OS with no isolation to be
> > alien. Intriguing, but very very alien.
>
> Well, let's work on the isolation piece then. We could run a regular process
> on the RT cpu and switch back when OS services are needed?
Christoph, stop being silly, this offline scheduler thing won't happen,
full stop.
It's not a maintainable solution, it doesn't integrate with existing
kernel infrastructure, and it's plain ugly.
If you want something to work within Linux, don't build kernels in kernels
or other such ugly hacks.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 19:08 ` Peter Zijlstra
@ 2009-08-25 19:18 ` Christoph Lameter
2009-08-25 19:22 ` Chris Friesen
2009-08-25 21:09 ` Éric Piel
2 siblings, 0 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-25 19:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, raz ben yehuda, riel, mingo, andrew motron,
wiseman, lkml, linux-rt-users
On Tue, 25 Aug 2009, Peter Zijlstra wrote:
> On Tue, 2009-08-25 at 14:03 -0400, Christoph Lameter wrote:
> > On Tue, 25 Aug 2009, Mike Galbraith wrote:
> >
> > > I asked the questions I did out of pure curiosity, and that curiosity
> > > has been satisfied. It's not that I find it useless or whatnot (or that
> > > my opinion matters to anyone but me;). I personally find the concept of
> > > injecting an RTOS into a general purpose OS with no isolation to be
> > > alien. Intriguing, but very very alien.
> >
> > Well, let's work on the isolation piece then. We could run a regular process
> > on the RT cpu and switch back when OS services are needed?
>
> Christoph, stop being silly, this offline scheduler thing won't happen,
> full stop.
Well, there are still the low-latency requirements. Those need to be
addressed in some form. Some of the ideas here are a starting point.
> Its not a maintainable solution, it doesn't integrate with existing
> kernel infrastructure, and its plain ugly.
>
> If you want something work within Linux, don't build kernels in kernels
> or other such ugly hacks.
OK, so how would you go about avoiding the OS noise that motivated
the Offline Scheduler patches?
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 19:08 ` Peter Zijlstra
2009-08-25 19:18 ` Christoph Lameter
@ 2009-08-25 19:22 ` Chris Friesen
2009-08-25 20:35 ` Sven-Thorsten Dietrich
2009-08-26 5:31 ` Peter Zijlstra
2009-08-25 21:09 ` Éric Piel
2 siblings, 2 replies; 79+ messages in thread
From: Chris Friesen @ 2009-08-25 19:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Christoph Lameter, Mike Galbraith, raz ben yehuda, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On 08/25/2009 01:08 PM, Peter Zijlstra wrote:
> Christoph, stop being silly, this offline scheduler thing won't happen,
> full stop.
>
> It's not a maintainable solution, it doesn't integrate with existing
> kernel infrastructure, and it's plain ugly.
>
> If you want something to work within Linux, don't build kernels in kernels
> or other such ugly hacks.
Is it the whole concept of isolating one or more cpus from all normal
kernel tasks that you don't like, or just this particular implementation?
I ask because I know of at least one project that would have used this
capability had it been available. As it stands they have to live with
the usual kernel threads running on the cpu that they're trying to
dedicate to their app.
Chris
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 19:22 ` Chris Friesen
@ 2009-08-25 20:35 ` Sven-Thorsten Dietrich
2009-08-26 5:31 ` Peter Zijlstra
1 sibling, 0 replies; 79+ messages in thread
From: Sven-Thorsten Dietrich @ 2009-08-25 20:35 UTC (permalink / raw)
To: Chris Friesen
Cc: Peter Zijlstra, Christoph Lameter, Mike Galbraith, raz ben yehuda,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Tue, 2009-08-25 at 13:22 -0600, Chris Friesen wrote:
> On 08/25/2009 01:08 PM, Peter Zijlstra wrote:
>
> > Christoph, stop being silly, this offline scheduler thing won't happen,
> > full stop.
> >
> > It's not a maintainable solution, it doesn't integrate with existing
> > kernel infrastructure, and it's plain ugly.
> >
> > If you want something to work within Linux, don't build kernels in kernels
> > or other such ugly hacks.
>
> Is it the whole concept of isolating one or more cpus from all normal
> kernel tasks that you don't like, or just this particular implementation?
>
> I ask because I know of at least one project that would have used this
> capability had it been available. As it stands they have to live with
> the usual kernel threads running on the cpu that they're trying to
> dedicate to their app.
>
It's already possible to *almost* vacate a CPU, except for a handful of
kernel threads.
There are various hacks being distributed which also offload or suppress
timer and RCU activity on specific CPUs.
Everything I have looked at has been hackish and racy, and no one using
this is pushing any of it upstream.
OFFLINING solves the problem in a minimalist way, and only for tasks
with very limited interaction with the kernel.
In contrast, however, almost all tasks with such limited kernel
interaction should be able to do fine under PREEMPT_RT after some cpuset
work.
For those which absolutely cannot handle a handful of kernel threads
sharing the CPU, the only option today is one or another form of
hackery, and amongst those options this would seem attractive by its
mere simplicity.
But complete CPU isolation for user-space tasks still eludes us.
Sven
> Chris
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 19:08 ` Peter Zijlstra
2009-08-25 19:18 ` Christoph Lameter
2009-08-25 19:22 ` Chris Friesen
@ 2009-08-25 21:09 ` Éric Piel
2 siblings, 0 replies; 79+ messages in thread
From: Éric Piel @ 2009-08-25 21:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Christoph Lameter, Mike Galbraith, raz ben yehuda, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On 25-08-09 21:08, Peter Zijlstra wrote:
> On Tue, 2009-08-25 at 14:03 -0400, Christoph Lameter wrote:
>> On Tue, 25 Aug 2009, Mike Galbraith wrote:
>>
>>> I asked the questions I did out of pure curiosity, and that curiosity
>>> has been satisfied. It's not that I find it useless or whatnot (or that
>>> my opinion matters to anyone but me;). I personally find the concept of
>>> injecting an RTOS into a general purpose OS with no isolation to be
>>> alien. Intriguing, but very very alien.
>> Well, let's work on the isolation piece then. We could run a regular process
>> on the RT cpu and switch back when OS services are needed?
>
> Christoph, stop being silly, this offline scheduler thing won't happen,
> full stop.
>
> It's not a maintainable solution, it doesn't integrate with existing
> kernel infrastructure, and it's plain ugly.
>
> If you want something to work within Linux, don't build kernels in kernels
> or other such ugly hacks.
Hello,
For those interested in such an approach, you can have a look at a now
unmaintained project that we developed, ARTiS:
http://www2.lifl.fr/west/artis/
It allows several RT tasks to share an "RT" cpu, and if a task tries to
"cheat" by calling a kernel function which disables preemption or
interrupts, it is temporarily migrated to another CPU. This is a
working approach, with some good low-latency results, which can be seen
in the papers on the website.
See you,
Eric
^ permalink raw reply [flat|nested] 79+ messages in thread
* Fwd: RFC: THE OFFLINE SCHEDULER
[not found] ` <5d96567b0908251522m3fd4ab98n76a52a34a11e874c@mail.gmail.com>
@ 2009-08-25 22:32 ` Raz
0 siblings, 0 replies; 79+ messages in thread
From: Raz @ 2009-08-25 22:32 UTC (permalink / raw)
To: Linux Kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 1209 bytes --]
There are disturbances other than interrupts.
Attached is a first draft for the 11th real-time conference (if it
is accepted).
tar zxvf offsched.tgz
cd paper
make
kpdf offsched.pdf
Thank you,
raz
On Tue, Aug 25, 2009 at 9:12 PM, Mike Galbraith <efault@gmx.de> wrote:
>
> On Tue, 2009-08-25 at 14:03 -0400, Christoph Lameter wrote:
> > On Tue, 25 Aug 2009, Mike Galbraith wrote:
> >
> > > I asked the questions I did out of pure curiosity, and that curiosity
> > > has been satisfied. It's not that I find it useless or whatnot (or that
> > > my opinion matters to anyone but me;). I personally find the concept of
> > > injecting an RTOS into a general purpose OS with no isolation to be
> > > alien. Intriguing, but very very alien.
> >
> > Well, let's work on the isolation piece then. We could run a regular process
> > on the RT cpu and switch back when OS services are needed?
>
> If there were isolation, that would make it much less alien to _me_.
> Isolation would kinda destroy the reason it was written though. RT
> application/OS is injected into the network stack, which is kinda cool,
> but makes the hairs on my neck stand up.
>
> -Mike
>
[-- Attachment #2: offsched.tgz --]
[-- Type: application/x-gzip, Size: 117378 bytes --]
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-25 19:22 ` Chris Friesen
2009-08-25 20:35 ` Sven-Thorsten Dietrich
@ 2009-08-26 5:31 ` Peter Zijlstra
2009-08-26 10:29 ` raz ben yehuda
2009-08-26 15:21 ` Pekka Enberg
1 sibling, 2 replies; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-26 5:31 UTC (permalink / raw)
To: Chris Friesen
Cc: Christoph Lameter, Mike Galbraith, raz ben yehuda, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On Tue, 2009-08-25 at 13:22 -0600, Chris Friesen wrote:
> On 08/25/2009 01:08 PM, Peter Zijlstra wrote:
>
> > Christoph, stop being silly, this offline scheduler thing won't happen,
> > full stop.
> >
> > It's not a maintainable solution, it doesn't integrate with existing
> > kernel infrastructure, and it's plain ugly.
> >
> > If you want something to work within Linux, don't build kernels in kernels
> > or other such ugly hacks.
>
> Is it the whole concept of isolating one or more cpus from all normal
> kernel tasks that you don't like, or just this particular implementation?
>
> I ask because I know of at least one project that would have used this
> capability had it been available. As it stands they have to live with
> the usual kernel threads running on the cpu that they're trying to
> dedicate to their app.
It's the simple fact of going around the kernel instead of using the
kernel.
Going around the kernel doesn't benefit anybody, least of all Linux.
So it's the concept of running stuff on a CPU outside of Linux that I
don't like. I mean, if you want that, go ahead and run RTLinux, RTAI,
L4-Linux, etc.; there are lots of special non-Linux hypervisor/exo-kernel-like
things around for you to run things outside Linux with.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 10:29 ` raz ben yehuda
@ 2009-08-26 8:02 ` Mike Galbraith
2009-08-26 8:16 ` Raz
2009-08-26 13:47 ` Christoph Lameter
1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2009-08-26 8:02 UTC (permalink / raw)
To: raz ben yehuda
Cc: Peter Zijlstra, Chris Friesen, Christoph Lameter, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 13:29 +0300, raz ben yehuda wrote:
> OFFSCHED is a bad name for my project. My project is called SOS = Service
> Oriented System.
> SOS has nothing to do with real time.
??
The paper you pointed me at maintains it's very much about realtime.
<quote>
This paper argues that OFFSCHED fits the niche of multiprocessor
real-time systems by partitioning a system into two: the operating system
and OFFSCHED. OFFSCHED is a hybrid system. It is hybrid because it is
both real time and still a regular Linux server. Real time is mainly
achieved by the NMI characteristic and the CPU isolation. It is a hybrid
system because the OFFSCHED scheduler interacts with the operating system.
</quote>
-Mike.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 8:02 ` Mike Galbraith
@ 2009-08-26 8:16 ` Raz
0 siblings, 0 replies; 79+ messages in thread
From: Raz @ 2009-08-26 8:16 UTC (permalink / raw)
To: Mike Galbraith
Cc: Peter Zijlstra, Chris Friesen, Christoph Lameter, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On Wed, Aug 26, 2009 at 10:02 AM, Mike Galbraith<efault@gmx.de> wrote:
> On Wed, 2009-08-26 at 13:29 +0300, raz ben yehuda wrote:
>
>> OFFSCHED is a bad name for my project. My project is called SOS = Service
>> Oriented System.
>> SOS has nothing to do with real time.
>
> ??
>
> The paper you pointed me at maintains it's very much about realtime.
>
> <quote>
> This paper argues that OFFSCHED fits the niche of multiprocessor
> real-time systems by partitioning a system into two: the operating system
> and OFFSCHED. OFFSCHED is a hybrid system. It is hybrid because it is
> both real time and still a regular Linux server. Real time is mainly
> achieved by the NMI characteristic and the CPU isolation. It is a hybrid
> system because the OFFSCHED scheduler interacts with the operating system.
> </quote>
>
> -Mike.
>
>
Hello Mike,
Correct. OFFSCHED has a real-time facet that puts it in the SMP
real-time system arena. It has other facets too, such as security and
monitoring. If you take a look at OFFSCHED-RTOP and OFFSCHED-SECURED
you will see that I can actually get information and change system
properties (a very poor implementation so far, I am a bit tired...)
even if the kernel is not **accessible**.
Raz
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 5:31 ` Peter Zijlstra
@ 2009-08-26 10:29 ` raz ben yehuda
2009-08-26 8:02 ` Mike Galbraith
2009-08-26 13:47 ` Christoph Lameter
2009-08-26 15:21 ` Pekka Enberg
1 sibling, 2 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 10:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Friesen, Christoph Lameter, Mike Galbraith, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 07:31 +0200, Peter Zijlstra wrote:
> On Tue, 2009-08-25 at 13:22 -0600, Chris Friesen wrote:
> > On 08/25/2009 01:08 PM, Peter Zijlstra wrote:
> >
> > > Christoph, stop being silly, this offline scheduler thing won't happen,
> > > full stop.
> > >
> > > It's not a maintainable solution, it doesn't integrate with existing
> > > kernel infrastructure, and it's plain ugly.
> > >
> > > If you want something to work within Linux, don't build kernels in kernels
> > > or other such ugly hacks.
> >
> > Is it the whole concept of isolating one or more cpus from all normal
> > kernel tasks that you don't like, or just this particular implementation?
> >
> > I ask because I know of at least one project that would have used this
> > capability had it been available. As it stands they have to live with
> > the usual kernel threads running on the cpu that they're trying to
> > dedicate to their app.
>
> It's the simple fact of going around the kernel instead of using the
> kernel.
>
> Going around the kernel doesn't benefit anybody, least of all Linux.
>
> So it's the concept of running stuff on a CPU outside of Linux that I
> don't like. I mean, if you want that, go ahead and run RTLinux, RTAI,
> L4-Linux, etc. There are plenty of special non-Linux hypervisor/exo-kernel-like
> things around for running things outside Linux.
Hello Peter, Hello All.
First, it is a pleasure to see that you take an interest in OFFSCHED.
So thank you.
In my opinion, this is a matter of defining what a system is. Queuing
theory teaches us that a system is everything within the boundary of the
computer; this includes peripherals, processors, RAM, the operating
system, the distribution and so on.
The kernel is merely a part of the SYSTEM, it is not THE SYSTEM; and it
is not blasphemy to bypass it. The kernel is not the goal and it is not
sacred.
OFFSCHED is a bad name for my project. My project is called SOS = Service
Oriented System.
SOS has nothing to do with real time. SOS is about arranging the
processors to serve the SYSTEM the best way we can; if the kernel
disturbs the service, put it aside, I say.
How is the kernel going to handle 32-processor machines? These
numbers are no longer science fiction.
What I am suggesting is merely a different approach to handling
multi-core systems. Instead of thinking in processes, threads and so
on, I am thinking in services. Why not take a processor and define this
processor to do just firewalling? Encryption? Routing? Transmission?
Video processing... and so on...
Raz
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 10:29 ` raz ben yehuda
2009-08-26 8:02 ` Mike Galbraith
@ 2009-08-26 13:47 ` Christoph Lameter
2009-08-26 14:45 ` Maxim Levitsky
1 sibling, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 13:47 UTC (permalink / raw)
To: raz ben yehuda
Cc: Peter Zijlstra, Chris Friesen, Mike Galbraith, riel, mingo,
andrew motron, wiseman, lkml, linux-rt-users
On Wed, 26 Aug 2009, raz ben yehuda wrote:
> How is the kernel going to handle 32-processor machines? These
> numbers are no longer science fiction.
The kernel is already running on 4096-processor machines. Don't worry about
that.
> What I am suggesting is merely a different approach to handling
> multi-core systems. Instead of thinking in processes, threads and so
> on, I am thinking in services. Why not take a processor and define this
> processor to do just firewalling? Encryption? Routing? Transmission?
> Video processing... and so on...
I think that is a valuable avenue to explore. What we do so far is
treat each processor equally. Dedicating a processor has benefits in
terms of cache hotness and limits OS noise.
Most of the large processor configurations already partition the system
using cpusets in order to limit the disturbance from OS processing. A set of
cpus is used for OS activities, and system daemons are put into that set.
But what can be done is limited, because the OS threads as well as
interrupt and timer processing etc. cannot currently be moved. The ideas
that you are proposing are particularly useful for applications that
require low latencies and cannot easily tolerate OS noise (Infiniband MPI
based jobs, for example).
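The cpuset partitioning described above is driven entirely from userspace through the cpuset pseudo-filesystem. A minimal sketch of that side of it, assuming the legacy cpuset hierarchy is mounted at `root`; the helper names here are illustrative, not part of any kernel API:

```python
import os

def make_cpuset(root, name, cpus, mems="0"):
    """Create a cpuset under `root` (e.g. a mounted cpuset fs) and
    confine it to the given CPU list and memory nodes."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    # On a real cpuset mount these control files already exist;
    # writing to them restricts every task later placed in the set.
    with open(os.path.join(path, "cpuset.cpus"), "w") as f:
        f.write(cpus)
    with open(os.path.join(path, "cpuset.mems"), "w") as f:
        f.write(mems)
    return path

def move_task(cset, pid):
    """Move `pid` into the cpuset by writing it to the `tasks` file."""
    with open(os.path.join(cset, "tasks"), "a") as f:
        f.write("%d\n" % pid)
```

On a large machine one would create, say, a `sys` set holding CPUs 0-1 for daemons and OS activity, leaving the rest for the latency-sensitive application. Kernel threads, interrupts, and timers still escape this scheme, which is exactly the limitation noted above.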
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 13:47 ` Christoph Lameter
@ 2009-08-26 14:45 ` Maxim Levitsky
2009-08-26 14:54 ` raz ben yehuda
0 siblings, 1 reply; 79+ messages in thread
From: Maxim Levitsky @ 2009-08-26 14:45 UTC (permalink / raw)
To: Christoph Lameter
Cc: raz ben yehuda, Peter Zijlstra, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 09:47 -0400, Christoph Lameter wrote:
> On Wed, 26 Aug 2009, raz ben yehuda wrote:
>
> > How is the kernel going to handle 32-processor machines? These
> > numbers are no longer science fiction.
>
> The kernel is already running on 4096-processor machines. Don't worry about
> that.
>
> > What I am suggesting is merely a different approach to handling
> > multi-core systems. Instead of thinking in processes, threads and so
> > on, I am thinking in services. Why not take a processor and define this
> > processor to do just firewalling? Encryption? Routing? Transmission?
> > Video processing... and so on...
>
> I think that is a valuable avenue to explore. What we do so far is
> treat each processor equally. Dedicating a processor has benefits in
> terms of cache hotness and limits OS noise.
>
> Most of the large processor configurations already partition the system
> using cpusets in order to limit the disturbance from OS processing. A set of
> cpus is used for OS activities, and system daemons are put into that set.
> But what can be done is limited, because the OS threads as well as
> interrupt and timer processing etc. cannot currently be moved. The ideas
> that you are proposing are particularly useful for applications that
> require low latencies and cannot easily tolerate OS noise (Infiniband MPI
> based jobs, for example).
My 0.2 cents:
I have always been fascinated by the idea of controlling another cpu
from the main CPU.
Usually these cpus are custom, run proprietary software, and have no
datasheet on their I/O interfaces.
But, being able to turn an ordinary CPU into something like that seems
to be very nice.
For example, it might help with profiling. Think about a program that
can run uninterrupted for as long as it wants.
It might even be better if the dedicated CPU would use a predefined
reserved memory range (I wish there were a way to actually lock it to
that range).
On the other hand, I could see this as a jump platform for more
proprietary code, something like: we use Linux in our server
platform, but our "insert buzzword here" network stack pro+ can handle
100% more load than Linux does, and it runs on a dedicated core...
In other words, we might see 'firmwares' that take an entire cpu for
their own use.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 14:45 ` Maxim Levitsky
@ 2009-08-26 14:54 ` raz ben yehuda
2009-08-26 15:06 ` Pekka Enberg
` (2 more replies)
0 siblings, 3 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 14:54 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Christoph Lameter, Peter Zijlstra, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 17:45 +0300, Maxim Levitsky wrote:
> On Wed, 2009-08-26 at 09:47 -0400, Christoph Lameter wrote:
> > On Wed, 26 Aug 2009, raz ben yehuda wrote:
> >
> > > How is the kernel going to handle 32-processor machines? These
> > > numbers are no longer science fiction.
> >
> > The kernel is already running on 4096-processor machines. Don't worry about
> > that.
> >
> > > What I am suggesting is merely a different approach to handling
> > > multi-core systems. Instead of thinking in processes, threads and so
> > > on, I am thinking in services. Why not take a processor and define this
> > > processor to do just firewalling? Encryption? Routing? Transmission?
> > > Video processing... and so on...
> >
> > I think that is a valuable avenue to explore. What we do so far is
> > treat each processor equally. Dedicating a processor has benefits in
> > terms of cache hotness and limits OS noise.
> >
> > Most of the large processor configurations already partition the system
> > using cpusets in order to limit the disturbance from OS processing. A set of
> > cpus is used for OS activities, and system daemons are put into that set.
> > But what can be done is limited, because the OS threads as well as
> > interrupt and timer processing etc. cannot currently be moved. The ideas
> > that you are proposing are particularly useful for applications that
> > require low latencies and cannot easily tolerate OS noise (Infiniband MPI
> > based jobs, for example).
>
> My 0.2 cents:
>
> I have always been fascinated by the idea of controlling another cpu
> from the main CPU.
>
> Usually these cpus are custom, run proprietary software, and have no
> datasheet on their I/O interfaces.
>
> But, being able to turn an ordinary CPU into something like that seems
> to be very nice.
>
> For example, it might help with profiling. Think about a program that
> can run uninterrupted for as long as it wants.
>
> It might even be better if the dedicated CPU would use a predefined
> reserved memory range (I wish there were a way to actually lock it to
> that range).
>
> On the other hand, I could see this as a jump platform for more
> proprietary code, something like: we use Linux in our server
> platform, but our "insert buzzword here" network stack pro+ can handle
> 100% more load than Linux does, and it runs on a dedicated core...
>
> In other words, we might see 'firmwares' that take an entire cpu for
> their own use.
This is exactly what offsched (SOS) is. You got it. SOS was partly inspired by the notion of a GPU.
Processors are becoming more and more plentiful, and Linux, as an
evolutionary system, must use them. Why not offload the raid5 write engine?
Why not encrypt on a different processor?
Also, having so many processors in a single OS means a bug-prone
system, with endless contention points when two or more OS processors
interact. Let's make things simpler.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 14:54 ` raz ben yehuda
@ 2009-08-26 15:06 ` Pekka Enberg
2009-08-26 15:11 ` raz ben yehuda
2009-08-26 15:30 ` Peter Zijlstra
2009-08-26 15:37 ` Chetan.Loke
2 siblings, 1 reply; 79+ messages in thread
From: Pekka Enberg @ 2009-08-26 15:06 UTC (permalink / raw)
To: raz ben yehuda
Cc: Maxim Levitsky, Christoph Lameter, Peter Zijlstra, Chris Friesen,
Mike Galbraith, riel, mingo, andrew motron, wiseman, lkml,
linux-rt-users
On Wed, Aug 26, 2009 at 5:54 PM, raz ben yehuda<raziebe@gmail.com> wrote:
>> I have always been fascinated by the idea of controlling another cpu
>> from the main CPU.
>>
>> Usually these cpus are custom, run proprietary software, and have no
>> datasheet on their I/O interfaces.
>>
>> But, being able to turn an ordinary CPU into something like that seems
>> to be very nice.
>>
>> For example, it might help with profiling. Think about a program that
>> can run uninterrupted for as long as it wants.
>>
>> It might even be better if the dedicated CPU would use a predefined
>> reserved memory range (I wish there were a way to actually lock it to
>> that range).
>>
>> On the other hand, I could see this as a jump platform for more
>> proprietary code, something like: we use Linux in our server
>> platform, but our "insert buzzword here" network stack pro+ can handle
>> 100% more load than Linux does, and it runs on a dedicated core...
>>
>> In other words, we might see 'firmwares' that take an entire cpu for
>> their own use.
>
> This is exactly what offsched (SOS) is. You got it. SOS was partly inspired by the notion of a GPU.
So where are the patches? The URL in the original post returns 404...
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 15:06 ` Pekka Enberg
@ 2009-08-26 15:11 ` raz ben yehuda
0 siblings, 0 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 15:11 UTC (permalink / raw)
To: Pekka Enberg
Cc: Maxim Levitsky, Christoph Lameter, Peter Zijlstra, Chris Friesen,
Mike Galbraith, riel, mingo, andrew motron, wiseman, lkml,
linux-rt-users
SOS Linux is at:
http://sos-linux.svn.sourceforge.net/viewvc/sos-linux/offsched/
You will find the modules, one-shot patches, split patches, and a
Documentation folder.
On Wed, 2009-08-26 at 18:06 +0300, Pekka Enberg wrote:
> On Wed, Aug 26, 2009 at 5:54 PM, raz ben yehuda<raziebe@gmail.com> wrote:
> >> I have always been fascinated by the idea of controlling another cpu
> >> from the main CPU.
> >>
> >> Usually these cpus are custom, run proprietary software, and have no
> >> datasheet on their I/O interfaces.
> >>
> >> But, being able to turn an ordinary CPU into something like that seems
> >> to be very nice.
> >>
> >> For example, it might help with profiling. Think about a program that
> >> can run uninterrupted for as long as it wants.
> >>
> >> It might even be better if the dedicated CPU would use a predefined
> >> reserved memory range (I wish there were a way to actually lock it to
> >> that range).
> >>
> >> On the other hand, I could see this as a jump platform for more
> >> proprietary code, something like: we use Linux in our server
> >> platform, but our "insert buzzword here" network stack pro+ can handle
> >> 100% more load than Linux does, and it runs on a dedicated core...
> >>
> >> In other words, we might see 'firmwares' that take an entire cpu for
> >> their own use.
> >
> > This is exactly what offsched (SOS) is. You got it. SOS was partly inspired by the notion of a GPU.
>
> So where are the patches? The URL in the original post returns 404...
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 5:31 ` Peter Zijlstra
2009-08-26 10:29 ` raz ben yehuda
@ 2009-08-26 15:21 ` Pekka Enberg
1 sibling, 0 replies; 79+ messages in thread
From: Pekka Enberg @ 2009-08-26 15:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Friesen, Christoph Lameter, Mike Galbraith, raz ben yehuda,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
Hi Peter,
On Wed, Aug 26, 2009 at 8:31 AM, Peter Zijlstra<peterz@infradead.org> wrote:
>> Is it the whole concept of isolating one or more cpus from all normal
>> kernel tasks that you don't like, or just this particular implementation?
>>
>> I ask because I know of at least one project that would have used this
>> capability had it been available. As it stands they have to live with
>> the usual kernel threads running on the cpu that they're trying to
>> dedicate to their app.
>
> It's the simple fact of going around the kernel instead of using the
> kernel.
>
> Going around the kernel doesn't benefit anybody, least of all Linux.
>
> So it's the concept of running stuff on a CPU outside of Linux that I
> don't like. I mean, if you want that, go ahead and run RTLinux, RTAI,
> L4-Linux, etc. There are plenty of special non-Linux hypervisor/exo-kernel-like
> things around for running things outside Linux.
Out of curiosity, what's the problem with it? Why can't the scheduler
be taught to bind one user-space thread to a given CPU and make sure
no other threads are scheduled on that CPU? I'm not a scheduler expert,
but that seems like a logical extension to the current cpuset logic,
and it would help the low-latency workload Christoph has described in the
past.
Pekka
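The binding asked about here already half-exists: any thread can pin itself to one CPU with the sched_setaffinity() syscall; what the stock kernel does not guarantee is that nothing else runs there. A sketch of the userspace half, using Python's Linux-only wrapper for the same syscall:

```python
import os

def pin_to_cpu(cpu):
    """Pin the calling thread to a single CPU and return its new mask.

    This is only half of what is being asked for: the kernel will still
    schedule other runnable tasks on that CPU unless it is otherwise
    isolated (e.g. via the isolcpus= boot parameter or an exclusive cpuset).
    """
    os.sched_setaffinity(0, {cpu})   # 0 means the calling thread
    return os.sched_getaffinity(0)

allowed = pin_to_cpu(0)
print(sorted(allowed))               # the thread is now confined to CPU 0
```

The missing other half, evicting kernel threads, interrupts, and the tick from that CPU, is precisely what the thread is arguing about.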
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 14:54 ` raz ben yehuda
2009-08-26 15:06 ` Pekka Enberg
@ 2009-08-26 15:30 ` Peter Zijlstra
2009-08-26 15:41 ` Christoph Lameter
2009-08-26 15:37 ` Chetan.Loke
2 siblings, 1 reply; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-26 15:30 UTC (permalink / raw)
To: raz ben yehuda
Cc: Maxim Levitsky, Christoph Lameter, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 17:54 +0300, raz ben yehuda wrote:
> This is exactly what offsched (sos) is. you got it. SOS was partly
> inspired by the notion of a GPU.
It is not. GPUs and other paired chips form a hybrid system. Linux is
known to run on one or more of such chips and communicate through
whatever means those chips have.
But what you propose here is hard partitioning of a homogeneous system,
which is totally different.
> Processors are to become more and more redundant and Linux as an
> evolutionary system must use it. why not offload raid5 write engine ?
> why not encrypt in a different processor ?
Why waste a whole cpu for something that could be done by part of one?
> Also , having so many processors in a single OS means a bug prone
> system , with endless contention points when two or more OS processors
> interacts.
You're bound to have interaction between the core OS and these
partitions you want, none of it different from how threads in the kernel
would interact, except that you're going to re-invent everything
already present in the kernel.
> let's make things simpler.
You don't; you make things more complex by introducing duplicate
functionality.
What's more, you burden the user with having to configure such a system
and make choices about giving up parts of his system; nothing like
that should be needed on a homogeneous system.
Work spent on trimming fat off the core kernel helps everybody, even
users not otherwise interested in things like giving up a whole cpu for
some odd purpose.
There is no reason something could be done more efficiently on a
dedicated CPU than not when you assume a homogeneous system (which is
all Linux supports in the single image sense).
If you think the kernel is too fat and does superfluous things for your
needs, help trim it.
^ permalink raw reply [flat|nested] 79+ messages in thread
* RE: RFC: THE OFFLINE SCHEDULER
2009-08-26 14:54 ` raz ben yehuda
2009-08-26 15:06 ` Pekka Enberg
2009-08-26 15:30 ` Peter Zijlstra
@ 2009-08-26 15:37 ` Chetan.Loke
2 siblings, 0 replies; 79+ messages in thread
From: Chetan.Loke @ 2009-08-26 15:37 UTC (permalink / raw)
To: raziebe, maximlevitsky
Cc: cl, peterz, cfriesen, efault, riel, mingo, akpm, wiseman,
linux-kernel, linux-rt-users
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of raz ben yehuda
> Sent: Wednesday, August 26, 2009 10:54 AM
> To: Maxim Levitsky
> Cc: Christoph Lameter; Peter Zijlstra; Chris Friesen; Mike Galbraith;
> riel@redhat.com; mingo@elte.hu; andrew motron; wiseman@macs.biu.ac.il;
> lkml; linux-rt-users@vger.kernel.org
> Subject: Re: RFC: THE OFFLINE SCHEDULER
>
>
> This is exactly what offsched (SOS) is. You got it. SOS was partly
> inspired by the notion of a GPU.
> Processors are becoming more and more plentiful, and Linux, as an
> evolutionary system, must use them. Why not offload the raid5 write engine?
> Why not encrypt on a different processor?
RAID/Encryption + GPU. You got it. This is what one of our teams did, but by
offloading it onto a PCIe I/O module and using a couple of cores (a protocol
core plus an application core). One core could run SAS/SATA and the other
could run your home-grown application firmware and/or a Linux distro, and you
could make it do whatever you want. But that was then. Multi-core systems are
now a commodity.
Chetan Loke
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 15:30 ` Peter Zijlstra
@ 2009-08-26 15:41 ` Christoph Lameter
2009-08-26 16:03 ` Peter Zijlstra
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 15:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: raz ben yehuda, Maxim Levitsky, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 26 Aug 2009, Peter Zijlstra wrote:
> Why waste a whole cpu for something that could be done by part of one?
Because of latency and performance requirements.
> You're bound to have interaction between the core os and these
> partitions you want, non of it different from how threads in the kernel
> would interact, other than that you're going to re-invent everything
> already present in the kernel.
The kernel interactions can be done while running on another (not
isolated) cpu.
> You don't, you make things more complex by introducing duplicate
> functionality.
The functionality does not exist. This is about new features.
> If you think the kernel is too fat and does superfluous things for your
> needs, help trim it.
Mind-boggling nonsense. Please stop fantasizing and trolling.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 15:41 ` Christoph Lameter
@ 2009-08-26 16:03 ` Peter Zijlstra
2009-08-26 16:16 ` Pekka Enberg
2009-08-26 16:20 ` Christoph Lameter
0 siblings, 2 replies; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-26 16:03 UTC (permalink / raw)
To: Christoph Lameter
Cc: raz ben yehuda, Maxim Levitsky, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 11:41 -0400, Christoph Lameter wrote:
> On Wed, 26 Aug 2009, Peter Zijlstra wrote:
>
> > Why waste a whole cpu for something that could be done by part of one?
>
> Because of latency and performance requirements
Latency is the only one, and yes, people have been using hacks like this;
I've also earlier mentioned RTAI, RTLinux and L4-Linux, which basically
do the same thing.
The problem is that it's not Linux. You cannot run something on these
off-cores and use the same functionality as Linux; if you could, it'd not
be offline.
The past year or so you've been whining about the tick latency, and I've
seen exactly _0_ patches from you slimming down the work done in there,
even though I pointed out some obvious things that could be done.
Carving out cpus just doesn't work in the long run (see below for more);
it adds configuration burdens on people, and it would duplicate
functionality (below), or provide it in a (near) useless manner.
If you were to work on lowering the Linux latency in the full-kernel
sense, you'd help out a lot of people, many use-cases would improve, and
you'd be helpful to the greater good.
If you hack up special cases like this, then only your one use-case gets
better and the rest doesn't, or it might actually get worse, because it
got less attention.
> > You're bound to have interaction between the core os and these
> > partitions you want, non of it different from how threads in the kernel
> > would interact, other than that you're going to re-invent everything
> > already present in the kernel.
>
> The kernel interactions can be done while running on another (not
> isolated) cpu.
There needs to be some communication between the isolated and
non-isolated parts, otherwise what's the point? Even when you'd let it
handle, say, a network device as a pure firewall, you'd need to configure
the thing, requiring interaction.
Interaction of any sort brings serialization requirements, and from there
on things tend to grow.
> > You don't, you make things more complex by introducing duplicate
> > functionality.
>
> The functionality does not exist. This is about new features.
It is not; he is proposing to use these cores for:
- network stuff, which we already have
- raid5 stuff, which we already have
- other stuff we already have
Then there is the issue of what happens when a single core isn't
sufficient for the given task; then you'd need to split up, again
creating more interaction.
> > If you think the kernel is too fat and does superfluous things for your
> > needs, help trim it.
>
> Mind boogling nonsense. Please stop fantasizing and trolling.
Oh, to lay down the crack-pipe and sod off.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 16:03 ` Peter Zijlstra
@ 2009-08-26 16:16 ` Pekka Enberg
2009-08-26 16:20 ` Christoph Lameter
1 sibling, 0 replies; 79+ messages in thread
From: Pekka Enberg @ 2009-08-26 16:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Christoph Lameter, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, mingo, andrew motron, wiseman, lkml,
linux-rt-users
Hi Peter,
On Wed, Aug 26, 2009 at 7:03 PM, Peter Zijlstra<peterz@infradead.org> wrote:
> There needs to be some communication between the isolated and non
> isolated part, otherwise what's the point. Even when you'd let it handle
> say a network device as pure firewall, you'd need to configure the
> thing, requiring interaction.
The use case Christoph described was a user-space number-cruncher app
that does some network I/O over RDMA, IIRC. AFAICT, if he could isolate
a physical CPU for the thing, there would be little or no
communication with the non-isolated part. Yes, the setup sounds weird,
but it's a real workload, although pretty damn specialized.
On Wed, Aug 26, 2009 at 7:03 PM, Peter Zijlstra<peterz@infradead.org> wrote:
>> > If you think the kernel is too fat and does superfluous things for your
>> > needs, help trim it.
>>
>> Mind boggling nonsense. Please stop fantasizing and trolling.
>
> Oh, to lay down the crack-pipe and sod off.
I guess I'll go for the magic mushrooms then.
Pekka
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 16:03 ` Peter Zijlstra
2009-08-26 16:16 ` Pekka Enberg
@ 2009-08-26 16:20 ` Christoph Lameter
2009-08-26 18:04 ` Ingo Molnar
1 sibling, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 16:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: raz ben yehuda, Maxim Levitsky, Chris Friesen, Mike Galbraith,
riel, mingo, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 26 Aug 2009, Peter Zijlstra wrote:
> On Wed, 2009-08-26 at 11:41 -0400, Christoph Lameter wrote:
> > On Wed, 26 Aug 2009, Peter Zijlstra wrote:
> >
> > > Why waste a whole cpu for something that could be done by part of one?
> >
> > Because of latency and performance requirements
>
> Latency is the only one, and yes people have been using hacks like this,
> I've also earlier mentioned RTAI, RTLinux and L4-Linux which basically
> do the same thing.
>
> The problem is that it's not linux: you cannot run something on these
> off-cores and use the same functionality as linux; if you could, it'd
> not be offline.
Right. We discussed this. Why are you repeating the same old arguments?
> Carving out cpus just doesn't work in the long run (see below for more),
> it adds configuration burdens on people and it would duplicate
> functionality (below), or it provides it in a (near) useless manner.
It's pretty simple. Just isolate the cpu and forbid the OS to run anything
on it. Allow a user-space process to change its affinity to the isolated
cpu. Should the process be so stupid as to ask the OS for services, then
just switch it back to a regular processor. Interaction is still possible
via shared-memory communication as well as memory-mapped devices.
> If you hack up special cases like this, then only your one use-case gets
> better and the rest doesn't, or it might actually get worse, because it
> got less attention.
What special case? This is a generic mechanism.
> > The kernel interactions can be done while running on another (not
> > isolated) cpu.
>
> There needs to be some communication between the isolated and non
> isolated part, otherwise what's the point. Even when you'd let it handle
> say a network device as pure firewall, you'd need to configure the
> thing, requiring interaction.
Shared memory, memory mapped devices?
> Interaction of any sort gets serialization requirements, and from there
> on things tend to grow.
Yes, and there are mechanisms that provide the serialization without OS
services.
> > The functionality does not exist. This is about new features.
>
> It is not, he is proposing to use these cores for:
>
> - network stuff, we already have that
> - raid5 stuff, we already have that
> - other stuff we already have
Right. I also want to use it for network stuff: Infiniband, which supports
memory-mapped registers and such. It's generic, not special as you state.
> Then there is the issue of what happens when a single core isn't
> sufficient for the given task, then you'd need to split up, again
> creating more interaction.
Well yes you need to create synchronization methods that do not require OS
interaction.
> > > If you think the kernel is too fat and does superfluous things for your
> > > needs, help trim it.
> >
> > Mind boggling nonsense. Please stop fantasizing and trolling.
>
> Oh, to lay down the crack-pipe and sod off.
Don't have one here. What's a sod off?
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 16:20 ` Christoph Lameter
@ 2009-08-26 18:04 ` Ingo Molnar
2009-08-26 19:15 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2009-08-26 18:04 UTC (permalink / raw)
To: Christoph Lameter
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
* Christoph Lameter <cl@linux-foundation.org> wrote:
> On Wed, 26 Aug 2009, Peter Zijlstra wrote:
>
> > On Wed, 2009-08-26 at 11:41 -0400, Christoph Lameter wrote:
> > > On Wed, 26 Aug 2009, Peter Zijlstra wrote:
> > >
> > > > Why waste a whole cpu for something that could be done by part of one?
> > >
> > > Because of latency and performance requirements
> >
> > Latency is the only one, and yes people have been using hacks
> > like this, I've also earlier mentioned RTAI, RTLinux and
> > L4-Linux which basically do the same thing.
> >
> > The problem is that it's not linux: you cannot run something on
> > these off-cores and use the same functionality as linux; if you
> > could, it'd not be offline.
>
> Right. We discussed this. Why are you repeating the same old
> arguments?
The thing is, you have cut out (and have not replied to) this
crucial bit of what Peter wrote:
> > The past year or so you've been whining about the tick latency,
> > and I've seen exactly _0_ patches from you slimming down the
> > work done in there, even though I pointed out some obvious
> > things that could be done.
... which pretty much settles the issue as far as i'm concerned. If
you were truly interested in a constructive solution to lower
latencies in Linux you should have sent patches already for the low
hanging fruits Peter pointed out.
Ingo
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 18:04 ` Ingo Molnar
@ 2009-08-26 19:15 ` Christoph Lameter
2009-08-26 19:32 ` Ingo Molnar
2009-08-27 7:15 ` Mike Galbraith
0 siblings, 2 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 19:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
On Wed, 26 Aug 2009, Ingo Molnar wrote:
> The thing is, you have cut out (and have not replied to) this
> crucial bit of what Peter wrote:
>
> > > The past year or so you've been whining about the tick latency,
> > > and I've seen exactly _0_ patches from you slimming down the
> > > work done in there, even though I pointed out some obvious
> > > things that could be done.
>
> ... which pretty much settles the issue as far as i'm concerned. If
> you were truly interested in a constructive solution to lower
> latencies in Linux you should have sent patches already for the low
> hanging fruits Peter pointed out.
The noise latencies were already reduced to the minimum in earlier years
(e.g. the work on slab queue cleaning). Certainly more could be done there
but that misses the point.
The point of the OFFLINE scheduler is to completely eliminate the
OS disturbances by getting rid of *all* OS processing on some cpus.
For some reason scheduler developers seem to be threatened by this idea
and they go into bizarre lines of arguments to avoid the issue. It's simple
and doable and the scheduler will still be there after we do this.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 19:15 ` Christoph Lameter
@ 2009-08-26 19:32 ` Ingo Molnar
2009-08-26 20:40 ` Christoph Lameter
2009-08-26 21:32 ` raz ben yehuda
2009-08-27 7:15 ` Mike Galbraith
1 sibling, 2 replies; 79+ messages in thread
From: Ingo Molnar @ 2009-08-26 19:32 UTC (permalink / raw)
To: Christoph Lameter
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
* Christoph Lameter <cl@linux-foundation.org> wrote:
> On Wed, 26 Aug 2009, Ingo Molnar wrote:
>
> > The thing is, you have cut out (and have not replied to) this
> > crucial bit of what Peter wrote:
> >
> > > > The past year or so you've been whining about the tick latency,
> > > > and I've seen exactly _0_ patches from you slimming down the
> > > > work done in there, even though I pointed out some obvious
> > > > things that could be done.
> >
> > ... which pretty much settles the issue as far as i'm concerned.
> > If you were truly interested in a constructive solution to lower
> > latencies in Linux you should have sent patches already for the
> > low hanging fruits Peter pointed out.
>
> The noise latencies were already reduced to the minimum in earlier
> years (e.g. the work on slab queue cleaning). Certainly more
> could be done there but that misses the point.
Peter suggested various improvements to the timer tick related
latencies _you_ were complaining about earlier this year. Those
latencies sure were not addressed 'years earlier'.
If you are unwilling to reduce the very latencies you apparently
cared and complained about then you dont have much real standing to
complain now.
( If you on the other hand were approaching this issue with
pragmatism and with intellectual honesty, if you were at the end
of a string of patches that gradually improved latencies but
couldnt get them below a certain threshold, and if scheduler
developers couldnt give you any ideas what else to improve, and
_then_ suggested some other solution, you might have a point.
You are far away from being able to claim that. )
Really, it's a straightforward application of Occam's Razor to the
scheduler. We go for the simplest solution first, and try to help
more people first, before going for some specialist hack.
> The point of the OFFLINE scheduler is to completely eliminate the
> OS disturbances by getting rid of *all* OS processing on some
> cpus.
>
> For some reason scheduler developers seem to be threatened by this
> idea and they go into bizarre lines of arguments to avoid the
> issue. It's simple and doable and the scheduler will still be there
> after we do this.
If you meant to include me in that summary categorization, i dont
feel 'threatened' by any such patches (why would i? They dont seem
to have sharp teeth nor any apparent poison fangs) - i simply concur
with the reasons Peter listed that it is a technically inferior
solution.
Ingo
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 19:32 ` Ingo Molnar
@ 2009-08-26 20:40 ` Christoph Lameter
2009-08-26 20:50 ` Andrew Morton
2009-08-26 21:08 ` Ingo Molnar
2009-08-26 21:32 ` raz ben yehuda
1 sibling, 2 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 20:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
On Wed, 26 Aug 2009, Ingo Molnar wrote:
> ( If you on the other hand were approaching this issue with
> pragmatism and with intellectual honesty, if you were at the end
> of a string of patches that gradually improved latencies but
> couldnt get them below a certain threshold, and if scheduler
> developers couldnt give you any ideas what else to improve, and
> _then_ suggested some other solution, you might have a point.
> You are far away from being able to claim that. )
Intellectual honesty? I wish I were seeing it. So far there is not even
the uptake required on your side to discuss the problem.
There is no threshold. HPC and other industries want processors as
a whole with all their abilities. They will squeeze the last bit of
performance out of them.
> to have sharp teeth nor any apparent poison fangs) - i simply concur
> with the reasons Peter listed that it is a technically inferior
> solution.
Ok, so you are saying that reducing OS latencies will make the processor
completely available, with no disturbances, the way OFFLINE scheduling does?
Peter has not given a solution to the problem. Nor have you.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:40 ` Christoph Lameter
@ 2009-08-26 20:50 ` Andrew Morton
2009-08-26 21:09 ` Christoph Lameter
` (3 more replies)
2009-08-26 21:08 ` Ingo Molnar
1 sibling, 4 replies; 79+ messages in thread
From: Andrew Morton @ 2009-08-26 20:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: mingo, peterz, raziebe, maximlevitsky, cfriesen, efault, riel,
wiseman, linux-kernel, linux-rt-users
On Wed, 26 Aug 2009 16:40:09 -0400 (EDT)
Christoph Lameter <cl@linux-foundation.org> wrote:
> Peter has not given a solution to the problem. Nor have you.
What problem?
All I've seen is "I want 100% access to a CPU". That's not a problem
statement - it's an implementation.
What is the problem statement?
My take on these patches: the kernel gives userspace unmediated access
to memory resources if it wants that. The kernel gives userspace
unmediated access to IO devices if it wants that. But for some reason
people freak out at the thought of providing unmediated access to CPU
resources.
Don't get all religious about this. If the change is clean,
maintainable and useful then there's no reason to not merge it.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:40 ` Christoph Lameter
2009-08-26 20:50 ` Andrew Morton
@ 2009-08-26 21:08 ` Ingo Molnar
2009-08-26 21:26 ` Christoph Lameter
1 sibling, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2009-08-26 21:08 UTC (permalink / raw)
To: Christoph Lameter
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
* Christoph Lameter <cl@linux-foundation.org> wrote:
> > to have sharp teeth nor any apparent poison fangs) - i simply
> > concur with the reasons Peter listed that it is a technically
> > inferior solution.
>
> Ok, so you are saying that reducing OS latencies will make the
> processor completely available, with no disturbances, the way
> OFFLINE scheduling does?
I'm saying that your lack of trying to reduce even low-hanging-fruit
latency sources that were pointed out to you fundamentally destroys
your credibility in claiming that they are unfixable for all
practical purposes.
Or, to come up with a car analogy: it's a bit as if at a repair shop
you complained that your car has a scratch on its cooler grid that
annoys you, and you insisted that it be outfitted with a new diesel
engine which needs no cooler grid (throwing away the nice Hemi block
it has currently) - and ignored the mechanic's opinion that he loves
the Hemi and that to him the scratch looks very much like bird-sh*t
and that a proper car wash might do the trick too ;-)
> Peter has not given a solution to the problem. Nor have you.
What do you mean by 'has given a solution' - a patch?
Peter mentioned a few things that you can try to reduce the
worst-case latency of the timer tick.
Peter also implemented the hr-tick solution (CONFIG_SCHED_HRTICK) -
it's mostly upstream but disabled because it had problems - if you
are interested in improving this area you can fix and complete it.
That would benefit ordinary Linux users too, not just rare isolation
apps.
Ingo
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:50 ` Andrew Morton
@ 2009-08-26 21:09 ` Christoph Lameter
2009-08-26 21:15 ` Chris Friesen
` (2 subsequent siblings)
3 siblings, 0 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 21:09 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, peterz, raziebe, maximlevitsky, cfriesen, efault, riel,
wiseman, linux-kernel, linux-rt-users
On Wed, 26 Aug 2009, Andrew Morton wrote:
> All I've seen is "I want 100% access to a CPU". That's not a problem
> statement - it's an implementation.
Maybe. But it's a problem statement that I have seen in various industries.
Multiple kernel hacks exist to do this in more or less contorted ways. We
already have Linux scheduler functionality that does part of what is
needed.
See the isolcpus kernel parameter. isolcpus does not switch off OS sources
of noise, but it takes the processor away from the scheduler. We need a
harder form of isolation, where the excluded processors offer no OS
services at all.
> What is the problem statement?
My definition (likely not covering all that the author of this patchset
wants):
How to make a processor in a multicore system completely
available to a process.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:50 ` Andrew Morton
2009-08-26 21:09 ` Christoph Lameter
@ 2009-08-26 21:15 ` Chris Friesen
2009-08-26 21:37 ` raz ben yehuda
2009-08-26 21:34 ` Ingo Molnar
2009-08-26 21:34 ` raz ben yehuda
3 siblings, 1 reply; 79+ messages in thread
From: Chris Friesen @ 2009-08-26 21:15 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Lameter, mingo, peterz, raziebe, maximlevitsky, efault,
riel, wiseman, linux-kernel, linux-rt-users
On 08/26/2009 02:50 PM, Andrew Morton wrote:
> What problem?
>
> All I've seen is "I want 100% access to a CPU". That's not a problem
> statement - it's an implementation.
>
> What is the problem statement?
I can only speak for myself...
In our case the problem statement was that we had an inherently
single-threaded emulator app that we wanted to push as hard as
absolutely possible.
We gave it as close to a whole cpu as we could using cpu and irq
affinity and we used message queues in shared memory to allow another
cpu to handle I/O. In our case we still had kernel threads running on
the app cpu, but if we'd had a straightforward way to avoid them we
would have used it.
Chris
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 21:08 ` Ingo Molnar
@ 2009-08-26 21:26 ` Christoph Lameter
0 siblings, 0 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-26 21:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, raz ben yehuda, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
On Wed, 26 Aug 2009, Ingo Molnar wrote:
> I'm saying that your lack of trying to reduce even low-hanging-fruit
> latency sources that were pointed out to you fundamentally destroys
> your credibility in claiming that they are unfixable for all
> practical purposes.
I have never claimed that they are unfixable. However, reducing latencies
does not remove a disturbance.
> Or, to come up with a car analogy: it's a bit as if at a repair shop
> you complained that your car has a scratch on its cooler grid that
> annoys you, and you insisted that it be outfitted with a new diesel
> engine which needs no cooler grid (throwing away the nice Hemi block
> it has currently) - and ignored the mechanic's opinion that he loves
> the Hemi and that to him the scratch looks very much like bird-sh*t
> and that a proper car wash might do the trick too ;-)
Nope. It's like you want to get rid of your car, and the person you talk
to tries to convince you to keep it. He claims that if you just wash and
repair it, then it will maybe become almost invisible and you will have
reached your goal of not having a car.
> That would benefit ordinary Linux users too, not just rare isolation
> apps.
We are talking about apps that need isolation here, not regular apps.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 19:32 ` Ingo Molnar
2009-08-26 20:40 ` Christoph Lameter
@ 2009-08-26 21:32 ` raz ben yehuda
1 sibling, 0 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 21:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Christoph Lameter, Peter Zijlstra, Maxim Levitsky, Chris Friesen,
Mike Galbraith, riel, andrew motron, wiseman, lkml,
linux-rt-users
Ingo Hello
First thank you for your interest.
OFFSCHED is a variant of proprietary software. It is 4 years old. It is
stable, and... well... this thing works. And it is so simple. SO VERY,
VERY SIMPLE. ONCE YOU GO OFFLINE YOU NEVER LOOK BACK.
OFFSCHED has full access to many kernel facilities. My software transmits
packets, encrypts packets, and reaches pktgen-level network throughput
(25 Gbps) while saturating its 8 SSD disks.
My software takes statistics of an offloaded processor's usage, and unlike
on OS processors, since I have full control of the processor, the usage
grows quite linearly. There are no bursts of CPU usage; it remains stable
at X% usage even when I transmit 25 Gbps.
OFFSCHED's oldest patch was 4 lines; this is how it started. 4 lines of
patch and my 2.6.18-8.el5 kernel is suddenly a hard real-time kernel.
Today, I patch this kernel, build only a bzImage, throw this 2 MB bzImage
on a server running a regular CentOS/Red Hat distribution, and kaboom, I
have a real-time server in god-knows-where. I do not mess with any
driver, I do not mess with the initrd. I just fix 4 lines. That is all.
OFFSCHED is not just for real time. It can monitor the kernel, protect
it, and do whatever comes to mind. Please see OFFSCHED-RTOP.pdf.
thank you
raz
On Wed, 2009-08-26 at 21:32 +0200, Ingo Molnar wrote:
> * Christoph Lameter <cl@linux-foundation.org> wrote:
>
> > On Wed, 26 Aug 2009, Ingo Molnar wrote:
> >
> > > The thing is, you have cut out (and have not replied to) this
> > > crucial bit of what Peter wrote:
> > >
> > > > > The past year or so you've been whining about the tick latency,
> > > > > and I've seen exactly _0_ patches from you slimming down the
> > > > > work done in there, even though I pointed out some obvious
> > > > > things that could be done.
> > >
> > > ... which pretty much settles the issue as far as i'm concerned.
> > > If you were truly interested in a constructive solution to lower
> > > latencies in Linux you should have sent patches already for the
> > > low hanging fruits Peter pointed out.
> >
> > The noise latencies were already reduced to the minimum in earlier
> > years (e.g. the work on slab queue cleaning). Certainly more
> > could be done there but that misses the point.
>
> Peter suggested various improvements to the timer tick related
> latencies _you_ were complaining about earlier this year. Those
> latencies sure were not addressed 'years earlier'.
>
> If you are unwilling to reduce the very latencies you apparently
> cared and complained about then you dont have much real standing to
> complain now.
>
> ( If you on the other hand were approaching this issue with
> pragmatism and with intellectual honesty, if you were at the end
> of a string of patches that gradually improved latencies but
> couldnt get them below a certain threshold, and if scheduler
> developers couldnt give you any ideas what else to improve, and
> _then_ suggested some other solution, you might have a point.
> You are far away from being able to claim that. )
>
> Really, it's a straightforward application of Occam's Razor to the
> scheduler. We go for the simplest solution first, and try to help
> more people first, before going for some specialist hack.
>
> > The point of the OFFLINE scheduler is to completely eliminate the
> > OS disturbances by getting rid of *all* OS processing on some
> > cpus.
> >
> > For some reason scheduler developers seem to be threatened by this
> > idea and they go into bizarre lines of arguments to avoid the
> > issue. It's simple and doable and the scheduler will still be there
> > after we do this.
>
> If you meant to include me in that summary categorization, i dont
> feel 'threatened' by any such patches (why would i? They dont seem
> to have sharp teeth nor any apparent poison fangs) - i simply concur
> with the reasons Peter listed that it is a technically inferior
> solution.
>
> Ingo
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:50 ` Andrew Morton
2009-08-26 21:09 ` Christoph Lameter
2009-08-26 21:15 ` Chris Friesen
@ 2009-08-26 21:34 ` Ingo Molnar
2009-08-27 2:55 ` Frank Ch. Eigler
2009-08-26 21:34 ` raz ben yehuda
3 siblings, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2009-08-26 21:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Lameter, peterz, raziebe, maximlevitsky, cfriesen,
efault, riel, wiseman, linux-kernel, linux-rt-users
* Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 26 Aug 2009 16:40:09 -0400 (EDT)
> Christoph Lameter <cl@linux-foundation.org> wrote:
>
> > Peter has not given a solution to the problem. Nor have you.
>
> What problem?
>
> All I've seen is "I want 100% access to a CPU". That's not a problem
> statement - it's an implementation.
>
> What is the problem statement?
>
> My take on these patches: the kernel gives userspace unmediated
> access to memory resources if it wants that. The kernel gives
> userspace unmediated access to IO devices if it wants that. But
> for some reason people freak out at the thought of providing
> unmediated access to CPU resources.
Claiming all user-available CPU time from user-space is already
possible: use SCHED_FIFO - the only question is the remaining latencies
in the final 0.01% of CPU time you cannot claim via SCHED_FIFO.
( Btw., this scheduling feature was implemented in Linux well before
raw IO block devices were implemented, so i'm not sure what you
mean by 'freaking out'. )
What we are objecting to are these easy isolation side-hacks for the
remaining 0.01% that fail to address the real problem: the
latencies. Those latencies can hurt not just isolated apps but _non
isolated_ (and latency critical) apps too, and what we insist on is
getting the proper fixes, not just ugly workarounds that side-step
the problem.
( a secondary objection is the extension and extra layering
of something that could be done within existing APIs/ABIs too. We
want to minimize the configuration space. )
> Don't get all religious about this. If the change is clean,
> maintainable and useful then there's no reason to not merge it.
Precisely. This feature as proposed here hinders the correct
solution being implemented - and hence hurts long term
maintainability and hence is a no-merge right now. [It also weakens
the pressure to fix latencies for a much wider set of applications,
hence hurts the quality of Linux in the long run. (i.e. is a net
step backwards)]
Ingo
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 20:50 ` Andrew Morton
` (2 preceding siblings ...)
2009-08-26 21:34 ` Ingo Molnar
@ 2009-08-26 21:34 ` raz ben yehuda
3 siblings, 0 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 21:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Lameter, mingo, peterz, maximlevitsky, cfriesen, efault,
riel, wiseman, linux-kernel, linux-rt-users
On Wed, 2009-08-26 at 13:50 -0700, Andrew Morton wrote:
> On Wed, 26 Aug 2009 16:40:09 -0400 (EDT)
> Christoph Lameter <cl@linux-foundation.org> wrote:
>
> > Peter has not given a solution to the problem. Nor have you.
>
> What problem?
>
> All I've seen is "I want 100% access to a CPU". That's not a problem
> statement - it's an implementation.
>
> What is the problem statement?
>
>
> My take on these patches: the kernel gives userspace unmediated access
> to memory resources if it wants that. The kernel gives userspace
> unmediated access to IO devices if it wants that. But for some reason
> people freak out at the thought of providing unmediated access to CPU
> resources.
>
> Don't get all religious about this. If the change is clean,
> maintainable and useful then there's no reason to not merge it.
Thank you, Mr. Morton. Thank you!!!
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 21:15 ` Chris Friesen
@ 2009-08-26 21:37 ` raz ben yehuda
2009-08-27 16:51 ` Chris Friesen
0 siblings, 1 reply; 79+ messages in thread
From: raz ben yehuda @ 2009-08-26 21:37 UTC (permalink / raw)
To: Chris Friesen
Cc: Andrew Morton, Christoph Lameter, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On Wed, 2009-08-26 at 15:15 -0600, Chris Friesen wrote:
> On 08/26/2009 02:50 PM, Andrew Morton wrote:
>
> > What problem?
> >
> > All I've seen is "I want 100% access to a CPU". That's not a problem
> > statement - it's an implementation.
> >
> > What is the problem statement?
>
> I can only speak for myself...
>
> In our case the problem statement was that we had an inherently
> single-threaded emulator app that we wanted to push as hard as
> absolutely possible.
>
> We gave it as close to a whole cpu as we could using cpu and irq
> affinity and we used message queues in shared memory to allow another
> cpu to handle I/O. In our case we still had kernel threads running on
> the app cpu, but if we'd had a straightforward way to avoid them we
> would have used it.
>
> Chris
Chris, I offer to help anyone who wishes to apply OFFSCHED.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 21:34 ` Ingo Molnar
@ 2009-08-27 2:55 ` Frank Ch. Eigler
0 siblings, 0 replies; 79+ messages in thread
From: Frank Ch. Eigler @ 2009-08-27 2:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Christoph Lameter, peterz, raziebe, maximlevitsky,
cfriesen, efault, riel, wiseman, linux-kernel, linux-rt-users
Ingo Molnar <mingo@elte.hu> writes:
> [...]
>
>> Don't get all religious about this. If the change is clean,
>> maintainable and useful then there's no reason to not merge it.
> Precisely. This feature as proposed here hinders the correct
> solution being implemented - and hence hurts long term
> maintainability and hence is a no-merge right now.
(Does it "hinder" this in any different way than the following, as in
possibly reducing "pressure" for it?)
> [It also weakens the pressure to fix latencies for a much wider set
> of applications, hence hurts the quality of Linux in the long
> run. (i.e. is a net step backwards)]
How would you differentiate the above sentiment from "perfect is the
enemy of the good"?
- FChE
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 19:15 ` Christoph Lameter
2009-08-26 19:32 ` Ingo Molnar
@ 2009-08-27 7:15 ` Mike Galbraith
1 sibling, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2009-08-27 7:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ingo Molnar, Peter Zijlstra, raz ben yehuda, Maxim Levitsky,
Chris Friesen, riel, andrew motron, wiseman, lkml, linux-rt-users
On Wed, 2009-08-26 at 15:15 -0400, Christoph Lameter wrote:
> The point of the OFFLINE scheduler is to completely eliminate the
> OS disturbances by getting rid of *all* OS processing on some cpus.
No, that's not the point of OFFSCHED. It's about offloading kernel
functionality to a peer, and as it currently exists after some years of
development, kernel functionality only. Raz has already stated that
hard RT is not the point.
<quote> (for full context, jump back a bit in this thread)
> On the other hand, I could see this as a jump platform for more
> proprietary code, something like that: we use linux in our server
> platform, but our "insert buzzword here" network stack pro+ can handle
> 100% more load than linux does, and it runs on a dedicated core....
>
> In other words, we might see 'firmwares' that take an entire cpu for
> their usage.
This is exactly what offsched (SOS) is. You got it. SOS was partly
inspired by the notion of a GPU.
Processors are becoming more and more abundant, and Linux, as an
evolutionary system, must use them. Why not offload the raid5 write
engine? Why not encrypt on a different processor?
Also, having so many processors in a single OS means a bug-prone
system, with endless contention points when two or more OS processors
interact. Let's make things simpler.
</quote>
-Mike
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-26 21:37 ` raz ben yehuda
@ 2009-08-27 16:51 ` Chris Friesen
2009-08-27 17:04 ` Christoph Lameter
2009-08-27 21:33 ` raz ben yehuda
0 siblings, 2 replies; 79+ messages in thread
From: Chris Friesen @ 2009-08-27 16:51 UTC (permalink / raw)
To: raz ben yehuda
Cc: Andrew Morton, Christoph Lameter, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On 08/26/2009 03:37 PM, raz ben yehuda wrote:
>
> On Wed, 2009-08-26 at 15:15 -0600, Chris Friesen wrote:
>> We gave it as close to a whole cpu as we could using cpu and irq
>> affinity and we used message queues in shared memory to allow another
>> cpu to handle I/O. In our case we still had kernel threads running on
>> the app cpu, but if we'd had a straightforward way to avoid them we
>> would have used it.
> Chris. I offer myself to help anyone wishes to apply OFFSCHED.
I just went and read the docs. One of the things I noticed is that it
says that the offlined cpu cannot run userspace tasks. For our
situation that's a showstopper, unfortunately.
Chris
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 16:51 ` Chris Friesen
@ 2009-08-27 17:04 ` Christoph Lameter
2009-08-27 21:09 ` Thomas Gleixner
2009-08-27 21:33 ` raz ben yehuda
1 sibling, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-27 17:04 UTC (permalink / raw)
To: Chris Friesen
Cc: raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On Thu, 27 Aug 2009, Chris Friesen wrote:
> I just went and read the docs. One of the things I noticed is that it
> says that the offlined cpu cannot run userspace tasks. For our
> situation that's a showstopper, unfortunately.
It needs to be implemented the right way. Essentially this is a variation
on the isolcpu kernel boot option. We probably need some syscall to move
a user space process to a bare metal cpu since the cpu cannot be
considered online in the regular sense.
An isolated cpu can then only execute one process at a time. A process
would do all initialization and lock its resources in memory before going
to the isolated processor. Any attempt to use OS facilities needs to cause
the process to be moved back to a cpu with OS services.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 17:04 ` Christoph Lameter
@ 2009-08-27 21:09 ` Thomas Gleixner
2009-08-27 22:22 ` Gregory Haskins
` (2 more replies)
0 siblings, 3 replies; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-27 21:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: Chris Friesen, raz ben yehuda, Andrew Morton, mingo, peterz,
maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
On Thu, 27 Aug 2009, Christoph Lameter wrote:
> On Thu, 27 Aug 2009, Chris Friesen wrote:
>
> > I just went and read the docs. One of the things I noticed is that it
> > says that the offlined cpu cannot run userspace tasks. For our
> > situation that's a showstopper, unfortunately.
>
> It needs to be implemented the right way. Essentially this is a variation
> on the isolcpu kernel boot option. We probably need some syscall to move
> a user space process to a bare metal cpu since the cpu cannot be
> considered online in the regular sense.
It can. It needs to be flagged as reserved for special tasks and you
need a separate mechanism to move and pin a task to such a CPU.
> An isolated cpu can then only execute one process at a time. A process
> would do all initialization and lock its resources in memory before going
> to the isolated processor. Any attempt to use OS facilities needs to cause
> the process to be moved back to a cpu with OS services.
You are creating a "one special case" operation mode which is not
justified in my opinion. Let's look at the problem you want to solve:
Run exactly one thread on a dedicated CPU w/o any disturbance by the
scheduler tick.
You can move away anything else than the scheduler tick from a CPU
today already w/o a single line of code change.
But you want to impose restrictions like resource locking and moving
back to another CPU in case of a syscall. What's the purpose of this ?
It does not buy anything except additional complexity.
That's just the wrong approach. All you need is a way to tell the
kernel that CPUx can switch off the scheduler tick when only one
thread is running and that very thread is running in user space. Once
another thread arrives on that CPU or the single thread enters the
kernel for a blocking syscall the scheduler tick has to be
restarted.
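The rule described here — tick off only while a flagged CPU has exactly
one runnable thread and that thread is in user space — can be sketched as
follows. All names are illustrative only, not actual kernel API; the real
hooks for stopping/restarting the tick live in the nohz code.

```c
#include <stdbool.h>

/* Illustrative model of the per-CPU state the policy depends on. */
struct cpu_state {
    bool tick_enabled;
    bool reserved_for_single_task; /* admin opted this CPU in */
    int  nr_running;               /* runnable tasks on this CPU */
    bool curr_in_userspace;        /* current task not inside a syscall */
};

static bool can_stop_tick(const struct cpu_state *c)
{
    return c->reserved_for_single_task &&
           c->nr_running == 1 &&
           c->curr_in_userspace;
}

/* Re-check on every event that can change the conditions above:
 * a wakeup/migration onto this CPU, syscall entry, syscall exit. */
static void reevaluate_tick(struct cpu_state *c)
{
    if (can_stop_tick(c))
        c->tick_enabled = false;  /* would call the tick-stop path */
    else
        c->tick_enabled = true;   /* would restart the tick */
}
```

The point of modeling it this way is that the tick state is purely a
function of observable per-CPU conditions, so no special "offline" mode
is needed.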
It's not rocket science to fix the well known issues of stopping and
eventually restarting the scheduler tick, the CPU time accounting and
some other small details. Such a modification would be of general use
contrary to your proposed solution which is just a hack to solve your
particular special case of operation.
Thanks,
tglx
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 16:51 ` Chris Friesen
2009-08-27 17:04 ` Christoph Lameter
@ 2009-08-27 21:33 ` raz ben yehuda
2009-08-27 22:05 ` Thomas Gleixner
1 sibling, 1 reply; 79+ messages in thread
From: raz ben yehuda @ 2009-08-27 21:33 UTC (permalink / raw)
To: Chris Friesen
Cc: Andrew Morton, mingo, peterz, maximlevitsky, efault, riel,
wiseman, linux-kernel, linux-rt-users
On Thu, 2009-08-27 at 10:51 -0600, Chris Friesen wrote:
> On 08/26/2009 03:37 PM, raz ben yehuda wrote:
> >
> > On Wed, 2009-08-26 at 15:15 -0600, Chris Friesen wrote:
>
> >> We gave it as close to a whole cpu as we could using cpu and irq
> >> affinity and we used message queues in shared memory to allow another
> >> cpu to handle I/O. In our case we still had kernel threads running on
> >> the app cpu, but if we'd had a straightforward way to avoid them we
> >> would have used it.
>
> > Chris. I offer myself to help anyone wishes to apply OFFSCHED.
>
> I just went and read the docs. One of the things I noticed is that it
> says that the offlined cpu cannot run userspace tasks. For our
> situation that's a showstopper, unfortunately.
Given that your entire software is of size T, and T' is the size of the
real-time part, what is the ratio T'/T?
If T'/T << 1 then dissect it, and put the T' in OFFSCHED.
My software's T is about 100MB while the real-time section is about 60K.
They communicate through simple ioctls.
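That rule of thumb, made concrete with the sizes quoted in the mail
(the helper name is mine, just to pin down the arithmetic):

```c
/* T'/T rule of thumb: if the real-time part T' is a tiny fraction of
 * the whole application T, split T' out and run only that offline. */
static double rt_fraction(double rt_bytes, double total_bytes)
{
    return rt_bytes / total_bytes;
}
```

With a ~60K real-time section in a ~100MB application the ratio is about
0.0006, comfortably << 1.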
CPU isolation example: a transmission engine.
In the output below, I am presenting 4 streaming engines, over 4 Intel
82598EB 10Gbps interfaces. A streaming engine is actually a Xeon E5420
2.5GHz core.
Each engine has ***full control*** over its own interface. So you can:
1. fully control the processor's usage.
2. know ***exactly*** how much each single packet transmission costs. For
example, in this case on processor 3 the average transmission of a single
packet is 1974 TSCs, which is ~790ns.
3. know how many packets fail to transmit right ***on time*** (the Lates
counter), where on time in this case means within the 122us jitter bound.
4. There are 8 cores in this machine. The remaining 4 OS cores are ~95% idle.
The only resource these cores share is the bus.
State: kstreamer UP. Started at October 05 05:19:51
******************************************************
CPU 3,63% usage,Sessions 1499,6124301 kbps
CPU 5,77% usage,Sessions 1499,6123859 kbps
CPU 6,78% usage,Sessions 1498,6123709 kbps
CPU 7,73% usage,Sessions 1498,6117766 kbps
Summary: Throughput=24.489Gbps Sessions =5994
******************************************************
Streaming Processor 3
Tx Count : Tot=399990164 Good=399990164 Bad=0 ERR=0(LOC=0,FULL=0)
Time : GoodSendTsc( Max 1565895 Avg 1974) Lates=649
Flow Errors : Underflow (0,0) NotResched=0 GenErr=0
Sessions : Cur 1499(RTP=1499,UDP=0,MCAST=0,ENCRYPT=0) PAUSED=0
CPU : 63% usage
Queue : Max Size 92 Avg 69 Csc=79
Throughput (bps) : Tot 6271285040 MPEG 5988905440
Throughput (Mbps): Tot Mbps 6271 MPEG Mbps 5988
Throughput : Packets/sec 568855
Streaming Processor 5
Tx Count : Tot=399944597 Good=399944595 Bad=2 ERR=2(LOC=0,FULL=2)
Time : GoodSendTsc( Max 1566052 Avg 2464) Lates=5521
Flow Errors : Underflow (0,0) NotResched=0 GenErr=0
Sessions : Cur 1499(RTP=1499,UDP=0,MCAST=0,ENCRYPT=0) PAUSED=0
CPU : 77% usage
Queue : Max Size 95 Avg 69 Csc=79
Throughput (bps) : Tot 6270832416 MPEG 5988473792
Throughput (Mbps): Tot Mbps 6270 MPEG Mbps 5988
Throughput : Packets/sec 568814
Streaming Processor 6
Tx Count : Tot=399898586 Good=399898585 Bad=0 ERR=0(LOC=0,FULL=0)
Time : GoodSendTsc( Max 1661385 Avg 2474) Lates=8064
Flow Errors : Underflow (0,0) NotResched=0 GenErr=0
Sessions : Cur 1498(RTP=1498,UDP=0,MCAST=0,ENCRYPT=0) PAUSED=0
CPU : 78% usage
Queue : Max Size 91 Avg 69 Csc=87
Throughput (bps) : Tot 6270678560 MPEG 5988326400
Throughput (Mbps): Tot Mbps 6270 MPEG Mbps 5988
Throughput : Packets/sec 568800
Streaming Processor 7
Tx Count : Tot=399845166 Good=399845100 Bad=66 ERR=66(LOC=0,FULL=66)
Time : GoodSendTsc( Max 2962620 Avg 2377) Lates=42626
Flow Errors : Underflow (0,0) NotResched=0 GenErr=0
Sessions : Cur 1498(RTP=1498,UDP=0,MCAST=0,ENCRYPT=0) PAUSED=0
CPU : 73% usage
Queue : Max Size 94 Avg 69 Csc=66
Throughput (bps) : Tot 6264592672 MPEG 5982514944
Throughput (Mbps): Tot Mbps 6264 MPEG Mbps 5982
Throughput : Packets/sec 568248
--------------- Reservation Load Balancer ------------------
eth2 : 5994501 kbps
eth3 : 5994501 kbps
eth4 : 5990502 kbps
eth5 : 5990502 kbps
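A quick check of the cycles-to-time conversion behind the per-packet cost
figure above (helper name is mine; the inputs are the E5420's 2.5 GHz
clock and the 1974-cycle average from the stats):

```c
/* TSC cycles divided by the core clock in GHz gives nanoseconds,
 * since a clock of X GHz completes X cycles per nanosecond. */
static double tsc_to_ns(unsigned long long cycles, double clock_ghz)
{
    return (double)cycles / clock_ghz;
}
```

1974 cycles at 2.5 GHz works out to roughly 790 ns per packet.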
> Chris
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 21:33 ` raz ben yehuda
@ 2009-08-27 22:05 ` Thomas Gleixner
2009-08-28 8:38 ` raz ben yehuda
0 siblings, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-27 22:05 UTC (permalink / raw)
To: raz ben yehuda
Cc: Chris Friesen, Andrew Morton, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, raz ben yehuda wrote:
> On Thu, 2009-08-27 at 10:51 -0600, Chris Friesen wrote:
> > On 08/26/2009 03:37 PM, raz ben yehuda wrote:
> > >
> > > On Wed, 2009-08-26 at 15:15 -0600, Chris Friesen wrote:
> >
> > >> We gave it as close to a whole cpu as we could using cpu and irq
> > >> affinity and we used message queues in shared memory to allow another
> > >> cpu to handle I/O. In our case we still had kernel threads running on
> > >> the app cpu, but if we'd had a straightforward way to avoid them we
> > >> would have used it.
> >
> > > Chris. I offer myself to help anyone wishes to apply OFFSCHED.
> >
> > I just went and read the docs. One of the things I noticed is that it
> > says that the offlined cpu cannot run userspace tasks. For our
> > situation that's a showstopper, unfortunately.
>
> Given that your entire software is T size , and T' is the amount of real
> time size, what is the relation T'/T ?
> If T'/T << 1 then dissect it, and put the T' in OFFSCHED.
> My software T's is about 100MB while the real time section is about 60K.
Chris was stating that your offlined cpu cannot run userspace
tasks. How is your answer connected to Chris' statement ? Please stop
useless marketing. LKML is about technical problems not advertisement.
> They communicate through a simple ioctls.
This is totally irrelevant and we all know how communication channels
between Linux and whatever hackery (RT-Linux, RTAI, ... OFFLINE)
works.
> CPU isolation example: a transmission engine.
>
> In the image bellow, I am presenting 4 streaming engines, over 4 Intels
> 82598EB 10Gbps. A streaming engine is actually a Xeon E5420 2.5GHz.
> Each engine has ***full control*** over its own interface. So you can:
>
> 1. fully control the processor's usage.
By disabling the OS control over the CPU resource. How innovative.
> 2. know **exactly*** how much each single packet transmission costs. for
> example, in this case in processor 3 a single packet average
> transmission is 1974tscs, which is ~700ns.
>
> 3. know how many packets fails to transmit right **on time** ( the Lates
> counter) . and on time in this case means within the 122us jitter.
Are those statistics a crucial property of your OFFLINE thing ?
> 4. There are 8 cores in this machine. The rest 4 OS cores are ~95% idle.
> The only resource these cores share is the bus.
That does not change the problem that you cannot run ordinary user
space tasks on your offlined CPUs and you are simply hacking round the
real problem which I outlined in my previous mail.
Thanks,
tglx
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 21:09 ` Thomas Gleixner
@ 2009-08-27 22:22 ` Gregory Haskins
2009-08-28 2:15 ` Rik van Riel
2009-08-28 6:14 ` Peter Zijlstra
2009-08-27 23:51 ` Chris Friesen
2009-08-28 18:43 ` Christoph Lameter
2 siblings, 2 replies; 79+ messages in thread
From: Gregory Haskins @ 2009-08-27 22:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christoph Lameter, Chris Friesen, raz ben yehuda, Andrew Morton,
mingo, peterz, maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 3349 bytes --]
Thomas Gleixner wrote:
> On Thu, 27 Aug 2009, Christoph Lameter wrote:
>> On Thu, 27 Aug 2009, Chris Friesen wrote:
>>
>>> I just went and read the docs. One of the things I noticed is that it
>>> says that the offlined cpu cannot run userspace tasks. For our
>>> situation that's a showstopper, unfortunately.
>> It needs to be implemented the right way. Essentially this is a variation
>> on the isolcpu kernel boot option. We probably need some syscall to move
>> a user space process to a bare metal cpu since the cpu cannot be
>> considered online in the regular sense.
>
> It can. It needs to be flagged as reserved for special tasks and you
> need a separate mechanism to move and pin a task to such a CPU.
>
>> An isolated cpu can then only execute one process at a time. A process
>> would do all initialization and lock its resources in memory before going
>> to the isolated processor. Any attempt to use OS facilities needs to cause
>> the process to be moved back to a cpu with OS services.
>
> You are creating a "one special case" operation mode which is not
> justified in my opinion. Let's look at the problem you want to solve:
>
> Run exactly one thread on a dedicated CPU w/o any disturbance by the
> scheduler tick.
>
> You can move away anything else than the scheduler tick from a CPU
> today already w/o a single line of code change.
>
> But you want to impose restrictions like resource locking and moving
> back to another CPU in case of a syscall. What's the purpose of this ?
> It does not buy anything except additional complexity.
>
> That's just the wrong approach. All you need is a way to tell the
> kernel that CPUx can switch off the scheduler tick when only one
> thread is running and that very thread is running in user space. Once
> another thread arrives on that CPU or the single thread enters the
> kernel for a blocking syscall the scheduler tick has to be
> restarted.
>
> It's not rocket science to fix the well known issues of stopping and
> eventually restarting the scheduler tick, the CPU time accounting and
> some other small details. Such a modification would be of general use
> contrary to your proposed solution which is just a hack to solve your
> particular special case of operation.
I wonder if it makes sense to do something along the lines of the
sched-class...
IOW: What if we adopted one of the following models:
1) Create a new class that is higher prio than FIFO/RR and, when
selected, disables the tick.
2) Modify FIFO so that it disables tick by default...update accounting
info at next reschedule event.
3) Variation of 2..leave FIFO+tick as is by default, but have some kind
of parameter to optionally disable tick if desired.
In a way, we should probably consider (2) independently of this particular
thread. FIFO doesn't need a tick anyway afaict...only a RESCHED+IPI
truly ever matters here....or am I missing something obvious (probably
w.r.t. accounting)?
You could then couple this solution with cpusets (possibly with a little
work to get rid of any pesky per-cpu kthreads) to achieve the desired
effect of interference-free operation. You wouldn't even have to have
funky rules alluded to above w.r.t. making sure only one userspace thread
is running on the core.
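A minimal sketch of the userspace half of that partitioning — pinning the
one task onto its dedicated core with the standard affinity call. The
cpuset/irq-affinity setup that keeps everything else off the core is
separate; the helper name is mine:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling task to a single CPU. This only restricts where
 * *we* run; cpusets and irq affinity must keep other work off the
 * chosen CPU for it to be truly dedicated. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = this task */
}
```

A real deployment would pair this with SCHED_FIFO priority on the pinned
task, subject to the bandwidth/tick questions discussed in this thread.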
Thoughts?
-Greg
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 21:09 ` Thomas Gleixner
2009-08-27 22:22 ` Gregory Haskins
@ 2009-08-27 23:51 ` Chris Friesen
2009-08-28 0:44 ` Thomas Gleixner
2009-08-28 18:43 ` Christoph Lameter
2 siblings, 1 reply; 79+ messages in thread
From: Chris Friesen @ 2009-08-27 23:51 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christoph Lameter, raz ben yehuda, Andrew Morton, mingo, peterz,
maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
On 08/27/2009 03:09 PM, Thomas Gleixner wrote:
> That's just the wrong approach. All you need is a way to tell the
> kernel that CPUx can switch off the scheduler tick when only one
> thread is running and that very thread is running in user space. Once
> another thread arrives on that CPU or the single thread enters the
> kernel for a blocking syscall the scheduler tick has to be
> restarted.
That's an elegant approach...I like it.
How would you deal with per-cpu kernel threads (softirqs, etc.) or
softirq processing while in the kernel? Switching off the timer tick
isn't sufficient because the scheduler will be triggered on the way back
to userspace in a syscall.
Chris
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 23:51 ` Chris Friesen
@ 2009-08-28 0:44 ` Thomas Gleixner
2009-08-28 21:20 ` Chris Friesen
0 siblings, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 0:44 UTC (permalink / raw)
To: Chris Friesen
Cc: Christoph Lameter, raz ben yehuda, Andrew Morton, mingo, peterz,
maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
On Thu, 27 Aug 2009, Chris Friesen wrote:
> On 08/27/2009 03:09 PM, Thomas Gleixner wrote:
>
> > That's just the wrong approach. All you need is a way to tell the
> > kernel that CPUx can switch off the scheduler tick when only one
> > thread is running and that very thread is running in user space. Once
> > another thread arrives on that CPU or the single thread enters the
> > kernel for a blocking syscall the scheduler tick has to be
> > restarted.
>
> That's an elegant approach...I like it.
>
> How would you deal with per-cpu kernel threads (softirqs, etc.) or
> softirq processing while in the kernel?
If you have pinned an interrupt to that CPU then you need to process
the softirq for it as well. If that's the device your very single user
space thread is talking to then you better want that, if you are not
interested then simply pin that device irq to some other CPU: no irq
-> no softirq.
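The "pin that device irq to some other CPU" step amounts to writing a hex
CPU bitmask to /proc/irq/<n>/smp_affinity. A small helper (name is mine,
not a kernel API) to build a mask that leaves the isolated CPU out:

```c
/* Build an smp_affinity-style CPU bitmask covering all ncpus CPUs
 * except the isolated one, so the device irq (and thus its softirq
 * follow-up) never lands there. */
static unsigned long irq_mask_excluding(int ncpus, int isolated_cpu)
{
    unsigned long mask = (1UL << ncpus) - 1;  /* all CPUs set */

    mask &= ~(1UL << isolated_cpu);           /* drop the isolated one */
    return mask;
}
```

The result would be printed in hex and echoed into
/proc/irq/<n>/smp_affinity, with <n> taken from /proc/interrupts.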
> Switching off the timer tick isn't sufficient because the scheduler
> will be triggered on the way back to userspace in a syscall.
If there is just one user space thread why is the NOOP call to the
scheduler interesting ? If you go into the kernel you have some
overhead anyway, so why would the few instructions to call schedule()
and return with the same task (as it is the only runnable) matter ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 22:22 ` Gregory Haskins
@ 2009-08-28 2:15 ` Rik van Riel
2009-08-28 3:33 ` Gregory Haskins
2009-08-28 6:14 ` Peter Zijlstra
1 sibling, 1 reply; 79+ messages in thread
From: Rik van Riel @ 2009-08-28 2:15 UTC (permalink / raw)
To: Gregory Haskins
Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
Gregory Haskins wrote:
> 2) Modify FIFO so that it disables tick by default...update accounting
> info at next reschedule event.
I like it. The only thing to watch out for is that
events that wake up higher-priority FIFO tasks do
not get deferred :)
--
All rights reversed.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 2:15 ` Rik van Riel
@ 2009-08-28 3:33 ` Gregory Haskins
2009-08-28 4:27 ` Gregory Haskins
0 siblings, 1 reply; 79+ messages in thread
From: Gregory Haskins @ 2009-08-28 3:33 UTC (permalink / raw)
To: Rik van Riel
Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]
Hi Rik,
Rik van Riel wrote:
> Gregory Haskins wrote:
>
>> 2) Modify FIFO so that it disables tick by default...update accounting
>> info at next reschedule event.
>
> I like it. The only thing to watch out for is that
> events that wake up higher-priority FIFO tasks do
> not get deferred :)
>
Yeah, agreed. My (potentially half-baked) proposal should work at least
from a pure scheduling perspective since FIFO technically does not
reschedule based on a tick, and wakeups/migrations should still work
bidirectionally with existing scheduler policies.
However, and to what I believe is your point: it's not entirely clear to
me what impact, if any, there would be w.r.t. any _other_ events that
may be driven off of the scheduler tick (i.e. events other than
scheduling policies, like timeslice expiration, etc). Perhaps someone
else like Thomas, Ingo, or Peter has some input here.
I guess the specific question to ask is: does the scheduler tick code
have any role other than timeslice policies and updating accounting
information? Timer expiry would be one example. I would think most of
this logic is handled by finer grained components like HRT, but I am
admittedly ignorant of the actual timer voodoo ;)
Kind Regards,
-Greg
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 3:33 ` Gregory Haskins
@ 2009-08-28 4:27 ` Gregory Haskins
2009-08-28 10:26 ` Thomas Gleixner
0 siblings, 1 reply; 79+ messages in thread
From: Gregory Haskins @ 2009-08-28 4:27 UTC (permalink / raw)
To: Rik van Riel
Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 2572 bytes --]
Gregory Haskins wrote:
> Hi Rik,
>
> Rik van Riel wrote:
>> Gregory Haskins wrote:
>>
>>> 2) Modify FIFO so that it disables tick by default...update accounting
>>> info at next reschedule event.
>> I like it. The only thing to watch out for is that
>> events that wake up higher-priority FIFO tasks do
>> not get deferred :)
>>
>
> Yeah, agreed. My (potentially half-baked) proposal should work at least
> from a pure scheduling perspective since FIFO technically does not
> reschedule based on a tick, and wakeups/migrations should still work
> bidirectionally with existing scheduler policies.
>
> However, and to what I believe is your point: its not entirely clear to
> me what impact, if any, there would be w.r.t. any _other_ events that
> may be driven off of the scheduler tick (i.e. events other than
> scheduling policies, like timeslice expiration, etc). Perhaps someone
> else like Thomas, Ingo, or Peter have some input here.
>
> I guess the specific question to ask is: Does the scheduler tick code
> have any role other than timeslice policies and updating accounting
> information? Examples would include timer-expiry, for instance. I
> would think most of this logic is handled by finer grained components
> like HRT, but I am admittedly ignorant of the actual timer voodoo ;)
>
Thinking about this idea some more: I can't see why this isn't just a
trivial variation of the nohz idle code already in mainline. In both
cases (idle and FIFO tasks) the cpu is "consumed" 100% by some arbitrary
job (spinning/HLT for idle, RT thread for FIFO) while we have the
scheduler tick disabled. The only real difference is a matter of
power-management (HLT/mwait go to sleep-states, whereas spinning/rt-task
run full tilt).
Therefore the answer may be as simple as bracketing the FIFO task with
tick_nohz_stop_sched_tick() + tick_nohz_restart_sched_tick(). The nohz
code will probably need some minor adjustments so it is not assuming
things about the state being "idle" (e.g. "isidle") for places when it
matters (idle_calls++ stat is one example).
Potential problems:
a) disabling/re-enabling the tick on a per-RT-task schedule() may prove to
be prohibitively expensive.
b) we will need to make sure the rt-bandwidth protection mechanism is
defeated so the task is allowed to consume 100% bandwidth.
Perhaps these states should be in the cpuset/root-domain, and configured
when you create the partition (e.g. "tick=off", "bandwidth=off" makes it
an "offline" set).
Kind Regards,
-Greg
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 22:22 ` Gregory Haskins
2009-08-28 2:15 ` Rik van Riel
@ 2009-08-28 6:14 ` Peter Zijlstra
1 sibling, 0 replies; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-28 6:14 UTC (permalink / raw)
To: Gregory Haskins
Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, maximlevitsky, efault, riel, wiseman,
linux-kernel, linux-rt-users
On Thu, 2009-08-27 at 18:22 -0400, Gregory Haskins wrote:
> I wonder if it makes sense to do something along the lines of the
> sched-class...
Disabling the tick isn't a big deal from the scheduler's point of view;
it's all the other accounting crap that happens.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 22:05 ` Thomas Gleixner
@ 2009-08-28 8:38 ` raz ben yehuda
2009-08-28 10:05 ` Thomas Gleixner
2009-08-28 13:25 ` Rik van Riel
0 siblings, 2 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-28 8:38 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Chris Friesen, Andrew Morton, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On Fri, 2009-08-28 at 00:05 +0200, Thomas Gleixner wrote:
> On Fri, 28 Aug 2009, raz ben yehuda wrote:
> > On Thu, 2009-08-27 at 10:51 -0600, Chris Friesen wrote:
> > > On 08/26/2009 03:37 PM, raz ben yehuda wrote:
> > > >
> > > > On Wed, 2009-08-26 at 15:15 -0600, Chris Friesen wrote:
> > >
> > > >> We gave it as close to a whole cpu as we could using cpu and irq
> > > >> affinity and we used message queues in shared memory to allow another
> > > >> cpu to handle I/O. In our case we still had kernel threads running on
> > > >> the app cpu, but if we'd had a straightforward way to avoid them we
> > > >> would have used it.
> > >
> > > > Chris. I offer myself to help anyone wishes to apply OFFSCHED.
> > >
> > > I just went and read the docs. One of the things I noticed is that it
> > > says that the offlined cpu cannot run userspace tasks. For our
> > > situation that's a showstopper, unfortunately.
> >
> > Given that your entire software is T size , and T' is the amount of real
> > time size, what is the relation T'/T ?
> > If T'/T << 1 then dissect it, and put the T' in OFFSCHED.
> > My software T's is about 100MB while the real time section is about 60K.
>
> Chris was stating that your offlined cpu cannot run userspace
> tasks. How is your answer connected to Chris' statement ? Please stop
> useless marketing. LKML is about technical problems not advertisement.
>
> > They communicate through a simple ioctls.
>
> This is totally irrelevant and we all know how communication channels
> between Linux and whatever hackery (RT-Linux, RTAI, ... OFFLINE)
> works.
Why are you referring to the above projects as hacks ? What is a hack ?
> > CPU isolation example: a transmission engine.
> >
> > In the image bellow, I am presenting 4 streaming engines, over 4 Intels
> > 82598EB 10Gbps. A streaming engine is actually a Xeon E5420 2.5GHz.
> > Each engine has ***full control*** over its own interface. So you can:
> >
> > 1. fully control the processor's usage.
>
> By disabling the OS control over the CPU resource. How innovative.
I must say, when a John Doe like me receives this kind of response from
names like Thomas Gleixner, it aches.
> > 2. know **exactly*** how much each single packet transmission costs. for
> > example, in this case in processor 3 a single packet average
> > transmission is 1974tscs, which is ~700ns.
> >
> > 3. know how many packets fails to transmit right **on time** ( the Lates
> > counter) . and on time in this case means within the 122us jitter.
>
> Are those statistics a crucial property of your OFFLINE thing ?
Yes. Latency is a crucial property.
> > 4. There are 8 cores in this machine. The rest 4 OS cores are ~95% idle.
> > The only resource these cores share is the bus.
>
> That does not change the problem that you cannot run ordinary user
> space tasks on your offlined CPUs and you are simply hacking round the
> real problem which I outlined in my previous mail.
OFFSCHED is not just about RT. It is about assigning work to another
resource outside the operating system, very much like GPUs, network
processors, and so on, but with software that is accessible to the OS.
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 8:38 ` raz ben yehuda
@ 2009-08-28 10:05 ` Thomas Gleixner
2009-08-28 13:25 ` Rik van Riel
1 sibling, 0 replies; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 10:05 UTC (permalink / raw)
To: raz ben yehuda
Cc: Chris Friesen, Andrew Morton, mingo, peterz, maximlevitsky,
efault, riel, wiseman, linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, raz ben yehuda wrote:
> On Fri, 2009-08-28 at 00:05 +0200, Thomas Gleixner wrote:
> > This is totally irrelevant and we all know how communication channels
> > between Linux and whatever hackery (RT-Linux, RTAI, ... OFFLINE)
> > works.
>
> Why are you referring to the above projects as hacks ? What is a hack ?
Everything which works around the real problem instead of solving it.
> > > 2. know **exactly*** how much each single packet transmission costs. for
> > > example, in this case in processor 3 a single packet average
> > > transmission is 1974tscs, which is ~700ns.
> > >
> > > 3. know how many packets fails to transmit right **on time** ( the Lates
> > > counter) . and on time in this case means within the 122us jitter.
> >
> > Are those statistics a crucial property of your OFFLINE thing ?
> yes. latency is a crucial property.
You are not answering my question.
Also reducing latencies is something we want to do in the kernel
proper in the first place. We all know that you can reduce the
latencies by taking control away from the kernel and running a side
show. But that's nothing new. It has been done for decades already and
none of these projects has ever improved the kernel itself.
> > > 4. There are 8 cores in this machine. The rest 4 OS cores are ~95% idle.
> > > The only resource these cores share is the bus.
> >
> > That does not change the problem that you cannot run ordinary user
> > space tasks on your offlined CPUs and you are simply hacking round the
> > real problem which I outlined in my previous mail.
>
> OFFSCHED is not just about RT. it is about assigning assignments to
> another resource outside the operating system. very much like GPUs,
> network processors, and so on, but just with software that is
> accessible to the OS.
I was not talking about RT. I was talking about the problem that you
cannot run an ordinary user space task on your offlined CPU. That's
the main point where the design sucks. Having specialized programming
environments which impose tight restrictions on the application
programmer for no good reason is horrible.
Also how are GPUs, network processors related to my statements ?
Running specialized software on dedicated hardware which is an addon
to the base system controlled by the kernel is not new. There are
enough real world applications running Linux on the main CPU and some
extra code on an add on DSP or whatever. Cell/SPU or the TI ARM/DSP
combos are just the most obvious examples which come to my mind. Where
is the point of OFFSCHED here ?
In your earlier mails you talked about isolating cores of the base
system by taking the control away from the kernel and what a wonderful
solution this is because it allows you full control over that core.
We can dedicate a core to special computations today and we can assign
resources of any kind to it under the full control of the OS. The only
disturbing factor is the scheduler tick.
So you work around the scheduler tick problem by taking the core away
from the OS. That does not solve the problem, it simply introduces a
complete set of new problems:
- accounting of CPU utilization excludes the offlined core
- resource assignment is restricted to startup of the application
- standard admin tools (top, ps, ...) are not working
- unnecessary restrictions for the application programmer:
  - no syscalls
  - no standard IPC
  - ...
- debugging of the code which runs on the offlined core needs
  separate tools
- performance analysis, e.g. with profiling/performance counters,
  cannot use the existing mechanisms
- ...
Thanks,
tglx
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 4:27 ` Gregory Haskins
@ 2009-08-28 10:26 ` Thomas Gleixner
2009-08-28 18:57 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 10:26 UTC (permalink / raw)
To: Gregory Haskins
Cc: Rik van Riel, Christoph Lameter, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Gregory Haskins wrote:
> > However, and to what I believe is your point: its not entirely clear to
> > me what impact, if any, there would be w.r.t. any _other_ events that
> > may be driven off of the scheduler tick (i.e. events other than
> > scheduling policies, like timeslice expiration, etc). Perhaps someone
> > else like Thomas, Ingo, or Peter have some input here.
> >
> > I guess the specific question to ask is: Does the scheduler tick code
> > have any role other than timeslice policies and updating accounting
> > information? Examples would include timer-expiry, for instance. I
> > would think most of this logic is handled by finer grained components
> > like HRT, but I am admittedly ignorant of the actual timer voodoo ;)
There is not much happening in the scheduler tick:
- accounting of CPU time. this can be delegated to some other CPU
as long as the user space task is running and consuming 100%
- timer list timers. If there is no service/device active on that CPU
then there are no timers to run
- RCU callbacks. Same as above, but might need some tweaking.
- printk tick. Not really interesting
- scheduler time slicing. Not necessary in such a context
- posix cpu timers. Only interesting when the application uses them
So there is not much which needs the tick in such a scenario.
Of course we'd need to exclude that CPU from the do_timer duty as
well.
> Thinking about this idea some more: I can't see why this isn't just a
> trivial variation of the nohz idle code already in mainline. In both
> cases (idle and FIFO tasks) the cpu is "consumed" 100% by some arbitrary
> job (spinning/HLT for idle, RT thread for FIFO) while we have the
> scheduler tick disabled. The only real difference is a matter of
> power-management (HLT/mwait go to sleep-states, whereas spinning/rt-task
> run full tilt).
>
> Therefore the answer may be as simple as bracketing the FIFO task with
> tick_nohz_stop_sched_tick() + tick_nohz_restart_sched_tick(). The nohz
> code will probably need some minor adjustments so it is not assuming
> things about the state being "idle" (e.g. "isidle") for places when it
> matters (idle_calls++ stat is one example).
Yeah, it's similar to what we do in nohz idle already, but we'd need
to split out some of the functions very carefully to reuse them.
> Potential problems:
>
> a) disabling/re-enabling the tick on a per-RT task schedule() may prove to
> be prohibitively expensive.
For a single task consuming 100% CPU it is a non-problem. You disable
it once. But yes, on a standard system this needs to be investigated.
> b) we will need to make sure the rt-bandwidth protection mechanism is
> defeated so the task is allowed to consume 100% bandwidth.
>
> Perhaps these states should be in the cpuset/root-domain, and configured
> when you create the partition (e.g. "tick=off", "bandwidth=off" makes it
> an "offline" set).
That makes sense and should not be rocket science to implement.
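As a configuration sketch only: the partition setup proposed above might look like this today, where the cpu_exclusive and sched_load_balance files exist, while the tick/bandwidth knobs are purely hypothetical names for the proposal and are not implemented.

```sh
# create an exclusive, non-load-balanced partition on CPU 3
mount -t cgroup -o cpuset none /dev/cpuset
mkdir /dev/cpuset/rt
echo 3  > /dev/cpuset/rt/cpuset.cpus
echo 0  > /dev/cpuset/rt/cpuset.mems            # adjust to the local node
echo 1  > /dev/cpuset/rt/cpuset.cpu_exclusive
echo 0  > /dev/cpuset/rt/cpuset.sched_load_balance
echo $$ > /dev/cpuset/rt/tasks                  # move this shell into it
# hypothetical knobs from the proposal above -- not implemented:
# echo 1 > /dev/cpuset/rt/cpuset.tick_off
# echo 1 > /dev/cpuset/rt/cpuset.bandwidth_off
```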
Thanks,
tglx
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 8:38 ` raz ben yehuda
2009-08-28 10:05 ` Thomas Gleixner
@ 2009-08-28 13:25 ` Rik van Riel
2009-08-28 13:37 ` jim owens
2009-08-28 15:22 ` raz ben yehuda
1 sibling, 2 replies; 79+ messages in thread
From: Rik van Riel @ 2009-08-28 13:25 UTC (permalink / raw)
To: raz ben yehuda
Cc: Thomas Gleixner, Chris Friesen, Andrew Morton, mingo, peterz,
maximlevitsky, efault, wiseman, linux-kernel, linux-rt-users
raz ben yehuda wrote:
> yes. latency is a crucial property.
In the case of network packets, wouldn't you get a lower
latency by transmitting the packet from the CPU that
knows the packet should be transmitted, instead of sending
an IPI to another CPU and waiting for that CPU to do the
work?
Inter-CPU communication has always been the bottleneck
when it comes to SMP performance. Why does adding more
inter-CPU communication make your system faster, instead
of slower like one would expect?
--
All rights reversed.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 13:25 ` Rik van Riel
@ 2009-08-28 13:37 ` jim owens
2009-08-28 15:22 ` raz ben yehuda
1 sibling, 0 replies; 79+ messages in thread
From: jim owens @ 2009-08-28 13:37 UTC (permalink / raw)
To: Rik van Riel
Cc: raz ben yehuda, Thomas Gleixner, Chris Friesen, Andrew Morton,
mingo, peterz, maximlevitsky, efault, wiseman, linux-kernel,
linux-rt-users
Rik van Riel wrote:
> raz ben yehuda wrote:
>
>> yes. latency is a crucial property.
>
> In the case of network packets, wouldn't you get a lower
> latency by transmitting the packet from the CPU that
> knows the packet should be transmitted, instead of sending
> an IPI to another CPU and waiting for that CPU to do the
> work?
>
> Inter-CPU communication has always been the bottleneck
> when it comes to SMP performance. Why does adding more
> inter-CPU communication make your system faster, instead
> of slower like one would expect?
>
Maybe just me being paranoid, but from the beginning this
"use for dedicated IO processor" has scared the crap out of me.
Reminds me of Winmodem... sell cheap hardware by stealing your CPU!
The HPC FIFO user application on the other hand is a reasonable
if somewhat edge-case specialized user batch job.
jim
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 13:25 ` Rik van Riel
2009-08-28 13:37 ` jim owens
@ 2009-08-28 15:22 ` raz ben yehuda
1 sibling, 0 replies; 79+ messages in thread
From: raz ben yehuda @ 2009-08-28 15:22 UTC (permalink / raw)
To: Rik van Riel
Cc: Thomas Gleixner, Chris Friesen, Andrew Morton, mingo, peterz,
maximlevitsky, efault, wiseman, linux-kernel, linux-rt-users
On Fri, 2009-08-28 at 09:25 -0400, Rik van Riel wrote:
> raz ben yehuda wrote:
>
> > yes. latency is a crucial property.
>
> In the case of network packets, wouldn't you get a lower
> latency by transmitting the packet from the CPU that
> knows the packet should be transmitted, instead of sending
> an IPI to another CPU and waiting for that CPU to do the
> work?
Hello Rik,
If I understand you correctly, you are asking whether I pass 1.5K
packets to an offline CPU?
If so, that is not what I do, because you are quite right: it would
not make any sense.
I do not pass packets to an offline CPU, I pass assignments. An
assignment is a buffer of ~1MB plus some context describing what to do
with it (like aio). Also, the offline processor holds the network
interface as its own interface; no two offline processors transmit over
a single interface (I modified the bonding driver to work with an
offline processor for that). I am aware of per-processor network
queues, but benchmarks proved this was better (I do not have those
benchmarks at hand now).
Also, these engines do not release any sk_buffs to the operating
system; the packets are reused over and over to reduce the latency of
memory allocation and cache misses.
Also, in some cases I disabled the transmit interrupts and released
packets (--skb->users was still greater than 0, so not a real release)
in an offline context. I learned that from the Chelsio driver. This
way I removed more load from the operating system. It proved to be
better on large 1Gbps arrays, and in some variants of the code I was
able to remove atomic_inc/atomic_dec pairs; atomic operations cost a
lot.
With MSI cards I did not find it useful. In the example I showed, I
use MSI and the system is almost idle.
Also, as I recall, an IPI will not reach an offloaded processor;
offsched runs it as NMI.
Also, I would like to apologize if any of this correspondence comes
across as me trying to PR offsched. I am not.
> Inter-CPU communication has always been the bottleneck
> when it comes to SMP performance. Why does adding more
> inter-CPU communication make your system faster, instead
> of slower like one would expect?
>
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-27 21:09 ` Thomas Gleixner
2009-08-27 22:22 ` Gregory Haskins
2009-08-27 23:51 ` Chris Friesen
@ 2009-08-28 18:43 ` Christoph Lameter
2 siblings, 0 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-28 18:43 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Chris Friesen, raz ben yehuda, Andrew Morton, mingo, peterz,
maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
On Thu, 27 Aug 2009, Thomas Gleixner wrote:
> You are creating a "one special case" operation mode which is not
> justified in my opinion. Let's look at the problem you want to solve:
>
> Run exactly one thread on a dedicated CPU w/o any disturbance by the
> scheduler tick.
That's not the problem I want to solve. There are multiple events that
could disturb a process, like timers firing, softirqs and hardirqs.
> You can move away anything else than the scheduler tick from a CPU
> today already w/o a single line of code change.
How do you remove the per processor kernel threads for allocators and
other kernel subsystems? What about IPI broadcasts?
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 10:26 ` Thomas Gleixner
@ 2009-08-28 18:57 ` Christoph Lameter
2009-08-28 19:23 ` Thomas Gleixner
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-28 18:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Thomas Gleixner wrote:
> That makes sense and should not be rocket science to implement.
I like it and such a thing would do a lot for reducing noise.
However, look at a typical task (from the HPC world) that would be
running on an isolated processors. It would
1. Spin on some memory location waiting for an event.
2. Process data passed to it, prepare output data and then go back to 1.
The enticing thing about doing 1 with shared memory and/or InfiniBand is
that it can be done in a few hundred nanoseconds instead of 10-20
microseconds. This allows much faster IPC if we bypass
the OS.
For many uses deterministic responses are desired. If the handler that
runs is never disturbed by extraneous processing (IPI, faults, irqs etc)
then we can say that we run at the maximum speed that the machine can run
at. That is what many sites expect.
In an HPC environment synchronization points are essential, and the
frequency of synchronization points (where we spin on a cacheline) is
important for the accuracy and the performance of the algorithm. If we
can make N processors operate in a deterministic fashion on, e.g., an
array of floating point numbers, then the rendezvous occurs with
minimal wait time in each of the N processes. Getting rid of all
sources of interruption gets us the best performance possible.
Right now the strong variability often makes it necessary to have long
processing periods and to deal with long wait times because
one of the N processes has not finished yet.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 18:57 ` Christoph Lameter
@ 2009-08-28 19:23 ` Thomas Gleixner
2009-08-28 19:52 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 19:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Christoph Lameter wrote:
> On Fri, 28 Aug 2009, Thomas Gleixner wrote:
>
> > That makes sense and should not be rocket science to implement.
>
> I like it and such a thing would do a lot for reducing noise.
>
> However, look at a typical task (from the HPC world) that would be
> running on an isolated processors. It would
>
> 1. Spin on some memory location waiting for an event.
>
> 2. Process data passed to it, prepare output data and then go back to 1.
>
> The enticing thing about doing 1 with shared memory and/or infiniband is
> that it can be done in a few hundred nanoseconds instead of 10-20
> microseconds. This allows a much faster IPC communication if we bypass
> the OS.
>
> For many uses deterministic responses are desired. If the handler that
> runs is never disturbed by extraneous processing (IPI, faults, irqs etc)
> then we can say that we run at the maximum speed that the machine can run
> at. That is what many sites expect.
Right, and I think we can get there. The timer can be eliminated with
some work. Faults shouldn't happen on that CPU and all other
interrupts can be kept away with proper affinity settings. Softirqs
should not happen on such a CPU either as there is neither a hardirq
nor a user space task triggering them. Same applies for timers. So
there are some remaining issues like IPIs, but I'm pretty sure that
they can be tamed to zero as well.
Thanks,
tglx
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 19:23 ` Thomas Gleixner
@ 2009-08-28 19:52 ` Christoph Lameter
2009-08-28 20:00 ` Thomas Gleixner
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-28 19:52 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Thomas Gleixner wrote:
> Right, and I think we can get there. The timer can be eliminated with
> some work. Faults shouldn't happen on that CPU and all other
> interrupts can be kept away with proper affinity settings. Softirqs
> should not happen on such a CPU either as there is neither a hardirq
> nor a user space task triggering them. Same applies for timers. So
> there are some remaining issues like IPIs, but I'm pretty sure that
> they can be tamed to zero as well.
There are various timer generated thingies like vm statistics, slab queue
management and device specific things that run on each processor.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 19:52 ` Christoph Lameter
@ 2009-08-28 20:00 ` Thomas Gleixner
2009-08-28 20:21 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 20:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Christoph Lameter wrote:
> On Fri, 28 Aug 2009, Thomas Gleixner wrote:
>
> > Right, and I think we can get there. The timer can be eliminated with
> > some work. Faults shouldn't happen on that CPU and all other
> > interrupts can be kept away with proper affinity settings. Softirqs
> > should not happen on such a CPU either as there is neither a hardirq
> > nor a user space task triggering them. Same applies for timers. So
> > there are some remaining issues like IPIs, but I'm pretty sure that
> > they can be tamed to zero as well.
>
> There are various timer generated thingies like vm statistics, slab queue
> management and device specific things that run on each processor.
The statistics stuff needs to be tackled anyway as we need to offload
the sched accounting to some other cpu.
What slab queue stuff is running on timers and cannot be switched off
in such a context?
Device specific stuff should not happen on such a CPU when there is no
device handled on it.
Thanks,
tglx
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 20:00 ` Thomas Gleixner
@ 2009-08-28 20:21 ` Christoph Lameter
2009-08-28 20:34 ` Thomas Gleixner
2009-08-29 17:03 ` jim owens
0 siblings, 2 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-08-28 20:21 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Thomas Gleixner wrote:
> The statistics stuff needs to be tackled anyway as we need to offload
> the sched accounting to some other cpu.
The vm statistics in mm/vmstat.c are different from the sched accounting.
> What slab queue stuff is running on timers and cannot be switched off
> in such a context?
Slab runs a timer every 2 seconds to age its queues. If there was
activity, there can follow a relatively long period in which we
periodically throw out portions of the cached data.
> Device specific stuff should not happen on such a CPU when there is no
> device handled on it.
The device may periodically check for conditions that require action.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 20:21 ` Christoph Lameter
@ 2009-08-28 20:34 ` Thomas Gleixner
2009-08-31 19:19 ` Christoph Lameter
2009-08-29 17:03 ` jim owens
1 sibling, 1 reply; 79+ messages in thread
From: Thomas Gleixner @ 2009-08-28 20:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Christoph Lameter wrote:
> On Fri, 28 Aug 2009, Thomas Gleixner wrote:
>
> > The statistics stuff needs to be tackled anyway as we need to offload
> > the sched accounting to some other cpu.
>
> The vm statistics in mm/vmstat.c are different from the sched accounting.
I know, but the problem is basically the same. Delegate the stats to
someone else.
> > What slab queue stuff is running on timers and cannot be switched off
> > in such a context?
>
> slab does run a timer every 2 second to age queues. If there was activity
> then there can be a relatively long time in which we periodically throw
> out portions of the cached data.
Right, but why does that affect a CPU which is marked "I'm not
involved in that game" ?
> > Device specific stuff should not happen on such a CPU when there is no
> > device handled on it.
>
> The device may periodically check for conditions that require action.
Errm. The device is associated to some other CPU, so why would it
require action on an isolated one ? Or are you talking about a device
which is associated to that isolated CPU ?
Thanks,
tglx
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 0:44 ` Thomas Gleixner
@ 2009-08-28 21:20 ` Chris Friesen
0 siblings, 0 replies; 79+ messages in thread
From: Chris Friesen @ 2009-08-28 21:20 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christoph Lameter, raz ben yehuda, Andrew Morton, mingo, peterz,
maximlevitsky, efault, riel, wiseman, linux-kernel,
linux-rt-users
On 08/27/2009 06:44 PM, Thomas Gleixner wrote:
> On Thu, 27 Aug 2009, Chris Friesen wrote:
>> How would you deal with per-cpu kernel threads (softirqs, etc.) or
>> softirq processing while in the kernel?
>
> If you have pinned an interrupt to that CPU then you need to process
> the softirq for it as well. If that's the device your very single user
> space thread is talking to then you better want that, if you are not
> interested then simply pin that device irq to some other CPU: no irq
> -> no softirq.
Ah, okay. For some reason I had thought that the incoming work was
queued up globally and might be handled by any softirq.
Chris
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 20:21 ` Christoph Lameter
2009-08-28 20:34 ` Thomas Gleixner
@ 2009-08-29 17:03 ` jim owens
2009-08-31 19:22 ` Christoph Lameter
1 sibling, 1 reply; 79+ messages in thread
From: jim owens @ 2009-08-29 17:03 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, Gregory Haskins, Rik van Riel, Chris Friesen,
raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, wiseman, linux-kernel, linux-rt-users
Christoph Lameter wrote:
> On Fri, 28 Aug 2009, Thomas Gleixner wrote:
>> What slab queue stuff is running on timers and cannot be switched off
>> in such a context?
>
> slab does run a timer every 2 second to age queues. If there was activity
> then there can be a relatively long time in which we periodically throw
> out portions of the cached data.
OK, you have me fully confused now.
From other HPC people, I know the "no noise in my math application"
requirement. But that means the user code that is running on the
CPU must not do anything that wakes the kernel. Not even page faults,
so they pin the memory at job start.
Anything the user code does that needs kernel statistics or
kernel action is "I must fix my user code", or "I accept that
the noise is necessary".
So we don't need to offload stats to other CPUs, stats are not needed.
>> Device specific stuff should not happen on such a CPU when there is no
>> device handled on it.
>
> The device may periodically check for conditions that require action.
Again, what is this device and why is it controlled directly by
user-space code. Devices should be controlled even in an HPC
environment by the kernel. AFAIK HPC wants the kernel to be the
bootstrap and data transfer manager running on a small subset of
the total CPUs, with the dedicated CPUs running math jobs.
jim
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-31 19:22 ` Christoph Lameter
@ 2009-08-31 15:33 ` Peter Zijlstra
2009-09-01 18:46 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Peter Zijlstra @ 2009-08-31 15:33 UTC (permalink / raw)
To: Christoph Lameter
Cc: jim owens, Thomas Gleixner, Gregory Haskins, Rik van Riel,
Chris Friesen, raz ben yehuda, Andrew Morton, mingo,
maximlevitsky, efault, wiseman, linux-kernel, linux-rt-users
On Mon, 2009-08-31 at 15:22 -0400, Christoph Lameter wrote:
>
> Stats updates are performed if needed or not. Same with slab expiration.
> Thats why its necessary to Offline the cpu.
Or we fix it to not do anything when not needed.
Come-on Christoph this ain't rocket science.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-31 19:19 ` Christoph Lameter
@ 2009-08-31 17:44 ` Roland Dreier
2009-09-01 18:42 ` Christoph Lameter
0 siblings, 1 reply; 79+ messages in thread
From: Roland Dreier @ 2009-08-31 17:44 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, Gregory Haskins, Rik van Riel, Chris Friesen,
raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, wiseman, linux-kernel, linux-rt-users
> Some devices now have per cpu threads. See f.e. Mellanox IB drivers. Maybe
> there is a way to restrict that.
AFAIK the Mellanox drivers just create a single-threaded workqueue.
- R.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-28 20:34 ` Thomas Gleixner
@ 2009-08-31 19:19 ` Christoph Lameter
2009-08-31 17:44 ` Roland Dreier
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-31 19:19 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Gregory Haskins, Rik van Riel, Chris Friesen, raz ben yehuda,
Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
linux-kernel, linux-rt-users
On Fri, 28 Aug 2009, Thomas Gleixner wrote:
> > slab does run a timer every 2 second to age queues. If there was activity
> > then there can be a relatively long time in which we periodically throw
> > out portions of the cached data.
>
> Right, but why does that affect a CPU which is marked "I'm not
> involved in that game" ?
It's run unconditionally on every processor. The system needs to scan
through all slabs and all queues to figure out if there is something to
expire.
> > The device may periodically check for conditions that require action.
>
> Errm. The device is associated to some other CPU, so why would it
> require action on an isolated one ? Or are you talking about a device
> which is associated to that isolated CPU ?
Some devices now have per cpu threads. See f.e. Mellanox IB drivers. Maybe
there is a way to restrict that.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-29 17:03 ` jim owens
@ 2009-08-31 19:22 ` Christoph Lameter
2009-08-31 15:33 ` Peter Zijlstra
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-08-31 19:22 UTC (permalink / raw)
To: jim owens
Cc: Thomas Gleixner, Gregory Haskins, Rik van Riel, Chris Friesen,
raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, wiseman, linux-kernel, linux-rt-users
On Sat, 29 Aug 2009, jim owens wrote:
> From other HPC people, I know the "no noise in my math application"
> requirement. But that means the user code that is running on the
> CPU must not do anything that wakes the kernel. Not even page faults,
> so they pin the memory at job start.
Right.
> Anything the user code does that needs kernel statistics or
> kernel action is "I must fix my user code", or "I accept that
> the noise is necessary".
>
> So we don't need to offload stats to other CPUs, stats are not needed.
Stats updates are performed whether needed or not. Same with slab
expiration. That's why it's necessary to offline the CPU.
* Re: RFC: THE OFFLINE SCHEDULER
2009-09-01 18:42 ` Christoph Lameter
@ 2009-09-01 16:15 ` Roland Dreier
0 siblings, 0 replies; 79+ messages in thread
From: Roland Dreier @ 2009-09-01 16:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, Gregory Haskins, Rik van Riel, Chris Friesen,
raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, wiseman, linux-kernel, linux-rt-users
> Ok then these per cpu irqs are there to support something different? There
> are per cpu irqs here. Seems to be hardware supported?
Yes, the driver now creates per-cpu IRQs for completions. However if
you don't trigger any completion events then you won't get any
interrupts. That's different from the workqueues, which are used to
poll the hardware for port changes and internal errors (and which are
single-threaded and can be put on whatever "system services" CPU you want)
- R.
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-31 17:44 ` Roland Dreier
@ 2009-09-01 18:42 ` Christoph Lameter
2009-09-01 16:15 ` Roland Dreier
0 siblings, 1 reply; 79+ messages in thread
From: Christoph Lameter @ 2009-09-01 18:42 UTC (permalink / raw)
To: Roland Dreier
Cc: Thomas Gleixner, Gregory Haskins, Rik van Riel, Chris Friesen,
raz ben yehuda, Andrew Morton, mingo, peterz, maximlevitsky,
efault, wiseman, linux-kernel, linux-rt-users
On Mon, 31 Aug 2009, Roland Dreier wrote:
>
> > Some devices now have per cpu threads. See f.e. Mellanox IB drivers. Maybe
> > there is a way to restrict that.
>
> AFAIK the Mellanox drivers just create a single-threaded workqueue.
OK, then these per-CPU IRQs are there to support something different?
There are per-CPU IRQs here. They seem to be hardware supported?
 62:  13824 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-0
 63:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-1
 64:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-2
 65:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-3
 66:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-4
 67:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-5
 68:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-6
 69:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-7
 70:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-8
 71:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-9
 72:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-10
 73:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-11
 74:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-12
 75:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-13
 76:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-14
 77:      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-comp-15
 78:   3225 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  mlx4-async
 79:   1832 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-0
 80:   5546 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-1
 81:   2604 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-2
 82:    124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-3
 83: 743126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-4
 84:    857 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-5
 85:    321 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-6
 86:   2296 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  IR-PCI-MSI-edge  eth0-7
* Re: RFC: THE OFFLINE SCHEDULER
2009-08-31 15:33 ` Peter Zijlstra
@ 2009-09-01 18:46 ` Christoph Lameter
0 siblings, 0 replies; 79+ messages in thread
From: Christoph Lameter @ 2009-09-01 18:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: jim owens, Thomas Gleixner, Gregory Haskins, Rik van Riel,
Chris Friesen, raz ben yehuda, Andrew Morton, mingo,
maximlevitsky, efault, wiseman, linux-kernel, linux-rt-users
On Mon, 31 Aug 2009, Peter Zijlstra wrote:
> On Mon, 2009-08-31 at 15:22 -0400, Christoph Lameter wrote:
> >
> > Stats updates are performed if needed or not. Same with slab expiration.
> > Thats why its necessary to Offline the cpu.
>
> Or we fix it to not do anything when not needed.
>
> Come-on Christoph this ain't rocket science.
Well, let's try it then. Not sure that I can be of much help as long as
we have issues with UDP weirdness in the network layers. Bugs before
feature work... Sigh.
end of thread, other threads:[~2009-09-01 18:46 UTC | newest]
Thread overview: 79+ messages
[not found] <1250983671.5688.21.camel@raz>
[not found] ` <1251004897.7043.70.camel@marge.simson.net>
2009-08-23 9:09 ` RFC: THE OFFLINE SCHEDULER raz ben yehuda
2009-08-23 7:30 ` Mike Galbraith
2009-08-23 11:05 ` raz ben yehuda
2009-08-23 9:52 ` Mike Galbraith
2009-08-25 15:23 ` Christoph Lameter
2009-08-25 17:56 ` Mike Galbraith
2009-08-25 18:03 ` Christoph Lameter
2009-08-25 18:12 ` Mike Galbraith
[not found] ` <5d96567b0908251522m3fd4ab98n76a52a34a11e874c@mail.gmail.com>
2009-08-25 22:32 ` Fwd: " Raz
2009-08-25 19:08 ` Peter Zijlstra
2009-08-25 19:18 ` Christoph Lameter
2009-08-25 19:22 ` Chris Friesen
2009-08-25 20:35 ` Sven-Thorsten Dietrich
2009-08-26 5:31 ` Peter Zijlstra
2009-08-26 10:29 ` raz ben yehuda
2009-08-26 8:02 ` Mike Galbraith
2009-08-26 8:16 ` Raz
2009-08-26 13:47 ` Christoph Lameter
2009-08-26 14:45 ` Maxim Levitsky
2009-08-26 14:54 ` raz ben yehuda
2009-08-26 15:06 ` Pekka Enberg
2009-08-26 15:11 ` raz ben yehuda
2009-08-26 15:30 ` Peter Zijlstra
2009-08-26 15:41 ` Christoph Lameter
2009-08-26 16:03 ` Peter Zijlstra
2009-08-26 16:16 ` Pekka Enberg
2009-08-26 16:20 ` Christoph Lameter
2009-08-26 18:04 ` Ingo Molnar
2009-08-26 19:15 ` Christoph Lameter
2009-08-26 19:32 ` Ingo Molnar
2009-08-26 20:40 ` Christoph Lameter
2009-08-26 20:50 ` Andrew Morton
2009-08-26 21:09 ` Christoph Lameter
2009-08-26 21:15 ` Chris Friesen
2009-08-26 21:37 ` raz ben yehuda
2009-08-27 16:51 ` Chris Friesen
2009-08-27 17:04 ` Christoph Lameter
2009-08-27 21:09 ` Thomas Gleixner
2009-08-27 22:22 ` Gregory Haskins
2009-08-28 2:15 ` Rik van Riel
2009-08-28 3:33 ` Gregory Haskins
2009-08-28 4:27 ` Gregory Haskins
2009-08-28 10:26 ` Thomas Gleixner
2009-08-28 18:57 ` Christoph Lameter
2009-08-28 19:23 ` Thomas Gleixner
2009-08-28 19:52 ` Christoph Lameter
2009-08-28 20:00 ` Thomas Gleixner
2009-08-28 20:21 ` Christoph Lameter
2009-08-28 20:34 ` Thomas Gleixner
2009-08-31 19:19 ` Christoph Lameter
2009-08-31 17:44 ` Roland Dreier
2009-09-01 18:42 ` Christoph Lameter
2009-09-01 16:15 ` Roland Dreier
2009-08-29 17:03 ` jim owens
2009-08-31 19:22 ` Christoph Lameter
2009-08-31 15:33 ` Peter Zijlstra
2009-09-01 18:46 ` Christoph Lameter
2009-08-28 6:14 ` Peter Zijlstra
2009-08-27 23:51 ` Chris Friesen
2009-08-28 0:44 ` Thomas Gleixner
2009-08-28 21:20 ` Chris Friesen
2009-08-28 18:43 ` Christoph Lameter
2009-08-27 21:33 ` raz ben yehuda
2009-08-27 22:05 ` Thomas Gleixner
2009-08-28 8:38 ` raz ben yehuda
2009-08-28 10:05 ` Thomas Gleixner
2009-08-28 13:25 ` Rik van Riel
2009-08-28 13:37 ` jim owens
2009-08-28 15:22 ` raz ben yehuda
2009-08-26 21:34 ` Ingo Molnar
2009-08-27 2:55 ` Frank Ch. Eigler
2009-08-26 21:34 ` raz ben yehuda
2009-08-26 21:08 ` Ingo Molnar
2009-08-26 21:26 ` Christoph Lameter
2009-08-26 21:32 ` raz ben yehuda
2009-08-27 7:15 ` Mike Galbraith
2009-08-26 15:37 ` Chetan.Loke
2009-08-26 15:21 ` Pekka Enberg
2009-08-25 21:09 ` Éric Piel