[RFD] Separating struct task and the kernel stacks

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

* [RFD] Separating struct task and the kernel stacks
@ 2005-06-10  6:15 Keith Owens
  2005-06-10 15:11 ` Christoph Lameter
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Keith Owens @ 2005-06-10  6:15 UTC (permalink / raw)
  To: linux-ia64

I have the MCA/INIT per cpu stack code to the point where I can
reliably enter mca.c using an MCA/INIT stack that is different from the
normal kernel stack.  However these separate stacks are now getting
problems because struct task is embedded in the kernel stack.

Switching stacks requires that struct task is copied from the original
"current" to the MCA/INIT stack, then change current to point to the
new stack.  Even that is not enough, there are still places that are
using the old value of "current".  The main problem is the scheduler,
it tracks tasks by the address of their struct task, not by the kernel
stack address.  When debugging an MCA/INIT, the mismatch between the
new value of current and the old task addresses in various structures
can lead to some very confusing results.  The kernel is not designed to
have struct task move around on the fly.

i386 handles multiple kernel stacks by moving struct task to a slab
allocator and leaving just struct thread_info in the stack.  Switching
a process from one kernel stack to another does not require any changes
to current nor to any task pointers.  Just copy thread_info from the
old to the new stack, add a back pointer to the previous stack and
continue processing.

I can continue trying to handle the MCA/INIT stacks with various
kludges in ia64 and common code, but it is not nice.  Separating struct
task from the kernel stack is a lot cleaner.  Before I go too far down
this path, are there any violent objections to moving struct task out
of the kernel stack?

Since the slab allocator uses addresses in region 7, we do not need any
extra hard wired TLBs for the separate struct task.  The alt-DTLB
handler will load the TLB on the fly, it does not need any stack data
to do that.

This change would remove the last vestige of
__HAVE_ARCH_TASK_STRUCT_ALLOCATOR from the kernel.  Only ia64 defines
that symbol, every other architecture uses separate struct tasks.

Add separate interrupt stacks like i386, and we could even reduce the
size of the kernel stack for each process.  CONFIG_16KSTACKS anyone?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
@ 2005-06-10 15:11 ` Christoph Lameter
  2005-06-10 16:54 ` David Mosberger
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2005-06-10 15:11 UTC (permalink / raw)
  To: linux-ia64

On Fri, 10 Jun 2005, Keith Owens wrote:

> I have the MCA/INIT per cpu stack code to the point where I can
> reliably enter mca.c using an MCA/INIT stack that is different from the
> normal kernel stack.  However these separate stacks are now getting
> problems because struct task is embedded in the kernel stack.

How frequent are these MCA events?

> Switching stacks requires that struct task is copied from the original
> "current" to the MCA/INIT stack, then change current to point to the
> new stack.  Even that is not enough, there are still places that are
> using the old value of "current".  The main problem is the scheduler,
> it tracks tasks by the address of their struct task, not by the kernel
> stack address.  When debugging an MCA/INIT, the mismatch between the
> new value of current and the old task addresses in various structures
> can lead to some very confusing results.  The kernel is not designed to
> have struct task move around on the fly.

Could you just move the stack? Put a pointer to the stack in task_info. By 
default this is pointing to the stack in task_info. If you have to switch
point it elsewhere.
 
> i386 handles multiple kernel stacks by moving struct task to a slab
> allocator and leaving just struct thread_info in the stack.  Switching
> a process from one kernel stack to another does not require any changes
> to current nor to any task pointers.  Just copy thread_info from the
> old to the new stack, add a back pointer to the previous stack and
> continue processing.

Using the slab allocator generates a certain amount of overhead during 
process creation. I believe the page allocator would be faster. Also there 
needs to be no separate allocation of memory for the stack.

> I can continue trying to handle the MCA/INIT stacks with various
> kludges in ia64 and common code, but it is not nice.  Separating struct
> task from the kernel stack is a lot cleaner.  Before I go too far down
> this path, are there any violent objections to moving struct task out
> of the kernel stack?

I would suggest just to add a pointer so that the stack does not have to 
be in the page we allocate.

> This change would remove the last vestige of
> __HAVE_ARCH_TASK_STRUCT_ALLOCATOR from the kernel.  Only ia64 defines
> that symbol, every other architecture uses separate struct tasks.

I know of an architecture that is planning to switch to do the same as 
ia64 does now in order to increase the efficiency of task creation.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
  2005-06-10 15:11 ` Christoph Lameter
@ 2005-06-10 16:54 ` David Mosberger
  2005-06-10 17:03 ` David Mosberger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2005-06-10 16:54 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 10 Jun 2005 16:15:11 +1000, Keith Owens <kaos@sgi.com> said:

  Keith> Since the slab allocator uses addresses in region 7, we do
  Keith> not need any extra hard wired TLBs for the separate struct
  Keith> task.  The alt-DTLB handler will load the TLB on the fly, it
  Keith> does not need any stack data to do that.

Interruption handlers access the task structure, register-backing
store, and memory stack via the virtual address-space.  Since
interruption-collection is turned off in those handlers, the alt-DTLB
handler doesn't do you any good.

Being able to map these structures with a single translation-register
is the primary reason for keeping the task structure and the stacks
together on ia64.

	--david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
  2005-06-10 15:11 ` Christoph Lameter
  2005-06-10 16:54 ` David Mosberger
@ 2005-06-10 17:03 ` David Mosberger
  2005-06-10 17:13 ` Matthew Wilcox
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2005-06-10 17:03 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 10 Jun 2005 08:11:42 -0700 (PDT), Christoph Lameter <christoph@lameter.com> said:

  >> Switching stacks requires that struct task is copied from the
  >> original "current" to the MCA/INIT stack, then change current to
  >> point to the new stack.  Even that is not enough, there are still
  >> places that are using the old value of "current".  The main
  >> problem is the scheduler, it tracks tasks by the address of their
  >> struct task, not by the kernel stack address.  When debugging an
  >> MCA/INIT, the mismatch between the new value of current and the
  >> old task addresses in various structures can lead to some very
  >> confusing results.  The kernel is not designed to have struct
  >> task move around on the fly.

  Christoph> Could you just move the stack? Put a pointer to the stack
  Christoph> in task_info. By default this is pointing to the stack in
  Christoph> task_info. If you have to switch point it elsewhere.

Perhaps a more fruitful approach might be to treat the MCA as its own
task.  MCA are very special anyhow: they can happen at any time, even
when the kernel is supposedly non-preemptible.  Thus, running any
standard kernel code which acquires locks etc is rather dangerous.  If
you're going to do major surgery, this really needs to be thought
through first --- otherwise, we're just going to discover all the
little nasty issues piecemeal and that would be extremely painful and
cause a lot of extra work.

Also, let's be clear that we are NOT going to (de-)optimize the normal
Linux kernel based on the needs of MCA.  Proper MCA support is
important, but that needs to be accomplished without compromising the
efficiency of the normal kernel (which is where 99.99% of the time is
spent). My sense is that the MCA subsystem may have to be almost
stand-alone in order to really work reliably.

	--david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
                   ` (2 preceding siblings ...)
  2005-06-10 17:03 ` David Mosberger
@ 2005-06-10 17:13 ` Matthew Wilcox
  2005-06-10 17:20 ` David Mosberger
  2005-06-11  4:08 ` Keith Owens
  5 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2005-06-10 17:13 UTC (permalink / raw)
  To: linux-ia64

On Fri, Jun 10, 2005 at 08:11:42AM -0700, Christoph Lameter wrote:
> Using the slab allocator generates a certain amount of overhead during 
> process creation. I believe the page allocator would be faster. Also there 
> needs to be no separate allocation of memory for the stack.

While speed of process creation is certainly an important benchmark,
speed of scheduling is more important.  I was under the impression that
the reason x86 moved the task_struct from the bottom of the stack to
its own slab was that the scheduler had bad cache effects due to all
task_structs being on the same 4k boundary.  Obviously, ia64 has slightly
different caches from x86, but I just wanted to point out that it isn't
necessarily all bad to move the task_struct off the stack.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
                   ` (3 preceding siblings ...)
  2005-06-10 17:13 ` Matthew Wilcox
@ 2005-06-10 17:20 ` David Mosberger
  2005-06-11  4:08 ` Keith Owens
  5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2005-06-10 17:20 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 10 Jun 2005 18:13:41 +0100, Matthew Wilcox <matthew@wil.cx> said:

  Matthew> On Fri, Jun 10, 2005 at 08:11:42AM -0700, Christoph Lameter wrote:
  >> Using the slab allocator generates a certain amount of overhead during 
  >> process creation. I believe the page allocator would be faster. Also there 
  >> needs to be no separate allocation of memory for the stack.

  Matthew> While speed of process creation is certainly an important
  Matthew> benchmark, speed of scheduling is more important.  I was
  Matthew> under the impression that the reason x86 moved the
  Matthew> task_struct from the bottom of the stack to its own slab
  Matthew> was that the scheduler had bad cache effects due to all
  Matthew> task_structs being on the same 4k boundary.

That was only a secondary effect.  We can do task-coloring/"cache-line
randomization" for ia64 tasks without having to split them up (unlike
on x86, we have separate task and stack pointers and we never rely on
masking/alignment to compute one from the other, as is done on x86).

	--david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFD] Separating struct task and the kernel stacks
  2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
                   ` (4 preceding siblings ...)
  2005-06-10 17:20 ` David Mosberger
@ 2005-06-11  4:08 ` Keith Owens
  5 siblings, 0 replies; 7+ messages in thread
From: Keith Owens @ 2005-06-11  4:08 UTC (permalink / raw)
  To: linux-ia64

On Fri, 10 Jun 2005 10:03:14 -0700, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Fri, 10 Jun 2005 08:11:42 -0700 (PDT), Christoph Lameter <christoph@lameter.com> said:
>
>  >> Switching stacks requires that struct task is copied from the
>  >> original "current" to the MCA/INIT stack, then change current to
>  >> point to the new stack.  Even that is not enough, there are still
>  >> places that are using the old value of "current".  The main
>  >> problem is the scheduler, it tracks tasks by the address of their
>  >> struct task, not by the kernel stack address.  When debugging an
>  >> MCA/INIT, the mismatch between the new value of current and the
>  >> old task addresses in various structures can lead to some very
>  >> confusing results.  The kernel is not designed to have struct
>  >> task move around on the fly.
>
>  Christoph> Could you just move the stack? Put a pointer to the stack
>  Christoph> in task_info. By default this is pointing to the stack in
>  Christoph> task_info. If you have to switch point it elsewhere.

Exactly what I was suggesting.  Separate struct task from the stack, so
the stack just contains thread_info and the register and memory stacks,
with struct task pointing to one of several stacks.  But as David has
pointed out, that is going to be less efficient.

>Perhaps a more fruitful approach might be to treat the MCA as its own
>task.

That is a promising idea.  Preformat the MCA/INIT stacks like the init
task, marking them interrupts disabled, non-preemptible etc.  To avoid
any disagreement with what the scheduler thinks is the system state,
mark the MCA/INIT tasks as not running on any cpu, even though they are
really in control while they are handling the event.

Some of the registers belonging to the interrupted tasks will be in RBS
on the MCA/INIT stack, which would normally stop us investigating the
original state.  The MCA/INIT handler would copy those registers back
to the original stack and add a switch_stack to make it look like the
original task is blocked.  This assumes that the MCA/INIT event
occurred while the cpu was running on a kernel stack and that there is
enough room on stack to save the state.

current and its corresponding DTC still have to be switched to point to
the MCA/INIT stack, there are too many places where current is tested
and we want almost all of those places to pick up the MCA/INIT state,
not the original.  For the few cases where we want the original value
of current (backtrace is the obvious case), we can detect that this is
the MCA/INIT stack and use the original value for current.  Stack
switching from the MCA/INIT stack to the original stack is no longer
required to backtrace the original task, which nicely removes the
problem of how to switch between kernel stacks when unwinding,

The detection of whether a task is blocked or not would have to change
slightly.  Currently a task is blocked if it is not on a cpu, which is
detected by comparing the task pointer against cpu_rq(cpu)->curr.
During an MCA/INIT event, the cpu_rq(cpu)->curr task will be blocked
and the MCA/INIT "task" will be active.  Fortunately that distinction
only affects the MCA/INIT handlers and debuggers like kdb and lcrash, I
can live with that.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-06-11  4:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-10  6:15 [RFD] Separating struct task and the kernel stacks Keith Owens
2005-06-10 15:11 ` Christoph Lameter
2005-06-10 16:54 ` David Mosberger
2005-06-10 17:03 ` David Mosberger
2005-06-10 17:13 ` Matthew Wilcox
2005-06-10 17:20 ` David Mosberger
2005-06-11  4:08 ` Keith Owens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox