* Re: new execve/kernel_thread design
[not found] <20121016223508.GR2616@ZenIV.linux.org.uk>
@ 2012-10-17 5:32 ` Max Filippov
2012-10-17 5:43 ` Al Viro
[not found] ` <CACM3HyEpypULRWUc5ZnLnZ=uOWf3_j=9PXZiJrT_BXyGcQe9yg@mail.gmail.com>
1 sibling, 1 reply; 16+ messages in thread
From: Max Filippov @ 2012-10-17 5:32 UTC (permalink / raw)
To: Al Viro; +Cc: linux-kernel, linux-arch
On Wed, Oct 17, 2012 at 2:35 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> [apologies for enormous Cc; I've talked to some of you in private mail
> and after being politely asked to explain WTF was all that thing for
> and how was it supposed to work, well...]
[...]
> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
I'm doing the xtensa part.
BTW, what might the linus-arch ML be for?
--
Thanks.
-- Max
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-10-17 5:32 ` Max Filippov
@ 2012-10-17 5:43 ` Al Viro
2012-10-17 5:43 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2012-10-17 5:43 UTC (permalink / raw)
To: Max Filippov; +Cc: linux-kernel, linux-arch
On Wed, Oct 17, 2012 at 09:32:34AM +0400, Max Filippov wrote:
> On Wed, Oct 17, 2012 at 2:35 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > [apologies for enormous Cc; I've talked to some of you in private mail
> > and after being politely asked to explain WTF was all that thing for
> > and how was it supposed to work, well...]
>
> [...]
>
> > Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>
> I'm doing xtensa part.
Thanks; I hope this variant is going to be less painful than messing with
ret_from_kernel_execve()...
> BTW, what linus-arch ML might be for?
A typo, noticed only when I got a bounce ;-)
My apologies...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
[not found] ` <CACM3HyEpypULRWUc5ZnLnZ=uOWf3_j=9PXZiJrT_BXyGcQe9yg@mail.gmail.com>
@ 2012-10-17 14:27 ` Michal Simek
2012-10-17 14:27 ` Michal Simek
2012-10-17 16:07 ` Al Viro
0 siblings, 2 replies; 16+ messages in thread
From: Michal Simek @ 2012-10-17 14:27 UTC (permalink / raw)
To: Jonas Bonn
Cc: Al Viro, linux-kernel, linux-arch, Linus Torvalds,
Catalin Marinas, Haavard Skinnemoen, Mike Frysinger,
Jesper Nilsson, David Howells, Tony Luck, Benjamin Herrenschmidt,
Hirokazu Takata, Geert Uytterhoeven, James E.J. Bottomley,
Richard Kuo, Martin Schwidefsky, Lennox Wu, David S. Miller,
Paul Mundt, Chris Zankel, Chris Metcalf, Yoshinori Sato
2012/10/17 Jonas Bonn <jonas@southpole.se>:
> On 17 October 2012 00:35, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>>
>
> I did most of the OpenRISC conversion last weekend... the
> kernel_thread bits work fine but I end up with the init thread dying
> with what I've got now for kernel_execve. Once I've got that sorted
> out, I'll pass this along to you.
I am testing the Microblaze conversion and I see a similar problem
with GENERIC_KERNEL_EXECVE
(commit: http://git.kernel.org/?p=linux/kernel/git/viro/signal.git;a=commit;h=6aa044199aed5b541eba7fe7f25efdfb3a655a58)
I have looked at the patch and this is what I found.
(From the description above: a kernel thread can become a userland
process. The primitive is kernel_execve().)
In init/main.c:795, run_init_process() calls kernel_execve().
In the old style, that runs the microblaze kernel_execve(), which
issues __NR_execve as a syscall.
In entry.S, the user exception handler detects that the jump comes
from kernel space, saves pt_regs on the current stack,
and calls sys_execve and then microblaze_execve with a 4th argument
that is a pointer to pt_regs, etc.
The patch above uses current_pt_regs() directly. That works well for
newly created threads, where pt_regs are exactly at the
current_pt_regs() position, but not for pt_regs that are saved on
the stack, which is the init task case.
This is also the reason why microblaze has an implementation for
calling _user_exception from kernel space.
I believe that it is called just once, for /init.
My question is: how should /init be called? I need to save
pt_regs at the current_pt_regs() position, where the
generic kernel_execve expects it.
Thanks,
Michal
--
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-10-17 14:27 ` Michal Simek
2012-10-17 14:27 ` Michal Simek
@ 2012-10-17 16:07 ` Al Viro
2012-10-17 16:07 ` Al Viro
2012-10-17 16:19 ` Al Viro
1 sibling, 2 replies; 16+ messages in thread
From: Al Viro @ 2012-10-17 16:07 UTC (permalink / raw)
To: Michal Simek
Cc: Jonas Bonn, linux-kernel, linux-arch, Linus Torvalds,
Catalin Marinas, Haavard Skinnemoen, Mike Frysinger,
Jesper Nilsson, David Howells, Tony Luck, Benjamin Herrenschmidt,
Hirokazu Takata, Geert Uytterhoeven, James E.J. Bottomley,
Richard Kuo, Martin Schwidefsky, Lennox Wu, David S. Miller,
Paul Mundt, Chris Zankel, Chris Metcalf, Yoshinori Sato,
Guan Xuetao <gx>
On Wed, Oct 17, 2012 at 04:27:06PM +0200, Michal Simek wrote:
> In the patch above there is directly used current_pt_regs() function
> which works good for newly created threads
> when pt_regs are exactly in current_pt_regs() position but not for
> pt_regs which are saved on the stack
> which is the init task case.
init_task does *not* do kernel_execve(). It's PID 0, not PID 1.
init is spawned by it.
> My question is how should /init be called? Because I need to save
> pt_regs to current_pt_regs() position where
> generic kernel_execve expects it.
What happens during boot is this:
* init_task (not to be confused with init) is used as current during
infrastructure initializations. Once everything needed for scheduler and
for working fork is set, we spawn two threads - future init and future
kthreadd. The last thing we do with init_task is telling init that kthreadd
has been spawned. After that init_task turns itself into an idle thread.
* future init waits for kthreadd to be spawned (it would be more
natural to fork them in opposite order, but we want init to have PID 1 -
too much stuff in userland depends on that). Then it does the rest of
initialization, including setting up initramfs contents. And does
kernel_execve() on /init. Note that this is a task that had been created
by kernel_thread() and is currently in function called from
ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
That's where pt_regs need to be set up; note that they'll be passed to
start_thread() before you return to userland. If there are any magic bits
in pt_regs needed by return-from-syscall code, set them in kthread case of
copy_thread().
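
The stack layout described above can be pictured with a small userland
model. This is only a sketch, not kernel code: DEMO_THREAD_SIZE,
struct demo_pt_regs, demo_task_pt_regs(), demo_copy_kthread() and the
DEMO_MAGIC bit are all hypothetical names, and the assumption that pt_regs
sit at the very top of the kernel stack holds on many architectures but
not necessarily on all of them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_THREAD_SIZE 8192   /* assumed kernel stack size */
#define DEMO_MAGIC       0x1ul  /* stand-in for an arch-specific "magic bit" */

/* Hypothetical, heavily reduced register frame. */
struct demo_pt_regs {
	unsigned long sp;
	unsigned long pc;
	unsigned long flags;
};

/* Allocate a zeroed block that plays the role of a task's kernel stack. */
static unsigned char *demo_alloc_stack(void)
{
	return calloc(1, DEMO_THREAD_SIZE);
}

/*
 * Model of the fixed place where current_pt_regs() expects the frame:
 * the last sizeof(struct demo_pt_regs) bytes of the kernel stack.
 */
static struct demo_pt_regs *demo_task_pt_regs(unsigned char *stack)
{
	return (struct demo_pt_regs *)(stack + DEMO_THREAD_SIZE) - 1;
}

/*
 * Model of the kernel-thread branch of copy_thread(): the frame at the
 * fixed position is cleared and any bits the return path relies on are
 * set here, long before an exec happens.
 */
static void demo_copy_kthread(unsigned char *stack)
{
	struct demo_pt_regs *regs = demo_task_pt_regs(stack);

	memset(regs, 0, sizeof(*regs));
	regs->flags |= DEMO_MAGIC;
}

/* Model of start_thread(): the exec path fills in that same frame. */
static void demo_start_thread(struct demo_pt_regs *regs,
			      unsigned long pc, unsigned long sp)
{
	regs->pc = pc;
	regs->sp = sp;
}
```

In this model the frame prepared by demo_copy_kthread() is the very one
that demo_start_thread() later fills in, so nothing has to be saved on the
stack at exec time.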
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-10-17 16:07 ` Al Viro
2012-10-17 16:07 ` Al Viro
@ 2012-10-17 16:19 ` Al Viro
2012-10-17 16:19 ` Al Viro
2012-11-15 16:41 ` Michal Simek
1 sibling, 2 replies; 16+ messages in thread
From: Al Viro @ 2012-10-17 16:19 UTC (permalink / raw)
To: Michal Simek
Cc: Jonas Bonn, linux-kernel, linux-arch, Linus Torvalds,
Catalin Marinas, Haavard Skinnemoen, Mike Frysinger,
Jesper Nilsson, David Howells, Tony Luck, Benjamin Herrenschmidt,
Hirokazu Takata, Geert Uytterhoeven, James E.J. Bottomley,
Richard Kuo, Martin Schwidefsky, Lennox Wu, David S. Miller,
Paul Mundt, Chris Zankel, Chris Metcalf, Yoshinori Sato,
Guan Xuetao <gx>
On Wed, Oct 17, 2012 at 05:07:03PM +0100, Al Viro wrote:
> What happens during boot is this:
> * init_task (not to be confused with init) is used as current during
> infrastructure initializations. Once everything needed for scheduler and
> for working fork is set, we spawn two threads - future init and future
> kthreadd. The last thing we do with init_task is telling init that kthreadd
> has been spawned. After that init_task turns itself into an idle thread.
> * future init waits for kthreadd to be spawned (it would be more
> natural to fork them in opposite order, but we want init to have PID 1 -
> too much stuff in userland depends on that). Then it does the rest of
> initialization, including setting up initramfs contents. And does
> kernel_execve() on /init. Note that this is a task that had been created
> by kernel_thread() and is currently in function called from
> ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
> That's where pt_regs need to be set up; note that they'll be passed to
> start_thread() before you return to userland. If there are any magic bits
> in pt_regs needed by return-from-syscall code, set them in kthread case of
> copy_thread().
PS: I suspect that we end up with the wrong value in childregs->msr;
start_thread() only adds MSR_UMS there. I'd suggest running the kernel
with these patches + printk childregs->msr the very first time start_thread()
is called and see what it prints, then working kernel + such printk and
compare the results...
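
The comparison suggested above could be instrumented roughly as follows.
This is a userland model, not the microblaze code: demo_start_thread(),
struct demo_regs and the bit values of the DEMO_MSR_* constants are
placeholders (the real definitions live in the architecture headers), and
fprintf() stands in for printk().

```c
#include <stdio.h>

/* Placeholder bit values; the real MSR_* constants come from arch headers. */
#define DEMO_MSR_UMS 0x1000ul
#define DEMO_MSR_VMS 0x4000ul

/* Hypothetical, heavily reduced register frame. */
struct demo_regs {
	unsigned long msr;
	unsigned long pc;
	unsigned long sp;
};

/* Makes sure only the very first call is reported. */
static int demo_first_call_logged;

static void demo_start_thread(struct demo_regs *regs,
			      unsigned long pc, unsigned long usp)
{
	regs->pc = pc;
	regs->sp = usp;
	/* Only ORs one bit in: whatever was in msr before stays as it was. */
	regs->msr |= DEMO_MSR_UMS;

	if (!demo_first_call_logged) {
		demo_first_call_logged = 1;
		fprintf(stderr, "start_thread: msr=%#lx\n", regs->msr);
	}
}
```

In this model a frame that starts with msr == 0 ends up with only the UMS
bit set, while a frame that already carried other bits keeps them; that
difference is what the two printk outputs would expose.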
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
@ 2012-10-19 15:55 Al Viro
2012-10-21 10:35 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2012-10-19 15:55 UTC (permalink / raw)
To: linux-arch
[Sorry; forgot about that typo in Cc... Repost to linux-arch alone]
On Tue, Oct 16, 2012 at 11:35:08PM +0100, Al Viro wrote:
> 1. Basic rules for process lifetime.
> Except for the initial process (init_task, eventual idle thread on the boot
> CPU) all processes are created by do_fork(). There are three classes of
> those: kernel threads, userland processes and idle threads to be. There are
> few low-level operations involved:
> * a kernel thread can spawn a new kernel thread; the primitive
> doing that is kernel_thread().
> * a userland process can spawn a new userland process; that's
> done by sys_fork()/sys_vfork()/sys_clone()/sys_clone2().
> * a kernel thread can become a userland process. The primitive
> is kernel_execve().
> * a kernel thread can spawn a future idle thread; that's done
> by fork_idle(). Result is *not* scheduled until the secondary CPU gets
> initialized and its state is heavily overwritten in process.
Minor correction: while the first two cases go through do_fork() to
copy_process() to copy_thread(), fork_idle() calls copy_process() directly.
> 4. What is done?
> I've done the conversions for almost all architectures, but quite a few
> are completely untested.
>
> I'm fairly sure about alpha, x86 and um. Tested and I understand the
> architecture well enough. arm, mips and c6x had been tested by architecture
> maintainers. This stuff also works. alpha, arm, x86 and um are fully
> converted in mainline by now.
arm64 fixed and tested by maintainer, put in no-rebase mode.
sparc corrected to avoid branching beyond what ba,pt allows, ACKed by Davem
in that form. In no-rebase mode.
m68k tested and ACKed on coldfire; I think that along with aranym testing
here that is enough. In no-rebase mode.
Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
Tony an incremental patch cleaning copy_thread() up, waiting for results of
testing that on SMP box.
Even more surprisingly, unicore32 variant turned out to contain only one
obvious typo. Fixed and tested by maintainer of unicore32 tree and actually
applied there, I've pulled his branch at that point.
microblaze: some fixes from Michal folded, still breakage with kernel_execve()
side of things.
Since there had been no signs of life from hexagon folks, I'd done (absolutely
blind and untested) tentative patches; see #arch-hexagon. Same situation
as with most of the embedded architectures - i.e. take with a cartload of salt,
that pair of patches is intended to be a possible starting point for producing
something working.
At that point we have the following situation:
alpha       done
arm         done
arm64       done
avr32       untested
blackfin    untested
c6x         done
cris        untested
frv         untested, maintainer going to test
h8300       untested
hexagon     untested
ia64        apparently works, needs the final ACK from Tony.
m32r        untested
m68k        done
microblaze  partially tested, maintainer hunting breakage down
mips        done
mn10300     untested
openrisc    maintainers said to have partially working variant
parisc      should work, needs testing and ACK
powerpc     should work, needs testing and ACK
s390        should work, needs testing and ACK
score       untested
sh          untested, maintainers planned reviewing and testing
sparc       done
tile        maintainers writing that one
um          done
unicore32   done
x86         done
xtensa      maintainers writing that one
One more thing: AFAICS, just about everything has something along the lines
of
	if (!usp)
		usp = <current userland sp>
	do_fork(flags, usp, ....)
in their sys_clone(). How about taking that into copy_thread()? After
all, the logic there is
	copy all the state, including userland stack pointer to child
	override userland stack pointer with what the caller passed to
	copy_thread()
often enough with "... and if we are about to override it with something
different, do the following extra work". Turning that into
	copy all the state, including userland stack pointer to child
	if (usp) {
		override the userland stack pointer for child and maybe do
		some extra work
	}
would seem to be a fairly natural thing. Does anybody see problems with
doing that on their architecture? Note that with that fork() becomes
simply
#ifndef CONFIG_MMU
	return -EINVAL;
#else
	return do_fork(SIGCHLD, 0, current_pt_regs(), 0, NULL, NULL);
#endif
and similar for vfork(). And these can definitely drop the Cthulhu-awful
kludges for obtaining pt_regs (OK, on everything that doesn't do
kernel_thread() via syscall-from-kernel, but by now only xtensa is still
doing that). In some cases we need to do a bit of work before that
(gather callee-saved registers so that the child could get them as on alpha,
mips, m68k, openrisc, parisc, ppc and x86, flush userland register windows
on sparc and get psr/wim values on sparc32), but a lot more architectures
lose the asm wrappers for those and the rest can get rid of assorted
ugliness involved in getting that struct pt_regs *.
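
The proposed "usp == 0 means inherit" convention can be modelled in plain
C. This is a sketch under stated assumptions, not any architecture's
copy_thread(): struct demo_pt_regs, demo_copy_thread(), demo_fork() and
demo_clone() are hypothetical names, and the "extra work" mentioned above
is left out.

```c
/* Hypothetical, heavily reduced register frame. */
struct demo_pt_regs {
	unsigned long sp;      /* userland stack pointer */
	unsigned long pc;
	unsigned long retval;  /* syscall return value register */
};

/*
 * Model of the proposed copy_thread() logic: copy all the state,
 * including the userland stack pointer, and override the stack pointer
 * only when the caller passed a non-zero usp.
 */
static void demo_copy_thread(const struct demo_pt_regs *parent,
			     unsigned long usp,
			     struct demo_pt_regs *child)
{
	*child = *parent;
	child->retval = 0;     /* the child sees 0 from fork()/clone() */
	if (usp)
		child->sp = usp;
}

/* fork() no longer needs to look up the current userland sp: it passes 0. */
static void demo_fork(const struct demo_pt_regs *parent,
		      struct demo_pt_regs *child)
{
	demo_copy_thread(parent, 0, child);
}

/* clone() passes its newsp argument straight through, zero or not. */
static void demo_clone(const struct demo_pt_regs *parent,
		       unsigned long newsp,
		       struct demo_pt_regs *child)
{
	demo_copy_thread(parent, newsp, child);
}
```

With this shape the "if (!usp)" test disappears from the callers, and
clone() with a zero stack argument behaves exactly like fork() as far as
the stack pointer is concerned.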
BTW, alpha seems to be doing absolutely pointless work on the way out of
sys_fork() et al. - saving callee-saved registers is needed, all right,
but why bother restoring all of them on the way out in the parent? All
we need is rp; that's ~0.3Kb of useless reads from memory on each fork()...
The same goes for m68k; there the amount of traffic is less, but still, what
the hell for? Child needs callee-saved registers restored (and usually will
have that done by switch_to()), but the parent needs only to make sure they
are saved and available for copy_thread() to bring them to child (incidentally,
copying registers is needed only when they are not embedded into task_struct.
At least um is doing a memcpy() for no reason whatsoever; fix will be sent
to rw shortly and ISTR seeing something similar on some of the other
architectures).
Another cross-architecture thing: folks, watch out for what's being done with
thread flags; I've just found a lovely bug on alpha where we have prctl(2)
doing non-atomic modifications of those (as in ti->flags = (ti->flags&~x)|y;),
which is obviously broken; TIF_SIGPENDING can be set asynchronously and even
from an interrupt. Fix for this one is going to Linus shortly (adding
a separate field for thread-synchronous flags, taking obviously t-s ones
there, including the UAC_... bunch set by that prctl()), but I don't think
that I can audit that for all architectures efficiently; cursory look has
found a braino on frv (fix being discussed with dhowells), but there may bloody
well be more of that fun.
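
Two ways of avoiding the lost update can be sketched with C11 atomics.
This is a userland illustration, not the alpha fix itself: struct
demo_thread_info, its status field, the DEMO_* bit values and both helper
functions are hypothetical, and the kernel has its own bit-operation
primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>

#define DEMO_TIF_SIGPENDING 0x1u  /* may be set asynchronously */
#define DEMO_UAC_MASK       0x6u  /* stand-in for the UAC_... bits */

struct demo_thread_info {
	atomic_uint flags;    /* shared with asynchronous setters */
	unsigned int status;  /* only ever touched by the thread itself */
};

/*
 * Option 1: keep the bits in the shared word, but replace them with a
 * compare-and-swap loop so that a concurrently set bit is never lost.
 * A plain "flags = (flags & ~x) | y" would overwrite such a bit.
 */
static void demo_replace_bits(struct demo_thread_info *ti,
			      unsigned int x, unsigned int y)
{
	unsigned int old = atomic_load(&ti->flags);

	/* On failure, old is reloaded with the current value and we retry. */
	while (!atomic_compare_exchange_weak(&ti->flags, &old, (old & ~x) | y))
		;
}

/*
 * Option 2, the direction described in the mail: move bits that only
 * the owning thread changes into a separate field, where a plain
 * read-modify-write cannot race with anybody.
 */
static void demo_set_uac(struct demo_thread_info *ti, unsigned int y)
{
	ti->status = (ti->status & ~DEMO_UAC_MASK) | (y & DEMO_UAC_MASK);
}
```

The second option avoids atomic operations for the thread-synchronous
bits entirely, at the price of one more field to keep track of.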
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-10-19 15:55 new execve/kernel_thread design Al Viro
@ 2012-10-21 10:35 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2012-10-21 10:35 UTC (permalink / raw)
To: Al Viro; +Cc: linux-arch, Parisc List
On Fri, 2012-10-19 at 16:55 +0100, Al Viro wrote:
> [Sorry; forgot about that typo in Cc... Repost to linux-arch alone]
>
> On Tue, Oct 16, 2012 at 11:35:08PM +0100, Al Viro wrote:
> > 1. Basic rules for process lifetime.
> > Except for the initial process (init_task, eventual idle thread on the boot
> > CPU) all processes are created by do_fork(). There are three classes of
> > those: kernel threads, userland processes and idle threads to be. There are
> > few low-level operations involved:
> > * a kernel thread can spawn a new kernel thread; the primitive
> > doing that is kernel_thread().
> > * a userland process can spawn a new userland process; that's
> > done by sys_fork()/sys_vfork()/sys_clone()/sys_clone2().
> > * a kernel thread can become a userland process. The primitive
> > is kernel_execve().
> > * a kernel thread can spawn a future idle thread; that's done
> > by fork_idle(). Result is *not* scheduled until the secondary CPU gets
> > initialized and its state is heavily overwritten in process.
>
> Minor correction: while the first two cases go through do_fork() to
> copy_process() to copy_thread(), fork_idle() calls copy_process() directly.
>
> > 4. What is done?
> > I've done the conversions for almost all architectures, but quite a few
> > are completely untested.
> >
> > I'm fairly sure about alpha, x86 and um. Tested and I understand the
> > architecture well enough. arm, mips and c6x had been tested by architecture
> > maintainers. This stuff also works. alpha, arm, x86 and um are fully
> > converted in mainline by now.
>
> arm64 fixed and tested by maintainer, put in no-rebase mode.
>
> sparc corrected to avoid branching beyond what ba,pt allows, ACKed by Davem
> in that form. In no-rebase mode.
>
> m68k tested and ACKed on coldfire; I think that along with aranym testing
> here that is enough. In no-rebase mode.
>
> Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
> Tony an incremental patch cleaning copy_thread() up, waiting for results of
> testing that on SMP box.
>
> Even more surprisingly, unicore32 variant turned out to contain only one
> obvious typo. Fixed and tested by maintainer of unicore32 tree and actually
> applied there, I've pulled his branch at that point.
>
> microblaze: some fixes from Michal folded, still breakage with kernel_execve()
> side of things.
>
> Since there had been no signs of life from hexagon folks, I'd done (absolutely
> blind and untested) tentative patches; see #arch-hexagon. Same situation
> as with most of the embedded architectures - i.e. take with a cartload of salt,
> that pair of patches is intended to be a possible starting point for producing
> something working.
>
> At that point we have the following situation:
> alpha done
> arm done
> arm64 done
> avr32 untested
> blackfin untested
> c6x done
> cris untested
> frv untested, maintainer going to test
> h8300 untested
> hexagon untested
> ia64 apparently works, needs the final ACK from Tony.
> m32r untested
> m68k done
> microblaze partially tested, maintainer hunting breakage down
> mips done
> mn10300 untested
> openrisc maintainers said to have partially working variant
> parisc should work, needs testing and ACK
Tested and works on top of 3.7-rc2 ... you can add my ACK.
James
> powerpc should work, needs testing and ACK
> s390 should work, needs testing and ACK
> score untested
> sh untested, maintainers planned reviewing and
> testing
> sparc done
> tile maintainers writing that one
> um done
> unicore32 done
> x86 done
> xtensa maintainers writing that one
>
> One more thing: AFAICS, just about everything has something along the
> lines
> of
> if (!usp)
> usp = <current userland sp>
> do_fork(flags, usp, ....)
> in their sys_clone(). How about taking that into copy_thread()?
> After
> all, the logics there is
> copy all the state, including userland stack pointer to child
> override userland stack pointer with what the caller passed to
> copy_thread()
> often enough with "... and if we are about to override it with
> something
> different, do the following extra work". Turning that into
> copy all the state, including userland stack pointer to child
> if (usp) {
> override the userland stack pointer for child and
> maybe do
> some extra work
> }
> would seem to be a fairly natural thing. Does anybody see problems
> with
> doing that on their architecture? Note that with that fork() becomes
> simply
> #ifndef CONFIG_MMU
> return -EINVAL;
> #else
> return do_fork(SIGCHLD, 0, current_pt_regs(), 0, NULL, NULL);
> #endif
> and similar for vfork(). And these can definitely drop the
> Cthulhu-awful
> kludges for obtaining pt_regs (OK, on everything that doesn't do
> kernel_thread() via syscall-from-kernel, but by now only xtensa is
> still
> doing that). In some cases we need to do a bit of work before that
> (gather callee-saved registers so that the child could get them as on
> alpha,
> mips, m68k, openrisc, parisc, ppc and x86, flush userland register
> windows
> on sparc and get psr/wim values on sparc32), but a lot more
> architectures
> lose the asm wrappers for those and the rest can get rid of assorted
> ugliness involved in getting that struct pt_regs *.
>
> BTW, alpha seems to be doing an absolutely pointless work on the way
> out of
> sys_fork() et.al. - saving callee-saved registers is needed, all
> right,
> but why bother restoring all of them on the way out in the parent?
> All
> we need is rp; that's ~0.3Kb of useless reads from memory on each
> fork()...
>
> The same goes for m68k; there the amount of traffic is less, but
> still, what
> the hell for? Child needs callee-saved registers restored (and
> usually will
> have that done by switch_to()), but the parent needs only to make sure
> they
> are saved and available for copy_thread() to bring them to child
> (incidentally,
> copying registers is needed only when they are not embedded into
> task_struct.
> At least um is doing a memcpy() for no reason whatsoever; fix will be
> sent
> to rw shortly and ISTR seeing something similar on some of the other
> architectures).
>
> Another cross-architecture thing: folks, watch out for what's being
> done with
> thread flags; I've just found a lovely bug on alpha where we have
> prctl(2)
> doing non-atomic modifications of those (as in ti->flags =
> (ti->flags&~x)|y;),
> which is obviously broken; TIF_SIGPENDING can be set asynchronously
> and even
> from an interrupt. Fix for this one is going to Linus shortly (adding
> a separate field for thread-synchronous flags, taking obviously t-s
> ones
> there, including the UAC_... bunch set by that prctl()), but I don't
> think
> that I can audit that for all architectures efficiently; cursory look
> has
> found a braino on frv (fix being discussed with dhowells), but there
> may bloody
> well be more of that fun.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-10-17 16:19 ` Al Viro
2012-10-17 16:19 ` Al Viro
@ 2012-11-15 16:41 ` Michal Simek
2012-11-15 16:41 ` Michal Simek
2012-11-15 21:55 ` Al Viro
1 sibling, 2 replies; 16+ messages in thread
From: Michal Simek @ 2012-11-15 16:41 UTC (permalink / raw)
To: Al Viro
Cc: Jonas Bonn, linux-kernel, linux-arch, Linus Torvalds,
Catalin Marinas, Haavard Skinnemoen, Mike Frysinger,
Jesper Nilsson, David Howells, Tony Luck, Benjamin Herrenschmidt,
Hirokazu Takata, Geert Uytterhoeven, James E.J. Bottomley,
Richard Kuo, Martin Schwidefsky, Lennox Wu, David S. Miller,
Paul Mundt, Chris Zankel, Chris Metcalf, Yoshinori Sato,
Guan Xuetao <gx>
Hi Al,
2012/10/17 Al Viro <viro@zeniv.linux.org.uk>:
> On Wed, Oct 17, 2012 at 05:07:03PM +0100, Al Viro wrote:
>> What happens during boot is this:
>> * init_task (not to be confused with init) is used as current during
>> infrastructure initializations. Once everything needed for scheduler and
>> for working fork is set, we spawn two threads - future init and future
>> kthreadd. The last thing we do with init_task is telling init that kthreadd
>> has been spawned. After that init_task turns itself into an idle thread.
>> * future init waits for kthreadd to be spawned (it would be more
>> natural to fork them in opposite order, but we want init to have PID 1 -
>> too much stuff in userland depends on that). Then it does the rest of
>> initialization, including setting up initramfs contents. And does
>> kernel_execve() on /init. Note that this is a task that had been created
>> by kernel_thread() and is currently in function called from
>> ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
>> That's where pt_regs need to be set up; note that they'll be passed to
>> start_thread() before you return to userland. If there are any magic bits
>> in pt_regs needed by return-from-syscall code, set them in kthread case of
>> copy_thread().
>
> PS: I suspect that we end up with the wrong value in childregs->msr;
> start_thread() only adds MSR_UMS there. I'd suggest running the kernel
> with these patches + printk childregs->msr the very first time start_thread()
> is called and see what it prints, then working kernel + such printk and
> compare the results...
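The debugging step suggested in the quoted PS can be sketched in user-space C as follows. Everything here is a toy model, not Microblaze code: struct toy_pt_regs, toy_start_thread(), the TOY_MSR_UMS value and the "stale" bit are assumptions made up for illustration. The point is only that a start_thread() which ORs one bit into msr keeps whatever the copy_thread() stand-in left there, and that printing on the first call alone is enough to compare two kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative bit values; not taken from the thread. */
#define TOY_MSR_UMS   0x1000u
#define TOY_STALE_BIT 0x8u     /* some bit left over from thread setup */

struct toy_pt_regs {
	unsigned long pc;
	unsigned long r1;      /* user stack pointer in this toy model */
	unsigned long msr;
};

static int toy_msr_printed;    /* set once the first report is out */

static void toy_start_thread(struct toy_pt_regs *regs, unsigned long pc,
			     unsigned long usp)
{
	assert(regs != NULL);
	regs->pc = pc;
	regs->r1 = usp;
	regs->msr |= TOY_MSR_UMS;  /* only ORs: earlier bits are kept */

	/* Report msr the very first time only, as suggested above. */
	if (!toy_msr_printed) {
		toy_msr_printed = 1;
		fprintf(stderr, "first start_thread: msr=%#lx\n", regs->msr);
	}
}
```

In the kernel itself the one-shot print would more likely be done with printk_once(); the hand-rolled flag is used here only so that the sketch builds outside the kernel tree.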
Sorry for taking so long.
I have looked at it and fixed it.
Here is the branch based on rc5 (information below)
and here is gitweb.
http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
I have also looked at your sys_fork / sys_vfork / sys_clone unification
and I have fixed it for Microblaze.
I have also run some tests on it to be sure.
I would like to add sys_execve/kernel_execve/kernel_thread patches to my next branch.
Are you OK with that?
Do you need to test anything else for MB?
Thanks,
Michal
The following changes since commit 77b67063bb6bce6d475e910d3b886a606d0d91f7:
Linus Torvalds (1):
Linux 3.7-rc5
are available in the git repository at:
git://git.monstr.eu/linux-2.6-microblaze.git viro/arch-microblaze-rc5
Al Viro (5):
microblaze: switch to generic kernel_thread()
microblaze: switch to generic kernel_execve()
microblaze: switch to generic sys_execve()
generic sys_fork / sys_vfork / sys_clone
microblaze: switch to generic fork/vfork/clone
Michal Simek (3):
microblaze: Fix bug with schedule_tail
microblaze: Define current_pt_regs
microblaze: Remove BIP from childregs
arch/Kconfig | 11 ++++
arch/microblaze/Kconfig | 3 +
arch/microblaze/include/asm/processor.h | 10 +---
arch/microblaze/include/asm/unistd.h | 6 ++
arch/microblaze/kernel/entry-nommu.S | 20 +++-----
arch/microblaze/kernel/entry.S | 57 ++++-------------------
arch/microblaze/kernel/process.c | 77 ++++++++++---------------------
arch/microblaze/kernel/sys_microblaze.c | 53 ---------------------
arch/microblaze/kernel/syscall_table.S | 6 +--
include/asm-generic/syscalls.h | 7 +--
kernel/fork.c | 43 +++++++++++++++++
11 files changed, 111 insertions(+), 182 deletions(-)
--
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-11-15 16:41 ` Michal Simek
2012-11-15 16:41 ` Michal Simek
@ 2012-11-15 21:55 ` Al Viro
2012-11-15 21:55 ` Al Viro
2012-11-16 7:59 ` Michal Simek
1 sibling, 2 replies; 16+ messages in thread
From: Al Viro @ 2012-11-15 21:55 UTC (permalink / raw)
To: Michal Simek
Cc: Jonas Bonn, linux-kernel, linux-arch, Linus Torvalds,
Catalin Marinas, Haavard Skinnemoen, Mike Frysinger,
Jesper Nilsson, David Howells, Tony Luck, Benjamin Herrenschmidt,
Hirokazu Takata, Geert Uytterhoeven, James E.J. Bottomley,
Richard Kuo, Martin Schwidefsky, Lennox Wu, David S. Miller,
Paul Mundt, Chris Zankel, Chris Metcalf, Yoshinori Sato,
Guan Xuetao <gx>
On Thu, Nov 15, 2012 at 05:41:16PM +0100, Michal Simek wrote:
> Here is the branch based on rc5 (information below)
> and here is gitweb.
> http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
>
> I have also looked at your sys_fork / sys_vfork / sys_clone unification
> and I have fixed it for Microblaze.
>
> I have also run some tests on it to be sure.
>
> I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
> Are you OK with that?
Umm... In principle - yes, but I've a couple of questions about those.
1) What's that set_fs(USER_DS) in start_thread() for? Note that we do the same
thing in flush_old_exec(), at the same time we remove PF_KTHREAD from
current->flags.
While we are at it, if we *ever* hit do_signal() with KERNEL_DS, we are
very deep in trouble. set_fs(USER_DS) in setup_{rt_,}frame() is pointless.
2) your definition of current_pt_regs() is an exact copy of the one in
include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
needed at all? IOW, I'd rather add #include <linux/ptrace.h> to
arch/microblaze/kernel/process.c instead...
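For reference, the reason a per-architecture copy is unnecessary can be shown with a small user-space sketch of the "generic fallback unless the architecture overrides it" pattern. All toy_* names are hypothetical; the sketch only assumes that the generic header guards its definition with #ifndef, so that an architecture which defines nothing simply gets the generic version by including the header.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pt_regs { unsigned long pc; };
struct toy_task    { struct toy_pt_regs regs; };

/* Stand-ins for "current" and task_pt_regs(). */
static struct toy_task toy_current_task;
#define toy_current          (&toy_current_task)
#define toy_task_pt_regs(t)  (&(t)->regs)

/*
 * "Arch header": an architecture with special needs would define
 * toy_current_pt_regs() here. One without them defines nothing.
 */

/* "Generic header": used only when the arch did not provide its own. */
#ifndef toy_current_pt_regs
#define toy_current_pt_regs() toy_task_pt_regs(toy_current)
#endif

/* A caller that only needs the header to be included. */
static struct toy_pt_regs *toy_regs_for_exec(void)
{
	struct toy_pt_regs *regs = toy_current_pt_regs();

	assert(regs != NULL);
	return regs;
}
```

An arch-side definition identical to the fallback adds nothing, which is why including the generic header from the file that uses the macro is the smaller change.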
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: new execve/kernel_thread design
2012-11-15 21:55 ` Al Viro
2012-11-15 21:55 ` Al Viro
@ 2012-11-16 7:59 ` Michal Simek
1 sibling, 0 replies; 16+ messages in thread
From: Michal Simek @ 2012-11-16 7:59 UTC (permalink / raw)
To: Al Viro; +Cc: linux-kernel, linux-arch
2012/11/15 Al Viro <viro@zeniv.linux.org.uk>:
> On Thu, Nov 15, 2012 at 05:41:16PM +0100, Michal Simek wrote:
>> Here is the branch based on rc5 (information below)
>> and here is gitweb.
>> http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
>>
>> I have also looked at your sys_fork / sys_vfork / sys_clone unification
>> and I have fixed it for Microblaze.
>>
>> I have also run some tests on it to be sure.
>>
>> I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
>> Are you OK with that?
>
> Umm... In principle - yes, but I've a couple of questions about those.
sure.
BTW: that generic sys_fork / sys_vfork / sys_clone will go through your tree.
> 1) What's that set_fs(USER_DS) in start_thread() for? Note that we do the same
> thing in flush_old_exec(), at the same time we remove PF_KTHREAD from
> current->flags.
ok. Will remove it.
> While we are at it, if we *ever* hit do_signal() with KERNEL_DS, we are
> very deep in trouble. set_fs(USER_DS) in setup_{rt_,}frame() is pointless.
I have seen that several of your signal patches are there.
Do you have a set of tests that exercises them?
> 2) your definition of current_pt_regs() is an exact copy of the one in
> include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
> needed at all? IOW, I'd rather add #include <linux/ptrace.h> to
> arch/microblaze/kernel/process.c instead...
Agree. Fixed.
I have updated that branch; I can also send you the patches if you like.
Thanks,
Michal
--
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-11-16 7:59 UTC | newest]
Thread overview: 16+ messages
2012-10-19 15:55 new execve/kernel_thread design Al Viro
2012-10-21 10:35 ` James Bottomley
[not found] <20121016223508.GR2616@ZenIV.linux.org.uk>
2012-10-17 5:32 ` Max Filippov
2012-10-17 5:43 ` Al Viro
2012-10-17 5:43 ` Al Viro
[not found] ` <CACM3HyEpypULRWUc5ZnLnZ=uOWf3_j=9PXZiJrT_BXyGcQe9yg@mail.gmail.com>
2012-10-17 14:27 ` Michal Simek
2012-10-17 14:27 ` Michal Simek
2012-10-17 16:07 ` Al Viro
2012-10-17 16:07 ` Al Viro
2012-10-17 16:19 ` Al Viro
2012-10-17 16:19 ` Al Viro
2012-11-15 16:41 ` Michal Simek
2012-11-15 16:41 ` Michal Simek
2012-11-15 21:55 ` Al Viro
2012-11-15 21:55 ` Al Viro
2012-11-16 7:59 ` Michal Simek