From: Greg Ungerer <gerg@snapgear.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-arch@vger.kernel.org, David Miller <davem@davemloft.net>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [RFC][CFT][CFReview] execve and kernel_thread unification work
Date: Wed, 17 Oct 2012 16:16:32 +1000 [thread overview]
Message-ID: <507E4D40.9050202@snapgear.com> (raw)
In-Reply-To: <20121015013009.GB2616@ZenIV.linux.org.uk>
Hi Al,
On 15/10/12 11:30, Al Viro wrote:
> On Mon, Oct 01, 2012 at 10:38:09PM +0100, Al Viro wrote:
>> [with apologies for folks Cc'd, resent due to mis-autoexpanded l-k address
>> on the original posting ;-/ Mea culpa...]
>>
>> There's an interesting ongoing project around kernel_thread() and
>> friends, including execve() variants. I really need help from architecture
>> maintainers on that one; I'd been able to handle (and test) quite a few
>> architectures on my own [alpha, arm, m68k, powerpc, s390, sparc, x86, um]
>> plus two more untested [frv, mn10300]. c6x patches had been supplied by
>> Mark Salter; everything else remains to be done. Right now it's at
>> minus 1.2KLoC, quite a bit of that removed from asm glue and other black magic.
>
> Update:
> * all infrastructure is in mainline now, along with conversion for
> kernel_thread() callbacks to the form that allows really simple model for
> kernel_execve() _without_ flagday changes.
> * #experimental-kernel_thread is gone; this stuff is in for-next
> now.
> * a lot of architecture conversions had been done and some are
> even tested. Currently missing are only 7 - avr32, hexagon, m32r, openrisc,
> score, tile and xtensa. OTOH, a lot are completely untested. I've put
> per-architecture stuff into separate branches and I promise never rebase
> those once arch maintainers will be OK with the stuff in them. IOW, they'll
> be safe to pull into respective architecture trees.
>
> Folks, *please* review the stuff in signal.git#arch-*. All of them are
> completely independent. I'll be glad to get ACKs/fixes/replacements/etc.
I have checked arch-m68k on ColdFire with and without MMU, and it
is all fine. So for those:
Acked-by: Greg Ungerer <gerg@uclinux.org>
Regards
Greg
> I've merged some of those into for-next, but that can change at any time -
> it's not final; for-next will be rebased. Obviously, I hope to get to
> the situation when all of those branches (plus currently missing ones)
> get into shape that satisfies architecture maintainers. Once that happens,
> all those branches will be merged into for-next.
>
> I think the model is about final wrt kernel_thread()/kernel_execve()/
> sys_execve(). There's one possible change on top of it, but it's reasonably
> well-isolated from the rest. As it is, the model to aim for is this:
> * select GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
> * kill local kernel_thread()/kernel_execve() implementations
> * generic kernel_thread() will call your copy_thread() with
> NULL regs and fn/arg passed in the pair of arguments that are blindly
> passed all the way through to copy_thread() - usp and stack_size resp.
> In such case copy_thread() should arrange for the newborn to be woken
> up in a function that is very similar to ret_from_fork(). The only
> difference is that between the call of schedule_tail() and jumping into
> the "return from syscall" code it should call fn(arg), using the data
> left for it by copy_thread().
> * unlike the previous variant, ret_from_kernel_execve() is not
> needed at all; no need to play longjmp()-like games when kernel_thread()
> callbacks had been taught to return normally all the way out when
> kernel_execve() returns 0; any updates of sp/manipulations of register
> windows/etc. will happen without any magic.
> * provide current_pt_regs() if needed. Default is
> task_pt_regs(current), but you might want to optimize it and unlike
> task_pt_regs() it must work whenever we are in syscall or in a kernel thread.
> task_pt_regs(task), OTOH, is required to work only when task can be
> interrogated by tracer.
> * no more syscalls-from-kernel, which often allows for simplifications
> in the syscall entry/exit logics. I haven't done any of those; up to the
> architecture maintainers.
>
> One thing to keep in mind is that right now on SMP architectures
> there's the third caller of copy_thread(), besides fork()/clone()/vfork()
> (all pass userland pt_regs, with the address being current_pt_regs()) and
> kernel_thread() (pass NULL pt_regs, kthread creation time). It's fork_idle()
> and it passes zero-filled pt_regs. Frankly, I'm not even sure we want to
> call copy_thread() in that case - the stuff set up by it goes nowhere.
> We do that for each possible secondary CPU on SMP and we do *not* expose
> those threads to scheduler. When CPU gets initialized we have the
> secondary bootstrap take that task_struct as current. Its kernel stack,
> thread_info, etc. are set up by said secondary bootstrap, overriding whatever
> copy_thread() has done. Eventually the bootstrap reaches cpu_idle(),
> which is where we schedule away. switch_to() done by schedule() is what
> completes setting the things up; at that point they are ready to be woken
> up - and not in ret_from_fork(), of course.
> For the majority of architectures nothing done by copy_thread() in
> that case is used afterwards, so we might as well stop calling it when
> copy_process() is called by fork_idle(). I know of only one dubious case -
> powerpc sets thread->ksp_limit on copy_thread() and I'm not sure if
> that's get overwritten in secondary bootstrap - the value would be still
> correct and I don't see any obvious places where it would be reassigned
> on that codepath. There might be other cases like that, though. I would
> argue that for this kind of stuff the right place is arch_dup_task_struct(),
> not copy_thread()... Hell knows. Note that we are pretty much hitting
> the random path in copy_thread() in that case - what zeroed pt_regs look
> like to user_regs() is arch-dependent.
>
> This is the possible change I've mentioned above. Not sure; I'd
> really like comments on that one.
>
> Branches in there:
> arch-blackfin - conversion; completely untested
> arch-cris - conversion; completely untested
> arch-h8300 - conversion; completely untested
> arch-microblaze - conversion; completely untested
> arch-sh - conversion; completely untested
> arch-unicore32 - conversion; completely untested
> arch-ia64 - conversion; tested only on ski, which is worth very little
> arch-c6x - followup to mainline; while it's minor, it's pretty much done
> blindly and *really* needs review by maintainer.
> arch-arm - contains heroic fix by rmk and nothing else. Seems to work fine.
> arch-m68k - minor followup to stuff already in mainline; works on aranym
> arch-parisc - mostly the stuff tested by parisc folks + minor followup
> similar to m68k one.
> arch-s390 - minor followup to mainline; works in hercules
> arch-arm64 - patches from maintainer with minor followup folded
> arch-frv - minor followup to mainline, needs testing
> arch-mn10300 - minor followup to mainline, needs testing
> arch-mips - patches from me and Ralf; works on qemu
> arch-sparc - conversions for sparc32 and sparc64, plus the syscall_noerror
> optimization
> arch-powerpc - minor followups to mainline, need review by maintainers
>
> "Completely untested" in the above reads "no promises it even compiles, let
> alone isn't horribly broken". Please, treat that as a possible starting
> point for doing the conversion for arch in question. I might have misread
> the CPU manuals, your switch_to() implementation, etc., or just have been
> temporary insane from digging through dozens of architectures. Hopefully
> temporary, that is...
>
> And folks, for pity sake, do the remaining seven. The merge window is
> over, so...
>
> Al, buggering off to get some VFS work done.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
------------------------------------------------------------------------
Greg Ungerer -- Principal Engineer EMAIL: gerg@snapgear.com
SnapGear Group, McAfee PHONE: +61 7 3435 2888
8 Gardner Close FAX: +61 7 3217 5323
Milton, QLD, 4064, Australia WEB: http://www.SnapGear.com
WARNING: multiple messages have this Message-ID (diff)
From: Greg Ungerer <gerg@snapgear.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
<linux-arch@vger.kernel.org>, David Miller <davem@davemloft.net>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [RFC][CFT][CFReview] execve and kernel_thread unification work
Date: Wed, 17 Oct 2012 16:16:32 +1000 [thread overview]
Message-ID: <507E4D40.9050202@snapgear.com> (raw)
In-Reply-To: <20121015013009.GB2616@ZenIV.linux.org.uk>
Hi Al,
On 15/10/12 11:30, Al Viro wrote:
> On Mon, Oct 01, 2012 at 10:38:09PM +0100, Al Viro wrote:
>> [with apologies for folks Cc'd, resent due to mis-autoexpanded l-k address
>> on the original posting ;-/ Mea culpa...]
>>
>> There's an interesting ongoing project around kernel_thread() and
>> friends, including execve() variants. I really need help from architecture
>> maintainers on that one; I'd been able to handle (and test) quite a few
>> architectures on my own [alpha, arm, m68k, powerpc, s390, sparc, x86, um]
>> plus two more untested [frv, mn10300]. c6x patches had been supplied by
>> Mark Salter; everything else remains to be done. Right now it's at
>> minus 1.2KLoC, quite a bit of that removed from asm glue and other black magic.
>
> Update:
> * all infrastructure is in mainline now, along with conversion for
> kernel_thread() callbacks to the form that allows really simple model for
> kernel_execve() _without_ flagday changes.
> * #experimental-kernel_thread is gone; this stuff is in for-next
> now.
> * a lot of architecture conversions had been done and some are
> even tested. Currently missing are only 7 - avr32, hexagon, m32r, openrisc,
> score, tile and xtensa. OTOH, a lot are completely untested. I've put
> per-architecture stuff into separate branches and I promise never rebase
> those once arch maintainers will be OK with the stuff in them. IOW, they'll
> be safe to pull into respective architecture trees.
>
> Folks, *please* review the stuff in signal.git#arch-*. All of them are
> completely independent. I'll be glad to get ACKs/fixes/replacements/etc.
I have checked arch-m68k on ColdFire with and without MMU, and it
is all fine. So for those:
Acked-by: Greg Ungerer <gerg@uclinux.org>
Regards
Greg
> I've merged some of those into for-next, but that can change at any time -
> it's not final; for-next will be rebased. Obviously, I hope to get to
> the situation when all of those branches (plus currently missing ones)
> get into shape that satisfies architecture maintainers. Once that happens,
> all those branches will be merged into for-next.
>
> I think the model is about final wrt kernel_thread()/kernel_execve()/
> sys_execve(). There's one possible change on top of it, but it's reasonably
> well-isolated from the rest. As it is, the model to aim for is this:
> * select GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
> * kill local kernel_thread()/kernel_execve() implementations
> * generic kernel_thread() will call your copy_thread() with
> NULL regs and fn/arg passed in the pair of arguments that are blindly
> passed all the way through to copy_thread() - usp and stack_size resp.
> In such case copy_thread() should arrange for the newborn to be woken
> up in a function that is very similar to ret_from_fork(). The only
> difference is that between the call of schedule_tail() and jumping into
> the "return from syscall" code it should call fn(arg), using the data
> left for it by copy_thread().
> * unlike the previous variant, ret_from_kernel_execve() is not
> needed at all; no need to play longjmp()-like games when kernel_thread()
> callbacks had been taught to return normally all the way out when
> kernel_execve() returns 0; any updates of sp/manipulations of register
> windows/etc. will happen without any magic.
> * provide current_pt_regs() if needed. Default is
> task_pt_regs(current), but you might want to optimize it and unlike
> task_pt_regs() it must work whenever we are in syscall or in a kernel thread.
> task_pt_regs(task), OTOH, is required to work only when task can be
> interrogated by tracer.
> * no more syscalls-from-kernel, which often allows for simplifications
> in the syscall entry/exit logics. I haven't done any of those; up to the
> architecture maintainers.
>
> One thing to keep in mind is that right now on SMP architectures
> there's the third caller of copy_thread(), besides fork()/clone()/vfork()
> (all pass userland pt_regs, with the address being current_pt_regs()) and
> kernel_thread() (pass NULL pt_regs, kthread creation time). It's fork_idle()
> and it passes zero-filled pt_regs. Frankly, I'm not even sure we want to
> call copy_thread() in that case - the stuff set up by it goes nowhere.
> We do that for each possible secondary CPU on SMP and we do *not* expose
> those threads to scheduler. When CPU gets initialized we have the
> secondary bootstrap take that task_struct as current. Its kernel stack,
> thread_info, etc. are set up by said secondary bootstrap, overriding whatever
> copy_thread() has done. Eventually the bootstrap reaches cpu_idle(),
> which is where we schedule away. switch_to() done by schedule() is what
> completes setting the things up; at that point they are ready to be woken
> up - and not in ret_from_fork(), of course.
> For the majority of architectures nothing done by copy_thread() in
> that case is used afterwards, so we might as well stop calling it when
> copy_process() is called by fork_idle(). I know of only one dubious case -
> powerpc sets thread->ksp_limit on copy_thread() and I'm not sure if
> that's get overwritten in secondary bootstrap - the value would be still
> correct and I don't see any obvious places where it would be reassigned
> on that codepath. There might be other cases like that, though. I would
> argue that for this kind of stuff the right place is arch_dup_task_struct(),
> not copy_thread()... Hell knows. Note that we are pretty much hitting
> the random path in copy_thread() in that case - what zeroed pt_regs look
> like to user_regs() is arch-dependent.
>
> This is the possible change I've mentioned above. Not sure; I'd
> really like comments on that one.
>
> Branches in there:
> arch-blackfin - conversion; completely untested
> arch-cris - conversion; completely untested
> arch-h8300 - conversion; completely untested
> arch-microblaze - conversion; completely untested
> arch-sh - conversion; completely untested
> arch-unicore32 - conversion; completely untested
> arch-ia64 - conversion; tested only on ski, which is worth very little
> arch-c6x - followup to mainline; while it's minor, it's pretty much done
> blindly and *really* needs review by maintainer.
> arch-arm - contains heroic fix by rmk and nothing else. Seems to work fine.
> arch-m68k - minor followup to stuff already in mainline; works on aranym
> arch-parisc - mostly the stuff tested by parisc folks + minor followup
> similar to m68k one.
> arch-s390 - minor followup to mainline; works in hercules
> arch-arm64 - patches from maintainer with minor followup folded
> arch-frv - minor followup to mainline, needs testing
> arch-mn10300 - minor followup to mainline, needs testing
> arch-mips - patches from me and Ralf; works on qemu
> arch-sparc - conversions for sparc32 and sparc64, plus the syscall_noerror
> optimization
> arch-powerpc - minor followups to mainline, need review by maintainers
>
> "Completely untested" in the above reads "no promises it even compiles, let
> alone isn't horribly broken". Please, treat that as a possible starting
> point for doing the conversion for arch in question. I might have misread
> the CPU manuals, your switch_to() implementation, etc., or just have been
> temporary insane from digging through dozens of architectures. Hopefully
> temporary, that is...
>
> And folks, for pity sake, do the remaining seven. The merge window is
> over, so...
>
> Al, buggering off to get some VFS work done.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
------------------------------------------------------------------------
Greg Ungerer -- Principal Engineer EMAIL: gerg@snapgear.com
SnapGear Group, McAfee PHONE: +61 7 3435 2888
8 Gardner Close FAX: +61 7 3217 5323
Milton, QLD, 4064, Australia WEB: http://www.SnapGear.com
next prev parent reply other threads:[~2012-10-17 6:13 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-01 21:38 [RFC][CFT][CFReview] execve and kernel_thread unification work Al Viro
2012-10-05 16:07 ` Catalin Marinas
2012-10-06 16:19 ` Al Viro
2012-10-09 19:48 ` Chris Metcalf
2012-10-09 19:48 ` Chris Metcalf
2012-10-10 4:51 ` Al Viro
2012-10-11 9:00 ` Paul Mackerras
2012-10-11 12:53 ` Al Viro
2012-10-12 0:16 ` Paul Mackerras
2012-10-12 1:09 ` [git pull] signal.git, pile 2 (was Re: [RFC][CFT][CFReview] execve and kernel_thread unification work) Al Viro
2012-10-12 5:32 ` Paul Mackerras
2012-10-12 7:55 ` Benjamin Herrenschmidt
2012-10-15 1:30 ` [RFC][CFT][CFReview] execve and kernel_thread unification work Al Viro
2012-10-17 6:16 ` Greg Ungerer [this message]
2012-10-17 6:16 ` Greg Ungerer
2012-10-17 14:02 ` Catalin Marinas
2012-10-17 16:34 ` Al Viro
2012-10-17 16:40 ` Catalin Marinas
-- strict thread matches above, loose matches on Subject: below --
2012-10-01 21:34 Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=507E4D40.9050202@snapgear.com \
--to=gerg@snapgear.com \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.