From: "Dialup Jon Norstog" <thursday@allidaho.com>
To: Al Viro <viro@ZenIV.linux.org.uk>, Oleg Nesterov <oleg@redhat.com>
Cc: dl8bcu@dl8bcu.de, peterz@infradead.org, mingo@kernel.org,
linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
Richard Henderson <rth@twiddle.net>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Matt Turner <mattst88@gmail.com>
Subject: Re: [regression] boot failure on alpha, bisected
Date: Mon, 8 Oct 2012 08:14:17 -0600
Message-ID: <20121008140948.M96290@allidaho.com>
In-Reply-To: <20121007193909.GK2616@ZenIV.linux.org.uk>
Hello! I'm an Alpha user - I just want to thank you all for working to keep
Linux current on this architecture. I am still using the last working Alpha
Core release ... I hope to keep the old beast running for many more years!
Jon Norstog
www.thursdaybicycles.com
On Sun, 7 Oct 2012 20:39:09 +0100, Al Viro wrote
> On Sun, Oct 07, 2012 at 07:33:36PM +0200, Oleg Nesterov wrote:
>
> > > Um... There's a bunch of architectures that are in the same situation.
> > > grep for do_notify_resume() and you'll see...
> >
> > And every do_notify_resume() should be changed anyway, do_signal() and
> > tracehook_notify_resume() should be re-ordered.
>
> There's a bit more to it. The thing is, we have quite a mess around
> the signal-handling loops, mixed with that regarding the signal restarts.
> On arm it's done about right by now:
> * looping until all signals had been handled is done in C;
> none of that "loop in asm glue" nonsense anymore.
> * prevention of double restarts is *also* there, TYVM.
> * do_work_pending() is called with interrupts disabled.
> It may return 0, in which case we are done, interrupts are disabled
> and the caller should proceed to userland without reenabling them
> until it leaves. Otherwise we have a syscall restart to handle and
> no userland signal handler had been invoked. Interrupts are enabled
> and we should simply reload arguments and syscall number from pt_regs
> and proceed to syscall entry, without returning to userland. The
> only twist is that negative return value means ERESTART_RESTARTBLOCK
> kind of restart, in which case we need to use __NR_restart_syscall
> for syscall number.
>
> Note that we do *not* go through return to userland and reentering
> the kernel on handlerless syscall restarts. S390 uses the same
> model, but there it's done in assembler glue - for no good reason.
> Should be in straight C.
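
The return-value contract described above can be sketched in plain C. This is a userspace illustration under stated assumptions, not the ARM code: struct regs_sketch, next_step() and NR_RESTART_SYSCALL_SKETCH are hypothetical names, and only the "0 / nonzero / negative" convention is taken from the text.

```c
#include <assert.h>

/* Hypothetical stand-in for __NR_restart_syscall; the real value is per-arch. */
#define NR_RESTART_SYSCALL_SKETCH 0

/* Hypothetical subset of pt_regs: just the syscall number saved at entry. */
struct regs_sketch {
    long syscall_nr;
};

enum exit_action {
    RETURN_TO_USER,   /* interrupts stay disabled until we leave the kernel */
    REENTER_SYSCALL   /* handlerless restart: go straight to syscall entry */
};

/*
 * Interpret the value returned by do_work_pending() as described above:
 *    0 -> nothing left to do, proceed to userland;
 *   >0 -> restart the same syscall, number reloaded from the saved regs;
 *   <0 -> ERESTART_RESTARTBLOCK kind of restart, use restart_syscall.
 * On REENTER_SYSCALL, *nr_out receives the syscall number to dispatch;
 * on RETURN_TO_USER it is left untouched.
 */
enum exit_action next_step(int work_ret, const struct regs_sketch *regs,
                           long *nr_out)
{
    if (work_ret == 0)
        return RETURN_TO_USER;
    *nr_out = (work_ret < 0) ? NR_RESTART_SYSCALL_SKETCH : regs->syscall_nr;
    return REENTER_SYSCALL;
}
```

The point of the sketch is that the restart decision never passes through userland: the caller either leaves the kernel or jumps back to syscall entry with a number chosen here.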
>
> For alpha there's another twist, though - there we do _not_ save all
> registers in pt_regs; there's a fairly large chunk of callee-saved
> registers we don't need to protect from being messed by C parts of
> the kernel. We do need to save them in sigcontext, though. So alpha
> (and quite a few other architectures) has separate struct switch_stack
> (named so since switch_to() needs to save/restore the same registers).
> Rules:
> * on fork() et al. we save those callee-saved registers in
> struct switch_stack, right next to pt_regs. We do that before
> calling the actual sys_fork() and have copy_thread() copy these guys
> into child. Remember that newborns are first woken up in ret_from_fork
> and as with all context switches they go through switch_to(). So these
> registers are restored by the time the sucker wakes up.
> * on signal delivery we save those registers in struct switch_stack
> and use it, along with pt_regs it lives next to, to fill sigcontext.
> * ptrace counts on those suckers being next to pt_regs. That allows
> tracer to modify tracee's registers, including callee-saved ones.
> So we
> (1) restore them from switch_stack once we are done with do_signal()
> and
> (2) save/restore them around another place where we can get stopped for
> tracer to examine us - PTRACE_SYSCALL-induced paths in syscall handling.
> * on sigreturn/rt_sigreturn we need to restore all registers.
> So we reserve switch_stack on stack, next to pt_regs and have the C
> part of sigreturn fill those along with pt_regs. Once we are done,
> read those registers from switch_stack.
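
A toy model of the "switch_stack lives next to pt_regs" rule may help here. Everything in it is an assumption made for illustration: the *_sketch structs are far smaller than the real alpha ones, and frame_sketch, sw_of(), fill_sigcontext() and restore_sigcontext() are hypothetical names. It only shows that, given a pt_regs pointer, the callee-saved block can be found at a fixed offset, so the signal and sigreturn paths can read or write the full register set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_SCRATCH 4   /* hypothetical sizes, not the real alpha layout */
#define NR_CALLEE  3

struct pt_regs_sketch      { unsigned long scratch[NR_SCRATCH]; };
struct switch_stack_sketch { unsigned long callee[NR_CALLEE]; };

/* On a downward-growing stack, switch_stack is pushed right below pt_regs. */
struct frame_sketch {
    struct switch_stack_sketch sw;
    struct pt_regs_sketch regs;
};

/* sigcontext needs every register, scratch and callee-saved alike. */
struct sigcontext_sketch { unsigned long all[NR_SCRATCH + NR_CALLEE]; };

/* Locate the callee-saved block from a pt_regs pointer alone. */
struct switch_stack_sketch *sw_of(struct pt_regs_sketch *regs)
{
    char *base = (char *)regs - offsetof(struct frame_sketch, regs);
    return &((struct frame_sketch *)base)->sw;
}

/* Signal delivery: combine both structures into sigcontext. */
void fill_sigcontext(struct sigcontext_sketch *sc, struct pt_regs_sketch *regs)
{
    struct switch_stack_sketch *sw = sw_of(regs);
    memcpy(sc->all, regs->scratch, sizeof regs->scratch);
    memcpy(sc->all + NR_SCRATCH, sw->callee, sizeof sw->callee);
}

/* sigreturn: the C part fills both; the glue then reloads from switch_stack. */
void restore_sigcontext(const struct sigcontext_sketch *sc,
                        struct pt_regs_sketch *regs)
{
    struct switch_stack_sketch *sw = sw_of(regs);
    memcpy(regs->scratch, sc->all, sizeof regs->scratch);
    memcpy(sw->callee, sc->all + NR_SCRATCH, sizeof sw->callee);
}
```

The same adjacency is what lets a tracer modify callee-saved registers: it writes into the block next to pt_regs and the values are reloaded on the way out.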
>
> That's more or less it; many other architectures are doing more or less
> similar things, but not all of them put that stuff into separate structure.
>
> E.g. another valid solution is to leave space in pt_regs, fill only
> a subset on entry and have switch_to() save stuff in task_struct
> instead of putting it on kernel stack.
>
> What it means for us is that saving all that crap on stack should *not*
> be done unless we have work to do. OTOH, in situations when we have
> more than one pending signal it's bloody dumb to save/restore around
> each do_notify_resume() call separately. OTTH, in situation when
> we'd run out of timeslice and had nothing arrive until we'd regained
> CPU save/restore around schedule() is pointless at the very least.
> So for things like alpha I'd do this:
>
> interrupts disabled
> check thread flags
> no work to do => bugger off to userland
> just NEED_RESCHED?
> schedule()
> reread thread flags
> no work to do => bugger off to userland
> save callee-saved registers
> call do_work_pending
> restore callee-saved registers
> if do_work_pending returned 0 => bugger off to userland
> deal with handlerless restart
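
The outline above can be modelled in userspace C to check when the callee-saved registers actually get saved. This is a sketch under assumptions, not alpha entry code: the flag bits, struct task_sketch, struct exit_stats and exit_to_user() are all hypothetical, schedule() is reduced to "some flags may have arrived while we were off the CPU", and do_work_pending() is reduced to a preset return value.

```c
#include <assert.h>

#define WORK_NEED_RESCHED 0x1u   /* hypothetical flag bits */
#define WORK_SIGPENDING   0x2u

struct exit_stats {
    int schedules;   /* schedule() calls made without saving registers */
    int saves;       /* save/restore pairs of callee-saved registers */
};

struct task_sketch {
    unsigned flags;     /* current thread flags */
    unsigned arriving;  /* flags that show up while we are scheduled out */
    int work_ret;       /* what the do_work_pending() stand-in returns */
};

/* Returns 0 to go to userland, nonzero for a handlerless restart. */
int exit_to_user(struct task_sketch *t, struct exit_stats *st)
{
    if (!t->flags)
        return 0;                        /* no work to do */

    if (t->flags == WORK_NEED_RESCHED) { /* just NEED_RESCHED? */
        st->schedules++;                 /* schedule(), nothing saved */
        t->flags = t->arriving;          /* reread thread flags */
        t->arriving = 0;
        if (!t->flags)
            return 0;                    /* no work to do */
    }

    st->saves++;      /* save callee-saved registers, exactly once */
    t->flags = 0;     /* do_work_pending() loops in C until all is handled */
    /* restore callee-saved registers here */
    return t->work_ret;
}
```

With this shape, a task that merely ran out of timeslice never pays for the save/restore, and a task with several pending signals pays for it once rather than once per signal.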
>
> Note that the loop around do_signal() and friends is in C and is fairly
> similar to what we've got on ARM. x86 is in intermediate situation -
> the main complication there is v86 crap.
>
> I'd say that for now your variant should do, but we really need to
> get that crap under control and out of asm glue. Are you willing to
> participate? Guys, we need a way to do cross-architecture work
> without going insane. I've spent quite a bit of time this year
> crawling through that stuff. And yes, it's getting better as the
> result, but it's not sustainable - I have VFS work to do, after all.
>
> Basically, we need more people willing to take part in that; ideally
> - architecture maintainers, but some of them are semi-MIA. The
> areas involved:
> * kernel_thread()/kernel_execve()/sys_execve()/fork()/vfork()/clone() -
> quite a bit of that is already done and I hope we'll regularize that
> crap in the coming cycle.
> * signal handling in general - a lot got done this spring and summer,
> quite a bit more is possible to unify. I've got a long list of common
> landmines not to step upon and unfortunately it's *very* common to have
> architectures step on a bunch of those.
> * syscall restarts - see above; note that e.g. prevention of
> double restarts and restarts on sigreturn is subtle, arch-dependent
> and had been broken on *many* architectures. And I'm not at all sure
> we'd got all suckers fixed.
> * ptrace work, especially around PTRACE_SYSCALL handling. I suspect
> that the right way to handle it is a new regset aliasing the normal
> registers, so that access to syscall arguments would be
> arch-independent. We can do that, and it would simplify the living hell
> out of e.g. audit hookup. Another (and closely related) thing is
> conversion to tracehook_report_syscall_*; the tricky bit is that we
> probably want a uniform semantics for things like modifying syscall
> arguments via ptrace; some architectures do it right and reload
> arguments and syscall number from pt_regs after they'd done
> tracehook_report_syscall_entry(), but not all of them do. Moreover,
> we probably want to short-circuit the syscall itself when
> PTRACE_CONT had been done with "and deliver SIGKILL to the tracee"
> as e.g. x86, sparc and ppc do.
> * interplay between single-stepping
> and syscall restarts. Really, really nasty. And needs involvement
> of e.g. gdb people to sort out.
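
The PTRACE_SYSCALL point above (reload the syscall number and arguments after the entry hook, and skip the syscall if the hook says so) can be illustrated with a small userspace model. All names here are hypothetical: struct sc_regs, dispatch_traced(), the two demo hooks and the two demo "syscalls" exist only for this sketch, and the hook is a plain function pointer standing in for whatever the tracer did while the tracee was stopped.

```c
#include <assert.h>

#define SKETCH_SKIPPED (-1L)   /* hypothetical "syscall not run" result */

/* Hypothetical subset of pt_regs: syscall number plus two arguments. */
struct sc_regs {
    long nr;
    long args[2];
};

typedef int  (*entry_hook_fn)(struct sc_regs *regs);
typedef long (*sc_fn)(long a0, long a1);

/* Demo "syscalls". */
long sketch_add(long a, long b) { return a + b; }
long sketch_sub(long a, long b) { return a - b; }

/* Demo hooks: a tracer rewriting the syscall number, and one aborting it. */
int hook_rewrite(struct sc_regs *regs) { regs->nr = 1; return 0; }
int hook_abort(struct sc_regs *regs)   { (void)regs;   return 1; }

long dispatch_traced(struct sc_regs *regs, entry_hook_fn hook,
                     const sc_fn *table, long table_len)
{
    /* The tracee stops here; the tracer may modify *regs. */
    if (hook && hook(regs))
        return SKETCH_SKIPPED;          /* short-circuit the syscall */

    /* Reload from regs only now, so the tracer's changes take effect. */
    long nr = regs->nr;
    if (nr < 0 || nr >= table_len)
        return SKETCH_SKIPPED;
    return table[nr](regs->args[0], regs->args[1]);
}
```

An implementation that cached nr and the arguments before calling the hook would silently ignore the tracer's modifications, which is the non-uniform behaviour the text complains about.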
>
> We really need that stuff sanely synchronized between architectures.
> I'm willing to keep participating in that work, but I can't do that alone.
> It's simply not survivable.