From: "Dialup Jon Norstog" <thursday@allidaho.com>
To: Al Viro <viro@ZenIV.linux.org.uk>, Oleg Nesterov <oleg@redhat.com>
Cc: dl8bcu@dl8bcu.de, peterz@infradead.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
	Richard Henderson <rth@twiddle.net>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Matt Turner <mattst88@gmail.com>
Subject: Re: [regression] boot failure on alpha, bisected
Date: Mon, 8 Oct 2012 08:14:30 -0600
Message-ID: <20121008141440.M72159@allidaho.com>
In-Reply-To: <20121007193909.GK2616@ZenIV.linux.org.uk>

Hello!  I'm an Alpha user - I just want to thank you all for working to keep
Linux current on this architecture.  I am still using the last working Alpha
Core release ... I hope to keep the old beast running for many more years!

Jon Norstog

www.thursdaybicycles.com


On Sun, 7 Oct 2012 20:39:09 +0100, Al Viro wrote
> On Sun, Oct 07, 2012 at 07:33:36PM +0200, Oleg Nesterov wrote:
> 
> > > Um...  There's a bunch of architectures that are in the same situation.
> > > grep for do_notify_resume() and you'll see...
> > 
> > And every do_notify_resume() should be changed anyway, do_signal() and
> > tracehook_notify_resume() should be re-ordered.
> 
> There's a bit more to it.  The thing is, we have quite a mess around
> the signal-handling loops, mixed with that regarding the signal restarts.
> On arm it's done about right by now:
> 	* looping until all signals had been handled is done in C;
> none of that "loop in asm glue" nonsense anymore.
> 	* prevention of double restarts is *also* there, TYVM.
> 	* do_work_pending() is called with interrupts disabled.
> It may return 0, in which case we are done, interrupts are disabled
> and the caller should proceed to userland without reenabling them
> until it leaves.  Otherwise we have a syscall restart to handle and
> no userland signal handler had been invoked.  Interrupts are enabled
> and we should simply reload arguments and syscall number from pt_regs
> and proceed to syscall entry, without returning to userland.  The 
> only twist is that negative return value means ERESTART_RESTARTBLOCK 
> kind of restart, in which case we need to use __NR_restart_syscall 
> for syscall number.
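
The return-value contract described above can be sketched in a few lines of C. This is only a user-space model under stated assumptions, not the ARM code: every demo_* name and the value of DEMO_NR_RESTART_SYSCALL are hypothetical placeholders (the real __NR_restart_syscall is per-architecture), and interrupt state is left out entirely.

```c
/* Hypothetical stand-in for __NR_restart_syscall; the real value is per-arch. */
#define DEMO_NR_RESTART_SYSCALL 412

/* Hypothetical, cut-down register frame: only what the sketch needs. */
struct demo_regs {
	long syscall_nr;	/* syscall number saved at kernel entry */
	long args[6];		/* syscall arguments saved at kernel entry */
};

enum demo_next {
	DEMO_TO_USERLAND,	/* nothing left to do, leave the kernel */
	DEMO_REENTER_SYSCALL	/* handlerless restart, go back to syscall entry */
};

/*
 * Model of what the caller does with do_work_pending()'s return value:
 *   ret == 0  done, proceed to userland
 *   ret  > 0  restart with the syscall number already saved in regs
 *   ret  < 0  ERESTART_RESTARTBLOCK kind of restart: use restart_syscall
 * *nr_out is written only when a restart is needed.
 */
static enum demo_next demo_after_work_pending(int ret,
					      const struct demo_regs *regs,
					      long *nr_out)
{
	if (ret == 0)
		return DEMO_TO_USERLAND;

	*nr_out = (ret < 0) ? DEMO_NR_RESTART_SYSCALL : regs->syscall_nr;
	return DEMO_REENTER_SYSCALL;
}
```

In the restart cases the arguments would be reloaded from the saved frame as well; the sketch only picks the syscall number.
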
> 
> Note that we do *not* go through return to userland and reentering
> the kernel on handlerless syscall restarts.  S390 uses the same
> model, but there it's done in assembler glue - for no good reason.
> Should be in straight C.
> 
> For alpha there's another twist, though - there we do _not_ save all
> registers in pt_regs; there's a fairly large chunk of callee-saved
> registers we don't need to protect from being messed by C parts of
> the kernel.  We do need to save them in sigcontext, though.  So alpha
> (and quite a few other architectures) has separate struct switch_stack
> (named so since switch_to() needs to save/restore the same registers).
> Rules:
> 	* on fork() et al. we save those callee-saved registers in
> struct switch_stack, right next to pt_regs.  We do that before
> calling the actual sys_fork() and have copy_thread() copy these guys
> into child.  Remember that newborns are first woken up in ret_from_fork
> and as with all context switches they go through switch_to().  So these
> registers are restored by the time the sucker wakes up.
> 	* on signal delivery we save those registers in struct switch_stack
> and use it, along with pt_regs it lives next to, to fill sigcontext.
> 	* ptrace counts on those suckers being next to pt_regs.  That allows
> tracer to modify tracee's registers, including callee-saved ones.  So we
> (1) restore them from switch_stack once we are done with do_signal() and
> (2) save/restore them around another place where we can get stopped for
> tracer to examine us - PTRACE_SYSCALL-induced paths in syscall handling.
> 	* on sigreturn/rt_sigreturn we need to restore all registers.
> So we reserve switch_stack on stack, next to pt_regs and have the C
> part of sigreturn fill those along with pt_regs.  Once we are done,
> read those registers from switch_stack.
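
A rough picture of "switch_stack right next to pt_regs", as a user-space C sketch. The structs are hypothetical and heavily cut down (the real alpha layouts have many more fields), and struct demo_frame exists only so the adjacency can be expressed in portable C; the point is just that code holding a pt_regs pointer can reach the callee-saved block sitting directly below it.

```c
#include <stddef.h>

/* Hypothetical, cut-down callee-saved block (stands in for switch_stack). */
struct demo_switch_stack {
	unsigned long r9, r10, r11, r12, r13, r14, r15, r26;
};

/* Hypothetical, cut-down caller-visible block (stands in for pt_regs). */
struct demo_pt_regs {
	unsigned long r0, r1, r2, r3;
	unsigned long pc, ps;
};

/*
 * Illustration-only wrapper: the callee-saved block sits immediately
 * below pt_regs, the way the two are laid out on the kernel stack.
 */
struct demo_frame {
	struct demo_switch_stack sw;
	struct demo_pt_regs regs;
};

/* Recover the callee-saved block from a pt_regs pointer (container_of style). */
static struct demo_switch_stack *demo_sw_from_regs(struct demo_pt_regs *regs)
{
	struct demo_frame *f = (struct demo_frame *)
		((char *)regs - offsetof(struct demo_frame, regs));

	return &f->sw;
}

/* What a tracer-side poke of a callee-saved register boils down to. */
static void demo_poke_r9(struct demo_pt_regs *regs, unsigned long val)
{
	demo_sw_from_regs(regs)->r9 = val;
}
```

Since the tracer's write lands in the saved block and not in the live register, the tracee only sees it once the block is read back, which is why the restore after do_signal() matters.
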
> 
> That's more or less it; many other architectures are doing more or less
> similar things, but not all of them put that stuff into separate structure.
> 
> E.g. another valid solution is to leave space in pt_regs, fill only 
> a subset on entry and have switch_to() save stuff in task_struct 
> instead of putting it on kernel stack.
> 
> What it means for us is that saving all that crap on stack should *not*
> be done unless we have work to do.  OTOH, in situations when we have
> more than one pending signal it's bloody dumb to save/restore around
> each do_notify_resume() call separately.  OTTH, in situation when
> we'd run out of timeslice and had nothing arrive until we'd regained
> CPU save/restore around schedule() is pointless at the very least.
> So for things like alpha I'd do this:
> 
> 	interrupts disabled
> 	check thread flags
> 	no work to do => bugger off to userland
> 	just NEED_RESCHED?
> 		schedule()
> 		reread thread flags
> 		no work to do => bugger off to userland
> 	save callee-saved registers
> 	call do_work_pending
> 	restore callee-saved registers
> 	if do_work_pending returned 0 => bugger off to userland
> 	deal with handlerless restart
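
The outline above can be modelled in plain C to check the intended property: callee-saved registers get saved at most once per trip, and not at all when the only work is a reschedule. This is a user-space sketch under assumptions, not kernel code; all demo_* names and flag values are hypothetical, saving registers is reduced to a counter, and interrupt state is not modelled.

```c
/* Hypothetical thread-flag bits; the real TIF_* values are per-arch. */
#define DEMO_NEED_RESCHED	0x1u
#define DEMO_SIGPENDING		0x2u
#define DEMO_NOTIFY_RESUME	0x4u
#define DEMO_WORK_MASK \
	(DEMO_NEED_RESCHED | DEMO_SIGPENDING | DEMO_NOTIFY_RESUME)

struct demo_task {
	unsigned int flags;	/* pending work */
	int restart;		/* what the work loop reports: 0 or a restart code */
	int saves;		/* times the callee-saved registers were saved */
	int schedules;		/* times schedule() was called */
};

/* Stand-in for schedule(): just clears the reschedule request. */
static void demo_schedule(struct demo_task *t)
{
	t->schedules++;
	t->flags &= ~DEMO_NEED_RESCHED;
}

/* The loop in C: keep going until no work is left. */
static int demo_do_work_pending(struct demo_task *t)
{
	while (t->flags & DEMO_WORK_MASK) {
		if (t->flags & DEMO_NEED_RESCHED) {
			demo_schedule(t);
			continue;
		}
		/* stands in for do_signal() and tracehook_notify_resume() */
		t->flags &= ~(DEMO_SIGPENDING | DEMO_NOTIFY_RESUME);
	}
	return t->restart;
}

/* Returns 0 for "off to userland", 1 for "deal with handlerless restart". */
static int demo_return_to_user(struct demo_task *t)
{
	int ret;

	if (!(t->flags & DEMO_WORK_MASK))
		return 0;

	if ((t->flags & DEMO_WORK_MASK) == DEMO_NEED_RESCHED) {
		demo_schedule(t);		/* no save/restore around this */
		if (!(t->flags & DEMO_WORK_MASK))
			return 0;
	}

	t->saves++;				/* save callee-saved registers, once */
	ret = demo_do_work_pending(t);
	/* the matching restore of callee-saved registers goes here */
	return ret != 0;
}
```

With only DEMO_NEED_RESCHED set, the model schedules and leaves without touching the save counter; with several flags set at once it still saves only once.
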
> 
> Note that the loop around do_signal() and friends is in C and is fairly
> similar to what we've got on ARM.  x86 is in intermediate situation -
> the main complication there is v86 crap.
> 
> I'd say that for now your variant should do, but we really need to 
> get that crap under control and out of asm glue.  Are you willing to 
> participate? Guys, we need a way to do cross-architecture work 
> without going insane. I've spent quite a bit of time this year 
> crawling through that stuff. And yes, it's getting better as the 
> result, but it's not sustainable - I have VFS work to do, after all.
> 
> Basically, we need more people willing to take part in that; ideally -
> architecture maintainers, but some of them are semi-MIA.  The areas
> involved:
> 	* kernel_thread()/kernel_execve()/sys_execve()/fork()/vfork()/clone() -
> quite a bit of that is already done and I hope we'll regularize that
> crap in the coming cycle.
> 	* signal handling in general - a lot got done this spring and summer,
> quite a bit more is possible to unify.  I've got a long list of common
> landmines not to step upon and unfortunately it's *very* common to have
> architectures step on a bunch of those.
> 	* syscall restarts - see above; note that e.g. prevention of
> double restarts and restarts on sigreturn is subtle, arch-dependent
> and had been broken on *many* architectures.  And I'm not at all sure
> we'd got all suckers fixed.
> 	* ptrace work, especially around PTRACE_SYSCALL handling.  I suspect
> that the right way to handle it is a new regset aliasing the normal
> registers, so that access to syscall arguments would be
> arch-independent.  We can do that, and it would simplify the living hell
> out of e.g. audit hookup.  Another (and closely related) thing is
> conversion to tracehook_report_syscall_*; the tricky bit is that we
> probably want a uniform semantics for things like modifying syscall
> arguments via ptrace; some architectures do it right and reload
> arguments and syscall number from pt_regs after they'd done
> tracehook_report_syscall_entry(), but not all of them do.  Moreover,
> we probably want to short-circuit the syscall itself when
> PTRACE_CONT had been done with "and deliver SIGKILL to the tracee"
> as e.g. x86, sparc and ppc do.
> 	* interplay between single-stepping and syscall restarts.  Really,
> really nasty.  And needs involvement of e.g. gdb people to sort out.
> 
> 	We really need that stuff sanely synchronized between architectures.
> I'm willing to keep participating in that work, but I can't do that alone.
> It's simply not survivable.

