From: george anzinger <george@mvista.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Jim Houston <jim.houston@ccur.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
LKML <linux-kernel@vger.kernel.org>,
anton@samba.org, "David S. Miller" <davem@redhat.com>,
ak@muc.de, davidm@hpl.hp.com, schwidefsky@de.ibm.com,
ralf@gnu.org, willy@debian.org
Subject: Re: [PATCH] compatibility syscall layer (lets try again)
Date: Wed, 04 Dec 2002 19:46:38 -0800
Message-ID: <3DEECC1E.7F39F553@mvista.com>
In-Reply-To: Pine.LNX.4.44.0212041830100.3100-100000@home.transmeta.com
Linus Torvalds wrote:
>
> On Wed, 4 Dec 2002, george anzinger wrote:
> >
> > The way the system is now, a system call "appears" to get
> > by-value parameters, but the parameters are on the stack (in
> > the regs structure). This is what is restored and passed
> > back on a system call restart. What I am getting at is that
> > nanosleep could scribble anything it wants here and
> > "notice" it on the recall.
>
> Absolutely. That's what my ERESTART_RESTARTBLOCK thing is all about: a
> "portable" way to let the architecture-specific do_signal() know what to
> do about the return stack.
>
> It mustn't be nanosleep()-specific, that just gets too nasty.
>
> > Changing the call to absolute changes the semantics (in
> > particular the behavior on clock setting) in a way I don't
> > think you want to. I.e. you can tell it was done. So you
> > would have to do this in a way that does not look like the
> > absolute call in the current POSIX spec.
>
> No, the point is that re-starting the system call is totally invisible to
> user space, and user space would never use the "restart" system call
> directly.
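For comparison, user space can already get the "compute the end time once" behaviour from the POSIX clock_nanosleep() call with TIMER_ABSTIME. The sketch below only illustrates that idea; it is not the kernel mechanism being discussed, and sleep_for_ns() is a made-up helper name. It uses CLOCK_MONOTONIC so that setting the wall clock does not move the deadline, which sidesteps the clock-setting concern about absolute sleeps raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for a relative interval (ns >= 0) by turning
 * it into an absolute CLOCK_MONOTONIC deadline once, then retrying on EINTR.
 * Returns 0 on success or an error number. */
int sleep_for_ns(long long ns)
{
    struct timespec deadline;
    int err;

    assert(ns >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;

    /* deadline = now + ns, normalised so tv_nsec stays in [0, 1e9). */
    deadline.tv_sec += ns / 1000000000LL;
    deadline.tv_nsec += ns % 1000000000LL;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep() returns the error number directly (not via errno).
     * Retrying with the same absolute deadline after a signal neither
     * oversleeps nor needs any "remaining time" bookkeeping. */
    do {
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (err == EINTR);

    return err;
}
```

A caller would simply do sleep_for_ns(1500000000LL) for 1.5 seconds; however many signal handlers run in between, each retry waits for the same absolute end time.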
>
> Let me give a more explicit example on an x86 level:
>
> - This is part of the x86 library function:
>
> movl 4(%esp),%ebx # request
> movl 8(%esp),%ecx # remainder
> movl $162,%eax # nanosleep syscall number
> int $0x80 # system call
>
> - this enters the kernel, which saves stuff off on the stack,
> and calls sys_nanosleep by indexing the 162 off the system call
> table. Time is now X.
>
> - we're supposed to sleep until "X + request"
>
> ...
> schedule_timeout()
>
> - we get woken up by a signal thing, which doesn't have a handler, but
> does (for example) put us to sleep. Let's say that it's SIGSTOP. To
> handle the signal, sys_nanosleep() needs to return -ERESTARTSYS because
> it can't do it on its own.
>
> - 2 seconds later, the user sends a SIGCONT, and the process restarts.
> Time is now X+2, which may or may not be AFTER the original timeout.
>
> See the problem here? We MUST NOT restart the system call with the
> original timeout pointer (the contents of which we must not change). Not
> only have we already slept part of the time (that part we know about), but
> we may _also_ have been blocked by a signal part of the time (which has
> been totally outside the control of sys_nanosleep()).
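To make the arithmetic concrete: suppose the call starts at time X = 0 with a request of 5 seconds, and the process only runs again at X+2. The tiny model below is plain C, not kernel code, and the function names are made up for illustration; it compares re-issuing the original relative request against remembering the absolute end time.

```c
#include <assert.h>

/* Times are in arbitrary units (seconds or jiffies); illustrative only. */

/* Wrong: restart with the original relative request, measured from the
 * moment the process runs again. */
long naive_wakeup(long request, long restart_time)
{
    return restart_time + request;
}

/* Right: remember the absolute end time once; on restart, sleep only for
 * whatever is left, or return at once if the deadline already passed. */
long deadline_wakeup(long start, long request, long restart_time)
{
    long deadline = start + request;

    assert(restart_time >= start);
    return restart_time < deadline ? deadline : restart_time;
}
```

With X = 0, a 5-second request and the process running again at X+2, naive_wakeup(5, 2) gives 7 while deadline_wakeup(0, 5, 2) gives 5. With a 1-second request the deadline version returns 2: the timeout already expired while the process was stopped, so the call should simply return.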
>
> So my solution implies that our restart logic in do_signal(), which
> already knows how to update the user-level EIP register (that's how the
> restart is done), can also be told to update the system call and the
> argument registers.
Once it changes the system call number (in eax, right?), couldn't the
new call code then just fetch its parameters from the restart_block?
That means less code in the signal handler and keeps things
simple. It also means that the call gets its original parameters
back, so it is very generic, i.e. the signal code does not
need to know which parameters to change and which to leave alone.
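A user-space model of this suggestion might look like the following. Every name here (the restart_block fields, restart_syscall(), resume_sleep(), no_restart(), fake_jiffies) is hypothetical and chosen for illustration. The point is only that the one extra "restart" call dispatches through a function pointer saved by the interrupted call, so neither it nor the signal code needs to know which parameters matter. A default routine returning -EINTR covers the case where nothing was armed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
unsigned long fake_jiffies;             /* stand-in for the kernel's jiffies */

struct restart_block {
    long (*fn)(struct restart_block *); /* routine that resumes the call */
    unsigned long arg0, arg1, arg2;     /* parameters it saved for itself */
};

/* Default routine: nothing was armed, so report a plain interruption. */
long no_restart(struct restart_block *rb)
{
    (void)rb;
    return -EINTR;
}

/* Example resume routine: arg0 is the absolute end time in ticks.
 * Returns the ticks still left to sleep (0 if the deadline has passed). */
long resume_sleep(struct restart_block *rb)
{
    long left = (long)(rb->arg0 - fake_jiffies);
    return left > 0 ? left : 0;
}

/* The single "restart" entry point the signal code redirects to. It does
 * not know which parameters matter; the resumed call reads its own. */
long restart_syscall(struct restart_block *rb)
{
    assert(rb != NULL && rb->fn != NULL); /* a default fn is always installed */
    return rb->fn(rb);
}
```

For example, with fake_jiffies at 100 and a block armed with fn = resume_sleep and arg0 = 130, restart_syscall() reports 30 ticks still to sleep; once fake_jiffies passes 130 it reports 0.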
> So what we do is to introduce a _new_ system call
> (system call number NNN), which takes a different form of timeout, namely
> "absolute value of end time".
I think it would be best to keep this as generic as
possible, i.e. let the new call code fetch its own
parameters from the restart_block.
>
> And then, when we enter do_signal(), we not only update %eip to point to
> the original "int 0x80" instruction, we _also_ update %eax to point to the
> new system call NNN, _and_ we update %ebx to contain the new timeout in
> absolute jiffies:
>
> current_thread->restart_block.syscall_nr = NNN;
> current_thread->restart_block.arg0 = jiffies + timeout;
My question is who sets up these values? I think you are
saying it should be the system call. Is this right?
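The variant quoted above puts the knowledge in the signal path: on seeing the special return code, it rewinds the instruction pointer over the 2-byte "int $0x80" and rewrites the syscall number and first argument. A user-space model of that fix-up, assuming the interrupted call itself filled in the restart block before returning, might look like this. The register struct, fixup_restart() and NR_RESTART (standing in for "NNN") are hypothetical; ERESTART_RESTARTBLOCK is given a concrete value only so the model compiles.

```c
#include <assert.h>
#include <stddef.h>

#define ERESTART_RESTARTBLOCK 516  /* value used only for this model */
#define NR_RESTART 999             /* hypothetical "NNN" syscall number */

struct fake_regs {                 /* the i386 registers that matter here */
    unsigned long eip, eax, ebx;
};

struct fake_restart_block {        /* filled in by the interrupted syscall */
    unsigned long syscall_nr;
    unsigned long arg0;
};

/* Runs on the way back to user space. `ret` is the syscall's return value;
 * regs->eip already points just past the 2-byte "int $0x80". */
void fixup_restart(struct fake_regs *regs, long ret,
                   const struct fake_restart_block *rb)
{
    assert(regs != NULL && rb != NULL);
    if (ret == -ERESTART_RESTARTBLOCK) {
        regs->eip -= 2;              /* re-execute the int $0x80 ...      */
        regs->eax = rb->syscall_nr;  /* ... as the restart syscall ...    */
        regs->ebx = rb->arg0;        /* ... with the absolute end time    */
    }
}
```

Any other return value leaves the registers untouched, so ordinary syscalls are unaffected by the extra check.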
>
> and then we have a
>
> asmlinkage long sys_nanosleep_resume(unsigned long timeout, struct timespec *rem)
> {
>     long jif = timeout - jiffies;
>
>     if (jif > 0) {
>         current->state = TASK_INTERRUPTIBLE;
>         jif = schedule_timeout(jif);
>         /* interrupted - we already have the restart block set up */
>         if (jif) {
>             if (rem)
>                 jiffies_to_usertimespec(jif, rem);
>             return -ERESTART_RESTARTBLOCK;
>         }
>     }
>     if (rem) {
>         put_user(0, &rem->tv_sec);
>         put_user(0, &rem->tv_nsec);
>     }
>     return 0;
> }
>
> See? The "nanosleep_resume" system call is never used by a program
> directly, it's only virtualized by the signal restart changing the system
> call number on restart. (A user program _could_ use it directly, but
> there's no point, and the interface to the thing might change at any
> time).
I think we could make it error out if, for example, the
restart_block is null.
--
George Anzinger george@mvista.com
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml