From: John Reiser <jreiser@BitWagon.com>
To: Jeff Dike <jdike@addtoit.com>
Cc: uml-devel <user-mode-linux-devel@lists.sourceforge.net>
Subject: Re: [uml-devel] should there be os_clone() analogous to os_getpid() ?
Date: Sun, 09 Dec 2007 12:58:33 -0800
Message-ID: <475C56F9.8040902@BitWagon.com>
In-Reply-To: <20071209151054.GA4368@c2.user-mode-linux.org>
Jeff Dike wrote:
> On Sat, Dec 08, 2007 at 08:24:57PM -0800, John Reiser wrote:
>>I see no os_clone(), yet the glibc clone() does the same caching of pid in
>>ThreadLocalStorage [TLS], and the TLS still may be shared. If nobody reads
>>glibc's shared TLS slot for PID then an actual bug will be avoided. However,
>>it is unsafe to leave such a tempting pitfall.
>
>
> What's the actual bug, exactly? As long as libc's getpid gives us the
> right answer, we're happy.
The actual bug is unnecessary complexity, which slows down development.
[And glibc's getpid() may still give the wrong answer. Caching the pid is
correct only if glibc is the sole implementor of fork and clone, so getpid()
may be wrong whenever non-glibc code does a fork() or clone(). (Obviously
uml and/or valgrind *could* violate this assumption.) In glibc-2.6+, getpid()
also gives the wrong answer when called from a signal handler, if the signal
is delivered immediately after the __NR_clone but before glibc updates its
cache at %gs:PID. (This is a bug in glibc, which should poison its cache
before doing the __NR_clone syscall: Section 2.4.3 of the Single UNIX
Specification requires that getpid() be async-signal-safe, which means that
getpid() may be called from a signal handler.) A virtualizer, such as
valgrind, is *likely* to trigger this race.]
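To make the caching concern concrete, here is a minimal sketch that creates a child behind libc's back and lets the child compare libc's getpid() with the kernel's answer. The helper names raw_getpid() and libc_getpid_agrees_after_raw_clone() are hypothetical and are not UML's os_getpid(). The sketch assumes an architecture where flags is the first argument of the raw clone syscall (true on x86 and ARM, not on s390). A libc that caches the pid could report a mismatch here; one that does not cache should agree.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel directly, bypassing any libc cache. */
static pid_t raw_getpid(void)
{
    pid_t pid = (pid_t)syscall(SYS_getpid);
    assert(pid > 0);            /* the getpid syscall cannot fail */
    return pid;
}

/*
 * Create a child with a raw clone syscall (no CLONE_VM, so it behaves like
 * fork), then compare libc's getpid() with the kernel's answer in the child.
 * Returns 1 if they agree, 0 if they differ, -1 on error.
 * Assumption: flags is the first raw clone argument on this architecture.
 */
static int libc_getpid_agrees_after_raw_clone(void)
{
    int status = 0;
    pid_t child = (pid_t)syscall(SYS_clone, (unsigned long)SIGCHLD,
                                 0UL, 0UL, 0UL, 0UL);

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(getpid() == raw_getpid() ? 0 : 1);

    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling libc_getpid_agrees_after_raw_clone() from debugging code is one way to find out whether the libc in use can be trusted after a non-libc clone.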
The "normal" code within UML is not the only player that wants the right answer
from getpid(). Temporary debugging code also wants the right answer. That code
is still "internal" to UML, so the "BEWARE!" might excuse the fact that getpid()
gives the wrong answer. But valgrind also runs in the same new process,
and there a wrong answer from getpid() is less excusable. It "shouldn't happen",
but the number of different getpid() implementations is growing, and remembering
which ones are unreliable (and why), and ensuring that you aren't using one
of them, becomes difficult.
>
>
>>Also, if you are ptrace()ing
>>through a glibc clone(), then in many cases you will see syscall(__NR_getpid)
>>*from glibc* immediately following! There is an "extra" getpid()
>>that the tracking logic might not expect.
>
>
> Where do we care about how clone translates into a system call?
check_sysemu() in arch/um/os/start_up.c cares that the actual sequence of
system calls is:
    __NR_clone
    __NR_getpid    presumably from ptrace_child() calling os_getpid()
However, when using the clone() from glibc then the actual sequence is:
    __NR_clone
    [random system calls from glibc]
    __NR_getpid    from ptrace_child calling os_getpid()
Now it "accidentally" happens that "random system calls from glibc"
is at most one syscall, and if present it is '__NR_getpid', which is the
same syscall as the presumed os_getpid() from ptrace_child(). That's
a "lucky" break.
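The fragility of expecting an exact syscall sequence can be sketched with two matchers over a list of observed syscall numbers. This is an illustration only, not the logic in check_sysemu(); strict_match() and tolerant_match() are hypothetical names, and the "tolerant" rule (allow repeated getpid after clone) is an assumption chosen to mirror the case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/*
 * seq[] holds syscall numbers seen by a tracer, in order.
 * strict_match:   accepts exactly { clone, getpid }.
 * tolerant_match: accepts clone followed by one or more getpid calls,
 *                 so an extra getpid issued by libc does not break it.
 */
static int strict_match(const long *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    return n == 2 && seq[0] == SYS_clone && seq[1] == SYS_getpid;
}

static int tolerant_match(const long *seq, size_t n)
{
    size_t i;

    assert(seq != NULL || n == 0);
    if (n < 2 || seq[0] != SYS_clone)
        return 0;
    for (i = 1; i < n; i++) {
        if (seq[i] != SYS_getpid)
            return 0;   /* any other syscall is still unexpected */
    }
    return 1;
}
```

With the glibc-style sequence { clone, getpid, getpid } the strict matcher rejects and the tolerant one accepts. Neither copes with a libc that inserts some other syscall, which is why relying on the current behaviour is luck rather than design.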
>
>
>> arch/um/drivers/ubd_user.c
>> arch/um/kernel/tt/tracer.c
>> arch/um/os/tt.c
>> arch/um/os/start_up.c
>> arch/um/os/skas/process.c
>
>
> These guys all just want a new process - they don't care how it
> happens.
Not so. userspace() in arch/um/os/skas/process.c relies on
stack location and signal delivery that is mediated by the combination
of clone() and ptrace(). userspace() also depends on the carry-over
of signal handlers, particularly the SIGSEGV ==> SIGUSR1 trampoline.
"Just a new process" isn't good enough.
> Is something in here causing valgrind some trouble?
Yes. It is not simple for the current valgrind to "let go" of
a new child. [The current valgrind knows how to "let go" only at
execve().] The internal logic of valgrind requires that the creation
of the new process [implemented by clone(,, ~CLONE_VM & ( ),,)] must be done
with all signals blocked. Therefore the child side must do a
sigprocmask() somewhere to unblock, and the uml ptrace()ing is not
expecting this. I'm working on it, and moving ahead, but
progress is slow.
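The pattern described here (create the child with all signals blocked, then unblock on the child side) can be sketched as follows. This is a simplified illustration under stated assumptions: fork() stands in for a clone() without CLONE_VM, and spawn_with_signals_blocked() is a hypothetical name, not a function in valgrind or UML.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block all signals, create a child, and restore the saved mask on both
 * sides. The child reports whether it really started with SIGUSR1 blocked.
 * Returns 1 if so, 0 if not, -1 on error.
 */
static int spawn_with_signals_blocked(void)
{
    sigset_t all, saved, inherited;
    pid_t child;
    int status = 0;

    sigfillset(&all);
    assert(sigismember(&all, SIGUSR1) == 1);
    if (sigprocmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    child = fork();             /* stand-in for clone() without CLONE_VM */
    if (child == 0) {
        /* Child side: this is the extra sigprocmask() a tracer would see. */
        if (sigprocmask(SIG_SETMASK, &saved, &inherited) != 0)
            _exit(2);
        _exit(sigismember(&inherited, SIGUSR1) == 1 ? 0 : 1);
    }

    /* Parent side: restore the original mask whether or not fork worked. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    if (child < 0)
        return -1;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The child's sigprocmask() call is the point of interest: a tracer that expects a fixed syscall sequence from the new process has to account for it.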
--
John Reiser, jreiser@BitWagon.com