From: Ulrich Drepper <drepper@redhat.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: sched_setaffinity usability
Date: Fri, 19 Mar 2004 00:05:52 -0800
Message-ID: <405AA9E0.4040102@redhat.com>
In-Reply-To: <20040318123103.GA21893@elte.hu>

Ingo Molnar wrote:

> i'm wondering how dangerous of an API idea it is to make these
> parameters part of the VDSO .data section (and make it/them versioned
> DSO symbols).

Exporting variables is never a good idea.  The interface is inflexible.
If the variable's size or layout changes, or the value needs to be
changed dynamically, an exported variable cannot accommodate that.

Even if this ticks off a certain LT, a sysconf()-like interface is the
most flexible.  The results would be stored in libc if the lookup is
likely to happen frequently.  The sysconf code in the vdso has all the
flexibility it could ever need.  For instance, a query as to how many
processors are online could do some computations or even make syscalls
if necessary.  Or it could just return a constant like 1 if this is
known at compile or startup time.  I cannot imagine why this isn't
something the kernel people like: you get full control over how the
values are computed.  The exposed interface is minimal, as opposed to
exporting many individual variables.
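
A minimal sketch of what such an interface could look like, under
stated assumptions: the key names, vdso_sysconf(), libc_sysconf() and
the returned values are all hypothetical, and vdso_sysconf() is an
ordinary function standing in for code the kernel would provide in the
vdso.  The call counter exists only to make the caching visible.

```c
#include <stddef.h>

/* Hypothetical query keys; none of these names come from an existing API. */
enum vdso_conf_key {
    VDSO_CONF_NPROC_ONLINE,
    VDSO_CONF_PAGE_SIZE,
    VDSO_CONF_MAX
};

/* Instrumentation for this sketch only: counts calls into the "vdso". */
static int vdso_calls;

/*
 * Stand-in for the kernel-provided function.  The kernel is free to
 * return a constant, compute the value, or make a syscall; the caller
 * cannot tell the difference.
 */
static long vdso_sysconf(int key)
{
    vdso_calls++;
    switch (key) {
    case VDSO_CONF_NPROC_ONLINE:
        return 1;       /* e.g. a constant on a uniprocessor kernel */
    case VDSO_CONF_PAGE_SIZE:
        return 4096;    /* illustrative value only */
    default:
        return -1;      /* unknown key */
    }
}

/* libc-side wrapper that caches results (not thread-safe as written). */
static long cached_value[VDSO_CONF_MAX];
static int  cached_valid[VDSO_CONF_MAX];

static long libc_sysconf(int key)
{
    if (key < 0 || key >= VDSO_CONF_MAX)
        return -1;
    if (!cached_valid[key]) {
        cached_value[key] = vdso_sysconf(key);
        cached_valid[key] = 1;
    }
    return cached_value[key];
}
```

Caching like this only makes sense for values that cannot change at run
time; a real libc would have to decide that per key.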


> The only minor complication wrt. uname() would be sethostname: other
> CPUs could observe a transitional state of (the VDSO-equivalent of)
> system_utsname.nodename. Is this a problem? It's not like systems call
> sethostname all that often ...

Again, by exporting an interface to access the value you can get all the
control you need.  In this case it'd probably be a confstr()-like
interface which is just like sysconf(), but can return strings or
arbitrary data (it gets passed a memory pointer and size).
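
As a rough illustration, such a function could follow the POSIX
confstr() convention: copy at most `len` bytes including the
terminating NUL and return the size a full copy would need.  The key
VDSO_CS_NODENAME, the function name vdso_confstr() and the sample host
name are assumptions made for this sketch only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key; not part of any existing interface. */
enum { VDSO_CS_NODENAME };

/* Stand-in for data the kernel would keep in the vdso. */
static const char fake_nodename[] = "example-host";

/*
 * confstr()-like lookup: returns the buffer size needed for the full
 * value (including the NUL), or 0 for an unknown key.  If buf is
 * non-NULL and len is non-zero, a possibly truncated, always
 * NUL-terminated copy is stored in buf.
 */
static size_t vdso_confstr(int key, char *buf, size_t len)
{
    const char *src;
    size_t need;

    switch (key) {
    case VDSO_CS_NODENAME:
        src = fake_nodename;
        break;
    default:
        return 0;
    }

    need = strlen(src) + 1;
    if (buf != NULL && len > 0) {
        size_t n = need < len ? need : len;
        memcpy(buf, src, n - 1);
        buf[n - 1] = '\0';
    }
    return need;
}
```

A caller can pass a NULL buffer first to learn the required size, then
allocate and call again.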

To implement gethostname() without races store the hostname as

   host name MAGIC

The read function can first read MAGIC, read barrier, then read the host
name, read barrier, then read MAGIC again.  If MAGIC changed, rinse and
repeat.  Doing this in libc would mean hardcoding all the processor
idiosyncrasies there.  The libc code has to be generic enough
to cover all versions of the CPU (maybe some need the MAGIC value to be
specially aligned) and then has to dynamically decide what version to
use.  In the vdso the kernel can decide at boot time which functions to
use since it knows at that time what CPUs are used.
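
The read protocol above can be sketched in portable C11 atomics.  This
is one possible rendering, not the only one: the struct layout, the
names host_write() and host_read(), and the HOST_MAX size are
assumptions.  The sketch also adds a convention the description above
does not spell out, borrowed from the usual sequence-lock technique:
the writer makes MAGIC odd while an update is in progress, so a reader
can also detect a write that started before its first read.  A single
writer at a time is assumed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define HOST_MAX 65   /* illustrative size, including the NUL */

struct vdso_hostname {
    atomic_char name[HOST_MAX];
    atomic_uint magic;          /* even: stable, odd: update in progress */
};

/* Static storage: zero-initialized, i.e. empty name and magic == 0. */
static struct vdso_hostname vh;

/* Writer side (the kernel, in the proposal).  Single writer assumed. */
static void host_write(const char *s)
{
    unsigned m = atomic_load_explicit(&vh.magic, memory_order_relaxed);
    size_t i, n = strlen(s);

    if (n >= HOST_MAX)
        n = HOST_MAX - 1;       /* truncate over-long names */

    atomic_store_explicit(&vh.magic, m + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* write barrier */
    for (i = 0; i < n; i++)
        atomic_store_explicit(&vh.name[i], s[i], memory_order_relaxed);
    atomic_store_explicit(&vh.name[n], '\0', memory_order_relaxed);
    atomic_store_explicit(&vh.magic, m + 2, memory_order_release);
}

/* Reader side: read MAGIC, read the name, read MAGIC again, retry. */
static void host_read(char out[HOST_MAX])
{
    unsigned m0, m1;
    size_t i;

    do {
        m0 = atomic_load_explicit(&vh.magic, memory_order_acquire);
        for (i = 0; i < HOST_MAX; i++)
            out[i] = atomic_load_explicit(&vh.name[i],
                                          memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);   /* read barrier */
        m1 = atomic_load_explicit(&vh.magic, memory_order_relaxed);
    } while ((m0 & 1u) || m0 != m1);

    out[HOST_MAX - 1] = '\0';
}
```

A vdso implementation would not need this portable form; the kernel
could pick the cheapest barriers for the CPUs it actually found at
boot.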

Some other syscalls like uname() can be fully implemented in the vdso as
well.  The vdso is writable in the kernel so the mapped data can be
updated.  In the uname() case, the syscall would be a simple memcpy()
from the place in the vdso into the place designated by the parameter.
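
In code, the uname() case could be as small as the following sketch.
vdso_uts_data is a hypothetical stand-in for the copy of the utsname
data that the kernel would keep up to date in the vdso mapping, and
vdso_uname() is an illustrative name.  Protection against concurrent
updates is deliberately left out here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Stand-in for the kernel-maintained data mapped into the process. */
static struct utsname vdso_uts_data;

/* User-level uname(): no kernel entry, just a copy. */
static int vdso_uname(struct utsname *buf)
{
    if (buf == NULL)
        return -1;
    memcpy(buf, &vdso_uts_data, sizeof *buf);
    return 0;
}
```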

Even if it is not possible to implement the entire syscall at user level,
maybe just a part can be done in the vdso, in the prologue or epilogue
of the vdso function.


The kernel gets the opportunity to *OPTIONALLY* tweak every little
aspect of the syscall handling if it wants to.  All this without
having to change libc and wait for the changes to be widely deployed.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
