From: Hubertus Franke <frankeh@watson.ibm.com>
To: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Paul Jackson <pj@sgi.com>,
	linux-kernel@vger.kernel.org, haveblue@us.ibm.com
Subject: Re: [RFC] [PATCH 00/13] Introduce task_pid api
Date: Tue, 15 Nov 2005 09:37:17 -0500
Message-ID: <4379F29D.3090306@watson.ibm.com>
In-Reply-To: <20051115133222.GA2232@IBM-BWN8ZTBWAO1>

Serge E. Hallyn wrote:
> Quoting Paul Jackson (pj@sgi.com):
> 

There have been a few suggestions going back and forth.
Let me address them all at once.

(A) Why a vpid?

For transparent checkpointing. Vserver, for instance, has not implemented
checkpoint/restart yet, because without this concept it is not possible.
The moment you want transparent checkpointing, you need to deal with the fact
that the result of a getpid() may be held in a register (worst case), and upon
restart the system must provide the same pid on a different machine.
That immediately suggests pid range reservation... but see point (C) below.

(B) syscall interception and LD_PRELOAD:

In principle that is possible, but it leads to potentially inefficient code
and by and large leaves the issue of pid space creation and migration on the table.
However, it makes clear that as long as the mapping between virtual and real
pids is kept consistent, this is quite a useful concept.

The question now is how deep into the kernel we have to drive it in order to
create an efficient implementation.

(C) Fixed PID range allocation:

That is completely unscalable and unnecessary.

First, PID range allocation at a global level (e.g. cluster level) requires some agent.
Second, with PID_MAX ~ 2**22 we are left on 32-bit architectures with only 512 pidspaces
(2**31 / 2**22 = 2**9; the negative range needs to be preserved, I think).
However, it is not unreasonable to assume that 512 different pidspaces per OS image is not
a restriction. The pidspace id therefore only has to be unique per OS image:
when a pidspace is migrated, it is simply assigned a different pidspace id.
Then going with   kernelpid = (pidspace_id << 22) | vpid   is an efficient means to
map between virtual pidspace and physical pidspace and vice versa.
All that needs to be managed is local pidspace allocation.
The translations vpid <-> pid are very lightweight, as can be seen from the above
composition.
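
To make the composition concrete, here is a minimal user-space sketch of the two
translations. The names (VPID_BITS, vpid_to_kernelpid, and so on) are made up for
illustration and are not taken from the patch set; it only assumes a 22-bit vpid and
a non-negative 32-bit kernel pid, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: 22 bits of vpid, sign bit kept clear. */
#define VPID_BITS      22u
#define VPID_MASK      ((1u << VPID_BITS) - 1u)
#define MAX_PIDSPACES  (1u << (31u - VPID_BITS))   /* 2**9 = 512 */

/* virtual -> kernel: kernelpid = (pidspace_id << 22) | vpid */
static inline int32_t vpid_to_kernelpid(uint32_t pidspace_id, uint32_t vpid)
{
    assert(pidspace_id < MAX_PIDSPACES);   /* keeps the result non-negative */
    assert(vpid <= VPID_MASK);
    return (int32_t)((pidspace_id << VPID_BITS) | vpid);
}

/* kernel -> virtual: the low 22 bits are the vpid */
static inline uint32_t kernelpid_to_vpid(int32_t kernelpid)
{
    return (uint32_t)kernelpid & VPID_MASK;
}

/* kernel -> pidspace id: the bits above the vpid */
static inline uint32_t kernelpid_to_pidspace(int32_t kernelpid)
{
    return (uint32_t)kernelpid >> VPID_BITS;
}
```

Each direction is a shift and a mask, so no lookup table is needed. With
pidspace id 0 the kernel pid equals the vpid.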

Take for example the vserver system. A local vserver agent could maintain the
pidspace allocation. On creation of a vserver, it assigns the next available pidspace.
That pidspace id is internal to vserver and is not exported as a property of a vserver.
When a vserver is migrated to a different machine, a potentially different pidspace
is allocated, yet all the vpids remain the same.
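
A rough sketch of such a local agent, in user-space C for illustration only. The
struct and function names are hypothetical, and reserving id 0 for the default
pidspace is my assumption here, not something the patches define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VPID_BITS      22
#define MAX_PIDSPACES  512

/* One agent per OS image; it only tracks which local ids are taken. */
struct pidspace_agent {
    uint8_t in_use[MAX_PIDSPACES];
};

static void agent_init(struct pidspace_agent *agent)
{
    memset(agent->in_use, 0, sizeof(agent->in_use));
    agent->in_use[0] = 1;   /* assumption: id 0 is the default pidspace */
}

/* Hand out the next available pidspace id, or -1 if all are taken. */
static int agent_alloc(struct pidspace_agent *agent)
{
    for (int id = 1; id < MAX_PIDSPACES; id++) {
        if (!agent->in_use[id]) {
            agent->in_use[id] = 1;
            return id;
        }
    }
    return -1;
}

/* Return an id to the pool, e.g. after the vserver has migrated away. */
static void agent_release(struct pidspace_agent *agent, int id)
{
    if (id > 0 && id < MAX_PIDSPACES)
        agent->in_use[id] = 0;
}

/* Kernel pid of a given vpid inside a locally allocated pidspace. */
static int32_t kernelpid_on(int pidspace_id, uint32_t vpid)
{
    assert(pidspace_id >= 0 && pidspace_id < MAX_PIDSPACES);
    return (int32_t)(((uint32_t)pidspace_id << VPID_BITS) | vpid);
}
```

On migration the source machine releases its id and the target machine calls
agent_alloc() on its own agent. The two ids may differ, so the kernel pids differ,
but the low 22 bits (the vpid the application sees) stay the same.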

(D) Cross compilation

I do all my work on s390, so that space is covered.

If I missed some of the issues that were raised, let me know and we will try to address
them.

I am part of Serge's team and have been working on intercepting the various places
where virtual to real pid translations have to occur in the kernel.
It's still in pretty bad shape, but it boots for the default pid space (:- ).
Off the top of my head, I'd say there are about 40 places each way where the
translation has to be done. Many are in the /proc filesystem, some in the signal handling.

I hope that by the end of the week I will have something to post that gives an idea
of how we think this could be realized.
