* [uml-devel] SKAS4 design question
@ 2006-01-16 19:34 Jacob Bachmeyer
  2006-01-18 11:58 ` Blaisorblade
  0 siblings, 1 reply; 8+ messages in thread
From: Jacob Bachmeyer @ 2006-01-16 19:34 UTC (permalink / raw)
  To: user-mode-linux-devel; +Cc: jcb62281

Has any thought been given to making SKAS4 suitably generic that it 
could be used for more than just UML?

I'm thinking of some arrangement where one process can handle multiple 
address spaces for multiple other processes.

This would have greater application than merely UML--for example, Wine 
could also be adapted to use SKAS, potentially a killer app, as this 
could make Wine more secure than Windows.  (Running all Wine code in its 
own address space, separate from the apps Wine runs, could insulate 
against some application buffer overruns, due to the way the Win32 API 
is accessed.)

Hmm, what would we need for this to work?

-- ability to create/release "remote" address spaces
-- read/write in those "remote" address spaces
   -- possibly even the capability to map a section of a "remote" address
      space into the control process, do something, then release it
-- ability to configure pages in a "remote" address space such that
   accesses trap to the control process
-- ability to trap all possible syscalls from such an address space

and, for the big bonus:

-- ability to use either the host scheduler or some code from the
   not-yet-developed libUML to run threads in the "remote" address spaces

Hmm, with a little more effort, this could become a generic 
compatibility layer for non-Linux programs--for each foreign platform, 
one would need only a control program that manages the foreign processes 
and implements the foreign syscalls.

{Contemplates HURD on Linux :-)}

As I understand it, the Linux mm system is internally moving in this 
kind of direction already.  SKAS would become primarily a system by 
which pages can have backing store implemented in userspace and "remote" 
address spaces managed.

This direction would certainly help push SKAS into the stock kernel.

PS:  If I understand correctly, UML with the current SKAS3 works by 
swapping processes into and out of a single "user" address space.  I 
propose a system where many distinct "user" address spaces are 
maintained by the kernel and execution is placed wherever the user-mode 
scheduler says.


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* Re: [uml-devel] SKAS4 design question
  2006-01-16 19:34 [uml-devel] SKAS4 design question Jacob Bachmeyer
@ 2006-01-18 11:58 ` Blaisorblade
  2006-01-18 23:52   ` Jacob Bachmeyer
  0 siblings, 1 reply; 8+ messages in thread
From: Blaisorblade @ 2006-01-18 11:58 UTC (permalink / raw)
  To: user-mode-linux-devel, jcb62281

On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> Has any thought been given to making SKAS4 suitably generic that it
> could be used for more than just UML?

Not yet, thoughts welcome.

> PS:  If I understand correctly, UML with the current SKAS3 works by
> swapping processes into and out of a single "user" address space.

> I 
> propose a system where many distinct "user" address spaces are
> maintained by the kernel and execution is placed wherever the user-mode
> scheduler says.

What you say is not clear, but the most obvious understanding of the above 
sentence is that you propose what already happens.

However, SKAS3 and current ideas for SKAS4, with different APIs but similar 
semantics, say: implement all guest processes as user-level threads (totally 
implemented within UML) with the exception that we allow different address 
spaces.

So we have "switch the guest proc to a different address 
space" (PTRACE_SWITCH_MM, in arch/i386/kernel/ptrace.c), manipulate with 
mmap/munmap/mprotect any of these address spaces, and destroy it (all in 
mm/proc_mm.c).
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


	
	
		




* Re: [uml-devel] SKAS4 design question
  2006-01-18 11:58 ` Blaisorblade
@ 2006-01-18 23:52   ` Jacob Bachmeyer
  2006-01-19  0:37     ` Blaisorblade
  0 siblings, 1 reply; 8+ messages in thread
From: Jacob Bachmeyer @ 2006-01-18 23:52 UTC (permalink / raw)
  To: user-mode-linux-devel

Blaisorblade wrote:

>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:

>>Has any thought been given to making SKAS4 suitably generic that it
>>could be used for more than just UML?

>Not yet, thoughts welcome.

Let's see:

to support HURD (which uses the Mach ABI):

    -- existing facilities plus trap lcall gates

to support WINE (which follows Win32 conventions (ick!)): (x86 only)

    --existing facilities plus
     -- trap on access to specified pages

        Explanation:  Win32 API calls are not syscalls in the normal 
sense--rather they are made by calling into a system DLL.  These DLLs 
are mapped into the process' address space on Windows and under current 
WINE, much like shared objects in normal Linux.  This idea would enable 
WINE to not actually map these DLLs, but rather simply set the pages 
where the DLLs would be mapped as "fasttrap".  Then, when the program 
attempts to access a DLL's memory image, the kernel would intercept the 
request and quickly pass it to a userspace thread, which handles the 
"page fault".  The page remains set as "fasttrap", and the control 
process modifies the address space and CPU context appropriately before 
allowing execution to continue.

     -- read/write in guest address space

        Explanation:  mmap is fine for big changes to an address space 
(such as loading modules), but one capability WINE would need for this 
to be truly useful is 1/2/4/8/16-byte PEEK and POKE.  (Some Win32 
programs like to do weird things with Windows' system code--in 
conjunction with "fasttrap", this would allow WINE to keep such programs 
happy.)  As I understand it, ptrace already provides this, hopefully 
adequately.

     -- intercept arbitrary interrupts in guest address space

        Explanation:  Many older Windows programs (Win16 era) 
occasionally directly invoke various soft interrupts (these are 
basically DOS syscalls).  The ability to intercept these is necessary, 
but need not be particularly efficient or fast.

     -- modify guest address space's LDT

        Explanation:  Again, Win16 support.  Old Windows actually 
allowed processes to request segments for whatever purpose.  This may or 
may not be doable on all modern hardware.

     -- transparently use threads in guest address spaces, if desired

        Explanation:  WINE currently uses the host's scheduler. 
Changing it to this new API shouldn't adversely affect that ability. 
(And on second thought, using a UML library might not be an option.)

>>PS:  If I understand correctly, UML with the current SKAS3 works by
>>swapping processes into and out of a single "user" address space.

>>I 
>>propose a system where many distinct "user" address spaces are
>>maintained by the kernel and execution is placed wherever the user-mode
>>scheduler says.

>What you say is not clear, but the most obvious understanding of the above 
>sentence is that you propose what already happens.

>However, SKAS3 and current ideas for SKAS4, with different APIs but similar 
>semantics, say: implement all guest processes as user-level threads (totally 
>implemented within UML) with the exception that we allow different address 
>spaces.

>So we have "switch the guest proc to a different address 
>space" (PTRACE_SWITCH_MM, in arch/i386/kernel/ptrace.c), manipulate with 
>mmap/munmap/mprotect any of these address spaces, and destroy it (all in 
>mm/proc_mm.c).

I shall clarify my proposal:  each thread is assigned an address space, 
while an address space can contain multiple threads.  Each thread also 
has a STOP/RUN flag, which if set to RUN, causes the host scheduler to 
consider that thread for execution (along with all other runnable 
threads).  This flag allows either the userspace control process to make 
scheduling decisions itself, (by only setting one of its threads to RUN) 
or to punt and have the kernel handle all scheduling for its threads (by 
setting them all to RUN and using STOP only to block a thread).
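
A toy model may make the two modes concrete.  Nothing below is an
existing kernel interface: GuestThread, HostScheduler, run_only and the
round-robin policy are all assumptions made up for illustration, and the
"host scheduler" is just a loop over a Python list.

```python
from dataclasses import dataclass


@dataclass
class GuestThread:
    name: str
    mm: int                 # id of the address space the thread runs in
    runnable: bool = False  # the proposed flag: True = RUN, False = STOP


class HostScheduler:
    """Stand-in for the host scheduler: round-robin over RUN threads only."""

    def __init__(self, threads):
        self.threads = list(threads)
        self._next = 0

    def pick(self):
        """Return the next RUN thread, or None if every thread is STOPped."""
        n = len(self.threads)
        for offset in range(n):
            candidate = self.threads[(self._next + offset) % n]
            if candidate.runnable:
                self._next = (self._next + offset + 1) % n
                return candidate
        return None


def run_only(threads, chosen):
    """Userspace scheduling: exactly one thread is RUN, the rest are STOP."""
    for t in threads:
        t.runnable = t is chosen


# Threads "a" and "b" share address space 1; "c" lives in address space 2.
threads = [GuestThread("a", mm=1), GuestThread("b", mm=1), GuestThread("c", mm=2)]
sched = HostScheduler(threads)

# Mode 1: the control process decides, so the host can only ever pick "b".
run_only(threads, threads[1])
mode1 = [sched.pick().name for _ in range(3)]

# Mode 2: punt to the host; everything is RUN, STOP is used only to block "a".
for t in threads:
    t.runnable = True
threads[0].runnable = False
mode2 = [sched.pick().name for _ in range(4)]

print(mode1, mode2)
```

In the first mode the host never has a choice to make; in the second it
rotates among whatever the control process has left runnable.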

Could all SKAS4 APIs be multiplexed through one syscall?  (Perhaps 
simply as more ptrace functions, or as a new "skas4" syscall?)





* Re: [uml-devel] SKAS4 design question
  2006-01-18 23:52   ` Jacob Bachmeyer
@ 2006-01-19  0:37     ` Blaisorblade
  2006-01-19 22:23       ` Jacob Bachmeyer
  0 siblings, 1 reply; 8+ messages in thread
From: Blaisorblade @ 2006-01-19  0:37 UTC (permalink / raw)
  To: user-mode-linux-devel; +Cc: Jacob Bachmeyer

On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> >>Has any thought been given to making SKAS4 suitably generic that it
> >>could be used for more than just UML?
> >
> >Not yet, thoughts welcome.
>
> Let's see:
>
> to support HURD (which uses the Mach ABI):
>
>     -- existing facilities plus trap lcall gates

I.e. extend ptrace to trap lcall gates, right? That's another thing, could be 
done, but it relates more to the Linux-ABI project... at least this can't be 
merged in mainline since we don't support lcall gates.

> to support WINE (which follows Win32 conventions (ick!)): (x86 only)

>     --existing facilities plus
>      -- trap on access to specified pages

We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't work 
for accesses from kernel-space (you don't get SIGSEGV, just, likely, 
-EFAULT). And it's horribly slow. And trapping for kernelspace accesses is 
bad.

>         Explanation:  Win32 API calls are not syscalls in the normal
> sense--rather they are made by calling into a system DLL.

Yep, it then can decide whether to trap into the kernel or not (depending on 
that version's implementation).

> These DLLs 
> are mapped into the process' address space on Windows and under current
> WINE, much like shared objects in normal Linux.  This idea would enable
> WINE to not actually map these DLLs, but rather simply set the pages
> where the DLLs would be mapped as "fasttrap". 

What is the reason to trap to the kernel? It's going to be slow. A page 
fault, like a syscall, is costly (and probably more so, since it's an 
interrupt).

If there is a good reason not to map the DLLs, it may at least make sense, but 
WINE users aren't going to use special patches, and getting such a hackish 
thing into mainline may be a hard sell (unless the reason is _really_ good).

> Then, when the program 
> attempts to access a DLL's memory image, the kernel would intercept the
> request and quickly pass it to a userspace thread,

Easy to say "quickly pass it"... signals are slow. There are faster but more 
complicated primitives (netlink comes to mind, for instance).

> which handles the 
> "page fault". 

> The page remains set as "fasttrap", and the control 
> process modifies the address space and CPU context appropriately before
> allowing execution to continue.

"Modifies" to return the call or to map the page in? You seem to imply it 
performs the call and sets the return value in EAX, right?

Also, for security reasons it's not possible to let userspace trap OS accesses 
(as the OS is more privileged - search TENEX at 
http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that is).

>      -- read/write in guest address space

>         Explanation:  mmap is fine for big changes to an address space
> (such as loading modules), but one capability WINE would need for this
> to be truly useful is 1/2/4/8/16-byte PEEK and POKE.  (Some Win32
> programs like to do weird things with Windows' system code--in
> conjunction with "fasttrap", this would allow WINE to keep such programs
> happy.)  As I understand, ptrace already provides this, hopefully
> adequately.

It provides this; it could be made a bit faster (I've reviewed a patch from 
another project that uses ptrace heavily, which makes it faster).

>      -- intercept arbitrary interrupts in guest address space
>
>         Explanation:  Many older Windows programs (Win16 era)
> occasionally directly invoke various soft interrupts (these are
> basically DOS syscalls).  The ability to intercept these is necessary,
> but need not be particularly efficient or fast.

I recall that hardware IRQ number x is mapped to vector k+x, where k is fixed 
and low; with ACPI we now have 32 IRQs, I guess (on my machine the kernel uses 
up to 22 IRQs), so I guess int 0x21 is going to conflict somewhere.
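
The suspected conflict is easy to check with a little arithmetic.  The
base k = 0x20 used below is an assumption (it matches the usual legacy
PIC remapping on i386, but the real vector layout depends on the kernel
and on the interrupt controller setup), and the count of 32 IRQs is just
the guess made above.

```python
IRQ_BASE_VECTOR = 0x20   # assumed value of k
NUM_IRQS = 32            # the "32 IRQs" guessed above
DOS_INT = 0x21           # the classic DOS service interrupt


def irq_vector(irq):
    """Vector used for hardware IRQ number `irq` under the k + x mapping."""
    return IRQ_BASE_VECTOR + irq


# Map every vector claimed by a hardware IRQ back to its IRQ number.
hardware_vectors = {irq_vector(irq): irq for irq in range(NUM_IRQS)}

# None here would mean that int 0x21 is free under these assumptions.
conflicting_irq = hardware_vectors.get(DOS_INT)
print(f"int {DOS_INT:#x} -> IRQ {conflicting_irq}")
```

Under these assumptions the hardware vectors span 0x20 through 0x3f, so
int 0x21 lands on IRQ 1.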

That said, this could be added too for interrupts not reserved by the kernel 
(that is CPU exceptions). But DOSEMU already runs x86 programs, so WINE 
should be able to do it too... ah, yep, it uses vm86, while you need to do 
that on a paged system.

>      -- modify guest address space's LDT

>         Explanation:  Again, Win16 support.  Old Windows actually
> allowed processes to request segments for whatever purpose.  This may or
> may not be doable on all modern hardware.

PTRACE_LDT exists, and performs a remote modify_ldt, just as MM_MMAP is a 
remote mmap().

>      -- transparently use threads in guest address spaces, if desired

>         Explanation:  WINE currently uses the host's scheduler.
> Changing it to this new API shouldn't adversely affect that ability.
> (And on second thought, using a UML library might not be an option.)

> I shall clarify my proposal:  each thread is assigned an address space,
and (you forget to say) it can be changed through PTRACE_SWITCH_MM you mean... 
(otherwise I don't see the addition).

> while an address space can contain multiple threads.

you can PTRACE_SWITCH_MM multiple threads to the same address space

> Each thread also 
> has a STOP/RUN flag, which if set to RUN, causes the host scheduler to
> consider that thread for execution (along with all other runnable
> threads).  This flag allows either the userspace control process to make
> scheduling decisions itself, (by only setting one of its threads to RUN)
> or to punt and have the kernel handle all scheduling for its threads (by
> setting them all to RUN and using STOP only to block a thread).

Hmm, sleeping like that is easy if you mean that only a thread can switch 
itself from RUN to STOP. The thread can use some mutex/semaphore thing, at 
that point.

To switch a thread from RUN to STOP from the outside, you can currently kill 
it with -STOP. Beware that it may be slow, but I don't know whether it matters 
or whether it can be made much faster.

The problem (I think) is that SIGSTOP will be processed not at kill() time, 
but at delivery time, i.e. after a context switch to the receiving thread, 
before returning to userspace. I've not checked for SIGSTOP and am not sure 
for the rest, but I think it's this way.
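
For reference, stopping a process from the outside looks roughly like
the sketch below (Unix only; the helper name stop_and_resume_child is
made up for this example).  It only shows that kill() and the stop
itself are separate events, since the parent has to wait to observe the
stop; it measures nothing, and it acts on a whole child process rather
than on a single thread.

```python
import os
import signal
import time


def stop_and_resume_child():
    """Stop a child from the outside with SIGSTOP, then resume and reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a guest that simply keeps running.
        try:
            while True:
                time.sleep(0.01)
        finally:
            os._exit(0)

    os.kill(pid, signal.SIGSTOP)               # only queues the signal
    _, status = os.waitpid(pid, os.WUNTRACED)  # blocks until the child has stopped
    stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP

    os.kill(pid, signal.SIGCONT)               # back to "RUN"
    os.kill(pid, signal.SIGKILL)               # clean up the demo child
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    return stopped, killed


if __name__ == "__main__":
    print(stop_and_resume_child())
```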

> Could all SKAS4 APIs be multiplexed through one syscall?  (Perhaps
> simply as more ptrace functions, or as a new "skas4" syscall?)

"multiplexing" like ipc(2) is a bad idea. 

However, currently the idea is sys_mm_indirect, taking an fd representing an 
mm context, a syscall number and its parameters, plus a syscall to get an fd 
representing an mm context.
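
To make the shape of that idea concrete, here is a toy userspace model.
It is not the real interface, which is still only an idea: new_mm,
mm_indirect, the page-granular "address space" and the use of syscall
names instead of numbers are all assumptions made for illustration.

```python
import errno
import itertools


class MmContext:
    """Toy address space: page number -> protection string such as "rw"."""

    def __init__(self):
        self.pages = {}


_contexts = {}                      # fd -> MmContext
_fd_numbers = itertools.count(3)    # pretend fds 0-2 are already taken


def new_mm():
    """Stand-in for the syscall that returns an fd for a fresh mm context."""
    fd = next(_fd_numbers)
    _contexts[fd] = MmContext()
    return fd


def _mmap(mm, first_page, npages, prot):
    for page in range(first_page, first_page + npages):
        mm.pages[page] = prot
    return first_page


def _munmap(mm, first_page, npages):
    for page in range(first_page, first_page + npages):
        mm.pages.pop(page, None)
    return 0


def _mprotect(mm, first_page, npages, prot):
    wanted = range(first_page, first_page + npages)
    if any(page not in mm.pages for page in wanted):
        return -errno.ENOMEM        # part of the range is not mapped
    for page in wanted:
        mm.pages[page] = prot
    return 0


# The idea passes a syscall number; names are used here for readability.
_SYSCALLS = {"mmap": _mmap, "munmap": _munmap, "mprotect": _mprotect}


def mm_indirect(fd, syscall, *args):
    """Apply `syscall` to the address space behind `fd`, not the caller's own."""
    mm = _contexts.get(fd)
    if mm is None:
        return -errno.EBADF
    handler = _SYSCALLS.get(syscall)
    if handler is None:
        return -errno.ENOSYS
    return handler(mm, *args)


guest = new_mm()
mm_indirect(guest, "mmap", 0x10, 2, "rw")     # map pages 0x10 and 0x11
mm_indirect(guest, "mprotect", 0x10, 1, "r")  # write-protect the first one
```

The point of the indirection is that the existing mmap/munmap/mprotect
semantics are reused unchanged; only the target address space differs.
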
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	
	
		

* Re: [uml-devel] SKAS4 design question
  2006-01-19  0:37     ` Blaisorblade
@ 2006-01-19 22:23       ` Jacob Bachmeyer
  2006-01-20 16:41         ` Blaisorblade
  0 siblings, 1 reply; 8+ messages in thread
From: Jacob Bachmeyer @ 2006-01-19 22:23 UTC (permalink / raw)
  To: user-mode-linux-devel

Blaisorblade wrote:

>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>
>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>could be used for more than just UML?
>>>>
>>>Not yet, thoughts welcome.
>>>
>>Let's see:
>>
>>to support HURD (which uses the Mach ABI):
>>
>>    -- existing facilities plus trap lcall gates
>
>I.e. extend ptrace to trap lcall gates, right? That's another thing, could be 
>done, but it relates more to the Linux-ABI project... at least this can't be 
>merged in mainline since we don't support lcall gates.

Why not?  And for that matter, why does ptrace not currently catch lcalls?

>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>
>>    --existing facilities plus
>>     -- trap on access to specified pages
>
>We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't work 
>for accesses from kernel-space (you don't get SIGSEGV, just, likely, 
>-EFAULT). And it's horribly slow. And trapping for kernelspace accesses is 
>bad.

You don't have to trap kernelspace accesses (-EFAULT there would be a
good thing--the host kernel shouldn't be looking in these pages anyway);
this is only to apply to userspace code.  Granted, SIGSEGV is slow--but
why should it be fast?  It's an error path.

>>        Explanation:  Win32 API calls are not syscalls in the normal
>>sense--rather they are made by calling into a system DLL.
>
>Yep, it then can decide whether to trap into the kernel or not (depending on 
>that version's implementation).
>
>>These DLLs 
>>are mapped into the process' address space on Windows and under current
>>WINE, much like shared objects in normal Linux.  This idea would enable
>>WINE to not actually map these DLLs, but rather simply set the pages
>>where the DLLs would be mapped as "fasttrap". 
>
>What is the reason to trap to the kernel? It's going to be slow. A page 
>fault, like a syscall, is costly (and probably more so, since it's an 
>interrupt).
>
>If there is a good reason not to map the DLLs, it may at least make sense, but 
>WINE users aren't going to use special patches, and getting such a hackish 
>thing into mainline may be a hard sell (unless the reason is _really_ good).

The overhead is not all that large, as most Win32 API calls ultimately
go into the kernel anyway.  This also should allow WINE to work well on
platforms such as x86-64, without needing multiple WINE binaries.
(64-bit control process managing mix of 32 and 64 bit address spaces)

Also, what exactly are vsyscalls?

Executables are already demand-paged--so page faults routinely happen
anyway.  The reason to trap is to allow WINE to intercept the call while
sitting in another address space.  (Each Win32 process would have its
own guest address space.)  The idea is to have the interfaces UML uses
be generic enough for WINE to also use.

The reason is simple--improved security by enforcing a sandbox around
WINE.

>>Then, when the program 
>>attempts to access a DLL's memory image, the kernel would intercept the
>>request and quickly pass it to a userspace thread,
>
>Easy to say "quickly pass it"... signals are slow. There are faster but more 
>complicated primitives (netlink comes to mind, for instance).

User DLLs (those from the program itself) would actually be mapped.  The
system DLLs (kernel32, user32, etc.) that WINE itself implements on
Linux and that must trap to kernelspace on Windows would be loaded this
way.  One benefit is to reduce the chance of conflict, as various
internal modules in WINE that don't exist in Windows could thus be
removed from the visible (to the Win32 app) address space.  This could
have uses other than WINE, too.  One possibility is as a "padded cell"
of sorts--a process is started in a guest address space under a control
program that intercepts and discards all syscalls.  However, certain
pages in that address space are used as a restricted system
interface--accessing them blocks the accessing thread and causes a
(host) syscall to return in the control process.  This syscall would
block until a guest thread trips a "fasttrap" page and then returns
information such as exact address accessed, read or write, and if write,
value written.  This syscall need not be new--read or ioctl on an
appropriate fd (netlink socket perhaps?) would be enough.  The control
thread then carries out the requested action (whatever that may be) and
permits the jailed thread to again run.

"fasttrap" may have been a poor choice of terms.  The idea is to have
more or less generic kernel-in-userspace functionality with one process
as a "usermode supervisor" watching a set of other processes.
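
The protocol sketched above can be mocked up entirely in userspace.  In
this toy model a Python thread plays the jailed guest, a queue stands in
for the fd the control process reads from, and TrapEvent, GuestSpace and
supervise are invented names; no kernel support is involved or implied.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096


@dataclass
class TrapEvent:
    address: int            # exact address accessed
    is_write: bool          # read or write
    value: Optional[int]    # value written, if this was a write
    reply: queue.Queue      # where the supervisor posts the result


class GuestSpace:
    def __init__(self, trap_pages, events):
        self.trap_pages = set(trap_pages)   # page numbers marked "fasttrap"
        self.memory = {}                    # ordinary mapped memory
        self.events = events                # stands in for the control fd

    def access(self, address, is_write=False, value=None):
        if address // PAGE_SIZE not in self.trap_pages:
            if is_write:
                self.memory[address] = value
            return self.memory.get(address, 0)
        # Trapping page: block this thread until the supervisor answers.
        event = TrapEvent(address, is_write, value, queue.Queue(maxsize=1))
        self.events.put(event)
        return event.reply.get()


def supervise(events, count, log):
    """Control process: block for `count` trap events and service each one."""
    for _ in range(count):
        event = events.get()        # models the blocking read()/ioctl()
        log.append((event.address, event.is_write, event.value))
        # The "requested action" here is trivial: echo writes back and
        # answer reads with a fixed marker value.
        event.reply.put(event.value if event.is_write else 0x1234)


events = queue.Queue()
space = GuestSpace(trap_pages={0x10}, events=events)
results, log = [], []


def guest():
    results.append(space.access(0x10000))            # traps (read)
    results.append(space.access(0x10008, True, 7))   # traps (write)
    results.append(space.access(0x20000, True, 5))   # ordinary memory, no trap


jailed = threading.Thread(target=guest)
jailed.start()
supervise(events, count=2, log=log)
jailed.join()
```

Only the two accesses to page 0x10 reach the supervisor; the third one
touches ordinary memory and never leaves the guest.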


>>which handles the "page fault". 
>>
>>The page remains set as "fasttrap", and the control 
>>process modifies the address space and CPU context appropriately before
>>allowing execution to continue.
>
>"Modifies" to return the call or to map the page in? You seem to imply it 
>performs the call and sets the return value in EAX, right?
>
>Also, for security reasons it's not possible to let userspace trap OS accesses 
>(as the OS is more privileged - search TENEX at 
>http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that is).

Perform the API call.  It would alter the CPU context, possibly (if the
call requires it) also changing the guest address space.  There should
be no OS accesses to these pages--those would not trap, but would return
-EFAULT because the pages would not actually be allocated.  (Win32
programs should not be making Linux syscalls--a version of WINE that
uses this would need to catch and ignore any Linux syscalls made.)

>>     -- read/write in guest address space
>>        Explanation:  mmap is fine for big changes to an address space
>>(such as loading modules), but one capability WINE would need for this
>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE.  (Some Win32
>>programs like to do weird things with Windows' system code--in
>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>happy.)  As I understand, ptrace already provides this, hopefully
>>adequately.
>
>It provides this; it could be made a bit faster (I've reviewed a patch from 
>another project that uses ptrace heavily, which makes it faster).
>
>>     -- intercept arbitrary interrupts in guest address space
>>        Explanation:  Many older Windows programs (Win16 era)
>>occasionally directly invoke various soft interrupts (these are
>>basically DOS syscalls).  The ability to intercept these is necessary,
>>but need not be particularly efficient or fast.
>
>I recall that hardware IRQ number x is mapped to vector k+x, where k is fixed 
>and low; with ACPI we now have 32 IRQs, I guess (on my machine the kernel uses 
>up to 22 IRQs), so I guess int 0x21 is going to conflict somewhere.
>
>That said, this could be added too for interrupts not reserved by the kernel 
>(that is CPU exceptions). But DOSEMU already runs x86 programs, so WINE 
>should be able to do it too... ah, yep, it uses vm86, while you need to do 
>that on a paged system.

The only requirement here is to call vm86 in another address space,
which is already doable--except on 64-bit hardware, where vm86 doesn't
exist anyway.

>>     -- transparently use threads in guest address spaces, if desired
>>        Explanation:  WINE currently uses the host's scheduler.
>>Changing it to this new API shouldn't adversely affect that ability.
>>(And on second thought, using a UML library might not be an option.)
>>
>>I shall clarify my proposal:  each thread is assigned an address space,
>>
>and (you forget to say) it can be changed through PTRACE_SWITCH_MM you mean... 
>(otherwise I don't see the addition).
>
>>while an address space can contain multiple threads.
>
>you can PTRACE_SWITCH_MM multiple threads to the same address space

This is exactly it--I wanted to be sure that distinct threads can share
an address space, while one control process can manage as many address
spaces as are needed/wanted.  There should be no addition here--this was
mentioned for completeness.

>>Each thread also 
>>has a STOP/RUN flag, which if set to RUN, causes the host scheduler to
>>consider that thread for execution (along with all other runnable
>>threads).  This flag allows either the userspace control process to make
>>scheduling decisions itself, (by only setting one of its threads to RUN)
>>or to punt and have the kernel handle all scheduling for its threads (by
>>setting them all to RUN and using STOP only to block a thread).
>
>Hmm, sleeping like that is easy if you mean that only a thread can switch 
>itself from RUN to STOP. The thread can use some mutex/semaphore thing, at 
>that point.
>
>To switch a thread from RUN to STOP from the outside, you can currently kill 
>it with -STOP. Beware that it may be slow, but I don't know whether it matters 
>or whether it can be made much faster.
>
>The problem (I think) is that SIGSTOP will be processed not at kill() time, 
>but at delivery time, i.e. after a context switch to the receiving thread, 
>before returning to userspace. I've not checked for SIGSTOP and am not sure 
>for the rest, but I think it's this way.

How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
as its argument and has immediate effects?  The problem (IIRC) with
SIGSTOP is that signals are delivered to all threads in a process, while
a userspace scheduler needs to wake up or block exactly one thread at a
time.  Blocking a thread would be done from the control process, not
from the thread itself.  (The call that resulted in it being blocked was
made by touching a page that triggered the control process.)

>>Could all SKAS4 APIs be multiplexed through one syscall?  (Perhaps
>>simply as more ptrace functions, or as a new "skas4" syscall?)
>
>"multiplexing" like ipc(2) is a bad idea. 
>
>However, currently the idea is sys_mm_indirect, taking an fd representing an 
>mm context, a syscall number and its parameters, plus a syscall to get an fd 
>representing an mm context.

How are address spaces manipulated?  Could ioctls on the mm context's fd
be useful?







* Re: [uml-devel] SKAS4 design question
  2006-01-19 22:23       ` Jacob Bachmeyer
@ 2006-01-20 16:41         ` Blaisorblade
  2006-01-23 19:59           ` Jacob Bachmeyer
  0 siblings, 1 reply; 8+ messages in thread
From: Blaisorblade @ 2006-01-20 16:41 UTC (permalink / raw)
  To: user-mode-linux-devel, jcb62281

On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> >>>>Has any thought been given to making SKAS4 suitably generic that it
> >>>>could be used for more than just UML?
> >>>
> >>>Not yet, thoughts welcome.
> >>
> >>Let's see:
> >>
> >>to support HURD (which uses the Mach ABI):
> >>
> >>    -- existing facilities plus trap lcall gates

> >I.e. extend ptrace to trap lcall gates, right? That's another thing, could
> > be done, but it relates more to the Linux-ABI project... at least this
> > can't be merged in mainline since we don't support lcall gates.

> Why not?  And for that matter, why does ptrace not currently catch lcalls?

The lcall stub was removed from arch/i386/kernel/entry.S a little while ago 
(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?

> >>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
> >>
> >>    --existing facilities plus
> >>     -- trap on access to specified pages
> >
> >We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't
> > work for accesses from kernel-space (you don't get SIGSEGV, just, likely,
> > -EFAULT). And it's horribly slow. And trapping for kernelspace accesses
> > is bad.

> You don't have to trap kernelspace accesses (-EFAULT there would be a
> good thing--the host kernel shouldn't be looking in these pages anyway);
> this is only to apply to userspace code.  Granted, SIGSEGV is slow--but
> why should it be fast?  It's an error path.

Yes, it is meant to be only an error path, but UML abuses it for normal 
control flow; what I said is that the kernel already supports "fasttrap", but 
only via SIGSEGV, i.e. in a slow way.

> >We do that: make them unmapped and trap SIGSEGV through ptrace. 

> >>These DLLs
> >>are mapped into the process' address space on Windows and under current
> >>WINE, much like shared objects in normal Linux.  This idea would enable
> >>WINE to not actually map these DLLs, but rather simply set the pages
> >>where the DLLs would be mapped as "fasttrap".

> >What is the reason to trap to the kernel? It's going to be slow. A page
> >fault, like a syscall, is costly (and probably more so, since it's an
> > interrupt).

> >If there is a good reason not to map the DLLs, it may at least make sense,
> > but WINE users aren't going to use special patches, and getting such a
> > hackish thing into mainline may be a hard sell (unless the reason is
> > _really_ good).

> The overhead is not all that large, as most Win32 API calls ultimately
> go into the kernel anyway.

A switch into the kernel costs only a few thousand TSC units (see the rdtsc 
assembly instruction), while delivering a signal to a foreign process can cost 
a lot more (I measured it on the order of 4*10^5 TSC units, even without a 
memory switch).
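
For anyone who wants to reproduce the cheap half of that comparison, a rough 
sketch follows. It is not the benchmark used for the figures above: the 
function name syscall_cost_ticks() is invented here, getpid is just an 
arbitrary trivial syscall, and on non-x86 builds the sketch falls back to 
nanoseconds instead of TSC units.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter via the rdtsc instruction. */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: nanoseconds rather than TSC units. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical helper: average cost, in ticks, of one trivial kernel
 * round trip, measured over `rounds` iterations. */
uint64_t syscall_cost_ticks(unsigned rounds)
{
    assert(rounds > 0);
    uint64_t start = read_ticks();
    for (unsigned i = 0; i < rounds; i++)
        syscall(SYS_getpid);     /* raw syscall, no libc caching involved */
    return (read_ticks() - start) / rounds;
}
```

The result depends on the CPU, kernel version and mitigations in use, so it 
should be read as an order of magnitude only. Measuring the signal-delivery 
side needs a second process and is not covered by this sketch.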

> This also should allow WINE to work well on 
> platforms such as x86-64, without needing multiple WINE binaries.
> (64-bit control process managing mix of 32 and 64 bit address spaces)

Writing 64-bit code that cleanly handles 32-bit syscalls is hard. Compiling 
32-bit code in 32-bit mode to do the same is simpler.

> Also, what exactly are vsyscalls?

> Executables are already demand-paged--so page faults routinely happen
> anyway.

Not the same thing - assuming the working set fits in memory, you get page 
faults only for the first access to a given page, and they just jump to the 
kernel.

What you're proposing is that every call to a GDI function, for instance, 
triggers a signal delivery (or, in the best case, just a context switch). 
That's a different matter.

> The reason to trap is to allow WINE to intercept the call while 
> sitting in another address space.  (Each Win32 process would have its
> own guest address space.)  The idea is to have the interfaces UML uses
> be generic enough for WINE to also use.

> The reason is simple--improved security by enforcing a sandbox around
> WINE.

> >>Then, when the program
> >>attempts to access a DLL's memory image, the kernel would intercept the
> >>request and quickly pass it to a userspace thread,

> >Good saying, quickly pass it... signals are slow. There faster but more
> >complicated primitives (I remind netlink for instance).

> User DLLs (those from the program itself) would actually be mapped.  The
> system DLLs (kernel32, user32, etc.) that WINE itself implements on
> Linux and that must trap to kernelspace on Windows would be loaded this
> way.

> One benefit is to reduce the chance of conflict, as various 
> internal modules in WINE that don't exist in Windows could thus be
> removed from the visible (to the Win32 app) address space.  This could
> have uses other than WINE, too.  One possibility is as a "padded cell"
> of sorts--a process is started in a guest address space under a control
> program that intercepts and discards all syscalls.  However, certain
> pages in that address space are used as a restricted system
> interface--accessing them blocks the accessing thread and causes a
> (host) syscall to return in the control process.  This syscall would
> block until a guest thread trips a "fasttrap" page and then returns
> information such as exact address accessed, read or write, and if write,
> value written.  This syscall need not be new--read or ioctl on an
> appropriate fd (netlink socket perhaps?) would be enough.  The control
> thread then carries out the requested action (whatever that maybe) and
> permits the jailed thread to again run.

Andrea Arcangeli merged such "padded cell" functionality, but the allowed 
interface is read() and write() on already-open fds, not a page fault. That is 
faster and easier to use, and also allows transferring arbitrary amounts of 
data.

It's called secure computing (see kernel/seccomp.c for details, and/or look on 
LWN.net for an article about it).
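
As an illustration of that "padded cell", the sketch below jails a child with 
seccomp strict mode and talks to it over a pipe. It assumes a present-day 
kernel where strict mode is entered with prctl(PR_SET_SECCOMP, ...); the 
function name run_padded_cell() and the exit codes are made up for this 
example. If the environment does not permit strict mode (for instance because 
a seccomp filter is already installed), the function reports -1 rather than 
guessing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the jailed child could answer over the
 * pipe and was killed on its first forbidden syscall, 0 if the jail behaved
 * differently, -1 if strict mode could not be set up here. */
int run_padded_cell(void)
{
    int to_parent[2];
    if (pipe(to_parent) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (child == 0) {
        close(to_parent[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(3);                    /* strict mode unavailable */
        /* From here on only read, write, exit and sigreturn are allowed. */
        if (write(to_parent[1], "ok", 2) != 2)
            syscall(SYS_exit, 4);
        syscall(SYS_getpid);             /* forbidden: the kernel kills us */
        syscall(SYS_exit, 5);            /* reached only if the jail leaked */
    }

    close(to_parent[1]);
    char reply[2] = {0, 0};
    ssize_t n = read(to_parent[0], reply, sizeof reply);
    assert(n <= (ssize_t)sizeof reply);
    close(to_parent[0]);

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 3)
        return -1;
    return n == 2 && reply[0] == 'o' && reply[1] == 'k'
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Note that the child leaves through the raw exit syscall: the C library's 
_exit() normally issues exit_group, which strict mode does not allow.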

> "fasttrap" may have been a poor choice of terms.  The idea is to have
> more or less generic kernel-in-userspace functionality with one process
> as a"usermode supervisor" watching a set of other processes.

> >Also, for security reasons it's not possible to let userspace trap OS
> > accesses (as the OS is more privileged - search TENEX at
> >http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad is
> > that).

> Perform the API call.  It would alter the CPU context, possibly, (if the
> call requires it) also changing the guest address space.  There should
> be no OS accesses to these pages--those would not trap, but would return
> -EFAULT because the pages would not actually be allocated.  (Win32
> programs should not be making Linux syscalls--a version of WINE that
> uses this would need to catch and ignore any Linux syscalls made.)

> >>     -- read/write in guest address space
> >>        Explanation:  mmap is fine for big changes to an address space
> >>(such as loading modules), but one capability WINE would need for this
> >>to be truly useful is 1/2/4/8/16-byte PEEK and POKE.  (Some Win32
> >>programs like to do wierd things with Windows' system code--in
> >>conjunction with "fasttrap", this would allow WINE to keep such programs
> >>happy.)  As I understand, ptrace already provides this, hopefully
> >>adequetely.

> >It provides this, it could be made a bit faster (I've reviewed a patch
> > from another project which uses heavily ptrace, which makes that faster).

> >>     -- intercept arbitrary interrupts in guest address space
> >>        Explanation:  Many older Windows programs (Win16 era)
> >>occasionally directly invoke various soft interrupts (these are
> >>basically DOS syscalls).  The ability to intercept these is necessary,
> >>but need not be particularly efficient or fast.

> >I recall that hardware IRQ n. x is mapped to k+x, where k is fixed and
> > low; we now have with ACPI 32 IRQs I guess (on my machine the kernel uses
> > up to 22 IRQs), so I guess int 0x21 it's going to conflict somewhere.

> >That said, this could be added too for interrupts not reserved by the
> > kernel (that is CPU exceptions). But DOSEMU already runs x86 programs, so
> > WINE should be able to do it too... ah, yep, it uses vm86, while you need
> > to do that on a paged system.

> The only requirement here is to call vm86 in another address space,
> which is already doable--except on 64-bit hardware, where vm86 doesn't
> exist anyway.

Wait a moment - Windows 3.1 uses 286 paging, and Win16 userspace progs use up 
to 16M of Ram. You don't have this on vm86(), right?

> This is exactly it--I wanted to be sure that distinct threads can share
> an address space, while one control process can manage as many address
> spaces as are needed/wanted.  There should be no addition here--this was
> mentioned for completeness.

UML will need to have this functionality debugged and working sooner or later 
- once it does SMP with SKAS, it'll need exactly this (you have multiple 
managed threads, corresponding to multiple virtual CPUs, and any thread with 
its address space can be executed on any of those virtual CPUs).

> How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
> as its argument and has immediate effects?  The problem (IIRC) with
> SIGSTOP is that signals are delivered to all threads in a process,

Isn't there tkill() for this purpose (signals to a specific thread)? And if it 
doesn't work, it should be fixed. Adding tons of incoherent APIs is bad when 
things can be done with the existing ones.
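
A small sketch of thread-directed signalling, for reference. It uses tgkill, 
the variant of tkill that also names the thread group, through syscall(2), 
and POSIX threads to create the target; the function and variable names are 
invented for this example, and SIGUSR1 stands in for whatever signal a real 
supervisor would use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static atomic_int worker_tid;   /* kernel tid of the worker, 0 = not known yet */
static atomic_int handled_by;   /* tid that ran the handler, 0 = none, -1 = abort */

static void on_usr1(int sig)
{
    (void)sig;
    atomic_store(&handled_by, (int)syscall(SYS_gettid));
}

static void *worker(void *arg)
{
    (void)arg;
    atomic_store(&worker_tid, (int)syscall(SYS_gettid));
    while (atomic_load(&handled_by) == 0)
        usleep(1000);           /* idle until the signal (or an abort) arrives */
    return NULL;
}

/* Hypothetical helper: returns 1 if the signal was handled by exactly the
 * targeted thread, 0 if not, -1 if the setup failed. */
int signal_one_thread(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    atomic_store(&worker_tid, 0);
    atomic_store(&handled_by, 0);
    pthread_t thread;
    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;

    int tid;
    while ((tid = atomic_load(&worker_tid)) == 0)
        usleep(1000);
    assert(tid != (int)syscall(SYS_gettid));   /* target is not the caller */

    /* Deliver SIGUSR1 to that one thread, not to the whole process. */
    if (syscall(SYS_tgkill, getpid(), tid, SIGUSR1) != 0)
        atomic_store(&handled_by, -1);         /* release the worker */

    pthread_join(thread, NULL);
    return atomic_load(&handled_by) == tid;
}
```

Depending on the C library version, this may need to be built with -pthread.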

> >However, currently the idea is sys_mm_indirect , taking an fd representing
> > an mm context, a syscall number and its parameters, plus a syscall to get
> > a fd representing a mm context.

> How are address spaces manipulated?  Could ioctls on the mm context's fd
> be useful?

We don't use ioctls; they are inelegant. SKAS3 uses write(), which is just as 
bad.

For SKAS4, instead, you'd use sys_mm_indirect(); you would write:

mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)

and so on, for each syscall (excluding fork and exit, for now). To destroy an 
address space you simply call close on its fd.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	
	
		
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* Re: [uml-devel] SKAS4 design question
  2006-01-20 16:41         ` Blaisorblade
@ 2006-01-23 19:59           ` Jacob Bachmeyer
  2006-01-30 11:09             ` Blaisorblade
  0 siblings, 1 reply; 8+ messages in thread
From: Jacob Bachmeyer @ 2006-01-23 19:59 UTC (permalink / raw)
  To: user-mode-linux-devel

Blaisorblade wrote:

>On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>>>
>>>>Blaisorblade wrote:
>>>>
>>>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>>>
>>>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>>>could be used for more than just UML?
>>>>>>
>>>>>Not yet, thoughts welcome.
>>>>>          
>>>>Let's see:
>>>>
>>>>to support HURD (which uses the Mach ABI):
>>>>
>>>>   -- existing facilities plus trap lcall gates
>>>        
>>>I.e. extend ptrace to trap lcall gates, right? That's another thing, could
>>>be done, but it relates more to the Linux-ABI project... at least this
>>>can't be merged in mainline since we don't support lcall gates.
>>
>>Why not?  And for that matter, why does ptrace not currently catch lcalls?
>
>The lcall stub was removed from arch/i386/kernel/entry.S a little time ago 
>(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?

Yes, the last time I looked into that part of the kernel was back in
2.4.  So, does this mean that lcalls can no longer be potentially used
to escape from UML?

>>>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>>>
>>>>   --existing facilities plus
>>>>    -- trap on access to specified pages
>>>>        
>>>We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't
>>>work for accesses from kernel-space (you don't get SIGSEGV, just, likely,
>>>-EFAULT). And it's horribly slow. And trapping for kernelspace accesses
>>>is bad.
>>>
>>You don't have to trap kernelspace accesses;  (-EFAULT there would be a
>>good thing--the host kernel shouldn't be looking in these pages anyway)
>>this is only to apply to userspace code, but SIGSEGV is slow--why should
>>it be fast?  It's an error path.
>
>Yes, it is thought to be only an error path, but UML abuses of it for normal 
>control, and I said that the kernel supports "fasttrap", but only via 
>SIGSEGV, i.e. in a slow way.

That is the exact problem.  It shouldn't be abused--a proper interface
that has acceptable performance should be devised.  (You mention
netlink--was it looked into?  This might help with some UML performance
issues.)  Basically what is needed is a means to set a page to no access
but cause some other action to occur rather than generate SIGSEGV.

>>>We do that: make them unmapped and trap SIGSEGV through ptrace. 
>>>
>>The overhead is not all that large, as most Win32 API calls ultimately
>>go into the kernel anyway.
>
>A kernel switch only costs about some thousands TSC units (see the rdtsc 
>assembly instruction), while a signal delivery to a foreign process can cost 
>a lot more (I measure it in the order of 4* 10^5 TSC units, even without a 
>memory switch).

Then a more efficient interface is needed.  Besides, this would need to
be synchronous.

>>This also should allow WINE to work well on 
>>platforms such as x86-64, without needing multiple WINE binaries.
>>(64-bit control process managing mix of 32 and 64 bit address spaces)
>
>Writing 64-bit code handling cleanly 32-bit syscalls is hard. Compiling 32-bit 
>code in 32-bit mode to do the same is simpler.

The problem is that they need to communicate, especially once Win64
actually hits.  WINE currently has a (confusing) "relay" layer that
already does similar tasks for 16/32 bit.  Furthermore, the Win32 API
calling convention is fairly well defined, (parameters on stack; return
in EAX) so this shouldn't be more of a problem than has been solved in
the past.  (That doesn't mean it won't be a real PITA.)

>>The reason to trap is to allow WINE to intercept the call while 
>>sitting in another address space.  (Each Win32 process would have its
>>own guest address space.)  The idea is to have the interfaces UML uses
>>be generic enough for WINE to also use.
>>
>>The reason is simple--improved security by enforcing a sandbox around
>>WINE.
>>
Seccomp (see below--thanks for bringing it up) could more easily be used
to solve this.  (Why bother with trapping all the time when only a few
pages really need protection?  Furthermore, the external control thread
would thus have veto power over all syscalls made, so the sandbox can be
easily enforced.)

>>>>Then, when the program
>>>>attempts to access a DLL's memory image, the kernel would intercept the
>>>>request and quickly pass it to a userspace thread,
>>>
>>>Good saying, quickly pass it... signals are slow. There faster but more
>>>complicated primitives (I remind netlink for instance).
>>
>>User DLLs (those from the program itself) would actually be mapped.  The
>>system DLLs (kernel32, user32, etc.) that WINE itself implements on
>>Linux and that must trap to kernelspace on Windows would be loaded this
>>way.
>>    
>>One benefit is to reduce the chance of conflict, as various 
>>internal modules in WINE that don't exist in Windows could thus be
>>removed from the visible (to the Win32 app) address space.  This could
>>have uses other than WINE, too.  One possibility is as a "padded cell"
>>of sorts--a process is started in a guest address space under a control
>>program that intercepts and discards all syscalls.  However, certain
>>pages in that address space are used as a restricted system
>>interface--accessing them blocks the accessing thread and causes a
>>(host) syscall to return in the control process.  This syscall would
>>block until a guest thread trips a "fasttrap" page and then returns
>>information such as exact address accessed, read or write, and if write,
>>value written.  This syscall need not be new--read or ioctl on an
>>appropriate fd (netlink socket perhaps?) would be enough.  The control
>>thread then carries out the requested action (whatever that maybe) and
>>permits the jailed thread to again run.
>
>Andrea Arcangeli merged such a "padded cell" functionality, but the allowed 
>interface is read, not a page fault. The former is faster and easier to use, 
>and also allows writing arbitrary amounts of data.
>
>It's called secure computing (see kernel/seccomp.c for details, and/or look on 
>LWN.net for an article about it).

I had looked at this earlier, but hadn't realized that it could be used
to implement this--provided that mm_indirect can make syscalls in a
seccomp address space (bypassing the restriction), this can do
everything that "fasttrap" could (using some help from appropriate code
in userspace).  Maybe SKAS4 should add a new seccomp level?

>>>>    -- read/write in guest address space
>>>>       Explanation:  mmap is fine for big changes to an address space
>>>>(such as loading modules), but one capability WINE would need for this
>>>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE.  (Some Win32
>>>>programs like to do wierd things with Windows' system code--in
>>>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>>>happy.)  As I understand, ptrace already provides this, hopefully
>>>>adequetely.
>>>
>>>It provides this, it could be made a bit faster (I've reviewed a patch
>>>from another project which uses heavily ptrace, which makes that faster).

One down, more to go.
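
For reference, a minimal sketch of the PEEK/POKE facility as ptrace offers it 
today. PTRACE_PEEKDATA and PTRACE_POKEDATA transfer one machine word (a long) 
per call, so other widths have to be built from word accesses. The names 
guest_word and peek_poke_demo() are invented for this example, and it assumes 
a Linux host where a process may trace its own child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this word sits at the same virtual address in both processes. */
static volatile long guest_word = 0x1111;

/* Hypothetical helper: returns 1 if the tracer read 0x1111 from the child
 * and the child then observed the poked value 0x2222, 0 otherwise,
 * -1 if the setup failed. */
int peek_poke_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(2);
        raise(SIGSTOP);                       /* wait for the tracer */
        _exit(guest_word == 0x2222 ? 0 : 1);  /* report what we now see */
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;

    /* PEEKDATA returns the word itself, so errors are reported via errno. */
    errno = 0;
    long seen = ptrace(PTRACE_PEEKDATA, child, (void *)&guest_word, NULL);
    int peek_ok = (errno == 0 && seen == 0x1111);

    /* POKEDATA takes the new word in the data argument. */
    int poke_ok = ptrace(PTRACE_POKEDATA, child, (void *)&guest_word,
                         (void *)0x2222L) == 0;

    if (ptrace(PTRACE_CONT, child, NULL, NULL) != 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return peek_ok && poke_ok
        && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Each word costs one syscall, which is why a faster bulk interface keeps coming 
up in this thread.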

>>>>    -- intercept arbitrary interrupts in guest address space
>>>>       Explanation:  Many older Windows programs (Win16 era)
>>>>occasionally directly invoke various soft interrupts (these are
>>>>basically DOS syscalls).  The ability to intercept these is necessary,
>>>>but need not be particularly efficient or fast.
>>>
>>>I recall that hardware IRQ n. x is mapped to k+x, where k is fixed and
>>>low; we now have with ACPI 32 IRQs I guess (on my machine the kernel uses
>>>up to 22 IRQs), so I guess int 0x21 it's going to conflict somewhere.
>>>
>>>That said, this could be added too for interrupts not reserved by the
>>>kernel (that is CPU exceptions). But DOSEMU already runs x86 programs, so
>>>WINE should be able to do it too... ah, yep, it uses vm86, while you need
>>>to do that on a paged system.
>>
>>The only requirement here is to call vm86 in another address space,
>>which is already doable--except on 64-bit hardware, where vm86 doesn't
>>exist anyway.
>
>Wait a moment - Windows 3.1 uses 286 paging, and Win16 userspace progs use up 
>to 16M of Ram. You don't have this on vm86(), right?

No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
are somehow caught--inside the address space in question.  (WINE
currently runs in-process, I am trying to lay the groundwork to change
that--thus all the crazy stuff previously about "fasttrap" to another
userspace.)  Current WINE can use vm86 on the i386 platform, however.

This (Win16 programs with 16MiB of RAM) also means that WINE could
always intercept soft interrupts--even without use of vm86.

The other catch is that 64 and 32 bit code doesn't mix very well, and
they must be kept in separate processes normally--thus the reason for a
64-bit control process to be able to handle both 32 and 64 bit address
spaces.  The entire kernel is 64-bit anyway, so leaving the option open
can't be too insanely hard.

>>How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
>>as its argument and has immediate effects?  The problem (IIRC) with
>>SIGSTOP is that signals are delivered to all threads in a process,
>
>Isn't there tkill() for this purpose (signals to a specific thread)? And if it 
>doesn't work, it should be fixed. Having tons of incoherent APIs is bad, as 
>long as things can be done with current ones.

The other problem is that a more specific interface could be much
faster.  OTOH, perhaps a better strategy would be to improve the
signals--thus also lessening the other problem (slowness of SIGSEGV) as
well as improving performance generally.

>>>However, currently the idea is sys_mm_indirect , taking an fd representing
>>>an mm context, a syscall number and its parameters, plus a syscall to get
>>>a fd representing a mm context.
>>
>>How are address spaces manipulated?  Could ioctls on the mm context's fd
>>be useful?
>
>We don't use ioctls, they are inelegants; SKAS3 uses write which is just as 
>bad.

What is inelegant about an ioctl on a special fd?  I say that ioctls are
far preferable to more fds (on other files), or the extra complexity of
implementing some other interface (maybe using netlink?).  Besides, if
you implement your own struct file_operations, you get ioctl support by
writing the handler function for it (if I understand the Linux 2.6.14
VFS correctly).  OTOH, if no operations that fall into ioctl's area are
needed, then implementing ioctl for its own sake is silly.

>For SKAS4, instead, you'd use sys_mm_indirectI(); you say:
>
>mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
>mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)
>
>and so on, for each syscall (excluding fork and exit, for now). To destroy an 
>address space you simply call close on its fd.

How do you map region X of the guest address space to region Y (or
somewhere) in your own?  mmap/munmap on the address space's fd would
make sense here.

PS:  Sorry about the long delay.  Mozilla crashed while I had the 
compose window for this message buried under several browsers (and 
totally forgotten, too--oops).






* Re: [uml-devel] SKAS4 design question
  2006-01-23 19:59           ` Jacob Bachmeyer
@ 2006-01-30 11:09             ` Blaisorblade
  0 siblings, 0 replies; 8+ messages in thread
From: Blaisorblade @ 2006-01-30 11:09 UTC (permalink / raw)
  To: user-mode-linux-devel, jcb62281

On Monday 23 January 2006 20:59, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:

> >>>I.e. extend ptrace to trap lcall gates, right? That's another thing,
> >>> could be done, but it relates more to the Linux-ABI project... at least
> >>> this can't be merged in mainline since we don't support lcall gates.
> >>
> >>Why not?  And for that matter, why does ptrace not currently catch
> >> lcalls?
> >
> >The lcall stub was removed from arch/i386/kernel/entry.S a little time ago
> >(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?

> Yes, the last time I looked into that part of the kernel was back in
> 2.4.  So, does this mean that lcalls can no longer be potentially used
> to escape from UML?

Yes, and IIRC that was also fixed directly some time ago via LDT clearing.

> >Yes, it is thought to be only an error path, but UML abuses of it for
> > normal control, and I said that the kernel supports "fasttrap", but only
> > via SIGSEGV, i.e. in a slow way.

> That is the exact problem.  It shouldn't be abused--a proper interface
> that has acceptable performance should be devised.  (You mention
> netlink--was it looked into?

No, and while I mentioned netlink, it's not an interface I know in depth. 
However, it's being used for various things, including a proposed rewrite of 
the wireless API and the already existing implementation of userspace packet 
filtering, so we can assume it has reasonable performance, momentum, and a 
user base, and thus maintenance.

> This might help with some UML performance 
> issues.)

Possibly yes, but Ingo Molnar already designed a custom API for this purpose - 
it was developed for UML's usage.

> Basically what is needed is a means to set a page to no access 
> but cause some other action to occur rather than generate SIGSEGV.

> >>>We do that: make them unmapped and trap SIGSEGV through ptrace.
> >>
> >>The overhead is not all that large, as most Win32 API calls ultimately
> >>go into the kernel anyway.
> >
> >A kernel switch only costs about some thousands TSC units (see the rdtsc
> >assembly instruction), while a signal delivery to a foreign process can
> > cost a lot more (I measure it in the order of 4* 10^5 TSC units, even
> > without a memory switch).
>
> Then a more efficient interface is needed.  Besides, this would need to
> be synchronous.
>
> >>This also should allow WINE to work well on
> >>platforms such as x86-64, without needing multiple WINE binaries.
> >>(64-bit control process managing mix of 32 and 64 bit address spaces)
> >
> >Writing 64-bit code handling cleanly 32-bit syscalls is hard. Compiling
> > 32-bit code in 32-bit mode to do the same is simpler.
>
> The problem is that they need to communicate, especially once Win64
> actually hits.  WINE currently has a (confusing) "relay" layer that
> already does similar tasks for 16/32 bit.  Furthermore, the Win32 API
> calling convention is fairly well defined, (parameters on stack; return
> in EAX) so this shouldn't be more of a problem than has been solved in
> the past.  (That doesn't mean it won't be a real PITA.)
>
> >>The reason to trap is to allow WINE to intercept the call while
> >>sitting in another address space.  (Each Win32 process would have its
> >>own guest address space.)  The idea is to have the interfaces UML uses
> >>be generic enough for WINE to also use.
> >>
> >>The reason is simple--improved security by enforcing a sandbox around
> >>WINE.

> Seccomp (see below--thanks for bringing it up) could more easily be used
> to solve this.  (Why bother with trapping all the time when only a few
> pages really need protection?  Furthermore, the external control thread
> would thus have veto power over all syscalls made, so the sandbox can be
> easily enforced.)

> >Andrea Arcangeli merged such a "padded cell" functionality, but the
> > allowed interface is read, not a page fault. The former is faster and
> > easier to use, and also allows writing arbitrary amounts of data.
> >
> >It's called secure computing (see kernel/seccomp.c for details, and/or
> > look on LWN.net for an article about it).
>
> I had looked at this earlier, but hadn't realized that it could be used
> to implement this--provided that mm_indirect can make syscalls in a
> seccomp address space (bypassing the restriction),

Wait a moment - you're talking about the runtime thread calling 
mm_indirect(), or did I misunderstand something?

In that case there's no problem - seccomp jails only the process itself. If we 
tried to inject code into the process to perform syscalls (as UML does in 
SKAS0 mode, which needs no host patch) it wouldn't work, but mm_indirect is a 
normal syscall that borrows the foreign address space.

> this can do 
> everything that "fasttrap" could (using some help from appropriate code
> in userspace). 

> Maybe SKAS4 should add a new seccomp level?

I don't remember any "levels" in seccomp... and it was intended to be 
simple. Besides, they shouldn't be needed (see above).

> >Wait a moment - Windows 3.1 uses 286 paging, and Win16 userspace progs use
> > up to 16M of Ram. You don't have this on vm86(), right?

> No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
> are somehow caught--inside the address space in question.  (WINE
> currently runs in-process, I am trying to lay the groundwork to change
> that--thus all the crazy stuff previously about "fasttrap" to another
> userspace.)  Current WINE can use vm86 on i386 platform, however.

> This (Win16 programs with 16MiB of RAM) also means that WINE could
> always intercept soft interrupts--even without use of vm86.
Good.

> The other catch is that 64 and 32 bit code doesn't mix very well, and
> they must be kept in separate processes normally--thus the reason for a
> 64-bit control process to be able to handle both 32 and 64 bit address
> spaces.  The entire kernel is 64-bit anyway, so leaving the option open
> can't be too insanely hard.

> The other problem is that a more specific interface could be much
> faster.  OTOH, perhaps a better strategy would be to improve the
> signals--thus also lessening the other problem (slowness of SIGSEGV) as
> well as improving performance generally.

Signals are very slow, but in many ways they can't be optimized. The only big 
optimization that can be done is for the case of _tracing_ a process which 
gets a signal. Currently the signal is first delivered to the target process 
and a context switch is made to it; only afterwards, before it returns to user 
mode, is the signal notification delivered to the tracing process. A context 
switch is then made to the tracer, and later the traced process is put back 
into the ready state and scheduled again. I.e. the first switch to the target 
process is totally useless.

> >>>However, currently the idea is sys_mm_indirect , taking an fd
> >>> representing an mm context, a syscall number and its parameters, plus a
> >>> syscall to get a fd representing a mm context.

> >>How are address spaces manipulated?  Could ioctls on the mm context's fd
> >>be useful?

> >We don't use ioctls, they are inelegants; SKAS3 uses write which is just
> > as bad.

> What is inelegant about an ioctl on a special fd?  I say that ioctls are
> far preferrable to more fds (on other files), or the extra complexity of
> implementing some other interface (maybe using netlink?).

ioctl is totally unstructured and thus inelegant, and 32/64-bit compatibility 
is a PITA.
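
To show where the 32/64-bit pain comes from, here is a small sketch with an 
entirely hypothetical ioctl argument structure (demo_req and friends are not 
from any real driver). A struct containing a pointer and a long has a 
different size and layout for 32-bit and 64-bit callers, so a 64-bit kernel 
needs a second, fixed-width definition plus a translation step for every such 
command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl argument as it is often declared naively:
 * both members change size between i386 and x86-64. */
struct demo_req {
    void *buf;
    long  len;
};

/* The same request as a 32-bit process lays it out in memory. */
struct demo_req_compat32 {
    uint32_t buf;   /* 32-bit user pointer */
    int32_t  len;
};

static_assert(sizeof(struct demo_req_compat32) == 8,
              "the 32-bit layout is fixed at two 4-byte fields");

/* Translation step a 64-bit handler would need for 32-bit callers. */
struct demo_req demo_req_from_compat32(struct demo_req_compat32 c)
{
    struct demo_req n;
    n.buf = (void *)(uintptr_t)c.buf;   /* zero-extend the pointer */
    n.len = c.len;                      /* sign-extend the length */
    return n;
}
```

With a structured syscall the arguments arrive in registers with known types, 
so this per-command translation is far less of a burden.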

Using them for devices is tolerable; for general APIs it isn't. Many recently 
merged APIs started out as sets of ioctl()s and were rewritten as either sets 
of syscalls or special filesystems (inotify, for instance).

Device mapper uses ioctls only because it was merged in the dark age of 2.5 
and it was really needed.

> Besides, if 
> you implement your own struct file_operations, you get ioctl support by
> writing the handler function for it.

> (If I understand the Linux 2.6.14 
> VFS correctly).

You do, that's not the problem... and the inelegance is not so much in the 
implementation as in the API.

> OTOH, if no operations that fall into ioctl's area are 
> needed, then implementing ioctl for its own sake is silly.

> >For SKAS4, instead, you'd use sys_mm_indirectI(); you say:
> >
> >mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
> >mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)
> >
> >and so on, for each syscall (excluding fork and exit, for now). To destroy
> > an address space you simply call close on its fd.

> How do you map region X of the guest address space to region Y (or
> somewhere) in your own?  mmap/munmap on the address space's fd would
> make sense here.

That's not possible, to my knowledge, unless you use a shared backing storage, 
i.e. a tmpfs file.

I.e. the memory must be set up as shareable from the very beginning.
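
A minimal sketch of that arrangement, using ordinary processes rather than 
SKAS address spaces: both sides map the same file MAP_SHARED, each at 
whatever address the kernel picks. The function name and the file name 
template are invented for this example, and it falls back from /dev/shm to 
/tmp when no tmpfs mount is available there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if data written through the "guest"
 * mapping is visible through the "control" mapping, 0 if not,
 * -1 if the setup failed. */
int share_region_via_file(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    size_t len = (size_t)page_size;

    /* Prefer tmpfs; fall back to /tmp so the sketch still runs elsewhere. */
    char shm_path[] = "/dev/shm/skas-demo-XXXXXX";
    char tmp_path[] = "/tmp/skas-demo-XXXXXX";
    char *path = shm_path;
    int fd = mkstemp(shm_path);
    if (fd < 0) {
        path = tmp_path;
        fd = mkstemp(tmp_path);
    }
    if (fd < 0)
        return -1;
    unlink(path);                 /* the open fd keeps the storage alive */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* "Guest": maps the file and writes into it. */
        char *g = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (g == MAP_FAILED)
            _exit(1);
        strcpy(g, "written by guest");
        _exit(0);
    }

    int status = 0;
    int ok = waitpid(child, &status, 0) == child
          && WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* "Control process": its own, independent mapping of the same pages. */
    char *c = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (c == MAP_FAILED)
        return -1;
    ok = ok && strcmp(c, "written by guest") == 0;
    munmap(c, len);
    return ok;
}
```

Memory that was mapped privately or anonymously from the start cannot be 
retrofitted into this scheme, which is the limitation described above.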

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


	
	
		


end of thread, other threads:[~2006-01-30 11:11 UTC | newest]

Thread overview: 8+ messages
2006-01-16 19:34 [uml-devel] SKAS4 design question Jacob Bachmeyer
2006-01-18 11:58 ` Blaisorblade
2006-01-18 23:52   ` Jacob Bachmeyer
2006-01-19  0:37     ` Blaisorblade
2006-01-19 22:23       ` Jacob Bachmeyer
2006-01-20 16:41         ` Blaisorblade
2006-01-23 19:59           ` Jacob Bachmeyer
2006-01-30 11:09             ` Blaisorblade
