* [uml-devel] SKAS4 design question
@ 2006-01-16 19:34 Jacob Bachmeyer
From: Jacob Bachmeyer @ 2006-01-16 19:34 UTC
To: user-mode-linux-devel; +Cc: jcb62281
Has any thought been given to making SKAS4 suitably generic that it
could be used for more than just UML?
I'm thinking of some arrangement where one process can handle multiple
address spaces for multiple other processes.
This would have greater application than merely UML--for example, Wine
could also be adapted to use SKAS, potentially a killer app, as this
could make Wine more secure than Windows. (Running all Wine code in its
own address space, separate from the apps Wine runs, could insulate
against some application buffer overruns, given the way the Win32 API
is accessed.)
Hmm, what would we need for this to work?
-- ability to create/release "remote" address spaces
-- read/write in those "remote" address spaces
-- possibly even the capability to map a section of a "remote" address
   space into the control process, do something, then release it
-- ability to configure pages in a "remote" address space such that
   accesses trap to the control process
-- ability to trap all possible syscalls from such an address space
For the big bonus:
-- ability to use either the host scheduler or some code from the
   not-yet-developed libUML to run threads in the "remote" address spaces
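For reference, the syscall-trapping requirement is roughly what ptrace(2) already offers through PTRACE_SYSCALL, which is how UML itself intercepts guest syscalls. The following is a minimal C sketch for Linux with glibc (count_traced_syscalls and kill_and_reap are illustrative names, not an existing API): it forks a child, makes it traceable, and counts the syscall-entry and syscall-exit stops the tracer observes while the child issues a few getppid() calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap a tracee after an error. */
static void kill_and_reap(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}

/* Fork a child that makes `calls` getppid() syscalls under PTRACE_SYSCALL
   and return how many syscall-entry/exit stops the tracer saw, or -1. */
int count_traced_syscalls(int calls)
{
    assert(calls >= 0);
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Tracee: ask to be traced, then stop so the tracer can set options. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        for (int i = 0; i < calls; i++)
            syscall(SYS_getppid);       /* one entry stop + one exit stop each */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    /* Report syscall stops as SIGTRAP|0x80 so they cannot be confused
       with an ordinary SIGTRAP. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1) {
        kill_and_reap(child);
        return -1;
    }

    int stops = 0;
    for (;;) {
        /* Resume the tracee until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1) {
            kill_and_reap(child);
            return -1;
        }
        if (waitpid(child, &status, 0) != child)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                      /* tracee is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
        /* Any other stop is resumed without delivering a signal. */
    }
    return stops;
}
```

Each getppid() should contribute one entry and one exit stop; other syscalls the child makes (for example the final exit_group()) add to the total, so the result is a lower bound on 2 * calls rather than an exact figure.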
Hmm, with a little more effort, this could become a generic
compatibility layer for non-Linux programs--for each foreign platform,
one would need only a control program that manages the foreign processes
and implements the foreign syscalls.
{Contemplates HURD on Linux :-)}
As I understand it, the Linux mm system is internally moving in this
kind of direction already. SKAS would become primarily a system by
which pages can have backing store implemented in userspace and "remote"
address spaces managed.
This direction would certainly help push SKAS into the stock kernel.
PS: If I understand correctly, UML with the current SKAS3 works by
swapping processes into and out of a single "user" address space. I
propose a system where many distinct "user" address spaces are
maintained by the kernel and execution is placed wherever the user-mode
scheduler says.
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
* Re: [uml-devel] SKAS4 design question
From: Blaisorblade @ 2006-01-18 11:58 UTC
To: user-mode-linux-devel, jcb62281

On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> Has any thought been given to making SKAS4 suitably generic that it
> could be used for more than just UML?

Not yet, thoughts welcome.

> PS: If I understand correctly, UML with the current SKAS3 works by
> swapping processes into and out of a single "user" address space. I
> propose a system where many distinct "user" address spaces are
> maintained by the kernel and execution is placed wherever the user-mode
> scheduler says.

What you say is not clear, but the most obvious reading of the above
sentence is that you propose what already happens.

However, SKAS3 and current ideas for SKAS4, with different APIs but
similar semantics, say: implement all guest processes as user-level
threads (totally implemented within UML), with the exception that we
allow different address spaces.

So we have "switch the guest proc to a different address space"
(PTRACE_SWITCH_MM, in arch/i386/kernel/ptrace.c), manipulate any of these
address spaces with mmap/munmap/mprotect, and destroy them (all in
mm/proc_mm.c).

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

___________________________________
Yahoo! Messenger with Voice: call from PC to phone at exclusive rates
http://it.messenger.yahoo.com
* Re: [uml-devel] SKAS4 design question
From: Jacob Bachmeyer @ 2006-01-18 23:52 UTC
To: user-mode-linux-devel

Blaisorblade wrote:
>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>Has any thought been given to making SKAS4 suitably generic that it
>>could be used for more than just UML?
>Not yet, thoughts welcome.

Let's see:

to support HURD (which uses the Mach ABI):

 -- existing facilities plus trap lcall gates

to support WINE (which follows Win32 conventions (ick!)): (x86 only)

 -- existing facilities plus
 -- trap on access to specified pages

 Explanation: Win32 API calls are not syscalls in the normal
sense--rather they are made by calling into a system DLL. These DLLs
are mapped into the process' address space on Windows and under current
WINE, much like shared objects in normal Linux. This idea would enable
WINE to not actually map these DLLs, but rather simply set the pages
where the DLLs would be mapped as "fasttrap". Then, when the program
attempts to access a DLL's memory image, the kernel would intercept the
request and quickly pass it to a userspace thread, which handles the
"page fault". The page remains set as "fasttrap", and the control
process modifies the address space and CPU context appropriately before
allowing execution to continue.

 -- read/write in guest address space

 Explanation: mmap is fine for big changes to an address space
(such as loading modules), but one capability WINE would need for this
to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
programs like to do weird things with Windows' system code--in
conjunction with "fasttrap", this would allow WINE to keep such programs
happy.) As I understand, ptrace already provides this, hopefully
adequately.
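The word-sized PEEK and POKE mentioned above are available through ptrace(2) as PTRACE_PEEKDATA and PTRACE_POKEDATA, which transfer one machine word (a C long) per call; smaller accesses have to be built from a read-modify-write of a whole word. A minimal C sketch for Linux follows; peek_poke_demo and guest_word are illustrative names only. Because the child is created with fork(), the variable sits at the same virtual address in both processes, which stands in for a known address inside a guest address space.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable lives at the same virtual address in the
   child, so the parent can name it in the child's address space. */
static volatile long guest_word = 0x1111;

static void kill_and_reap(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}

/* PEEK the child's copy of guest_word into *peeked, POKE new_value into it,
   and return what the child then reports as its exit status, or -1. */
int peek_poke_demo(long new_value, long *peeked)
{
    assert(new_value >= 0 && new_value < 128);  /* must fit an exit status */
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(255);
        raise(SIGSTOP);                 /* wait here for the tracer */
        _exit((int)guest_word);         /* report the (possibly poked) value */
    }

    int status;
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;

    /* PEEK one word; -1 is a valid word value, so errno tells errors apart. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, child, (void *)&guest_word, NULL);
    int ok = (errno == 0);
    if (ok) {
        *peeked = word;
        /* POKE one word into the child's address space. */
        ok = ptrace(PTRACE_POKEDATA, child, (void *)&guest_word,
                    (void *)new_value) != -1;
    }
    /* Resume the child, suppressing the SIGSTOP it stopped with. */
    if (!ok || ptrace(PTRACE_CONT, child, NULL, NULL) == -1) {
        kill_and_reap(child);
        return -1;
    }
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One system call per word is exactly the per-access cost that makes this path slow for bulk transfers, which is why large changes go through mmap instead.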
 -- intercept arbitrary interrupts in guest address space

 Explanation: Many older Windows programs (Win16 era)
occasionally directly invoke various soft interrupts (these are
basically DOS syscalls). The ability to intercept these is necessary,
but need not be particularly efficient or fast.

 -- modify guest address space's LDT

 Explanation: Again, Win16 support. Old Windows actually
allowed processes to request segments for whatever purpose. This may or
may not be doable on all modern hardware.

 -- transparently use threads in guest address spaces, if desired

 Explanation: WINE currently uses the host's scheduler.
Changing it to this new API shouldn't adversely affect that ability.
(And on second thought, using a UML library might not be an option.)

>>PS: If I understand correctly, UML with the current SKAS3 works by
>>swapping processes into and out of a single "user" address space. I
>>propose a system where many distinct "user" address spaces are
>>maintained by the kernel and execution is placed wherever the user-mode
>>scheduler says.
>What you say is not clear, but the most obvious reading of the above
>sentence is that you propose what already happens.
>However, SKAS3 and current ideas for SKAS4, with different APIs but
>similar semantics, say: implement all guest processes as user-level
>threads (totally implemented within UML), with the exception that we
>allow different address spaces.
>So we have "switch the guest proc to a different address space"
>(PTRACE_SWITCH_MM, in arch/i386/kernel/ptrace.c), manipulate any of these
>address spaces with mmap/munmap/mprotect, and destroy them (all in
>mm/proc_mm.c).

I shall clarify my proposal: each thread is assigned an address space,
while an address space can contain multiple threads. Each thread also
has a STOP/RUN flag which, if set to RUN, causes the host scheduler to
consider that thread for execution (along with all other runnable
threads). This flag allows the userspace control process either to make
scheduling decisions itself (by setting only one of its threads to RUN)
or to punt and have the kernel handle all scheduling for its threads (by
setting them all to RUN and using STOP only to block a thread).

Could all SKAS4 APIs be multiplexed through one syscall? (Perhaps
simply as more ptrace functions, or as a new "skas4" syscall?)
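The thread/address-space relationship and the RUN/STOP flag proposed above can be modelled in a few lines of plain C. This is a toy model under stated assumptions, not a kernel interface: struct mm_ctx, struct guest_thread, toy_switch_mm(), pick_one() and run_all() are all hypothetical. toy_switch_mm() plays the role of PTRACE_SWITCH_MM (several threads may point at one address space), pick_one() is the "control process decides" mode (exactly one thread set to RUN), and run_all() is the "punt to the host scheduler" mode (every ready thread set to RUN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and do not
   correspond to any Linux interface. */
struct mm_ctx {
    int id;
    int users;              /* threads currently using this address space */
};

struct guest_thread {
    int tid;
    struct mm_ctx *mm;      /* every thread has exactly one address space */
    bool runnable;          /* the proposed RUN (true) / STOP (false) flag */
};

/* Move thread t into address space `to` (the role PTRACE_SWITCH_MM plays). */
void toy_switch_mm(struct guest_thread *t, struct mm_ctx *to)
{
    assert(t != NULL && to != NULL);
    if (t->mm != NULL)
        t->mm->users--;
    t->mm = to;
    to->users++;
}

/* Control process decides: set exactly one ready thread to RUN (round-robin
   after index prev; prev == -1 starts at 0) and all others to STOP.
   Returns the chosen index, or -1 if no thread is ready. */
int pick_one(struct guest_thread *ts, int n, int prev, const bool *ready)
{
    assert(n >= 0 && prev >= -1 && prev < n);
    int chosen = -1;
    for (int k = 1; k <= n && chosen < 0; k++) {
        int i = (prev + k) % n;
        if (ready[i])
            chosen = i;
    }
    for (int i = 0; i < n; i++)
        ts[i].runnable = (i == chosen);
    return chosen;
}

/* Punt to the host scheduler: every ready thread is set to RUN, and STOP is
   used only to block threads. Returns the number of runnable threads. */
int run_all(struct guest_thread *ts, int n, const bool *ready)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        ts[i].runnable = ready[i];
        count += ready[i];
    }
    return count;
}
```

The point of the model is that both scheduling modes need nothing more than one per-thread flag; which mode is in effect is purely a policy of the control process.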
* Re: [uml-devel] SKAS4 design question
From: Blaisorblade @ 2006-01-19 0:37 UTC
To: user-mode-linux-devel; +Cc: Jacob Bachmeyer

On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> >>Has any thought been given to making SKAS4 suitably generic that it
> >>could be used for more than just UML?
> >
> >Not yet, thoughts welcome.
>
> Let's see:
>
> to support HURD (which uses the Mach ABI):
>
> -- existing facilities plus trap lcall gates

I.e. extend ptrace to trap lcall gates, right? That's another thing; it
could be done, but it relates more to the Linux-ABI project... At least,
this can't be merged in mainline, since we don't support lcall gates.

> to support WINE (which follows Win32 conventions (ick!)): (x86 only)
> -- existing facilities plus
> -- trap on access to specified pages

We do that: make them unmapped and trap SIGSEGV through ptrace. It
doesn't work for accesses from kernel space (you don't get SIGSEGV, just,
likely, -EFAULT). And it's horribly slow. And trapping kernel-space
accesses is bad.

> Explanation: Win32 API calls are not syscalls in the normal
> sense--rather they are made by calling into a system DLL.

Yep, the DLL can then decide whether to trap into the kernel or not
(depending on that version's implementation).

> These DLLs
> are mapped into the process' address space on Windows and under current
> WINE, much like shared objects in normal Linux. This idea would enable
> WINE to not actually map these DLLs, but rather simply set the pages
> where the DLLs would be mapped as "fasttrap".

What is the reason to trap into the kernel? It's going to be slow. A page
fault, like a syscall, is costly (probably more so, since it's an
interrupt).
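The "make them unmapped and trap SIGSEGV" approach can be illustrated within a single process. In UML the SIGSEGV is observed by a separate tracing process through ptrace; the sketch below instead catches it with an in-process SA_SIGINFO handler, which is simpler to run but is only an analogue. It assumes Linux, where si_addr reports the faulting address and calling mprotect() from the handler works in practice, although POSIX does not list mprotect() as async-signal-safe. trap_demo and on_segv are illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *trap_page;                 /* the page we want to trap on */
static size_t page_size;
static volatile sig_atomic_t faults;
static void *volatile fault_addr;

/* SIGSEGV handler: record the faulting address, then make the page
   accessible so the faulting instruction succeeds when it is restarted. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    char *addr = si->si_addr;
    if (addr < trap_page || addr >= trap_page + page_size)
        abort();                        /* a genuine crash, not our page */
    faults++;
    fault_addr = addr;
    mprotect(trap_page, page_size, PROT_READ | PROT_WRITE);
}

/* Write one byte at `offset` into a PROT_NONE page; store the offset the
   fault was reported at and return the number of traps taken, or -1. */
int trap_demo(size_t offset, long *fault_offset)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(offset < page_size);
    trap_page = mmap(NULL, page_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (trap_page == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) == -1) {
        munmap(trap_page, page_size);
        return -1;
    }

    faults = 0;
    fault_addr = NULL;
    ((volatile char *)trap_page)[offset] = 42;  /* traps once, then succeeds */
    int ok = (trap_page[offset] == 42);
    *fault_offset = faults ? (long)((char *)fault_addr - trap_page) : -1;

    sigaction(SIGSEGV, &old, NULL);
    munmap(trap_page, page_size);
    return ok ? (int)faults : -1;
}
```

Every trapped access here costs a fault, a signal delivery and an mprotect() call before the program can continue, which is the overhead being objected to for the "fasttrap" use case.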
If there is a good reason not to map the DLLs, it may at least make
sense, but WINE users aren't going to use special patches, and getting
such a hackish thing into mainline may be a hard sell (unless the reason
is _really_ good).

> Then, when the program
> attempts to access a DLL's memory image, the kernel would intercept the
> request and quickly pass it to a userspace thread,

Easy to say "quickly pass it"... signals are slow. There are faster but
more complicated primitives (netlink, for instance).

> which handles the
> "page fault".
> The page remains set as "fasttrap", and the control
> process modifies the address space and CPU context appropriately before
> allowing execution to continue.

"Modifies" to return the call or to map the page in? You seem to imply it
performs the call and sets the return value in EAX, right?

Also, for security reasons it's not possible to let userspace trap OS
accesses (as the OS is more privileged - search TENEX at
http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that
is).

> -- read/write in guest address space
> Explanation: mmap is fine for big changes to an address space
> (such as loading modules), but one capability WINE would need for this
> to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
> programs like to do weird things with Windows' system code--in
> conjunction with "fasttrap", this would allow WINE to keep such programs
> happy.) As I understand, ptrace already provides this, hopefully
> adequately.

It provides this; it could be made a bit faster (I've reviewed a patch
from another project that uses ptrace heavily, which makes it faster).

> -- intercept arbitrary interrupts in guest address space
>
> Explanation: Many older Windows programs (Win16 era)
> occasionally directly invoke various soft interrupts (these are
> basically DOS syscalls). The ability to intercept these is necessary,
> but need not be particularly efficient or fast.

I recall that hardware IRQ number x is mapped to vector k+x, where k is
fixed and low; with ACPI we now have 32 IRQs, I guess (on my machine the
kernel uses up to 22 IRQs), so I guess int 0x21 is going to conflict
somewhere.

That said, this could be added too, for interrupts not reserved by the
kernel (that is, CPU exceptions). But DOSEMU already runs x86 programs,
so WINE should be able to do it too... ah, yep, it uses vm86, while you
need to do that on a paged system.

> -- modify guest address space's LDT
> Explanation: Again, Win16 support. Old Windows actually
> allowed processes to request segments for whatever purpose. This may or
> may not be doable on all modern hardware.

PTRACE_LDT exists, and performs a remote modify_ldt, just as MM_MMAP is a
remote mmap().

> -- transparently use threads in guest address spaces, if desired
> Explanation: WINE currently uses the host's scheduler.
> Changing it to this new API shouldn't adversely affect that ability.
> (And on second thought, using a UML library might not be an option.)
> I shall clarify my proposal: each thread is assigned an address space,

and (you forgot to say) it can be changed through PTRACE_SWITCH_MM, you
mean... (otherwise I don't see the addition).

> while an address space can contain multiple threads.

You can PTRACE_SWITCH_MM multiple threads to the same address space.

> Each thread also
> has a STOP/RUN flag which, if set to RUN, causes the host scheduler to
> consider that thread for execution (along with all other runnable
> threads). This flag allows the userspace control process either to make
> scheduling decisions itself (by setting only one of its threads to RUN)
> or to punt and have the kernel handle all scheduling for its threads (by
> setting them all to RUN and using STOP only to block a thread).

Hmm, sleeping like that is easy if you mean that a thread can only switch
itself from RUN to STOP. The thread can use some mutex/semaphore thing,
at that point.
To switch a thread from RUN to STOP from outside, you can currently kill
it with -STOP. Beware that it may be slow, but I don't know whether that
matters, or whether it can be made much faster.

The problem (I think) is that SIGSTOP will be processed not at kill()
time, but at delivery time, i.e. after a context switch to the receiving
thread, before returning to userspace. I haven't checked for SIGSTOP and
am not sure about the rest, but I think it works this way.

> Could all SKAS4 APIs be multiplexed through one syscall? (Perhaps
> simply as more ptrace functions, or as a new "skas4" syscall?)

"Multiplexing" like ipc(2) is a bad idea.

However, the current idea is sys_mm_indirect, taking an fd representing
an mm context, a syscall number and its parameters, plus a syscall to get
an fd representing an mm context.
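To make the sys_mm_indirect idea concrete: instead of inventing private opcodes (as ipc(2) does), the call would take an mm-context fd plus an ordinary syscall number and its arguments, and run that syscall against the other address space. The toy C model below only illustrates that calling convention. toy_new_mm(), toy_mm_indirect() and struct toy_mm are hypothetical; "addresses" are page indices into a 16-page table, results are 0 or a negative errno rather than the real return values, and it assumes a platform whose <sys/syscall.h> defines SYS_mmap (for example x86-64).

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>

#define TOY_PAGES  16
#define TOY_MAX_MM 4

/* Toy model only: a per-context table of page protections. */
struct toy_mm {
    int in_use;
    int prot[TOY_PAGES];        /* -1 = unmapped, otherwise PROT_* bits */
};

static struct toy_mm toy_mms[TOY_MAX_MM];

/* Stand-in for "a syscall to get an fd representing an mm context". */
int toy_new_mm(void)
{
    for (int fd = 0; fd < TOY_MAX_MM; fd++) {
        if (!toy_mms[fd].in_use) {
            toy_mms[fd].in_use = 1;
            for (int p = 0; p < TOY_PAGES; p++)
                toy_mms[fd].prot[p] = -1;
            return fd;
        }
    }
    return -EMFILE;
}

/* Stand-in for sys_mm_indirect(fd, nr, args): apply syscall `nr` to the
   address space named by `fd` instead of the caller's own.
   args = {start page, page count, prot}; prot is ignored by munmap. */
long toy_mm_indirect(int fd, long nr, const long args[3])
{
    if (fd < 0 || fd >= TOY_MAX_MM || !toy_mms[fd].in_use)
        return -EBADF;
    long start = args[0], len = args[1];
    if (start < 0 || len <= 0 || start + len > TOY_PAGES)
        return -EINVAL;
    struct toy_mm *mm = &toy_mms[fd];

    switch (nr) {
    case SYS_mprotect:
        for (long p = start; p < start + len; p++)
            if (mm->prot[p] < 0)
                return -ENOMEM;     /* as mprotect(2) does for unmapped pages */
        /* fall through: the range is mapped, so update it like mmap */
    case SYS_mmap:
        for (long p = start; p < start + len; p++)
            mm->prot[p] = (int)args[2];
        return 0;
    case SYS_munmap:
        for (long p = start; p < start + len; p++)
            mm->prot[p] = -1;
        return 0;
    default:
        return -ENOSYS;             /* not supported indirectly */
    }
}
```

Reusing the existing syscall numbers means the argument conventions are already defined; the only new concept is which address space the call targets.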
* Re: [uml-devel] SKAS4 design question
From: Jacob Bachmeyer @ 2006-01-19 22:23 UTC
To: user-mode-linux-devel

Blaisorblade wrote:
>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>
>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>could be used for more than just UML?
>>>>
>>>Not yet, thoughts welcome.
>>>
>>Let's see:
>>
>>to support HURD (which uses the Mach ABI):
>>
>> -- existing facilities plus trap lcall gates
>
>I.e. extend ptrace to trap lcall gates, right? That's another thing; it
>could be done, but it relates more to the Linux-ABI project... At least,
>this can't be merged in mainline, since we don't support lcall gates.

Why not? And for that matter, why does ptrace not currently catch lcalls?

>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>
>> -- existing facilities plus
>> -- trap on access to specified pages
>
>We do that: make them unmapped and trap SIGSEGV through ptrace. It
>doesn't work for accesses from kernel space (you don't get SIGSEGV, just,
>likely, -EFAULT). And it's horribly slow. And trapping kernel-space
>accesses is bad.

You don't have to trap kernel-space accesses (-EFAULT there would be a
good thing--the host kernel shouldn't be looking in these pages anyway);
this is only to apply to userspace code. And SIGSEGV is slow--why should
it be fast? It's an error path.

>> Explanation: Win32 API calls are not syscalls in the normal
>>sense--rather they are made by calling into a system DLL.
>
>Yep, the DLL can then decide whether to trap into the kernel or not
>(depending on that version's implementation).
>
>>These DLLs
>>are mapped into the process' address space on Windows and under current
>>WINE, much like shared objects in normal Linux. This idea would enable
>>WINE to not actually map these DLLs, but rather simply set the pages
>>where the DLLs would be mapped as "fasttrap".
>
>What is the reason to trap into the kernel? It's going to be slow. A page
>fault, like a syscall, is costly (probably more so, since it's an
>interrupt).
>
>If there is a good reason not to map the DLLs, it may at least make
>sense, but WINE users aren't going to use special patches, and getting
>such a hackish thing into mainline may be a hard sell (unless the reason
>is _really_ good).

The overhead is not all that large, as most Win32 API calls ultimately go
into the kernel anyway. This should also allow WINE to work well on
platforms such as x86-64 without needing multiple WINE binaries (a 64-bit
control process managing a mix of 32- and 64-bit address spaces).

Also, what exactly are vsyscalls?

Executables are already demand-paged--so page faults routinely happen
anyway.

The reason to trap is to allow WINE to intercept the call while sitting
in another address space. (Each Win32 process would have its own guest
address space.) The idea is to have the interfaces UML uses be generic
enough for WINE to also use.

The reason is simple--improved security by enforcing a sandbox around
WINE.

>>Then, when the program
>>attempts to access a DLL's memory image, the kernel would intercept the
>>request and quickly pass it to a userspace thread,
>
>Easy to say "quickly pass it"... signals are slow. There are faster but
>more complicated primitives (netlink, for instance).

User DLLs (those from the program itself) would actually be mapped. The
system DLLs (kernel32, user32, etc.) that WINE itself implements on
Linux, and that must trap to kernel space on Windows, would be loaded
this way.

One benefit is to reduce the chance of conflict, as various internal
modules in WINE that don't exist in Windows could thus be removed from
the address space visible to the Win32 app. This could have uses other
than WINE, too.
One possibility is as a "padded cell" of sorts: a process is started in a
guest address space under a control program that intercepts and discards
all syscalls. However, certain pages in that address space are used as a
restricted system interface--accessing them blocks the accessing thread
and causes a (host) syscall to return in the control process. This
syscall would block until a guest thread trips a "fasttrap" page, and
would then return information such as the exact address accessed,
whether it was a read or a write, and, for a write, the value written.
This syscall need not be new--read or ioctl on an appropriate fd (a
netlink socket, perhaps?) would be enough. The control thread then
carries out the requested action (whatever that may be) and lets the
jailed thread run again.

"fasttrap" may have been a poor choice of terms. The idea is to have more
or less generic kernel-in-userspace functionality, with one process as a
"usermode supervisor" watching a set of other processes.

>>which handles the
>>"page fault".
>>
>>The page remains set as "fasttrap", and the control
>>process modifies the address space and CPU context appropriately before
>>allowing execution to continue.
>
>"Modifies" to return the call or to map the page in? You seem to imply it
>performs the call and sets the return value in EAX, right?
>
>Also, for security reasons it's not possible to let userspace trap OS
>accesses (as the OS is more privileged - search TENEX at
>http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that
>is).

Perform the API call. It would alter the CPU context and possibly (if the
call requires it) also change the guest address space. There should be
no OS accesses to these pages--those would not trap, but would return
-EFAULT, because the pages would not actually be allocated. (Win32
programs should not be making Linux syscalls--a version of WINE that uses
this would need to catch and ignore any Linux syscalls made.)
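Linux already has a related "padded cell": strict secure computing mode, enabled with prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT), after which the process may only make the read, write, exit and sigreturn system calls; any other syscall gets it killed with SIGKILL. The supervisor then talks to the jailed process over file descriptors rather than trapped pages. A minimal C sketch for Linux, assuming glibc's syscall() wrapper; padded_cell_demo is an illustrative name. The child exits with syscall(SYS_exit) because glibc's _exit() uses exit_group, which strict mode does not allow. If the process is already confined by a seccomp filter (as in many containers), the kernel may refuse to enable strict mode, and the sketch reports that case separately.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the jailed child delivered its message and was then killed
   for a forbidden syscall, 0 if strict mode could not be enabled here,
   -1 on other errors. The message is copied NUL-terminated into reply. */
int padded_cell_demo(char *reply, size_t len)
{
    assert(reply != NULL && len > 1);
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;
    pid_t child = fork();
    if (child < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (child == 0) {
        close(pipefd[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) == -1)
            _exit(2);                       /* strict mode refused */
        /* Jailed: only read, write, exit and sigreturn are permitted now. */
        static const char msg[] = "hello supervisor";
        syscall(SYS_write, pipefd[1], msg, sizeof msg);
        syscall(SYS_getpid);                /* forbidden: kernel sends SIGKILL */
        syscall(SYS_exit, 0);               /* not reached */
    }

    /* Supervisor side: receive whatever the jailed process wrote. */
    close(pipefd[1]);
    ssize_t n = read(pipefd[0], reply, len - 1);
    reply[n > 0 ? n : 0] = '\0';
    close(pipefd[0]);

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return 0;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL && n > 0)
        return 1;
    return -1;
}
```

Compared with the page-trap design, the request channel here is an ordinary pipe or socket, so the supervisor never has to decode a fault address to learn what the jailed code wanted.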
>> -- read/write in guest address space
>> Explanation: mmap is fine for big changes to an address space
>>(such as loading modules), but one capability WINE would need for this
>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
>>programs like to do weird things with Windows' system code--in
>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>happy.) As I understand, ptrace already provides this, hopefully
>>adequately.
>
>It provides this; it could be made a bit faster (I've reviewed a patch
>from another project that uses ptrace heavily, which makes it faster).
>
>> -- intercept arbitrary interrupts in guest address space
>> Explanation: Many older Windows programs (Win16 era)
>>occasionally directly invoke various soft interrupts (these are
>>basically DOS syscalls). The ability to intercept these is necessary,
>>but need not be particularly efficient or fast.
>
>I recall that hardware IRQ number x is mapped to vector k+x, where k is
>fixed and low; with ACPI we now have 32 IRQs, I guess (on my machine the
>kernel uses up to 22 IRQs), so I guess int 0x21 is going to conflict
>somewhere.
>
>That said, this could be added too, for interrupts not reserved by the
>kernel (that is, CPU exceptions). But DOSEMU already runs x86 programs,
>so WINE should be able to do it too... ah, yep, it uses vm86, while you
>need to do that on a paged system.

The only requirement here is to call vm86 in another address space, which
is already doable--except on 64-bit hardware, where vm86 doesn't exist
anyway.

>> -- transparently use threads in guest address spaces, if desired
>> Explanation: WINE currently uses the host's scheduler.
>>Changing it to this new API shouldn't adversely affect that ability.
>>(And on second thought, using a UML library might not be an option.)
>>
>>I shall clarify my proposal: each thread is assigned an address space,
>>
>and (you forgot to say) it can be changed through PTRACE_SWITCH_MM, you
>mean... (otherwise I don't see the addition).
>
>>while an address space can contain multiple threads.
>
>You can PTRACE_SWITCH_MM multiple threads to the same address space.

This is exactly it--I wanted to be sure that distinct threads can share an
address space, while one control process can manage as many address
spaces as are needed/wanted. There should be no addition here--this was
mentioned for completeness.

>>Each thread also
>>has a STOP/RUN flag which, if set to RUN, causes the host scheduler to
>>consider that thread for execution (along with all other runnable
>>threads). This flag allows the userspace control process either to make
>>scheduling decisions itself (by setting only one of its threads to RUN)
>>or to punt and have the kernel handle all scheduling for its threads (by
>>setting them all to RUN and using STOP only to block a thread).
>
>Hmm, sleeping like that is easy if you mean that a thread can only switch
>itself from RUN to STOP. The thread can use some mutex/semaphore thing,
>at that point.
>
>To switch a thread from RUN to STOP from outside, you can currently kill
>it with -STOP. Beware that it may be slow, but I don't know whether that
>matters, or whether it can be made much faster.
>
>The problem (I think) is that SIGSTOP will be processed not at kill()
>time, but at delivery time, i.e. after a context switch to the receiving
>thread, before returning to userspace. I haven't checked for SIGSTOP and
>am not sure about the rest, but I think it works this way.

How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP) as
its argument and takes effect immediately? The problem (IIRC) with
SIGSTOP is that signals are delivered to all threads in a process, while
a userspace scheduler needs to wake up or block exactly one thread at a
time.

Blocking a thread would be done from the control process, not from the
thread itself. (The call that resulted in it being blocked was made by
touching a page that triggered the control process.)
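For comparison, the existing way to move a process from RUN to STOP from outside is the SIGSTOP/SIGCONT pair, with the controller learning that the stop has actually taken effect through waitpid(WUNTRACED), and the resume through waitpid(WCONTINUED). A minimal C sketch for Linux; stop_resume_demo is an illustrative name. Note that SIGSTOP stops every thread of the target thread group, even when sent to a single thread, which is the granularity problem raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stop an idle child from outside with SIGSTOP, confirm the stop, resume it
   with SIGCONT, confirm the resume, then kill it.
   Returns 0 on success, -1 otherwise. */
int stop_resume_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* idle until killed */
    }

    int status, rc = -1;
    if (kill(child, SIGSTOP) == 0 &&
        /* The stop is only known to have happened once waitpid reports it. */
        waitpid(child, &status, WUNTRACED) == child &&
        WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP &&
        kill(child, SIGCONT) == 0 &&
        waitpid(child, &status, WCONTINUED) == child &&
        WIFCONTINUED(status))
        rc = 0;

    kill(child, SIGKILL);       /* works whether the child is stopped or not */
    waitpid(child, NULL, 0);
    return rc;
}
```

Each transition costs a signal plus a wait round trip, and the kill() call returns before the stop is guaranteed to be in effect, which is why a controller needs the waitpid() confirmation.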
>>Could all SKAS4 APIs be multiplexed through one syscall? (Perhaps
>>simply as more ptrace functions, or as a new "skas4" syscall?)
>
>"Multiplexing" like ipc(2) is a bad idea.
>
>However, the current idea is sys_mm_indirect, taking an fd representing
>an mm context, a syscall number and its parameters, plus a syscall to get
>an fd representing an mm context.

How are address spaces manipulated? Could ioctls on the mm context's fd be
useful?
* Re: [uml-devel] SKAS4 design question
From: Blaisorblade @ 2006-01-20 16:41 UTC
To: user-mode-linux-devel, jcb62281

On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> >>>>Has any thought been given to making SKAS4 suitably generic that it
> >>>>could be used for more than just UML?
> >>>
> >>>Not yet, thoughts welcome.
> >>
> >>Let's see:
> >>
> >>to support HURD (which uses the Mach ABI):
> >>
> >> -- existing facilities plus trap lcall gates
> >
> >I.e. extend ptrace to trap lcall gates, right? That's another thing; it
> >could be done, but it relates more to the Linux-ABI project... At least,
> >this can't be merged in mainline, since we don't support lcall gates.
>
> Why not? And for that matter, why does ptrace not currently catch lcalls?

The lcall stub was removed from arch/i386/kernel/entry.S a little while
ago (about 2.6.12, IIRC). So vanilla Linux can't handle lcalls. Clear now?

> >>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
> >>
> >> -- existing facilities plus
> >> -- trap on access to specified pages
> >
> >We do that: make them unmapped and trap SIGSEGV through ptrace. It
> >doesn't work for accesses from kernel space (you don't get SIGSEGV, just,
> >likely, -EFAULT). And it's horribly slow. And trapping kernel-space
> >accesses is bad.
>
> You don't have to trap kernel-space accesses (-EFAULT there would be a
> good thing--the host kernel shouldn't be looking in these pages anyway);
> this is only to apply to userspace code. And SIGSEGV is slow--why should
> it be fast? It's an error path.

Yes, it is meant to be only an error path, but UML abuses it for normal
control; what I said is that the kernel supports "fasttrap", but only via
SIGSEGV, i.e. in a slow way.

> >We do that: make them unmapped and trap SIGSEGV through ptrace.

> >>These DLLs
> >>are mapped into the process' address space on Windows and under current
> >>WINE, much like shared objects in normal Linux. This idea would enable
> >>WINE to not actually map these DLLs, but rather simply set the pages
> >>where the DLLs would be mapped as "fasttrap".
> >
> >What is the reason to trap into the kernel? It's going to be slow. A page
> >fault, like a syscall, is costly (probably more so, since it's an
> >interrupt).
> >
> >If there is a good reason not to map the DLLs, it may at least make
> >sense, but WINE users aren't going to use special patches, and getting
> >such a hackish thing into mainline may be a hard sell (unless the reason
> >is _really_ good).
>
> The overhead is not all that large, as most Win32 API calls ultimately go
> into the kernel anyway.

A switch into the kernel only costs a few thousand TSC units (see the
rdtsc assembly instruction), while a signal delivery to a foreign process
can cost a lot more (I measure it on the order of 4*10^5 TSC units, even
without a memory switch).

> This should also allow WINE to work well on
> platforms such as x86-64 without needing multiple WINE binaries (a 64-bit
> control process managing a mix of 32- and 64-bit address spaces).

Writing 64-bit code that cleanly handles 32-bit syscalls is hard.
Compiling 32-bit code in 32-bit mode to do the same is simpler.

> Also, what exactly are vsyscalls?
> Executables are already demand-paged--so page faults routinely happen
> anyway.

Not the same thing - assuming the working set fits in memory, you get
page faults only for the first access to a given page, and they just jump
to the kernel.
What you're proposing is that for each call to GDI functions, for instance,
or whatever, a signal delivery (or in the best case, just a context switch)
is triggered. That's a different matter.

> The reason to trap is to allow WINE to intercept the call while
> sitting in another address space. (Each Win32 process would have its
> own guest address space.) The idea is to have the interfaces UML uses
> be generic enough for WINE to also use.
> The reason is simple--improved security by enforcing a sandbox around
> WINE.
> >>Then, when the program
> >>attempts to access a DLL's memory image, the kernel would intercept the
> >>request and quickly pass it to a userspace thread,
> >Easy to say "quickly pass it"... signals are slow. There are faster but more
> >complicated primitives (netlink, for instance, comes to mind).
> User DLLs (those from the program itself) would actually be mapped. The
> system DLLs (kernel32, user32, etc.) that WINE itself implements on
> Linux and that must trap to kernelspace on Windows would be loaded this
> way.
> One benefit is to reduce the chance of conflict, as various
> internal modules in WINE that don't exist in Windows could thus be
> removed from the visible (to the Win32 app) address space. This could
> have uses other than WINE, too. One possibility is as a "padded cell"
> of sorts--a process is started in a guest address space under a control
> program that intercepts and discards all syscalls. However, certain
> pages in that address space are used as a restricted system
> interface--accessing them blocks the accessing thread and causes a
> (host) syscall to return in the control process. This syscall would
> block until a guest thread trips a "fasttrap" page and then returns
> information such as exact address accessed, read or write, and if write,
> value written. This syscall need not be new--read or ioctl on an
> appropriate fd (netlink socket perhaps?) would be enough.
> The control
> thread then carries out the requested action (whatever that may be) and
> permits the jailed thread to run again.

Andrea Arcangeli merged such "padded cell" functionality, but the allowed
interface is read, not a page fault. The former is faster and easier to use,
and also allows writing arbitrary amounts of data.

It's called secure computing (see kernel/seccomp.c for details, and/or look on
LWN.net for an article about it).

> "fasttrap" may have been a poor choice of terms. The idea is to have
> more or less generic kernel-in-userspace functionality with one process
> as a "usermode supervisor" watching a set of other processes.
> >Also, for security reasons it's not possible to let userspace trap OS
> > accesses (as the OS is more privileged - search TENEX at
> >http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad
> > that is).
> Perform the API call. It would alter the CPU context, possibly, (if the
> call requires it) also changing the guest address space. There should
> be no OS accesses to these pages--those would not trap, but would return
> -EFAULT because the pages would not actually be allocated. (Win32
> programs should not be making Linux syscalls--a version of WINE that
> uses this would need to catch and ignore any Linux syscalls made.)
> >> -- read/write in guest address space
> >> Explanation: mmap is fine for big changes to an address space
> >>(such as loading modules), but one capability WINE would need for this
> >>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
> >>programs like to do weird things with Windows' system code--in
> >>conjunction with "fasttrap", this would allow WINE to keep such programs
> >>happy.) As I understand, ptrace already provides this, hopefully
> >>adequately.
> >It provides this; it could be made a bit faster (I've reviewed a patch
> > from another project which uses ptrace heavily, and it makes that faster).
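The PEEK/POKE capability discussed above corresponds to ptrace's PTRACE_PEEKDATA and PTRACE_POKEDATA requests, which move one machine word at a time; 1/2/4-byte writes have to be done as a read-modify-write of the containing word. This sketch forks a traced child, reads a word from it and overwrites it. Relying on fork() to give guest_word the same address in both processes is a simplification (a real control process would compute guest addresses itself), and the names are illustrative.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A word in the data segment; after fork() it sits at the same address in
 * parent and child, which spares this sketch any address lookup. */
static volatile long guest_word = 1234;

/* Tracer side: PEEK the child's word into *peeked, POKE 42 into it, and let
 * the child report what it now sees. Returns the child's exit status or -1. */
int peek_poke_demo(long *peeked)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(255);
        raise(SIGSTOP);                    /* hand control to the tracer */
        _exit((int)(guest_word & 0xff));   /* report the (possibly poked) value */
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* PEEKDATA returns the word itself, so errno is the only error signal. */
    errno = 0;
    long value = ptrace(PTRACE_PEEKDATA, child, (void *)&guest_word, NULL);
    if (errno != 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    *peeked = value;

    /* POKEDATA stores one whole word into the child's address space. */
    ptrace(PTRACE_POKEDATA, child, (void *)&guest_word, (void *)42L);
    ptrace(PTRACE_CONT, child, NULL, NULL);

    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```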
> >> -- intercept arbitrary interrupts in guest address space
> >> Explanation: Many older Windows programs (Win16 era)
> >>occasionally directly invoke various soft interrupts (these are
> >>basically DOS syscalls). The ability to intercept these is necessary,
> >>but need not be particularly efficient or fast.
> >I recall that hardware IRQ number x is mapped to vector k+x, where k is
> > fixed and low; with ACPI we now have 32 IRQs, I guess (on my machine the
> > kernel uses up to 22 IRQs), so I guess int 0x21 is going to conflict
> > somewhere.
> >That said, this could be added too for interrupts not reserved by the
> > kernel (that is, CPU exceptions). But DOSEMU already runs x86 programs, so
> > WINE should be able to do it too... ah, yep, it uses vm86, while you need
> > to do that on a paged system.
> The only requirement here is to call vm86 in another address space,
> which is already doable--except on 64-bit hardware, where vm86 doesn't
> exist anyway.

Wait a moment - Windows 3.1 uses 286 protected mode, and Win16 userspace
programs use up to 16M of RAM. You don't have this in vm86(), right?

> This is exactly it--I wanted to be sure that distinct threads can share
> an address space, while one control process can manage as many address
> spaces as are needed/wanted. There should be no addition here--this was
> mentioned for completeness.

UML will need to have this functionality debugged and working sooner or
later - when it does SMP with SKAS, it'll need exactly this (you have
multiple managed threads, corresponding to multiple virtual CPUs, and a
thread and its address space can be executed on each of those virtual CPUs).

> How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
> as its argument and has immediate effects? The problem (IIRC) with
> SIGSTOP is that signals are delivered to all threads in a process,

Isn't there tkill() for this purpose (signals to a specific thread)? And if it
doesn't work, it should be fixed.
Piling up incoherent APIs is bad when things can be done with the current
ones.

> >However, currently the idea is sys_mm_indirect, taking an fd representing
> > an mm context, a syscall number and its parameters, plus a syscall to get
> > an fd representing an mm context.
> How are address spaces manipulated? Could ioctls on the mm context's fd
> be useful?

We don't use ioctls, they are inelegant; SKAS3 uses write, which is just as
bad. For SKAS4, instead, you'd use sys_mm_indirect(); you say:

mm_indirect(addr_space_fd, __NR_mmap, <mmap_args>)
mm_indirect(addr_space_fd, __NR_munmap, <munmap_args>)

and so on, for each syscall (excluding fork and exit, for now). To destroy an
address space you simply call close on its fd.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
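A sketch of the seccomp "padded cell" mentioned in the message above. In 2006, strict mode was switched on by writing to /proc/<pid>/seccomp; this sketch assumes a current kernel, where prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) is used instead. Strict mode allows only read(), write(), _exit() and sigreturn(), so the jailed child can still talk to its control process over an already-open pipe, while any other syscall gets it killed with SIGKILL. The function and enum names are hypothetical; JAIL_UNAVAILABLE covers hosts (some container runtimes, for example) that already run the process under a seccomp filter, in which case the kernel may refuse to switch to strict mode.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum jail_result { JAIL_KILLED, JAIL_UNAVAILABLE, JAIL_ERROR };

/* The parent plays the control process; the child is the jailed guest. */
enum jail_result run_jailed_child(char *byte_out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return JAIL_ERROR;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return JAIL_ERROR;
    }
    if (child == 0) {
        close(fds[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1)
            syscall(SYS_exit, 3);   /* strict mode refused on this host */
        /* From here on only read, write, exit and sigreturn are permitted. */
        if (write(fds[1], "J", 1) != 1)
            syscall(SYS_exit, 4);
        syscall(SYS_getppid);       /* forbidden: the kernel sends SIGKILL */
        syscall(SYS_exit, 0);       /* not reached; exit_group(), used by _exit(), is forbidden too */
    }

    close(fds[1]);
    ssize_t n = read(fds[0], byte_out, 1);
    close(fds[0]);

    int status;
    if (waitpid(child, &status, 0) == -1)
        return JAIL_ERROR;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 3)
        return JAIL_UNAVAILABLE;
    if (n == 1 && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return JAIL_KILLED;
    return JAIL_ERROR;
}
```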
* Re: [uml-devel] SKAS4 design question
2006-01-20 16:41 ` Blaisorblade
@ 2006-01-23 19:59 ` Jacob Bachmeyer
2006-01-30 11:09 ` Blaisorblade
0 siblings, 1 reply; 8+ messages in thread
From: Jacob Bachmeyer @ 2006-01-23 19:59 UTC
To: user-mode-linux-devel

Blaisorblade wrote:
>On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>>>
>>>>Blaisorblade wrote:
>>>>
>>>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>>>
>>>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>>>could be used for more than just UML?
>>>>>>
>>>>>Not yet, thoughts welcome.
>>>>>
>>>>Let's see:
>>>>
>>>>to support HURD (which uses the Mach ABI):
>>>>
>>>> -- existing facilities plus trap lcall gates
>>>
>>>I.e. extend ptrace to trap lcall gates, right? That's another thing, could
>>>be done, but it relates more to the Linux-ABI project... at least this
>>>can't be merged in mainline since we don't support lcall gates.
>>
>>Why not? And for that matter, why does ptrace not currently catch lcalls?
>
>The lcall stub was removed from arch/i386/kernel/entry.S a little while ago
>(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?

Yes; the last time I looked into that part of the kernel was back in
2.4. So, does this mean that lcalls can no longer potentially be used
to escape from UML?

>>>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>>>
>>>> --existing facilities plus
>>>> -- trap on access to specified pages
>>>>
>>>We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't
>>>work for accesses from kernel-space (you don't get SIGSEGV, just, likely,
>>>-EFAULT). And it's horribly slow. And trapping for kernelspace accesses
>>>is bad.
>>>
>>You don't have to trap kernelspace accesses; (-EFAULT there would be a
>>good thing--the host kernel shouldn't be looking in these pages anyway)
>>this is only to apply to userspace code, but SIGSEGV is slow--why should
>>it be fast? It's an error path.
>
>Yes, it is meant to be only an error path, but UML abuses it for normal
>control flow; and, as I said, the kernel does support "fasttrap", but only
>via SIGSEGV, i.e. in a slow way.

That is the exact problem. It shouldn't be abused--a proper interface
that has acceptable performance should be devised. (You mention
netlink--was it looked into? This might help with some UML performance
issues.) Basically what is needed is a means to set a page to no access
but cause some other action to occur rather than generate SIGSEGV.

>>>We do that: make them unmapped and trap SIGSEGV through ptrace.
>>>
>>The overhead is not all that large, as most Win32 API calls ultimately
>>go into the kernel anyway.
>
>A kernel switch only costs a few thousand TSC units (see the rdtsc
>assembly instruction), while a signal delivery to a foreign process can cost
>a lot more (I measured it at around 4 * 10^5 TSC units, even without a
>memory switch).

Then a more efficient interface is needed. Besides, this would need to
be synchronous.

>>This also should allow WINE to work well on
>>platforms such as x86-64, without needing multiple WINE binaries.
>>(64-bit control process managing mix of 32 and 64 bit address spaces)
>
>Writing 64-bit code that cleanly handles 32-bit syscalls is hard. Compiling
>32-bit code in 32-bit mode to do the same is simpler.

The problem is that they need to communicate, especially once Win64
actually hits. WINE currently has a (confusing) "relay" layer that
already does similar tasks for 16/32 bit. Furthermore, the Win32 API
calling convention is fairly well defined (parameters on stack; return
in EAX), so this shouldn't be more of a problem than has been solved in
the past.
(That doesn't mean it won't be a real PITA.)

>>The reason to trap is to allow WINE to intercept the call while
>>sitting in another address space. (Each Win32 process would have its
>>own guest address space.) The idea is to have the interfaces UML uses
>>be generic enough for WINE to also use.
>>
>>The reason is simple--improved security by enforcing a sandbox around
>>WINE.
>>

Seccomp (see below--thanks for bringing it up) could more easily be used
to solve this. (Why bother with trapping all the time when only a few
pages really need protection? Furthermore, the external control thread
would thus have veto power over all syscalls made, so the sandbox can be
easily enforced.)

>>>>Then, when the program
>>>>attempts to access a DLL's memory image, the kernel would intercept the
>>>>request and quickly pass it to a userspace thread,
>>>
>>>Easy to say "quickly pass it"... signals are slow. There are faster but more
>>>complicated primitives (netlink, for instance, comes to mind).
>>
>>User DLLs (those from the program itself) would actually be mapped. The
>>system DLLs (kernel32, user32, etc.) that WINE itself implements on
>>Linux and that must trap to kernelspace on Windows would be loaded this
>>way.
>>
>>One benefit is to reduce the chance of conflict, as various
>>internal modules in WINE that don't exist in Windows could thus be
>>removed from the visible (to the Win32 app) address space. This could
>>have uses other than WINE, too. One possibility is as a "padded cell"
>>of sorts--a process is started in a guest address space under a control
>>program that intercepts and discards all syscalls. However, certain
>>pages in that address space are used as a restricted system
>>interface--accessing them blocks the accessing thread and causes a
>>(host) syscall to return in the control process. This syscall would
>>block until a guest thread trips a "fasttrap" page and then returns
>>information such as exact address accessed, read or write, and if write,
>>value written.
>>This syscall need not be new--read or ioctl on an
>>appropriate fd (netlink socket perhaps?) would be enough. The control
>>thread then carries out the requested action (whatever that may be) and
>>permits the jailed thread to run again.
>
>Andrea Arcangeli merged such "padded cell" functionality, but the allowed
>interface is read, not a page fault. The former is faster and easier to use,
>and also allows writing arbitrary amounts of data.
>
>It's called secure computing (see kernel/seccomp.c for details, and/or look on
>LWN.net for an article about it).

I had looked at this earlier, but hadn't realized that it could be used
to implement this--provided that mm_indirect can make syscalls in a
seccomp address space (bypassing the restriction), this can do
everything that "fasttrap" could (using some help from appropriate code
in userspace). Maybe SKAS4 should add a new seccomp level?

>>>> -- read/write in guest address space
>>>> Explanation: mmap is fine for big changes to an address space
>>>>(such as loading modules), but one capability WINE would need for this
>>>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
>>>>programs like to do weird things with Windows' system code--in
>>>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>>>happy.) As I understand, ptrace already provides this, hopefully
>>>>adequately.
>>>
>>>It provides this; it could be made a bit faster (I've reviewed a patch
>>>from another project which uses ptrace heavily, and it makes that faster).

One down, more to go.

>>>> -- intercept arbitrary interrupts in guest address space
>>>> Explanation: Many older Windows programs (Win16 era)
>>>>occasionally directly invoke various soft interrupts (these are
>>>>basically DOS syscalls). The ability to intercept these is necessary,
>>>>but need not be particularly efficient or fast.
>>>
>>>I recall that hardware IRQ number x is mapped to vector k+x, where k is
>>>fixed and low; with ACPI we now have 32 IRQs, I guess (on my machine the
>>>kernel uses up to 22 IRQs), so I guess int 0x21 is going to conflict
>>>somewhere.
>>>
>>>That said, this could be added too for interrupts not reserved by the
>>>kernel (that is, CPU exceptions). But DOSEMU already runs x86 programs, so
>>>WINE should be able to do it too... ah, yep, it uses vm86, while you need
>>>to do that on a paged system.
>>
>>The only requirement here is to call vm86 in another address space,
>>which is already doable--except on 64-bit hardware, where vm86 doesn't
>>exist anyway.
>
>Wait a moment - Windows 3.1 uses 286 protected mode, and Win16 userspace
>programs use up to 16M of RAM. You don't have this in vm86(), right?

No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
are somehow caught--inside the address space in question. (WINE
currently runs in-process; I am trying to lay the groundwork to change
that--thus all the crazy stuff previously about "fasttrap" to another
userspace.) Current WINE can use vm86 on the i386 platform, however.

This (Win16 programs with 16MiB of RAM) also means that WINE could
always intercept soft interrupts--even without use of vm86.

The other catch is that 64 and 32 bit code doesn't mix very well, and
they must be kept in separate processes normally--thus the reason for a
64-bit control process to be able to handle both 32 and 64 bit address
spaces. The entire kernel is 64-bit anyway, so leaving the option open
can't be too insanely hard.

>>How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
>>as its argument and has immediate effects? The problem (IIRC) with
>>SIGSTOP is that signals are delivered to all threads in a process,
>
>Isn't there tkill() for this purpose (signals to a specific thread)? And if it
>doesn't work, it should be fixed. Piling up incoherent APIs is bad when
>things can be done with the current ones.
The other problem is that a more specific interface could be much
faster. OTOH, perhaps a better strategy would be to improve the
signals--thus also lessening the other problem (slowness of SIGSEGV) as
well as improving performance generally.

>>>However, currently the idea is sys_mm_indirect, taking an fd representing
>>>an mm context, a syscall number and its parameters, plus a syscall to get
>>>an fd representing an mm context.
>>
>>How are address spaces manipulated? Could ioctls on the mm context's fd
>>be useful?
>
>We don't use ioctls, they are inelegant; SKAS3 uses write, which is just as
>bad.

What is inelegant about an ioctl on a special fd? I say that ioctls are
far preferable to more fds (on other files), or the extra complexity of
implementing some other interface (maybe using netlink?). Besides, if
you implement your own struct file_operations, you get ioctl support by
writing the handler function for it (if I understand the Linux 2.6.14
VFS correctly). OTOH, if no operations that fall into ioctl's area are
needed, then implementing ioctl for its own sake is silly.

>For SKAS4, instead, you'd use sys_mm_indirect(); you say:
>
>mm_indirect(addr_space_fd, __NR_mmap, <mmap_args>)
>mm_indirect(addr_space_fd, __NR_munmap, <munmap_args>)
>
>and so on, for each syscall (excluding fork and exit, for now). To destroy an
>address space you simply call close on its fd.

How do you map region X of the guest address space to region Y (or
somewhere) in your own? mmap/munmap on the address space's fd would
make sense here.

PS: Sorry about the long delay. Mozilla crashed while I had the compose
window for this message buried under several browsers (and totally
forgotten, too--oops).
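On the per-thread stop question quoted in the message above: tkill()/tgkill() choose which thread a signal is delivered to, and this sketch demonstrates that by sending SIGUSR1 to one specific thread and checking that the handler ran there. It does not by itself provide per-thread RUN/STOP semantics, because a stop signal such as SIGSTOP still stops the whole thread group. Older glibc had no wrapper for tgkill or gettid, so the raw syscall() form is used (recent glibc versions provide both); the function and variable names are illustrative.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int worker_tid;   /* kernel TID of the worker, 0 until published */
static atomic_int handled_tid;  /* TID of the thread that ran the handler */
static atomic_int stop_worker;

static void on_usr1(int sig)
{
    (void)sig;
    /* Record which thread the kernel delivered the signal to. */
    atomic_store(&handled_tid, (int)syscall(SYS_gettid));
}

static void *worker(void *arg)
{
    (void)arg;
    atomic_store(&worker_tid, (int)syscall(SYS_gettid));
    while (!atomic_load(&stop_worker))
        usleep(1000);   /* may return early with EINTR when the signal arrives */
    return NULL;
}

/* Returns 1 if a signal sent with tgkill() was handled by the targeted
 * thread, 0 if another thread handled it, -1 on setup failure. */
int signal_one_thread(void)
{
    atomic_store(&worker_tid, 0);
    atomic_store(&handled_tid, 0);
    atomic_store(&stop_worker, 0);

    struct sigaction sa = {0};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    while (atomic_load(&worker_tid) == 0)
        usleep(1000);

    int tid = atomic_load(&worker_tid);
    /* tgkill(tgid, tid, sig): deliver to exactly this thread of this process. */
    if (syscall(SYS_tgkill, (long)getpid(), (long)tid, (long)SIGUSR1) == -1) {
        atomic_store(&stop_worker, 1);
        pthread_join(t, NULL);
        return -1;
    }
    while (atomic_load(&handled_tid) == 0)
        usleep(1000);

    atomic_store(&stop_worker, 1);
    pthread_join(t, NULL);
    return atomic_load(&handled_tid) == tid;
}
```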
* Re: [uml-devel] SKAS4 design question
2006-01-23 19:59 ` Jacob Bachmeyer
@ 2006-01-30 11:09 ` Blaisorblade
0 siblings, 0 replies; 8+ messages in thread
From: Blaisorblade @ 2006-01-30 11:09 UTC
To: user-mode-linux-devel, jcb62281

On Monday 23 January 2006 20:59, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> >>>I.e. extend ptrace to trap lcall gates, right? That's another thing,
> >>> could be done, but it relates more to the Linux-ABI project... at least
> >>> this can't be merged in mainline since we don't support lcall gates.
> >>
> >>Why not? And for that matter, why does ptrace not currently catch
> >> lcalls?
> >
> >The lcall stub was removed from arch/i386/kernel/entry.S a little while ago
> >(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?
> Yes; the last time I looked into that part of the kernel was back in
> 2.4. So, does this mean that lcalls can no longer potentially be used
> to escape from UML?

Yes, and IIRC that was also fixed directly some time ago, via LDT clearing.

> >Yes, it is meant to be only an error path, but UML abuses it for
> > normal control flow; and, as I said, the kernel does support "fasttrap",
> > but only via SIGSEGV, i.e. in a slow way.
> That is the exact problem. It shouldn't be abused--a proper interface
> that has acceptable performance should be devised. (You mention
> netlink--was it looked into?

No, and while I mentioned netlink, it's not an interface I know in depth.
However, it's being used for various things, including a proposed rewrite of
the wireless API and the already existing implementation of userspace packet
filtering, so we can assume it has reasonable performance, momentum, a user
base and thus maintenance.

> This might help with some UML performance
> issues.)
Possibly yes, but Ingo Molnar already designed a custom API for this
purpose - it was developed for UML usage.

> Basically what is needed is a means to set a page to no access
> but cause some other action to occur rather than generate SIGSEGV.
> >>>We do that: make them unmapped and trap SIGSEGV through ptrace.
> >>
> >>The overhead is not all that large, as most Win32 API calls ultimately
> >>go into the kernel anyway.
> >
> >A kernel switch only costs a few thousand TSC units (see the rdtsc
> >assembly instruction), while a signal delivery to a foreign process can
> > cost a lot more (I measured it at around 4 * 10^5 TSC units, even
> > without a memory switch).
>
> Then a more efficient interface is needed. Besides, this would need to
> be synchronous.
>
> >>This also should allow WINE to work well on
> >>platforms such as x86-64, without needing multiple WINE binaries.
> >>(64-bit control process managing mix of 32 and 64 bit address spaces)
> >
> >Writing 64-bit code that cleanly handles 32-bit syscalls is hard.
> > Compiling 32-bit code in 32-bit mode to do the same is simpler.
>
> The problem is that they need to communicate, especially once Win64
> actually hits. WINE currently has a (confusing) "relay" layer that
> already does similar tasks for 16/32 bit. Furthermore, the Win32 API
> calling convention is fairly well defined (parameters on stack; return
> in EAX), so this shouldn't be more of a problem than has been solved in
> the past. (That doesn't mean it won't be a real PITA.)
>
> >>The reason to trap is to allow WINE to intercept the call while
> >>sitting in another address space. (Each Win32 process would have its
> >>own guest address space.) The idea is to have the interfaces UML uses
> >>be generic enough for WINE to also use.
> >>
> >>The reason is simple--improved security by enforcing a sandbox around
> >>WINE.
> Seccomp (see below--thanks for bringing it up) could more easily be used
> to solve this.
> (Why bother with trapping all the time when only a few
> pages really need protection? Furthermore, the external control thread
> would thus have veto power over all syscalls made, so the sandbox can be
> easily enforced.)
> >Andrea Arcangeli merged such "padded cell" functionality, but the
> > allowed interface is read, not a page fault. The former is faster and
> > easier to use, and also allows writing arbitrary amounts of data.
> >
> >It's called secure computing (see kernel/seccomp.c for details, and/or
> > look on LWN.net for an article about it).
>
> I had looked at this earlier, but hadn't realized that it could be used
> to implement this--provided that mm_indirect can make syscalls in a
> seccomp address space (bypassing the restriction),

Wait a moment - you're clearly talking about the runtime thread calling
mm_indirect(), or have I misunderstood something? In that case there's no
problem - seccomp jails only the process itself. If we tried to inject code
into the process to perform syscalls (like UML does in SKAS0 mode, which
doesn't need a host patch) it wouldn't work, but mm_indirect is a normal
syscall borrowing the foreign address space.

> this can do
> everything that "fasttrap" could (using some help from appropriate code
> in userspace).
> Maybe SKAS4 should add a new seccomp level?

I don't remember any "levels" in seccomp... and it was intended to be
simple. Besides, they shouldn't be needed (see above).

> >Wait a moment - Windows 3.1 uses 286 protected mode, and Win16 userspace
> > programs use up to 16M of RAM. You don't have this in vm86(), right?
> No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
> are somehow caught--inside the address space in question. (WINE
> currently runs in-process; I am trying to lay the groundwork to change
> that--thus all the crazy stuff previously about "fasttrap" to another
> userspace.) Current WINE can use vm86 on the i386 platform, however.
> This (Win16 programs with 16MiB of RAM) also means that WINE could
> always intercept soft interrupts--even without use of vm86.

Good.

> The other catch is that 64 and 32 bit code doesn't mix very well, and
> they must be kept in separate processes normally--thus the reason for a
> 64-bit control process to be able to handle both 32 and 64 bit address
> spaces. The entire kernel is 64-bit anyway, so leaving the option open
> can't be too insanely hard.
> The other problem is that a more specific interface could be much
> faster. OTOH, perhaps a better strategy would be to improve the
> signals--thus also lessening the other problem (slowness of SIGSEGV) as
> well as improving performance generally.

Signals are very slow, and in many ways they can't be optimized. The only
big optimization that can be done is when _tracing_ a process which gets a
signal. The signal is first delivered to the target process and a context
switch is made towards it; only afterwards, before returning to user mode,
is the signal notification delivered to the tracing process. Then a context
switch is performed towards the tracer, and finally the traced process is
put back in the ready state and scheduled again. I.e. the first switch to
the target process is totally useless.

> >>>However, currently the idea is sys_mm_indirect, taking an fd
> >>> representing an mm context, a syscall number and its parameters, plus a
> >>> syscall to get an fd representing an mm context.
> >>How are address spaces manipulated? Could ioctls on the mm context's fd
> >>be useful?
> >We don't use ioctls, they are inelegant; SKAS3 uses write, which is just
> > as bad.
> What is inelegant about an ioctl on a special fd? I say that ioctls are
> far preferable to more fds (on other files), or the extra complexity of
> implementing some other interface (maybe using netlink?).

ioctl is totally unstructured and thus inelegant, and 32/64-bit
compatibility is a PITA. Using ioctls for devices is tolerable; for general
APIs it isn't.
Many recently merged APIs started out as a set of ioctl()s and were
rewritten as either a set of syscalls or a special filesystem (inotify(),
for instance). Device mapper uses ioctls only because it was merged in the
dark age of 2.5 and it was really needed.

> Besides, if
> you implement your own struct file_operations, you get ioctl support by
> writing the handler function for it
> (if I understand the Linux 2.6.14
> VFS correctly).

You do, that's not the problem... the inelegance is not so much in the
implementation as in the API.

> OTOH, if no operations that fall into ioctl's area are
> needed, then implementing ioctl for its own sake is silly.
> >For SKAS4, instead, you'd use sys_mm_indirect(); you say:
> >
> >mm_indirect(addr_space_fd, __NR_mmap, <mmap_args>)
> >mm_indirect(addr_space_fd, __NR_munmap, <munmap_args>)
> >
> >and so on, for each syscall (excluding fork and exit, for now). To destroy
> > an address space you simply call close on its fd.
> How do you map region X of the guest address space to region Y (or
> somewhere) in your own? mmap/munmap on the address space's fd would
> make sense here.

That's not possible, to my knowledge, unless you use shared backing storage,
i.e. a tmpfs file. I.e. the memory must be set up as shareable from the very
beginning.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
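A sketch of the shared-backing approach from the last answer: the same tmpfs-backed object is mapped twice, so a write through one view (the "guest") is visible through the other (the "control" view) at a different address. It assumes memfd_create(), which appeared long after this thread (Linux 3.17, glibc 2.27); at the time, the equivalent was a file on a tmpfs mount such as /dev/shm. alias_demo is a hypothetical name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one tmpfs-backed object at two addresses and check that they alias.
 * Returns 1 if a write through one view is seen through the other,
 * 0 if not, -1 on error. */
int alias_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("guest-ram", 0);   /* anonymous file on an internal tmpfs */
    if (fd == -1)
        return -1;
    if (ftruncate(fd, page) == -1) {
        close(fd);
        return -1;
    }

    /* Two MAP_SHARED mappings of the same object: same pages, different addresses. */
    char *guest = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *control = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mappings keep the object alive */

    int result = -1;
    if (guest != MAP_FAILED && control != MAP_FAILED) {
        strcpy(guest, "written through the guest view");
        result = guest != control &&
                 strcmp(control, "written through the guest view") == 0;
    }
    if (guest != MAP_FAILED)
        munmap(guest, page);
    if (control != MAP_FAILED)
        munmap(control, page);
    return result;
}
```

Handing the same fd to another process (across fork() or over a Unix socket with SCM_RIGHTS) gives that process a mapping of the same pages, which is the property a control process would need in order to see its guests' memory.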
end of thread

Thread overview: 8+ messages
2006-01-16 19:34 [uml-devel] SKAS4 design question Jacob Bachmeyer
2006-01-18 11:58 ` Blaisorblade
2006-01-18 23:52   ` Jacob Bachmeyer
2006-01-19  0:37     ` Blaisorblade
2006-01-19 22:23       ` Jacob Bachmeyer
2006-01-20 16:41         ` Blaisorblade
2006-01-23 19:59           ` Jacob Bachmeyer
2006-01-30 11:09             ` Blaisorblade