Message-ID: <44870475.8030705@domain.hid>
Date: Wed, 07 Jun 2006 18:53:09 +0200
From: Philippe Gerum
To: "Krause, Karl-Heinz"
Cc: xenomai@xenomai.org
Subject: Re: AW: [Xenomai-core] Ipipe hook at system call exit
In-Reply-To: <88AEA5AC18A141439A0D954EB037B0D30439F329@domain.hid>
List-Id: "Xenomai life and development (bug reports, patches, discussions)"

Krause, Karl-Heinz wrote:
> Thanks Philippe for your quick reply.
>
> Maybe a few additional remarks will clarify some remaining
> misunderstanding. Compared to Xenomai the basic differences are
> -- there are no shadow threads
> -- we use the standard glibc with futex based
>    synchronization/communication working across the domain boundary
>    transparently.
> This transparency is demonstrated by having the process binary running
> on an ipipe-patched Linux only. After loading the realtime module the
> same binary runs, but the responding threads provide for the realtime
> guarantees.
> Now let's go beyond marketing:
> First, the glibc implementation is not completely realtime capable.
> This concerns two functions
> - the implementation of the SIGEV_THREAD notification
> - the implementation of the spinlock() function
> For both we do have realtime capable implementations. Since these
> issues also hold for a natively realtime capable Linux (PREEMPT_RT),
> they must be solved there too, and the chances to have one standard
> implementation are quite good. The only difference may be the
> spinlock(). Here the protection needed for static priorities can be
> done differently. For a two kernel solution the protection must work
> across (interrupt disable).
>
> I guess the transparency also explains why we cannot rely on lazy
> migration back. If, e.g., a thread which should provide for realtime
> response does an open() or a mmap() during its setup phase and then a
> sigwait() for responding, then the sigwait() call has to be executed
> in the realtime domain from the very beginning. Having the check for
> migration back to realtime at the system call epilog of Linux is the
> most convenient way, otherwise we would need hooks in every system
> call function which is propagatable.

But intercepting the SYSCALL event for all domains is equivalent to
having such a hook.

>
> Concerning the futex function. Currently we intercept at the system
> call exit and call the corresponding rt-function when the number of
> requested wakeups could not be performed.

You mean that if Linux fails to identify one of its own futexes during a
get/release operation, then the handling is passed to the RT side?

> This provides for an excellent filtering
> but works for regular mutexes only. If we want to preserve the exact
> semantics for PI mutexes we have to call the rt-function upfront.
> For mutexes with priority ceiling migration, the migration check at
> system call exit is sufficient. For priority inheritance we would need
> to use the scheduler hook.
>
> Concerning the mlock-stuff we view it to be not sufficient, since if
> somebody does a malloc() and sets up preallocated structures, they are
> not necessarily touched.
>

I still don't get the point here. mlocking the data segment should cause
all pages included in this segment to be touched by the mm during the
fixup, basically by forcing the invocation of the page fault handler for
each page found in the associated VMAs. So there is no way the
underlying physical memory could not be committed after mlock.

> Concerning the performance issues and your remark that you have still
> work to do.
> For us, treating system calls as what they really are, namely
> synchronous exceptions which should be handled by the causing domain
> only, would be a perfect fit and would be faster (as an option).
>

IPIPE_EVENT_SELF does it for recent I-pipe patches. It's a modifier
telling Adeos to send the event only to the causing domain's handler.

> Hopefully this clears up the issues somewhat.

Well, yes and no. Talking about the syscall exit hook, I don't get why
it is absolutely required, since the co-kernel can control the migrations
as part of a preamble and/or postamble code surrounding the syscall
demux, given that all syscalls from any domain can be filtered through
it by Adeos. I do understand that changing the existing and working code
might not be the preferred solution, but additions to the critical path
must enable a mandatory feature which could not be obtained by other
means.

>
>
> Karl-Heinz
>
>
> -----Original Message-----
> From: Philippe Gerum [mailto:rpm@xenomai.org]
> Sent: Wednesday, 7 June 2006 15:21
> To: Krause, Karl-Heinz
> Cc: xenomai@xenomai.org
> Subject: Re: [Xenomai-core] Ipipe hook at system call exit
>
>
> Hello,
>
> Krause, Karl-Heinz wrote:
>
>>Hello Philippe
>>
>>Jan Kiszka referred me to you discussing our problem with a missing
>>Ipipe hook at system call exit.
>>
>>We at Siemens A&D do have a Linux realtime approach which is based on a
>>previous ADEOS version. When trying to port an improved version to the
>>Ipipe version for kernel 2.6.15.4 we ran into the problem of not having
>>an event hook at system call exit. Let me explain the need for it by
>>briefly outlining our approach.
>>
>>It is a two kernel approach based on the model of a multithreaded process
>>(means 2.6 kernel) where the threads above a certain static priority
>>level e.g.
>>68 are scheduled by the scheduler of the realtime kernel.
>>The realtime kernel maintains exactly the same system call interface as
>>the Linux kernel. The entire process works uniformly with the glibc.
>>The glibc isn't aware under which scheduler the current thread is
>>executing. To make this happen and to have both schedulers work with
>>the same struct task_struct we had to put some restrictions on the
>>signalling for the realtime domain (restrictions which make sense for
>>the realtime arena anyway). Because of that transparency this approach
>>combines somehow the advantages of a separated realtime kernel with the
>>user convenience of PREEMPT_RT. (the user convenience was the driving
>>requirement for our approach)
>>
>
>
> There seems to be quite a lot of commonality with the way Xenomai deals
> with shadow threads to enable realtime processing in user-space, while
> providing a seamless integration with Linux. One difference might be the
> way your system deals with Linux syscalls fired on behalf of a thread
> controlled by the real-time scheduler; Xenomai migrates the thread to
> the Linux scheduler transparently, but I did not figure out yet if this
> was a relevant issue in your system. Anyway, I think that I now roughly
> understand the general dynamics of it, thanks for the explanations.
>
>
>>
>>Now to the question why we need a hook at system call exit.
>>
>>The hook at system call entry branches to the system call handling of the
>>realtime kernel, which is also entered via a system call table. The
>>handling can be grouped in three classes
>>
>>- complete handling in the realtime domain e.g. timer_settime(),
>>sigwait()
>>
>>- only migration of the thread to the Linux scheduler. Basically
>>all calls needed for setup e.g. open(), mmap(), pthread_create().
>>The migration is transparent for the ipipe code, the thread continues
>>execution in the Linux domain with the call of the Linux system call
>>table (the priority hasn't changed).
>>
>>- handling in the realtime domain and migration to the Linux
>>domain if the thread priority has dropped under the boundary (e.g.
>>releasing a mutex with priority ceiling)
>>
>>
>>In particular for the second case a check needs to be done at system call
>>exit as to whether the thread has to migrate (back) to the realtime
>>scheduler. But this is also needed when a call issued in the Linux
>>domain raises the priority above the threshold. A third reason for the
>>hook is to touch the corresponding pages after a brk() or mmap() call
>>for getting residency.
>>
>> Note:
>>
>>The migration only takes place for threads of a process marked as realtime.
>>
>>Currently we allow only for one realtime process. First it is sufficient
>>for us and second it allows us to maintain the futex queue (each domain
>>maintains a local queue) of the realtime domain with virtual addresses
>>(no mm_lock).
>>
>
>
> Does this mean that you specifically intercept futex ops to process them
> in real-time mode when fired over the real-time context? Which would in
> turn allow you to traverse most of the glibc code and get it
> synchronized with the plain Linux threads?
>
>
>>
>>So this hook at system call exit is a necessity for us. Of course we
>>could do a private patch, but do you see a possibility to have it in the
>>standard Ipipe-patch?
>>
>
>
> Basically, I removed the sysexit hook from the I-pipe patch because it
> added a non-negligible overhead to each syscall. Even the sysenter hook
> needs some work to reduce its CPU footprint and I've planned to tackle
> the issue soon.
> For this reason, the current Xeno implementation only
> relies on the sysenter (IPIPE_EVENT_SYSCALL) hook to deal with
> migrations between the Linux and Xenomai schedulers, usually enforcing a
> lazy migration scheme, i.e. the syscall prologue added by the RT
> extension switches the caller to the proper domain before running the
> system call handler, but does not eagerly switch back to the originating
> domain (well, there are exceptions to this, but that's the usual way
> things are handled).
>
> Reading your description, a few questions came to my mind:
>
> - why do you force a switch back to the originating domain? IOW, are
> eager transitions absolutely required in your design, since your RT
> thread is underlaid by a regular Linux task anyway, so it could continue
> its processing and switch back to the RT side only when needed?
>
> - wouldn't it be possible to intercept the IPIPE_EVENT_SETSCHED
> notifications, which are fired by the I-pipe when a Linux task is about
> to have its priority changed? It's a direct hook from the kernel's
> sched_setscheduler(), which is given the task_struct pointer of the
> altered task, right after its priority field has been updated, but still
> before the Linux runqueue is reordered.
>
> - would mlocking the data segment of your application be enough/possible
> to ensure that brk() and mmap()ed segments get committed to physical
> memory automatically, and as such spare you the need for touching those
> areas explicitly? AFAIK, mlocked pages are going to be fixed up this
> way by the mm layer during the mlocking call.
>
> - generally speaking, since you control the prologue and epilogue of all
> system calls (Linux or real-time) which go through your own syscall
> demux by means of the IPIPE_EVENT_SYSCALL hook, it should be possible to
> handle the whole migration issue (be it eager or lazy in this case) from
> your code, instead of relying on a hook inserted in Linux's syscall
> return path. Or am I missing something?
>
>
>>
>>Karl-Heinz Krause
>>
>>Siemens A&D
>>
>>Nbg.-Moorenbrunn
>>
>>
>>------------------------------------------------------------------------
>>
>>_______________________________________________
>>Xenomai-core mailing list
>>Xenomai-core@domain.hid
>>https://mail.gna.org/listinfo/xenomai-core
>
>

--
Philippe.
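
To make the preamble/postamble argument above a bit more concrete, here
is a minimal user-space sketch in C of a syscall demux that performs the
migration checks itself, so that no hook in Linux's syscall return path
is involved. Everything in it is an assumption made for illustration:
the type and function names (struct thread, demux_syscall(), migrate()
and so on) are hypothetical and are not I-pipe, Adeos or Xenomai API,
the three syscall classes are the ones listed in the quoted description,
and the boundary value 68 is only the example priority mentioned there.
The "handler" is reduced to a priority update.

```c
/* Hypothetical model only: none of these names exist in I-pipe/Xenomai. */

enum domain { DOMAIN_LINUX, DOMAIN_RT };

/* The three handling classes from the quoted description. */
enum sys_class {
    SYS_RT_ONLY,     /* fully handled by the RT kernel, e.g. sigwait()      */
    SYS_LINUX_ONLY,  /* needs Linux services, e.g. open(), mmap()           */
    SYS_RT_MAY_DROP  /* handled in RT, may lower the priority, e.g.
                        releasing a mutex with priority ceiling             */
};

#define RT_PRIO_BOUNDARY 68  /* example boundary taken from the thread */

struct thread {
    int prio;          /* current static priority                  */
    enum domain dom;   /* domain currently scheduling the thread   */
    int migrations;    /* number of domain switches, for inspection */
};

/* Threads above the boundary belong to the real-time scheduler. */
enum domain home_domain(const struct thread *t)
{
    return t->prio > RT_PRIO_BOUNDARY ? DOMAIN_RT : DOMAIN_LINUX;
}

/* Switch domains only when needed, and count the switch. */
void migrate(struct thread *t, enum domain to)
{
    if (t->dom != to) {
        t->dom = to;
        t->migrations++;
    }
}

/*
 * Demux with preamble and postamble. prio_after stands in for whatever
 * the real handler would do to the caller's priority; pass the current
 * priority when the call leaves it unchanged.
 */
void demux_syscall(struct thread *t, enum sys_class cls, int prio_after)
{
    /* Preamble: a Linux-only service must run in the Linux domain. */
    if (cls == SYS_LINUX_ONLY)
        migrate(t, DOMAIN_LINUX);

    /* Stand-in for the actual system call handler. */
    t->prio = prio_after;

    /* Postamble: eager check, done here rather than in a sysexit hook. */
    migrate(t, home_domain(t));
}
```

With this model, a thread at priority 80 calling a Linux-only service
goes to Linux and comes back (two switches), a purely real-time call
causes no switch, and a call that drops the priority to 10 leaves the
thread in the Linux domain. A lazy scheme would simply omit the
postamble and let the next preamble decide.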