From: Ulrich Drepper
Subject: Re: [take24 0/6] kevent: Generic event handling mechanism.
Date: Tue, 21 Nov 2006 08:58:49 -0800
Message-ID: <45633049.2000209@redhat.com>
References: <11630606361046@2ka.mipt.ru> <45564EA5.6020607@redhat.com> <20061113105458.GA8182@2ka.mipt.ru> <4560F07B.10608@redhat.com> <20061120082500.GA25467@2ka.mipt.ru> <4562102B.5010503@redhat.com> <20061121095302.GA15210@2ka.mipt.ru>
Cc: David Miller, Andrew Morton, netdev, Zach Brown, Christoph Hellwig, Chase Venters, Johann Borck, linux-kernel@vger.kernel.org, Jeff Garzik, Alexander Viro
To: Evgeniy Polyakov
In-Reply-To: <20061121095302.GA15210@2ka.mipt.ru>
List-Id: netdev.vger.kernel.org

Evgeniy Polyakov wrote:
>> You don't want to have a channel like this. The userlevel code doesn't
>> know which threads are waiting in the kernel on the event queue. And it
>> seems to be much more complicated than simply having a kevent call which
>> tells the kernel "wake up N or 1 more threads since I cannot handle it".
>> Basically a futex_wake()-like call.
>
> Kernel does not know about any threads which wait for events, it only
> has a queue of events, it can only wake those who were parked in
> kevent_get_events() or kevent_wait(), but the syscall will return only when
> the condition it waits on is true, i.e. when there is a new event in the ready
> queue and/or the ring buffer has empty slots, but the kernel will wake them up
> in any case if those conditions are true.
>
> How should it know which syscall should be interrupted when the special syscall
> is called?

It's not about interrupting any threads.

The issue is that the wakeup of a thread from the kevent_wait call
constitutes an "event notification".
If, as it should be, only one
thread is woken, then this information mustn't get lost. If the woken
thread cannot work on the events it got notified for, then it must tell
the kernel about it so that, *if* there are other threads waiting in
kevent_wait, one of those other threads can be woken.

What is needed is a simple "wake another thread waiting on this event
queue" syscall. Yes, in theory we could open an additional pipe with
each event queue and use it for waking threads, but this influences
the ABI through the use of a file descriptor. It's much better to have
an explicit way to do this.

> No AIO, but syscall.
> Only syscall time matters.
> Syscall starts, it should be stopped at some point. When should it be stopped?
> It should be stopped some time after it was started!
>
> I still do not understand how you will use absolute timeout values
> there. Please explain.

What is there to explain? If you are waiting for events which must
coincide with real-world events you'll naturally want to formulate
something like "wait for X until 10:15h". You cannot formulate this
correctly with relative timeouts since the realtime clock might be adjusted.

> futex_wait() uses relative timeouts:
>  static int futex_wait(u32 __user *uaddr, u32 val, unsigned long time)
>
> Kernel uses relative timeouts.

Look again. This time at the implementation. For FUTEX_LOCK_PI the
timeout is an absolute timeout.

> We do not have such symmetry.
> Other event handling interfaces cannot work with events which do not
> have a file descriptor behind them. Kevent can and works.
> Signals are just usual events.
>
> You request to get events - and you get them.
> You request to not get events during the syscall - you remove events.

None of this matches what I'm talking about. If you want to block a
signal for the duration of the kevent_wait call, this is nothing you can
do by registering an event.
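Changing the signal mask only for the duration of a wait is what the existing pselect() interface does, atomically. The following is a minimal sketch of that pattern using POSIX pselect(), not kevent; the handler, the got_usr1 flag, and the helper name wait_with_usr1_unblocked are hypothetical, chosen for illustration. SIGUSR1 stays blocked while the thread "works", so a signal arriving then is held pending rather than lost; pselect() unblocks it for the wait alone. The same call can just as well block a signal for the wait; the sketch shows the unblocking case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_usr1 = 0;

/* Async-signal-safe handler: only records that SIGUSR1 arrived. */
static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Hypothetical helper: keep SIGUSR1 blocked while "working", then let it
 * interrupt only the wait. Returns pselect()'s result; *err gets errno. */
static int wait_with_usr1_unblocked(int *err)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1 for the "working on a request" phase. */
    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &orig);

    /* A signal arriving during the work stays pending instead of
     * running the handler at an inconvenient moment. */
    raise(SIGUSR1);

    /* Atomically install a mask with SIGUSR1 unblocked for the wait
     * only; the pending signal is delivered inside pselect(). */
    sigset_t wait_mask = orig;
    sigdelset(&wait_mask, SIGUSR1);
    struct timespec timeout = { 5, 0 };
    errno = 0;
    int rc = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    *err = errno;

    /* pselect() already restored the blocking mask on return;
     * now restore the caller's original mask as well. */
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return rc;
}
```

Called from a single-threaded program on Linux, this returns -1 with errno set to EINTR after the handler has run. Doing the same with a sigprocmask() call followed by plain select() would leave a window in which the handler runs between the two calls and select() then blocks without noticing. In a multithreaded program, pthread_sigmask() would take the place of sigprocmask().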
Registering events has nothing to do with signal masks. They are not
modified. It is the program's responsibility to set the mask up
correctly, just like sigwaitinfo() etc. expect all signals which are
waited on to be blocked.

The signal mask handling is orthogonal to all this and must be explicit.
In some cases this means explicit pthread_sigmask/sigprocmask calls. But
this is not atomic if a signal must be masked/unmasked for the *_wait call.
This is why we have variants like pselect/ppoll/epoll_pwait which
explicitly and *atomically* change the signal mask for the duration of
the call.

> Btw, please point me to the discussion about the real-life usefulness of
> that parameter for epoll. I read the thread where sys_pepoll() was
> introduced, but except for some theoretical handwaving about possible
> usefulness there are no real signs of that requirement.

Don't search for epoll_pwait, it's not widely used yet. Search for
pselect, which is standardized. You'll find plenty of uses of that
interface. The number is certainly depressed at the moment since until
recently there was no correct implementation on Linux. And the
interface is mostly used in real-time contexts where signals are more
commonly used.

> What is the ground research or extended explanation about
> blocking/unblocking some signals during syscall execution?

Why is this even a question? Have you done programming with signals?
Your hatred of signals makes me think this isn't the case.

You might want to unblock a signal on a *_wait call if it can be used to
interrupt the wait, but you don't want this to happen while the
thread is working on a request.

You might want to block a signal, for instance, around a sigwaitinfo
call or, in this case, a kevent_wait call where the signal might be
delivered to the queue.

There are countless possibilities. Signals are very flexible.

> There are _no_ additional syscalls.
> I just introduced a new case for the event type.
Which is a new syscall. All demultiplexer cases are new syscalls.
Which, BTW, implies that unrecognized types should actually cause an
ENOSYS return value (this affects kevent_break). We've been over this
many times. If EINVAL is returned, this case cannot be distinguished from
invalid parameters. This is crucial for future extensions where
userland (esp. glibc) needs to be able to determine whether a new feature
is supported on the system.

> You _need_ it to be done, since any kernel kevent user must have
> enqueue/dequeue/callback callbacks. It is just an implementation of those
> callbacks.

I don't question that. But there is no need to add the callback. It
extends the kernel ABI/API. And for what? A vastly inferior timer
implementation compared to the POSIX timers. And this while all that
needs to be done is to extend the POSIX timer code slightly to handle
SIGEV_KEVENT in addition to the other notification methods currently
used. If you do it right then the code can be shared with the file AIO
code which is currently being circulated as well and which uses parts of the
POSIX timer infrastructure.

> Btw, how should the POSIX API be extended to allow queueing events? A queue
> is required (which is created when the user calls kevent_init() or
> previously opened /dev/kevent); how should it be accessed, since it is
> just a file descriptor in the process task_struct?

I've explained this multiple times. The struct sigevent structure needs
to be extended to get a new part in the union. Something like

    struct {
      int kevent_fd;
      void *data;
    } _sigev_kevent;

Then define SIGEV_KEVENT as a value distinct from the other SIGEV_
values. In the code which handles setup of timers (the timer_create
syscall), recognize SIGEV_KEVENT and handle it appropriately. I.e.,
call into the code to register the event source, just like you'd do with
the current interface.
Then add the code to post an event to the event
queue where currently signals would be sent et voilà.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
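For comparison with the proposed SIGEV_KEVENT extension, here is a hedged sketch that uses only the existing POSIX timer interface: timer_create() takes a struct sigevent whose sigev_notify field selects the notification method (here SIGEV_SIGNAL), timer_settime() with TIMER_ABSTIME arms it for an absolute CLOCK_REALTIME time (the "wait for X until 10:15h" case), and sigwaitinfo() collects the expiry after the signal has been blocked. The helper name wait_until_realtime and the choice of SIGRTMIN are assumptions for illustration; SIGEV_KEVENT itself is not part of POSIX, and the comment only marks where it would plug in. On glibc older than 2.34 this needs -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: wait for an absolute CLOCK_REALTIME deadline using
 * a one-shot POSIX timer. Returns the signal delivered (SIGRTMIN) or -1 on
 * error; on success *code receives si_code (SI_TIMER for POSIX timers).
 * The signal is left blocked on return. */
static int wait_until_realtime(const struct timespec *deadline, int *code)
{
    const int sig = SIGRTMIN;   /* illustrative choice of signal */
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);

    /* sigwaitinfo() expects the awaited signal to be blocked already. */
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    /* Notification method on expiry: queue SIGRTMIN to the process.
     * The proposal in the message would add SIGEV_KEVENT as one more
     * sigev_notify value (hypothetical, not POSIX). */
    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = sig;

    timer_t tid;
    if (timer_create(CLOCK_REALTIME, &sev, &tid) == -1)
        return -1;

    /* TIMER_ABSTIME: it_value is an absolute wall-clock time, so a
     * clock_settime() on CLOCK_REALTIME moves the expiry with it.
     * it_interval stays zero, making the timer one-shot. */
    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value = *deadline;
    if (timer_settime(tid, TIMER_ABSTIME, &its, NULL) == -1) {
        timer_delete(tid);
        return -1;
    }

    siginfo_t info;
    int got = sigwaitinfo(&set, &info);
    if (got != -1)
        *code = info.si_code;
    timer_delete(tid);
    return got;
}
```

To wait until a wall-clock time such as 10:15, the deadline could be filled from mktime() on a struct tm; a deadline that has already passed makes the timer expire immediately. A relative timeout, by contrast, would keep counting elapsed time even if the realtime clock were stepped.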