* Re: kqueue microbenchmark results
From: Terry Lambert @ 2000-10-26 18:08 UTC
To: Terry Lambert
Cc: Alfred Perlstein, David Schwartz, Jonathan Lemon, chat, linux-kernel

This is a long posting, with a humble beginning, but it has a
point.  I'm being complete so that no one is left in the dark, or
in any doubt as to what that point is.  That means rehashing some
history.

This posting is not really about select or Linux: it's about
interfaces.  Like cached state, interfaces can often be harmful.

NB: I really should redirect this to FreeBSD, as well, since
there are people in that camp who haven't learned the lesson,
either, but I'll leave it in -chat, for now.

---

[ ... kqueue discussion ... ]

> Linux also thought it was OK to modify the contents of the
> timeval structure before returning it.

It's been pointed out that I should provide more context for
this statement, before people look at me strangely and make
circling motions with their index fingers around their ears
(or whatever the international sign for "crazy" is these days).
So I'll start with a brief history.

The context is this: the select API was designed with the idea
that one might wish to do non-I/O related background processing.
Toward this end, one could have several ways of using the API:

1)	The (struct timeval *) could be NULL.  This means "block
	until a signal or until a condition on which you are
	selecting is true"; select is a BSD interface, and, until
	BSD 4.x and POSIX signals, the signal would actually call
	the handler and restart the select call, so in effect,
	this really meant "block until you longjmp out of a signal
	handler or until a condition on which you are selecting is
	true".

2)	The (struct timeval *) could point to the address of a
	real timeval structure (i.e.
	not be NULL); in that case, the result depended on the
	contents:

	a)	If the timeval struct was zero valued, it meant
		that the select should poll for one of the
		conditions being selected for in the descriptor
		set, and return a 0 if no conditions were true.
		The contents of the bitmaps and timeval struct
		were left alone.

	b)	If the timeval struct was not zero valued, it meant
		that the select should wait until the time specified
		had expired since the system call was first started,
		or one of the conditions being selected for was true.
		If the timeout expired, then a 0 would be returned,
		but if one or more of the conditions were true, the
		number of descriptors on which true conditions
		existed would be returned.

Wedging so much into a single interface was fraught with peril:
it was undefined as to what would happen if the timeval specified
an interval of 5 seconds, yet there was a persistently rescheduled
alarm every 2 seconds, resulting in a signal handler call that did
_not_ longjmp... would the timer expire after 5 seconds, or would
the timer be considered to have been restarted along with the
call?  Implementations that went both ways existed.  Mostly,
programmers used longjmp in signal handlers, and it wasn't a
portability issue.

More perilous was the question of what to do with a partially
satisfied request that was interrupted with a timer or signal
handler and longjmp (later, siginterrupt(2), and later POSIX
non-restart default behaviour).  This meant that the bitmap of
select events might have been modified already, after the wakeup,
but before the process was rescheduled to run.

Finally, the select manual page specifically reserved the right
to modify the contents of the timeval struct; this was presumably
so that you could either do accurate timekeeping by maintaining a
running tally using the timeval deficit (a lot of math, that), or,
more likely, to deal with the system call restart, and ensure that
signals would not prevent the select from ever exiting in the case
of system call restart.

So this was the select API definition.

---

Being pragmatists, programmers programmed to the behaviour of
the API in actual implementations, rather than to the strict
"letter of the law" laid down by the man page.

This meant that select was called in loop control constructs,
and that the bitmaps were reinitialized each time through the
loop.

It also meant that the timeval struct was not reinitialized,
since that was more work, and no known implementations would
modify it.  Pre-POSIX signals, signal handlers were handled
on a signal stack, as a result of a kernel trampoline outcall,
and that meant that a restarting system call would not impact
the countdown.

---

Linux came along, and implemented the letter of the law; the
machines were now sufficiently fast, and the math sufficiently
cheap, that it was now possible to do usefully accurate
timekeeping using the inverted math required of keeping a running
tally using the timeval deficit.  So they implemented it: it was
more useful than the historical behaviour on most platforms.

And every program which used non-zero valued timeval struct
contents, and assumed that they would not be modified, broke.

---

And here we see the problem with defining interfaces instead
of defining protocols.  A protocol is unambiguous with regard to
implementation details.
But an API, unless a lot of work takes place to make it
sufficiently abstract, and a lot more work takes place to define
exactly what will happen in all allowed conditions, and to
preclude the possibility of undefined behaviour, simply can not
hide implementation details.

If what people are trying to do here is define a cross-platform
system interface (and if they succeed, it will be the first one
forced on mainstream UNIX by the Open Source community), then it
means that careful design which eliminates ambiguity is the
single most important consideration.  There can be no undefined
behaviour, like that of select's timeval struct updating, or the
equally ambiguous, but less problematic, bitmap content partial
update -- which could bite people on a new platform, but so far
has not.

---

I have seen the BSD kqueue interface called "overengineered";
but people apparently don't realize that it is not so much that
it has been thought out to that level of detail beforehand, as
it is that it is on its third revision.  It wasn't really
overengineered to where it is today: it has matured to where it
is today.

Just as poll (however much I disdain it for select, in favor of
select's more universal platform portability) is a more mature
interface than select, and resolves problems in the select
design.  Poll is not an overengineered interface, it is a more
mature version of the select interface.

---

FWIW: except for platform-specific applications, which I've
tried very hard to avoid writing since the early 1980's or so,
I will probably be very conservative in my adoption of a kqueue
interface, whatever it ends up looking like, just as I've been
conservative in my adoption of poll (and, until 1989, my
adoption of select, since there are other ways to solve the
multiple input stream problem, without needing a select, poll,
or kqueue, and which work all the way back to V7 UNIX).
Unless there's a problem that can not be solved in any other
way, such as performance or footprint, I'll stick to tools that
are cross-platform.

On general principles, it'd be a good idea if BSD and Linux
ended up with the same unambiguous interface.  The wider an
interface is adopted, the quicker you will see people who can't
afford to be nailed to the cross of a single platform willing
to adopt it in their code.  Ambiguity of any kind will hinder
that adoption, and would certainly prevent adoption by mainstream
UNIX: if you have to code it differently on different platforms,
then you might as well code it differently on their platform,
too.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
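
The select history above comes down to one portable habit: treat both
the descriptor bitmaps and the timeval as scratch space that the call
may overwrite, and rebuild them before every call.  The following is a
minimal sketch of that habit, not code from any poster; the function
names wait_readable() and count_timeouts() are hypothetical and exist
only for illustration.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until fd is readable or the timeout expires.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * The fd_set and the timeval are rebuilt on every call, so it does
 * not matter whether the platform's select() modifies them.
 */
int wait_readable(int fd, long sec, long usec)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = sec;     /* fresh timeout for this call only */
    tv.tv_usec = usec;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/*
 * Loop form: each round gets the full timeout, because nothing is
 * carried over from the previous select() call.
 * Returns how many rounds ended in a timeout.
 */
int count_timeouts(int fd, int rounds, long usec)
{
    int timeouts = 0;

    for (int i = 0; i < rounds; i++)
        if (wait_readable(fd, 0, usec) == 0)
            timeouts++;
    return timeouts;
}
```

Passing a zero-valued timeout, as in wait_readable(fd, 0, 0), gives the
polling behaviour described in case 2a.  A loop that instead reused one
struct timeval across calls would be relying on exactly the historical
behaviour that the man page never promised.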
[parent not found: <20001025172702.B89038@prism.flugsvamp.com>]
[parent not found: <NCBBLIEPOCNJOAEKBEAKCEOPLHAA.davids@webmaster.com>]
[parent not found: <20001025161837.D28123@fw.wintelcom.net>]
* Re: kqueue microbenchmark results
From: Jamie Lokier @ 2000-10-27 15:20 UTC
To: Alfred Perlstein
Cc: David Schwartz, Jonathan Lemon, chat, linux-kernel

Alfred Perlstein wrote:
> > If a programmer does not ever wish to block under any circumstances, it's
> > his obligation to communicate this desire to the implementation. Otherwise,
> > the implementation can block if it doesn't have data or an error available
> > at the instant 'read' is called, regardless of what it may have known or
> > done in the past.
>
> Yes, and as you mentioned, it was _bugs_ in the operating system
> that did this.

Not for writes.  POLLOUT may be returned when the kernel thinks you have
enough memory to do a write, but someone else may allocate memory before
you call write().  Or does POLLOUT not work this way?

For read, you still want to declare the sockets non-blocking so your
code is robust on _other_ operating systems.  It's pretty straightforward.

-- Jamie
* Re: kqueue microbenchmark results
From: Alfred Perlstein @ 2000-10-27 16:03 UTC
To: Jamie Lokier
Cc: David Schwartz, Jonathan Lemon, chat, linux-kernel

* Jamie Lokier <lk@tantalophile.demon.co.uk> [001027 08:21] wrote:
> Alfred Perlstein wrote:
> > > If a programmer does not ever wish to block under any circumstances, it's
> > > his obligation to communicate this desire to the implementation. Otherwise,
> > > the implementation can block if it doesn't have data or an error available
> > > at the instant 'read' is called, regardless of what it may have known or
> > > done in the past.
> >
> > Yes, and as you mentioned, it was _bugs_ in the operating system
> > that did this.
>
> Not for writes.  POLLOUT may be returned when the kernel thinks you have
> enough memory to do a write, but someone else may allocate memory before
> you call write().  Or does POLLOUT not work this way?

POLLOUT checks the socket buffer (if we're talking about sockets),
and yes, you may still block on mbuf allocation (if we're talking
about FreeBSD) if the socket isn't set non-blocking.  Actually,
POLLOUT may be set even if there isn't enough memory for a write
in the network buffer pool.

> For read, you still want to declare the sockets non-blocking so your
> code is robust on _other_ operating systems.  It's pretty straightforward.

Yes, it's true, not using non-blocking sockets is like ignoring
friction in a physics problem, but assuming you have complete
control over the machine it shouldn't trip you up that often.

And we're talking about readability, not writeability which as
you mentioned may block because of contention for the network
buffer pool.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
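
The point both posters agree on is that a readiness report such as
POLLOUT is only a hint, so a program that must never sleep in write()
has to say so by marking the descriptor non-blocking and handling the
"would block" error.  A minimal sketch of that pattern follows.  The
helper names set_nonblocking() and try_write() are hypothetical, and
try_write() assumes len > 0 so that a return of 0 can only mean "would
have blocked".

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode.  Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Attempt a write that never sleeps waiting for buffer space.
 * Returns the number of bytes written (> 0), 0 if the write would
 * have blocked, or -1 on any other error.  Assumes len > 0.
 */
ssize_t try_write(int fd, const void *buf, size_t len)
{
    ssize_t n;

    do {
        n = write(fd, buf, len);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted */

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                          /* no room right now */
    return n;
}
```

A caller would treat a 0 return as "go back to poll/select/kevent and
wait for writability again", which stays correct whether or not the
earlier readiness report was still accurate by the time of the write.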
[parent not found: <20001024225637.A54554@prism.flugsvamp.com>]
[parent not found: <39F6655A.353FD236@alumni.caltech.edu>]
[parent not found: <20001025010246.B57913@prism.flugsvamp.com>]
[parent not found: <20001025112709.A1500@stormix.com>]
[parent not found: <20001025122307.B78130@prism.flugsvamp.com>]
[parent not found: <20001025114028.F12064@stormix.com>]
[parent not found: <20001025165626.B87091@prism.flugsvamp.com>]
[parent not found: <39F7F66C.55B158@cisco.com>]
* Re: kqueue microbenchmark results
From: Jonathan Lemon @ 2000-10-26 16:50 UTC
To: Gideon Glass
Cc: Jonathan Lemon, Simon Kirby, Dan Kegel, chat, linux-kernel

On Thu, Oct 26, 2000 at 02:16:28AM -0700, Gideon Glass wrote:
> Jonathan Lemon wrote:
> >
> > Also, consider the following scenario for the proposed get_event():
> >
> >    1. packet arrives, queues an event.
> >    2. user retrieves event.
> >    3. second packet arrives, queues event again.
> >    4. user reads() all data.
> >
> > Now, next time around the loop, we get a notification for an event
> > when there is no data to read.  The application now must be prepared
> > to handle this case (meaning no blocking read() calls can be used).
> >
> > Also, what happens if the user closes the socket after step 4 above?
>
> Depends on the implementation.  If the item in the queue is the
> struct file (or whatever an fd indexes to), then the implementation
> can only queue the fd once.  This also avoids the problem with
> closing sockets - close() would naturally do a list_del() or whatever
> on the struct file.
>
> At least I think it could be implemented this way...

kqueue currently does this; a close() on an fd will remove any pending
events from the queues that they are on which correspond to that fd.
I was trying to point out that it isn't as simple as it would seem at
first glance, as you have to consider issues like this.  Also, if the
implementation allows multiple event types per fd (leading to multiple
queued events per fd), there no longer is a 1:1 mapping to something
like 'struct file', and performing a list walk doesn't scale very well.
-- Jonathan
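
The two behaviours discussed in this exchange (an already pending
event is not queued a second time, and close() discards whatever is
still pending for that descriptor) can be modelled in a few lines of
user-space C.  This is a toy model only, not the FreeBSD kqueue
implementation; struct evq and the evq_*() functions are hypothetical
names.  It deliberately uses the linear list walk that the message
above warns does not scale.

```c
#include <stddef.h>

#define EVQ_MAX 64

/* One pending notification: which descriptor, and which event type. */
struct event {
    int fd;
    int filter;
};

struct evq {
    struct event ev[EVQ_MAX];
    size_t n;                 /* number of pending events */
};

void evq_init(struct evq *q)
{
    q->n = 0;
}

/*
 * Queue an event unless the same (fd, filter) pair is already pending.
 * Returns 1 if queued, 0 if it was already pending, -1 if the queue
 * is full.
 */
int evq_post(struct evq *q, int fd, int filter)
{
    for (size_t i = 0; i < q->n; i++)
        if (q->ev[i].fd == fd && q->ev[i].filter == filter)
            return 0;
    if (q->n == EVQ_MAX)
        return -1;
    q->ev[q->n].fd = fd;
    q->ev[q->n].filter = filter;
    q->n++;
    return 1;
}

/*
 * Model of close(fd): drop every pending event that refers to fd,
 * keeping the others in order.  Returns the number removed.
 */
size_t evq_close(struct evq *q, int fd)
{
    size_t kept = 0;

    for (size_t i = 0; i < q->n; i++)
        if (q->ev[i].fd != fd)
            q->ev[kept++] = q->ev[i];

    size_t removed = q->n - kept;
    q->n = kept;
    return removed;
}
```

With two event types per descriptor, evq_close() has to remove more
than one entry, which is the loss of the 1:1 mapping to something like
'struct file' mentioned above.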
* Re: kqueue microbenchmark results
From: Alan Cox @ 2000-10-27 0:50 UTC
To: Jonathan Lemon
Cc: Gideon Glass, Jonathan Lemon, Simon Kirby, Dan Kegel, chat, linux-kernel

> kqueue currently does this; a close() on an fd will remove any pending
> events from the queues that they are on which correspond to that fd.

This seems an odd thing to do. Surely what you need to do is to post a
'close completed' event to the queue. This also makes more sense when you
have a threaded app and another thread may well currently be in say a read
at the time it is closed.
* Re: kqueue microbenchmark results
From: Alfred Perlstein @ 2000-10-27 1:02 UTC
To: Alan Cox
Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat, linux-kernel

* Alan Cox <alan@lxorguk.ukuu.org.uk> [001026 17:50] wrote:
> > kqueue currently does this; a close() on an fd will remove any pending
> > events from the queues that they are on which correspond to that fd.
>
> This seems an odd thing to do. Surely what you need to do is to post a
> 'close completed' event to the queue. This also makes more sense when you
> have a threaded app and another thread may well currently be in say a read
> at the time it is closed

Kqueue's flexibility could allow this to be implemented; all you
would need to do is make a new filter trigger.  You might need a
_bit_ of hackery to make sure those aren't removed, or one could
just add the event after clearing all pending events.

Adding a filter to be informed when a specific fd is closed is
certainly an option, though it doesn't make very much sense because
that fd could then be reused quickly by something else... but anyhow:

The point of this interface is to ask kqueue to report only on
the things you are interested in, not to generate superfluous
events that you wouldn't care about.  You could make such a flag if
Linux adopted this interface and I'm sure we'd be forced to adopt
it, but if you make kqueue generate info an application won't care
about I don't think that would be taken back.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
* Re: kqueue microbenchmark results
From: Jonathan Lemon @ 2000-10-27 1:10 UTC
To: Alan Cox
Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat, linux-kernel

On Fri, Oct 27, 2000 at 01:50:40AM +0100, Alan Cox wrote:
> > kqueue currently does this; a close() on an fd will remove any pending
> > events from the queues that they are on which correspond to that fd.
>
> This seems an odd thing to do. Surely what you need to do is to post a
> 'close completed' event to the queue. This also makes more sense when you
> have a threaded app and another thread may well currently be in say a read
> at the time it is closed

Actually, it makes sense when you think about it.  The `fd' is actually
a capability that the application uses to refer to the open file in the
kernel.  If the app does a close() on the fd, it destroys this naming.

The application then has no capability left which refers to the formerly
open socket, and conversely, the kernel has no capability (name) to notify
the application of a close event.  What can I say, "the fd formerly known
as X" is now gone?  It would be incorrect to say that "fd X was closed",
since X no longer refers to anything, and the application may have reused
that fd for another file.

As for the multi-thread case, this would be a bug; if one thread closes
the descriptor, the other thread is going to get an EBADF when it goes
to perform the read.

-- Jonathan
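
The remark that "the application may have reused that fd for another
file" follows from how descriptors are handed out: POSIX requires that
calls which allocate a descriptor return the lowest number not
currently open in the process.  The sketch below shows the number
coming straight back after a close().  The function name
fd_number_is_reused() is hypothetical, and the result assumes no other
thread opens or closes descriptors while it runs.

```c
#include <unistd.h>

/*
 * Close a descriptor and immediately allocate a new one.
 * Returns 1 if the new descriptor has the same number as the one
 * just closed, 0 if not, -1 if the setup failed.
 */
int fd_number_is_reused(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;

    int old = p[0];           /* the "fd formerly known as X" */
    close(old);

    /* dup() must return the lowest free number, which is now 'old',
     * but it refers to the write end of the pipe, a different object. */
    int fresh = dup(p[1]);
    if (fresh == -1) {
        close(p[1]);
        return -1;
    }

    int same = (fresh == old);
    close(fresh);
    close(p[1]);
    return same;
}
```

A notification that only said "fd X was closed" would therefore be
ambiguous by the time it was read, since X may already name something
else.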
* Re: kqueue microbenchmark results
From: Alan Cox @ 2000-10-27 1:32 UTC
To: Jonathan Lemon
Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat, linux-kernel

> the application of a close event.  What can I say, "the fd formerly known
> as X" is now gone?  It would be incorrect to say that "fd X was closed",
> since X no longer refers to anything, and the application may have reused
> that fd for another file.

Which is precisely why you need to know where in the chain of events this
happened. Otherwise if I see

	'read on fd 5'
	'read on fd 5'

How do I know which read is for which fd in the multithreaded case?

> As for the multi-thread case, this would be a bug; if one thread closes
> the descriptor, the other thread is going to get an EBADF when it goes
> to perform the read.

Another thread may already have reused the fd.
* Re: kqueue microbenchmark results
From: Alfred Perlstein @ 2000-10-27 1:46 UTC
To: Alan Cox
Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat, linux-kernel

* Alan Cox <alan@lxorguk.ukuu.org.uk> [001026 18:33] wrote:
> > the application of a close event.  What can I say, "the fd formerly known
> > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > since X no longer refers to anything, and the application may have reused
> > that fd for another file.
>
> Which is precisely why you need to know where in the chain of events this
> happened. Otherwise if I see
>
> 	'read on fd 5'
> 	'read on fd 5'
>
> How do I know which read is for which fd in the multithreaded case

No you don't; you don't see anything with the current code unless
fd 5 is still around.  What you're presenting to Jonathan is an
application threading problem, not something that needs to be
resolved by the OS.

> > As for the multi-thread case, this would be a bug; if one thread closes
> > the descriptor, the other thread is going to get an EBADF when it goes
> > to perform the read.
>
> Another thread may already have reused the fd

This is another example of an application threading problem.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
* Re: kqueue microbenchmark results
From: Dan Kegel @ 2000-10-27 16:21 UTC
To: Alan Cox
Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, chat, linux-kernel

Alan Cox wrote:
> > > kqueue currently does this; a close() on an fd will remove any pending
> > > events from the queues that they are on which correspond to that fd.
> >
> > the application of a close event.  What can I say, "the fd formerly known
> > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > since X no longer refers to anything, and the application may have reused
> > that fd for another file.
>
> Which is precisely why you need to know where in the chain of events this
> happened. Otherwise if I see
>
>         'read on fd 5'
>         'read on fd 5'
>
> How do I know which read is for which fd in the multithreaded case

That can't happen, can it?  Let's say the following happens:
   close(5)
   accept() = 5
   call kevent() and rebind fd 5
The 'close(5)' would remove the old fd 5 events.  Therefore,
any fd 5 events you see returned from kevent are for the new fd 5.

(I suspect it helps that kevent() is both the only way to
bind events and the only way to pick them up; makes it harder
for one thread to sneak a new fd into the event list without
the thread calling kevent() noticing.)

- Dan
* Re: kqueue microbenchmark results
From: Alfred Perlstein @ 2000-10-27 16:42 UTC
To: Dan Kegel
Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat, linux-kernel

* Dan Kegel <dank@alumni.caltech.edu> [001027 09:40] wrote:
> Alan Cox wrote:
> > > > kqueue currently does this; a close() on an fd will remove any pending
> > > > events from the queues that they are on which correspond to that fd.
> > >
> > > the application of a close event.  What can I say, "the fd formerly known
> > > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > > since X no longer refers to anything, and the application may have reused
> > > that fd for another file.
> >
> > Which is precisely why you need to know where in the chain of events this
> > happened. Otherwise if I see
> >
> >         'read on fd 5'
> >         'read on fd 5'
> >
> > How do I know which read is for which fd in the multithreaded case
>
> That can't happen, can it?  Let's say the following happens:
>    close(5)
>    accept() = 5
>    call kevent() and rebind fd 5
> The 'close(5)' would remove the old fd 5 events.  Therefore,
> any fd 5 events you see returned from kevent are for the new fd 5.
>
> (I suspect it helps that kevent() is both the only way to
> bind events and the only way to pick them up; makes it harder
> for one thread to sneak a new fd into the event list without
> the thread calling kevent() noticing.)

Yes, that's how it does and should work.  Noticing the close()
should be done via thread communication/IPC not stuck into
kqueue.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
* Re: kqueue microbenchmark results
From: Terry Lambert @ 2000-10-27 23:08 UTC
To: dank
Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat, linux-kernel

> > Which is precisely why you need to know where in the chain of events this
> > happened. Otherwise if I see
> >
> >         'read on fd 5'
> >         'read on fd 5'
> >
> > How do I know which read is for which fd in the multithreaded case
>
> That can't happen, can it?  Let's say the following happens:
>    close(5)
>    accept() = 5
>    call kevent() and rebind fd 5
> The 'close(5)' would remove the old fd 5 events.  Therefore,
> any fd 5 events you see returned from kevent are for the new fd 5.
>
> (I suspect it helps that kevent() is both the only way to
> bind events and the only way to pick them up; makes it harder
> for one thread to sneak a new fd into the event list without
> the thread calling kevent() noticing.)

Strictly speaking, it can happen in two cases:

1)	single acceptor thread, multiple worker threads

2)	multiple anonymous "work to do" threads

In both these cases, the incoming requests from a client are
given to any thread, rather than a particular thread.

In the first case, we can have (id:executer order:event):

1:1:open 5
2:2:read 5
3:4:read 5
2:3:close 5

If thread 2 processes the close event before thread 3 processes
the read event, then when thread 3 attempts processing, it will
fail.

Technically, this is a group ordering problem in the design
of the software, which should instead queue all events to a
dispatch thread, and the threads should use IPC to serialize
processing of serial events.  This is similar to the problem
with async mounted FS recovery in event of a crash: without
ordering guarantees, you can only get to a "good" state, not
necessarily "the one correct state".

In the second case, we can have:

1:2:read 5
2:1:open 5
3:4:read 5
2:3:close 5

This is just a non-degenerate form of the first case, where
we allow thread 1 and all other threads to be identical, and
don't serialize open state initialization.

The NetWare for UNIX system uses this model.  The benefit is
that all user space threads can be identical.  This means that
I can use either threads or processes, and it won't matter, so
my software can run on older systems that lack "perfect" threads
models, simply by using processes, and putting client state into
shared memory.

In this case, there is no need for inter-thread synchronization;
instead, we must insist that events be dispatched sequentially,
and that the events be processed serially.  This effectively
requires event processing completion notification from user
space to kernel space.

In NetWare for UNIX, this was accomplished using a streams MUX
which knew that the NetWare protocol was request-response.  This
also permitted "busy" responses to be turned around in kernel
space, without incurring a kernel-to-user space scheduling
penalty.  It also permitted "piggyback", where an ioctl to the
mux was used to respond, and combined sending a response with
the next read.  This reduced protection domain crossing and the
context switch overhead by 50%.  Finally, the MUX sent requests
to user space in LIFO order.  This approach is called "hot engine
scheduling", in that the last reader in from user space is the
most likely to have its pages in core, so as to not need swapping
to handle the next request.

I was architect of much of the process model discussed above; as
you can see, there are some significant performance wins to be
had by building the right interfaces, and putting the code on
the right side of the user/kernel boundary.

In any case, the answer is that you can not assume that the only
correct way to solve a problem like event inversion is
serialization of events in user space (or kernel space).
This is not strictly a "threaded application implementation" issue,
and it is not strictly a kernel serialization of event delivery
issue.

Another case, which NetWare did not handle, is that of rejected
authentication.  Even if you went with the first model, and
forced your programmers to use expensive inter-thread
synchronization, or worse, bound each client to a single thread
in the server, thus rendering the system likely to have skewed
thread load, getting worse the longer the connection was up, you
would still have the problem of rejected authentication.  A
client might attempt to send authentication followed by commands
in the same packet series, without waiting for an explicit ACK
after each one (i.e. it might attempt to implement a sliding
window over a virtual circuit), and the system on the other end
might diligently queue the events, only to have the
authentication be rejected, but with packets queued already to
user space for processing, assuming serialization in user space.
You would then need a much more complex mechanism, to allow you
to invalidate an already queued event to another thread, which
you don't know about in your thread, before you release the
interlock.  Otherwise the client may get responses without a
valid authentication.

You need look no further than LDAPv3 for an example of a protocol
where this is possible (assuming X.509 certificate based SASL
authentication, where authentication is not challenge-response,
but where it consists solely of presenting one's certificate).

When considering this API, you have to consider more than just
the programming models you think are "right", you have to
consider all of those that are possible.

All in all, this is an interesting discussion.  8-).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
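
The "hot engine scheduling" idea in this message (hand the next
request to whichever worker most recently went idle, since its pages
are the most likely to still be in core) amounts to keeping idle
workers on a stack rather than in a FIFO queue.  The sketch below is a
user-space illustration under that reading only; it is not the NetWare
for UNIX streams MUX, and struct idle_stack and the idle_*() functions
are hypothetical names.

```c
#include <stddef.h>

#define MAX_WORKERS 16

/* Idle workers, most recently idle on top. */
struct idle_stack {
    int    id[MAX_WORKERS];
    size_t top;
};

void idle_init(struct idle_stack *s)
{
    s->top = 0;
}

/* A worker has finished a request and is waiting for the next one.
 * Returns 0 on success, -1 if the stack is full. */
int idle_push(struct idle_stack *s, int worker)
{
    if (s->top == MAX_WORKERS)
        return -1;
    s->id[s->top++] = worker;
    return 0;
}

/* Choose the worker for the next request: last in, first out.
 * Returns the worker id, or -1 if no worker is idle. */
int idle_pop(struct idle_stack *s)
{
    if (s->top == 0)
        return -1;
    return s->id[--s->top];
}
```

Under light load the same one or two workers keep being selected and
the rest stay untouched, which is the intended effect; a FIFO would
instead rotate through every worker in turn.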
* Re: kqueue microbenchmark results
From: Dan Kegel @ 2000-10-28 0:24 UTC
To: Terry Lambert
Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat, linux-kernel

Terry Lambert wrote:
>
> > > Which is precisely why you need to know where in the chain of events this
> > > happened. Otherwise if I see
> > >         'read on fd 5'
> > >         'read on fd 5'
> > > How do I know which read is for which fd in the multithreaded case
> >
> > That can't happen, can it?  Let's say the following happens:
> >    close(5)
> >    accept() = 5
> >    call kevent() and rebind fd 5
> > The 'close(5)' would remove the old fd 5 events.  Therefore,
> > any fd 5 events you see returned from kevent are for the new fd 5.
>
> Strictly speaking, it can happen in two cases:
>
> 1)      single acceptor thread, multiple worker threads
> 2)      multiple anonymous "work to do" threads
>
> In both these cases, the incoming requests from a client are
> given to any thread, rather than a particular thread.
>
> In the first case, we can have (id:executer order:event):
>
> 1:1:open 5
> 2:2:read 5
> 3:4:read 5
> 2:3:close 5
>
> If thread 2 processes the close event before thread 3 processes
> the read event, then when thread 3 attempts processing, it will
> fail.

You're not talking about kqueue() / kevent() here, are you?
With that interface, thread 2 would not see a close event;
instead, the other events for fd 5 would vanish from the queue.
If you were indeed talking about kqueue() / kevent(), please flesh
out the example a bit more, showing who calls kevent().

(A race that *can* happen is fd 5 could be closed by another
thread after a 'read 5' event is pulled from the event queue and
before it is processed, but that could happen with any
readiness notification API at all.)
- Dan
Thread overview: 13+ messages
[not found] <200010260610.XAA11949@usr08.primenet.com>
2000-10-26 18:08 ` kqueue microbenchmark results Terry Lambert
[not found] <20001025172702.B89038@prism.flugsvamp.com>
[not found] ` <NCBBLIEPOCNJOAEKBEAKCEOPLHAA.davids@webmaster.com>
[not found] ` <20001025161837.D28123@fw.wintelcom.net>
2000-10-27 15:20 ` Jamie Lokier
2000-10-27 16:03 ` Alfred Perlstein
[not found] <20001024225637.A54554@prism.flugsvamp.com>
[not found] ` <39F6655A.353FD236@alumni.caltech.edu>
[not found] ` <20001025010246.B57913@prism.flugsvamp.com>
[not found] ` <20001025112709.A1500@stormix.com>
[not found] ` <20001025122307.B78130@prism.flugsvamp.com>
[not found] ` <20001025114028.F12064@stormix.com>
[not found] ` <20001025165626.B87091@prism.flugsvamp.com>
[not found] ` <39F7F66C.55B158@cisco.com>
2000-10-26 16:50 ` Jonathan Lemon
2000-10-27 0:50 ` Alan Cox
2000-10-27 1:02 ` Alfred Perlstein
2000-10-27 1:10 ` Jonathan Lemon
2000-10-27 1:32 ` Alan Cox
2000-10-27 1:46 ` Alfred Perlstein
2000-10-27 16:21 ` Dan Kegel
2000-10-27 16:42 ` Alfred Perlstein
2000-10-27 23:08 ` Terry Lambert
2000-10-28 0:24 ` Dan Kegel