public inbox for linux-kernel@vger.kernel.org
* Re: kqueue microbenchmark results
       [not found]             ` <39F7F66C.55B158@cisco.com>
@ 2000-10-26 16:50               ` Jonathan Lemon
  2000-10-27  0:50                 ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Lemon @ 2000-10-26 16:50 UTC (permalink / raw)
  To: Gideon Glass; +Cc: Jonathan Lemon, Simon Kirby, Dan Kegel, chat, linux-kernel

On Thu, Oct 26, 2000 at 02:16:28AM -0700, Gideon Glass wrote:
> Jonathan Lemon wrote:
> > 
> > Also, consider the following scenario for the proposed get_event():
> > 
> >    1. packet arrives, queues an event.
> >    2. user retrieves event.
> >    3. second packet arrives, queues event again.
> >    4. user reads() all data.
> > 
> > Now, next time around the loop, we get a notification for an event
> > when there is no data to read.  The application now must be prepared
> > to handle this case (meaning no blocking read() calls can be used).
> > 
> > Also, what happens if the user closes the socket after step 4 above?
> 
> Depends on the implementation.  If the item in the queue is the
> struct file (or whatever an fd indexes to), then the implementation
> can only queue the fd once.  This also avoids the problem with
> closing sockets - close() would naturally do a list_del() or whatever
> on the struct file.
> 
> At least I think it could be implemented this way...

kqueue currently does this; a close() on an fd will remove any pending
events from the queues that they are on which correspond to that fd.
I was trying to point out that it isn't as simple as it would seem at
first glance, as you have to consider issues like this.  Also, if the
implementation allows multiple event types per fd (leading to multiple
queued events per fd), there is no longer a 1:1 mapping to something like
'struct file', and performing a list walk doesn't scale very well.
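The scenario above is why a consumer of such events must drain with non-blocking reads: a notification can outlive the data it announced.  A minimal illustration using a pipe (Python's os/fcntl calls standing in for the C fcntl()/read() being discussed):

```python
import os, fcntl, errno

r, w = os.pipe()
# Mark the read side non-blocking: a queued event may refer to data
# that a previous read() already consumed (steps 1-4 above).
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

os.write(w, b"packet")                  # packet arrives, event queued
assert os.read(r, 4096) == b"packet"    # user read()s all data

# A later notification with nothing left to read must not block;
# with O_NONBLOCK the read fails with EAGAIN instead.
try:
    os.read(r, 4096)
    got_eagain = False
except OSError as e:
    got_eagain = (e.errno == errno.EAGAIN)
assert got_eagain
os.close(r); os.close(w)
```

The same pattern applies to sockets: without O_NONBLOCK, the second read() would block the event loop.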
--
Jonathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kqueue microbenchmark results
       [not found] <200010260610.XAA11949@usr08.primenet.com>
@ 2000-10-26 18:08 ` Terry Lambert
  0 siblings, 0 replies; 13+ messages in thread
From: Terry Lambert @ 2000-10-26 18:08 UTC (permalink / raw)
  To: Terry Lambert
  Cc: Alfred Perlstein, David Schwartz, Jonathan Lemon, chat,
	linux-kernel

This is a long posting, with a humble beginning, but it has
a point.  I'm being complete so that no one is left in the
dark, or in any doubt as to what that point is.  That means
rehashing some history.

This posting is not really about select or Linux: it's about
interfaces.  Like cached state, interfaces can often be
harmful.

NB: I really should redirect this to FreeBSD, as well, since
there are people in that camp who haven't learned the lesson,
either, but I'll leave it in -chat, for now.

---

[ ... kqueue discussion ... ]

> Linux also thought it was OK to modify the contents of the
> timeval structure before returning it.

It's been pointed out that I should provide more context
for this statement, before people look at me strangely and
make circling motions with their index fingers around
their ears (or whatever the international sign for "crazy"
is these days).  So I'll start with a brief history.

The context is this: the select API was designed with the
idea that one might wish to do non-I/O related background
processing.  Toward this end, one could have several ways
of using the API:

1)	The (struct timeval *) could be NULL.  This means
	"block until a signal or until a condition on
	which you are selecting is true"; select is a BSD
	interface, and, until BSD 4.x and POSIX signals,
	the signal would actually call the handler and
	restart the select call, so in effect, this really
	meant "block until you longjmp out of a signal
	handler or until a condition on which you are
	selecting is true".

2)	The (struct timeval *) could point to the address
	of a real timeval structure (i.e. not be NULL); in
	that case, the result depended on the contents:

	a)	If the timeval struct was zero valued, it
		meant that the select should poll for one
		of the conditions being selected for in
		the descriptor set, and return a 0 if no
		conditions were true.  The contents of
		the bitmaps and timeval struct were left
		alone.

	b)	If the timeval struct was not zero valued,
		it meant that the select should wait until
		the time specified had expired since the
		system call was first started, or one of
		the conditions being selected for was true.
		If the timeout expired, then a 0 would be
		returned, but if one or more of the conditions
		were true, the number of descriptors on which
		true conditions existed would be returned.
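The cases above can be seen directly; Python's select.select mirrors the C semantics (None for case 1, a zero timeout for case 2a, a positive timeout for case 2b), so this is an illustrative sketch rather than the historical BSD code:

```python
import os, select, time

r, w = os.pipe()

# Case 2a: a zero-valued timeout polls and returns immediately.
assert select.select([r], [], [], 0) == ([], [], [])

# Case 2b: a non-zero timeout waits until expiry, returning no
# descriptors if nothing became ready...
t0 = time.monotonic()
assert select.select([r], [], [], 0.2) == ([], [], [])
assert time.monotonic() - t0 >= 0.15

# ...or returns early with the ready descriptors once a selected
# condition becomes true.  (Case 1, a NULL/None timeout, would block
# indefinitely here, so it is only noted in this comment.)
os.write(w, b"x")
readable, _, _ = select.select([r], [], [], 5.0)
assert readable == [r]
os.close(r); os.close(w)
```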

Wedging so much into a single interface was fraught with peril:
it was undefined as to what would happen if the timeval specified
an interval of 5 seconds, yet there was a persistently rescheduled
alarm every 2 seconds, resulting in a signal handler call that did
_not_ longjmp... would the timer expire after 5 seconds, or would
the timer be considered to have been restarted along with the call?
Implementations that went both ways existed.  Mostly, programmers
used longjmp in signal handlers, and it wasn't a portability issue.

More perilous was the question of what to do with a partially
satisfied request that was interrupted by a timer or signal
handler and longjmp (later, siginterrupt(2), and later POSIX
non-restart default behaviour).  This meant that the bitmap of
select events might have been modified already, after the
wakeup, but before the process was rescheduled to run.

Finally, the select manual page specifically reserved the right
to modify the contents of the timeval struct; this was presumably
so that you could either do accurate timekeeping by maintaining
a running tally using the timeval deficit (a lot of math, that),
or, more likely, to deal with the system call restart, and ensure
that signals would not prevent the select from ever exiting in
the case of system call restart.

So this was the select API definition.

---

Being pragmatists, programmers programmed to the behaviour of
the API in actual implementations, rather than to the strict
"letter of the law" laid down by the man page.  This meant
that select was called in loop control constructs, and that
the bitmaps were reinitialized each time through the loop.

It also meant that the timeval struct was not reinitialized,
since that was more work, and no known implementations would
modify it.  Pre-POSIX signals, signal handlers were handled on
a signal stack, as a result of a kernel trampoline outcall,
and that meant that a restarting system call would not impact
the countdown.

---

Linux came along, and implemented the letter of the law; the
machines were now sufficiently fast, and the math sufficiently
cheap, that it was possible to do usefully accurate timekeeping
by keeping a running tally using the timeval deficit.  So they
implemented it: it was more useful than the historical behaviour
on most platforms.

And every program which used non-zero valued timeval struct
contents, and assumed that they would not be modified, broke.
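The portable defense, on any platform, is to never reuse the timeout (or the descriptor sets) across calls: recompute both from a deadline on every trip around the loop.  A sketch of the pattern (Python's select does not itself mutate its arguments, but the loop structure is exactly what a C caller needs):

```python
import os, select, time

r, w = os.pipe()
os.write(w, b"x")                 # one event will be ready immediately

deadline = time.monotonic() + 0.3
events = 0
while True:
    # Recompute the timeout (and, in C, rebuild the fd_sets and the
    # struct timeval) on every iteration: Linux's select() is allowed
    # to update the timeval in place, so reusing it is not portable.
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        break
    readable, _, _ = select.select([r], [], [], remaining)
    for fd in readable:
        os.read(fd, 4096)
        events += 1

assert events == 1                # one wakeup, then a clean timeout
os.close(r); os.close(w)
```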

---

And here we see the problem with defining interfaces instead of
defining protocols.  A protocol is unambiguous with regard to
implementation details.  But an API, unless a lot of work takes
place to make it sufficiently abstract, and a lot more work
takes place to define exactly what will happen in all allowed
conditions, and to preclude the possibility of undefined
behaviour, simply can not hide implementation details.



If what people are trying to do here is define a cross-platform
system interface (and if they succeed, it will be the first one
forced on mainstream UNIX by the Open Source community), then
it means that careful design which eliminates ambiguity is the
single most important consideration.  There can be no undefined
behaviour, like that of select's timeval struct updating, or the
equally ambiguous, but less problematic, bitmap content partial
update -- which could bite people on a new platform, but so far
has not.

---

I have seen the BSD kqueue interface called "overengineered";
but people apparently don't realize that it is not so much
that it has been thought out to that level of detail beforehand,
as it is that it is on its third revision.  It wasn't really
overengineered to where it is today: it has matured to where it
is today.

Just as poll (however much I disdain it in favor of select's
more universal platform portability) is a more mature interface
than select, and resolves problems in the select design.  Poll is
not an overengineered interface; it is a more mature version of
the select interface.

---

FWIW: except for platform-specific applications, which I've tried
very hard to avoid writing since the early 1980's or so, I will
probably be very conservative in my adoption of a kqueue interface,
whatever it ends up looking like, just as I've been conservative in
my adoption of poll (and, until 1989, my adoption of select, since
there are other ways to solve the multiple input stream problem,
without needing a select, poll, or kqueue, and which work all
the way back to V7 UNIX).  Unless there's a problem that can not
be solved in any other way, such as performance or footprint, I'll
stick to tools that are cross-platform.

On general principles, it'd be a good idea if BSD and Linux
ended up with the same unambiguous interface.  The wider an
interface is adopted, the quicker you will see people who can't
afford to be nailed to the cross of a single platform willing
to adopt it in their code.  Ambiguity of any kind will hinder
that adoption, and would certainly prevent adoption by mainstream
UNIX: if you have to code it differently on different platforms,
then you might as well code it differently on their platform, too.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

* Re: kqueue microbenchmark results
  2000-10-26 16:50               ` Jonathan Lemon
@ 2000-10-27  0:50                 ` Alan Cox
  2000-10-27  1:02                   ` Alfred Perlstein
  2000-10-27  1:10                   ` Jonathan Lemon
  0 siblings, 2 replies; 13+ messages in thread
From: Alan Cox @ 2000-10-27  0:50 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Gideon Glass, Jonathan Lemon, Simon Kirby, Dan Kegel, chat,
	linux-kernel

> kqueue currently does this; a close() on an fd will remove any pending
> events from the queues that they are on which correspond to that fd.

This seems an odd thing to do. Surely what you need to do is to post a
'close completed' event to the queue. This also makes more sense when you
have a threaded app and another thread may well currently be in say a read
at the time it is closed


* Re: kqueue microbenchmark results
  2000-10-27  0:50                 ` Alan Cox
@ 2000-10-27  1:02                   ` Alfred Perlstein
  2000-10-27  1:10                   ` Jonathan Lemon
  1 sibling, 0 replies; 13+ messages in thread
From: Alfred Perlstein @ 2000-10-27  1:02 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat,
	linux-kernel

* Alan Cox <alan@lxorguk.ukuu.org.uk> [001026 17:50] wrote:
> > kqueue currently does this; a close() on an fd will remove any pending
> > events from the queues that they are on which correspond to that fd.
> 
> This seems an odd thing to do. Surely what you need to do is to post a
> 'close completed' event to the queue. This also makes more sense when you
> have a threaded app and another thread may well currently be in say a read
> at the time it is closed

Kqueue's flexibility could allow this to be implemented, all you
would need to do is make a new filter trigger.  You might need
a _bit_ of hackery to make sure those aren't removed, or one
could just add the event after clearing all pending events.

Adding a filter to be informed when a specific fd is closed is
certainly an option, but it doesn't make very much sense, because
that fd could then be reused quickly by something else...

but anyhow:

The point of this interface is to ask kqueue to report only on the
things you are interested in, not to generate superfluous events
that you wouldn't care about.  You could add such a flag if Linux
adopted this interface, and I'm sure we'd be forced to adopt it as
well, but if you make kqueue generate info an application won't
care about, I don't think that would go over well.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

* Re: kqueue microbenchmark results
  2000-10-27  0:50                 ` Alan Cox
  2000-10-27  1:02                   ` Alfred Perlstein
@ 2000-10-27  1:10                   ` Jonathan Lemon
  2000-10-27  1:32                     ` Alan Cox
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Lemon @ 2000-10-27  1:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat,
	linux-kernel

On Fri, Oct 27, 2000 at 01:50:40AM +0100, Alan Cox wrote:
> > kqueue currently does this; a close() on an fd will remove any pending
> > events from the queues that they are on which correspond to that fd.
> 
> This seems an odd thing to do. Surely what you need to do is to post a
> 'close completed' event to the queue. This also makes more sense when you
> have a threaded app and another thread may well currently be in say a read
> at the time it is closed

Actually, it makes sense when you think about it.  The `fd' is actually
a capability that the application uses to refer to the open file in the
kernel.  If the app does a close() on the fd, it destroys this naming.

The application then has no capability left which refers to the formerly
open socket, and conversely, the kernel has no capability (name) to notify
the application of a close event.  What can I say, "the fd formerly known
as X" is now gone?  It would be incorrect to say that "fd X was closed",
since X no longer refers to anything, and the application may have reused
that fd for another file.

As for the multi-thread case, this would be a bug; if one thread closes
the descriptor, the other thread is going to get an EBADF when it goes 
to perform the read.
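The EBADF case is easy to demonstrate; what cannot be demonstrated is any event naming the old fd, since the name is gone.  A minimal sketch with a pipe (Python's os.read standing in for the C read()):

```python
import os, errno

r, w = os.pipe()
os.close(r)          # one "thread" closes the descriptor

# The other "thread" now tries to read it: the capability (the fd
# number) no longer refers to an open file, so the kernel can only
# answer EBADF -- it has no name left with which to say more.
try:
    os.read(r, 1)
    err = None
except OSError as e:
    err = e.errno
assert err == errno.EBADF
os.close(w)
```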
--
Jonathan

* Re: kqueue microbenchmark results
  2000-10-27  1:10                   ` Jonathan Lemon
@ 2000-10-27  1:32                     ` Alan Cox
  2000-10-27  1:46                       ` Alfred Perlstein
  2000-10-27 16:21                       ` Dan Kegel
  0 siblings, 2 replies; 13+ messages in thread
From: Alan Cox @ 2000-10-27  1:32 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel,
	chat, linux-kernel

> the application of a close event.  What can I say, "the fd formerly known
> as X" is now gone?  It would be incorrect to say that "fd X was closed",
> since X no longer refers to anything, and the application may have reused
> that fd for another file.

Which is precisely why you need to know where in the chain of events this
happened. Otherwise if I see

	'read on fd 5'
	'read on fd 5'

How do I know which read is for which fd in the multithreaded case

> As for the multi-thread case, this would be a bug; if one thread closes
> the descriptor, the other thread is going to get an EBADF when it goes 
> to perform the read.

Another thread may already have reused the fd


* Re: kqueue microbenchmark results
  2000-10-27  1:32                     ` Alan Cox
@ 2000-10-27  1:46                       ` Alfred Perlstein
  2000-10-27 16:21                       ` Dan Kegel
  1 sibling, 0 replies; 13+ messages in thread
From: Alfred Perlstein @ 2000-10-27  1:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, Dan Kegel, chat,
	linux-kernel

* Alan Cox <alan@lxorguk.ukuu.org.uk> [001026 18:33] wrote:
> > the application of a close event.  What can I say, "the fd formerly known
> > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > since X no longer refers to anything, and the application may have reused
> > that fd for another file.
> 
> Which is precisely why you need to know where in the chain of events this
> happened. Otherwise if I see
> 
> 	'read on fd 5'
> 	'read on fd 5'
> 
> How do I know which read is for which fd in the multithreaded case

No, you don't; you don't see anything with the current code unless
fd 5 is still around.  What you're presenting to Jonathan is an
application threading problem, not something that needs to be
resolved by the OS.

> > As for the multi-thread case, this would be a bug; if one thread closes
> > the descriptor, the other thread is going to get an EBADF when it goes 
> > to perform the read.
> 
> Another thread may already have reused the fd

This is another example of an application threading problem.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

* Re: kqueue microbenchmark results
       [not found]   ` <20001025161837.D28123@fw.wintelcom.net>
@ 2000-10-27 15:20     ` Jamie Lokier
  2000-10-27 16:03       ` Alfred Perlstein
  0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2000-10-27 15:20 UTC (permalink / raw)
  To: Alfred Perlstein; +Cc: David Schwartz, Jonathan Lemon, chat, linux-kernel

Alfred Perlstein wrote:
> > If a programmer does not ever wish to block under any circumstances, it's
> > his obligation to communicate this desire to the implementation. Otherwise,
> > the implementation can block if it doesn't have data or an error available
> > at the instant 'read' is called, regardless of what it may have known or
> > done in the past.
> 
> Yes, and as you mentioned, it was _bugs_ in the operating system
> that did this.

Not for writes.  POLLOUT may be returned when the kernel thinks you have
enough memory to do a write, but someone else may allocate memory before
you call write().  Or does POLLOUT not work this way?

For read, you still want to declare the sockets non-blocking so your
code is robust on _other_ operating systems.  It's pretty straightforward.

-- Jamie

* Re: kqueue microbenchmark results
  2000-10-27 15:20     ` Jamie Lokier
@ 2000-10-27 16:03       ` Alfred Perlstein
  0 siblings, 0 replies; 13+ messages in thread
From: Alfred Perlstein @ 2000-10-27 16:03 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: David Schwartz, Jonathan Lemon, chat, linux-kernel

* Jamie Lokier <lk@tantalophile.demon.co.uk> [001027 08:21] wrote:
> Alfred Perlstein wrote:
> > > If a programmer does not ever wish to block under any circumstances, it's
> > > his obligation to communicate this desire to the implementation. Otherwise,
> > > the implementation can block if it doesn't have data or an error available
> > > at the instant 'read' is called, regardless of what it may have known or
> > > done in the past.
> > 
> > Yes, and as you mentioned, it was _bugs_ in the operating system
> > that did this.
> 
> Not for writes.  POLLOUT may be returned when the kernel thinks you have
> enough memory to do a write, but someone else may allocate memory before
> you call write().  Or does POLLOUT not work this way?

POLLOUT checks the socket buffer (if we're talking about sockets),
and yes you may still block on mbuf allocation (if we're talking
about FreeBSD) if the socket isn't set non-blocking.  Actually
POLLOUT may be set even if there isn't enough memory for a write
in the network buffer pool.
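That distinction — POLLOUT reports buffer space, not a guarantee that write() cannot block — can be seen with the basic poll() check (a sketch using Python's wrapper over the same C call, and a pipe in place of a socket):

```python
import os, select

r, w = os.pipe()
p = select.poll()
p.register(w, select.POLLOUT)

# A fresh pipe has an empty buffer, so the write side reports POLLOUT.
ready = p.poll(0)
assert ready and ready[0][0] == w and (ready[0][1] & select.POLLOUT)

# Even so, a later write() can still block inside the kernel (e.g. on
# buffer allocation) unless the descriptor is also set non-blocking:
# POLLOUT is advice about the buffer, not a contract about write().
os.close(r); os.close(w)
```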

> For read, you still want to declare the sockets non-blocking so your
> code is robust on _other_ operating systems.  It's pretty straightforward.

Yes, it's true, not using non-blocking sockets is like ignoring
friction in a physics problem, but assuming you have complete
control over the machine it shouldn't trip you up that often.  And
we're talking about readability, not writeability which as you
mentioned may block because of contention for the network buffer
pool.


-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

* Re: kqueue microbenchmark results
  2000-10-27  1:32                     ` Alan Cox
  2000-10-27  1:46                       ` Alfred Perlstein
@ 2000-10-27 16:21                       ` Dan Kegel
  2000-10-27 16:42                         ` Alfred Perlstein
  2000-10-27 23:08                         ` Terry Lambert
  1 sibling, 2 replies; 13+ messages in thread
From: Dan Kegel @ 2000-10-27 16:21 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jonathan Lemon, Gideon Glass, Simon Kirby, chat, linux-kernel

Alan Cox wrote:
> > > kqueue currently does this; a close() on an fd will remove any pending
> > > events from the queues that they are on which correspond to that fd.
> > 
> > the application of a close event.  What can I say, "the fd formerly known
> > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > since X no longer refers to anything, and the application may have reused
> > that fd for another file.
> 
> Which is precisely why you need to know where in the chain of events this
> happened. Otherwise if I see
> 
>         'read on fd 5'
>         'read on fd 5'
> 
> How do I know which read is for which fd in the multithreaded case

That can't happen, can it?  Let's say the following happens:
   close(5)
   accept() = 5
   call kevent() and rebind fd 5
The 'close(5)' would remove the old fd 5 events.  Therefore,
any fd 5 events you see returned from kevent are for the new fd 5.

(I suspect it helps that kevent() is both the only way to
bind events and the only way to pick them up; makes it harder
for one thread to sneak a new fd into the event list without
the thread calling kevent() noticing.)
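The reuse in that scenario is not hypothetical: POSIX hands out the lowest-numbered free descriptor, so a close() followed by an accept() or open() typically yields the very same number.  A small sketch:

```python
import os

fd1 = os.open("/dev/null", os.O_RDONLY)
os.close(fd1)

# POSIX allocates the lowest-numbered free descriptor, so the number
# comes straight back -- any stale event still naming fd1 would now
# describe a completely different open file.
fd2 = os.open("/dev/null", os.O_RDONLY)
assert fd2 == fd1
os.close(fd2)
```

This is exactly why close() removing the old events (rather than posting a "closed" event under a recycled name) keeps the kevent() results unambiguous.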

- Dan

* Re: kqueue microbenchmark results
  2000-10-27 16:21                       ` Dan Kegel
@ 2000-10-27 16:42                         ` Alfred Perlstein
  2000-10-27 23:08                         ` Terry Lambert
  1 sibling, 0 replies; 13+ messages in thread
From: Alfred Perlstein @ 2000-10-27 16:42 UTC (permalink / raw)
  To: Dan Kegel
  Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat,
	linux-kernel

* Dan Kegel <dank@alumni.caltech.edu> [001027 09:40] wrote:
> Alan Cox wrote:
> > > > kqueue currently does this; a close() on an fd will remove any pending
> > > > events from the queues that they are on which correspond to that fd.
> > > 
> > > the application of a close event.  What can I say, "the fd formerly known
> > > as X" is now gone?  It would be incorrect to say that "fd X was closed",
> > > since X no longer refers to anything, and the application may have reused
> > > that fd for another file.
> > 
> > Which is precisely why you need to know where in the chain of events this
> > happened. Otherwise if I see
> > 
> >         'read on fd 5'
> >         'read on fd 5'
> > 
> > How do I know which read is for which fd in the multithreaded case
> 
> That can't happen, can it?  Let's say the following happens:
>    close(5)
>    accept() = 5
>    call kevent() and rebind fd 5
> The 'close(5)' would remove the old fd 5 events.  Therefore,
> any fd 5 events you see returned from kevent are for the new fd 5.
> 
> (I suspect it helps that kevent() is both the only way to
> bind events and the only way to pick them up; makes it harder
> for one thread to sneak a new fd into the event list without
> the thread calling kevent() noticing.)

Yes, that's how it does and should work.  Noticing the close()
should be done via thread communication/IPC not stuck into
kqueue.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

* Re: kqueue microbenchmark results
  2000-10-27 16:21                       ` Dan Kegel
  2000-10-27 16:42                         ` Alfred Perlstein
@ 2000-10-27 23:08                         ` Terry Lambert
  2000-10-28  0:24                           ` Dan Kegel
  1 sibling, 1 reply; 13+ messages in thread
From: Terry Lambert @ 2000-10-27 23:08 UTC (permalink / raw)
  To: dank
  Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat,
	linux-kernel

> > Which is precisely why you need to know where in the chain of events this
> > happened. Otherwise if I see
> > 
> >         'read on fd 5'
> >         'read on fd 5'
> > 
> > How do I know which read is for which fd in the multithreaded case
> 
> That can't happen, can it?  Let's say the following happens:
>    close(5)
>    accept() = 5
>    call kevent() and rebind fd 5
> The 'close(5)' would remove the old fd 5 events.  Therefore,
> any fd 5 events you see returned from kevent are for the new fd 5.
> 
> (I suspect it helps that kevent() is both the only way to
> bind events and the only way to pick them up; makes it harder
> for one thread to sneak a new fd into the event list without
> the thread calling kevent() noticing.)

Strictly speaking, it can happen in two cases:

1)	single acceptor thread, multiple worker threads

2)	multiple anonymous "work to do" threads

In both these cases, the incoming requests from a client are
given to any thread, rather than a particular thread.

In the first case, we can have (id:executer order:event):

1:1:open 5
2:2:read 5
3:4:read 5
2:3:close 5

If thread 2 processes the close event before thread 3 processes
the read event, then when thread 3 attempts processing, it will
fail.

Technically, this is a group ordering problem in the design of
the software, which should instead queue all events to a dispatch
thread, and the threads should use IPC to serialize processing of
serial events.  This is similar to the problem with async mounted
FS recovery in event of a crash: without ordering guarantees, you
can only get to a "good" state, not necessarily "the one correct
state".

In the second case, we can have:

1:2:read 5
2:1:open 5
3:4:read 5
2:3:close 5

This is just a non-degenerate form of the first case, where we
allow thread 1 and all other threads to be identical, and don't
serialize open state initialization.

The NetWare for UNIX system uses this model.  The benefit is
that all user space threads can be identical.  This means that
I can use either threads or processes, and it won't matter, so
my software can run on older systems that lack "perfect" threads
models, simply by using processes, and putting client state into
shared memory.

In this case, there is no need for inter-thread synchronization;
instead, we must insist that events be dispatched sequentially,
and that the events be processed serially.  This effectively
requires event processing completion notification from user
space to kernel space.

In NetWare for UNIX, this was accomplished using a streams MUX
which knew that the NetWare protocol was request-response.  This
also permitted "busy" responses to be turned around in kernel
space, without incurring a kernel-to-user space scheduling
penalty.  It also permitted "piggyback", where an ioctl to the
mux was used to respond, and combined sending a response with
the next read.  This reduced protection domain crossing and the
context switch overhead by 50%.  Finally, the MUX sent requests
to user space in LIFO order.  This approach is called "hot engine
scheduling", in that the last reader in from user space is the
most likely to have its pages in core, so as to not need swapping
to handle the next request.

I was architect of much of the process model discussed above; as
you can see, there are some significant performance wins to be
had by building the right interfaces, and putting the code on
the right side of the user/kernel boundary.

In any case, the answer is that you can not assume that the only
correct way to solve a problem like event inversion is serialization
of events in user space (or kernel space).  This is not strictly a
"threaded application implementation" issue, and it is not strictly
a kernel serialization of event delivery issue.

Another case, which NetWare did not handle, is that of rejected
authentication.  Even if you went with the first model, and forced
your programmers to use expensive inter-thread synchronization, or
worse, bound each client to a single thread in the server, thus
rendering the system likely to have skewed thread load, getting
worse the longer the connection was up, you would still have the
problem of rejected authentication.  A client might attempt to
send authentication followed by commands in the same packet series,
without waiting for an explicit ACK after each one (i.e. it might
attempt to implement a sliding window over a virtual circuit), and
the system on the other end might dilligently queue the events,
only to have the authentication be rejected, but with packets
queued already to user space for processing, assuming serialization
in user space.  You would then need a much more complex mechanism,
to allow you to invalidate an already queued event to another
thread, which you don't know about in your thread, before you
release the interlock.  Otherwise the client may get responses
without a valid authentication.

You need look no further than LDAPv3 for an example of a protocol
where this is possible (assuming X.509 certificate based SASL
authentication, where authentication is not challenge-response,
but where it consists solely of presenting ones certificate).


When considering this API, you have to consider more than just
the programming models you think are "right", you have to
consider all of the models that are possible.

All in all, this is an interesting discussion.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: kqueue microbenchmark results
  2000-10-27 23:08                         ` Terry Lambert
@ 2000-10-28  0:24                           ` Dan Kegel
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Kegel @ 2000-10-28  0:24 UTC (permalink / raw)
  To: Terry Lambert
  Cc: Alan Cox, Jonathan Lemon, Gideon Glass, Simon Kirby, chat,
	linux-kernel

Terry Lambert wrote:
> 
> > > Which is precisely why you need to know where in the chain of events this
> > > happened. Otherwise if I see
> > >         'read on fd 5'
> > >         'read on fd 5'
> > > How do I know which read is for which fd in the multithreaded case
> >
> > That can't happen, can it?  Let's say the following happens:
> >    close(5)
> >    accept() = 5
> >    call kevent() and rebind fd 5
> > The 'close(5)' would remove the old fd 5 events.  Therefore,
> > any fd 5 events you see returned from kevent are for the new fd 5.
> 
> Strictly speaking, it can happen in two cases:
> 
> 1)      single acceptor thread, multiple worker threads
> 2)      multiple anonymous "work to do" threads
> 
> In both these cases, the incoming requests from a client are
> given to any thread, rather than a particular thread.
> 
> In the first case, we can have (id:executer order:event):
> 
> 1:1:open 5
> 2:2:read 5
> 3:4:read 5
> 2:3:close 5
> 
> If thread 2 processes the close event before thread 3 processes
> the read event, then when thread 3 attempts processing, it will
> fail.

You're not talking about kqueue() / kevent() here, are you?
With that interface, thread 2 would not see a close event;
instead, the other events for fd 5 would vanish from the queue.
If you were indeed talking about kqueue() / kevent(), please flesh
out the example a bit more, showing who calls kevent().

(A race that *can* happen is fd 5 could be closed by another
thread after a 'read 5' event is pulled from the event queue and
before it is processed, but that could happen with any
readiness notification API at all.)
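A toy model of the semantics described above (hypothetical names
throughout; the real kqueue lives in the kernel): close() on an fd
drops that fd's still-pending events, so any event later returned for
a reused fd number necessarily belongs to the new descriptor.  The
only race left is an event already *pulled* before the close.

```c
#include <assert.h>

enum { MAXEV = 16 };

struct kq_model {
    int fd[MAXEV];   /* fd each pending event refers to */
    int len;
};

static void kq_post(struct kq_model *kq, int fd)
{
    kq->fd[kq->len++] = fd;
}

/* Model of close(): compact out all pending events for this fd. */
static void kq_close_fd(struct kq_model *kq, int fd)
{
    int out = 0;
    for (int i = 0; i < kq->len; i++)
        if (kq->fd[i] != fd)
            kq->fd[out++] = kq->fd[i];
    kq->len = out;
}

/* Model of kevent() retrieval: pop the oldest pending event,
 * or -1 if the queue is empty. */
static int kq_pull(struct kq_model *kq)
{
    if (kq->len == 0)
        return -1;
    int fd = kq->fd[0];
    for (int i = 1; i < kq->len; i++)
        kq->fd[i - 1] = kq->fd[i];
    kq->len--;
    return fd;
}
```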

- Dan


end of thread, other threads:[~2000-10-28  0:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200010260610.XAA11949@usr08.primenet.com>
2000-10-26 18:08 ` kqueue microbenchmark results Terry Lambert
     [not found] <20001025172702.B89038@prism.flugsvamp.com>
     [not found] ` <NCBBLIEPOCNJOAEKBEAKCEOPLHAA.davids@webmaster.com>
     [not found]   ` <20001025161837.D28123@fw.wintelcom.net>
2000-10-27 15:20     ` Jamie Lokier
2000-10-27 16:03       ` Alfred Perlstein
     [not found] <20001024225637.A54554@prism.flugsvamp.com>
     [not found] ` <39F6655A.353FD236@alumni.caltech.edu>
     [not found]   ` <20001025010246.B57913@prism.flugsvamp.com>
     [not found]     ` <20001025112709.A1500@stormix.com>
     [not found]       ` <20001025122307.B78130@prism.flugsvamp.com>
     [not found]         ` <20001025114028.F12064@stormix.com>
     [not found]           ` <20001025165626.B87091@prism.flugsvamp.com>
     [not found]             ` <39F7F66C.55B158@cisco.com>
2000-10-26 16:50               ` Jonathan Lemon
2000-10-27  0:50                 ` Alan Cox
2000-10-27  1:02                   ` Alfred Perlstein
2000-10-27  1:10                   ` Jonathan Lemon
2000-10-27  1:32                     ` Alan Cox
2000-10-27  1:46                       ` Alfred Perlstein
2000-10-27 16:21                       ` Dan Kegel
2000-10-27 16:42                         ` Alfred Perlstein
2000-10-27 23:08                         ` Terry Lambert
2000-10-28  0:24                           ` Dan Kegel
