public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] /dev/epoll update ...
@ 2001-09-21  6:22 Dan Kegel
  2001-09-21 18:45 ` Davide Libenzi
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Kegel @ 2001-09-21  6:22 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, Davide Libenzi

Davide wrote:
> If you need to request the current status of 
> a socket you've to f_ops->poll the fd.
> The cost of the extra read, done only for fds that are not "ready", is nothing
> compared to the cost of a linear scan with HUGE numbers of fds.

Hey, wait a sec, Davide... the whole point of the Solaris /dev/poll
is that you *don't* need to f_ops->poll the fd, I think.
And in fact, Solaris /dev/poll is insanely fast, way faster than O(N).

Consider this: what if we added to your patch logic to clear
the current read readiness bit for a fd whenever a read() on
that fd returned EWOULDBLOCK?  Then we're real close to having
the current readiness state for each fd, as the /dev/poll afficianados 
want.  Now, there's a lot more work that'd be needed, but maybe you
get the idea of where some of us are coming from.

Christopher K. St. John is requesting example code using /dev/epoll
that does not use coroutines.  Fair enough.  Christopher, take a look
at any program that uses the F_SETSIG/F_SETOWN/O_ASYNC/sigio stuff in the
2.4 kernel (for example, my Poller_sigio.cc at http://www.kegel.com/dkftpbench/dkftpbench-0.31.tar.gz )
and mentally replace the sigtimedwait() with Davide's ioctl, kinda.
The overhead of not knowing the initial poll state is at most one
or two system calls per fd over the life of the program, I think,
so it's not too bad.

- Dan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] /dev/epoll update ...
  2001-09-21  6:22 [PATCH] /dev/epoll update Dan Kegel
@ 2001-09-21 18:45 ` Davide Libenzi
  2001-09-21 19:40   ` /dev/yapoll : " Christopher K. St. John
  0 siblings, 1 reply; 8+ messages in thread
From: Davide Libenzi @ 2001-09-21 18:45 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel@vger.kernel.org


On 21-Sep-2001 Dan Kegel wrote:
> Davide wrote:
>> If you need to request the current status of 
>> a socket you've to f_ops->poll the fd.
>> The cost of the extra read, done only for fds that are not "ready", is nothing
>> compared to the cost of a linear scan with HUGE numbers of fds.
> 
> Hey, wait a sec, Davide... the whole point of the Solaris /dev/poll
> is that you *don't* need to f_ops->poll the fd, I think.
> And in fact, Solaris /dev/poll is insanely fast, way faster than O(N).

If the fd support hints, yes.


> Consider this: what if we added to your patch logic to clear
> the current read readiness bit for a fd whenever a read() on
> that fd returned EWOULDBLOCK?  Then we're real close to having
> the current readiness state for each fd, as the /dev/poll afficianados 
> want.  Now, there's a lot more work that'd be needed, but maybe you
> get the idea of where some of us are coming from.

Then you'll fall down to /dev/poll and /dev/epoll designed for "state change"
driven servers ( like rtsigs ).
Instead of requesting /dev/epoll changes to make it something that is not born for,
i think that the /dev/poll patch can be improved in a significant way.
The numbers i've got from my test left me quite a bit deluded.




- Davide


^ permalink raw reply	[flat|nested] 8+ messages in thread

* /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 18:45 ` Davide Libenzi
@ 2001-09-21 19:40   ` Christopher K. St. John
  2001-09-21 20:10     ` Davide Libenzi
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher K. St. John @ 2001-09-21 19:40 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org

Davide Libenzi wrote:
> 
> Instead of requesting /dev/epoll changes to make it
> something that is not born for, i think that the /dev/poll
> patch can be improved in a significant way.
>

 I think there's agreement that Davide doesn't want
to change his /dev/epoll code.

 So, as an experiment, I'm modifying /dev/epoll to
more closely match the interface described in:

  http://citeseer.nj.nec.com/banga99measuring.html

 The paper describes in detail an event based
notification mechanism for determining which fd's are
ready for processing. Linux-/dev/poll is, and 
/dev/epoll appears to be, a variant of the mechanism
described in the paper.

 To save further pointless argument, I'm calling the
experiment "/dev/yapoll". 

 Specifically, I've added code to return the initial
state of the fd's as they are added to the interest
list. It seems to work ok so far, but I'll be doing
some benchmarking this weekend. I will post a patch
if no problems turn up.

 Davide seems to think it would be better to start
with the Linux-/dev/poll patch, but I disagree
(/dev/epoll itself appears to be based on the
Linux-/dev/poll code) I guess I'll soon find out if
he was right.


-- 
Christopher St. John cks@distributopia.com
DistribuTopia http://www.distributopia.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 19:40   ` /dev/yapoll : " Christopher K. St. John
@ 2001-09-21 20:10     ` Davide Libenzi
  2001-09-21 20:21       ` Christopher K. St. John
  0 siblings, 1 reply; 8+ messages in thread
From: Davide Libenzi @ 2001-09-21 20:10 UTC (permalink / raw)
  To: Christopher K. St. John; +Cc: linux-kernel@vger.kernel.org


On 21-Sep-2001 Christopher K. St. John wrote:
>  To save further pointless argument, I'm calling the
> experiment "/dev/yapoll". 
> 
>  Specifically, I've added code to return the initial
> state of the fd's as they are added to the interest
> list. It seems to work ok so far, but I'll be doing
> some benchmarking this weekend. I will post a patch
> if no problems turn up.

By reporting the initial state of the connection will make
/dev/epoll to be a hybrid interface and looks pretty crappy to me.
You'll be able, eventually, to skip only the first system call anyway.
You still won't be able to use an interface like :

        if (ioctl(DATA_READY))
                read();

Coz if 2000 bytes arrives on the terminal and you read only 1000 bytes
you won't receive another POLLIN event and you'll get stuck in the ioctl().
You can avoid this in two ways :

1) test ( w/ or w/o hints ) the readyness of the terminal, that means /dev/poll

2) add inside the network code functions that are going to maintain the state
        of the connections directly by writing your own fd-state token.
        Tha means
        1) when the data is exhausted it clears the data-ready bit
        2 ) when the tx buffer is full it clears the terminal-ready bit

But, again, you're going to have a state reporting interface vs an event reporting
one like /dev/epoll




- Davide


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 20:10     ` Davide Libenzi
@ 2001-09-21 20:21       ` Christopher K. St. John
  2001-09-21 21:36         ` Davide Libenzi
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher K. St. John @ 2001-09-21 20:21 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: Davide Libenzi

Davide Libenzi wrote:
> 
> By reporting the initial state of the connection will
> make /dev/epoll to be a hybrid interface
>

 Yes, but you need that anyway (see below)


> and looks pretty crappy to me.
>

 Talk to the people who wrote the paper (and won
"Outstanding Paper" for it at Usenix). The paper
is quite convincing, so I'm afraid I'll have to
disagree. But as I said, I'll know more when I've
tested further.


 It turns out that a hybrid interface is needed
in any case to handle overload. When the queues
start to fill up, you need to back off and start
basically doing something like a plain-old-poll()
instead. Ref the paper. Here's a link to a kernel
list dicussion that covers similiar ground:

  http://kt.linuxcare.com/kernel-traffic/kt20001113_93.epl

 Also, merging events for the same fd, which
everyone seems to agree is a good thing, is
in fact a "hybrid" approach, right? Since 
you're now tracking state (albeit only between
calls to ioctl(EP_POLL).

 I intend to use some of the original Linux
/dev/poll code, as well as some of yours, but
it's a new patch with a new name.


-- 
Christopher St. John cks@distributopia.com
DistribuTopia http://www.distributopia.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 21:36         ` Davide Libenzi
@ 2001-09-21 21:33           ` Christopher K. St. John
  2001-09-21 21:52             ` Davide Libenzi
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher K. St. John @ 2001-09-21 21:33 UTC (permalink / raw)
  To: Davide Libenzi, linux-kernel@vger.kernel.org

Davide Libenzi wrote:
> 
> "Did you read and understood the /dev/epoll code ?"
> 

 Did you read and understand the Banga99 paper?

 But this is getting silly. We've agreed that
you don't like the changes, and I have agreed
to implement some of them in a new patch.

 Either I will implement what I said I would,
or I won't. And either it will be a better
mechanism, or it won't. 

 I suggest and end to pointless, insulting
sniping. Ok? 


-- 
Christopher St. John cks@distributopia.com
DistribuTopia http://www.distributopia.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 20:21       ` Christopher K. St. John
@ 2001-09-21 21:36         ` Davide Libenzi
  2001-09-21 21:33           ` Christopher K. St. John
  0 siblings, 1 reply; 8+ messages in thread
From: Davide Libenzi @ 2001-09-21 21:36 UTC (permalink / raw)
  To: Christopher K. St. John; +Cc: linux-kernel@vger.kernel.org


On 21-Sep-2001 Christopher K. St. John wrote:
> Davide Libenzi wrote:
>> 
>> By reporting the initial state of the connection will
>> make /dev/epoll to be a hybrid interface
>>
> 
>  Yes, but you need that anyway (see below)
> 
> 
>> and looks pretty crappy to me.
>>
> 
>  It turns out that a hybrid interface is needed
> in any case to handle overload. When the queues
> start to fill up, you need to back off and start
> basically doing something like a plain-old-poll()
> instead. Ref the paper. Here's a link to a kernel
> list dicussion that covers similiar ground:

Now, my question born spontaneously :

"Did you read and understood the /dev/epoll code ?"

If yes, could you explain to me a case where /dev/epoll users have
to fall back doing "plain-old-poll()" ?



- Davide


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: /dev/yapoll : Re: [PATCH] /dev/epoll update ...
  2001-09-21 21:33           ` Christopher K. St. John
@ 2001-09-21 21:52             ` Davide Libenzi
  0 siblings, 0 replies; 8+ messages in thread
From: Davide Libenzi @ 2001-09-21 21:52 UTC (permalink / raw)
  To: Christopher K. St. John; +Cc: linux-kernel@vger.kernel.org


On 21-Sep-2001 Christopher K. St. John wrote:
> Davide Libenzi wrote:
>> 
>> "Did you read and understood the /dev/epoll code ?"
>> 
> 
>  Did you read and understand the Banga99 paper?
> 
>  But this is getting silly. We've agreed that
> you don't like the changes, and I have agreed
> to implement some of them in a new patch.

As you can see the Banga paper is between my references
and that is a very good text.
What i asked you, since you said that /dev/epoll users must fall
back doing poll(), is when this happens.
You said something that IMHO is wrong and i'd like to know if either
i'm wrong or you are.




- Davide


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2001-09-21 21:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-09-21  6:22 [PATCH] /dev/epoll update Dan Kegel
2001-09-21 18:45 ` Davide Libenzi
2001-09-21 19:40   ` /dev/yapoll : " Christopher K. St. John
2001-09-21 20:10     ` Davide Libenzi
2001-09-21 20:21       ` Christopher K. St. John
2001-09-21 21:36         ` Davide Libenzi
2001-09-21 21:33           ` Christopher K. St. John
2001-09-21 21:52             ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox